PDProxy - Overview
===================

PDProxy stands for Personal Data Proxy. It is built as a temporary solution for a problem with the Publication Lists of Tilburg University.

1 Background
++++++++++++

1.1 Publication Lists
---------------------

The Publication Lists at Tilburg University were built as a spin-off of the first Economists Online project. The goal of the Publication Lists is to provide publication lists of scholars from Tilburg University, based on the contents of the Institutional Repository.

The Publication Lists web application is built using Apache Cocoon. The publications from the Institutional Repository of Tilburg University are available via a search protocol called SRU. The Publication Lists application uses SRU to retrieve a scholar's publications in a standard XML format (MODS) and transforms these MODS documents into HTML, where each publication is presented in APA reference style.

A publication list for a scholar can be retrieved using a URL containing the mail name of the scholar, like http://publications.uvt.nl/repository/c.j.c.f.fijnaut/publications.html. In this example, 'c.j.c.f.fijnaut' is the mail name of the scholar.

This mail name cannot be used to search for publications in the Institutional Repository. That requires another ID, the so-called ANR, so the mail name must first be translated into an ANR. Furthermore, we want to present the full name of the scholar in the header of the publication list.

To translate the mail name into an ANR and to get the full name to print in the header, the Publication Lists application contacts an application called Experts and Expertise. Experts and Expertise can return an XML description of a person. It uses a combination of data from LDAP and from its own database. The data in the Experts and Expertise database is maintained by the scholar.

The following diagram describes the current situation.

.. image:: ../images/current.png
   :align: left
   :height: 6cm
   :width: 14cm

Experts and Expertise itself uses Publication Lists to present the five most recent publications of a scholar. This is not shown in the diagram.

1.2 Dependency on LDAP
----------------------

When a scholar leaves Tilburg University, their entry in LDAP is removed. Since Experts and Expertise uses both LDAP data and data entered by the scholar, the scholar is no longer available in that application.

This also has implications for the Publication Lists. The mail name can no longer be translated into an ANR and the full name of the person is no longer known. This means we cannot show publication lists of people who have left Tilburg University.

Tilburg University also wants the publication lists of scholars who have left the university to remain available.

1.3 Economists Online
---------------------

In the Economists Online project all partners (universities) must provide a description of their organisation, their repositories and their scholars. The organisation, each department and each scholar is described in a format called FOAF (Friend of a Friend).

Every partner is responsible for their own description. Every day this description is fetched and validated. If it is valid and differs from the current description, the new description replaces the current description.

It is allowed to replace a description of a person with a URL to a FOAF description of that person. This follows a standard called XInclude. When XInclude is used, the Economists Online portal replaces the URL with the contents of the document referred to by the URL.

The personal descriptions in Experts and Expertise could be XInclude'd in the Economists Online description for Tilburg University if Experts and Expertise had a FOAF interface.

2 Goal
+++++++

The primary goal is to make it possible to show publication lists of scholars who have left the university as well.
A secondary goal is to embed the personal descriptions of Economists Online scholars as XInclude references to FOAF documents that originate from Experts and Expertise.

3 PDProxy
+++++++++

3.1 New component
-----------------

The following diagram shows the new situation.

.. image:: ../images/components.png
   :align: left
   :height: 8cm
   :width: 16cm

The Publication Lists application now gets the personal data from a new service called PDProxy. For current scholars, PDProxy retrieves data from Experts and Expertise; for scholars who have left Tilburg University, data is retrieved from a Store. This process is explained in more detail below.

The Economists Online description uses XInclude references to PDProxy.

3.2 How a request for personal data is handled by PDProxy
---------------------------------------------------------

PDProxy first checks a local Store for the person. The Store is the place where descriptions of scholars who have left Tilburg University are kept. This Store can be implemented as a directory with files, where each file is a FOAF document. In the software, the Store will be accessed through a DAO interface that can be implemented in different ways without changing the calling code. A directory with files is one possible implementation.

If a person is not found in the Store, the Cache is checked. The Cache is a temporary store for personal data retrieved from Experts and Expertise and converted into FOAF. If Experts and Expertise is not available, the Cache can return previously generated data. If a person document in the Cache is old (older than x days), a new version is automatically retrieved from Experts and Expertise, transformed into FOAF and stored in the Cache to replace the previous version.

If a person document in the Cache is older than a certain age and the person can no longer be generated from Experts and Expertise data, PDProxy assumes this person has left Tilburg University.
The document is then automatically moved to the Store and the administrators of the Publication Lists application receive an e-mail message. Documents in the Store are maintained by the administrators of the Publication Lists application.
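
The lookup order described in section 3.2 can be illustrated with a minimal Python sketch. It is not the actual PDProxy implementation. All names here (``PDProxy.lookup``, ``fetch_foaf``, ``ServiceUnavailable``, ``CacheEntry``, the mail name ``j.doe``) are hypothetical. The Store and Cache are plain dictionaries standing in for the DAO implementations, the e-mail to the administrators is modelled as an entry in a list, and a single age threshold is assumed for both "old" and "older than a certain age".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class ServiceUnavailable(Exception):
    """Hypothetical signal that Experts and Expertise could not be reached."""


@dataclass
class CacheEntry:
    foaf: str          # the generated FOAF document
    fetched_day: int   # day number on which it was generated


@dataclass
class PDProxy:
    # fetch_foaf returns a FOAF document, returns None when the person is
    # unknown to Experts and Expertise, or raises ServiceUnavailable.
    fetch_foaf: Callable[[str], Optional[str]]
    max_age_days: int = 7  # the "x days" from the text; 7 is an arbitrary placeholder
    store: Dict[str, str] = field(default_factory=dict)
    cache: Dict[str, CacheEntry] = field(default_factory=dict)
    notifications: List[str] = field(default_factory=list)

    def lookup(self, mail_name: str, today: int) -> Optional[str]:
        # 1. The Store holds scholars who have left; it is checked first.
        if mail_name in self.store:
            return self.store[mail_name]

        # 2. A sufficiently recent Cache entry is returned as-is.
        entry = self.cache.get(mail_name)
        if entry is not None and today - entry.fetched_day <= self.max_age_days:
            return entry.foaf

        # 3. Missing or old entry: try to regenerate it.
        try:
            fresh = self.fetch_foaf(mail_name)
        except ServiceUnavailable:
            # Upstream is down: fall back to previously generated data, if any.
            return entry.foaf if entry is not None else None

        if fresh is not None:
            self.cache[mail_name] = CacheEntry(fresh, today)
            return fresh

        # 4. Old entry that can no longer be regenerated: assume the person
        #    has left, move the document to the Store and notify the admins.
        if entry is not None:
            self.store[mail_name] = entry.foaf
            del self.cache[mail_name]
            self.notifications.append(f"{mail_name} moved to the Store")
            return entry.foaf
        return None
```

In this sketch an unreachable Experts and Expertise is kept distinct from a person who is no longer known there: only the latter moves a document from the Cache to the Store, so a temporary outage does not mark anyone as having left.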