THIS DOCUMENTATION IS OUT OF DATE, sorry

Updating the Economists Online database. Documentation

The Economists Online database consists of two collections, eo-repo and repec, each with its own update procedure. Both procedures use the ConfigMaker.

ConfigMaker

The ConfigMaker is a Python script that generates the configuration files for the harvester. There are 5 different configuration file types: the cf file and the files for the target, the domain (collection), the repositoryGroups and the repositories. The ConfigMaker also has its own config file with general information, which is used during the initialisation of a ConfigMaker object. For each collection, a separate object with its own general config file is instantiated. The target file is generated during initialisation. The collection-specific config files are generated by separate methods; the cf file is generated by a common method.

eo-repo

The config files for the harvester are generated with the ConfigMaker.

Starting the update of the eo-repo collection

https://svn.non-gnu.uvt.nl/uvt-dev/trunk/sources/eo-updater/bin/start-eo-repo-update.sh
This script should be run every 24 hours, e.g., by cron. All it does is start the following Python script with the correct config file.

https://svn.non-gnu.uvt.nl/uvt-dev/trunk/sources/eo-updater/bin/start-eo-repo-update.py
This script contains the workflow for updating the eo-repo collection.
The script is started in the directory in which the harvester will also be started. In this directory the harvester expects to find its cf-file, which is generated dynamically by the makeCfFile method of ConfigMaker.
After the cf-file has been generated, the configuration files are generated for the target of the harvester, the domain (collection), the repositoryGroups and the repositories (see above). These configuration files are generated by the makeEoRepoFiles method of MakeConfig. This method returns a

repec

repec gateway

Between the OAI server of RePEc (http://oai.repec.org) and the harvester, we run a gateway (http://radix-21.uvt.nl:4080/repec/sitemap.xmap). The function of the gateway is

Generating config files for the harvester with the ConfigMaker

The repec-specific config files are generated in a separate directory that is cleared first. The files in this directory are then compared with the files in use by the harvester. Files that the harvester still uses but that are no longer generated are candidates for removal, together with their related database content. Once the removal candidates have been determined, the newly generated config files are copied to the directory of the harvester.
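The comparison step described above can be sketched as a set difference on file names. This is only an illustration under stated assumptions, not the code in eo-updater: the function names, the directory names and the `.cfg` file names are hypothetical, and the sketch assumes that a config file is identified by its file name alone.

```python
import shutil
import tempfile
from pathlib import Path


def find_removal_candidates(generated_dir, in_use_dir):
    """Return the names of files the harvester uses that were not regenerated."""
    generated = {p.name for p in generated_dir.iterdir() if p.is_file()}
    in_use = {p.name for p in in_use_dir.iterdir() if p.is_file()}
    return in_use - generated


def publish_generated_files(generated_dir, in_use_dir):
    """Copy every newly generated config file to the harvester directory."""
    for path in generated_dir.iterdir():
        if path.is_file():
            shutil.copy2(path, in_use_dir / path.name)


# Demonstration with throwaway directories and hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    generated_dir = Path(tmp, "generated")
    in_use_dir = Path(tmp, "harvester")
    generated_dir.mkdir()
    in_use_dir.mkdir()
    (generated_dir / "series-a.cfg").write_text("new")
    (in_use_dir / "series-a.cfg").write_text("old")
    (in_use_dir / "series-b.cfg").write_text("old")

    # Determine the candidates first, then copy the new files over.
    candidates = find_removal_candidates(generated_dir, in_use_dir)
    publish_generated_files(generated_dir, in_use_dir)
    refreshed = (in_use_dir / "series-a.cfg").read_text()

print(sorted(candidates))
print(refreshed)
```

The sketch deliberately leaves the removal candidates in place in the harvester directory; what happens to them and to their database content is a separate step.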

For the repec collection there is just one repositoryGroup, 'repec-org'. This repositoryGroup contains more than 4000 repositories, which correspond to the RePEc series. A list of the identifiers (handles) of the series is provided by the repec gateway (see above). The repository identifiers are added to the repositoryGroup file.

For each repository, a config file is generated. This also applies to the repositories that are candidates for removal: for these, the harvester is instructed to clear the repository from the database (the harvester asks the database server to delete the records).

In the repec-update.cfg file it is possible to list the series that should not be included. Series are listed by their handles. Listing the handle of an archive skips all the series of that archive.
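The exclusion rule can be sketched as a simple handle filter. This is a minimal illustration, not the ConfigMaker code: the function names and the example handles are hypothetical, the way the list is stored in repec-update.cfg is not shown, and the sketch assumes that a series handle consists of the archive handle followed by a colon and the series code (as in `RePEc:aaa:papers`).

```python
def is_excluded(series_handle, excluded_handles):
    """Return True if the series, or the archive it belongs to, is excluded."""
    for excluded in excluded_handles:
        # Exact match: the series itself is listed.
        if series_handle == excluded:
            return True
        # Prefix match on a full handle component: the archive is listed.
        if series_handle.startswith(excluded + ":"):
            return True
    return False


def filter_series(series_handles, excluded_handles):
    """Keep only the series that are not excluded, preserving their order."""
    return [h for h in series_handles if not is_excluded(h, excluded_handles)]


# Hypothetical handles: one excluded archive and one excluded series.
all_series = [
    "RePEc:aaa:papers",
    "RePEc:aaa:books",
    "RePEc:bbb:papers",
    "RePEc:bbb:books",
    "RePEc:aaab:papers",
]
excluded = ["RePEc:aaa", "RePEc:bbb:papers"]

kept = filter_series(all_series, excluded)
print(kept)
```

Matching on the archive handle plus a trailing colon keeps an excluded archive `RePEc:aaa` from also excluding the unrelated archive `RePEc:aaab`.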

Starting the update of the repec collection

https://svn.non-gnu.uvt.nl/uvt-dev/trunk/sources/eo-updater/bin/start-repec-update.sh

https://svn.non-gnu.uvt.nl/uvt-dev/trunk/sources/eo-updater/bin/start-repec-update.py