"Meresco Harvester" is the OAI-Harvester from the Meresco Suite 2005-2010 http://www.meresco.org ********** 1. Installation --------------- Installation can be done with python distutils. $ python setup.py install 2. Configuration ---------------- Furthermore you will need to configure: - apache2 - sitecustomize.py !! - webcontrolpanel administrator (admin) - harvesting (user) 2.1 Apache Configuration ------------------------ An example apache configuration is given in the example directory. Please change it to your needs. 2.1.1 Apache Worker (Threaded Model) ------------------------------------ Meresco Harvester depends on Slowfoot (see deps.txt), which on its turn depends on mod_python (an apache module). For correct working of Meresco Harvester you will need the apache2-mpm-worker, which is the high speed threaded model of Apache2. 2.1.2 Accessible directories. ---------------------------- The DocumentRoot of your apache configuration should be writable for the web user, e.g. www-data The directories specified in the PythonOption logDir and stateDir should be readable for the www-data user. The file specified in the PythonOption usersfile should be writable by the web user. (See 2.2) 2.1.3 SSL Certificate creation ------------------------------ If you don't have an SSL Certificate for your server you can create your own self-signed certificate. Create a server key. $ openssl genrsa -out server.pem 2096 Create a certificate. Fill in some information. BEWARE: As Common Name (CN) fill in your domainname, e.g. harvester.example.org $ openssl req -new -key server.pem -out server.csr Now create a self-signed certificate. $ openssl x509 -req -days 60 -in server.csr -signkey server.pem -out server.crt 2.1.4 Module configuration -------------------------- Take care that the correct modules are enabled in apache. For the apache example configuration you should enable: - python.load - rewrite.load - ssl.load - ssl.conf - proxy.load - cgid.conf - cgid.load - authz_host.load - mime.load - alias.load - proxy_http.load Disable the following modules: - cache.load (and other cache modules) - deflate.conf - deflate.load 2.1.5 Ports confguration ------------------------ The example configuration acts on ports 80 and 443, make sure apache listens to these ports. 2.2 Sitecustomize.py !! ----------------------- "Meresco Harvester" requires 'utf-8' as defaultencodig. This can be done with a file sitecustomize.py somewhere in your pythonpath. A usual place for your sitecustomize.py is /usr/lib/python2.5/site-packages/ The contents should be: import sys sys.setdefaultencoding('utf-8') The webcontrolpanel does not work if the encoding is different. 2.3 Webcontrolpanel Configuration --------------------------------- The webcontrolpanel has one special user called 'admin', with extra privileges. They are documentend at http://meresco.org Amongst these privileges is the right to create users. This is all done in the webcontrolpanel. The created users are stored in the file specified in the apache configuration (PythonOption usersfile) Each user has it's own line in this file in the format: : To start using the "Meresco Harvester" you will need to provide it with one admin user, a default usersfile is in the example directory. It contains the admin user with the password 'admin'. 2.3.1 Changing the admin password --------------------------------- To create a password for the user admin you can do: $ echo "admin:$(echo -n | md5sum | awk '{print $1}')" > users.txt 2.4 Harvesting configuration ----------------------------- Harvesting is done with the startharvester.py script which is in the example directory. The user that will run the startharvester.py script will need to have read/write permissions in the stateDir and logDir specified in the apache configuration. 3. Further Reading ------------------ For more information take a look at http://meresco.org or at the documentation in the doc directory.