DoDoCo - Database
=================

Fred Vos, Tilburg University

2009-08-26

1. Introduction
+++++++++++++++

This document describes the database in which DoDoCo stores the download event data.

2. Datamodel
++++++++++++

The image below shows the datamodel.

.. image:: ../../images/datamodel.png
   :height: 8cm
   :width: 15cm
   :align: left

3. Tables in detail
+++++++++++++++++++

Publications
------------

Primary key: pub_identifier

Stores unique publication identifiers. When a download event refers to a pub_identifier that is not yet in this table, a new record is added. For known publications, only a new download event has to be added to table 'DownloadEvents'.

Estimated size: 1,000-3,000 records (end 2009), growth 1,000/y, record size 50 bytes (150 KB + 50 KB/y)

PublicationPartners
-------------------

Primary key: pub_identifier, prt_identifier

Foreign key: pub_identifier -> Publications.pub_identifier

Partners that are associated with a publication. The data is found in the Meresco backend by searching for the publication as it was provided by the partner's Institutional Repository. In the current setup for EO, this is a 1 : 1 relationship.

Estimated size: 1,000-3,000 records (end 2009), growth 1,000/y, record size 80 bytes (240 KB + 80 KB/y)

PublicationScholars
-------------------

Primary key: pub_identifier, sch_identifier

Foreign key: pub_identifier -> Publications.pub_identifier

Scholars that are associated with a publication. The data is found in the Meresco backend by searching for the publication as it was provided by the partner's Institutional Repository. There should be at least one record for each publication. Allowing for missing data, we expect about 1-1.5 scholars per publication.

Estimated size: 1,000-4,000 records (end 2009), growth 1,000/y, record size 100 bytes (400 KB + 100 KB/y)

DownloadEvents
--------------

Primary key: ctx_identifier

Foreign key: pub_identifier -> Publications.pub_identifier

Each download event is stored in this table. The ctx_identifier is the unique identifier for the download event. All fields come from the Context Object.

Estimated size: 10,000-100,000 records (end 2009), growth 20,000/y, record size 100 bytes (10 MB + 2 MB/y)
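
The keys described above can be sketched as a small schema. The example below is only an illustration under several assumptions: it uses SQLite through Python's standard ``sqlite3`` module (this document does not name the database system), it types every identifier as ``TEXT``, and it leaves out all non-key columns because they are not listed here. The names ``open_database`` and ``record_download`` are hypothetical helpers, not part of DoDoCo.

```python
import sqlite3

# Only the key columns named in this document are modelled.
# Column types (TEXT) are an assumption; the real types are not given here.
SCHEMA = """
CREATE TABLE Publications (
    pub_identifier TEXT PRIMARY KEY
);
CREATE TABLE PublicationPartners (
    pub_identifier TEXT NOT NULL REFERENCES Publications(pub_identifier),
    prt_identifier TEXT NOT NULL,
    PRIMARY KEY (pub_identifier, prt_identifier)
);
CREATE TABLE PublicationScholars (
    pub_identifier TEXT NOT NULL REFERENCES Publications(pub_identifier),
    sch_identifier TEXT NOT NULL,
    PRIMARY KEY (pub_identifier, sch_identifier)
);
CREATE TABLE DownloadEvents (
    ctx_identifier TEXT PRIMARY KEY,
    pub_identifier TEXT NOT NULL REFERENCES Publications(pub_identifier)
);
"""


def open_database(path=":memory:"):
    """Create the four tables in a new SQLite database (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def record_download(conn, ctx_identifier, pub_identifier):
    """Store one download event, adding the publication if it is unknown."""
    with conn:  # commits on success, rolls back if an insert fails
        # Unknown publication: add it. Known publication: nothing happens.
        conn.execute(
            "INSERT OR IGNORE INTO Publications (pub_identifier) VALUES (?)",
            (pub_identifier,),
        )
        # Every event gets its own row; a repeated ctx_identifier is rejected.
        conn.execute(
            "INSERT INTO DownloadEvents (ctx_identifier, pub_identifier) "
            "VALUES (?, ?)",
            (ctx_identifier, pub_identifier),
        )
```

In this sketch, calling ``record_download`` twice for the same publication with different ctx_identifier values leaves one row in Publications and two rows in DownloadEvents, while reusing a ctx_identifier raises ``sqlite3.IntegrityError``.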