Project promotion materials:

Project Homepage:

http://www.itee.uq.edu.au/eresearch/projects/ands/uq-dcr

Software is available at:

https://github.com/uq-eresearch/miletus

https://github.com/uq-eresearch/thales

https://github.com/uq-eresearch/ods-sru-interface

https://github.com/uq-eresearch/miletus-partyTester

Software categories:

Microscopes

Sensor Networks

Citizens of Sciences

Observational Instruments (e.g. from telescope, camera, etc.)

Fields (e.g. applications for archaeologists or for researchers who collect survey data)

DBMSs/Files

Manual Entry Forms (usually web-based metadata entry interfaces)

Integration metadata from various systems which are internal to an institution

Integration collection records, party records and activity records which may be external to an institution

Metadata Store Solutions

Metadata Feed/Harvest/Publish

Project Members:

H. Sue (Project Contact, h.sue@uq.edu.au)

Professor Jane Hunter (Project Director, jane@itee.uq.edu.au)

Tim Dettrick (Software Engineer, t.dettrick@uq.edu.au)

ANDS Contact:

Andrew White (andrew.white@ands.org.au)

Project Status:

Completed

UQ Data Collections Registry

University of Queensland

Project Description:

The UQ Data Collections Registry (UQ-DCR) project has developed an institutional metadata store for The University of Queensland. An institutional metadata store gathers metadata records from across the institution, to collate and publish them as a single set of metadata records from the institution.

The UQ-DCR focuses on metadata records that describe research data collections. Research data is data collected and used in research, such as data obtained from instruments and simulations. It exists in many forms, including spreadsheets, images and instrument-collected data, and can be stored in many ways, including as simple files or in specialized research databases. A collection is a set of research data that has research value. The collection records contain the metadata needed to manage, discover and use the research data collection.

The UQ-DCR also processes metadata records for entities associated with the research data collections. The party records describe people and organisational units, who manage or publish the collection. The activity records describe research projects and programs, which created the collection. The service records describe methods to access the collection. These other records provide context for the collections, making the collection records more meaningful to manage, discover and use.

Benefits

The UQ Data Collections Registry project has contributed to the development of the research infrastructure at The University of Queensland by:

• Creating a central registry where research data collection metadata records from across UQ can be stored/accessed.
• Developing standards for supplying collection records to the registry, so additional research databases can be added in a standardized format.
• Improving the quality of the metadata records by using information from authoritative databases at UQ.
• Improving the external visibility of its researchers by issuing NLA party identifiers for them.
• Publishing the collection records into Research Data Australia to promote UQ’s research data collections.
• Providing a feed of collection records into the UQ Data Hub to permit reporting and future integration projects involving research data collections (e.g. UQ reSEARCHers and staff profiles).
• Raising the profile of research data management through the collaboration of the groups involved in the project Reference Group: UQ Office of the Deputy Vice-Chancellor (Research), UQ Office of the Pro-Vice-Chancellor (Research and International), UQ Library, UQ Information Technology Services, UQ Research Computing Centre and the UQ eResearch Lab.
• Developing the Research Data Management Options for UQ draft document for the UQ Pro-Vice-Chancellor (Research and International), which proposes a strategy and future activities to support research data management at UQ.

Data Type:

Metadata Stores - UQ Data Collections Registry

High Level Software Functionality:

Functions

The UQ Data Collections Registry performs four main functions:

• Aggregation of metadata from sources across the university;
• Allocation of NLA party identifiers to parties from the university;
• Alignment of the metadata with information sourced from the university’s sources of truth; and
• Publication of the metadata records.

AGGREGATION OF METADATA

The UQ-DCR aggregates metadata records from sources across the university. Research data collections are distributed across the university in faculties and research groups.

The main sources of research data collection records for the UQ-DCR are research databases. These are systems designed to store, process and manage the research data. These systems are used directly by the researcher to manage and use the actual research data, so the collection metadata can be kept consistent with the research data and created as a part of the research process.

The UQ-DCR automatically harvests metadata records from these research databases:

• Anthropology Museum
A research database that contains digital records of anthropological and archaeological artefacts from The University of Queensland Anthropology Museum.
• Diffraction Image Repository (DIMER)
A research database that contains diffraction images from the UQ Remote Operation Crystallization and X-Ray Diffraction Facility (UQROCX).
• Spatially Integrated Social Science (SISS)
A research database that contains geospatial and statistical analysis of Australian Bureau of Statistics census data, Australian Electoral Commission voting data and simulations from the National Centre for Social and Economic Modelling.
• Microscopy Image Repository (MIRAGE)
A research database that contains microscopy images obtained from electron microscopes and other instruments from the UQ Centre for Microscopy and Microanalysis (CMM).

The OzTrack research database is currently undergoing redevelopment and is not yet available for harvesting by the UQ-DCR, but it will be added when it becomes available. OzTrack is a research database that contains animal tracking data collections.

Additional research databases can be added to the UQ-DCR as they become available. The UQ-DCR harvests from the research databases using standard formats and protocols: RIF-CS, Atom-RDC, OAI-PMH and Atom-PMH.
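As an illustration of harvesting over OAI-PMH, the following is a minimal Python sketch, not the Miletus implementation. It builds a ListRecords request URL and reads the record identifiers and resumption token from one page of a response. The base URL, the "rif" metadata prefix and the sample record identifiers are assumptions made for the example; the verb, argument names and XML namespace come from the OAI-PMH protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (header identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response page; the identifiers are invented for the example.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>example:collection/1</identifier></header></record>
    <record><header><identifier>example:party/7</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai", metadata_prefix="rif")
ids, token = parse_page(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A harvester would fetch first_url, parse the page, and keep requesting next_url until a page arrives without a resumption token.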

There are also collection records that do not come from research databases. These collection records have been manually created by researchers and librarians. The UQ-DCR currently contains manually entered metadata records from:

• The UQ Seeding the Commons project.
These are collection records created by an ANDS-funded Seeding the Commons project at UQ. These collection records were stored in the UQ DataSpace system, but have now been copied into the UQ-DCR.
• The Urban Water Security Research Alliance (UWSRA) pilot project.
These are collection records created by the Urban Water Security Research Alliance (UWSRA), of which the UQ Advanced Water Management Centre (AWMC) was a member.

On 6 March 2013, there were 166 metadata records, comprising 63 collection records, 91 party records, 9 activity records and 3 service records. Additional records will be automatically added by the harvesting process when they are created in the source research databases.

ALLOCATION OF NLA PARTY IDENTIFIERS
The UQ-DCR allocates NLA party identifiers to people and organisational units from UQ that do not yet have them.

NLA party identifiers are identifiers issued by the National Library of Australia (NLA) Trove and People Australia systems. Identifiers are needed to reliably identify people, since names are not always unique.

The NLA party identifiers are used in the party records to ensure reliable matching and attribution outside UQ. This is important for a service like the ANDS Research Data Australia, which aggregates metadata from multiple institutions across Australia. For example, when a researcher moves from one institution to another, the identifier is used to positively identify them as the same person; or when two different researchers have similar names, they can be distinguished by their different identifiers.

The UQ-DCR ensures that party records for UQ people and organisational units include an NLA party identifier. This allows the ANDS Research Data Australia to positively identify the parties and to correctly associate those parties with the collection records.
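The value of identifiers over names can be shown with a small Python sketch. The records, names, institutions and identifier values below are invented for the example and are not real NLA party identifiers. Grouping by name merges everyone called "J. Smith" into one party, while grouping by identifier keeps two distinct researchers apart and still links one researcher across two institutions.

```python
from collections import defaultdict

# Made-up party records; the nla_id values are placeholders only.
records = [
    {"name": "J. Smith", "institution": "Institution A", "nla_id": "nla.party-0000001"},
    {"name": "J. Smith", "institution": "Institution B", "nla_id": "nla.party-0000001"},
    {"name": "J. Smith", "institution": "Institution A", "nla_id": "nla.party-0000002"},
]


def group_by(items, key):
    """Group a list of dictionaries by the value stored under key."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item)
    return dict(groups)


by_name = group_by(records, "name")    # one group: the name alone cannot separate people
by_id = group_by(records, "nla_id")    # two groups: one per distinct researcher

# The researcher who moved institutions is still recognised as one person.
moved = sorted(r["institution"] for r in by_id["nla.party-0000001"])
```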

On 6 March 2013, 69% of the party records (63 of the 91 person and group party records) had NLA party identifiers associated with them. A further 9% (8 records) had correctly not been assigned an NLA party identifier, because they do not describe UQ researchers. The remaining 22% (20 records) did not have sufficient information provided by the research databases to match them to people in the Human Resources database. More than half of these unmatched people are from research institutes at UQ, so in the records they are associated with the institute rather than with any identifier in the UQ Human Resources database.
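The percentages above can be checked from the record counts with a few lines of Python. The counts are those reported for 6 March 2013; the dictionary labels are only shorthand for this example.

```python
# Party record counts reported for 6 March 2013.
total_party_records = 91
counts = {
    "with NLA identifier": 63,
    "not UQ researchers": 8,
    "unmatched": 20,
}

# The three groups should account for every party record.
assert sum(counts.values()) == total_party_records

# Percentage of all party records in each group, rounded to whole numbers.
percentages = {
    label: round(100 * n / total_party_records) for label, n in counts.items()
}
```

The three groups account for all 91 party records, and the rounded shares come to 69, 9 and 22 percent.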

ALIGNMENT WITH INSTITUTIONAL DATA
The UQ-DCR aligns the harvested metadata records with authoritative information from the university using the UQ Data Hub from the UQ Information Technology Services. The UQ Data Hub obtains information from sources of truth in the university, such as the Human Resources database and Research Master. The UQ Data Hub was previously called the Operational Data Store (ODS).

This alignment improves the quality of the metadata records by enhancing them with more accurate information. For example, the HR database can supply the correct names and Thomson Reuters ResearcherID for a researcher’s party record; or Research Master can supply grant titles and collaborators for an activity record. The effectiveness of this depends on the data quality of the HR database, so there are currently only 18 party records that have Thomson Reuters ResearcherIDs.

PUBLICATION OF METADATA RECORDS
The UQ-DCR publishes the metadata records as a machine-readable feed that other systems can harvest. The metadata records are represented using RIF-CS and published using OAI-PMH.

The feed is harvested by the National Library of Australia (NLA) Trove system. This is a part of the process for allocating the NLA party identifiers. Trove harvests the party records from the UQ-DCR, and allocates NLA party identifiers to them. The UQ-DCR then retrieves the NLA party identifiers from Trove, and adds them to the party records. Trove also creates an entry for the party in its system.

The feed is also harvested by the ANDS Research Data Australia (RDA) system, which is a national metadata store. It aggregates metadata records from institutions across Australia, to publish and make them discoverable.

The project source code and documentation are available on GitHub. These are the source code projects:
• Miletus
This software harvests metadata records from research databases, queries SRU data sources, merges metadata records and publishes them as an OAI-PMH feed. It is used to harvest the research databases and a deployment of the Thales software (see below), to align the harvested records with information from NLA Trove and the UQ Data Hub, and to publish the metadata records on the feed.
https://github.com/uq-eresearch/miletus
• Thales
This software is an editor for manually created metadata records. A deployment of this software holds the manually created metadata records.
https://github.com/uq-eresearch/thales
• ODS-SRU interface
This software provides a Search/Retrieval via URL (SRU) interface to the data provided by a relational database backend. It presents the data from the UQ Data Hub as an SRU service for a deployment of Miletus to query.
https://github.com/uq-eresearch/ods-sru-interface
• Party tester
This software was developed for testing the behavior of the NLA Trove harvester. It includes a test plan based on the documented behavior of the Trove automatic matching rules.
During testing, it created test party records, which were harvested by a test deployment of Miletus, which was in turn harvested by NLA Trove.
https://github.com/uq-eresearch/miletus-partyTester
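
To illustrate the kind of request an SRU client sends to a service such as the ODS-SRU interface, here is a minimal Python sketch that builds a searchRetrieve URL. The base URL, the "surname" index name, the search term and the protocol version are assumptions made for the example, not details of the UQ deployment. The parameter names (operation, version, query, maximumRecords) come from the SRU standard.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sru_search_url(base_url, cql_query, maximum_records=10, version="1.2"):
    """Build an SRU searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,              # assumed version; a real service may differ
        "query": cql_query,              # CQL query string
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and index name, for illustration only.
url = sru_search_url("http://sru.example.org/staff", 'surname = "Example"')

# Decode the query string again to show what the server would receive.
decoded = parse_qs(urlsplit(url).query)
```

The response to such a request is an XML document containing the matching records, which a harvester can then merge into its own metadata records.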