Integrating metadata from various systems internal to an institution

Philippa Broadley

Lance De Vine

Andrew White

QUT RDA Gold Standard Record Exemplars

Queensland University of Technology

Project Description

The RDA Gold Standard Record Exemplars project (SC37) was designed to address the richness (connectivity and quality) of research data description records, because institutions are contributing data descriptions of varying quality to Research Data Australia (RDA). Until now, there have been no examples of good practice, no guidelines for the economical production of high-quality, information-rich data descriptions, and no investigations of the return on investment of the effort and cost of producing such records.

During the first two phases of the project, eleven data interviews were conducted in collaboration with Professor Kerry Raymond (QUT School of Electrical Engineering and Computer Science). These data interviews were a mix of unstructured, conversational consultations and structured interviews using an interview template developed by project staff. Subsequent to the interviews, the project team decided to create new records, instead of updating or enhancing existing records, so that an assessment of the record creation process from beginning to end could be undertaken. At the completion of interviews and the publication of a manually-created record (using the ANDS Online Services web interface), project staff resolved that for a record to be classed as ‘gold standard’ it must comply with the Content Providers Guide and meet the Minimum Metadata Requirements.

In addition, a number of metadata elements must be included, such as citation, relatedInformation, title, subjects and description, as well as mandatory elements such as identifier. The existenceDates and rights elements, which were new with the 2011 release of the RIF-CS 1.3 schema, must also be present. By including as much descriptive information as possible (but only insofar as the information added value or refined an association) and leaving no metadata elements blank, QUT was able to enrich records to a gold standard. Institutions updating or creating their own gold standard records are advised to follow this strategy; however, it is not a one-size-fits-all approach. While it worked for QUT, it may not be applicable at other institutions.
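The element checklist described above can be sketched as a simple completeness check. This is a minimal illustration only, not the project's tooling: the element names are taken from the text and may not match the actual RIF-CS tag names or nesting, and the function names (`missing_elements`, `has_content`, `local_name`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Element names as listed in the text; the real RIF-CS tag names and
# nesting may differ (assumption), so check this set against the schema.
REQUIRED_ELEMENTS = {
    "identifier", "title", "description", "subjects", "citation",
    "relatedInformation", "existenceDates", "rights",
}


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def has_content(element):
    """True if the element has non-blank text or any child with content."""
    if element.text and element.text.strip():
        return True
    return any(has_content(child) for child in element)


def missing_elements(record_xml):
    """Return the required element names that are absent or blank, sorted."""
    root = ET.fromstring(record_xml)
    present = {local_name(el.tag) for el in root.iter() if has_content(el)}
    return sorted(REQUIRED_ELEMENTS - present)
```

A record would pass this sketch's check when `missing_elements(...)` returns an empty list. Compliance with the Content Providers Guide and the Minimum Metadata Requirements is a separate matter that this check does not cover.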

Nine further highly connected collection records were developed and made available to RDA via an OAI-PMH feed. All ten gold standard records (the nine harvested records plus the manually created one) can be found in RDA by selecting ‘Queensland University of Technology’ from the list of contributing organisations. Of the ten collections, five are openly accessible and the other five are available via mediated access. Six of the ten data sets are licensed under a Creative Commons licence.

The ten gold standard collection records added to RDA are:

1. Analysis of rainfall and stream hydrology at Samford Valley
2. Academic authorship, publishing agreements and open access: a survey
3. Australian Election Study survey, 2001
4. Australian Election Study survey, 2004
5. Musician mortality data, 1956-2007
6. Performance judgements in the Idol series
7. Pharmacokinetic data for equine medications
8. Public views on carbon sequestration
9. Survey of M-Services
10. Wikipedia CJK Corpora

While the harvesting itself is fast, manually creating records for automatic harvest is labour intensive, at an average cost of $196.51 per record. Conversely, creating records using the web interface is much quicker; entering data into a web form takes considerably less time, roughly one day or 7.25 hours, and one such record is estimated to cost $60.47 to create. These return on investment figures may be adjusted after usability testing of the data input interface of QUT Research Data Finder (QUT’s institutional data registry), which will occur in the coming months and is a requirement of the Metadata Stores project. A detailed document (‘Enhancing RIF-CS Records to “Gold Standard” Project Report’) outlining strategies and requirements for improved record quality was also produced during the project and is available for public release.
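The two per-record cost figures quoted above can be compared with a few lines of arithmetic. The sketch below is illustrative only: it assumes the two averages scale linearly with the number of records, which the report does not claim, and the function name `compare_costs` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Average per-record costs quoted in the text (dollars).
HARVEST_COST = Decimal("196.51")   # record prepared manually for harvest
WEB_FORM_COST = Decimal("60.47")   # record entered through the web interface


def compare_costs(record_count):
    """Compare total costs for record_count records, assuming linear scaling."""
    harvest_total = HARVEST_COST * record_count
    web_total = WEB_FORM_COST * record_count
    ratio = (HARVEST_COST / WEB_FORM_COST).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return {
        "harvest_total": harvest_total,
        "web_total": web_total,
        "saving": harvest_total - web_total,
        "cost_ratio": ratio,
    }
```

On these figures, a web-form record costs $136.04 less than a record prepared for harvest, which makes the harvest route roughly 3.25 times as expensive per record.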

Following the completion of the project, QUT staff now have an increased understanding of the process of identifying, creating, managing and storing research data sets. With the knowledge of what constitutes a highly connected, rich metadata record, QUT staff will use the outputs of this project to inform future data management practices.

The project team consisted of a Data Librarian/Project Manager from Library Services and one Research Support Specialist from the High Performance Computing (HPC) and Research Support group.