
Project promotion materials:

Project Homepage:

http://www.itee.uq.edu.au/eresearch/projects/ands/stc/

Software is available at:

https://github.com/uq-eresearch/dataspace

Programming language(s):

Java

Software categories:

Manual Entry Forms (usually web-based metadata entry interfaces)

Integration metadata from various systems which are internal to an institution

Integration collection records, party records and activity records which may be external to an institution

Data Repository Solutions

Project Members:

Dr. Nigel Ward (Project Manager, n.ward4@uq.edu.au)

Prof. Jane Hunter (Data Source Administrator, jane@itee.uq.edu.au)

ANDS Contact:

Andrew White (andrew.white@ands.org.au)

Project Status:

Completed

Seeding the Commons@UQ

University of Queensland

Project Description:

The UQ Seeding the Commons project aimed to improve discovery and re-use of University of Queensland research data. In particular, the project aimed to:

1. Develop data management policy, practice and infrastructure at UQ;
2. Contribute research data descriptions to Research Data Australia and other discovery systems;
3. Contribute to the national discussion on research data management policy, practice and infrastructure.

Approach

The project was one of the first ANDS-funded projects undertaken at UQ and was implemented within a rapidly changing environment in which: ANDS was still developing best practice advice for creating metadata descriptions; the UQ ANDS data capture projects were under development; and UQ was starting to consider its data management options. For these reasons, the project adopted:

• A consultative approach to understand the changing environment
• An agile approach to adjust to changes and take advantage of opportunities as they arose.

This project, although led by UQ eResearch Group, involved other relevant groups across the University of Queensland, including the UQ Library, Office of DVC-R, IT Services and a Data Management Policy Reference Group.

Subsequent project

Substantive work on the Seeding the Commons @ UQ project ended in April 2012 with the establishment of the ANDS MS06 UQ Data Collections Registry project that had similar infrastructure and policy goals to this project but was more scalable. Where appropriate, this report references the outcomes of that project.

Outcomes

Research data management policy and practice

Development of UQ policy on research data management began during this project, and remains an ongoing activity within UQ. Work on a UQ research data management policy began in late 2010 under the stewardship of Suzanne Morris, UQ Research Integrity Officer within the Office of the DVC-Research. In mid-2011 the DVC-Research office established a UQ Reference Group for Data Management Policy, chaired by Alan Lawson (PVC Research & International). The Seeding the Commons project participated in this reference group along with staff from the UQ Library, UQ Information & Technical Services and senior researchers from select UQ faculties and institutes. The group was tasked with finding a UQ approach to the data management requirements outlined in the Australian Code for the Responsible Conduct of Research (2007) and other pertinent legislation. The reference group disbanded in early 2012 when the ANDS MS06 UQ Data Collections Registry project reference group became the forum for data management discussions at UQ.

Outputs of the Reference Group for Data Management Policy include:

• A number of drafts of a UQ policy on Research Data and Primary Materials Management. The draft policy is still under development, has not been discussed at the UQ Senate and so has not been made public.

• A survey of the UQ Faculty of Health Sciences, Institute for Social Science Research and Faculty of Engineering, Architecture and Information Technology on current research data holdings, their research data management practices, and perceived requirements for institutional support for research data storage and management.

• Training on research data management, delivered at UQ eResearch Week and Graduate Student Week in September 2011, and then periodically to interested stakeholders.

• A research data planning checklist for assisting UQ projects in planning for data management and sharing.

Research data management infrastructure

The Seeding the Commons @ UQ project developed and deployed DataSpace, a registry of the University of Queensland's research data assets. The registry supports creation, storage and management of metadata records describing UQ research data holdings, the researchers and projects that created the data, and any online services that allow access to or manipulation of the data.

The registry syndicates these descriptions to the ANDS Research Data Australia service, allowing national promotion and discovery of UQ research data holdings.

The registry reuses descriptions from existing UQ authoritative sources of information to simplify the creation of metadata (see Figure 2). UQ staff information is sourced from the UQ LDAP Directory and research collection information is sourced from infrastructure created through ANDS Data Capture funding:

• UQ Anthropology Museum Catalogue: a catalogue of anthropological and archaeological artefacts from The University of Queensland Anthropology Museum.
• Diffraction Image Repository (DIMER): a research database that contains diffraction images from the UQ Remote Operation Crystallization and X-Ray Diffraction Facility (UQROCX).
• Spatially Integrated Social Science (SISS) tools: portals that allow geospatial and statistical analysis of Australian Bureau of Statistics census data, Australian Electoral Commission voting data and simulations from the National Centre for Social and Economic Modelling.
• OzTrack: a portal containing animal tracking data collections and analysis tools.

During the timeframe of the Seeding the Commons project, the UQ Microscopy Image Repository (MIRAGE) Data Capture project did not syndicate metadata records to the UQ registry. Instead, MIRAGE employs the community-developed MyTardis codebase and syndicates metadata directly to ANDS Research Data Australia. (However, MIRAGE now syndicates to the UQ registry as part of the subsequent MS06 project.)

Descriptions of UQ research data contributed to Research Data Australia

As well as collecting metadata from UQ Data Capture infrastructure, the project team actively interviewed UQ research groups about their data holdings. The interviews were structured around a template of data-oriented questions, some of which were used to stimulate thinking about data management practice, and some of which were used to hand-write research data collection descriptions. These collection descriptions were then manually entered into DataSpace using a Web-based form interface before being automatically syndicated to Research Data Australia.

The resulting 30 handwritten records represent a breadth of UQ research, and include datasets generated by researchers in the following UQ organisational units:

• School of Civil Engineering
• School of Architecture
• Advanced Water Management Centre
• The Fryer Library
• School of Earth Sciences
• School of English, Media Studies and Art History
• School of Biological Sciences
• School of Social Sciences

Contributions to national discussion of research data management policy, practice and infrastructure

Given the project was undertaken within such a changing environment (in which ANDS was still developing data management best practice), the project actively participated in and learnt from national research data management discussions. Discussion forums included:

• ANDS community event, QUT, February 2011
• Research Data Semantic Web Interest group, QUT, February 2011 (organized by this project)
• Go8 Summit on Research Data Storage Management, University of Sydney, June 2011
• ANDS Projects BoF, eResearch Australasia, Melbourne, November 2011
• “Joining the Dots, how best to support research data management within a university?” BoF, eResearch Australasia, Melbourne, November 2011 (organized by this project).
• VIVO Community Day, University of Melbourne, February 2012

Additionally, Nigel Ward (SC03 project manager) has chaired the ANDS RIF-CS Advisory Board (RAB) since July 2011. The RIF-CS schema is a data interchange format that supports the electronic exchange of metadata. The RAB is the consultative forum to consider proposed changes to the RIF-CS XML schema and make recommendations for change.

Data Type:


High Level Software Functionality:

DataSpace Architecture

As shown in Figure 3, DataSpace sources metadata via the following:

• A manual Web data entry interface;
• ANDS Data Capture repositories at UQ; and
• the UQ LDAP Staff Directory.

Inspired by the SWORD repository ingest initiative in the UK, the Web interface and the data capture projects communicate with the registry using the Atom Publishing Protocol. The metadata is represented using a profile of the Atom format that we called Atom-RDC (Atom Research Data Context), which was inspired by the OAI ORE Atom representation. The Atom-RDC specification can represent most of the semantics of the ANDS RIF-CS format, but in a simpler, more compact format. For the rationale behind using AtomPub and Atom for internal UQ syndication, see our “Ingest (or how do you get things in there?)” blog post.
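To make the ingest path more concrete, here is a minimal sketch of how a client might create a record over the Atom Publishing Protocol, using only the Java standard library (Java 11 or later). It builds a plain Atom entry rather than an Atom-RDC one, because the Atom-RDC extensions are not reproduced in this report. The collection URL, record identifier, and class and method names are hypothetical and are not taken from the DataSpace codebase.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;

public class AtomPubSketch {

    // Namespace defined by the Atom Syndication Format (RFC 4287).
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Escape the characters that are unsafe in XML element content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build a minimal standalone Atom entry with id, title, updated,
    // author and a plain-text content element.
    static String buildEntry(String id, String title, String author,
                             String description, Instant updated) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<entry xmlns=\"" + ATOM_NS + "\">\n"
             + "  <id>" + escape(id) + "</id>\n"
             + "  <title>" + escape(title) + "</title>\n"
             + "  <updated>" + updated + "</updated>\n"
             + "  <author><name>" + escape(author) + "</name></author>\n"
             + "  <content type=\"text\">" + escape(description) + "</content>\n"
             + "</entry>\n";
    }

    // AtomPub creates a new member by POSTing an entry to a collection URI.
    static HttpRequest buildCreateRequest(URI collection, String entryXml) {
        return HttpRequest.newBuilder(collection)
                .header("Content-Type", "application/atom+xml;type=entry")
                .POST(HttpRequest.BodyPublishers.ofString(entryXml))
                .build();
    }

    public static void main(String[] args) {
        String entry = buildEntry(
                "urn:example:collection:1",          // hypothetical identifier
                "Example collection",
                "Example Researcher",
                "A placeholder description of a research data collection.",
                Instant.parse("2012-04-01T00:00:00Z"));
        HttpRequest request = buildCreateRequest(
                URI.create("https://dataspace.example.org/collections"), // hypothetical URL
                entry);
        System.out.println(request.method() + " " + request.uri());
        System.out.print(entry);
    }
}
```

The sketch only constructs the request. Sending it with `java.net.http.HttpClient` would also need whatever authentication the registry requires, which is not covered here.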

The registry syndicates metadata to ANDS Research Data Australia in RIF-CS format over the OAI-PMH protocol. It also exposes the metadata to the Web as Atom feeds and to the linked data Web as RDF/XML.
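As an illustration of the harvesting side, the following sketch shows how an OAI-PMH harvester might form its requests against a registry endpoint. The endpoint URL is hypothetical, and the metadata prefix "rif" is an assumption: the prefix a given repository uses for RIF-CS should be confirmed with its ListMetadataFormats response.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class OaiPmhSketch {

    // First request of a harvest: ask for all records in one metadata format.
    static URI listRecords(String baseUrl, String metadataPrefix) {
        return URI.create(baseUrl + "?verb=ListRecords&metadataPrefix="
                + URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8));
    }

    // Follow-up request: in OAI-PMH, resumptionToken is an exclusive argument,
    // so it is sent with the verb only and metadataPrefix is not repeated.
    static URI resume(String baseUrl, String resumptionToken) {
        return URI.create(baseUrl + "?verb=ListRecords&resumptionToken="
                + URLEncoder.encode(resumptionToken, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String base = "https://dataspace.example.org/oai"; // hypothetical endpoint
        System.out.println(listRecords(base, "rif"));      // "rif" is an assumed prefix
        System.out.println(resume(base, "abc/1"));         // made-up token value
    }
}
```

A harvester would repeat the second request with each new token returned by the server until a response arrives with an empty resumptionToken element.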

DataSpace Technologies

DataSpace is written in Java and built on the technology stack shown in Figure 4. Technologies used include:

• Database: The system is designed to sit on top of any relational database. It has been tested on both PostgreSQL and MySQL.
• Spring: The system uses the Spring 2.5 Framework to connect the components of the application.
• Hibernate: used as the object-relational mapping framework. We use Hibernate annotations to define the schema, relationships and constraints.
• Apache Abdera: an implementation of the Atom Syndication Format and Atom Publishing Protocol. We use it to generate Atom records from database records.
• XSLT 2.0: used to transform Atom output into different formats as shown in the diagram. The default format is HTML. A client can request a specific format by adding an Accept header or a "repr" parameter to its request.
• Apache Solr: the search engine we use to index the database of metadata records.
• Quartz: used to schedule Solr indexing of the database.
• HTTP clients: The system is built as an HTTP service that any HTTP client can communicate with. In theory, if you have a valid Atom entry that passes our requirements, you can send POST, PUT, GET and DELETE HTTP requests to perform actions on the records.
• JavaScript: used in browsers to add interactivity to the HTML versions of the records. For example, we use jQuery to make the editing forms interactive. We also use the OpenLayers and Google Maps JavaScript APIs to show map displays of any geospatial locations in the metadata records.
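The format selection described in the XSLT item (an Accept header or a "repr" parameter, with HTML as the default) could be resolved along the following lines. This is a sketch under stated assumptions, not the DataSpace implementation: the "repr" tokens ("html", "atom", "rdf"), the rule that "repr" takes precedence over the Accept header, and the class and method names are all hypothetical. For brevity it ignores q-values and takes the first recognised media type in the header.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class RepresentationSketch {

    // Media types mapped to assumed "repr" tokens (the tokens are hypothetical).
    static final Map<String, String> BY_MEDIA_TYPE = new LinkedHashMap<>();
    static {
        BY_MEDIA_TYPE.put("text/html", "html");
        BY_MEDIA_TYPE.put("application/atom+xml", "atom");
        BY_MEDIA_TYPE.put("application/rdf+xml", "rdf");
    }

    // Pick a representation: a recognised repr parameter wins (an assumption),
    // then the first recognised media type in the Accept header, then HTML.
    static String resolve(String reprParam, String acceptHeader) {
        if (reprParam != null && BY_MEDIA_TYPE.containsValue(reprParam)) {
            return reprParam;
        }
        if (acceptHeader != null) {
            for (String part : acceptHeader.split(",")) {
                // Drop parameters such as ";q=0.9" before matching.
                int semicolon = part.indexOf(';');
                String mediaType = (semicolon >= 0 ? part.substring(0, semicolon) : part)
                        .trim().toLowerCase(Locale.ROOT);
                String format = BY_MEDIA_TYPE.get(mediaType);
                if (format != null) {
                    return format;
                }
            }
        }
        return "html"; // the default format stated in the text
    }

    public static void main(String[] args) {
        System.out.println(resolve("rdf", "text/html"));
        System.out.println(resolve(null, "application/atom+xml;q=0.9, text/html"));
        System.out.println(resolve(null, null));
    }
}
```

A query parameter is convenient for links typed into a browser, where headers cannot easily be set, while the Accept header suits programmatic HTTP clients.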