Data collections can be seen on:

Software is available at:

Software categories:


Project Members:

Dominique Gorse (Data Source Administrator,

Troy Sadkowsky (Lead Developer,

Felicity Newell (Developer,

Pierre-Alain Chaumeil (Developer,

ANDS Contact:

Andrew White (

Project Status:


Linking the EMBL Australia EBI Mirror with the ARDC - Component A

University of Queensland

Collaborator(s): QFAB

Project Description:

The aim of this project is to populate Research Data Australia (RDA) with collection descriptions of Australian-related DNA, RNA and protein sequence data held in the EBI databanks.

For the past 30 years, data in molecular bioscience have routinely been collected from researchers, organised and made freely available via international data facilities. The European Bioinformatics Institute (EBI, part of the European Molecular Biology Laboratory, EMBL) is one of the major international data facilities providing such services. On 21 June 2011, the EBI mirror facility was launched at the University of Queensland in Brisbane. This means that scientists in Australia can now access important research data faster and mine them in novel ways. It also provides an opportunity to link data of Australian interest deposited at the EBI to the Australian Research Data Commons.

In this project, more than 13,000 collection records were created to describe the Australian-related content of the EBI nucleotide and protein sequence databases (i.e. data from Australian-dwelling animal and plant species, or data submitted to the EBI by Australian research institutions). Considerable effort went into dividing the content of these large databases into many smaller, carefully described datasets of potential interest to a wide range of researchers. The RDA is linked to the EBI through landing pages that are simple to use and contain structured information for non-specialists who are unfamiliar with the content of the EBI databases. As a result, molecular data of Australian interest held at the EBI are now easier to find, access and re-use.
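The idea of dividing a large database into smaller, taxonomically coherent datasets can be illustrated with a minimal sketch. It is not the project's code: it assumes, hypothetically, that each sequence record already carries an accession, a root-first taxonomic lineage and an `australian` flag, and the names `split_collections`, `australian_collections` and `max_size` are illustrative. Any group larger than `max_size` is split again at the next taxonomic rank until it is small enough or the lineage runs out.

```python
from collections import defaultdict


def split_collections(records, max_size, depth=0, path=()):
    """Group records by the taxon at `depth` of their lineage, recursively
    splitting any group larger than `max_size` at the next rank.

    Returns a dict mapping a lineage path (tuple of taxa) to accessions.
    """
    groups = defaultdict(list)
    leftover = []  # records whose lineage ends before `depth`
    for rec in records:
        if depth < len(rec["lineage"]):
            groups[rec["lineage"][depth]].append(rec)
        else:
            leftover.append(rec)

    result = {}
    if leftover:
        # These records cannot be split further, so they stay at this level.
        result[path] = [r["accession"] for r in leftover]
    for taxon, members in groups.items():
        sub_path = path + (taxon,)
        if len(members) > max_size:
            result.update(split_collections(members, max_size, depth + 1, sub_path))
        else:
            result[sub_path] = [r["accession"] for r in members]
    return result


def australian_collections(records, max_size):
    """Keep only Australian-associated records, then split them into collections."""
    return split_collections([r for r in records if r["australian"]], max_size)


# Hypothetical sample records (placeholder accessions, simplified lineages).
SAMPLE_RECORDS = [
    {"accession": "REC1", "lineage": ["Eukaryota", "Metazoa", "Chordata", "Mammalia"], "australian": True},
    {"accession": "REC2", "lineage": ["Eukaryota", "Metazoa", "Chordata", "Aves"], "australian": True},
    {"accession": "REC3", "lineage": ["Eukaryota", "Viridiplantae", "Streptophyta"], "australian": True},
    {"accession": "REC4", "lineage": ["Bacteria", "Proteobacteria"], "australian": True},
    {"accession": "REC5", "lineage": ["Eukaryota", "Metazoa", "Chordata", "Mammalia"], "australian": False},
]

if __name__ == "__main__":
    for lineage_path, accessions in australian_collections(SAMPLE_RECORDS, max_size=2).items():
        print(" > ".join(lineage_path) or "(unclassified)", accessions)
```

Keying each collection by its lineage path gives every dataset a ready-made, human-readable label (for example `Eukaryota > Metazoa`) that could feed the summaries written for non-specialists.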

Data Type:

DNA, RNA and protein sequence data that are "Australian-associated".

High Level Software Functionality:

Features for creating and updating collections:

- Automatically define collections that are of Australian importance through the use and customisation of third-party taxonomies or structured vocabularies.

- Query the EBI data, then extract and store metadata to build collection-level descriptions

- Automatically produce a summary outlining each collection for the non-domain expert

- Assign a persistent identifier to each collection description

- Provide RIF-CS compliant collections to the RDA

- Identify new collections as new data become available

- Update collections when the underlying content or taxonomy changes
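As an illustration of how a collection description with a persistent identifier might be delivered to RDA, the sketch below serialises one record as RIF-CS using Python's standard `xml.etree.ElementTree`. It is a minimal sketch under stated assumptions, not the project's implementation: the function `build_rifcs_collection`, its parameters and the default identifier type `"handle"` are hypothetical, and only a small subset of RIF-CS elements is shown. Required fields and element usage should be checked against the official RIF-CS schema before anything is submitted.

```python
import xml.etree.ElementTree as ET

# Namespace of the RIF-CS registryObjects schema.
RIFCS_NS = "http://ands.org.au/standards/rif-cs/registryObjects"
# Serialise RIF-CS elements without a prefix (note: this is a global registration).
ET.register_namespace("", RIFCS_NS)


def _q(tag):
    """Return `tag` qualified with the RIF-CS namespace (Clark notation)."""
    return f"{{{RIFCS_NS}}}{tag}"


def build_rifcs_collection(key, group, source, title, summary, landing_url,
                           identifier=None, identifier_type="handle"):
    """Build a minimal RIF-CS document describing one dataset collection.

    All parameter names are illustrative; values come from the caller.
    """
    root = ET.Element(_q("registryObjects"))
    obj = ET.SubElement(root, _q("registryObject"), group=group)
    ET.SubElement(obj, _q("key")).text = key
    ET.SubElement(obj, _q("originatingSource")).text = source

    coll = ET.SubElement(obj, _q("collection"), type="dataset")
    if identifier:
        # Persistent identifier assigned to this collection description.
        ET.SubElement(coll, _q("identifier"), type=identifier_type).text = identifier
    name = ET.SubElement(coll, _q("name"), type="primary")
    ET.SubElement(name, _q("namePart")).text = title
    ET.SubElement(coll, _q("description"), type="brief").text = summary

    # Point RDA users at the collection's landing page on the mirror.
    location = ET.SubElement(coll, _q("location"))
    address = ET.SubElement(location, _q("address"))
    electronic = ET.SubElement(address, _q("electronic"), type="url")
    ET.SubElement(electronic, _q("value")).text = landing_url

    return ET.tostring(root, encoding="unicode")
```

Building the tree with `ElementTree` rather than string templates keeps special characters in titles and summaries correctly escaped in the XML output.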

Navigating from the RDA to the EBI Mirror:

- Provide a landing page for each collection at the Australian EBI mirror

- Navigate to the primary data through EBI searches or links

- Enable a user to extend the search (where relevant) across all EBI data, beyond the Australian-associated focus.

- The landing page will:

  - provide a summary of the basic metadata for the collection

  - allow navigation to the primary data

  - allow navigation to related collections

  - offer the ability to extend the search where relevant
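The landing-page contents listed above could be rendered along the following lines. This is a hedged sketch only: it assumes a hypothetical dictionary layout for each collection (`title`, `summary`, `metadata`, `primary_data_url`, and optional `related` and `extended_search_url`), and the function name `render_landing_page` is illustrative rather than taken from the actual EBI mirror pages. Every value passes through `html.escape`, so titles or URLs containing `<`, `&` or quotes cannot break the markup.

```python
from html import escape


def render_landing_page(c):
    """Render a minimal HTML landing page for one collection dict `c`."""
    # Basic metadata as a two-column table.
    rows = "\n".join(
        f"<tr><th>{escape(str(k))}</th><td>{escape(str(v))}</td></tr>"
        for k, v in c["metadata"].items()
    )
    # Links to related collections, if any.
    related = "\n".join(
        f'<li><a href="{escape(url)}">{escape(title)}</a></li>'
        for title, url in c.get("related", [])
    )
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{escape(c['title'])}</title></head><body>",
        f"<h1>{escape(c['title'])}</h1>",
        f"<p>{escape(c['summary'])}</p>",
        f"<table>{rows}</table>",
        # Navigation to the primary data.
        f'<p><a href="{escape(c["primary_data_url"])}">View the primary data</a></p>',
    ]
    if related:
        parts.append(f"<h2>Related collections</h2><ul>{related}</ul>")
    if c.get("extended_search_url"):
        # Optional link that widens the search beyond the Australian focus.
        parts.append(
            f'<p><a href="{escape(c["extended_search_url"])}">Search all EBI data</a></p>'
        )
    parts.append("</body></html>")
    return "\n".join(parts)
```

Sections with no content (no related collections, no wider search) are simply omitted, so the same function serves every collection.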


06 Biological Sciences