Linking the EMBL Australia EBI Mirror with the Australian Research Data Commons

University of Queensland

Collaborator(s): QFAB

Project Description:

The aims and objectives of this project are:

Component A: to populate Research Data Australia (RDA) with collection descriptions of data held in the EBI databanks.

For the past 30 years, data in molecular bioscience have routinely been collected from researchers, organised and made freely available via international data facilities. The European Bioinformatics Institute (EBI, part of the European Molecular Biology Laboratory, EMBL) is one of the major international data facilities providing such services. On 21 June 2011, a mirror of the EBI was launched at the University of Queensland in Brisbane, allowing Australian scientists to access important research data faster and to mine them in novel ways. The mirror also provides an opportunity to link data of Australian interest deposited at the EBI to the Australian Research Data Commons.
In this project, more than 13,000 collection records were created describing the Australian-related content of the EBI nucleotide and protein sequence databases. Considerable effort went into dividing the content of these large databases into many smaller, carefully described datasets of potential interest to a wide and varied range of researchers. RDA links to the EBI through landing pages that are simple to use and contain structured information useful to non-domain specialists who are unfamiliar with the content of the EBI databases. Molecular data of Australian interest held at the EBI are now easier to find, access and re-use.

Component B: to enable submission of descriptions to RDA for data associated with secondary analyses performed using the Australian EBI mirror.

This component will specify and implement automated systems to extract metadata from secondary analyses of data (either EBI data or user-uploaded data) performed by researchers using the NCI-SF in Bioinformatics instance of Bioflow. From these metadata, the systems will generate appropriate RIF-CS collection and service descriptions for the datasets produced by the secondary analyses, make the descriptions discoverable within RDA, and enable navigation from RDA to the corresponding Bioflow workflow and to the EBI data in the EBI mirror. This will allow complex bioinformatics workflows for the re-analysis of molecular data to be found and re-used by a wide audience.

Component C: to deliver collection descriptions for EBI data aligned with BioPlatforms Australia (BPA) themes to RDA.

This component will engage BPA and other relevant R&D communities to help conceptualise and specify views of Australian EBI Mirror data within the context of BPA theme projects, organise data within the Mirror to constitute these views, and make the views discoverable through RDA. This will enable the data produced through the NCRIS/Super Science BioPlatforms Australia investments to be presented through the EBI mirror and RDA.
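Components A and B both end in RIF-CS collection descriptions that RDA can harvest. The sketch below shows one way such a record might be assembled with Python's standard `xml.etree.ElementTree`. It is illustrative only: the metadata dictionary, its field names, the key scheme and the group name are hypothetical, and the element names follow the RIF-CS schema as we understand it, so real output should be validated against the official schema before submission.

```python
import xml.etree.ElementTree as ET

# RIF-CS registryObjects namespace (as we understand the schema; validate before use).
RIF_NS = "http://ands.org.au/standards/rif-cs/registryObjects"
ET.register_namespace("", RIF_NS)  # serialise RIF-CS elements without a prefix


def _q(tag: str) -> str:
    """Qualify a tag name with the RIF-CS namespace (ElementTree's {uri}tag form)."""
    return f"{{{RIF_NS}}}{tag}"


def collection_to_rifcs(rec: dict) -> str:
    """Build one RIF-CS collection record from a hypothetical metadata dictionary."""
    root = ET.Element(_q("registryObjects"))
    obj = ET.SubElement(root, _q("registryObject"), group=rec["group"])
    # The key acts as the persistent identifier of the collection description.
    ET.SubElement(obj, _q("key")).text = rec["key"]
    ET.SubElement(obj, _q("originatingSource")).text = rec["source"]

    coll = ET.SubElement(obj, _q("collection"), type="dataset")
    name = ET.SubElement(coll, _q("name"), type="primary")
    ET.SubElement(name, _q("namePart")).text = rec["title"]
    # Plain-language summary aimed at the non-domain expert.
    ET.SubElement(coll, _q("description"), type="brief").text = rec["summary"]

    # Electronic address of the collection's landing page at the mirror.
    location = ET.SubElement(coll, _q("location"))
    address = ET.SubElement(location, _q("address"))
    electronic = ET.SubElement(address, _q("electronic"), type="url")
    ET.SubElement(electronic, _q("value")).text = rec["landing_url"]

    # ANZSRC Field of Research code, e.g. "06" (Biological Sciences).
    ET.SubElement(coll, _q("subject"), type="anzsrc-for").text = rec["for_code"]

    # Links to related records, e.g. a Bioflow workflow or a source EBI collection.
    # Relation types must be taken from the RIF-CS relation vocabulary.
    for related_key, relation in rec.get("related", []):
        related = ET.SubElement(coll, _q("relatedObject"))
        ET.SubElement(related, _q("key")).text = related_key
        ET.SubElement(related, _q("relation"), type=relation)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example = {
        "group": "Example EBI Mirror Group",              # hypothetical
        "key": "example.org/ebi-mirror/collection/0001",  # hypothetical key scheme
        "source": "https://example.org/ebi-mirror",       # hypothetical
        "title": "Nucleotide sequences from an Australian marsupial",
        "summary": "Sequence records associated with an Australian species.",
        "landing_url": "https://example.org/ebi-mirror/collection/0001",
        "for_code": "06",
        "related": [("example.org/ebi-mirror/collection/0002", "hasAssociationWith")],
    }
    print(collection_to_rifcs(example))
```

For Component B, the same function could be fed metadata extracted from a Bioflow run, with a `relatedObject` pointing at the workflow's own record; how that metadata is obtained from Bioflow is not specified here.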

Data Type:

• Bio-molecular data that are "Australian-associated"
• Secondary analyses of data (either EBI data or user-uploaded data)

High Level Software Functionality:

Features:

Creating and updating collections:
• Automatically define collections that are of Australian importance through the use and customisation of third-party taxonomies or structured vocabularies.
• Query the EBI data, and extract and store metadata to build collection-level descriptions.
• Automatically produce a summary outlining each collection for the non-domain expert.
• Assign a persistent identifier to each collection description.
• Provide RIF-CS compliant collections to RDA.
• Identify new collections as new data become available.
• Modify collections in response to updated content or taxonomies.
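The collection-building steps above (define collections from a vocabulary, identify new collections, modify existing ones) can be sketched as a grouping step followed by a comparison of two snapshots. This is a minimal illustration under assumptions: each EBI entry is reduced to an (accession, taxon name) pair, and a plain dictionary from taxon names to collection keys stands in for the customised taxonomy; the accessions, taxa and collection keys are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Collection key -> set of entry accessions in that collection.
Collections = Dict[str, Set[str]]


def build_collections(
    records: Iterable[Tuple[str, str]], vocabulary: Dict[str, str]
) -> Collections:
    """Group (accession, taxon) pairs into collections via a taxon -> key map."""
    collections: Dict[str, Set[str]] = defaultdict(set)
    for accession, taxon in records:
        key = vocabulary.get(taxon)
        if key is not None:  # taxa outside the vocabulary are not collected
            collections[key].add(accession)
    return dict(collections)


def diff_collections(
    old: Collections, new: Collections
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two snapshots; return (added, modified, removed) collection keys."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A collection counts as modified when its membership changed.
    modified = sorted(k for k in new.keys() & old.keys() if new[k] != old[k])
    return added, modified, removed


if __name__ == "__main__":
    # Hypothetical vocabulary and accessions, for illustration only.
    vocab = {"Phascolarctos cinereus": "koala", "Eucalyptus grandis": "eucalyptus"}
    before = build_collections([("ACC1", "Phascolarctos cinereus")], vocab)
    after = build_collections(
        [
            ("ACC1", "Phascolarctos cinereus"),
            ("ACC2", "Phascolarctos cinereus"),
            ("ACC3", "Eucalyptus grandis"),
        ],
        vocab,
    )
    print(diff_collections(before, after))  # (['eucalyptus'], ['koala'], [])
```

Rebuilding after either the EBI content or the vocabulary changes, then diffing against the previous snapshot, gives the set of collection descriptions that would need to be created or updated.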

Navigating from the RDA to the EBI Mirror:
• Provide a landing page for each collection at the Australian EBI mirror.
• Navigate to the primary data through EBI searches or links.
• Enable a user to extend the search (where relevant) across the entire EBI data, beyond the Australian-associated focus.
• The landing page will:
  • provide a summary of the basic metadata for the collection
  • allow navigation to the primary data
  • allow navigation to related collections
  • offer the ability to extend the search where relevant
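A landing page combining the four elements listed above might be rendered as in the following sketch. It is illustrative only: the dictionary fields are hypothetical, and `SEARCH_BASE` is a placeholder rather than the real address or query syntax of the EBI search service. All text passes through `html.escape` because titles, summaries and URLs may contain characters such as `<` and `&`.

```python
from html import escape
from urllib.parse import quote

# Placeholder for an EBI-wide text search; replace with the real service URL and
# query syntax documented by EBI (assumption, not the mirror's actual address).
SEARCH_BASE = "https://search.example.org/ebisearch?query="


def render_landing_page(coll: dict) -> str:
    """Render a minimal HTML landing page for one collection (hypothetical fields)."""
    # Summary of the basic metadata as a definition list.
    metadata = "".join(
        f"<dt>{escape(str(k))}</dt><dd>{escape(str(v))}</dd>"
        for k, v in coll["metadata"].items()
    )
    # Navigation to related collections.
    related = "".join(
        f'<li><a href="{escape(url)}">{escape(title)}</a></li>'
        for title, url in coll["related"]
    )
    parts = [
        f"<h1>{escape(coll['title'])}</h1>",
        f"<p>{escape(coll['summary'])}</p>",
        f"<dl>{metadata}</dl>",
        # Navigation to the primary data.
        f'<p><a href="{escape(coll["primary_data_url"])}">View the primary data</a></p>',
        f"<ul>{related}</ul>",
    ]
    # Optionally extend the search beyond the Australian-associated focus.
    if coll.get("search_terms"):
        url = SEARCH_BASE + quote(coll["search_terms"])
        parts.append(f'<p><a href="{escape(url)}">Search all EBI data</a></p>')
    return "\n".join(parts)
```

The search link is only emitted when the collection supplies search terms, mirroring the "where relevant" qualification in the list above.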


06 Biological Sciences