Project promotion materials:

Project Homepage:

http://www.unicarb-db.org

Data collections can be seen on:

https://researchdata.ands.org.au/search/#!/rows=15/sort=score%20desc/class=collection/q=%22glycosuitedb-a-glycan-structure-repository-catalogue%22/p=1/

Instruments where data are captured/transferred from:

Mass Spectrometer

Software is available at:

http://code.google.com/p/unicarb-db/

Programming language(s):

Java

Categories:

Microscopes

Integration of metadata from various systems that are internal to an institution

Integration of collection records, party records and activity records that may be external to an institution

Project Members:

Prof Nicolle Packer (nicki.packer@mq.edu.au)

Dr Matthew Campbell (matthew.campbell@mq.edu.au)

Prof Marc Wilkins

ANDS Contact:

Alan Glixman (Alan.Glixman@ands.org.au)

Project Status:

Completed

Glycomics Repository

Macquarie University

Collaborator(s): University of New South Wales

Project Description

Macquarie University’s Biomolecular Frontiers Research Centre is a part of the UniCarbKB initiative (unicarb-db.org), an international collaborative project that promotes the creation of an information storage and search platform for glycomics and glycobiology research.

The ANDS-supported component of this initiative, ‘Linking Glycomics Repository with Mass Spectrometer Data Capture’, has seeded the infrastructure to capture, collate and disseminate metadata on glycomics knowledge to the Australian and international research community. By leveraging the technical developments and services deployed by ANDS, over 1000 records from the GlycoSuiteDB database of glycan structures attached to proteins have been migrated to Research Data Australia.

These records provide access to well-managed bibliographic references and rich metadata descriptions of glycan structures and their biological context. The data flow from mass spectrometry data acquisition at the Australian Proteome Analysis Facility (APAF) now also integrates ANDS components and vocabulary services, ensuring that analytical data is linked to the glycan structure repository, UniCarbKB, which is now part of a new National eResearch Collaboration Tools and Resources (NeCTAR) project. This initiative will allow biological and medical researchers to build upon existing efforts and will enhance research and subsequent new discoveries in glycobiology.

Project Information
Glycomics is the study of the role of biologically active carbohydrate chains, which may be bound to proteins or lipids in the body and play a crucial role in mediating biological interactions. Glycomics is a relatively new field compared with genomics or proteomics, and there is no well-established, maintained repository for the glycome as there is for the genome and the proteome.

It is widely acknowledged that glycomics lacks accessible, curated knowledge platforms, and that this in part hinders research. The situation is characterised by distributed data collections, a lack of internationally agreed standards, and poor communication between disparate resources. Additionally, the sparseness of databases in the glycoscience domain hampers the development of computational tools for interpreting experimental data.

The ANDS-supported ‘Linking Glycomics Repository with Mass Spectrometry Data Capture’ activity has seeded the infrastructure required to capture, collate and disseminate glycomics knowledge to the Australian and international research community. A centralised repository of international significance is now being established to store data captured from mass spectrometers, linked with other data already in the repository. Specifically, the project set out:

1. to bring the GlycoSuiteDB reference glycan structure repository from the Swiss Institute of Bioinformatics to the Australian Proteome Analysis Facility (APAF). Once this is completed, two world-leading repositories (EUROCarbDB, Glycobase) will be interfaced with GlycoSuiteDB to create a single internationally recognised glycan reference repository, known as UniCarbKB.

2. to create an experimental data and metadata capture system at APAF. This system captures descriptive metadata and analytical data acquired from two different mass spectrometers used to study glycan structures as part of the proteomics analysis services provided by APAF.

3. to create an OAI-PMH feed that allows Research Data Australia (RDA) to harvest details of the collections stored within the UniCarbKB repository, using the ANDS-endorsed RIF-CS format.
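To make the harvesting step more concrete, the following Java sketch builds the OAI-PMH requests that a harvester such as RDA would issue against a repository feed. The endpoint URL is a placeholder, and the `rif` metadata prefix is an assumption; the prefix a feed actually uses for RIF-CS is listed in its `ListMetadataFormats` response.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds OAI-PMH 2.0 request URLs (sketch; base URL and metadata prefix are assumptions). */
final class OaiPmhRequests {
    private OaiPmhRequests() {}

    /** First page of a harvest: verb=ListRecords with a metadataPrefix and an optional set. */
    static String listRecords(String baseUrl, String metadataPrefix, String set) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("?verb=ListRecords&metadataPrefix=")
                .append(encode(metadataPrefix));
        if (set != null) {
            url.append("&set=").append(encode(set));
        }
        return url.toString();
    }

    /** Later pages: OAI-PMH makes resumptionToken the only argument allowed besides verb. */
    static String resume(String baseUrl, String resumptionToken) {
        return baseUrl + "?verb=ListRecords&resumptionToken=" + encode(resumptionToken);
    }

    /** Percent-encodes a query parameter value as UTF-8. */
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Under the OAI-PMH 2.0 protocol, a harvester repeats the `resume` request with each returned `resumptionToken` until the repository returns an empty token.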

By leveraging the technical developments and services deployed by ANDS, the task of this activity was to transform existing metadata collections into a standardised format that is more amenable to data interchange and discovery. ANDS expertise in designing information-exchange services provided the ideal platform for migrating existing data collections into a managed and accessible research data collection. Our partnership with ANDS has provided an opportunity to transfer knowledge and to open new research opportunities between bioinformatic and experimental areas.

A significant component of the work focused on understanding the RIF-CS format and integrating it into the UniCarbKB framework. Over 1000 records from the GlycoSuiteDB database (http://glycosuitedb.expasy.org/) have been migrated to Research Data Australia. These records provide access to well-managed bibliographic references and rich metadata descriptions of glycan structures and their biological context. The data flow from mass spectrometry data acquisition at the Australian Proteome Analysis Facility (APAF) now also integrates ANDS components and vocabulary services, ensuring that analytical data is linked to the glycan structure repository. This integration will ensure that future experimental data descriptions can be harvested by RDA, ultimately enhancing data discovery and re-use.
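As a simplified illustration of the RIF-CS conversion, the sketch below serialises one repository entry as a minimal RIF-CS collection record. The key, group and text values a caller passes in are hypothetical, and a production record would carry further elements (identifiers, related parties and activities, subjects) and should be validated against the RIF-CS schema.

```java
/** Serialises one entry as a simplified RIF-CS collection record (sketch, not a full record). */
final class RifCsCollection {
    private RifCsCollection() {}

    static String toXml(String key, String group, String originatingSource,
                        String title, String briefDescription) {
        return "<registryObjects xmlns=\"http://ands.org.au/standards/rif-cs/registryObjects\">\n"
             + "  <registryObject group=\"" + esc(group) + "\">\n"
             + "    <key>" + esc(key) + "</key>\n"
             + "    <originatingSource>" + esc(originatingSource) + "</originatingSource>\n"
             + "    <collection type=\"collection\">\n"
             + "      <name type=\"primary\"><namePart>" + esc(title) + "</namePart></name>\n"
             + "      <description type=\"brief\">" + esc(briefDescription) + "</description>\n"
             + "    </collection>\n"
             + "  </registryObject>\n"
             + "</registryObjects>\n";
    }

    /** Minimal XML escaping for text and attribute values; '&' is replaced first. */
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }
}
```

Building the string by hand keeps the sketch short; a real exporter would more likely use a DOM or StAX writer so that well-formedness is guaranteed by the library.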

This activity has advanced the state of data capture and management within the glycomics discipline. It will benefit not only the direct collaborators, Macquarie University and the University of New South Wales, but will also help make Australia a world-leading source of reference glycan structures and associated mass spectrometric evidence. The ability to exchange data with researchers in glycomics and beyond will increase the exposure of this important and emerging discipline, and ultimately this initiative will allow biological and medical researchers to build upon existing efforts and will enhance research and subsequent new discoveries in glycobiology.

Project website URL
The inclusion of data in Research Data Australia is making glycomics more discoverable and accessible, allowing users to search and/or browse across the corpus of curated glycomics data. Users can find all shared and publicly accessible data via the Research Data Australia (http://researchdata.ands.org.au/), UniCarb-DB (http://unicarb-db.org) and UniCarbKB (http://unicarbkb.org) websites. In addition, we are part of a recent international activity to promote data standardisation guidelines for glycomics; the resulting Minimum Information Required for A Glycomics Experiment (MIRAGE) guidelines are now being adopted by the above platforms (http://glycomics.ccrc.uga.edu/MIRAGE/index.php/Main_Page).

Data Type:

Glycan structures attached to proteins

High Level Software Functionality:

The growth in glycomics and the trend towards high-throughput approaches demand a general bioinformatics platform to assist collaborative research. Glycomics data should be accompanied by contextualising 'metadata' that make explicit how analyses were performed and that accurately classify the sample source and describe its preparation. To that end, guidance and bioinformatic resources specifying both the analytical data and the metadata should be in place. To achieve these objectives, UniCarb-DB, an open-source framework for capturing MS data collections, has been developed to formalise both metadata and experimental reporting specifications.
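One way to make such metadata requirements checkable at submission time is a completeness check like the hypothetical sketch below. The required field names are illustrative assumptions and are not the official MIRAGE checklist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Reports which required metadata fields are missing or blank (field list is hypothetical). */
final class MetadataCheck {
    private MetadataCheck() {}

    // Hypothetical minimum fields, loosely inspired by MIRAGE-style reporting; not the real checklist.
    static final List<String> REQUIRED = List.of(
            "sampleSource", "samplePreparation", "instrument", "ionisationMode", "dataProcessing");

    /** Returns the required fields that are absent or contain only whitespace, in REQUIRED order. */
    static List<String> missingFields(Map<String, String> metadata) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED) {
            String value = metadata.get(field);
            if (value == null || value.isBlank()) {
                missing.add(field);
            }
        }
        return missing;
    }
}
```

A submission workflow could refuse an upload, or flag it for curation, whenever the returned list is non-empty.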

The main objective was the development of a relational database of experimentally determined carbohydrate structures, described by metadata and experimental data collections. This required consideration of many diverse metadata collections. The salient features include:

a. Hierarchical data organisation, to enable efficient browsing and searching of experimental data contributed by multiple institutions.
b. Metadata-oriented design, to describe experiments and link related datasets, with defined metadata collections built from controlled vocabularies and the endorsed MIRAGE guidelines.
c. Capture of the processes used to perform experiments, including data acquisition and interpretation.
d. Mass spectrometer output files and structure assignments.
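The hierarchical organisation described in the first feature can be pictured as nested records. The sketch below is a hypothetical model (contributor, experiment, MS run, structure assignment), not the UniCarb-DB schema, and shows how walking down the hierarchy answers a question such as which experiments reported a given structure.

```java
import java.util.List;

/** Hypothetical hierarchy: contributor -> experiment -> MS run -> structure assignments. */
final class GlycoHierarchy {
    private GlycoHierarchy() {}

    record Assignment(String structureId, double precursorMz) {}
    record MsRun(String rawFile, List<Assignment> assignments) {}
    record Experiment(String title, String protocol, List<MsRun> runs) {}
    record Contributor(String institution, List<Experiment> experiments) {}

    /** Returns the titles of experiments in which the structure was assigned in any MS run. */
    static List<String> experimentsReporting(List<Contributor> contributors, String structureId) {
        return contributors.stream()
                .flatMap(c -> c.experiments().stream())
                .filter(e -> e.runs().stream()
                        .flatMap(r -> r.assignments().stream())
                        .anyMatch(a -> a.structureId().equals(structureId)))
                .map(Experiment::title)
                .toList();
    }
}
```

Keeping raw files and assignments under their experiment means the experimental context (protocol, contributing institution) stays attached to every structure assignment.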

We have built a series of novel tools, embedded in the UniCarb framework, which alleviate the difficulties of processing data formats and associating appropriate metadata descriptions.
The data are diverse at both the structural and the experimental level, and the ANDS project helped develop a data capture system that is greatly enhancing the efficiency of researchers using mass spectrometers for glycomics research. The technical outputs of this project include web-based applications that enable the creation, management and harvesting of experimental data acquired from mass spectrometry analysis at the Australian Proteome Analysis Facility (APAF). These applications are integrated with a centralised glycomics database (UniCarb-DB) that has the capacity and capability to store large analytical data collections, linked with structural (curated) data in the repository.

Currently, each of the mass spectrometer instruments located at APAF produces results in a different format. This data capture project is automating the translation of results into a common format, making data capture easier and facilitating the use of the mass spectrometers to acquire and interpret data in the glycoanalysis field. The functionality of the project falls into two categories. First, the web application allows users to browse and search both structural and experimental LC-MS data, including biological context descriptors, publications, structural features (mass, substructure, composition) and MS spectral features. Second, it provides tools that support (meta)data submission to the centralised database, and workflows for uploading and processing LC-MS data on the centralised platform.
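The translation into a common format can be sketched with one reader per instrument format, each mapping its own layout onto a shared peak record. Both formats below are hypothetical stand-ins; the instruments' actual export formats are not described here.

```java
import java.util.ArrayList;
import java.util.List;

/** Normalises peak lists from two hypothetical instrument export formats into one record type. */
final class PeakListTranslation {
    private PeakListTranslation() {}

    /** Common representation used downstream (m/z and intensity only, for brevity). */
    record Peak(double mz, double intensity) {}

    /** One reader per instrument format; each maps its own layout onto Peak. */
    interface PeakListReader {
        List<Peak> read(List<String> lines);
    }

    /** Hypothetical format A: comma-separated "mz,intensity" rows after one header row. */
    static final PeakListReader FORMAT_A = lines -> {
        List<Peak> peaks = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {  // start at 1 to skip the header row
            String[] f = lines.get(i).split(",");
            peaks.add(new Peak(Double.parseDouble(f[0].trim()), Double.parseDouble(f[1].trim())));
        }
        return peaks;
    };

    /** Hypothetical format B: tab-separated "intensity<TAB>mz", no header, '#' comment lines. */
    static final PeakListReader FORMAT_B = lines -> {
        List<Peak> peaks = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank() || line.startsWith("#")) continue;
            String[] f = line.trim().split("\t");
            peaks.add(new Peak(Double.parseDouble(f[1]), Double.parseDouble(f[0])));  // columns swapped
        }
        return peaks;
    };
}
```

With this design, supporting a further instrument only requires a new reader; code that consumes `Peak` is unchanged.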

Features:
- Interfaces to search and retrieve glycan structure and experimental data
- Workflows to assist data upload, management and processing
- Building and extending metadata collections to describe the data stored
- Recommendation of formats for storing and making available glycomics data
- Secure, scalable and reliable web framework to support the growth of glycoinformatics
- Integration of existing tools, databases, standards and protocols
- Generation of RIF-CS Collection, Party and Activity descriptions from metadata
- Assignment of persistent identifiers to all data collections where required
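Searching by mass or composition, as offered by the search interfaces listed above, can be sketched as computing a candidate's monoisotopic mass from its monosaccharide composition and matching within a ppm tolerance. This is a hypothetical simplification, not the UniCarb-DB search: it assumes free, underivatised glycans and neutral masses, and ignores reducing-end reduction, derivatisation and charge states.

```java
import java.util.List;
import java.util.Map;

/** Sketch of a mass-based lookup: composition -> monoisotopic mass -> match within a ppm window. */
final class CompositionSearch {
    private CompositionSearch() {}

    // Monoisotopic residue masses (Da) of common monosaccharide classes.
    static final Map<String, Double> RESIDUE_MASS = Map.of(
            "Hex", 162.05282, "HexNAc", 203.07937, "dHex", 146.05791, "NeuAc", 291.09542);
    // One water is added for the free (unreduced, underivatised) glycan; an assumption of this sketch.
    static final double WATER = 18.01056;

    /** Neutral monoisotopic mass of a free glycan with the given residue counts. */
    static double neutralMass(Map<String, Integer> composition) {
        double mass = WATER;
        for (Map.Entry<String, Integer> e : composition.entrySet()) {
            Double residue = RESIDUE_MASS.get(e.getKey());
            if (residue == null) {
                throw new IllegalArgumentException("Unknown residue: " + e.getKey());
            }
            mass += residue * e.getValue();
        }
        return mass;
    }

    /** Returns the sorted ids of candidates whose neutral mass lies within tolPpm of queryMass. */
    static List<String> search(Map<String, Map<String, Integer>> candidates, double queryMass, double tolPpm) {
        return candidates.entrySet().stream()
                .filter(e -> Math.abs(neutralMass(e.getValue()) - queryMass) / queryMass * 1e6 <= tolPpm)
                .map(Map.Entry::getKey)
                .sorted()
                .toList();
    }
}
```

For example, a Hex5HexNAc2 composition gives a neutral mass of about 1234.433 Da, so a query at that mass with a 10 ppm window matches it but not Hex6HexNAc2.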

ANZSRC-FOR code:

06 BIOLOGICAL SCIENCES