Instruments from which data are captured/transferred:

Illumina HiSeq 2000

Software is available at:

Software categories:


Integration of metadata from various systems internal to an institution

Project Members:

Jianfeng Li (Project Manager)

Professor Dave Adelson (Project Sponsor)

ANDS Contact:

Andrew Williams

Project Status:


Genomics Data Capture

University of Adelaide

Project Description:

The University of Adelaide and SA Pathology joint Centre for Cancer Genomics characterises cancer genomes via whole genome and exome re-sequencing. This work is centred on next-generation sequencers such as the Illumina HiSeq 2000. Each experimental run creates a set of sequences and associated settings descriptions.

The rich content collected from whole genome sequencing presents both opportunities and challenges. The sequence files are normally larger than 1 GB, so storing them demands significant space. Some data files contain sensitive patient genetic data and need to be stored in a secure area. Analysis of the data is not straightforward and normally takes a long time. During the course of a study, information such as where the data is, how the analyses were done and what tools were used can easily be lost. This information is important to researchers not only in the current study; it can also be useful in the future to the research group and the wider research community.

This project provides a system to manage and store this information in a secure area. Analytic outputs of raw data are also tracked, in order to discover reusable analysis pipelines for particular problems.
It also publishes information for sharing in two ways. Sequence data is published to a public sequence repository, the European Bioinformatics Institute (EBI) European Nucleotide Archive (ENA), and at the same time the collection description, along with the relevant associated party, activity and service records, is made available for harvest to Research Data Australia (RDA).
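
The tracking of analytic outputs described above can be illustrated with a minimal sketch. It is not the project's actual data model: the class names, field names, file paths and tool names below are all hypothetical, chosen only to show how an ordered analysis history attached to a raw sequence file could later be read back as a candidate reusable pipeline.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnalysisStep:
    """One analysis step: the tool used and the files involved (hypothetical model)."""
    tool: str
    version: str
    inputs: List[str]
    outputs: List[str]


@dataclass
class SequenceRecord:
    """A raw sequence file plus the ordered history of analyses run on it."""
    path: str
    history: List[AnalysisStep] = field(default_factory=list)

    def record_step(self, step: AnalysisStep) -> None:
        # Append in execution order so the history can be replayed later.
        self.history.append(step)

    def pipeline(self) -> List[str]:
        # The ordered list of tools is a candidate reusable pipeline.
        return [f"{step.tool} {step.version}" for step in self.history]


# Illustrative values only; tool and file names are placeholders.
record = SequenceRecord(path="run_001/sample_A.fastq")
record.record_step(
    AnalysisStep("aligner", "1.0", ["sample_A.fastq"], ["sample_A.bam"])
)
record.record_step(
    AnalysisStep("variant_caller", "2.3", ["sample_A.bam"], ["sample_A.vcf"])
)
print(record.pipeline())
```

Keeping the steps in execution order, with inputs and outputs named explicitly, is what would let a later reader answer the questions raised earlier: where the data is, how the analyses were done and what tools were used.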

The areas of research that will benefit directly from this activity include:
• 111203 - Cancer Genetics
• 110899 - Medical Microbiology not elsewhere classified
• 060408 - Genomics
• 321020 - Pathology.

With this software solution:
• data are managed in a model that complies with EBI and RDA requirements, ensuring that the relevant information is collected;
• information can be found easily;
• the whole analysis history of a sequence, including the tools used, is captured;
• submission to EBI is easier, because an authorised EBI submission account is in place and the required information, collected from the very beginning, satisfies EBI's strict rules on how to describe data.
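
Collecting the required information from the very beginning, as in the last point above, amounts to checking each record for completeness before a submission is attempted. The sketch below assumes a small set of required fields; these field names are hypothetical placeholders, and the real requirements are defined by the receiving repository, not by this example.

```python
from typing import Dict, List

# Hypothetical required fields, for illustration only. The actual rules on
# how to describe data are set by the receiving repository.
REQUIRED_FIELDS = ("study_title", "sample_id", "instrument_model", "library_strategy")


def missing_fields(metadata: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or blank, in declared order."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not metadata.get(name, "").strip()
    ]


# Illustrative record; the values are placeholders, not real project data.
run_metadata = {
    "study_title": "Example re-sequencing study",
    "sample_id": "sample_A",
    "instrument_model": "Illumina HiSeq 2000",
    "library_strategy": "",
}

problems = missing_fields(run_metadata)
if problems:
    print("Not ready for submission, missing:", ", ".join(problems))
else:
    print("All required fields present")
```

Running such a check at capture time, rather than at submission time, means gaps are found while the people who can fill them are still working on the study.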

Data Type:

Gene sequences

High Level Software Functionality:

Data capture from Illumina HiSeq 2000
Data and metadata transfer to EBI
Metadata harvest to RDA