Project promotion materials:

Data collections can be seen on:!/q=*%3A*/p=1/tab=service/group=QFAB

Software is available at:

Project Members:

Dominique Gorse (Project Manager)

Pierre-Alain Chaumeil (Developer)

Anne Kunert (Developer)

S William (Developer)

Xinyi Chua (Developer)

ANDS Contact:

Mingfang Wu

Project Status:


Cancer Genomics Linkage Application

University of Queensland

Collaborator(s): University of Melbourne, Monash University, Victorian Life Sciences Computational Initiative (VLSCI), Bioplatforms Australia (BPA)

Project Description:

Project impetus and drivers

Although cancers may look the same under the microscope, they can behave quite differently at the genomic level. By identifying these genetic differences, researchers can begin to understand which treatments work for particular patients and administer those first. Understanding which cancers do not respond to treatment is also vital, as it allows researchers to focus on the molecular mechanisms responsible and to design new drugs that target the mechanisms that make the cancer lethal.

In this context, the primary goal of the International Cancer Genome Consortium (ICGC) is to generate comprehensive catalogues of genomic abnormalities in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. The Australian component of the Consortium, the Australian Pancreatic Cancer Genome Initiative (APGI), is delivering the genomic data associated with pancreatic tumour samples. To do this, the APGI uses the best clinical material available, with well-characterised and accurately annotated clinico-pathological, treatment and outcome data acquired prospectively.

However, discovering variations and similarities within the genomes of a given cancer collection, and comparing them with other datasets, remains a significant challenge for biologists and clinician researchers. The effective re-use of datasets of international importance is limited by the ability of research biologists and clinicians to access and use computational and data infrastructure. A researcher performing such an analysis is currently expected to be conversant, at a minimum, with programming, command-line scripting, data management, high-performance computing, network-based communications and visualisation. Additionally, they must record the steps of the analysis in enough detail for the results to be reproduced, at least in-house, and ideally document the analysis well enough to allow publication of the method(s).

The Cancer Genomics Linkage Application, funded by ANDS, enables the re-use and integration of data available from public repositories, such as the ICGC variant database or the DrugBank database of drugs and drug targets, by leveraging the Genomics Virtual Lab capability on the research cloud. Researchers such as Professor Andrew Biankin and colleagues from the Garvan Institute of Medical Research can now access genomic datasets of international importance and integrate them with their own clinical and genomic datasets in order to explore, discover and validate the key genomic abnormalities that cause cancer, using user-friendly computational workflows. The project also provides a mechanism for these researchers to publish their analyses and make them available for re-use by the community.

Project outcomes

The project enables in-depth interrogation of cancer genomic datasets, and their comparison with other genomic datasets, by giving research biologists and clinicians direct access to them through the Genomics Virtual Lab (GVL). The key benefits to the Australian genomics community gained from this project are:

Research Champions:

Professor Andrew Biankin, Head, Pancreatic Cancer Research, Cancer Program, Garvan Institute of Medical Research

Professor Sean Grimmond, Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland

Professor John Mattick, Executive Director, Garvan Institute of Medical Research

Data Type:

Input Data:
- Pancreatic cancer tumour/normal paired genomic DNA sequences
- Pancreatic cancer exome DNA sequences
- Pancreatic cancer RNAseq transcriptome sequences
- Pancreatic cancer small RNA sequences
- Somatic variant data from the 50 different tumour types and/or subtypes. This will include SNPs, structural rearrangements, and transcriptional and epigenetic modifications.

Output Data:
At least 3 RIF-CS service descriptions of Galaxy workflows and the associated RIF-CS collection descriptions (the input and output data collections)

High Level Software Functionality:

The Cancer Genomics Linkage project is composed of four interrelated components:

The technical solutions developed for this project were:

For detailed information, please visit the project blog.