
Project promotion materials:

Project Homepage:

http://www.ozhupohpp7.com/

Data collections can be seen on:

http://researchdata.ands.org.au/the-proteome-browser

Software is available at:

http://code.google.com/p/hupohpp/

Project Members:

Anitha Kannan (Project Manager, Anitha.Kannan@monash.edu)

Simon Yu (Developer, Xiaoming.Yu@monash.edu)

Anthony Beitz (Project Manager, anthony.beitz@monash.edu)

ANDS Contact:

Mingfang Wu (mingfang.wu@ands.org.au)

Project Status:

Completed

Human Chr7 Proteomics Integration Project

Monash University

Collaborator(s): Macquarie University, Bioplatforms Australia BPA, Proteomics Australia

Project Description

In 2010 the Human Proteome Organisation launched the Human Proteome Project (HPP), which aims to catalogue the protein information arising from the many proteomics-based studies worldwide. To support complete coverage, one arm of the project has taken a gene- or chromosome-centric strategy (C-HPP). Labour in this international effort is divided by assigning each of the 24 human chromosomes to one or more countries. Under this scheme, the Australian/New Zealand consortium has been assigned Chromosome 7, as this chromosome contains various genetic markers associated with diseases relevant to the Australian population.

Despite multiple large international biological databases housing genomic and protein data, there is currently no single system that integrates up-to-date pertinent information from each of these data repositories and assembles the information into a format suitable for a global proteomics effort of the type proposed by the C-HPP.

We have undertaken to produce a data integration and analysis software system for the C-HPP effort and to make data collections from this resource discoverable through ANDS's Research Data Australia. Whilst the software is being designed to be ultimately species and chromosome independent, the initial focus is on the development of a resource for Human Chromosome 7.

The primary goal of this project was to integrate some widely used data sources into a web browser interface designed to display an overview of the current evidence supporting the identification of various gene products across the chromosome, such as protein expression, modification and disease association, with the ability to drill down to the original data. The design of the Proteome Browser portal allows for easy addition of both new data sources and data categories.

The Proteome Browser will assist Australian and international efforts to complete a map of the Human Proteome. Mapping the human proteome, even partially, will help elucidate biological and molecular function and advance the diagnosis and treatment of diseases. This approach may also be applied to other animals and plants.

Project Outcomes


In conjunction with the proteomics community, the Proteome Browser team has developed an analysis tool that integrates protein data from a number of source systems, including Ensembl, GPM, The Human Protein Atlas and NeXtProt. It provides a ‘traffic light’ representation of proteomic data, where the X axis lists each gene (i.e. protein), ordered by default as found on the chromosome, and the Y axis lists the types of evidence that support the identification of proteins.

The traffic light system indicates whether each type of data exists for a particular protein. Various aspects of the underlying information are available for further analysis through clustering and drill-down/drill-through capabilities. The user interface is dynamic: it lets users filter the display to show relevant information and provides links to external systems where appropriate.
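The gene-by-evidence grid described above can be sketched as a small data structure. This is a minimal illustration only, not TPB's implementation: the evidence categories, the three colours, and the count thresholds in `classify` are all assumptions made for the example, as are the class and method names.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical evidence categories (the Y axis); the real TPB categories may differ.
enum EvidenceType { EXPRESSION, MODIFICATION, DISEASE_ASSOCIATION }

// Hypothetical colours; the thresholds used below are assumptions, not TPB's rules.
enum Light { GREEN, AMBER, RED }

class TrafficLightReport {
    // Assumed rule: no records -> RED, one record -> AMBER, two or more -> GREEN.
    static Light classify(int recordCount) {
        if (recordCount <= 0) return Light.RED;
        return recordCount == 1 ? Light.AMBER : Light.GREEN;
    }

    // Builds one row of lights per gene, keeping genes in the order supplied
    // (for example, their order along the chromosome).
    static Map<String, Map<EvidenceType, Light>> build(
            List<String> genesInOrder,
            Map<String, Map<EvidenceType, Integer>> counts) {
        Map<String, Map<EvidenceType, Light>> report = new LinkedHashMap<>();
        for (String gene : genesInOrder) {
            Map<EvidenceType, Integer> geneCounts = counts.getOrDefault(gene, Map.of());
            Map<EvidenceType, Light> row = new EnumMap<>(EvidenceType.class);
            for (EvidenceType type : EvidenceType.values()) {
                row.put(type, classify(geneCounts.getOrDefault(type, 0)));
            }
            report.put(gene, row);
        }
        return report;
    }
}
```

Because the report is keyed by gene in insertion order, reordering or filtering the X axis amounts to passing a different gene list to `build`.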

The Proteome Browser team has realised its goal of producing an “easy to use” data integration and analysis software system for the Australia/New Zealand C-HPP effort, and of making data collections from this resource discoverable through ANDS’s Research Data Australia.

Data Type:

Input Data:
We will integrate proteomics data from the following publicly available data in a phased approach:
- NeXtProt
- GPM
- Human Protein Atlas
- Peptide Atlas
- Gene Cards
- + 3 additional data sources (TBC)

Output Data:
C-HPP collections will include protein metadata based on:
- Disease
- Tissue expression/distribution
- Cellular Component
- Biological Process
- Molecular Function
- Interactions
- Availability of Information
- Quality of Information

High Level Software Functionality:

The Proteome Browser (TPB) portal: http://www.proteomebrowser.org


Detailed technical design and specification is accessible from the project wiki page: Technical Design and Specification

The project adopted the following technology stack:

Apache Server


The web server serves web pages to clients over HTTPS.

Tomcat Application Server


The application server hosts the TPB system, including the business logic and business model classes of the application. It serves requests for dynamic web pages forwarded by the web server. This server also runs fortnightly cron jobs that poll the data sources for new data, so new or updated data released by any of the data sources is regularly integrated into the TPB traffic light report.
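The polling step can be sketched as two small checks: whether a poll is due, and whether a source has published a release that has not yet been integrated. This is an illustrative sketch under stated assumptions, not the project's cron job: the `SourcePoller` class, its method names, the release identifiers, and the in-memory map are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

class SourcePoller {
    // Assumed polling interval, matching the fortnightly schedule described above.
    static final Duration INTERVAL = Duration.ofDays(14);

    // Release identifier last integrated for each source (in memory, for illustration).
    private final Map<String, String> integrated = new LinkedHashMap<>();

    // True when at least one full interval has passed since the previous poll.
    static boolean isDue(Instant lastPoll, Instant now) {
        return !now.isBefore(lastPoll.plus(INTERVAL));
    }

    // Records the latest release and reports whether it differs from the one
    // already integrated, i.e. whether the report needs refreshing.
    boolean integrateIfNew(String source, String latestRelease) {
        String previous = integrated.put(source, latestRelease);
        return !latestRelease.equals(previous);
    }
}
```

In a deployment like the one described, an external scheduler such as cron would trigger the job, so `isDue` would only be needed if scheduling moved into the application itself.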

Database Server


The database server persists the metadata and data for The Proteome Browser.

LaRDs Storage Server


LaRDs storage server persists the RIF-CS files.

OAI-PMH Data Provider


The data provider supplies the RIF-CS records that feed into the ANDS Collections Registry.
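A harvester retrieves records from an OAI-PMH data provider by issuing HTTP requests that carry a `verb` parameter. The sketch below builds a `ListRecords` request URL. The `OaiPmhRequest` class is hypothetical, the base URL in the usage note is a placeholder, and the `rif` metadata prefix is an assumption about how the RIF-CS format is named by the provider.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class OaiPmhRequest {
    // Builds a ListRecords request URL for the given provider and metadata format.
    // baseUrl is the provider's OAI-PMH endpoint; metadataPrefix names the format.
    static String listRecords(String baseUrl, String metadataPrefix) {
        return baseUrl + "?verb=ListRecords&metadataPrefix="
                + URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8);
    }
}
```

For example, `OaiPmhRequest.listRecords("https://example.org/oai", "rif")` yields `https://example.org/oai?verb=ListRecords&metadataPrefix=rif`.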

A key architectural goal was to follow industry best practices for designing and developing a scalable, enterprise-wide J2EE application. To meet this goal, the design of the TPB project was based on core J2EE patterns and industry-standard development guidelines. Standard design patterns such as those outlined below were used in the design and development of the project:

• MVC (Model-View-Controller)
The Model-View-Controller design pattern removes inter-dependencies between data access code, business logic code and presentation code by separating data access and business logic from data presentation and user interaction.

• DAO (Data Access Object)
The Data Access Object pattern decouples the business logic from the database, which increases the portability of the system.

• Spring IoC (Inversion of Control)
Inversion of Control (IoC) is one of the techniques used to wire services or components into an application. In Spring, the IoC principle is implemented using the Dependency Injection design pattern, which leaves the system components loosely coupled and allows the developer to code to abstractions.
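The DAO and dependency injection ideas above can be sketched in plain Java. This is a minimal illustration, not code from the TPB project: all type and method names are hypothetical, the in-memory map stands in for the database, and the constructor call in the usage note does by hand what a Spring container would do from its configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// DAO abstraction: business logic depends on this interface, not on a database.
interface GeneDao {
    Optional<String> findDescription(String geneId);
}

// Hypothetical in-memory implementation; a database-backed one could replace it
// without changing GeneService.
class InMemoryGeneDao implements GeneDao {
    private final Map<String, String> rows = new LinkedHashMap<>();

    void save(String geneId, String description) {
        rows.put(geneId, description);
    }

    @Override
    public Optional<String> findDescription(String geneId) {
        return Optional.ofNullable(rows.get(geneId));
    }
}

// Business logic that receives its DAO through the constructor (dependency
// injection) instead of creating one itself.
class GeneService {
    private final GeneDao dao;

    GeneService(GeneDao dao) {
        this.dao = dao;
    }

    String describe(String geneId) {
        return dao.findDescription(geneId).orElse("no data");
    }
}
```

Wiring by hand looks like `new GeneService(new InMemoryGeneDao())`. Because `GeneService` only knows the `GeneDao` interface, a controller or a test can supply any implementation.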

ANZSRC-FOR code:

Medical Biochemistry: Proteins and Peptides (incl. Medical Proteomics)
Proteomics and Intermolecular Interactions (excl. Medical Proteomics)