Project Homepage:

http://itee.uq.edu.au/eresearch/projects/ands/dimer/

Data collections can be seen on:

https://researchdata.ands.org.au/contributors/the-university-of-queensland

Data Management Policy/Procedure:

http://www.itee.uq.edu.au/eresearch/projects/ands/dimer

Instruments where data are captured/transferred from:

Rigaku Saturn and R-AXIS IV++

Software is available at:

http://sourceforge.net/projects/dimer/

Categories:

Observational Instruments (e.g. from telescope, camera, etc.)

Project Members:

Dr. Nigel Ward (n.ward4@uq.edu.au)

Prof. Jane Hunter (jane@itee.uq.edu.au)

ANDS Contact:

Andrew White (andrew.white@ands.org.au)

Project Status:

Completed

DIMER Diffraction Image Repository

University of Queensland

Project Description

RESEARCH FOCUS

Structural biology has emerged as one of the most powerful approaches for defining the functions of proteins. The strong predictive power of structure in functional annotation has resulted in the rapid growth of the field of structural genomics, which has enabled the discovery of novel drug targets and advanced our understanding of protein evolution. The field promises to have a major impact on the life sciences, biotechnology, and medicine.

High throughput or parallel processing approaches have been developed for producing protein samples for structural biology and functional studies. Crystals that are successfully formed in these samples are subject to X-ray diffraction, which is the most widely used approach for protein structure determination, accounting for approximately 85% of structures in the Protein Data Bank. At UQ, the Structural Genomics group has established a high throughput processing pipeline at the Remote Operation Crystallization and X-ray Diffraction Facility (UQ ROCX) that applies parallel processing to hundreds of protein targets monthly.

DRIVERS AND GOALS

An ARC Discovery project (funded from 2006 to 2011), a collaboration between the Structural Genomics group and the UQ eResearch Lab, enabled the development of prototype e-research services to capture the data resulting from the protein crystallisation and structural analysis pipeline. The two main outcomes of the ARC project were: TIMTAM, a laboratory information management system for target selection and crystallisation experiments; and DIMER, a repository for X-ray diffraction images that are processed to determine crystal structure. The ANDS project was established in order to fulfil a number of additional requirements associated with DIMER and to share the resulting datasets via UQ DataSpace and Research Data Australia.

It was recognised that support for automated capture of images and metadata was necessary in order to maximise the number of datasets in DIMER: this removes a significant barrier for users, given the time pressures felt by scientists and the difficulty of managing the large file sizes of diffraction image sets. It also assists researchers in fulfilling their data management obligations, which include the storage and backup of research outputs and granting appropriate access rights to colleagues and supervisors.

DIMER was also extended to allow published datasets to be accessible and shareable both via the UQ collections registry (UQ DataSpace) and ANDS Research Data Australia (RDA) discovery services. This boosts the profile of datasets hosted by DIMER and fills a gap in publishing the outputs of X-ray crystallography studies. Syndication to UQ DataSpace and ANDS RDA allows the diffraction image datasets to be linked to journal articles (indexed in PubMed) and protein structures stored in the Protein Data Bank.

In addition to these two major features, this project also involved improving the robustness, scalability, and usability of the repository. These changes represented a transition of DIMER from a prototype to a production system.

OUTCOMES

This project has implemented and deployed an automatic data capture component within the UQ ROCX facility that allows diffraction image datasets to be harvested directly from each instrument. The owner of the dataset receives an email notification and simply needs to complete a small number of metadata fields in order to create a complete record. Files are directly transferred into DIMER, removing the need for users to manually upload large diffraction image datasets.
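The capture workflow described above (harvest new images, create a draft record, notify the owner, wait for metadata) can be sketched roughly as follows. This is only an illustration, not DIMER's actual code: the field names in `REQUIRED_FIELDS`, the `.img` suffix, and the names `DraftRecord`, `capture`, and `notification` are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical metadata fields; the actual DIMER fields are not listed here.
REQUIRED_FIELDS = ("title", "protein", "description")
# Assumed image file suffix, used for this sketch only.
IMAGE_SUFFIX = ".img"


@dataclass
class DraftRecord:
    """A captured dataset awaiting metadata from its owner."""
    owner_email: str
    image_files: list[str]
    metadata: dict[str, str] = field(default_factory=dict)

    def missing_fields(self) -> list[str]:
        """Required metadata fields the owner has not filled in yet."""
        return [name for name in REQUIRED_FIELDS if not self.metadata.get(name)]

    def is_complete(self) -> bool:
        """True once every required field has a non-empty value."""
        return not self.missing_fields()


def capture(filenames: list[str], owner_email: str, seen: set[str]) -> DraftRecord | None:
    """Build a draft record from image files that have not been harvested before."""
    new_files = sorted(
        name for name in filenames
        if name.endswith(IMAGE_SUFFIX) and name not in seen
    )
    if not new_files:
        return None
    seen.update(new_files)  # remember what was harvested to avoid duplicates
    return DraftRecord(owner_email=owner_email, image_files=new_files)


def notification(record: DraftRecord) -> str:
    """Text of the message asking the owner to complete the record."""
    return (
        f"To: {record.owner_email}\n"
        f"{len(record.image_files)} image(s) captured. "
        f"Please complete: {', '.join(record.missing_fields())}"
    )
```

In this sketch the record is only considered complete once the owner has supplied every required field, which mirrors the "small number of metadata fields" step; the file transfer itself is left out.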

The automatic capture of diffraction image datasets from the UQ ROCX facility improves the integration of DIMER into the researcher’s workflow. This is vital given the time constraints faced by researchers. Scientists acknowledge the value of depositing datasets into open-access repositories, which enhances the credibility of published findings by allowing experiments to be reviewed and potentially repeated. However, given the pressure on scientists to progress their laboratory work and, in particular, to publish in academic journals, the publication of datasets is often overlooked.

In addition to making datasets available to the wider community following publication, storage in DIMER also enables researchers to fulfil their internal data management obligations. DIMER provides a repository that is accessible to all X-ray crystallography researchers at UQ, with support for granting read and/or write access to fellow laboratory group members, supervisors, and other collaborators. This facilitates collaborative research and prevents datasets from becoming “lost” when a researcher leaves UQ. Without DIMER, researchers may store images on portable disks or their local machine. Automatically capturing datasets in DIMER ensures that they are reliably stored and backed up.
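One way to picture the read and/or write access model is the minimal sketch below. It is not DIMER's implementation: the `DatasetAccess` class, its method names, and the rules that the owner has full access, that write access implies read access, and that published datasets are readable by anyone are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetAccess:
    """Hypothetical per-dataset access list."""
    owner: str
    readers: set[str] = field(default_factory=set)
    writers: set[str] = field(default_factory=set)
    published: bool = False  # assumed: published datasets are world-readable

    def grant(self, user: str, *, read: bool = False, write: bool = False) -> None:
        """Give a colleague read and/or write access."""
        if read:
            self.readers.add(user)
        if write:
            self.writers.add(user)

    def can_write(self, user: str) -> bool:
        # Assumed: the owner can always write.
        return user == self.owner or user in self.writers

    def can_read(self, user: str) -> bool:
        # Assumed: write access implies read access.
        return self.published or self.can_write(user) or user in self.readers
```

For example, a supervisor granted `read=True` could view a dataset without being able to modify it, and the dataset would remain reachable through the owner's grants after the owner leaves.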

As noted above, making datasets publicly available alongside traditional academic publications enhances the credibility of scientific research. The use of UQ DataSpace and ANDS Research Data Australia to publicise data also improves the exposure of a researcher’s output, boosting their academic profile. It has been noted that Research Data Australia provides a similar service to PubMed except that, whereas PubMed stores metadata about published articles, RDA stores metadata about published research data. DIMER fills the gap in publishing the outputs of X-ray crystallography studies alongside journal articles captured by PubMed and protein structures captured by the Protein Data Bank. Syndication of metadata from DIMER via ANDS RDA makes these datasets accessible and discoverable online.

At the present time, five datasets from DIMER have been made available under the Creative Commons Attribution 3.0 Australia License (this license is prescribed by DIMER). These can be viewed on Research Data Australia under the ‘Diffraction Image Experiment Repository’ collection, located at http://researchdata.ands.org.au/diffraction-image-experiment-repository-httpdataspaceuqeduaucollections1w.

These datasets were generated from a variety of projects undertaken at UQ, all of which have the potential to inform the design of novel drugs; for example, the development of a new class of antivirulence compounds to combat antibiotic-resistant infection. Further collections will come online as datasets are automatically captured from UQ ROCX and more researchers are encouraged to make their datasets publicly accessible. As a result of the DIMER project, 151 new datasets (out of a total of 204) have already been added to the repository. The publication rate for these datasets is, however, only 2.5%; the reasons for this, and how this could be improved, are discussed in the ‘Lessons Learnt’ section.
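The quoted publication rate appears to correspond to the five published datasets out of the 204 in the repository. That pairing is an inference from the figures in this description rather than something it states, and the short check below simply reproduces the arithmetic under that assumption.

```python
# Figures taken from the text above; pairing 5 with 204 is an assumption.
published = 5   # datasets released under the Creative Commons licence
total = 204     # datasets in the repository
new = 151       # datasets added as a result of the project

publication_rate = 100 * published / total
new_share = 100 * new / total

print(f"publication rate: {publication_rate:.1f}%")
print(f"share of datasets added by the project: {new_share:.1f}%")
```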

A range of other improvements have been made to the robustness, scalability, and usability of DIMER. These include an overhaul of the DIMER web design to fit the standard UQ template, making it more familiar to users, and other user interface improvements that were suggested during user testing (e.g. a custom file upload applet, shorter URLs for citation purposes, and improved help content). Robustness of DIMER was improved in a number of areas, with better error handling across the board and fixes for rendering issues in versions of Internet Explorer. DIMER has coped well with increasing volumes of data, but some changes were made to preserve the responsiveness of the web user interface while running background tasks to process diffraction images.

Data Type:

protein crystallograph: a set of images, instrument data, and analytical output

High Level Software Functionality:

Features:
a) This project will support the capture of images from the Rigaku Saturn and R-AXIS IV++
b) The data collection and all associated metadata are stored in a Structural Genomics Group repository

ANZSRC-FOR code:

03 CHEMICAL SCIENCES