Project Homepage:

Project Members:

Steve McEachern (Project Lead)

ANDS Contact:

Karen Visser

Project Status:

In Progress

Data Citation Infrastructure Establishment Program

Australian Data Archive

Project Description

The Data Citation Infrastructure Establishment Program was a project conducted by the Australian Data Archive (ADA), with the support of the Australian National Data Service (ANDS), across 2013 and 2014 as part of the ANDS Data Citation project. The Australian Data Archive provided project management, research and policy development for the project, and Olaf Delgado-Friedrichs, an independent contractor, provided additional software development support for ADA’s DOI minting facility.

Project impetus and drivers

The Australian Data Archive has a long history of supporting data citation for the use of secondary data, dating back to the 1990s, when it introduced recommended bibliographic citations for data sourced from ADA. Despite this, ADA knew little about researchers’ actual citation practices, as it had limited capacity to monitor and enforce these recommendations among its users. ADA therefore had a strong interest in better understanding, and improving, the citation practices of its user base.

In parallel, the establishment of the Australian National Data Service, and the introduction of DOI minting services for research data collections through the international DataCite network, meant that automated electronic monitoring of data usage was becoming increasingly viable. In addition, the DOI standard provided a mechanism for computer-mediated data citation that was already familiar to academic researchers, since DOIs were widely used to identify research publications.

For these reasons, establishing ADA’s Data Citation Infrastructure Establishment Program under the ANDS Data Citation program was opportune. The project had four aims:

a) To develop and expand the Australian Data Archive’s existing support for data citation

b) To develop and embed a culture of data citation within the ADA user base, and to provide a sustainable environment for data citation in Australian social science research into the future

c) To understand and monitor the data citation practices of data depositors and third-party users of ADA data

d) To provide a contribution to emerging international efforts to support data citation, including the Thomson Reuters Data Citation Index and the ORCID-DataCite Network project

Project outcomes

Unlike some other, more technically oriented ANDS projects, this project focussed primarily on understanding and influencing data citation practices among researchers. The project did establish more detailed technical systems for supporting citation of ADA-provided data, but its primary focus, as the aims above indicate, was to improve understanding of existing citation practices in the social sciences, a field with a long history of data sharing, and to influence the citation practices of current and future social science researchers.

Accordingly, the project’s approach is itself grounded in social science, particularly in examining current citation practices within research publications. Project deliverables 1 and 3, which examine the citation practices of both data depositors and third-party users, adopt a multi-method approach: they combine automated citation indexing results from the new Thomson Reuters Data Citation Index (DCI) with a manual review of citations in publications that Google Scholar identified as referencing the Australian Election Study (AES) data series, which served as the test case for this project. The findings of this analysis, presented in Appendix One, show a strong culture of data referencing within the AES user group, but the manual review also shows that specific practices vary considerably, ranging from in-text citation to the inclusion of bibliographic information to incorporation into methodological descriptions. By contrast, a text mining system such as DCI, which relies only on formal bibliographic information, gives the impression of a much lower level of data citation. This part of the project therefore suggests that data citation can be (and, in the case of the AES, has been) embedded in disciplinary practice. However, it also suggests that if automated citation analysis techniques are to be used, data managers and support services will need to converge on a common model for enabling citation, with DOIs the likely candidate, and then find ways to persuade data users to adopt this common model rather than the diverse approaches that currently exist.

The technical developments of this project, highlighted in the next section, offer one way forward. The development of ADA’s DOI minting facilities, together with internal workflows and external advice to users on DOI-enabled bibliographic citations, gives users who already acknowledge data sources through bibliographic references a way to do so that supports automated monitoring. DOIs appear to offer less value, however, for understanding the practices of researchers who tend towards in-text citation or data descriptions. In these cases, approaches that mine text snippets referencing datasets, such as that described by Boland et al. (2012, DOI 10.1007/978-3-642-33290-6_17), may be more fruitful for understanding citation practices in the near term.
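As a minimal illustration of why DOI-enabled bibliographic citations suit automated monitoring while in-text mentions do not, the sketch below scans text for DOI-like strings. It is not the DCI’s or ADA’s actual tooling: the regular expression is adapted from a commonly used DOI pattern, and the two example citations (including the DOI `10.1234/example.aes.2013`) are hypothetical, invented purely for illustration.

```python
import re

# DOI-like pattern ("10." + registrant prefix + "/" + suffix), adapted from a
# commonly used DOI regular expression. An assumption for illustration only.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(text: str) -> list[str]:
    """Return every DOI-like string found in a block of text."""
    # Trailing punctuation usually ends the sentence rather than the DOI.
    return [m.group(0).rstrip(".,;:") for m in DOI_PATTERN.finditer(text)]


# Hypothetical examples of two citation styles observed in the AES review.
bibliographic = (
    "Australian Election Study, 2013 [computer file]. "
    "Australian Data Archive. doi:10.1234/example.aes.2013"
)
in_text = "Data for this analysis were drawn from the 2013 Australian Election Study."

if __name__ == "__main__":
    for label, text in [("bibliographic", bibliographic), ("in-text", in_text)]:
        print(label, find_dois(text))
```

The bibliographic reference yields its (hypothetical) DOI, while the in-text mention yields nothing, even though both acknowledge the same data source. This is the gap between automated indexing and manual review that the analysis above describes.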

Project public information

Public information about the project is available on the ADA website, in the “Data Access” section at:

Dr Steve McEachern was the project leader and primary contact for the project.

Details of how to access the materials produced by this project are available in Section 8 of this report. Data managers from other institutions are welcome to use and adapt these materials as needed, as all materials are available under a Creative Commons Attribution (CC BY) license. Interested users are also welcome to contact the Project Lead, Steve McEachern, or the Australian Data Archive, at, for more information about the project or to discuss future collaboration in this area.