Project promotion materials:

Project Homepage:

Data collections can be seen on:

Programming language(s):

Python, CartoDB, MATLAB, Natural Language Toolkit

Project Members:

Dr Fiona Tweedie

Daniel McDonald

Dr Isabell Kiral-Kornek

Damien Irving

Lachlan Musicman

Karen Visser

ANDS Contact:

Project Status:


Data Carpentry: Data Intensive (services) training in the cloud

University of Melbourne

Project Description

During this project, Research Community Coordinators based at the University of Melbourne developed training materials for widely-used data analysis tools (Python, Natural Language Toolkit, MATLAB and CartoDB) utilising Australian research datasets.

These training materials were designed to integrate with the existing curricula taught by the Data Carpentry Foundation, an international network that teaches introductory programming and data analysis skills. To support the spread of data skills in Australia, Research Platforms Services hosted a training conference that trained new instructors from Australia and New Zealand to teach Data Carpentry and provided intensive instruction to a cohort of researchers.

To keep training relevant to researchers and avoid presenting an overview of software without adequate context, all training at the Research Bazaar is delivered by researchers and is challenge-driven: participants work with real data and start solving programming problems early in the workshop. This project funded Research Community Coordinators at the University of Melbourne to develop training materials using open research data and to deliver this training to their research peers. Research Community Coordinators are drawn from disciplines across the university to ensure that the program addresses the needs of diverse research fields.

The project has showcased four datasets:
• The Malcolm Fraser Radio Electorate Talks, an important collection of political speeches spanning thirty years and highlighting Fraser’s self-fashioning as a local member (available under a CC-BY-NC licence)
• Data collected by the Turquoise Coast (TURQ) HF ocean radar system, a collection of data concerning ocean surface currents, significant for modelling climate patterns (available under a CC-BY licence)
• An EEG dataset, a Generalizable BCI using Machine Learning for Feature Discovery, which demonstrates motor-related neural signals with possible application (available under a CC-0 licence)
• Data on lobster populations in two locations off the Tasmanian coast, of interest for profiling the lobster population (including size, age, and reproduction) and informing fishing policies. Note: the full dataset is not available due to commercial sensitivity; a limited dataset and data dictionary are available under a CC-BY licence.

High Level Software Functionality:

Lesson materials, available for reuse:
• Python / Climate science
• CartoDB / Lobster population
• MATLAB / EEG data