Project promotion materials:

Project Homepage:

http://itee.uq.edu.au/~eresearch/projects/ands/siss/

Data collections can be seen on:

https://researchdata.ands.org.au/contributors/the-university-of-queensland

Software is available at:

http://sourceforge.net/projects/andsdc4bproject/

Programming language(s):

Java

Categories:

Others

Integration metadata from various systems which are internal to an institution

Project Members:

Dr. Nigel Ward (n.ward4@uq.edu.au)

Prof. Jane Hunter (jane@itee.uq.edu.au)

ANDS Contact:

Andrew White (andrew.white@ands.org.au)

Project Status:

Completed

Spatially Integrated Social Science

University of Queensland

Project Description

Drivers & aims

The field of Spatially Integrated Social Science (SISS) recognises that much data that the social scientist examines has an associated geographic location (for example, surveys may be associated with the geo-location of respondents). SISS systems use such geographic information as the basis for both integrating heterogeneous social science data sets and for visualising the results of analyses.

Building a SISS system, however, involves a number of time-consuming and highly skilled processes, including sourcing data sets, understanding and encoding relationships between data and geography, and implementing appropriate statistical analysis techniques.

The UQ SISS project aimed to reduce the burden of these tasks by building spatial and statistical analysis tools for social scientists investigating Australian demographic, socio-economic and voting data.

Outcomes

Throughout 2011 the SISS project focussed on developing online tools that allow researchers to quickly access rich Australian socio-spatial datasets related to voting outcomes and census data, conduct statistical modelling, and visualise spatial relationships in the data. The project:
• Established a repository of statistical variables derived from Australian Bureau of Statistics Census Data and Australian Electoral Commission voting data;
• Developed geospatial visualisation and statistical services for analysing these variables;
• Exposed these visualisation and analysis services through three Web portals that allow researchers to easily analyse the variables; and
• Shared RIF-CS research data collection descriptions of the derived data via ANDS Research Data Australia (RDA).

Novelty

The ANDS SISS project built on previous ARC-funded investments and effort that established an e-Research Facility for Socio-Spatial Analysis at the University of Queensland. This pre-existing facility was built using proprietary technologies. It supported socio-spatial statistical analysis, modelling and visualisation of variables derived from 2006 Australian Bureau of Statistics (ABS) census data and 2007 Australian Electoral Commission (AEC) voting data at the polling booth catchment level of geography.

The ANDS SISS project extended this work in a number of ways. It:

• Provided access to variables based on more recent data (2006 census data and 2010 federal election data);
• Derived human capital indices and employment change variables that allow comparison across the 1996, 2001 and 2006 censuses;
• Completely re-built the facility using open source technologies (the facility previously relied on proprietary technologies);
• Created user interface components and statistical tools that dynamically re-configure based on metadata about data variables;
• Promoted the existence of the data variables to a broader community by syndicating collection descriptions to Research Data Australia.

Data collection and derivation

The project collected data from the 1996, 2001 and 2006 Australian Censuses of Population and Housing (sourced from the Australian Bureau of Statistics, ABS) and the 2010 Australian Federal Election (sourced from the Australian Electoral Commission, AEC). Variables relevant to spatially integrated social science research were derived from the collected ABS and AEC data for three levels of geography:
• Polling booth catchments;
• Statistical local areas; and
• Local government areas.

Definitions for the Statistical Local Area and Local Government Area levels of geography come directly from the Australian Standard Geographical Classification. The Queensland Centre for Population Research (QCPR) created the Polling Booth Catchment level of geography by geo-coding polling booths (based on Australian Electoral Commission data) and spatially allocating Census Collection Districts (from ABS) to the nearest polling booth location to form polling booth catchments within each of the 150 Electoral Divisions.
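The nearest-booth allocation described above can be sketched as follows. This is a minimal illustration in Python, not the QCPR method: it assumes each Census Collection District is represented by a single centroid point, uses great-circle distance, and would need to be run separately for each Electoral Division. The function names, identifiers and coordinates are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def build_catchments(districts, booths):
    """Group district ids under the id of their nearest polling booth.

    districts, booths: dicts mapping an id to a (lat, lon) tuple.
    """
    catchments = {booth_id: [] for booth_id in booths}
    for district_id, (lat, lon) in districts.items():
        nearest = min(booths, key=lambda b: haversine_km(lat, lon, *booths[b]))
        catchments[nearest].append(district_id)
    return catchments


# Hypothetical coordinates, for illustration only.
booths = {"booth_a": (-27.47, 153.02), "booth_b": (-27.56, 153.08)}
districts = {
    "cd_1": (-27.48, 153.03),
    "cd_2": (-27.55, 153.07),
    "cd_3": (-27.46, 153.01),
}
catchments = build_catchments(districts, booths)
```

Each catchment is then simply the set of collection districts assigned to one booth, so census variables can be aggregated per catchment and compared with that booth's voting results.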

Many of the datasets also include location quotients comparing each variable's local value against the national benchmark for that variable.
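As an illustration of what such a quotient measures, the sketch below uses the conventional definition of a location quotient: the local share of a variable divided by its national share. The project's exact formula is not given here, so this definition, the function name and the figures are assumptions.

```python
def location_quotient(local_count, local_total, national_count, national_total):
    """Ratio of a variable's local share to its national share."""
    if local_total == 0 or national_count == 0 or national_total == 0:
        raise ValueError("totals and the national count must be non-zero")
    local_share = local_count / local_total
    national_share = national_count / national_total
    return local_share / national_share


# Hypothetical figures: 300 of 2,000 local residents hold the attribute,
# against 1,000,000 of 10,000,000 residents nationally.
lq = location_quotient(300, 2000, 1_000_000, 10_000_000)
```

Under this definition a value above 1 means the variable is over-represented locally relative to the national benchmark, and a value below 1 means it is under-represented. In the example the local share (15%) is one and a half times the national share (10%).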

Data sharing and management

Although the original census and voting data is freely available, the "joined" data, the derived data, and some of the region definitions contain significant intellectual property developed prior to this project, which we did not have permission to share. Despite this restriction, the project aimed to allow open analysis of the underlying and derived data. For this reason, the project provides:
1. mediated access to the underlying data (researchers are encouraged to contact the project team to discuss access);
2. open access to analysis tools through the Web portals; and
3. publicity for the data by syndicating data descriptions to the UQ Data Collections Registry and ANDS Research Data Australia (RDA).

We feel this approach balanced our obligation to protect the intellectual property of the research teams who created the derived data with the goals of promoting the data (through RDA) and allowing the socio-spatial research community to perform data analyses.

Data Type:

Derived datasets: location quotients, spatial indices, summaries, human capital indices and social capital indices, inter-regional migration flows, economic change indices

Raw datasets: location quotients, spatial indices, summaries, human capital indices and social capital indices, inter-regional migration flows, economic change indices

High Level Software Functionality:

We based our technical architecture on an existing socio-spatial research facility previously developed at UQ using proprietary technologies. Our software development team extended this work in a number of ways. In particular, the new system:
• Uses only open source technologies to build the portals (the pre-existing facility relied mainly on proprietary technologies);
• Presents the data and results using modern Web technologies (such as XML, WMS, GeoServer, and OpenLayers), rather than proprietary formats;
• Separates statistical analysis and data visualisation into two components. Previously, R routines both performed the analysis and constructed the visualisations. In the new system, R still performs the statistical analysis, but in modern browsers the visualisation is constructed on the fly in the Web client using the Processing.js JavaScript library. To support older browsers (i.e. Internet Explorer 7.x and 8.x), we generate the visualisation as an image on the server side and serve it to the Web client;
• Potentially provides access to any statistical tool developed as a package for the R-project (that supplies data in a format understood by the portal). This means the portals can potentially offer new tools for analysing and visualising socio-spatial data as new R analysis tools become available.
• Dynamically creates user interface components and statistical tools based on metadata about data variables. This means that new data can easily be added to the portals.
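The metadata-driven approach in the last point can be sketched as follows. This is a hypothetical illustration in Python (the project's own software is written in Java); the metadata fields, widget names and tool names are assumptions, not the portal's actual schema.

```python
def widget_for(variable):
    """Choose a UI control description from a variable's metadata."""
    kind = variable["type"]
    if kind == "categorical":
        return {"widget": "dropdown", "label": variable["label"],
                "options": variable["categories"]}
    if kind == "numeric":
        return {"widget": "range_slider", "label": variable["label"],
                "min": variable["min"], "max": variable["max"]}
    raise ValueError(f"unsupported variable type: {kind}")


def applicable_tools(variable, tools):
    """Return the names of tools that accept this variable's type."""
    return [name for name, accepted in tools.items() if variable["type"] in accepted]


# Hypothetical metadata records and tool registry, for illustration only.
variables = [
    {"name": "median_income", "label": "Median income", "type": "numeric",
     "min": 0, "max": 2000},
    {"name": "winning_party", "label": "Winning party", "type": "categorical",
     "categories": ["Party A", "Party B"]},
]
tools = {
    "histogram": {"numeric"},
    "linear_regression": {"numeric"},
    "frequency_table": {"categorical"},
}

controls = [widget_for(v) for v in variables]
offered = {v["name"]: applicable_tools(v, tools) for v in variables}
```

Because the controls and the list of offered tools are computed from the metadata records, adding a new variable in a design like this means adding one metadata entry rather than changing interface code.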

ANZSRC-FOR code:

1603 DEMOGRAPHY
1604 HUMAN GEOGRAPHY
1605 POLICY AND ADMINISTRATION
1609