Integrating 'Big' geoscience data into the petascale national environmental research interoperability platform (NERDIP): Successes and unforeseen challenges

Lesley Wyborn, Benjamin J.K. Evans

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    4 Citations (Scopus)

    Abstract

    The Australian Government has begun an initiative to organise publicly funded national data assets and make them accessible for research through the Research Data Services (RDS) initiative, which supports over 40 PBytes of multidisciplinary data at eight nodes around Australia. One of these nodes is at the National Computational Infrastructure (NCI), which provides a national, comprehensively integrated high-performance computing facility. NCI is a partnership between the ANU, the Australian Bureau of Meteorology, Geoscience Australia (GA) and the Commonwealth Scientific and Industrial Research Organisation (CSIRO), and focuses particularly on Earth system sciences. As part of its activity in RDS, NCI has co-located over 10 PBytes of priority research data collections spanning a wide range of disciplines, from geosciences, geophysics, environment, climate, weather, and water resources through to astronomy, bioinformatics, and the social sciences. To facilitate access, maximise reuse and enable integration across the disciplines, the data have been built into a platform that NCI has called the National Environmental Research Data Interoperability Platform (NERDIP). The platform is co-located with significant HPC resources: a 1.2 PetaFlop supercomputer (Raijin) and an HPC-class 3000-core OpenStack cloud system (Tenjin). Combined, they offer unparalleled opportunities for geoscience researchers to undertake innovative Data-intensive Science at scales and resolutions never before attempted, and to participate in new interdisciplinary collaborations. However, compared with other 'Big Data' science disciplines (climate, oceans, weather, astronomy), current geoscience data management practices and data access methods need significant work before they can scale up and take advantage of the changes in the global computing landscape.
    Although the geosciences have many 'Big Data' collections that could be incorporated within NERDIP, they typically comprise heterogeneous files distributed over multiple sites and sectors, and it is taking considerable time to aggregate these into large High Performance Data (HPD) sets structured to facilitate uptake in HPC environments. Once the data are incorporated into NERDIP, the next challenge is to ensure that researchers are ready both to use modern tools and to update their working practices so as to process these data effectively. This is an issue in part because the geoscience community has been slow to move to peak-class systems for Data-intensive Science and to integrate with the rest of the Earth systems community.

    Original language: English
    Title of host publication: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
    Editors: Feng Luo, Kemafor Ogan, Mohammed J. Zaki, Laura Haas, Beng Chin Ooi, Vipin Kumar, Sudarsan Rachuri, Saumyadipta Pyne, Howard Ho, Xiaohua Hu, Shipeng Yu, Morris Hui-I Hsiao, Jian Li
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 2005-2009
    Number of pages: 5
    ISBN (Electronic): 9781479999255
    Publication status: Published - 22 Dec 2015
    Event: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States
    Duration: 29 Oct 2015 to 1 Nov 2015

    Publication series

    Name: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

    Conference

    Conference: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015
    Country/Territory: United States
    City: Santa Clara
    Period: 29/10/15 to 1/11/15
