This is Version 2 of the Depth of Regolith product of the Soil and Landscape Grid of Australia (produced 2015-06-01).
The Soil and Landscape Grid of Australia has produced a range of digital soil attribute products. The digital soil attribute maps are in raster format at a resolution of 3 arc sec (~90 x 90 m pixels).
- Attribute Definition: The regolith is the in situ and transported material overlying unweathered bedrock;
- Units: metres;
- Spatial prediction method: data mining using piecewise linear regression;
- Period (temporal coverage; approximately): 1900-2013;
- Spatial resolution: 3 arc seconds (approx 90 m);
- Total number of gridded maps for this attribute: 3;
- Number of pixels with coverage per layer: 2007M (49200 * 40800);
- Data license: Creative Commons Attribution 4.0 (CC BY);
- Variance explained (cross-validation): R^2 = 0.38;
- Target data standard: GlobalSoilMap specifications;
- Format: Cloud Optimised GeoTIFF;
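The grid figures in the list above can be cross-checked with a little arithmetic. The following is a minimal sketch only: it assumes that 49200 is the number of columns and 40800 the number of rows, and it uses a spherical Earth of radius 6,371 km (an assumption made here, not stated in this record) to convert 3 arc seconds to metres.

```python
import math

ARCSEC_PER_CELL = 3          # stated spatial resolution
COLS, ROWS = 49200, 40800    # assumed column/row order of "49200 * 40800"
EARTH_RADIUS_M = 6_371_000   # assumption: mean spherical Earth radius

# Total pixels per layer (the record rounds this to 2007M).
total_pixels = COLS * ROWS

# Angular extent covered by the grid, in degrees.
width_deg = COLS * ARCSEC_PER_CELL / 3600
height_deg = ROWS * ARCSEC_PER_CELL / 3600

# North-south size of one cell in metres on the assumed sphere.
metres_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
cell_ns_m = metres_per_degree * ARCSEC_PER_CELL / 3600


def cell_ew_m(latitude_deg):
    """East-west cell size in metres; it shrinks with the cosine of latitude."""
    return cell_ns_m * math.cos(math.radians(latitude_deg))


print(total_pixels, width_deg, height_deg, round(cell_ns_m, 2))
```

Under these assumptions a cell is a little over 90 m north to south and somewhat narrower east to west at Australian latitudes, which is consistent with the "approx 90 m" stated above.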
Credit
We at TERN acknowledge the Traditional Owners and Custodians throughout Australia, New Zealand and all nations. We honour their profound connections to land, water, biodiversity and culture and pay our respects to their Elders past, present and emerging.
This work was jointly funded by CSIRO, Terrestrial Ecosystem Research Network (TERN) and the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS).
Purpose
The Soil and Landscape Grid of Australia is a comprehensive fine spatial resolution grid of functional soil attributes and key landscape features across Australia. The landscape attributes are derived from the data collected by the Shuttle Radar Topography Mission, whilst the soil attribute surfaces are modelled from existing soils information. These surfaces provide a consistent base upon which a broad range of analyses can be conducted. This provides essential information, not previously available, that is needed for modelling and managing Australian landscapes and ecosystems.
Lineage
The methodology consisted of the following steps: (i) drillhole data preparation, (ii) compilation and selection of the environmental covariate raster layers and (iii) model implementation and evaluation.
Drillhole data preparation:
Drillhole data was sourced from the National Groundwater Information System (NGIS) database. This spatial database holds nationally consistent information about bores that were drilled as part of the Bore Construction Licensing Framework (http://www.bom.gov.au/water/groundwater/ngis/). The database contains 357,834 bore locations with associated lithology, bore construction and hydrostratigraphy records. This information was loaded into a relational database to facilitate analysis.
Regolith depth extraction:
The first step was to recognise and extract the boundary between the regolith and bedrock within each drillhole record. This was done using a keyword look-up table of bedrock- or lithology-related words taken from the record descriptions. A total of 1,910 unique descriptors were identified and standardised. The drillholes were then analysed using this list of standardised terms: for each record, a tool written in C# extracted the depth value associated with the word in the description that unequivocally indicated that fresh bedrock had been reached.
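The keyword look-up described above can be illustrated with a short sketch. This is not the C# tool used to build the product: the terms in the table, the interval layout (top depth, bottom depth, description) and the function name are all hypothetical and chosen only to show the idea of returning the depth at which a bedrock term first appears.

```python
import re

# Hypothetical look-up table of descriptors taken to indicate fresh bedrock.
# The real table held 1,910 standardised descriptors; these are placeholders.
BEDROCK_TERMS = ("fresh granite", "fresh bedrock", "hard basalt")


def regolith_depth(intervals, terms=BEDROCK_TERMS):
    """Return the top depth (m) of the shallowest interval whose description
    contains a bedrock term, or None if no interval matches.

    intervals: iterable of (top_m, bottom_m, description) tuples.
    """
    # Whole-word patterns so that a term cannot match inside a longer word.
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b") for t in terms]
    for top, _bottom, description in sorted(intervals):
        text = description.lower()
        if any(p.search(text) for p in patterns):
            return top
    return None


# Example drillhole log (made-up values for illustration only).
log = [
    (0.0, 2.5, "brown clay"),
    (2.5, 11.0, "weathered granite"),
    (11.0, 30.0, "Fresh granite, hard"),
]
print(regolith_depth(log))
```

In this sketch a hole that never reaches a listed term returns None, so it would contribute no depth site.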
The second step of regolith depth extraction was to remove unreliable drillhole bedrock depth records. This was necessary because of the “noisiness” in depth records resulting from inconsistencies in drilling and description standards identified in the legacy database.
On completion of the filtering and removal of outliers, the drillhole database used in the model comprised 128,033 depth sites.
Selection and preparation of environmental covariates
The environmental correlation approach to digital soil mapping (DSM) uses environmental covariate datasets to predict target variables, here regolith depth. Strongly performing environmental covariates act as proxies for the factors that control regolith formation, including climate, relief, parent material, organisms and time.
Depth modelling was implemented in the R statistical software (R Core Team, 2014) on a PC, using the Cubist package (Kuhn et al. 2013). To generate modelling uncertainty estimates, the following procedure was followed: (i) a random subset comprising 20% of the whole depth record dataset was withheld for external validation; (ii) the remaining dataset was bootstrap sampled 100 times to produce repeated model training datasets. The Cubist model was then run on each training dataset to produce a unique rule set for each. Running the model repeatedly on different training sets, a procedure referred to as bagging or bootstrap aggregating, is a machine learning ensemble technique designed to improve the stability and accuracy of the model. The Cubist rule sets were then evaluated and applied spatially to calculate a mean predicted value (i.e. the final map). The 5% and 95% confidence limits were estimated for each grid cell (pixel) in the prediction dataset by combining the variance from the bootstrapping process and the variance of the model residuals. Version 2 differs from Version 1 in that depths were modelled on the log scale to better conform to the assumptions of normality used in calculating the confidence intervals. The method used to estimate the confidence intervals was also improved to better represent the full range of variability in the modelling process (Wilford et al., in press).
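The bagging and uncertainty procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the published workflow: a one-covariate least-squares line stands in for the Cubist model, the two variance components are simply added, and the 5% and 95% limits are taken as the mean plus or minus 1.645 standard deviations on the log scale. All function names and parameters are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares line; a simple stand-in for a Cubist rule set."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def bagged_log_depth(xs, depths, x_new, n_boot=100, holdout=0.2, seed=0):
    """Return (lower 5%, prediction, upper 95%) depth in metres at x_new."""
    rng = random.Random(seed)
    logs = [math.log(d) for d in depths]        # model on the log scale

    # (i) Withhold a random 20% for external validation (not used further here).
    order = list(range(len(xs)))
    rng.shuffle(order)
    train = order[int(len(order) * holdout):]

    preds, sq_resid = [], []
    for _ in range(n_boot):
        # (ii) Bootstrap: resample the training set with replacement.
        idx = [rng.choice(train) for _ in train]
        a, b = fit_line([xs[i] for i in idx], [logs[i] for i in idx])
        preds.append(a + b * x_new)
        sq_resid.extend((logs[i] - (a + b * xs[i])) ** 2 for i in idx)

    mean_log = statistics.fmean(preds)
    # Assumption: total variance = bootstrap variance + residual variance.
    total_var = statistics.pvariance(preds) + statistics.fmean(sq_resid)
    half_width = 1.645 * math.sqrt(total_var)   # 5% and 95% normal quantiles
    return (math.exp(mean_log - half_width),
            math.exp(mean_log),
            math.exp(mean_log + half_width))
```

Because the limits are computed on the log scale and then back-transformed, the resulting interval is always positive and is asymmetric about the predicted depth.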