A full description of the methods used to generate this product can be found at:
https://aussoilsdsm.esoil.io/slga-version-2-products/total-soil-nitrogen
The first national digital soil mapping of total soil nitrogen (expressed as a percentage of fine soil mass) is published and available on the CSIRO Data Access Portal, among other places. The present work updates that mapping as part of ongoing efforts to expand and improve Australia’s national mapping and characterisation of its soil resources. Collectively, these national soil mapping efforts constitute the Soil and Landscape Grid of Australia (SLGA). The original work is designated Version 1 (completed 2015), and the new work is Version 2 (completed 2023). This work has been made possible through support and funding from Australia’s National Collaborative Research Infrastructure Strategy (NCRIS) via the Terrestrial Ecosystem Research Network (TERN).
As with the first effort, digital soil mapping is the framework underpinning the creation of these soil maps.
As with other recent national digital soil mapping efforts, the SoilDataFederator (Searle 2020) has been instrumental in the dynamic collation of disparate soil observational datasets from across the country. These data have been sourced mainly from the State and Territory Government departments tasked with soil survey and collection. There are also data contributions from universities and, to a lesser extent, individual research groups. The SoilDataFederator also taps into the larger CSIRO-developed Natsoil database (CSIRO 2020), which holds data related to research projects and field stations that CSIRO has managed.
The improvement in digital soil mapping has come about via several mechanisms:
1. The library of available data corresponding to each of the main soil state factors has been greatly expanded (Searle et al. 2022), through the acquisition of new data sets and the improvement of others compared with those used for Version 1.
2. Adoption of machine learning (ML) to derive empirical relationships between the target variable (total soil nitrogen content) and various data related to the state factors that help determine and control soil variability across landscapes, here the Australian continent and nearshore islands. While the adoption of ML is not an entirely new advancement, coupling it with additional data and integrating it within a pseudo-3D predictive framework permits better spatial and vertical characterisation of soils than in Version 1.
3. Together with a more powerful and streamlined predictive modelling approach, the quantification of uncertainties uses the UNEEC (Uncertainty Estimation based on Empirical Errors and Clustering; Shrestha and Solomatine 2006) approach instead of a bootstrapping approach, so that prediction interval bounds are better tailored to variations in state factor information. Bootstrapping tends to create uniform prediction interval ranges, whereas UNEEC can distinguish areas of relatively lower and higher uncertainty based on differences in soil and landscape characteristics. For Version 2, the uncertainties are therefore more closely tailored to the environment in which they are quantified.
4. An approach to understand and characterise issues of model extrapolation has been developed. It seeks to highlight areas where there is high confidence that models are going to be unreliable because these areas lie outside the range of the underpinning data used in modelling. This issue is addressed via a combination of data-geometric and distance-based techniques.
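The contrast between uniform and environment-specific prediction intervals can be illustrated with a heavily simplified sketch in the spirit of UNEEC. It is not the project code: it assumes hard k-means clustering of covariates where the published method uses fuzzy clustering with membership-weighted error quantiles, and every function name and the synthetic data are hypothetical.

```python
import random

def quantile(values, q):
    """Empirical quantile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def nearest(centres, x):
    """Index of the cluster centre closest to covariate vector x."""
    return min(range(len(centres)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(centres[k], x)))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumption: a hard-clustering stand-in for fuzzy c-means)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centres, p)].append(p)
        # Move each centre to the mean of its members; keep it if the group is empty.
        centres = [[sum(col) / len(g) for col in zip(*g)] if g else centres[i]
                   for i, g in enumerate(groups)]
    return centres

def fit_error_model(covariates, errors, k=2, lower=0.05, upper=0.95):
    """Cluster covariate space, then store error quantiles for each cluster."""
    centres = kmeans(covariates, k)
    grouped = [[] for _ in range(k)]
    for x, e in zip(covariates, errors):
        grouped[nearest(centres, x)].append(e)
    overall = (quantile(errors, lower), quantile(errors, upper))
    bounds = [(quantile(g, lower), quantile(g, upper)) if g else overall
              for g in grouped]
    return centres, bounds

def prediction_interval(model, x, prediction):
    """Shift the prediction by the error quantiles of the cluster x falls in."""
    centres, bounds = model
    lo, hi = bounds[nearest(centres, x)]
    return prediction + lo, prediction + hi

# Synthetic example: two "environments" whose model errors (observed minus
# predicted) differ in spread. All values are made up for illustration.
rng = random.Random(42)
covs = [[rng.uniform(0, 1)] for _ in range(200)] + \
       [[rng.uniform(9, 10)] for _ in range(200)]
errs = [rng.gauss(0, 0.01) for _ in range(200)] + \
       [rng.gauss(0, 0.10) for _ in range(200)]

model = fit_error_model(covs, errs, k=2)
narrow = prediction_interval(model, [0.5], 0.2)  # low-error environment
wide = prediction_interval(model, [9.5], 0.2)    # high-error environment
```

Under these assumptions, the same predicted value receives a tighter interval in the environment where past errors were small and a wider one where they were large, which is the behaviour that a single pooled error distribution cannot reproduce.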
The sequence of steps below was carried out to develop the Version 2 products:
- Preparation of point and covariate data, including filtering, cleansing, and harmonisation.
- Point data intersection with covariates.
- Creation of model and test data sets.
- Ranger model hyperparameter value optimisation.
- Ranger model fitting with best hyperparameters.
- Spatialisation of ranger models.
- Uncertainty analysis with the UNEEC method, including rudimentary optimisation of the number of classes.
- Spatialisation of model uncertainties.
- Model extrapolation analysis with the count-of-observations and boundary methods (point data).
- Ranger model fitting of extrapolation outcomes.
- Spatialisation of model extrapolation outcomes.
- Model evaluations with both test data and against SLGA Version 1 products.
- Delivery of digital soil mapping outputs and computer code to repository.
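The split, tune, fit, and evaluate portion of the steps above can be sketched in a few lines. This is an illustration only, not the project code: the actual work used ranger random forest models, whereas this sketch swaps in a k-nearest-neighbour regressor so that it runs with the Python standard library alone. The data are synthetic, and all names (`split`, `knn_predict`, `rmse`, the candidate values of `k`) are hypothetical.

```python
import math
import random

def split(records, test_fraction=0.2, seed=1):
    """Shuffle records and split them into (model set, held-out set)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train, x, k):
    """Stand-in learner: mean target of the k nearest training points."""
    neighbours = sorted(train, key=lambda r: math.dist(r[0], x))[:k]
    return sum(y for _, y in neighbours) / len(neighbours)

def rmse(train, evaluation, k):
    """Root mean squared error of predictions on an evaluation set."""
    sq = [(knn_predict(train, x, k) - y) ** 2 for x, y in evaluation]
    return math.sqrt(sum(sq) / len(sq))

# Synthetic point data already intersected with two covariates:
# each record is ((covariate_1, covariate_2), target value).
rng = random.Random(7)
records = []
for _ in range(300):
    x = (rng.uniform(0, 1), rng.uniform(0, 1))
    y = 0.1 + 0.3 * x[0] + rng.gauss(0, 0.02)
    records.append((x, y))

# Creation of model and test data sets.
model_set, test_set = split(records)

# Hyperparameter optimisation on a tuning subset carved out of the model set,
# so the test set plays no part in choosing the hyperparameter.
fit_set, tune_set = split(model_set, test_fraction=0.25, seed=2)
candidates = [1, 5, 15, 45]
scores = {k: rmse(fit_set, tune_set, k) for k in candidates}
best_k = min(scores, key=scores.get)

# Fit with the best hyperparameter on the full model set, then evaluate
# once against the untouched test set.
test_rmse = rmse(model_set, test_set, best_k)
```

Keeping the test set out of hyperparameter selection is the design point here: the final error estimate then reflects data the tuned model has never seen.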