A full description of the methods used to generate this product can be found at:
https://aussoilsdsm.esoil.io/slga-version-2-products/soil-ph-15-cacl2
Release 2 arose from several developments and takes a completely different approach from the one used to develop Release 1, namely:
1. The library of available data for each of the main soil state factors has been greatly expanded (Searle et al. 2022), through the acquisition of new data sets and the improvement of others relative to those used for Version 1.
2. Machine learning (ML) is used to derive empirical relationships between the target variable (soil pH, 1:5 CaCl2) and various data related to the state factors that help determine and control soil variability across landscapes, here the Australian continent and its immediate near-shore islands. While ML is not in itself a new advancement, coupling it with additional data and integrating it within a pseudo-3D predictive framework characterises soils spatially and vertically better than Version 1 did.
3. Alongside a more powerful and streamlined predictive modelling approach, uncertainties are quantified with the UNEEC (Uncertainty Estimation based on Empirical Errors and Clustering; Shrestha and Solomatine 2006) approach instead of bootstrapping, so that prediction interval bounds are tailored to variations in state factor information. Bootstrapping tends to produce uniform prediction interval ranges, whereas UNEEC can distinguish areas of relatively lower and higher uncertainty based on differences in soil and landscape characteristics. For Version 2, the uncertainties are therefore more closely tailored and tightly defined to the environment in which they are quantified.
4. An approach to identify and characterise model extrapolation has been developed. It highlights areas where there is high confidence that the models will be unreliable because those areas fall outside the range of the data used in modelling. The issue is addressed through a combination of data-geometric and distance-based techniques.
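Point 2 mentions a pseudo-3D predictive framework without detailing it. One common way to build such a framework, assumed here rather than confirmed by the source, is to treat every sampled layer as its own training row and add the layer's depth midpoint as an extra covariate; the fitted model is then queried once per standard depth interval at each grid cell. The sketch below assumes the six GlobalSoilMap depth intervals used across SLGA products. The `Observation` class and the helper functions are hypothetical names used for illustration only.

```python
from dataclasses import dataclass

# Standard GlobalSoilMap depth intervals (cm), assumed as the prediction depths.
STANDARD_DEPTHS = [(0, 5), (5, 15), (15, 30), (30, 60), (60, 100), (100, 200)]


@dataclass
class Observation:
    """Hypothetical record for one sampled layer at one site."""
    site_id: str
    upper_cm: float
    lower_cm: float
    value: float        # measured target value for the layer, e.g. pH (CaCl2)
    covariates: list    # covariate values intersected at the site location


def to_training_rows(observations):
    """Pseudo-3D training rows: site covariates plus the layer depth midpoint."""
    rows, targets = [], []
    for obs in observations:
        midpoint = (obs.upper_cm + obs.lower_cm) / 2.0
        rows.append(list(obs.covariates) + [midpoint])
        targets.append(obs.value)
    return rows, targets


def prediction_rows(cell_covariates):
    """One row per standard depth interval for a single grid cell."""
    return [list(cell_covariates) + [(upper + lower) / 2.0]
            for upper, lower in STANDARD_DEPTHS]
```

Any regression learner (for example a random forest such as ranger) could be fitted to the rows from `to_training_rows` and then applied to the rows from `prediction_rows`, which yields one map per depth interval from a single model.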
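The UNEEC idea in point 3 can be illustrated with a minimal NumPy sketch: cluster the covariate space with fuzzy c-means, compute a membership-weighted empirical quantile interval of the model residuals for each cluster, and give each location a membership-weighted blend of those cluster intervals around its prediction. This is a simplified reading under stated assumptions. The original method (Shrestha and Solomatine 2006) trains a separate regression model to predict the interval bounds for new inputs, whereas here new inputs reuse their fuzzy memberships to the fitted cluster centres directly. The farthest-point initialisation, the fuzziness exponent `m=2`, and all function names are illustrative choices, not the SLGA implementation. The residuals are assumed to be observed minus predicted values, ideally from held-out or out-of-bag predictions.

```python
import numpy as np


def fuzzy_memberships(Z, centres, m):
    """Fuzzy c-means membership of each row of Z to each centre (rows sum to 1)."""
    d = np.linalg.norm(Z[:, None, :] - centres[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero for points sitting on a centre
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)


def fuzzy_c_means(Z, c, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means with a deterministic farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centres = [Z[rng.integers(len(Z))]]
    while len(centres) < c:
        dist = np.min([np.linalg.norm(Z - ctr, axis=1) for ctr in centres], axis=0)
        centres.append(Z[np.argmax(dist)])
    centres = np.array(centres)
    for _ in range(n_iter):
        w = fuzzy_memberships(Z, centres, m) ** m
        centres = (w.T @ Z) / w.sum(axis=0)[:, None]
    return centres


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalised weight reaches q."""
    order = np.argsort(values)
    cum = np.cumsum(weights[order]) / weights.sum()
    idx = min(np.searchsorted(cum, q), len(values) - 1)
    return values[order][idx]


def fit_uneec(X, residuals, c=5, alpha=0.1, m=2.0):
    """Cluster the standardised covariates and derive a residual interval per cluster."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = (X - mu) / sd
    centres = fuzzy_c_means(Z, c, m)
    u = fuzzy_memberships(Z, centres, m)
    lo = np.array([weighted_quantile(residuals, u[:, k], alpha / 2) for k in range(c)])
    hi = np.array([weighted_quantile(residuals, u[:, k], 1 - alpha / 2) for k in range(c)])
    return {"mu": mu, "sd": sd, "centres": centres, "lo": lo, "hi": hi, "m": m}


def predict_interval(model, X_new, y_hat_new):
    """Blend the cluster residual intervals by membership around new predictions."""
    Z = (X_new - model["mu"]) / model["sd"]
    u = fuzzy_memberships(Z, model["centres"], model["m"])
    return y_hat_new + u @ model["lo"], y_hat_new + u @ model["hi"]
```

Because the interval at each location depends on where it sits in covariate space, environments with larger residuals receive wider intervals, which is the contrast with uniform bootstrap ranges drawn in point 3. The cluster count `c` plays the role of the class number that the workflow tunes; one simple approach would be to compare interval coverage on test data for several values of `c`.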
The following steps were carried out to develop the Version 2 products:
- Preparation of point and covariate data, including filtering, cleansing, and harmonisation.
- Point data intersection with covariates.
- Creation of model and test data sets.
- Ranger model hyperparameter value optimisation.
- Ranger model fitting with best hyperparameters.
- Spatialisation of ranger models.
- Uncertainty analysis with the UNEEC method, including rudimentary optimisation of the number of classes (clusters).
- Spatialisation of model uncertainties.
- Model extrapolation analysis of the point data using the count of observations and boundary method.
- Ranger model fitting of extrapolation outcomes.
- Spatialisation of model extrapolation outcomes.
- Model evaluation against both the test data and the SLGA Version 1 products.
- Delivery of the digital soil mapping outputs and computer code to a repository.
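The workflow names a count-of-observations and boundary approach to extrapolation but does not spell it out here. The sketch below shows one hypothetical reading, not the SLGA method itself (which is described at the URL above and may differ): a boundary check that flags locations where any covariate falls outside the training minimum and maximum, combined with a count of training observations within a fixed radius in standardised covariate space. The thresholds `radius` and `min_count` and all function names are illustrative assumptions.

```python
import numpy as np


def standardise(train, new):
    """Scale both arrays by the training mean and standard deviation."""
    mu, sd = train.mean(axis=0), train.std(axis=0) + 1e-12
    return (train - mu) / sd, (new - mu) / sd


def inside_bounds(train, new):
    """Boundary check: True where every covariate lies within the training min/max."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    return np.all((new >= lo) & (new <= hi), axis=1)


def neighbour_count(train_z, new_z, radius):
    """Number of training observations within `radius` of each new point."""
    d = np.linalg.norm(new_z[:, None, :] - train_z[None, :, :], axis=2)
    return (d <= radius).sum(axis=1)


def extrapolation_flag(train, new, radius=1.0, min_count=5):
    """True where a location is outside the data boundary or poorly supported."""
    train_z, new_z = standardise(train, new)
    sparse = neighbour_count(train_z, new_z, radius) < min_count
    return ~inside_bounds(train, new) | sparse
```

Flags computed at the point locations could then serve as the target of a classifier that is spatialised across the grid, which is one way to read the "Ranger model fitting of extrapolation outcomes" step. The pairwise distance matrix grows with the product of the two sample sizes, so a spatial index would be needed at continental scale.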
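The evaluation statistics are not listed in this section. As an assumption, the sketch below implements three that are widely used in digital soil mapping: root mean square error (RMSE), Lin's concordance correlation coefficient (CCC) for predictions, and prediction interval coverage probability (PICP) for uncertainty bounds. The function names are hypothetical; the same functions could be applied to Version 2 and Version 1 predictions at the same test locations to compare the two releases.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def lins_ccc(obs, pred):
    """Lin's concordance correlation coefficient (1.0 means perfect agreement)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    vo = sum((o - mo) ** 2 for o in obs) / n
    vp = sum((p - mp) ** 2 for p in pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    return 2 * cov / (vo + vp + (mo - mp) ** 2)


def picp(obs, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    return inside / len(obs)
```

For a 90% prediction interval, a PICP close to 0.9 on the test data suggests the uncertainty bounds are well calibrated.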