Data on total organic carbon (TOC) concentration (%) were extracted with the Soil Data Federator managed by CSIRO. The Soil Data Federator is a web API that compiles soil data from various institutions and government agencies throughout Australia. The laboratory methods for total organic carbon included in the study are 6A1, 6A1_UC, 6B2, 6B2b, 6B3 and 6B3a. Details on the methods are available in Table 1 (Wadoux et al., 2023).
We selected TOC data from the period 1970-2020 as a compromise between representativeness of current TOC concentration and spatial coverage. The data were cleaned and processed to harmonize units and to exclude duplicates and potentially erroneous entries (e.g. missing upper or lower horizon depths, extreme TOC values, unknown sampling date). Additional TOC measurements from the Biome of Australian Soil Environments (BASE) contextual data (Bisset et al., 2016) were also included in the analyses. TOC concentration for BASE samples was determined by the Walkley-Black method (method 6A1). Upper limits for TOC concentration by biome and land cover class were set according to published literature, consistent datasets (the Australian national Soil Carbon Research Program (SCaRP) and BASE), and data exploration, in order to exclude unrealistic TOC values (e.g. maximum TOC = 30% in temperate forests, maximum TOC = 14% in temperate rainfed pasture). Since TOC concentration in Australian ecosystems has been underestimated by previous SOC maps, we did not set conservative TOC upper limits, knowing that the machine learning model would likely underestimate high SOC values.
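The cleaning rules described above can be illustrated with a minimal Python sketch. It is not the processing code used for the product (which was written in R). The record fields (`site`, `upper_cm`, `lower_cm`, `year`, `toc`, `biome`, `land_cover`), the class keys and the duplicate definition are assumptions made for illustration; only the two upper limits (30% and 14%) come from the text.

```python
# Upper limits for TOC (%) by (biome, land cover) class.
# The two numeric limits are quoted in the text; the key names are hypothetical.
TOC_UPPER_LIMIT = {
    ("temperate", "forest"): 30.0,
    ("temperate", "rainfed pasture"): 14.0,
}


def clean_records(records, first_year=1970, last_year=2020):
    """Return the records that pass the basic checks, without duplicates."""
    seen = set()
    kept = []
    for rec in records:
        # Exclude entries with a missing upper or lower horizon depth.
        if rec.get("upper_cm") is None or rec.get("lower_cm") is None:
            continue
        # Exclude entries with an unknown or out-of-period sampling date.
        year = rec.get("year")
        if year is None or not (first_year <= year <= last_year):
            continue
        # Exclude negative values and values above the class-specific limit.
        limit = TOC_UPPER_LIMIT.get((rec.get("biome"), rec.get("land_cover")))
        if rec["toc"] < 0 or (limit is not None and rec["toc"] > limit):
            continue
        # Assumed duplicate definition: same site, depths, year and value.
        key = (rec["site"], rec["upper_cm"], rec["lower_cm"], year, rec["toc"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


example = [
    {"site": "A", "upper_cm": 0, "lower_cm": 10, "year": 1995, "toc": 3.2,
     "biome": "temperate", "land_cover": "forest"},
    {"site": "A", "upper_cm": 0, "lower_cm": 10, "year": 1995, "toc": 3.2,
     "biome": "temperate", "land_cover": "forest"},           # duplicate
    {"site": "B", "upper_cm": 0, "lower_cm": None, "year": 2001, "toc": 1.1,
     "biome": "temperate", "land_cover": "forest"},           # missing depth
    {"site": "C", "upper_cm": 0, "lower_cm": 10, "year": 2010, "toc": 18.0,
     "biome": "temperate", "land_cover": "rainfed pasture"},  # above limit
    {"site": "D", "upper_cm": 0, "lower_cm": 10, "year": 1965, "toc": 2.0,
     "biome": "temperate", "land_cover": "forest"},           # before 1970
]
cleaned = clean_records(example)
```

Classes without an entry in `TOC_UPPER_LIMIT` are passed through unchanged in this sketch; a real workflow would define a limit for every biome and land cover class.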
Equal-area quadratic spline functions were fitted to the whole collection of pre-processed TOC data, and values were then extracted for the 0-5 cm, 5-15 cm, 15-30 cm, 30-60 cm, 60-100 cm, and 100-200 cm depth intervals, following GlobalSoilMap specifications (see Arrouays et al., 2014). Boxplots of TOC values by biome and land cover after data cleaning and depth standardization are shown in Figure 1.
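To make the idea of depth standardization concrete, the Python sketch below maps horizon data onto the six GlobalSoilMap depth intervals. Note that it deliberately swaps in a simpler technique: a thickness-weighted average of the horizons overlapping each interval (a piecewise-constant profile), not the equal-area quadratic spline used for the product. The function name and the input layout are assumptions.

```python
# GlobalSoilMap standard depth intervals in cm, as listed in the text.
GSM_DEPTHS_CM = [(0, 5), (5, 15), (15, 30), (30, 60), (60, 100), (100, 200)]


def to_standard_depths(horizons, intervals=GSM_DEPTHS_CM):
    """Thickness-weighted mean TOC per standard interval.

    horizons: list of (upper_cm, lower_cm, toc) tuples for one profile.
    Returns one value per interval, or None where no horizon overlaps it.
    This is a piecewise-constant simplification, not an equal-area spline.
    """
    result = []
    for top, bottom in intervals:
        weighted_sum = 0.0
        covered_cm = 0.0
        for upper, lower, toc in horizons:
            # Thickness of the horizon that falls inside the interval.
            overlap = min(bottom, lower) - max(top, upper)
            if overlap > 0:
                weighted_sum += toc * overlap
                covered_cm += overlap
        result.append(weighted_sum / covered_cm if covered_cm > 0 else None)
    return result


# One hypothetical profile with three horizons reaching 70 cm depth.
profile = [(0, 10, 2.0), (10, 30, 1.0), (30, 70, 0.5)]
standardized = to_standard_depths(profile)
```

Unlike this simplification, an equal-area spline yields a smooth profile while preserving the mean value of each sampled horizon, which matters where TOC declines sharply near the surface.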
Covariates: We collected a set of 57 spatially exhaustive environmental covariates covering Australia and representing proxies for the factors influencing SOC formation and spatial distribution: soil properties, climate, organisms/vegetation, relief and parent material/age. The covariates were reprojected to the WGS84 (EPSG:4326) coordinate reference system and cropped to the same spatial extent. All covariates were resampled using bilinear interpolation, or aggregated, to a common grid with a cell size of 90 m x 90 m.
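As an illustration of the resampling step, the following Python sketch shows how bilinear interpolation computes a value at a fractional grid position from the four surrounding cells. It is a toy example on a nested list, not the raster workflow used for the covariates; the function name and the (column, row) indexing convention are assumptions.

```python
def bilinear(grid, x, y):
    """Interpolate grid[row][col] at fractional column x and row y.

    Positions outside the grid are clamped to its edge.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    # Clamp the query position to the grid extent.
    x = min(max(x, 0.0), n_cols - 1)
    y = min(max(y, 0.0), n_rows - 1)
    # Indices of the four surrounding cells.
    c0, r0 = int(x), int(y)
    c1 = min(c0 + 1, n_cols - 1)
    r1 = min(r0 + 1, n_rows - 1)
    # Fractional offsets within the cell.
    fx, fy = x - c0, y - r0
    # Interpolate along columns first, then along rows.
    top = grid[r0][c0] * (1 - fx) + grid[r0][c1] * fx
    bottom = grid[r1][c0] * (1 - fx) + grid[r1][c1] * fx
    return top * (1 - fy) + bottom * fy


coarse = [[0.0, 10.0],
          [20.0, 30.0]]
centre_value = bilinear(coarse, 0.5, 0.5)
```

Bilinear interpolation suits continuous covariates such as climate or terrain attributes. Categorical covariates would normally need nearest-neighbour resampling instead, since averaging class codes is not meaningful.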
Mapping: The spatial distribution of soil TOC concentration is driven by the combined influence of climate, vegetation, relief and parent material. We thus modelled TOC concentration as a function of environmental covariates representing the biotic and abiotic controls of TOC. The SOC measurements and the corresponding covariate values at the measurement locations were used to fit the mapping model. For the mapping we used a machine learning model called quantile regression forest.
Quantile regression forest (QRF) is similar to the popular random forest algorithm for mapping. Instead of returning a single statistic, that is, the mean prediction from the decision trees in the random forest, QRF retains all the target values in the leaf nodes of the decision trees. With QRF, the prediction is thus not a single value but a cumulative distribution of the TOC prediction at each location, which can be used to compute empirical quantile estimates.
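The difference between a random forest mean and a QRF quantile can be sketched in a few lines of Python. The example assumes the forest is already fitted and that, for one prediction location, we know the training TOC values stored in the leaf reached in each tree. Each value is weighted by 1 / (number of trees x leaf size), following the weighting described by Meinshausen (2006). The function names and the toy leaf values are hypothetical, and software implementations may differ in detail.

```python
def qrf_quantiles(leaf_values_per_tree, probs):
    """Empirical quantiles from the leaf values reached in each tree.

    leaf_values_per_tree: one list of training target values per tree.
    probs: probabilities in (0, 1], e.g. [0.05, 0.5, 0.95].
    """
    n_trees = len(leaf_values_per_tree)
    weighted = []
    for leaf in leaf_values_per_tree:
        # Every tree contributes a total weight of 1 / n_trees.
        weight = 1.0 / (n_trees * len(leaf))
        weighted.extend((value, weight) for value in leaf)
    weighted.sort(key=lambda pair: pair[0])

    quantiles = []
    for p in probs:
        cumulative = 0.0
        for value, weight in weighted:
            cumulative += weight
            # Small tolerance guards against floating-point round-off.
            if cumulative >= p - 1e-12:
                quantiles.append(value)
                break
        else:
            quantiles.append(weighted[-1][0])
    return quantiles


def rf_mean(leaf_values_per_tree):
    """Standard random forest prediction: the mean of the per-tree leaf means."""
    leaf_means = [sum(leaf) / len(leaf) for leaf in leaf_values_per_tree]
    return sum(leaf_means) / len(leaf_means)


# Toy forest of three trees; values are TOC (%) in the leaf reached by one location.
leaves = [[1.0, 2.0], [2.0, 3.0, 4.0], [1.5]]
lower, median, upper = qrf_quantiles(leaves, [0.05, 0.5, 0.95])
mean_prediction = rf_mean(leaves)
```

In this toy case the 5th and 95th percentiles bound a 90% prediction interval around the median, whereas `rf_mean` returns only the single value that a standard random forest would report.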
All processing for the generation of these products was undertaken using the R programming language (R Core Team, 2020).