A curated list of open technology projects to sustain a stable climate, energy supply, biodiversity and natural resources.

Africa Soil and Agronomy Data Cube

Accessing and using soil and agronomy data for Africa.
https://gitlab.com/openlandmap/africa-soil-and-agronomy-data-cube

Category: Sustainable Development
Sub Category: Education

Last synced: about 21 hours ago
JSON representation

Repository metadata

Accessing and using soil and agronomy data for Africa. See also: https://isda-africa.com/isdasoil

README.md

Africa Soil and Agronomy Data Cube (tutorial)

Created and maintained by: Tom Hengl ([email protected]) |
Last compiled on: 12 January, 2023


  • Introduction
    • Purpose of this tutorial
    • Listing all layers available
    • Opening
      data in QGIS
  • Computing with COG in R
    • Crop maps and
      get values for polygons of interest
    • Calculate stocks in t/ha
    • Aggregate
      values for different land cover classes
    • Generate
      sampling plan using the mapping uncertainty
    • Building
      a spatial model to explain cropland distribution
  • Other data and
    analysis possibilities
    • Other analysis possibilities
    • Other data sources for Africa
    • Acknowledgments
    • Help
      improve the Africa soil property and nutrient maps!
  • References

This work is licensed under a Creative Commons Attribution-ShareAlike
4.0 International
License
.

alt text Introduction

Purpose of this tutorial

This data processing tutorial explains how to: (1) access various land
cover, soil, terrain and climatic layers for whole of Africa and at
various resolutions (from 30-m to 1-km), (2) use the data to derive
stocks, (3) aggregate per land cover class and/or admin units, and (4)
generate sampling designs using different algorithms and aiming at
improving accuracy of predictions (locally). A more general tutorial
explaining how to access global layers (Cloud-Optimized GeoTIFFs) from
www.OpenLandMap.org is available
here.
A detailed tutorial on how to derive nutrient defficiencies is available
here.

The main source of data used in this tutorial include:

  • iSDAsoil layers representing
    soil properties and nutrients at two standard depth intervals 0–20 and
    20–50 cm,
  • Sentinel-2 cloud-free mosaics (prepared for the purpose of
    iSDAsoil project),
  • Digital Terrain Model (DTM) based on ALOS AW3D30 and NASADEM and
    DTM derivatives (prepared for the purpose of iSDAsoil project),
  • OpenLandMap layers (from 250-m to 1-km
    resolution),
  • Population map of Africa for 2018 at 30-m
    resolution

    (Facebook Connectivity Lab and Center for International Earth Science
    Information Network — CIESIN — Columbia University. 2016. High
    Resolution Settlement Layer HRSL);

To cite iSDAsoil layers and the auxiliary data documented in this
tutorial please use:

  • Hengl, T., Miller, M.A.E., Križan, J. et al. (2021) African soil
    properties and nutrients mapped at 30 m spatial resolution using
    two-scale ensemble machine learning
    . Sci Rep 11, 6130.
    https://doi.org/10.1038/s41598-021-85639-y

Predictive modeling using 2–scale framework is also documented in detail
here.
Production of the iSDAsoil dataset and app is discussed in detail in
this
article.

If you prefer to use python for processing Cloud-Optimized GeoTIFFs,
please refer to this
tutorial
.
The https://github.com/iSDA-Africa/ repository also contains examples
with iSDAsoil worked out in python.

Important note: We do NOT recommend downloading whole GeoTIFFs of
Africa at 30-m resolution as these are usually 10–20GB in size (per
file). The total size of the repository at the moment exceeds 1.5TB.
Instead, if you need to analyze whole land mask of Africa, we recommend
downloading the files directly from
zenodo.org
and/or Amazon AWS. Also note
that nutrient stocks and aggregated soil properties can be derived using
variety of procedures (see e.g. Hengl & MacMillan
(2019)) and the total values might
eventually differ.

If you discover an issue or a bug please report.

Listing all layers available

To list all layers available at 30-m resolution for whole of Africa
please use this table. To list all
layers available at 250-m resolution (global land mask) please use
this
table
.
Note: the file versions might change hence your code would need to be
updated. Please subscribe to this repository or refer to
https://isda-africa.com for the most up-to-date information about
iSDAsoil.

The Sentinel mosaics for Africa (prepared by
MultiOne.hr) are relatively large in size and
might still contain artifacts between scenes and missing values beyond
water bodies etc. The population density map at 30-m spatial resolution
does NOT include some areas such as Sudan’s and Somalia.

Opening data in QGIS

Before you start running any type of analysis, we recommend that you
first visualize the data using QGIS and familiarize yourself with it.
Zoom in on the area of interest and see if you can observe any potential
problems with that data such as:

  • inconsistent land / water mask,
  • incorrect missing value flag,
  • measurement units not matching the description,
  • spatial patterns not matching the description,

To add any of the layers listed above follow these steps in QGIS:

  • Select “Layer”> “Add Layer”> “Add Raster layer”;
  • Select “Source Type”> “Protocol HTTP(s)/Cloud”;
  • Enter the URL of the layer and leave “No authentication”;

Example of a URL for the soil texture layer is:

https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/
    sol_texture.class_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1.tif

To attach some applicable QGIS legend, you can use the global soil data
legends available via the
OpenLandMap.org.
This is an example of the layer visualized in QGIS:

Figure: iSDAsoil layers (soil texture class) visualized in QGIS.

See also the geo-harmonizer
tutorial

for more details about how to use COG files in QGIS.

To visualize Sentinel-2 cloud-free bands for Africa, you can use e.g.:

https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/
    lcv_b8a_sentinel.s2l2a_d_30m_s0..0cm_2018..2019.s22_africa_proj.laea_v0.1.tif

This is basically 25% probability quantile range for the Sentinel S2L2a
band B8A computed from some 20TB of input data for the season 2 for
years 2018 and 2019 using the Amazon AWS Sentinel-2 service. To learn
more about how were the Sentinel-2 mosaics produced, please refer to
Hengl et al. (2021).

Figure: Sentinel-2 mosaic for Africa visualized in QGIS (left), zoom-in
on the farm plot in South Africa (right).

alt text Computing with COG in R

Crop maps and get values for polygons of interest

To crop maps of interest for the area of interest we recommend using the
rgdal and terra packages (Hijmans, Bivand, Forner, Ooms, &
Pebesma, 2020
). Consider, for example, a
polygon map representing a 1-square-km farm in South Africa and covering
diversity of agricultural fields:

library(rgdal)
## Loading required package: sp

## rgdal: version: 1.5-12, (SVN revision 1018)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.4.3, released 2022/04/22
## Path to GDAL shared files: /usr/share/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 8.2.0, November 1st, 2021, [PJ_VERSION: 820]
## Path to PROJ shared files: /home/tomislav/.local/share/proj:/usr/share/proj
## PROJ CDN enabled:FALSE
## Linking to sp version:1.4-4
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
library(terra)
## terra 1.6.51

## 
## Attaching package: 'terra'

## The following object is masked from 'package:rgdal':
## 
##     project
library(raster)
pol = readOGR("./data/farm_example.kmz")
## OGR data source with driver: LIBKML 
## Source: "/home/tomislav/Documents/git/africa-soil-and-agronomy-data-cube/data/farm_example.kmz", layer: "farm_example.kmz"
## with 1 features
## It has 11 fields
pol.v <- vect(pol)

Figure: Polygon for a farm digitized in Google Earth.

We can access and crop Potassium concentrations for the area of interest
by using:

tif.cog = paste0("/vsicurl/https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/", 
               c("sol_log.k_mehlich3_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1.tif",
               "sol_log.k_mehlich3_md_30m_0..20cm_2001..2017_africa_epsg4326_v0.1.tif",
               "sol_log.k_mehlich3_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1.tif",
               "sol_log.k_mehlich3_md_30m_20..50cm_2001..2017_africa_epsg4326_v0.1.tif"))

where _m_30m_0..20cm and _m_30m_20..50cm are the predicted
extractable K in log-scale, and _md_30m_0..20cm and _md_30m_20..50cm
are associated prediction errors (1-std).

We can load the values to R by using:

sol30m = lapply(tif.cog, function(i){terra::crop(rast(i), pol.v)})  
sol30m.sp = do.call(cbind, lapply(sol30m, function(i){as.data.frame(i)}))
str(sol30m.sp)
## 'data.frame':    1560 obs. of  4 variables:
##  $ sol_log.k_mehlich3_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1  : int  46 46 46 45 47 46 46 46 45 45 ...
##  $ sol_log.k_mehlich3_md_30m_0..20cm_2001..2017_africa_epsg4326_v0.1 : int  6 5 6 5 5 5 5 5 5 5 ...
##  $ sol_log.k_mehlich3_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1 : int  44 44 43 43 44 44 44 44 44 44 ...
##  $ sol_log.k_mehlich3_md_30m_20..50cm_2001..2017_africa_epsg4326_v0.1: int  5 5 6 5 5 5 5 4 4 4 ...

Next, we need to back-transform values from log-scale to ppms:

sol30m.sp$k_mehlich3_l1 = expm1(sol30m.sp$sol_log.k_mehlich3_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1 / 10)
sol30m.sp$k_mehlich3_l2 = expm1(sol30m.sp$sol_log.k_mehlich3_m_30m_20..50cm_2001..2017_africa_epsg4326_v0.1 / 10)

Crop to the area of interest using the polygon map:

pol.r = terra::rasterize(pol.v, sol30m[[1]])
sol30m.m = as(as(raster(pol.r), "SpatialGridDataFrame"), "SpatialPixelsDataFrame")
sol30m.m$k_mehlich3_l1 = sol30m.sp$k_mehlich3_l1[sol30m.m@grid.index]
sol30m.m$k_mehlich3_l2 = sol30m.sp$k_mehlich3_l2[sol30m.m@grid.index]

We can finally plot extractable Potassium (K) values for the farm for
two depth intervals (l1 = 0–20-cm and l2 = 20–50-cm) using:

spplot(sol30m.m[c("k_mehlich3_l1","k_mehlich3_l2")])

According to the FAO guidelines (Roy, Finck, Blair, & Tandon,
2006
) K concentration ranging from 0–50-ppm is very
low (<50% expected yield), 50–100 is low, 100–175 is medium (80–100%
yield) and >175 is high (100% yield).

Calculate stocks in t/ha

In the previous examples we have downloaded and cropped images with
predictions of Potassium (K) in ppm. We would next like to derive total
stocks per farm. For this we need to also download bulk density and
coarse fragments maps and import them into session:

bld.cog = paste0("/vsicurl/https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/", 
               c("sol_db_od_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1.tif",
               "sol_log.wpg2_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1.tif"))
bld30m = lapply(bld.cog, function(i){crop(rast(i), pol.v)})  
bld30m.sp = do.call(cbind, lapply(bld30m, function(i){as.data.frame(i)}))
bld30m.sp$bld_l1 = bld30m.sp$sol_db_od_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1 * 10
bld30m.sp$wpg_l1 = expm1(bld30m.sp$sol_log.wpg2_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1 / 10)

Next, we convert the values from ppm to kg/m-square by using (Hengl &
MacMillan, 2019
):

sol30m.m$k_mehlich3_l1s = sol30m.m$k_mehlich3_l1/1e6 * 
       bld30m.sp$bld_l1[sol30m.m@grid.index] * 
       (1 - bld30m.sp$wpg_l1[sol30m.m@grid.index]/100) * 0.2
summary(sol30m.m$k_mehlich3_l1s)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01903 0.02358 0.02501 0.02545 0.02715 0.03385

To derive total stocks in kg, we need to resample the summary map to
equal area projection so we can estimate areas in square-m. One suitable
projection system for this is the Lambert Azimuthal Equal Area
projection system
:

prj.laea = "+proj=laea +lat_0=5 +lon_0=20 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"
pol.xy = project(pol.v, prj.laea)
r30m = rast(ext(pol.xy), crs=prj.laea, resolution=30)
sol30m.s = project(rast(raster(sol30m.m["k_mehlich3_l1s"])), r30m)
plot(sol30m.s)

So the total stock for the whole farm for 0–20-cm depth in tonnes is:

sum(as.data.frame(sol30m.s)*30^2, na.rm=TRUE)/1e3
## [1] 22.44592

Aggregate values for different land cover classes

To aggregate values for different land cover classes (see also these
worked out examples in
R
)
we have to also download the land cover map. We can use for this purpose
the Copernicus land cover map derived for year 2016 at 20-m spatial
resolution

(important note: this product is discontinued and should be only used
for testing purposes):

lcv.cog = paste0("/vsicurl/https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/",
                 "lcv_land.cover_esa.cci.l4_m_30m_s0..0cm_2016_africa_proj.laea_v0.1.tif")
#GDALinfo(lcv.cog)

We can clip the same polygon using:

lcv30m = crop(rast(lcv.cog), pol.xy)

We can check the nutrient stocks per land cover class by using:

sol30m.sxy = as(as(raster(sol30m.s), "SpatialGridDataFrame"), "SpatialPixelsDataFrame")
sol30m.sxy$lcv = as.factor(as.data.frame(lcv30m)[sol30m.sxy@grid.index,1])
library(plyr)
sol_agg.lu <- plyr::ddply(sol30m.sxy@data, .(lcv), summarize,
                          Total_K_kg=sum(k_mehlich3_l1s*30^2, na.rm=TRUE),
                          Area_m2=sum(!is.na(k_mehlich3_l1s))*30^2)
sol_agg.lu$Total_K_M = sol_agg.lu$Total_K/(sol_agg.lu$Area_m2)
sol_agg.lu[,c("lcv", "Total_K_kg", "Area_m2", "Total_K_M")]
##   lcv Total_K_kg Area_m2  Total_K_M
## 1   1  1169.2628   42300 0.02764215
## 2   2  1431.6753   52200 0.02742673
## 3   3  1913.4636   68400 0.02797461
## 4   4 17642.4259  707400 0.02493982
## 5   5   111.2897    4500 0.02473104
## 6   8   177.8013    7200 0.02469462

which shows that the
class 3 (Grassland)
has the relatively highest K concentration.

Generate sampling plan using the mapping uncertainty

One efficient way to generate sampling designs to improve maps of
Potassium in soil is to use the existing error map, then sample
proportionally to the prediction variance, collect new points and
re-build prediction models until some required accuracy level is reached
(Hengl, Nussbaum, Wright, Heuvelink, & Gräler,
2018
). This is referred to as
“uncertainty-guided sampling” by Stumpf et al.
(2017) and is best illustrated by the
figure below.

Figure: Uncertainty-guided sampling based on Stumpf et
al. (2016)
. The first
sample is used to build initial model, then the 2nd-round sampling
focuses on area of highest uncertainty and is used to rebuild the
model.

The uncertainty for mapping K for the farm above is (for top-soil):

unc.cog = paste0("/vsicurl/https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/", 
               "sol_log.k_mehlich3_md_30m_0..20cm_2001..2017_africa_epsg4326_v0.1.tif")
unc30m = terra::crop(rast(unc.cog), pol.v)
unc30m.xy = project(unc30m, r30m, method="bilinear")
pol.r.xy = rasterize(pol.xy, unc30m.xy)
unc30m.m = as(raster(unc30m.xy), "SpatialGridDataFrame")
unc30m.sp = as.data.frame(pol.r.xy)
unc30m.m$var = unc30m.m$sol_log.k_mehlich3_md_30m_0..20cm_2001..2017_africa_epsg4326_v0.1^2
unc30m.m = as(unc30m.m["var"], "SpatialPixelsDataFrame")

We can use the spatstat package to sample
proportionally to prediction variance (i.e. use the prediction variance
as weight map):

library(spatstat)
## Loading required package: spatstat.data

## Loading required package: spatstat.geom

## spatstat.geom 2.4-0

## 
## Attaching package: 'spatstat.geom'

## The following objects are masked from 'package:raster':
## 
##     area, rotate, shift

## The following objects are masked from 'package:terra':
## 
##     area, delaunay, rescale, rotate, shift, where.max, where.min

## Loading required package: spatstat.random

## spatstat.random 2.2-0

## Loading required package: spatstat.core

## Loading required package: nlme

## 
## Attaching package: 'nlme'

## The following object is masked from 'package:raster':
## 
##     getData

## Loading required package: rpart

## spatstat.core 2.4-2

## Loading required package: spatstat.linnet

## spatstat.linnet 2.3-2

## 
## spatstat 2.3-4       (nickname: 'Watch this space') 
## For an introduction to spatstat, type 'beginner'
dens.var <- as.im(as.image.SpatialGridDataFrame(unc30m.m["var"]))
extra.pnts <- rpoint(40, f=dens.var)
plot(raster(unc30m.m))
points(extra.pnts$x, extra.pnts$y)

In this case the prediction errors (variance) is higher towards the
western side of farm, and consequently more point samples are put in
that part. Alternatively, one could also setup a threshold prediction
error, then ONLY sample in the areas where the error is above certain
value by using Latin Hypercube
Sampling

or similar (method explained in Stumpf et al.
(2017)).

To plot the points in Google Earth, you can run:

library(plotKML)
## plotKML version 0.8-2 (2021-10-16)

## URL: https://github.com/Envirometrix/plotKML/
pnts.xy = SpatialPoints(coords=data.frame(extra.pnts$x, extra.pnts$y), 
                        proj4string = CRS(proj4string(unc30m.m)))
shape = "http://maps.google.com/mapfiles/kml/pal2/icon18.png"
# color only:
kml(pnts.xy, file.name="./data/extra.pnts.kml", shape = shape, labels = "")
## KML file opened for writing...

## Reprojecting to +proj=longlat +datum=WGS84 +no_defs ...

## Writing to KML...

## Closing  ./data/extra.pnts.kml

Building a spatial model to explain cropland distribution

In the following example we focus on modeling cropland distribution in
Ethiopia. Because loading all data at 30-m spatial resolution is
probably not a good idea for processing on a laptop or desktop computer,
we can instead focus on a coarser resolution e.g. 1-km.

First, we load the admin boundaries using the NaturalEarth
data

and prepare the target grid:

library(rnaturalearth)
#devtools::install_github("ropensci/rnaturalearthhires")
library(sp)
et = ne_states(geounit="ethiopia")
et.xy = project(vect(et), prj.laea)
#plot(et)
et1km = rast(ext(et.xy), resolution=1000, crs=prj.laea)
et1km
## class       : SpatRaster 
## dimensions  : 1278, 1624, 1  (nrow, ncol, nlyr)
## resolution  : 1000, 1000  (x, y)
## extent      : 1429621, 3053621, -147033.2, 1130967  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=laea +lat_0=5 +lon_0=20 +x_0=0 +y_0=0 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
et1km.m = rasterize(et.xy, et1km)

Next we need to resample and crop all layers of interest by using GDAL.
We first list layers of interest:

et.cog = paste0("/vsicurl/https://s3.eu-central-1.wasabisys.com/africa-soil/layers30m/",
                 c("dtm_slope_aw3d30.nasadem_m_30m_s0..0cm_2017_africa_proj.laea_v0.1.tif",
                "dtm_elevation_aw3d30.nasadem_m_30m_s0..0cm_2017_africa_proj.laea_v0.1.tif",
                "lcv_crops.coverfraction.layer_probav.lc100_p_30m_0..0cm_2019..2019_v3.0.1_africa_epsg4326.tif",
                "sol_ph_h2o_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1.tif",
                "sol_log.ecec.f_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1.tif"))
olm.cog = paste0("/vsicurl/https://s3.eu-central-1.wasabisys.com/openlandmap/layers1km/",
                 c("clm_precipitation_sm2rain.dec_m_1km_s0..0cm_2007..2018_v0.2.tif",
                 "clm_precipitation_sm2rain.jun_m_1km_s0..0cm_2007..2018_v0.2.tif",
                 "clm_precipitation_sm2rain.mar_m_1km_s0..0cm_2007..2018_v0.2.tif",
                 "clm_precipitation_sm2rain.sep_m_1km_s0..0cm_2007..2018_v0.2.tif"))

Some layers might have a different projection system so we run
reprojection and resampling by default:

et.sp1km = as(as(raster(et1km.m), "SpatialGridDataFrame"), "SpatialPixelsDataFrame")
et.te = paste(round(as.vector(ext(et1km))[c(1,3,2,4)]), collapse =" ")
tif.loc = paste0("./data/", basename(c(et.cog, olm.cog)))
if(any(!file.exists(tif.loc))){
  for(j in c(et.cog, olm.cog)){
    system(paste0('gdalwarp ', j, ' ./data/', basename(j), ' -co \"COMPRESS=DEFLATE\"',
              ' -t_srs \"', prj.laea, '\" -r \"average\" -te ', et.te, ' -tr 1000 1000'))
  }
}

We can read the subset of layers (now available on our local drive)
using the mask map:

for(j in tif.loc){ et.sp1km@data[,basename(j)]  <- readGDAL(j)$band1[et.sp1km@grid.index] }
## ./data/dtm_slope_aw3d30.nasadem_m_30m_s0..0cm_2017_africa_proj.laea_v0.1.tif has GDAL driver GTiff 
## and has 1278 rows and 1624 columns
## ./data/dtm_elevation_aw3d30.nasadem_m_30m_s0..0cm_2017_africa_proj.laea_v0.1.tif has GDAL driver GTiff 
## and has 1278 rows and 1624 columns
## ./data/lcv_crops.coverfraction.layer_probav.lc100_p_30m_0..0cm_2019..2019_v3.0.1_africa_epsg4326.tif has GDAL driver GTiff 
## and has 1278 rows and 1624 columns
## ./data/sol_ph_h2o_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1.tif has GDAL driver GTiff 
## and has 1278 rows and 1624 columns
## ./data/sol_log.ecec.f_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1.tif has GDAL driver GTiff 
## and has 1278 rows and 1624 columns
## ./data/clm_precipitation_sm2rain.dec_m_1km_s0..0cm_2007..2018_v0.2.tif has GDAL driver GTiff 
## and has 1278 rows and 1624 columns
## ./data/clm_precipitation_sm2rain.jun_m_1km_s0..0cm_2007..2018_v0.2.tif has GDAL driver GTiff 
## and has 1278 rows and 1624 columns
## ./data/clm_precipitation_sm2rain.mar_m_1km_s0..0cm_2007..2018_v0.2.tif has GDAL driver GTiff 
## and has 1278 rows and 1624 columns
## ./data/clm_precipitation_sm2rain.sep_m_1km_s0..0cm_2007..2018_v0.2.tif has GDAL driver GTiff 
## and has 1278 rows and 1624 columns

so that we can plot some of the maps e.g. cropland cover fraction for
year 2019 based on GLC100m
:

plot(raster(et.sp1km[4]))
lines(et.xy)

We can test if we can explain distribution of cropland in Ethiopia by
using terrain, soils, and rainfall only:

library(ranger)
fm.n = names(et.sp1km)[-1]
fm.crop = as.formula(paste(fm.n[grep("crop", fm.n)], " ~ ", paste(fm.n[-grep("crop", fm.n)], collapse = "+")))
sel = complete.cases(et.sp1km@data[,all.vars(fm.crop)])
m.crops = ranger(fm.crop, data=et.sp1km@data[sel,], num.trees=85, importance="impurity")

Which shows:

m.crops
## Ranger result
## 
## Call:
##  ranger(fm.crop, data = et.sp1km@data[sel, ], num.trees = 85,      importance = "impurity") 
## 
## Type:                             Regression 
## Number of trees:                  85 
## Sample size:                      1084451 
## Number of independent variables:  8 
## Mtry:                             2 
## Target node size:                 5 
## Variable importance mode:         impurity 
## Splitrule:                        variance 
## OOB prediction error (MSE):       31.36133 
## R squared (OOB):                  0.9136888

This shows that the cropland distribution in Ethiopia is systematically
controlled by: (1) altitude, (2) rainfall in June and September, (3)
soil pH, and (4) slope:

varImp = data.frame(m.crops$variable.importance)
varImp
##                                                                           m.crops.variable.importance
## dtm_slope_aw3d30.nasadem_m_30m_s0..0cm_2017_africa_proj.laea_v0.1.tif                        29728238
## dtm_elevation_aw3d30.nasadem_m_30m_s0..0cm_2017_africa_proj.laea_v0.1.tif                   125905292
## sol_ph_h2o_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1.tif                                 29975380
## sol_log.ecec.f_m_30m_0..20cm_2001..2017_africa_epsg4326_v0.1.tif                             27429178
## clm_precipitation_sm2rain.dec_m_1km_s0..0cm_2007..2018_v0.2.tif                              32004779
## clm_precipitation_sm2rain.jun_m_1km_s0..0cm_2007..2018_v0.2.tif                              50478226
## clm_precipitation_sm2rain.mar_m_1km_s0..0cm_2007..2018_v0.2.tif                              26283621
## clm_precipitation_sm2rain.sep_m_1km_s0..0cm_2007..2018_v0.2.tif                              65345821

These few environmental variables explain almost 90% of variability in
the cropland distribution in Ethiopia. We could use the model fitted
above to predict what would be cropland distribution in the future
i.e. assuming different climate change scenarios and long term changes
in monthly rainfall or similar. Consider for example a linear decrease
in monthly rainfall of 30%:

et.sp1km.f = et.sp1km[sel, m.crops$forest$independent.variable.names]
sel.clm = grep("sm2rain", names(et.sp1km.f))
for(i in sel.clm){
  et.sp1km.f@data[,i] <- et.sp1km.f@data[,i] * 0.7
}

We can predict the cropland distribution by using:

crop.tif = "./data/et_cropland_predicted_1km.tif"
if(!file.exists(crop.tif)){
  crop.f = predict(m.crops, et.sp1km.f@data)
  crop.sp = SpatialPixelsDataFrame(et.sp1km.f@coords, data=data.frame(response=crop.f$predictions),
                                   proj4string = CRS(prj.laea))
  writeGDAL(crop.sp, crop.tif, type = "Byte", mvFlag = 255, options = c("COMPRESS=DEFLATE"))
}

Figure: Actual (left) versus potential (right) cropland cover assuming
linear drop in rainfall of 30%.

This is just to demonstrate spatial analysis possibilities. In practice,
predicting suitability of land using machine learning is more complex
and should be designed carefully considering both ecological and
socio-economic parameters (Møller, Mulder, Heuvelink, Jacobsen, &
Greve, 2021
).

alt text Other data and analysis possibilities

Other analysis possibilities

Other analysis possibilities with the iSDAsoil and other 30-m resolution
layers include:

  • Overlay yield data (farm plots or individual 30✕30-m pixels) and fit a
    regression model / try to model yields using soil terrain and climate
    parameters (Hengl et al., 2017),
  • Derive soil organic carbon stock in the soil in kg/m-square and
    associated (propagated) uncertainty,
  • Use the land cover and population density maps at county level and
    derive soil nutrient deficiencies and how they relate to different
    land cover classes and population densities (Berkhout, Malan, & Kram,
    2019
    ),
  • Aggregate values spatially using gdalwarp to e.g. 1-km spatial
    resolution and observe patterns per country,

Other data sources for Africa

Other data sources (not included in this Data Cube) and data portals for
Africa with Earth Observation and similar data sets:

Figure: 5-m spatial resolution imagery is available publicly via the
Planet Explorer.

Have in mind that the 3rd party data might be: (1) of different spatial
resolution, (2) covering different time-periods (not overlapping in
time), (3) of different accuracy including location accuracy.
Consequently they might be incompatible with the Cloud-Optimized
GeoTIFFs listed in this tutorial.

Acknowledgments

Innovative Solutions for Decision Agriculture Ltd
(iSDA)
is a social enterprise with the mission
to improve smallholder farmer profitability across Africa. iSDA builds
on the legacy of the African Soils information service (AfSIS) project.
We are grateful for the outputs generated by all former AfSIS project
partners: Columbia University, Rothamsted Research, World Agroforestry
(ICRAF), Quantitative Engineering Design (QED), ISRIC — World Soil
Information, International Institute of Tropical Agriculture (IITA),
Ethiopia Soil Information Service (EthioSIS), Ghana Soil Information
Service (GhaSIS), Nigeria Soil Information Service (NiSIS) and Tanzania
Soil Information Service (TanSIS). More details on AfSIS partners and
data contributors can be found at https://isda-africa.com/isdasoil

Help improve the Africa soil property and nutrient maps!

If you are a soil data producer or are aware of soil profile, soil
sample data described in literature or reports, please contact us so
that we can add those points in the next major update of maps. We would
attribute your contribution and provide support with translating the
point data to analysis-ready data. Please contribute your point data and
help make better predictions of soil / land resources in Africa.

Once your point data is added to the bulk of the training data, we will
use it to improve predictions and then notice you of the improvements.

References

Berkhout, E. D., Malan, M., & Kram, T. (2019). Better soils for healthier lives? An econometric
assessment of the link between soil nutrients and malnutrition in
Sub-Saharan Africa. PloS One, 14(1), e0210642.

Hengl, T., Leenaars, J. G., Shepherd, K. D., Walsh, M. G., Heuvelink, G.
B., Mamo, T., et al.others. (2017). Soil nutrient
maps of Sub-Saharan Africa: assessment of soil nutrient content at 250 m
spatial resolution using machine learning. Nutrient Cycling in
Agroecosystems
, 109(1), 77–102.
doi:10.1007/s10705-017-9870-x

Hengl, T., & MacMillan, R. A. (2019). Predictive
soil mapping with R
(p. 370). Lulu. com. Retrieved from
https://soilmapper.org

Hengl, T., Miller, M. A., Križan, J., Shepherd, K. D., Sila, A.,
Kilibarda, M., et al.others. (2021). African soil properties and
nutrients mapped at 30 m spatial resolution using two-scale ensemble
machine learning. Scientific Reports, 11(1), 1–18.
doi:10.1038/s41598-021-85639-y

Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B., & Gräler, B.
(2018). Random forest as a generic framework for predictive modeling of
spatial and spatio-temporal variables. PeerJ, 6, e5518.
doi:10.7717/peerj.5518

Hijmans, R. J., Bivand, R., Forner, K., Ooms, J., & Pebesma, E. (2020).
terra: Spatial Data Analysis. CRAN.
Retrieved from https://rspatial.org/terra

Møller, A. B., Mulder, V. L., Heuvelink, G. B. M., Jacobsen, N. M., &
Greve, M. H. (2021). Can we use machine learning for agricultural land
suitability assessment? Agronomy, 11(4).
doi:10.3390/agronomy11040703

Roy, R. N., Finck, A., Blair, G. J., & Tandon, H. L. S. (2006). Plant nutrition for food security: A guide for integrated
nutrient management
(Vol. 16, p. 368). FAO.

Stumpf, F., Schmidt, K., Goebes, P., Behrens, T., Schönbrodt-Stitt, S.,
Wadoux, A., … Scholten, T. (2017). Uncertainty-guided sampling to
improve digital soil maps. Catena, 153, 30–38.
doi:10.1016/j.catena.2017.01.033

Score: -Infinity