blockCV

Suitable for the evaluation of a variety of spatial modelling applications, including classification of remote sensing imagery, soil mapping, and species distribution modelling.
https://github.com/rvalavi/blockcv

Category: Biosphere
Sub Category: Species Distribution Modeling

Keywords

cross-validation r r-package rstats spatial spatial-cross-validation spatial-modelling species-distribution-modelling

Last synced: about 20 hours ago

Repository metadata

The blockCV package creates spatially or environmentally separated training and testing folds for cross-validation, providing robust error estimates in spatially structured environments.

README.md

blockCV

Methods in Ecology & Evolution

Spatial and environmental blocking for k-fold and LOO cross-validation

The package blockCV offers a range of functions for generating train
and test folds for k-fold and leave-one-out (LOO)
cross-validation (CV). It can separate data spatially or
environmentally, with various options for block construction.
It also includes a function for assessing the level of spatial
autocorrelation in the response or in raster covariates, to aid in
selecting an appropriate distance band for data separation. The blockCV
package is suitable for evaluating a variety of spatial modelling
applications, including classification of remote sensing imagery, soil
mapping, and species distribution modelling (SDM). It also supports
different SDM scenarios, including presence-absence and
presence-background species data, rare and common species, and raster
data for predictor variables.

Main features

  • Four blocking methods: spatial, clustering, buffer, and NNDM
    (Nearest Neighbour Distance Matching) blocks
  • Several ways to construct spatial blocks: hexagonal, rectangular,
    or user-defined
  • Spatial blocks can be assigned to cross-validation folds in three
    different ways: random, systematic, and checkerboard pattern
  • Spatial blocks can be assigned to folds so that records are evenly
    distributed for binary (e.g. species presence-absence/background)
    or multi-class responses (e.g. land cover classes for remote
    sensing image classification)
  • The buffering and NNDM functions can account for presence-absence
    and presence-background data types
  • Geostatistical techniques to inform the choice of a suitable
    distance band for separating the data
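The three block-to-fold assignment strategies listed above can be illustrated with a small base-R sketch. The helper `assign_blocks_to_folds()` is hypothetical and is not blockCV's internal code; it assumes blocks are numbered row by row on a regular grid with `ncol` columns, and its checkerboard pattern always yields two folds.

```r
# Hypothetical helper (not blockCV's internal code) illustrating three ways
# of assigning spatial blocks to CV folds. Blocks are assumed to be numbered
# row by row on a regular grid with `ncol` columns.
assign_blocks_to_folds <- function(n_blocks, k,
                                   method = c("random", "systematic", "checkerboard"),
                                   ncol = NULL) {
  method <- match.arg(method)
  if (method == "random") {
    # shuffle a balanced sequence of fold labels
    return(sample(rep_len(seq_len(k), n_blocks)))
  }
  if (method == "systematic") {
    # cycle through folds in block order: 1, 2, ..., k, 1, 2, ...
    return(rep_len(seq_len(k), n_blocks))
  }
  # checkerboard: alternate two folds like the squares of a chessboard
  if (is.null(ncol)) stop("`ncol` is required for the checkerboard pattern")
  idx <- seq_len(n_blocks) - 1
  row <- idx %/% ncol
  col <- idx %% ncol
  ((row + col) %% 2) + 1
}

set.seed(42)
random_folds <- assign_blocks_to_folds(12, k = 4, method = "random")
systematic_folds <- assign_blocks_to_folds(12, k = 4, method = "systematic")
checker_folds <- assign_blocks_to_folds(12, k = 2, method = "checkerboard", ncol = 4)
# view the checkerboard as the 3 x 4 grid of blocks
matrix(checker_folds, nrow = 3, byrow = TRUE)
```

In blockCV itself, the choice of strategy is made through the `selection` argument of `cv_spatial` (e.g. `selection = "random"` in the basic usage example).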

New updates in version 3.0

The latest major version of blockCV (v3.0) features significant updates and changes. All function names have been revised to more general names, beginning with cv_*. The previous functions (version 2.x) still work, but they will be removed in a future update after an extended transition period. Updating your code to the new functions listed below is highly recommended.

Some new updates:

  • Function names have been changed, with all functions now starting
    with cv_
  • The CV blocking functions are now: cv_spatial, cv_cluster,
    cv_buffer, and cv_nndm
  • Spatial blocks now support hexagonal (now the default),
    rectangular, and user-defined blocks
  • A fast C++ implementation of the Nearest Neighbour Distance
    Matching (NNDM) algorithm (Milà et al. 2022) has been added
  • The NNDM algorithm can handle species presence-background data and
    other types of data
  • The cv_cluster function generates blocks based on k-means
    clustering. It now works on both environmental rasters and the
    spatial coordinates of sample points
  • The cv_spatial_autocor function now calculates the spatial
    autocorrelation range for both the response (i.e. binary or
    continuous data) and a set of continuous raster covariates
  • The new cv_plot function allows for visualization of folds from
    all blocking strategies using ggplot facets
  • The terra package is now used for all raster processing; stars
    and raster objects, as well as files on disk, are also supported
  • The new cv_similarity function provides measures of possible
    extrapolation to the testing folds
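Buffered leave-one-out CV, the idea behind `cv_buffer`, can be sketched in base R: each point is held out in turn, and training points closer than `size` to it are dropped. `buffer_loo_folds()` is a hypothetical helper; it assumes projected coordinates and Euclidean distance, and it ignores the presence-background handling that blockCV provides.

```r
# Hypothetical sketch of buffered leave-one-out CV (not blockCV's code).
# Each point is the test set once; every point within `size` distance units
# of it (including itself) is excluded from training.
buffer_loo_folds <- function(coords, size) {
  d <- as.matrix(dist(coords))  # pairwise Euclidean distances
  lapply(seq_len(nrow(coords)), function(i) {
    train <- unname(which(d[i, ] > size))  # keep only points beyond the buffer
    list(train = train, test = i)
  })
}

# four points on a line at x = 0, 1, 5, 10
coords <- cbind(x = c(0, 1, 5, 10), y = c(0, 0, 0, 0))
folds <- buffer_loo_folds(coords, size = 2)
folds[[1]]$train  # returns 3 4: point 2 lies inside the 2-unit buffer
```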

Installation

To install the latest development version of the package from GitHub, use:

remotes::install_github("rvalavi/blockCV", dependencies = TRUE)

Or install from CRAN:

install.packages("blockCV", dependencies = TRUE)

Vignettes

For practical examples of the package, see the vignettes:

  1. blockCV introduction: how to create block cross-validation
    folds
  2. Block cross-validation for species distribution
    modelling
  3. Using blockCV with caret and tidymodels

Basic usage

This code snippet showcases some of the package's functionality; for more comprehensive tutorials, please refer to the vignettes included with the package (listed above).

# loading the package
library(blockCV)
library(sf) # working with spatial vector data
library(terra) # working with spatial raster data
# load raster data; the pipe operator |> is available in R v4.1 or higher
myrasters <- system.file("extdata/au/", package = "blockCV") |>
  list.files(full.names = TRUE) |>
  terra::rast()

# load species presence-absence data and convert to sf
pa_data <- read.csv(system.file("extdata/", "species.csv", package = "blockCV")) |>
  sf::st_as_sf(coords = c("x", "y"), crs = 7845)

# spatial blocking by specified range and random assignment
sb <- cv_spatial(
    x = pa_data, # sf or SpatialPoints of sample data (e.g. species data)
    column = "occ", # the response column (binary or multi-class)
    r = myrasters, # a raster for background (optional)
    size = 450000, # size of the blocks in metres
    k = 5, # number of folds
    hexagon = TRUE, # use hexagonal blocks (default)
    selection = "random", # random blocks-to-fold assignment
    iteration = 100, # to find evenly dispersed folds
    biomod2 = TRUE # also create folds for biomod2
)
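Once folds exist, a model is fitted and evaluated once per fold. The sketch below assumes, as in the package vignettes, that `sb$folds_list` has one element per fold, each holding training row indices followed by testing row indices. To keep it runnable without blockCV, a mock fold list with that shape is built from a random split of synthetic data, and `auc_rank()` is a hypothetical helper that computes AUC from ranks.

```r
# Hypothetical sketch: a generic k-fold loop over a folds list shaped like
# blockCV's `folds_list` (one element per fold: list(train_idx, test_idx)).
# Synthetic data and a mock random split stand in for pa_data and sb$folds_list.
auc_rank <- function(obs, pred) {
  # AUC via the Mann-Whitney U statistic (ties get average ranks)
  r <- rank(pred)
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
n <- 200
df <- data.frame(x1 = rnorm(n))
df$occ <- rbinom(n, 1, plogis(2 * df$x1))  # binary response driven by x1

# mock folds from a random split; blockCV would supply spatially separated ones
fold_id <- sample(rep_len(1:5, n))
folds <- lapply(1:5, function(k) list(which(fold_id != k), which(fold_id == k)))

aucs <- vapply(seq_along(folds), function(k) {
  train <- folds[[k]][[1]]
  test <- folds[[k]][[2]]
  fit <- glm(occ ~ x1, family = binomial, data = df[train, ])
  pred <- predict(fit, newdata = df[test, ], type = "response")
  auc_rank(df$occ[test], pred)
}, numeric(1))
mean(aucs)  # average test-fold AUC
```

With real data, `folds` would be `sb$folds_list`, and the data frame would hold the `occ` column plus covariate values extracted at the sample points (e.g. with `terra::extract()`).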

Or create spatial clusters for k-fold cross-validation:

# create spatial clusters
set.seed(6)
sc <- cv_cluster(
    x = pa_data, 
    column = "occ", # optionally count data in folds (binary or multi-class)
    k = 5
)
# now plot the created folds
cv_plot(
    cv = sc, # a blockCV object
    x = pa_data, # sample points
    r = myrasters[[1]], # optionally add a raster background
    points_alpha = 0.5,
    nrow = 2
)
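When `cv_cluster` is applied to spatial coordinates, nearby points end up in the same fold. The underlying idea can be sketched with base R's `kmeans()` on synthetic coordinates; this is an illustration, not the package's implementation (which can also cluster on environmental raster values).

```r
# Hypothetical sketch of clustering-based folds: k-means on sample
# coordinates groups nearby points, and each cluster becomes one fold.
set.seed(6)
coords <- rbind(
  cbind(x = rnorm(30, 0), y = rnorm(30, 0)),   # group near (0, 0)
  cbind(x = rnorm(30, 10), y = rnorm(30, 0)),  # group near (10, 0)
  cbind(x = rnorm(30, 0), y = rnorm(30, 10))   # group near (0, 10)
)
km <- kmeans(coords, centers = 3, nstart = 10)
fold_id <- km$cluster  # fold number for each point
table(fold_id)
```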

Investigate spatial autocorrelation in the landscape to choose a
suitable size for spatial blocks:

# exploring the effective range of spatial autocorrelation in raster covariates or sample data
cv_spatial_autocor(
    r = myrasters, # a SpatRaster object or path to files
    num_sample = 5000, # number of cells to be used
    plot = TRUE
)
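The autocorrelation range reported by `cv_spatial_autocor` is derived from a variogram, which summarises how dissimilar values become as distance grows. The hypothetical helper below computes only the binned empirical semivariance, gamma(h) = mean((z_i - z_j)^2 / 2) over point pairs separated by a distance in bin h; fitting a variogram model to obtain the range, as the package does, is not shown. The data are synthetic.

```r
# Hypothetical sketch: binned empirical semivariogram in base R.
empirical_variogram <- function(coords, z, breaks) {
  d <- as.matrix(dist(coords))                       # pairwise distances
  sq <- outer(z, z, function(a, b) (a - b)^2 / 2)    # half squared differences
  upper <- upper.tri(d)                              # count each pair once
  bins <- cut(d[upper], breaks = breaks)             # pairs outside breaks become NA
  data.frame(
    bin = levels(bins),
    gamma = as.vector(tapply(sq[upper], bins, mean)),
    n_pairs = as.vector(table(bins))
  )
}

set.seed(3)
coords <- cbind(x = runif(300, 0, 100), y = runif(300, 0, 100))
# a smooth east-west trend plus noise: semivariance grows with distance
z <- sin(coords[, "x"] / 20) + rnorm(300, sd = 0.1)
vg <- empirical_variogram(coords, z, breaks = seq(0, 60, by = 10))
vg
```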

Alternatively, you can manually choose the size of spatial blocks in an
interactive session using a Shiny app.

# a shiny interactive app to aid selecting a size for spatial blocks
cv_block_size(
    r = myrasters[[1]],
    x = pa_data, # optionally add sample points
    column = "occ",
    min_size = 2e5,
    max_size = 9e5
)

Reporting issues

Please report issues at: https://github.com/rvalavi/blockCV/issues

Citation

To cite package blockCV in publications, please use:

Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G. blockCV: An R
package for generating spatially or environmentally separated folds for
k-fold cross-validation of species distribution models. Methods Ecol
Evol. 2019; 10:225-232. https://doi.org/10.1111/2041-210X.13107



Committers metadata

Last synced: 3 days ago

Total Commits: 310
Total Committers: 4
Avg Commits per committer: 77.5
Development Distribution Score (DDS): 0.016

Commits in past year: 33
Committers in past year: 1
Avg Commits per committer in past year: 33.0
Development Distribution Score (DDS) in past year: 0.0

Commits by committer (name, masked email, commits):

  • Roozbeh Valavi (v****r@g****m): 305
  • Ian Flint (i****t@u****u): 2
  • Ian Flint (i****t@2****u): 2
  • MayaGueguen (m****n@g****m): 1
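The committer statistics above can be reproduced from the per-committer counts. The Development Distribution Score is assumed here to be 1 minus the top committer's share of all commits; that definition is consistent with the reported 0.016.

```r
# Recompute the committer statistics from the per-committer counts.
# Assumed DDS definition: 1 - (top committer's commits / total commits).
commits <- c(305, 2, 2, 1)  # four committer entries, 310 commits in total
avg_commits <- sum(commits) / length(commits)
dds <- 1 - max(commits) / sum(commits)
round(c(avg = avg_commits, dds = dds), 3)  # avg = 77.5, dds = 0.016
```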


Issue and Pull Request metadata

Last synced: 8 months ago

Total issues: 48
Total pull requests: 7
Average time to close issues: 3 months
Average time to close pull requests: 7 days
Total issue authors: 32
Total pull request authors: 4
Average comments per issue: 4.06
Average comments per pull request: 0.71
Merged pull request: 6
Bot issues: 0
Bot pull requests: 0

Past year issues: 5
Past year pull requests: 1
Past year average time to close issues: 25 days
Past year average time to close pull requests: 1 minute
Past year issue authors: 4
Past year pull request authors: 1
Past year average comments per issue: 1.6
Past year average comments per pull request: 1.0
Past year merged pull request: 1
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/rvalavi/blockcv

Top Issue Authors

  • pat-s (6)
  • AMBarbosa (5)
  • ozgurhsyndgn (3)
  • Cam-in (2)
  • immaryw (2)
  • bcknr (1)
  • Moncef-Boukhecheba (1)
  • Navvie2019 (1)
  • rudeboybert (1)
  • Geethen (1)
  • zhangzhixin1102 (1)
  • topepo (1)
  • anackr (1)
  • bfakos (1)
  • thomasp85 (1)

Top Pull Request Authors

  • rvalavi (3)
  • MayaGueguen (2)
  • iflint1 (2)
  • be-marc (1)

Top Issue Labels

  • invalid (2)
  • good first issue (2)
  • enhancement (2)
  • help wanted (1)
  • bug (1)


Package metadata

cran.r-project.org: blockCV

Spatial and Environmental Blocking for K-Fold and LOO Cross-Validation

  • Homepage: https://github.com/rvalavi/blockCV
  • Documentation: http://cran.r-project.org/web/packages/blockCV/blockCV.pdf
  • Licenses: GPL (≥ 3)
  • Latest release: 2.1.4 (published almost 5 years ago)
  • Last Synced: 2026-05-09T09:51:50.683Z (4 days ago)
  • Versions: 10
  • Dependent Packages: 8
  • Dependent Repositories: 10
  • Downloads: 4,947 Last month
  • Docker Downloads: 8
  • Rankings:
    • Forks count: 3.598%
    • Stargazers count: 3.938%
    • Dependent packages count: 6.615%
    • Dependent repos count: 9.247%
    • Average: 10.809%
    • Downloads: 13.896%
    • Docker downloads count: 27.563%
  • Maintainers (1)

Dependencies

.github/workflows/R-CMD-check.yml actions
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-tinytex v2 composite
DESCRIPTION cran
  • R >= 3.5.0 depends
  • progress * imports
  • raster >= 2.5 imports
  • sf >= 0.8 imports
  • automap >= 1.0 suggests
  • covr * suggests
  • cowplot * suggests
  • future * suggests
  • future.apply * suggests
  • geosphere * suggests
  • ggplot2 >= 3.2.1 suggests
  • knitr * suggests
  • methods * suggests
  • rgdal * suggests
  • rgeos * suggests
  • rmarkdown * suggests
  • shiny >= 1.0.3 suggests
  • shinydashboard * suggests
  • testthat * suggests

Score: 14.71875657631762