AgML

Aspires to identify key research gaps and opportunities at the intersection of agricultural modelling and machine learning research and support enhanced collaboration and engagement between experts in these disciplines.
https://github.com/bigdatawur/agml-cy-bench

Keywords from Contributors

measurements sanitation control training featured feature-flag feature-toggle

Last synced: 7 months ago

Acceptance Criteria

Relevant topics? false
External users? true
Open source license? true
Active? true
Fork? false

Repository metadata

CY-Bench (Crop Yield Benchmark) is a comprehensive dataset and benchmark to forecast crop yields at the subnational level. CY-Bench standardizes the selection, processing and spatio-temporal harmonization of public subnational yield statistics with relevant predictors. Contributors include agronomists, climate scientists and machine learning researchers.

Host: GitHub
URL: https://github.com/bigdatawur/agml-cy-bench
Owner: BigDataWUR
License: other
Created: 2023-11-23T13:12:11.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-12-04T22:40:44.000Z (7 months ago)
Last Synced: 2024-12-05T07:34:20.974Z (7 months ago)
Language: Jupyter Notebook
Homepage: https://cybench.agml.org/
Size: 33 MB
Stars: 19
Watchers: 5
Forks: 9
Open Issues: 35
Releases: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

README.md

AgML - Machine Learning for Agricultural Modeling

AgML is the AgMIP transdisciplinary community of agricultural and machine learning modelers.

AgML aspires to:

- identify key research gaps and opportunities at the intersection of agricultural modelling and machine learning research,
- support enhanced collaboration and engagement between experts in these disciplines, and
- conduct and publish protocol-based studies to establish best practices for robust machine learning use in agricultural modelling.

AgML Crop Yield Forecasting

The objective of the AgML Crop Yield Forecasting task is to create a benchmark to compare models for crop yield forecasting
across countries and crops. The models and forecasts can be used for food security planning or famine early warning. The
benchmark is called CY-Bench (Crop Yield Benchmark).

Overview
Getting started
Dataset
Leaderboard
How to cite
How to contribute

Overview

Early in-season predictions of crop yields can inform decisions at multiple levels of the food value chain, from
late-season agricultural management such as fertilization, harvest, and storage to the import or export of produce.
Anticipating crop yields is also important to ensure market transparency at the global level (e.g. Agriculture Market
Information System, GEOGLAM Crop Monitor) and to plan response actions in food-insecure countries at risk of food
production shortfalls.

We propose CY-Bench, a dataset and benchmark for subnational crop yield forecasting of maize and wheat, covering major
crop-growing countries as well as underrepresented countries of the world. By subnational, we mean the administrative
level at which yield statistics are published; when statistics are available for multiple levels, we pick the highest
resolution. By yield, we mean end-of-season yield statistics as published by national statistics offices or similar
entities representing a group of countries. By forecasting, we mean that the prediction is made ahead of harvest; the
task is also called in-season crop yield forecasting.

In-season forecasting is done at a number of time points during the growing season, from start of season (SOS) to end
of season (EOS) or harvest. The first forecast is made at middle of season, i.e. (EOS - SOS)/2 before EOS. Other options
are quarter of season, i.e. (EOS - SOS)/4 before EOS, and n day(s) before harvest. The exact time point at which a
forecast is made depends on the crop calendar for the selected crop and country (or region). All time series inputs are
truncated at the forecast (inference) time point, i.e. data from the remaining part of the season are not used. Since
yield statistics may not yet be available for the current season, we evaluate models using predictors and yield
statistics for all available years.

We compare models, algorithms and architectures by keeping the other parts of the workflow as similar as possible. For
example, the dataset uses the same source for each type of predictor (e.g. weather variables, soil moisture,
evapotranspiration, remote sensing biomass indicators, soil properties); the selected data are preprocessed with the
same pipeline (the same crop mask and crop calendar, and the same boundary files and approach for spatial aggregation);
and algorithms that require feature design follow the same feature design protocol.
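The forecast time points and input truncation described above can be sketched as follows. This is a minimal sketch, assuming the middle-of-season and quarter-of-season lead times are counted back from EOS in whole days (integer division); the function names and lead-time labels are hypothetical and are not the cybench API. Season dates are plain `datetime.date` values, so seasons that cross the calendar year (e.g. winter wheat) need no special handling.

```python
from datetime import date, timedelta
from typing import Dict


def forecast_cutoff(sos: date, eos: date, lead_time: str, n_days: int = 0) -> date:
    """Return the last date whose inputs may be used (hypothetical helper).

    Assumption: every lead time is counted back from EOS.
    """
    season_days = (eos - sos).days
    if lead_time == "middle-of-season":
        lead_days = season_days // 2
    elif lead_time == "quarter-of-season":
        lead_days = season_days // 4
    elif lead_time == "n-days-before-harvest":
        lead_days = n_days
    else:
        raise ValueError(f"unknown lead time: {lead_time!r}")
    return eos - timedelta(days=lead_days)


def truncate_series(series: Dict[date, float], cutoff: date) -> Dict[date, float]:
    """Keep observations on or before the cutoff; later data are discarded."""
    return {d: v for d, v in series.items() if d <= cutoff}
```

For a season from 1 April to 30 September 2020 (182 days), the middle-of-season cutoff is 1 July 2020 and the quarter-of-season cutoff is 16 August 2020.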

Coverage for maize

Undifferentiated maize, or grain maize where statistics distinguish maize types
[Figure: maize coverage map]

Coverage for wheat

Undifferentiated wheat, or winter wheat where statistics distinguish wheat types
[Figure: wheat coverage map]

Deciphering crop names

The terms used to refer to different varieties or seasons of maize and wheat have been simplified in CY-Bench. The
following table lists the representative crop name as provided in the crop statistics of each country or region:

Country/Region	Maize	Wheat
EU-EUROSTAT	grain maize	soft wheat
Africa-FEWSNET	maize	-
Argentina	corn	wheat
Australia	-	winter wheat
Brazil	grain corn	grain wheat
China	grain corn	grain wheat/spring wheat/winter wheat
Germany	grain maize	winter wheat
India	maize	wheat
Mali	maize	-
Mexico	white/yellow corn	-
USA	grain corn	winter wheat

Getting started

cybench is an open-source Python library to load the CY-Bench dataset and run the CY-Bench tasks.

Installation

git clone https://github.com/BigDataWUR/AgML-CY-Bench

Requirements

The benchmark results were produced in the following test environment:

Operating system: Ubuntu 18.04
CPU: Intel Xeon Gold 6448Y (32 Cores)
Memory (RAM): 256 GB
Disk storage: 2 TB
GPU: NVIDIA RTX A6000

Benchmark run time

During the benchmark run with the baseline models, several countries were run in parallel, each on a GPU in a
distributed cluster. The larger countries took approximately 18 hours to complete. If run sequentially on a single
capable GPU, the whole benchmark should take 50-60 hours to complete.

Software requirements: Python 3.9.4, scikit-learn 1.4.2, PyTorch 2.3.0+cu118.
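To compare a local environment against the tested versions listed above, a small check can be sketched with the standard library's `importlib.metadata`. This is a minimal sketch, not part of cybench: it assumes the PyPI distribution names `scikit-learn` and `torch`, and it ignores local build tags such as `+cu118` so CPU and CUDA builds of the same release compare equal. A mismatch does not necessarily break anything; it only means results may differ from the reference runs.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional, Tuple

# Tested environment as listed in the README.
TESTED_PYTHON = (3, 9, 4)
TESTED_PACKAGES = {"scikit-learn": "1.4.2", "torch": "2.3.0+cu118"}


def base_version(v: str) -> str:
    """Drop a local build tag such as '+cu118'."""
    return v.split("+", 1)[0]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed distribution versions; None means not installed."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def compare_versions(
    installed: Dict[str, Optional[str]], tested: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {package: (installed, tested)} for packages that differ or are missing."""
    mismatches: Dict[str, Tuple[Optional[str], str]] = {}
    for name, wanted in tested.items():
        have = installed.get(name)
        if have is None or base_version(have) != base_version(wanted):
            mismatches[name] = (have, wanted)
    return mismatches


if __name__ == "__main__":
    if tuple(sys.version_info[:3]) != TESTED_PYTHON:
        print(f"Python {sys.version.split()[0]} differs from tested 3.9.4")
    report = compare_versions(installed_versions(TESTED_PACKAGES), TESTED_PACKAGES)
    for name, (have, wanted) in report.items():
        print(f"{name}: installed {have}, tested with {wanted}")
```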

Downloading dataset

Get the dataset from Zenodo.

Running the benchmark

First, write a model class (MyModel in the example below) that extends the BaseModel class. BaseModel is defined in
cybench.models.model.

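from cybench.models.model import BaseModel
from cybench.runs.run_benchmark import run_benchmark

class MyModel(BaseModel):
    # Implement the abstract methods of BaseModel here (see cybench.models.model).
    pass


run_name = "my_run"  # placeholder: choose a name for this run
dataset_name = "maize_US"
run_benchmark(run_name=run_name,
              model_name="my_model",
              model_constructor=MyModel,
              model_init_kwargs={},  # placeholder: keyword arguments passed to MyModel(...)
              model_fit_kwargs={},  # placeholder: keyword arguments for fitting the model
              dataset_name=dataset_name)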
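The methods a cybench model must implement are defined by BaseModel and are not shown in this README. The standalone sketch below illustrates the general fit/predict shape of a very simple baseline that predicts each region's mean training yield. It does not subclass BaseModel, and the class name, record format `(region_id, year, yield)` and method signatures are hypothetical; adapting it to cybench would mean wrapping the same logic in BaseModel's actual abstract methods.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


class RegionMeanBaseline:
    """Hypothetical baseline: predict each region's mean yield over the training years."""

    def __init__(self) -> None:
        self.region_means: Dict[str, float] = {}
        self.global_mean: Optional[float] = None

    def fit(self, records: Iterable[Tuple[str, int, float]]) -> "RegionMeanBaseline":
        """records: (region_id, year, yield) tuples; this format is an assumption."""
        by_region: Dict[str, List[float]] = defaultdict(list)
        for region, _year, value in records:
            by_region[region].append(value)
        if not by_region:
            raise ValueError("no training records")
        self.region_means = {r: mean(vals) for r, vals in by_region.items()}
        # Fallback for regions never seen during training.
        self.global_mean = mean(v for vals in by_region.values() for v in vals)
        return self

    def predict(self, items: Iterable[Tuple[str, int]]) -> List[float]:
        """items: (region_id, year) pairs; the year is ignored by this baseline."""
        if self.global_mean is None:
            raise RuntimeError("call fit() before predict()")
        return [self.region_means.get(region, self.global_mean) for region, _year in items]
```

Falling back to the global mean keeps predictions defined for regions that appear only in the test years.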

Dataset

A dataset can be loaded by crop and, optionally, by country.

For example,

from cybench.datasets.dataset import Dataset

dataset = Dataset.load("maize")

loads data for all countries covered by the maize dataset. Maize data for the US can be loaded as follows:

dataset = Dataset.load("maize_US")
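The dataset names in these examples follow a `<crop>` or `<crop>_<country>` pattern. The hypothetical helpers below build and parse such names; the naming rule is inferred from the two examples above, and the crop list comes from this README (maize and wheat), so treat both as assumptions rather than a documented cybench convention.

```python
from typing import Optional, Tuple

# Crops covered by CY-Bench according to this README.
SUPPORTED_CROPS = ("maize", "wheat")


def make_dataset_name(crop: str, country: Optional[str] = None) -> str:
    """Build a name like 'maize' or 'maize_US' (hypothetical helper)."""
    if crop not in SUPPORTED_CROPS:
        raise ValueError(f"unsupported crop {crop!r}; expected one of {SUPPORTED_CROPS}")
    return f"{crop}_{country}" if country else crop


def split_dataset_name(name: str) -> Tuple[str, Optional[str]]:
    """Inverse of make_dataset_name: 'maize_US' -> ('maize', 'US'), 'maize' -> ('maize', None)."""
    crop, sep, country = name.partition("_")
    if crop not in SUPPORTED_CROPS or (sep and not country):
        raise ValueError(f"cannot parse dataset name {name!r}")
    return crop, (country if sep else None)
```

With these, `Dataset.load(make_dataset_name("maize", "US"))` is equivalent to the second example above.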

Data sources

Crop statistics	Shapefiles or administrative boundaries
Africa from FEWSNET	Africa from FEWSNET
Mali (1)	Use Africa shapefiles from FEWSNET
Argentina	Argentina
Australia	Australia
Brazil	Brazil
China	China
EU	EU
Germany (2)	Use EU shapefiles
India	India
Mexico	Mexico
US	US

Predictors, crop masks and crop calendars:
- Weather: AgERA5
- Soil: WISE soil data
- Soil moisture: GLDAS
- Evapotranspiration: FAO
- FAPAR: JRC FAPAR
- NDVI: MOD09CMG
- Crop calendars: ESA WorldCereal
- Crop masks: ESA WorldCereal

1: Mali data are at admin level 3. Mali data are also included in the FEWSNET Africa dataset, but at admin level 1 only.

2: Germany data are also included in the EU dataset, but there most of the data fail coherence tests (e.g. yield =
production / harvest_area).
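A coherence test of the kind mentioned in note 2 can be sketched as a relative-tolerance comparison between the reported yield and production divided by harvested area. This is a minimal sketch, not CY-Bench's actual check: the 5% tolerance, the record keys `yield`, `production` and `harvest_area`, and the rule that missing values count as incoherent are all assumptions, and units must be consistent (e.g. yield in t/ha, production in t, area in ha).

```python
import math
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def is_coherent(
    yield_value: Optional[float],
    production: Optional[float],
    harvest_area: Optional[float],
    rel_tol: float = 0.05,
) -> bool:
    """True if yield is within rel_tol of production / harvest_area.

    Missing values or a non-positive area cannot be checked and count as incoherent.
    """
    if yield_value is None or production is None or harvest_area is None or harvest_area <= 0:
        return False
    return math.isclose(yield_value, production / harvest_area, rel_tol=rel_tol)


def incoherent_records(records: List[Record], rel_tol: float = 0.05) -> List[Record]:
    """Return the records that fail the check (record keys are hypothetical)."""
    return [
        r
        for r in records
        if not is_coherent(r.get("yield"), r.get("production"), r.get("harvest_area"), rel_tol)
    ]
```

For example, a record with yield 8.0 t/ha, production 800 t and harvested area 100 ha passes, while one reporting yield 5.0 t/ha with the same production and area fails.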

Leaderboard

See baseline results

How to cite

Please cite CY-Bench as follows:

How to contribute

Thank you for your interest in contributing to AgML Crop Yield Forecasting. Please check the contributing guidelines
(CONTRIBUTING.md) for how to get involved and contribute.

Additional information

For more information, please visit the AgML website.

Owner metadata

Name: BigDataWUR
Login: BigDataWUR
Email:
Kind: organization
Description:
Website:
Location:
Twitter:
Company:
Icon url: https://avatars.githubusercontent.com/u/26596900?v=4
Repositories: 15
Last synced at: 2024-05-29T13:40:10.062Z
Profile URL: https://github.com/BigDataWUR

GitHub Events

Total

Create event: 5
Issues event: 11
Watch event: 4
Delete event: 4
Issue comment event: 10
Member event: 2
Push event: 39
Pull request event: 12
Pull request review comment event: 4
Pull request review event: 10
Fork event: 2

Last Year

Create event: 5
Issues event: 11
Watch event: 4
Delete event: 4
Issue comment event: 10
Member event: 2
Push event: 39
Pull request event: 12
Pull request review comment event: 4
Pull request review event: 10
Fork event: 2

Committers metadata

Last synced: 7 months ago

Total Commits: 706
Total Committers: 27
Avg Commits per committer: 26.148
Development Distribution Score (DDS): 0.416

Commits in past year: 700
Committers in past year: 27
Avg Commits per committer in past year: 25.926
Development Distribution Score (DDS) in past year: 0.42

Name	Email	Commits
krsnapaudel	d**l@w**l	412
ellaampy	e**y@g**m	38
Michiel Kallenberg		35
ronvree	r**e@g**m	29
Aike Potze	a**e@h**m	26
Michiel Kallenberg	m**g@g**m	24
janet68	4****8	19
Pratishtha Poudel	p**a@g**m	19
hbja	h**6@g**m	16
Maximilian Zachow	m**w@p**m	13
Michiel Kallenberg	4****g	11
Inti Luna Aviles	i**s@g**m	10
Raed Hamed	5****d	9
AbdelrahmanAmr3	a**i@g**m	8
VANT	i****a	8
Carla	r**n@p**e	7
Jonathan Richetti	r**5@p**r	5
smkuhlani	3****i	5
mmeronijrc	m**i@e**u	2
Hilmy	8****a	2
oumniaennaji	7****i	2
gnodnooh	g**h@g**m	1
Abdo A2	6****3	1
Amit Srivastava	6****n	1
Jonathan Richetti	r**5@v**r	1
Gonzalo-Mier	g**z@w**l	1
Amit Srivastava	a**9@g**m	1

Committer domains:

Issue and Pull Request metadata

Last synced: 7 months ago

Total issues: 174
Total pull requests: 207
Average time to close issues: about 1 month
Average time to close pull requests: 10 days
Total issue authors: 9
Total pull request authors: 19
Average comments per issue: 0.81
Average comments per pull request: 1.04
Merged pull request: 162
Bot issues: 0
Bot pull requests: 0

Past year issues: 151
Past year pull requests: 201
Past year average time to close issues: about 1 month
Past year average time to close pull requests: 11 days
Past year issue authors: 9
Past year pull request authors: 19
Past year average comments per issue: 0.81
Past year average comments per pull request: 1.06
Past year merged pull request: 156
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/bigdatawur/agml-cy-bench

Top Issue Authors

krsnapaudel (161)
michielkallenberg (5)
mzachow (2)
aikepotze (1)
CarlaLimone (1)
gnodnooh (1)
hbja (1)
JRichetti (1)
meronmi (1)

Top Pull Request Authors

krsnapaudel (105)
michielkallenberg (26)
hbja (9)
ellaampy (9)
Raed-Hamed (9)
intiluna (7)
mzachow (6)
poudelpratishtha (6)
ronvree (5)
oumniaennaji (4)
janet6868 (4)
aikepotze (4)
AbdelrahmanAmr3 (4)
mmeronijrc (2)
umdsgy (2)

Top Issue Labels

model-api (44)
data-preparation (36)
bug (26)
documentation (20)
data-card (16)
enhancement (6)
baseline-models (5)
general (4)
help wanted (1)

Top Pull Request Labels

baseline-models (2)

Dependencies

.github/workflows/documentation.yml actions

actions/checkout v3 composite
actions/setup-python v3 composite
peaceiris/actions-gh-pages v3 composite

.github/workflows/test.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

Score: 7.284820912568604
