S4A

A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning.
https://github.com/orion-ai-lab/s4a

Category: Consumption
Sub Category: Agriculture and Nutrition

Keywords

crop-classification deep-learning segmentation sentinel-2

Last synced: 34 minutes ago
JSON representation

Repository metadata

Sen4AgriNet: A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning

Host: GitHub
URL: https://github.com/orion-ai-lab/s4a
Owner: Orion-AI-Lab
License: mit
Created: 2021-12-14T17:28:41.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2024-11-06T10:02:04.000Z (over 1 year ago)
Last Synced: 2026-07-11T16:03:14.423Z (12 days ago)
Topics: crop-classification, deep-learning, segmentation, sentinel-2
Language: Jupyter Notebook
Homepage:
Size: 6.73 MB
Stars: 112
Watchers: 6
Forks: 21
Open Issues: 5
Releases: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Sen4AgriNet

A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning

Contributors: Sykas D., Zografakis D., Sdraka M.

Supplementary repo with DL experiments using the Sen4AgriNet dataset: Sen4AgriNet-Models.

This repository provides a native PyTorch Dataset Class for Sen4AgriNet dataset (patches_dataset.py). Should work with any new version of PyTorch1.7.1+ and Python3.8.5+.

Dataset heavily relies on cocoapi for dataloading and indexing, therefore make sure you have it installed:

pip3 install pycocotools

Then make sure every other requirement is installed:

pip3 install -r requirements.txt

Instructions

In order to use the provided PyTroch Dataset class, the required netCDF files of Sen4AgriNet must be downloaded and placed inside the dataset/netcdf/ folder. These files are available for download at Dropbox, Google Drive and HuggingFace Hub.

Then, three separate COCO files must be created: one for training, one for validation and one for testing. Alternatively, the predefined COCO files for the 3 Scenarios can be downloaded from here.

After this initial setup, patches_dataset.py can be used in a PyTorch deep learning pipeline to load, prepare and return patches from the dataset according to the split dictated by the COCO files. This Dataset class has the following features:

Reads the netCDF files of the dataset containing the Sentinel-2 observations over time and the corresponding labels.
Isolates the Sentinel-2 bands requested by the user.
Computes the median Sentinel-2 image on a given frequency, e.g. monthly (or loads precomputed medians, if any).
Returns the timeseries of median images inside a predefined window.
Normalizes the images.
Returns hollstein masks for clouds, cirrus, shadow or snow.
Returns a parcel mask: 1 for parcel, 0 for non-parcel.
Can alternatively return binary labels: 1 for crops, 0 for non-crops.

Dataset exploration

This is roughly the way that our patches_dataset.py works. The whole procedure is also described in the provided notebook.

Open a netCDF file for exploration.

import netCDF4
from pathlib import Path

patch = netCDF4.Dataset(Path('data/2020_31TCG_patch_14_14.nc'), 'r')
patch

Outputs

"""
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    title: S4A Patch Dataset
    authors: Papoutsis I., Sykas D., Zografakis D., Sdraka M.
    patch_full_name: 2020_31TCG_patch_14_14
    patch_year: 2020
    patch_name: patch_14_14
    patch_country_code: ES
    patch_tile: 31TCG
    creation_date: 27 Apr 2021
    references: Documentation available at .
    institution: National Observatory of Athens.
    version: 21.03
    _format: NETCDF4
    _nco_version: netCDF Operators version 4.9.1 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)
    _xarray_version: 0.17.0
    dimensions(sizes):
    variables(dimensions):
    groups: B01, B02, B03, B04, B05, B06, B07, B08, B09, B10, B11, B12, B8A, labels, parcels
"""

Visualize a single timestamp.

import xarray as xr

band_data = xr.open_dataset(xr.backends.NetCDF4DataStore(patch['B02']))
band_data.B02.isel(time=0).plot()

Single Month

Visualize the labels:

labels = xr.open_dataset(xr.backends.NetCDF4DataStore(patch['labels']))
labels.labels.plot()

Labels

Visualize the parcels:

parcels = xr.open_dataset(xr.backends.NetCDF4DataStore(patch['parcels']))
parcels.parcels.plot()

Parcels

Plot the median of observations for each month:

import pandas as pd
# Or maybe aggregate based on a given frequency
# Refer to
# https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
group_freq = '1MS'

# Grab year from netcdf4's global attribute
year = patch.patch_year

# output intervals
date_range = pd.date_range(start=f'{year}-01-01', end=f'{int(year) + 1}-01-01', freq=group_freq)

# Aggregate based on given frequency
band_data = band_data.groupby_bins(
    'time',
    bins=date_range,
    right=True,
    include_lowest=False,
    labels=date_range[:-1]
).median(dim='time')

If you plot right now, you might notice that some months are empty:
Single Month

(Optional) Fill in empty months:

import matplotlib.pyplot as plt

band_data = band_data.interpolate_na(dim='time_bins', method='linear', fill_value='extrapolate')

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(18, 12))

for i, season in enumerate(band_data.B02):

    ax = axes.flat[i]
    cax = band_data.B02.isel(time_bins=i).plot(ax=ax)


for i, ax in enumerate(axes.flat):
    ax.axes.get_xaxis().set_ticklabels([])
    ax.axes.get_yaxis().set_ticklabels([])
    ax.axes.axis('tight')
    ax.set_xlabel('')
    ax.set_ylabel('')
    ax.set_title(f'Month: {i+1}')


plt.tight_layout()
plt.show()

Per Month

PatchesDataset usage example

Please refer to the provided notebook for a detailed usage example of the provided PatchesDataset.

Read the COCO file to be used.

from pathlib import Path
from pycocotools.coco import COCO
root_path_coco = Path('coco_files/')
coco_train = COCO(root_path_coco / 'coco_example.json')

Initialize the PatchesDataset.

from torch.utils.data import DataLoader
from patches_dataset import PatchesDataset
from utils.config import LINEAR_ENCODER
root_path_netcdf = Path('dataset/netcdf')  # Path to the netCDF files
dataset_train = PatchesDataset(root_path_netcdf=root_path_netcdf,
                               coco=coco_train,
                               group_freq='1MS',
                               prefix='test_patchesdataset',
                               bands=['B02', 'B03', 'B04'],
                               linear_encoder=LINEAR_ENCODER,
                               saved_medians=False,
                               window_len=6,
                               requires_norm=False,
                               return_masks=False,
                               clouds=False,
                               cirrus=False,
                               shadow=False,
                               snow=False,
                               output_size=(183, 183)
                              )

Initialize the Dataloader.

dataloader_train = DataLoader(dataset_train,
                              batch_size=1,
                              shuffle=True,
                              num_workers=4,
                              pin_memory=True
                             )

Get a batch.

batch = next(iter(dataloader_train))

The batch variable is a dictionary containing the keys: medians, labels, idx.
batch['medians'] contains a pytorch tensor of size [1, 6, 3, 183, 183] where:

batch size: 1
timestamps: 6
bands: 3
height: 183
width: 183

Batch Medians

batch['labels'] contains the corresponding labels of the medians, which is a pytorch tensor of size [1, 183, 183] where:

batch size: 1
height: 183
width: 183

Batch Labels

batch['idx'] contains the index of the returned timeseries.

Webpage

Dataset Webpage: https://www.sen4agrinet.space.noa.gr/

Experiments

Please visit Sen4AgriNet-Models for a complete experimentation pipeline using the Sen4AgriNet dataset.

Citation

To cite please use:

@ARTICLE{
  9749916,
  author={Sykas, Dimitrios and Sdraka, Maria and Zografakis, Dimitrios and Papoutsis, Ioannis},
  journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
  title={A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning},
  year={2022},
  doi={10.1109/JSTARS.2022.3164771}
}

Owner metadata

Name: Orion Lab
Login: Orion-AI-Lab
Email: ipapoutsis@noa.gr
Kind: organization
Description: Orion Lab research group: Deep Learning in Earth Observation at the National Observatory of Athens
Website:
Location: Greece
Twitter:
Company:
Icon url: https://avatars.githubusercontent.com/u/94176283?v=4
Repositories: 5
Last ynced at: 2023-03-10T08:16:05.630Z
Profile URL: https://github.com/Orion-AI-Lab

GitHub Events

Total

Fork event: 3
Issues event: 4
Watch event: 14
Issue comment event: 1
Push event: 1

Last Year

Fork event: 1
Watch event: 3

Committers metadata

Last synced: 2 days ago

Total Commits: 18
Total Committers: 2
Avg Commits per committer: 9.0
Development Distribution Score (DDS): 0.111

Commits in past year: 0
Committers in past year: 0
Avg Commits per committer in past year: 0.0
Development Distribution Score (DDS) in past year: 0.0

Name	Email	Commits
masdra	p**s@g**m	16
dimzog	z**3@g**m	2

Issue and Pull Request metadata

Last synced: 4 months ago

Total issues: 9
Total pull requests: 1
Average time to close issues: about 7 hours
Average time to close pull requests: 15 days
Total issue authors: 7
Total pull request authors: 1
Average comments per issue: 1.78
Average comments per pull request: 0.0
Merged pull request: 1
Bot issues: 0
Bot pull requests: 0

Past year issues: 5
Past year pull requests: 0
Past year average time to close issues: about 2 hours
Past year average time to close pull requests: N/A
Past year issue authors: 3
Past year pull request authors: 0
Past year average comments per issue: 0.4
Past year average comments per pull request: 0
Past year merged pull request: 0
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/orion-ai-lab/s4a

Top Issue Authors

StorywithLove (2)
VSainteuf (2)
Spiruel (1)
PeterKKan (1)
Multihuntr (1)
nilsleh (1)
saqibzia-dev (1)

Top Pull Request Authors

paren8esis (1)

Top Issue Labels

Top Pull Request Labels

Score: 5.455321115357702

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Sustainable Technology

S4A

Keywords

Repository metadata

README.md

Sen4AgriNet

A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning

Instructions

Dataset exploration

PatchesDataset usage example

Webpage

Experiments

Citation

Owner metadata

GitHub Events

Total

Last Year

Committers metadata

Issue and Pull Request metadata

Top Issue Authors

Top Pull Request Authors

Top Issue Labels

Top Pull Request Labels