A curated list of open technology projects to sustain a stable climate, energy supply, biodiversity and natural resources.

WildlifeDatasets

Pipeline for wildlife re-identification including dataset zoo, training tools and trained models.
https://github.com/WildlifeDatasets/wildlife-datasets

Category: Biosphere
Sub Category: Terrestrial Wildlife

Keywords

dataset datasets deep-learning ecology ecology-modelling machine-learning

Last synced: about 24 hours ago

Repository metadata

WildlifeDatasets: An open-source toolkit for animal re-identification

README.md

  • Datasets for identification of individual animals
  • Trained model for individual re-identification
  • Tools for training re-identification models

Wildlife Re-Identification (Re-ID) Datasets

The aim of the project is to provide a comprehensive overview of datasets for wildlife individual re-identification and an easy-to-use package for developers of machine learning methods. The core functionality includes:

  • an overview of 42 publicly available wildlife re-identification datasets.
  • utilities to mass download the datasets, convert them into a unified format and fix some wrong labels.
  • default splits for several machine learning tasks, including the ability to create additional splits.

An introductory example is provided in a Jupyter notebook. The package pairs naturally with Wildlife tools, which provides our MegaDescriptor model and tools for training neural networks.

Changelog

[31/10/2024] Added AmvrakikosTurtles, ReunionTurtles, SouthernProvinceTurtles, ZakynthosTurtles (sea turtles), ELPephants (elephants) and Chicks4FreeID (chickens).
[13/06/2024] Added WildlifeReID-10k (unification of multiple datasets).
[09/05/2024] Added CatIndividualImages (cats), CowDataset (cows) and DogFaceNet (dogs).
[28/02/2024] Added MPDD (dogs), PolarBearVidID (polar bears) and SeaStarReID2023 (sea stars).
[04/01/2024] Received the Best Paper Award at WACV 2024.

Summary of datasets

An overview of the provided datasets is available in the documentation, while a more numerical summary is located in a Jupyter notebook. Because the notebook is large, it may be necessary to view it via nbviewer.

We include basic characteristics such as publication year, number of images, number of individuals and dataset time span (the difference between the last and first image taken), as well as additional information such as source, number of poses, inclusion of timestamps, whether the animals were captured in the wild and whether the dataset contains multiple species.
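The time span characteristic can be illustrated with a minimal sketch using only the Python standard library. The dates and the helper name time_span_days are hypothetical and are not taken from the package or from any dataset.

```python
from datetime import date

# Hypothetical capture dates for the images of one dataset (illustrative only).
capture_dates = [
    date(2014, 3, 1),
    date(2014, 7, 15),
    date(2015, 1, 20),
]

def time_span_days(dates):
    """Return the number of days between the first and last image taken."""
    return (max(dates) - min(dates)).days

span = time_span_days(capture_dates)
print(f"Time span: {span} days")
```

A dataset whose images were all taken on the same day has a time span of zero under this definition.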

Installation

The package can be installed with pip:

pip install wildlife-datasets

Basic functionality

We show an example of downloading, extracting and processing the MacaqueFaces dataset.

from wildlife_datasets import analysis, datasets

datasets.MacaqueFaces.get_data('data/MacaqueFaces')
dataset = datasets.MacaqueFaces('data/MacaqueFaces')

The dataset object contains a summary of the dataset in its df attribute. The content depends on the dataset, but every dataset contains the identity and the paths to images. This particular dataset also contains information about the date taken and contrast. Other datasets store information about bounding boxes, segmentation masks, the position from which the image was taken, keypoints or various other information such as age or gender.

dataset.df

The dataset also contains basic metadata, including the number of individuals, time span, licences and publication year.

dataset.summary

This particular dataset already contains cropped images of faces. Other datasets may contain uncropped images with bounding boxes or even segmentation masks.

dataset.plot_grid()

Additional functionality

For additional functionality, including mass loading, dataset splitting and evaluation metrics, we refer to the documentation or the notebooks.
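To give an idea of what dataset splitting means for re-identification, here is a minimal standard-library sketch of a closed-set split, in which every individual in the test set also appears in the training set. The function closed_set_split and the toy identities are hypothetical illustrations, not the package's own implementation; the documentation describes the splits the package actually provides.

```python
import random
from collections import defaultdict

def closed_set_split(identities, train_fraction=0.8, seed=0):
    """Split image indices so that every identity appears in the training set.

    identities[i] is the individual shown in image i.
    Returns (train_indices, test_indices).
    """
    rng = random.Random(seed)

    # Group image indices by the individual they show.
    by_identity = defaultdict(list)
    for index, identity in enumerate(identities):
        by_identity[identity].append(index)

    train, test = [], []
    for indices in by_identity.values():
        rng.shuffle(indices)
        # Keep at least one image of each individual for training.
        n_train = max(1, round(train_fraction * len(indices)))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)

# Toy example: two individuals with five images each.
identities = ["a"] * 5 + ["b"] * 5
train_idx, test_idx = closed_set_split(identities)
print(len(train_idx), len(test_idx))
```

Splitting per individual rather than over the whole image list keeps the proportion of training images roughly the same for every individual.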

Citation

If you like our package, please cite our paper. You may also be interested in our SeaTurtleID dataset, published in another paper.

@InProceedings{Cermak_2024_WACV,
    author    = {\v{C}erm\'ak, Vojt\v{e}ch and Picek, Luk\'a\v{s} and Adam, Luk\'a\v{s} and Papafitsoros, Kostas},
    title     = {{WildlifeDatasets: An Open-Source Toolkit for Animal Re-Identification}},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {5953-5963}
}

Committers metadata

Last synced: 6 days ago

Total Commits: 767
Total Committers: 4
Avg Commits per committer: 191.75
Development Distribution Score (DDS): 0.529

Commits in past year: 231
Committers in past year: 1
Avg Commits per committer in past year: 231.0
Development Distribution Score (DDS) in past year: 0.0

Name Email Commits
sadda l****r@g****m 361
adamluk3 a****3@l****z 337
cermavo3 c****3@l****z 46
Vojtech Cermak c****h@s****z 23
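
The committer figures above can be reproduced with a short sketch. It assumes the Development Distribution Score is defined as one minus the share of commits made by the most active committer; the listing does not state the formula, but this definition matches the reported values. The function name is hypothetical.

```python
# Commit counts taken from the committer table above.
commits = {
    "sadda": 361,
    "adamluk3": 337,
    "cermavo3": 46,
    "Vojtech Cermak": 23,
}

def development_distribution_score(commit_counts):
    """Assumed definition: 1 - (commits by top committer / total commits)."""
    total = sum(commit_counts)
    if total == 0:
        return 0.0
    return 1 - max(commit_counts) / total

total_commits = sum(commits.values())
average_commits = total_commits / len(commits)
dds = development_distribution_score(list(commits.values()))

print(total_commits, average_commits, round(dds, 3))
```

With a single committer, as in the past-year figures, the top committer holds all commits and the score is 0.0.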


Issue and Pull Request metadata

Last synced: 2 days ago

Total issues: 5
Total pull requests: 0
Average time to close issues: 4 months
Average time to close pull requests: N/A
Total issue authors: 4
Total pull request authors: 0
Average comments per issue: 2.4
Average comments per pull request: 0
Merged pull request: 0
Bot issues: 0
Bot pull requests: 0

Past year issues: 1
Past year pull requests: 0
Past year average time to close issues: 2 days
Past year average time to close pull requests: N/A
Past year issue authors: 1
Past year pull request authors: 0
Past year average comments per issue: 5.0
Past year average comments per pull request: 0
Past year merged pull request: 0
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/WildlifeDatasets/wildlife-datasets

Top Issue Authors

  • mfruhner (2)
  • zhoumu53 (1)
  • VojtechCermak (1)
  • MatthiasZuerl (1)

Package metadata

pypi.org: wildlife-datasets

Library for easier access and research of wildlife re-identification datasets

  • Homepage: https://github.com/WildlifeDatasets/wildlife-datasets
  • Documentation: https://wildlifedatasets.github.io/wildlife-datasets/
  • Licenses: MIT
  • Latest release: 1.0.6 (published 13 days ago)
  • Last Synced: 2025-04-25T14:32:55.190Z (2 days ago)
  • Versions: 52
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 2,298 last month
  • Rankings:
    • Downloads: 5.006%
    • Dependent packages count: 6.633%
    • Average: 20.189%
    • Stargazers count: 28.203%
    • Forks count: 30.492%
    • Dependent repos count: 30.611%
  • Maintainers (2)
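
The Average entry in the rankings appears to be the arithmetic mean of the five individual rankings. The check below is a sketch under that assumption, which the listing does not state explicitly; the variable names are illustrative.

```python
# Ranking percentiles copied from the list above.
rankings = {
    "downloads": 5.006,
    "dependent_packages_count": 6.633,
    "stargazers_count": 28.203,
    "forks_count": 30.492,
    "dependent_repos_count": 30.611,
}

# Assumed definition: plain arithmetic mean of the individual rankings.
average_ranking = sum(rankings.values()) / len(rankings)
print(round(average_ranking, 3))
```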

Score: 13.732563637143349