gediDB

A toolbox for processing and providing Global Ecosystem Dynamics Investigation (GEDI) L2A-B and L4A-C data.
https://github.com/simonbesnard1/gedidb

Category: Sustainable Development
Sub Category: Environmental Satellites

Keywords

gedi python remote-sensing tiledb

Keywords from Contributors

optimize archiving transforms measur generic compose observation projection animals conversion

Last synced: about 17 hours ago

Repository metadata

A toolbox for processing and providing Global Ecosystem Dynamics Investigation (GEDI) L2A-B and L4A-C data

README.md

gediDB: A toolbox for Global Ecosystem Dynamics Investigation (GEDI) L2A-B and L4A-C data


gediDB is an open-source Python package designed to streamline the processing, analysis, and management of GEDI L2A-B and L4A-C data. This toolbox enables efficient and flexible data querying and management of large GEDI datasets stored with TileDB, a high-performance, multi-dimensional array database.

gediDB integrates key functionalities such as structured data querying, multi-dimensional data processing, and metadata management. With built-in support for parallel engines (e.g., Dask), the toolbox scales to large datasets and runs parallel processing efficiently on local machines or clusters.
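
A minimal sketch of the bounded-concurrency idea, assuming each GEDI granule goes through an independent download, process, and write step. It uses the standard library's `concurrent.futures.ThreadPoolExecutor` as a stand-in for the Dask-based engine mentioned above; `process_granule`, `run_pipeline`, and `ConcurrencyTracker` are hypothetical names and not part of the gediDB API. The `n_workers` argument plays the role of the configurable number of concurrent processes.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


class ConcurrencyTracker:
    """Counts how many tasks are running at once (illustration only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self) -> "ConcurrencyTracker":
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc) -> bool:
        with self._lock:
            self.active -= 1
        return False  # never swallow exceptions


def process_granule(granule_id: str, tracker: ConcurrencyTracker) -> tuple[str, str]:
    """Hypothetical stand-in for download -> process -> write of one granule."""
    with tracker:
        time.sleep(0.01)  # simulate I/O-bound work such as a download
        return granule_id, "done"


def run_pipeline(granule_ids: list[str], n_workers: int = 4) -> tuple[dict[str, str], int]:
    """Run all granules with at most n_workers in flight; return results and peak concurrency."""
    tracker = ConcurrencyTracker()
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(process_granule, g, tracker) for g in granule_ids]
        for future in as_completed(futures):
            granule_id, status = future.result()
            results[granule_id] = status
    return results, tracker.peak


results, peak = run_pipeline([f"granule_{i}" for i in range(8)], n_workers=3)
print(len(results), peak <= 3)  # → 8 True
```

Threads are a reasonable fit for I/O-bound steps such as downloads; CPU-heavy processing would more typically be spread across processes or a Dask cluster.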

Key Features of gediDB

  • TileDB-Based Storage: GEDI data are stored and managed in TileDB arrays, which provide efficient, scalable, multi-dimensional storage and fast, flexible access to large volumes of data.
  • Flexible Data Querying: Easily query GEDI data across spatial, temporal, and variable dimensions. Access data within bounding boxes or retrieve the shots nearest to a specific location, then refine the results with additional filters.
  • Parallel Processing: Process large GEDI datasets in parallel, enabling concurrent downloading, processing, and TileDB insertion of GEDI products. The number of concurrent processes can be easily controlled based on available system resources.
  • Metadata-Driven: Maintain and manage metadata for each dataset, ensuring that important contextual information like units, descriptions, and source details are stored and accessible.
  • Geospatial Data Management: Integrate seamlessly with TileDB to enable spatial queries, transformations, and geospatial analyses.
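
The bounding-box, time-window, and nearest-shot queries described above can be sketched on plain in-memory records. This illustrates the filtering logic only, assuming one record per footprint; gediDB itself runs its queries against TileDB arrays, and `Shot`, `query_bbox`, `haversine_km`, and `nearest_shots` are hypothetical helpers, not the package's API. Distances use the haversine formula on a spherical Earth, and the coordinates below are toy values.

```python
import heapq
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Shot:
    """Hypothetical minimal record for one GEDI footprint."""

    shot_number: int
    lat: float
    lon: float
    time: datetime
    value: float  # any variable of interest (hypothetical)


def query_bbox(shots, bbox, start, end):
    """Keep shots inside bbox = (min_lon, min_lat, max_lon, max_lat) observed in [start, end]."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        s
        for s in shots
        if min_lon <= s.lon <= max_lon
        and min_lat <= s.lat <= max_lat
        and start <= s.time <= end
    ]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_shots(shots, lat, lon, k=1):
    """Return the k shots closest to (lat, lon), nearest first."""
    return heapq.nsmallest(k, shots, key=lambda s: haversine_km(lat, lon, s.lat, s.lon))


shots = [
    Shot(1, 52.38, 13.06, datetime(2020, 6, 1), 120.0),
    Shot(2, 52.50, 13.40, datetime(2021, 6, 1), 80.0),
    Shot(3, -3.00, -60.00, datetime(2020, 7, 1), 300.0),
]
in_box = query_bbox(shots, (12.0, 52.0, 14.0, 53.0), datetime(2020, 1, 1), datetime(2020, 12, 31))
print([s.shot_number for s in in_box])  # → [1]
print([s.shot_number for s in nearest_shots(shots, 52.39, 13.07, k=2)])  # → [1, 2]
```

This linear scan ignores bounding boxes that cross the antimeridian. In a sparse array indexed by longitude, latitude, and time, the same bounding-box filter would become a set of range slices over those dimensions instead of a full scan.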

Why gediDB?

gediDB simplifies and automates the GEDI data processing workflow, making it easier to retrieve, filter, and analyze complex datasets efficiently and at scale. Whether you're investigating biomass distribution, monitoring forest dynamics, or conducting large-scale ecological studies, gediDB provides the tools to handle and analyze large GEDI datasets.

Documentation

Learn more about gediDB in its official documentation at
https://gedidb.readthedocs.io/en/latest/.

Contributing

You can find information about contributing to gediDB on our
Contributing page.

Future development

Planned future developments for gediDB are designed to improve usability and extend the package’s scope for both researchers and operational users:

  • Compatibility with upcoming GEDI product releases: will ensure the long-term sustainability of the toolbox as new mission data become available, avoiding version lock-in for users building workflows on gediDB.

  • Improved performance and flexibility in querying profile variables: will make it easier for users to analyse canopy structure profiles (e.g., RH metrics) at scale, which are currently among the most data-intensive GEDI variables.

  • Expanded documentation and tutorials: will benefit new users by lowering the entry barrier, providing clear end-to-end examples, and connecting scientific use cases to code snippets.

  • Strengthened testing for reliability and maintainability: supports developers and long-term users by ensuring that changes do not break existing workflows, and by increasing trust in the reproducibility of analyses built on gediDB.

Development progress and discussion of these features are tracked openly through the project’s GitHub issues.

History

The development of the gediDB package began during the PhD of Amelia Holcomb, who initially created part of this toolset to analyze and manage GEDI data for her research. Recognizing the potential of her work to benefit the broader scientific community, the Global Land Monitoring team collaborated with Amelia in March 2024 to expand and optimize her code, transforming it into a scalable and versatile Python package named gediDB. This collaboration refined the toolbox to handle large-scale datasets with TileDB, integrate parallel processing, and incorporate a robust querying and metadata management system. Today, gediDB is designed to help researchers in ecological and environmental sciences by making GEDI data processing more efficient and accessible.

About the authors

Simon Besnard, a senior researcher in the Global Land Monitoring Group at GFZ Helmholtz Centre Potsdam, studies the dynamics of terrestrial ecosystems and their feedbacks on environmental conditions. He specializes in developing methods to analyze large Earth Observation (EO) and climate datasets to understand ecosystem functioning in a changing climate. His current research focuses on changes in forest structure over the past decade and their links to the carbon cycle.

Felix Dombrowski is a Bachelor’s student in Computer Science at the University of Potsdam and a research intern in the Global Land Monitoring Group at GFZ Helmholtz Centre Potsdam. At GFZ, his work has focused on developing toolboxes to process Earth Observation data efficiently.

Amelia Holcomb is a PhD candidate in Computer Science at the University of Cambridge, researching remote sensing and machine learning to study carbon sequestration and forest regrowth. Previously, she worked as a site reliability engineer at Google on Bigtable. She holds an MMath from the University of Waterloo and a B.A. in Mathematics from Yale.

Contact

For any questions or inquiries, please contact:

Acknowledgments

The development of gediDB was supported by the European Union through the FORWARDS and NextGenCarbon projects, and by the Helmholtz Association via the Helmholtz Foundation Model Initiative (3D-ABC project). Amelia Holcomb acknowledges funding from the Harding Distinguished Postgraduate Scholarship. We would also like to thank the R2D2 Workshop (March 2024, GFZ Potsdam) for providing the opportunity to meet and discuss GEDI data processing.

License

This project is licensed under the European Union Public Licence v1.2 (EUPL-1.2); see the LICENSE file for details.

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Besnard
  given-names: Simon
  orcid: "https://orcid.org/0000-0002-1137-103X"
- family-names: Dombrowski
  given-names: Felix
  orcid: "https://orcid.org/0009-0000-9210-3530"
- family-names: Holcomb
  given-names: Amelia
  orcid: "https://orcid.org/0000-0001-5081-7201"
contact:
- family-names: Besnard
  given-names: Simon
  orcid: "https://orcid.org/0000-0002-1137-103X"
doi: 10.5281/zenodo.17191670
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Besnard
    given-names: Simon
    orcid: "https://orcid.org/0000-0002-1137-103X"
  - family-names: Dombrowski
    given-names: Felix
    orcid: "https://orcid.org/0009-0000-9210-3530"
  - family-names: Holcomb
    given-names: Amelia
    orcid: "https://orcid.org/0000-0001-5081-7201"
  date-published: 2025-09-26
  doi: 10.21105/joss.08593
  issn: 2475-9066
  issue: 113
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 8593
  title: "gediDB: A toolbox for processing and providing Global
    Ecosystem Dynamics Investigation (GEDI) L2A-B and L4A-C data"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.08593"
  volume: 10
title: "gediDB: A toolbox for processing and providing Global Ecosystem
  Dynamics Investigation (GEDI) L2A-B and L4A-C data"


Committers metadata

Last synced: 7 days ago

Total Commits: 1,444
Total Committers: 6
Avg Commits per committer: 240.667
Development Distribution Score (DDS): 0.186

Commits in past year: 199
Committers in past year: 3
Avg Commits per committer in past year: 66.333
Development Distribution Score (DDS) in past year: 0.131

Name Email Commits
Simon b****d@g****e 1176
felixd f****d@g****e 219
dependabot[bot] 4****] 29
Amelia a****b@g****m 12
Joseph H Kennedy me@j****g 4
Romulo Pereira Goncalves r****o@g****e 4
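
The committer average and Development Distribution Score (DDS) reported above can be reproduced from this table. The sketch assumes the common definition DDS = 1 - (commits by the top committer / total commits); with these counts it yields the reported 0.186.

```python
# Commit counts per committer, copied from the table above.
commits = {
    "Simon": 1176,
    "felixd": 219,
    "dependabot[bot]": 29,
    "Amelia": 12,
    "Joseph H Kennedy": 4,
    "Romulo Pereira Goncalves": 4,
}


def commit_stats(counts: dict[str, int]) -> tuple[int, float, float]:
    """Return (total commits, average commits per committer, DDS)."""
    total = sum(counts.values())
    average = total / len(counts)
    # DDS = 1 - (top committer's commits / total commits):
    # 0 means a single person made every commit; values near 1 mean work is spread out.
    dds = 1 - max(counts.values()) / total
    return total, average, dds


total, average, dds = commit_stats(commits)
print(total, round(average, 3), round(dds, 3))  # → 1444 240.667 0.186
```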

Issue and Pull Request metadata

Last synced: 13 days ago

Total issues: 26
Total pull requests: 40
Average time to close issues: 28 days
Average time to close pull requests: about 15 hours
Total issue authors: 6
Total pull request authors: 3
Average comments per issue: 0.23
Average comments per pull request: 0.08
Merged pull requests: 29
Bot issues: 0
Bot pull requests: 24

Past year issues: 16
Past year pull requests: 30
Past year average time to close issues: 17 days
Past year average time to close pull requests: about 16 hours
Past year issue authors: 4
Past year pull request authors: 3
Past year average comments per issue: 0.38
Past year average comments per pull request: 0.0
Past year merged pull requests: 20
Past year bot issues: 0
Past year bot pull requests: 20

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/simonbesnard1/gedidb

Top Issue Authors

  • simonbesnard1 (14)
  • jhkennedy (8)
  • Job2502 (1)
  • yu-feng-ho (1)
  • AlexandraRunge (1)
  • b-yogesh (1)

Top Pull Request Authors

  • dependabot[bot] (24)
  • simonbesnard1 (14)
  • jhkennedy (2)

Top Issue Labels

  • enhancement (9)
  • bug (2)
  • documentation (1)
  • 03 - Maintenance (1)

Top Pull Request Labels

  • 03 - Maintenance (21)

Package metadata

pypi.org: gedidb

A toolbox to download, process, store and visualise Global Ecosystem Dynamics Investigation (GEDI) L2A-B and L4A-C data

  • Documentation: https://gedidb.readthedocs.io/
  • Licenses: European Union Public Licence 1.2 (EUPL 1.2)
  • Latest release: 2026.4.30 (published 2 months ago)
  • Last Synced: 2026-06-22T10:13:45.496Z (11 days ago)
  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 211 Last month
  • Rankings:
    • Dependent packages count: 9.694%
    • Average: 32.137%
    • Dependent repos count: 54.58%
  • Maintainers (1)

Dependencies

.github/workflows/ci.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • codecov/codecov-action v5.3.1 composite
pyproject.toml pypi
  • dask *
  • distributed *
  • geopandas *
  • h5py *
  • numpy *
  • pandas *
  • requests *
  • retry *
  • scipy *
  • tiledb *
  • xarray *
.github/workflows/pylint.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • xarray-contrib/ci-trigger v1 composite
.github/workflows/pypi-release.yaml actions
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • pypa/gh-action-pypi-publish v1.12.4 composite
.github/workflows/generate-paper-pdf.yaml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v4 composite
  • openjournals/openjournals-draft-action master composite
ci/requirements/environment.yml conda
  • boto3
  • bottleneck
  • cartopy
  • cftime
  • dask
  • distributed
  • flox
  • fsspec
  • geopandas
  • h5netcdf
  • h5py
  • hdf5
  • hypothesis
  • lxml
  • netcdf4
  • numba
  • numpy >=2
  • packaging
  • pandas
  • pip
  • pre-commit
  • pyarrow
  • pydap
  • pydap-server
  • pytest
  • pytest-cov
  • pytest-env
  • pytest-timeout
  • pytest-xdist
  • requests
  • retry
  • scipy
  • xarray

Score: 10.038717501796231