OlmoEarth

A collection of real-world Earth-observation model configurations, tutorials and tooling built on top of the open OlmoEarth foundation models to map ecosystems, forest loss, mangroves and more using satellite data.
https://github.com/allenai/olmoearth_projects

Category: Biosphere
Sub Category: Forest Remote Sensing

Repository metadata

OlmoEarth projects

OlmoEarth Projects

Configuration, platform tooling, and tutorials for applying OlmoEarth to remote sensing tasks.

What's in this repo

  • olmoearth_projects/: a Python package with platform tooling (olmoearth_run
    integration, training utilities, and label quality tools).
  • olmoearth_run_data/: per-project model configurations (dataset.json, model.yaml,
    olmoearth_run.yaml).
  • tutorials/: self-contained tutorials for getting started with OlmoEarth. Each
    tutorial has its own dependencies and README.
  • docs/: per-model documentation for the fine-tuned models available in this
    repository.

Available Models

The per-model documentation in docs/ provides more details about the training data
and intended use case for each model.

Tutorials

See the tutorials/ directory for self-contained tutorials hosted in this
repo. Additional tutorials are available in the rslearn repository.

OlmoEarth Ecosystem

These tutorials use all or a subset of the components of OlmoEarth:

  • olmoearth_pretrain, the OlmoEarth
    pre-trained model.
  • rslearn, our tool for obtaining satellite
    images and other geospatial data from online data sources, and for fine-tuning
    remote sensing foundation models.
  • olmoearth_run, our higher-level
    infrastructure that automates various steps on top of rslearn such as window creation
    and inference post-processing.

Installation

We recommend installing with uv; see the Installing uv documentation for
instructions. Once uv is installed:

git clone https://github.com/allenai/olmoearth_projects.git
cd olmoearth_projects
uv sync
source .venv/bin/activate

Tutorials manage their own dependencies separately; see each tutorial's README
for setup instructions.

Applying Existing Models

There are three steps to applying the models in this repository:

  1. Customize the prediction request geometry, which specifies the spatial and temporal
    extent to run the model on.
  2. Execute the olmoearth_run steps to build an rslearn dataset for inference, and to
    apply the model on the dataset.
  3. Collect and visualize the outputs.

Customizing the Prediction Request Geometry

The configuration files for each project are stored under
olmoearth_run_data/PROJECT_NAME/. There are three configuration files:

  • dataset.json: this is an rslearn dataset configuration file that specifies the
    types of satellite images that need to be downloaded to run the model, and how to
    obtain them. Most models rely on some combination of Sentinel-1 and Sentinel-2
    satellite images, and are configured to download those images from Microsoft
    Planetary Computer.
  • model.yaml: this is an rslearn model configuration file that specifies the model
    architecture, fine-tuning hyperparameters, data loading steps, etc.
  • olmoearth_run.yaml: this is an olmoearth_run configuration file that specifies how
    the prediction request geometry should be translated into rslearn windows, and how
    the inference outputs should be combined together.
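To check that a project directory has the expected layout before running anything, a short script can look for the three files. This is a minimal sketch, assuming it is run from the repository root; the helper names (missing_config_files, list_projects) are hypothetical and not part of olmoearth_projects.

```python
from pathlib import Path

# The three configuration files each project directory is expected to contain.
REQUIRED_CONFIG_FILES = ("dataset.json", "model.yaml", "olmoearth_run.yaml")


def missing_config_files(project_dir):
    """Return the required configuration files that are absent from project_dir."""
    project_dir = Path(project_dir)
    return [name for name in REQUIRED_CONFIG_FILES if not (project_dir / name).is_file()]


def list_projects(run_data_dir="olmoearth_run_data"):
    """Return the names of project directories that contain all three files."""
    root = Path(run_data_dir)
    return sorted(
        path.name
        for path in root.iterdir()
        if path.is_dir() and not missing_config_files(path)
    )
```

For example, print(list_projects()) from the repository root lists the complete project directories, and missing_config_files("olmoearth_run_data/lfmc") returns an empty list when nothing is missing.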

Some projects also include an example prediction_request_geometry.geojson, but this
will need to be modified to specify your target region. The spatial extent is specified
with standard GeoJSON features; you can use geojson.io to draw
polygons on a map and get the corresponding GeoJSON. The temporal extent is specified
using properties on each feature:

{
  "type": "FeatureCollection",
  "properties": {},
  "features": [
    {
      "type": "Feature",
      "geometry": {
        // ...
      },
      "properties": {
        "oe_start_time": "2024-01-01T00:00:00+00:00",
        "oe_end_time": "2024-02-01T00:00:00+00:00"
      }
    }
  ]
}

Here, the oe_start_time and oe_end_time indicate that the prediction for the
location of this feature should be based on satellite images around January 2024. The
per-model documentation details how these timestamps should be chosen. Some models like
forest loss driver classification provide project-specific tooling for generating the
prediction request geometry.
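A prediction request can also be generated programmatically instead of drawn by hand. The sketch below uses hypothetical helpers (not part of the repository) and assumes a rectangular region given as longitude/latitude bounds plus timezone-aware datetimes; it emits oe_start_time and oe_end_time in the same format as the example above.

```python
import json
from datetime import datetime, timezone


def bbox_polygon(min_lon, min_lat, max_lon, max_lat):
    """Return a GeoJSON Polygon for a lon/lat bounding box (closed, counterclockwise ring)."""
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # GeoJSON rings repeat the first position to close
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def make_prediction_request(geometry, start, end):
    """Wrap one geometry in a FeatureCollection carrying the oe_* time properties."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("end must be after start")
    feature = {
        "type": "Feature",
        "geometry": geometry,
        "properties": {
            # isoformat() on an aware UTC datetime yields e.g. 2024-01-01T00:00:00+00:00
            "oe_start_time": start.isoformat(),
            "oe_end_time": end.isoformat(),
        },
    }
    return {"type": "FeatureCollection", "properties": {}, "features": [feature]}


def write_request(request, path="prediction_request_geometry.geojson"):
    """Write the request to disk as GeoJSON."""
    with open(path, "w") as f:
        json.dump(request, f, indent=2)
```

For example, make_prediction_request(bbox_polygon(...), datetime(2024, 1, 1, tzinfo=timezone.utc), datetime(2024, 2, 1, tzinfo=timezone.utc)) reproduces the January 2024 window above. The bounds are yours to supply, and the per-model documentation still governs how the timestamps should be chosen.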

Executing olmoearth_run

Consult the per-model documentation to download the associated fine-tuned model
checkpoint. For example:

mkdir ./checkpoints
wget https://huggingface.co/allenai/OlmoEarth-v1-FT-LFMC-Base/resolve/main/model.ckpt -O checkpoints/lfmc.ckpt

Set the required environment variables:

export NUM_WORKERS=32
export WANDB_PROJECT=lfmc
export WANDB_NAME=lfmc_inference_run
export WANDB_ENTITY=YOUR_WANDB_ENTITY

Then, execute olmoearth_run:

mkdir ./project_data
python -m olmoearth_projects.main olmoearth_run olmoearth_run --config_path $PWD/olmoearth_run_data/lfmc/ --checkpoint_path $PWD/checkpoints/lfmc.ckpt --scratch_path project_data/lfmc/
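If you prefer to drive olmoearth_run from Python, the same command can be assembled and launched with subprocess. This is a minimal sketch that mirrors the shell invocation above; the function names are hypothetical, and it uses sys.executable rather than a bare python so that the interpreter from the activated .venv is the one that runs.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def olmoearth_run_command(project, checkpoint, scratch_root="project_data",
                          run_data_dir="olmoearth_run_data"):
    """Build the olmoearth_run argument list for one project, as in the shell example."""
    cwd = Path.cwd()
    return [
        sys.executable, "-m", "olmoearth_projects.main",
        "olmoearth_run", "olmoearth_run",
        # The shell example passes an absolute config directory with a trailing slash.
        "--config_path", f"{cwd / run_data_dir / project}/",
        "--checkpoint_path", str(cwd / checkpoint),
        "--scratch_path", f"{scratch_root}/{project}/",
    ]


def run_olmoearth(project, checkpoint, scratch_root="project_data", dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    cmd = olmoearth_run_command(project, checkpoint, scratch_root=scratch_root)
    print(shlex.join(cmd))
    if not dry_run:
        Path(scratch_root).mkdir(exist_ok=True)  # mirrors `mkdir ./project_data`
        subprocess.run(cmd, check=True)
    return cmd
```

Calling run_olmoearth("lfmc", "checkpoints/lfmc.ckpt") only prints the command. Pass dry_run=False once the environment variables above are exported.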

Visualizing Outputs

The results directory (project_data/lfmc/results/results_raster/ in the example)
should be populated with one or more GeoTIFFs, which you can view in GIS software such
as QGIS:

qgis project_data/lfmc/results/results_raster/*.tif
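To locate the output rasters programmatically (for example, before passing them to QGIS or another tool), a small helper can list them. This sketch assumes the results/results_raster/ layout from the example above. Unlike the shell glob, it also accepts the .tiff extension, and it fails loudly when no rasters are present.

```python
from pathlib import Path


def find_result_rasters(scratch_path):
    """Return the sorted GeoTIFF paths under <scratch_path>/results/results_raster/."""
    raster_dir = Path(scratch_path) / "results" / "results_raster"
    if not raster_dir.is_dir():
        raise FileNotFoundError(f"results directory not found: {raster_dir}")
    rasters = sorted(
        p for p in raster_dir.iterdir()
        if p.is_file() and p.suffix.lower() in {".tif", ".tiff"}
    )
    if not rasters:
        raise FileNotFoundError(f"no GeoTIFFs in {raster_dir}; did the run finish?")
    return rasters
```

For example, subprocess.run(["qgis", *map(str, find_result_rasters("project_data/lfmc"))]) opens the same files as the shell command above.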

Reproducing Fine-tuning for Existing Models

We have released model checkpoints for each of the fine-tuned models in this
repository, but you can also reproduce a model by fine-tuning the pre-trained OlmoEarth
checkpoint on its task's training dataset.

First, consult the per-model documentation above for the URL of the rslearn dataset tar
file, and download and extract it. For example, for the LFMC model:

wget https://huggingface.co/datasets/allenai/olmoearth_projects_lfmc/resolve/main/dataset.tar
tar xvf dataset.tar
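The download and extraction can also be scripted. This is a minimal standard-library sketch that assumes the file is named dataset.tar, as in the example. It builds a Hugging Face /resolve/ URL, because the /blob/ form serves the web page view rather than the raw file. The helper names are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path


def hf_dataset_file_url(repo_id, filename, revision="main"):
    """Build the direct-download URL for a file in a Hugging Face dataset repo."""
    return f"https://huggingface.co/datasets/{repo_id}/resolve/{revision}/{filename}"


def safe_extract(tar_path, dest_dir):
    """Extract a tar archive, refusing member paths that would land outside dest_dir."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter additionally rejects unsafe links.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


def download_dataset(repo_id, dest_dir, filename="dataset.tar"):
    """Download the dataset tar into dest_dir and extract it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    tar_path = dest / filename
    urllib.request.urlretrieve(hf_dataset_file_url(repo_id, filename), tar_path)
    return safe_extract(tar_path, dest)
```

For example, download_dataset("allenai/olmoearth_projects_lfmc", "lfmc_dataset") fetches and unpacks the LFMC archive. Inspect the archive's top-level layout to decide what DATASET_PATH should point at.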

Set the environment variables expected by the fine-tuning procedure (which logs to W&B):

export DATASET_PATH=/path/to/extracted/data/
export NUM_WORKERS=32
export TRAINER_DATA_PATH=./trainer_data
export PREDICTION_OUTPUT_LAYER=output
export WANDB_PROJECT=olmoearth_projects
export WANDB_NAME=my_training_run
export WANDB_ENTITY=...
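A missing or mistyped variable may only surface partway through setup, so it can help to check the environment first. This is a hypothetical pre-flight helper, not part of rslearn or this repository. It checks the variables listed above and does a couple of basic sanity checks.

```python
import os

FINETUNE_ENV_VARS = (
    "DATASET_PATH", "NUM_WORKERS", "TRAINER_DATA_PATH", "PREDICTION_OUTPUT_LAYER",
    "WANDB_PROJECT", "WANDB_NAME", "WANDB_ENTITY",
)


def env_problems(required=FINETUNE_ENV_VARS, env=None):
    """Return human-readable problems with the environment (empty list if none)."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in required if not env.get(name)]
    workers = env.get("NUM_WORKERS")
    if workers and not (workers.isdigit() and int(workers) > 0):
        problems.append(f"NUM_WORKERS must be a positive integer, got {workers!r}")
    dataset = env.get("DATASET_PATH")
    if dataset and "DATASET_PATH" in required and not os.path.isdir(dataset):
        problems.append(f"DATASET_PATH is not a directory: {dataset}")
    return problems
```

The same function covers the inference step by passing required=("NUM_WORKERS", "WANDB_PROJECT", "WANDB_NAME", "WANDB_ENTITY").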

Then run fine-tuning with the project's model configuration file in olmoearth_run_data/,
e.g.:

rslearn model fit --config olmoearth_run_data/lfmc/model.yaml

Losses and metrics should then be logged to your W&B project. Two checkpoints should be
saved in TRAINER_DATA_PATH (e.g. ./trainer_data): the latest checkpoint (last.ckpt) and
the best checkpoint (epoch=....ckpt). You can use the best checkpoint in place of the
checkpoint we provide when following the Applying Existing Models section.
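To pick up the best checkpoint for inference without copying its exact name, you can search TRAINER_DATA_PATH. This is a minimal sketch that assumes both checkpoints sit directly in that directory and that the best one matches epoch=*.ckpt. If several runs left multiple matches, it picks the most recently modified file; that tie-breaking rule is an assumption of this sketch, not rslearn behavior.

```python
import os
from pathlib import Path


def find_checkpoints(trainer_data_path=None):
    """Return (last_ckpt, best_ckpt) as Paths; either is None if not found."""
    root = Path(trainer_data_path or os.environ.get("TRAINER_DATA_PATH", "./trainer_data"))
    last = root / "last.ckpt"
    # Newest epoch=*.ckpt wins when more than one is present.
    best = sorted(root.glob("epoch=*.ckpt"), key=lambda p: p.stat().st_mtime)
    return (last if last.is_file() else None), (best[-1] if best else None)
```

The returned best path can then be passed as --checkpoint_path to olmoearth_run.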

If training fails halfway, you can resume it from last.ckpt:

rslearn model fit --config olmoearth_run_data/lfmc/model.yaml --ckpt_path $TRAINER_DATA_PATH/last.ckpt
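Choosing between a fresh run and a resumed run can be automated by checking for last.ckpt. This hypothetical sketch builds the rslearn command shown above and adds --ckpt_path only when that file exists.

```python
import os
import subprocess
from pathlib import Path


def fit_command(config="olmoearth_run_data/lfmc/model.yaml", trainer_data_path=None):
    """Build `rslearn model fit`, resuming from last.ckpt when it exists."""
    root = Path(trainer_data_path or os.environ.get("TRAINER_DATA_PATH", "./trainer_data"))
    cmd = ["rslearn", "model", "fit", "--config", config]
    last = root / "last.ckpt"
    if last.is_file():
        cmd += ["--ckpt_path", str(last)]
    return cmd


def fit(config="olmoearth_run_data/lfmc/model.yaml"):
    """Run fine-tuning (requires rslearn on PATH)."""
    subprocess.run(fit_command(config), check=True)
```

One caveat of this design: a stale last.ckpt left over from an unrelated run would also be resumed. Delete it, or point TRAINER_DATA_PATH at a fresh directory, when starting over.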

License

This code is licensed under the OlmoEarth Artifact License.


Committers metadata

Total Commits: 382
Total Committers: 12
Avg Commits per committer: 31.833
Development Distribution Score (DDS): 0.393

Commits in past year: 382
Committers in past year: 12
Avg Commits per committer in past year: 31.833
Development Distribution Score (DDS) in past year: 0.393

Name Email Commits
Gabriel Tseng g****g@m****a 232
Favyen Bastani f****b@a****g 51
Patrick Johnson 8****J 29
hgherzog h****h@a****g 18
Yawen Zhang y****z@a****g 15
Favyen Bastani f****i@p****m 11
root r****t@p****n 11
root r****t@n****n 5
Josh Hansen 5****n 3
root r****t@n****n 3
Caleb Robinson c****6@g****m 2
root r****t@s****n 2

Issue and Pull Request metadata

Total issues: 3
Total pull requests: 8
Average time to close issues: N/A
Average time to close pull requests: 3 days
Total issue authors: 3
Total pull request authors: 4
Average comments per issue: 1.33
Average comments per pull request: 0.25
Merged pull request: 5
Bot issues: 0
Bot pull requests: 0

Past year issues: 3
Past year pull requests: 8
Past year average time to close issues: N/A
Past year average time to close pull requests: 3 days
Past year issue authors: 3
Past year pull request authors: 4
Past year average comments per issue: 1.33
Past year average comments per pull request: 0.25
Past year merged pull request: 5
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/allenai/olmoearth_projects

Top Issue Authors

  • ashjs2003 (1)
  • zhanghaochen-817 (1)
  • robmarkcole (1)

Top Pull Request Authors

  • gabrieltseng (3)
  • calebrob6 (2)
  • favyen2 (2)
  • APatrickJ (1)


Dependencies

.github/workflows/build-test.yml actions
  • actions/checkout v4 composite
  • docker/build-push-action v6 composite
  • docker/login-action v3 composite
  • docker/metadata-action v5 composite
  • google-github-actions/auth v2 composite
.github/workflows/lint.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
Dockerfile docker
  • pytorch/pytorch 2.7.0-cuda12.8-cudnn9-runtime@sha256 build
pyproject.toml pypi
  • geopandas >=1.1.1
  • olmoearth-pretrain *
  • olmoearth-runner >=0.1.3
  • openpyxl >=3.1.5
  • python-dotenv >=1.0
  • pyyaml >=6
  • rslearn [extra]>=0.0.11
  • wandb >=0.21
uv.lock pypi
  • 323 dependencies

Score: 6.984716320118265