A curated list of open technology projects to sustain a stable climate, energy supply, biodiversity and natural resources.

AtmoRep

A stochastic model of atmosphere dynamics using large scale representation learning.
https://github.com/clessig/atmorep

Category: Atmosphere
Sub Category: Atmospheric Composition and Dynamics

Last synced: about 19 hours ago
Repository metadata

AtmoRep model code

README.md

IMPORTANT NOTE:

Please note that this folder has not been maintained since March 1st.
Please use the WeatherGenerator code instead: https://github.com/ecmwf/WeatherGenerator

AtmoRep

This repository contains the source code for the AtmoRep models for large scale representation learning of atmospheric dynamics as well as links to the pre-trained models and the required model input data.

The pre-print for the work is available on arXiv: https://arxiv.org/abs/2308.13280.

@misc{Lessig2023atmorep,
  title = {AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning},
  author = {Christian Lessig and Ilaria Luise and Bing Gong and Michael Langguth and Scarlet Stadtler and Martin Schultz},
  eprint = {2308.13280},
  primaryclass = {physics.ao-ph},
  url = {https://arxiv.org/abs/2308.13280},
  year = {2023},
}

Starter README

1. Pull code

%> git clone https://github.com/clessig/atmorep.git

This creates a directory atmorep that contains the source code, including the python scripts for model training and evaluation.

After following the steps described below, the final directory structure will look as follows:

└── atmorep/
    ├── atmorep/
    │     └── ... 
    ├── data/                         <- top level data directory
    │    ├── normalisation/           <- directory for data normalisations 
    │    ├── vorticity/
    │    │       ├── ml105/           <- model levels with monthly GRIB files
    │    │       │     ├── era5_vorticity_y2021_m03_ml105.grib   <- grib data file
    │    │       │     ├── ...
    │    │       ├── ml114/
    │    │       ├── ml123/
    │    │       ├── ml137/
    │    │       ├── ml96/
    .    .       .
    │    ├── temperature/
    .    .
    ├── models
    │    ├── id4nvwbetz   <- Directory containing model weights and config
    │    │       ├──  model_id4nvwbetz.json     
    │    │       └──  AtmoRep_id4nvwbetz.mod
    │    ├── id<model_id>
    .    .
    └── results
         ├── id4nvwbetz
         ...

The directories data, models, and results need to be created if they do not exist. All three can become large and should therefore reside on a file system with sufficient storage space; in that case they can be soft-linked to the default locations above, or their paths can be set in atmorep/config/config.
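The directory setup described above can be sketched in Python as follows. This is a minimal sketch, not part of the AtmoRep code: the helper name prepare_directories and the storage_root argument are hypothetical and only illustrate creating the three directories, optionally on a larger file system with soft links back into the repository.

```python
from pathlib import Path


def prepare_directories(repo_root, storage_root=None):
    """Create data/, models/ and results/ below repo_root.

    If storage_root is given (hypothetical argument), the real directories
    are created there and soft-linked into repo_root instead.
    """
    repo_root = Path(repo_root)
    repo_root.mkdir(parents=True, exist_ok=True)
    created = {}
    for name in ("data", "models", "results"):
        link = repo_root / name
        if storage_root is None:
            # Plain directory inside the repository.
            link.mkdir(parents=True, exist_ok=True)
        else:
            # Real directory on the large file system, soft link in the repo.
            target = Path(storage_root) / name
            target.mkdir(parents=True, exist_ok=True)
            if not link.exists() and not link.is_symlink():
                link.symlink_to(target, target_is_directory=True)
        created[name] = link
    return created
```

Calling prepare_directories("atmorep", "/path/to/large/storage") would leave atmorep/data, atmorep/models and atmorep/results as soft links; calling it without the second argument creates plain directories.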

2. Download the data

2.1 Download pre-trained models

Models can be downloaded from: https://datapub.fz-juelich.de/atmorep/trained-models.html

An example for downloading the pre-trained models is given here, in this case for the vorticity model.

% atmorep/> mkdir models
% atmorep/> cd models
% atmorep/models/> wget https://datapub.fz-juelich.de/atmorep/models/model_id4nvwbetz.tar.gz
% atmorep/models/> tar xvzf model_id4nvwbetz.tar.gz
% atmorep/models/> ls id4nvwbetz
AtmoRep_id4nvwbetz.mod  model_id4nvwbetz.json
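
To verify that a downloaded model has been unpacked to the expected place, a small check along the following lines may help. It is a sketch under the assumption that every model follows the same naming pattern as the vorticity example (id<model_id>/model_id<model_id>.json and id<model_id>/AtmoRep_id<model_id>.mod); the function names are hypothetical and not part of AtmoRep.

```python
from pathlib import Path


def model_files(models_dir, model_id):
    """Return the two files expected for a pre-trained model.

    Assumes the naming pattern of the vorticity example (model_id "4nvwbetz").
    """
    model_dir = Path(models_dir) / f"id{model_id}"
    return [
        model_dir / f"model_id{model_id}.json",   # model configuration
        model_dir / f"AtmoRep_id{model_id}.mod",  # model weights
    ]


def missing_model_files(models_dir, model_id):
    """Return the expected files that are not present on disk."""
    return [p for p in model_files(models_dir, model_id) if not p.is_file()]
```

For the example above, missing_model_files("models", "4nvwbetz") should return an empty list once the archive has been extracted.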

2.2 Download model input data (ERA5)

The input data in the required structure can be downloaded from the Jülich datapub server (direct WebDAV link: https://datapub.fz-juelich.de/atmorep/data/). Alternatively, it can be downloaded directly from MARS using the following script.

Download a subset of files

All data files (fields and normalizations) should be downloaded into the data directory. Untarring the files will generate the correct folder structure. For example (the vorticity data is also used below to run the first model, so we recommend downloading it first):

% atmorep/> mkdir data
% atmorep/> cd data
% atmorep/data/> wget https://datapub.fz-juelich.de/atmorep/data/vorticity/ml137/era5_vorticity_y2021_ml137.tar
% atmorep/data/> tar xvf era5_vorticity_y2021_ml137.tar
% atmorep/data/> ls -lah vorticity/ml137/
total 18G
era5_vorticity_y2021_m01_ml137.grib
era5_vorticity_y2021_m02_ml137.grib
...
era5_vorticity_y2021_m12_ml137.grib

For efficiency reasons, AtmoRep takes monthly ERA5 data as input. Therefore, each tar file contains 12 GRIB files of about 1.5 GBytes each.
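
The monthly file layout can be checked with a short script such as the one below. It is a minimal sketch that assumes the file naming shown in the listing above (era5_<field>_y<year>_m<MM>_ml<level>.grib inside <field>/ml<level>/); the helper names are hypothetical.

```python
from pathlib import Path


def monthly_grib_files(data_dir, field, year, level):
    """Return the 12 monthly GRIB paths expected for one field, year and model level."""
    level_dir = Path(data_dir) / field / f"ml{level}"
    return [
        level_dir / f"era5_{field}_y{year}_m{month:02d}_ml{level}.grib"
        for month in range(1, 13)
    ]


def missing_months(data_dir, field, year, level):
    """Return the month numbers (1-12) whose GRIB file is not on disk."""
    files = monthly_grib_files(data_dir, field, year, level)
    return [month for month, path in enumerate(files, start=1) if not path.is_file()]
```

After extracting the vorticity archive, missing_months("data", "vorticity", 2021, 137) should return an empty list.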

Coefficients for data normalization per field and level can be downloaded here: https://datapub.fz-juelich.de/atmorep/data/normalization/. They should also be located in the data directory:

% atmorep/data/> wget https://datapub.fz-juelich.de/atmorep/data/normalization/normalization_vorticity_ml137.tar.gz
% atmorep/data/> tar xvzf normalization_vorticity_ml137.tar.gz

3. Install python packages

Create a python environment, e.g.

% atmorep/> python3 -m venv pyenv

and activate the environment:

% atmorep/> source pyenv/bin/activate

Using conda is also possible. No environment is strictly required, although we recommend one. Please make sure to use a recent python version (we tested with python3.10).
Then install the AtmoRep package:

% atmorep/> pip install -e .

torch is currently not included (since it is often already available or has particular dependencies, e.g. a specific CUDA version). In the simplest case, it can be installed by:

% atmorep/> pip install torch

We require torch 2.x. (A container solution makes it possible to run even on systems where torch 2.x is not available.)
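
Whether the installed torch satisfies the 2.x requirement can be checked without importing it, for instance as sketched below. The function names are hypothetical; the sketch only inspects the installed package metadata and assumes the version string starts with the major version number.

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '2.1.0+cu121'."""
    return int(version.split(".", 1)[0])


def installed_torch_major():
    """Return the major version of the installed torch package, or None if absent."""
    try:
        return major_version(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        return None


def torch_is_2x():
    """True if a torch 2.x installation is found in the current environment."""
    return installed_torch_major() == 2
```

Running torch_is_2x() inside the activated environment gives a quick yes/no answer before starting an evaluation.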

4. Run the model

Pre-trained models can normally be run by:

% atmorep/> python atmorep/core/evaluate.py

You can adapt the configuration by selecting the corresponding model_id in evaluate.py (see below). It defaults to the single-field vorticity configuration, for which we downloaded the data above.

Depending on your compute hardware, you might have to run the computation by submitting a job to a batch system or by allocating a compute node in interactive mode (if an interactive session is possible, this is recommended). In an interactive session you will likely need to set the following:

% atmorep/> export CUDA_VISIBLE_DEVICES=0,1,2,3
% atmorep/> export MASTER_ADDR="$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)"

The default evaluation mode is currently global forecast. The output will be (similar to) this:

devices : ['cuda:0', 'cuda:1', 'cuda:2', 'cuda:3']
Wandb run: atmorep-ztvyw7k6-8932958
Running Evaluate.evaluate with mode = global_forecast
Loaded AtmoRep id=4nvwbetz, ignoring/missing 2 elements.
Loaded model id = 4nvwbetz at epoch = -2.
Number of batches per global forecast: 14
INFO:: data stats vorticity : 5.374998363549821e-05 / 0.9978392720222473
num_accs_per_task : 1
with_hvd : True
hvd_rank : 0

...

wandb_id : ztvyw7k6
dates : [[2021, 2, 10, 12]]
token_overlap : [0, 0]
forecast_num_tokens : 1
validation loss for strategy=forecast at epoch 0 : 0.12402566522359848
validation loss for vorticity : 0.12402566522359848
wandb: Waiting for W&B process to finish... (success).
wandb: 
wandb: Run history:
wandb:        val. loss forecast ▁
wandb: val., forecast, vorticity ▁
wandb: 
wandb: Run summary:
wandb:        val. loss forecast 0.12403
wandb: val., forecast, vorticity 0.12403
wandb: 
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /p/project/atmo-rep/lessig/atmorep/atmorep/lessig-cleanup/atmorep/wandb/offline-run-20231124_095428-ztvyw7k6

For the vorticity example above, we evaluate with global_forecast for a specific date and using only a single model level:

mode, options = 'global_forecast', { 'fields[0][2]' : [137],
                                     'dates' : [ [2021, 2, 10, 12] ],
                                     'token_overlap' : [0, 0],
                                     'forecast_num_tokens' : 1, 
                                     'attention' : False}

This performs a 3-hour forecast, since forecast_num_tokens is 1 and one token spans 3 hours. Another mode is the BERT masked token model mode used for pre-training:

mode, options = 'BERT', {'years_test' : [2021], 'fields[0][2]' : [123, 137]}

Again, we chose some custom options: we use two levels instead of the five default ones that were used during pre-training, and we use 2021 as the test year (since we downloaded that data).
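
To make the meaning of option keys such as 'fields[0][2]' concrete, the sketch below applies such path-style keys to a nested configuration. This is an illustration only and not AtmoRep's actual implementation: the helper apply_option and the simplified config layout (a field entry whose element at index 2 holds the list of model levels) are assumptions made for this example.

```python
import re


def apply_option(config, key, value):
    """Set the config entry addressed by key, e.g. 'fields[0][2]' or 'years_test'."""
    match = re.fullmatch(r"(\w+)((?:\[\d+\])*)", key)
    if match is None:
        raise ValueError(f"unsupported option key: {key!r}")
    name, index_part = match.groups()
    indices = [int(i) for i in re.findall(r"\[(\d+)\]", index_part)]
    if not indices:
        # Plain key: replace the whole entry.
        config[name] = value
        return
    # Indexed key: walk down to the parent list, then replace one element.
    target = config[name]
    for i in indices[:-1]:
        target = target[i]
    target[indices[-1]] = value


# Hypothetical, simplified config: one field whose third element lists the levels.
config = {"fields": [["vorticity", [], [96, 105, 114, 123, 137]]], "years_test": [2018]}
options = {"years_test": [2021], "fields[0][2]": [123, 137]}
for key, value in options.items():
    apply_option(config, key, value)
```

After the loop, the first field in this toy config uses only the model levels 123 and 137, and the test year is 2021.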

The generated model output (stored in ./results/id{wandbid}) for the global_forecast example can be post-processed into a spatial map with the following code. The run_id at the top needs to be replaced by the wandb_id of your run, which can be read off from the console output. Results will be stored as example_0000{0,1,2}.png. The code is deliberately an as-simple-as-possible example with many parameters hard-coded; see our analysis code for proper handling.
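
As a small aid for the run_id step only (this is not the post-processing code itself), the wandb_id can also be extracted from saved console output programmatically. The sketch below assumes the output contains a line of the form "wandb_id : <id>" as in the example log above; the function names are hypothetical.

```python
import re
from pathlib import Path


def find_wandb_id(console_output):
    """Return the id from a 'wandb_id : <id>' line, or None if no such line exists."""
    match = re.search(r"^wandb_id\s*:\s*(\S+)\s*$", console_output, flags=re.MULTILINE)
    return match.group(1) if match else None


def results_dir(run_id, root="./results"):
    """Return the results directory for a run, following the id{wandbid} pattern."""
    return Path(root) / f"id{run_id}"
```

For the example log, find_wandb_id returns the id printed after "wandb_id :", and results_dir maps it to the corresponding directory below ./results.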

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Atmorep
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Christian
    family-names: Lessig
    email: [email protected]
    affiliation: European Centre for Medium-Range Weather Forecasts (ECMWF)
  - given-names: Ilaria
    family-names: Luise
    email: [email protected]
    affiliation: European Organization for Nuclear Research (CERN)
  - given-names: Martin
    family-names: Schultz
    email: [email protected]
    orcid: 'https://orcid.org/0000-0003-3455-774X'
    affiliation: Forschungszentrum Jülich (FZJ)
  - given-names: Michael
    family-names: Langguth
    email: [email protected]
    orcid: 'https://orcid.org/0000-0003-3354-5333'
    affiliation: Forschungszentrum Jülich (FZJ)
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2308.13280'
    description: corresponding Preprint
repository-code: 'https://isggit.cs.uni-magdeburg.de/atmorep/atmorep'
url: 'https://www.atmorep.org'
abstract: >-
  AtmoRep is a novel, task-independent stochastic computer 
  model of atmospheric dynamics that can provide skillful 
  results for a wide range of applications. AtmoRep uses 
  large-scale representation learning from artificial 
  intelligence to determine a general description of the 
  highly complex, stochastic dynamics of the atmosphere 
  from the best available estimate of the system's historical 
  trajectory as constrained by observations. This is enabled 
  by a novel self-supervised learning objective and a unique 
  ensemble that samples from the stochastic model with a 
  variability informed by the one in the historical record. 
  Our work establishes that large-scale neural networks can
  provide skillful, task-independent models of atmospheric
  dynamics. With this, they provide a novel means to make
  the large record of atmospheric observations accessible
  for applications and for scientific inquiry, complementing
  existing simulations based on first principles.
license: MIT
commit: b0da5b32ec70295914bbb486dbcb77885671dc45
version: 2.0 (preprint)
date-released: '2023-11-28'

Committers metadata

Last synced: 8 days ago

Total Commits: 96
Total Committers: 4
Avg Commits per committer: 24.0
Development Distribution Score (DDS): 0.646

Commits in past year: 50
Committers in past year: 4
Avg Commits per committer in past year: 12.5
Development Distribution Score (DDS) in past year: 0.42

Name Email Commits
Ilaria Luise i****e@c****h 34
Christian Lessig c****g@o****e 23
Christian Lessig c****g@g****m 20
iluise l****a@g****m 19


Issue and Pull Request metadata

Last synced: 2 days ago

Total issues: 90
Total pull requests: 49
Average time to close issues: 3 months
Average time to close pull requests: 16 days
Total issue authors: 14
Total pull request authors: 8
Average comments per issue: 3.44
Average comments per pull request: 0.92
Merged pull request: 35
Bot issues: 0
Bot pull requests: 0

Past year issues: 74
Past year pull requests: 39
Past year average time to close issues: about 1 month
Past year average time to close pull requests: 8 days
Past year issue authors: 13
Past year pull request authors: 8
Past year average comments per issue: 3.88
Past year average comments per pull request: 1.15
Past year merged pull request: 25
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/clessig/atmorep

Top Issue Authors

  • iluise (21)
  • kacpnowak (11)
  • sbAsma (10)
  • clessig (9)
  • mlangguth89 (9)
  • grassesi (7)
  • nish03 (7)
  • ankitpatnala (6)
  • sascholle (3)
  • maruf-anu (2)
  • dancivitarese (2)
  • jpolz (1)
  • Sindhu-Vasireddy (1)
  • javak87 (1)

Top Pull Request Authors

  • iluise (22)
  • clessig (9)
  • grassesi (6)
  • sbAsma (5)
  • kacpnowak (3)
  • mlangguth89 (2)
  • jpolz (1)
  • zalbanob (1)

Top Issue Labels

  • bug (16)
  • enhancement (16)
  • core model (10)
  • I/O (8)
  • good first issue (8)
  • scientific (4)
  • performance (2)
  • analysis (2)
  • question (2)
  • help wanted (2)
  • triaged (1)

Top Pull Request Labels

  • bug (6)
  • core model (4)
  • enhancement (3)
  • I/O (2)

Dependencies

setup.py pypi
  • cfgrib *
  • cloudpickle *
  • ecmwflibs *
  • matplotlib *
  • netcdf4 *
  • numpy *
  • pandas *
  • pathlib *
  • pytz *
  • torchinfo *
  • typing_extensions *
  • wandb *
  • xarray *
  • zarr *

Score: 5.908082938168931