wildfire forecasting
The project intends to reproduce the Fire Forecasting capabilities of GEFF using Deep Learning and develop further improvements in accuracy, geography and time scale through inclusion of additional variables or optimization of model architecture and hyperparameters.
https://github.com/ECMWFCode4Earth/wildfire-forecasting
Category: Biosphere
Sub Category: Wildfire
Keywords
deep-learning earth-observation gis remote-sensing wildfire-forecasting
Repository metadata
Forecasting wildfire danger using deep learning.
- Host: GitHub
- URL: https://github.com/ECMWFCode4Earth/wildfire-forecasting
- Owner: ECMWFCode4Earth
- License: gpl-3.0
- Created: 2020-05-14T14:42:07.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-01-23T21:11:40.000Z (over 3 years ago)
- Last Synced: 2025-04-17T22:07:12.229Z (9 days ago)
- Topics: deep-learning, earth-observation, gis, remote-sensing, wildfire-forecasting
- Language: Jupyter Notebook
- Homepage:
- Size: 901 MB
- Stars: 52
- Watchers: 3
- Forks: 11
- Open Issues: 1
- Releases: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README.md
Forecasting Wildfire Danger Using Deep Learning
- Introduction
- TL; DR
- Getting Started
- Running Inference
- Implementation overview
- Documentation
- Acknowledgements
Introduction
The Global ECMWF Fire Forecasting (GEFF) system, implemented in Fortran 90, is based on empirical models conceptualised several decades back. Recent GIS & Machine Learning advances could, theoretically, be used to boost these models' performance or completely replace the current forecasting system. However, thorough benchmarking is needed to compare GEFF to Deep Learning-based prediction techniques.
The project intends to reproduce the Fire Forecasting capabilities of GEFF using Deep Learning and develop further improvements in accuracy, geography and time scale through inclusion of additional variables or optimisation of model architecture & hyperparameters. Finally, a preliminary fire spread prediction tool is proposed to allow monitoring activities.
TL; DR
This codebase (and this README) is a work-in-progress. The master branch is a stable release, and we aim to address issues and introduce enhancements on a rolling basis. If you encounter a bug, please file an issue. Here are a few quick pointers that just work to get you going with the project:
- Clone & navigate into the repo and create a conda environment using environment.yml (on Ubuntu 18.04 and 20.04 only).
- All EDA and Inference notebooks must be run within this environment. Use conda activate wildfire-dl.
- Check out the EDA notebooks titled EDA_X_mini_sample.ipynb. We recommend jupyterlab.
- Check out the Inference notebooks for 1 day, 10 day, 14 day and 21 day predictions.
- The notebooks also include code to download a small sample dataset.
Next:
- See Getting Started for how to set up your local environment for training or inference
- For a detailed description of the project codebase, check out the Code_Structure_Overview
- Read the Running Inference section for testing pre-trained models on sample data.
- See Implementation Overview for details on tools & frameworks and how to retrain the model.
The work-in-progress documentation can be viewed online on wildfire-forecasting.readthedocs.io.
Getting Started
Using Binder
While we have included support for launching the repository in Binder, the limited memory offered by Binder means that you might end up with crashed/dead kernels while trying to test the Inference or the Forecast notebooks. At this point, we don't have a workaround for this issue.
Clone this repo
git clone https://github.com/esowc/wildfire-forecasting.git
cd wildfire-forecasting
Once you have cloned and navigated into the repository, you can set up a development environment using either conda or docker. Refer to the relevant instructions below, then skip to the next section on Running Inference.
Using conda
To create the environment, run:
conda env create -f environment.yml
conda clean -a
conda activate wildfire-dl
The setup is tested on Ubuntu 18.04, 20.04 and Windows 10 only. On systems with a CUDA-supported GPU and CUDA drivers set up, the conda environment and the code ensure that GPUs are used by default for training and inference. If there isn't sufficient GPU memory, this will typically lead to Out of Memory runtime errors. As a rule of thumb, around 4 GiB of GPU memory is needed for inference and around 12 GiB for training.
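As a minimal sketch (not part of the repo), you can check up front whether a CUDA device is visible and roughly how much memory it has, using standard PyTorch calls:

```python
# Minimal sketch: verify a CUDA GPU is visible and report its memory
# before launching training or inference.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024 ** 3
    print(f"Using GPU: {props.name} ({total_gib:.1f} GiB)")
    # Rule of thumb from above: ~4 GiB for inference, ~12 GiB for training.
    if total_gib < 4:
        print("Warning: likely insufficient GPU memory even for inference.")
else:
    print("No CUDA device found; falling back to CPU.")
```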
Using Docker
We include a Dockerfile & docker-compose.yml and provide detailed instructions for setting up your development environment using Docker for training on both CPUs and GPUs. Please head over to the Docker README for more details.
Running Inference
- Examples: The Inference_2_1.ipynb, Inference_4_10.ipynb, Inference_4_14.ipynb and Inference_7_21.ipynb notebooks demonstrate the end-to-end procedure of loading data, creating a model from a saved checkpoint, and getting predictions for the 2 day input, 1 day output; 4 day input, 10 day output; 4 day input, 14 day output; and 7 day input, 21 day output experiments respectively.
- Testing data: Ensure access to the fwi-forcings and fwi-reanalysis data. Limited sample data is available at gs://deepgeff-data-v0 (released for educational purposes only).
- Pre-trained model: All previously trained models are listed in pre-trained_models.md with associated metadata. Select and download the desired pre-trained model checkpoint file via gsutil from gs://deepgeff-models-v0, then set the $CHECKPOINT_FILE, $FORCINGS_DIR and $REANALYSIS_DIR paths through the flags while running testing or inference (see the checkpoint inspection sketch after this list).
  - Example usage:
python src/test.py -in-days=2 -out-days=1 -forcings-dir=${FORCINGS_DIR} -reanalysis-dir=${REANALYSIS_DIR} -checkpoint-file='path/to/checkpoint'
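Before pointing src/test.py at a downloaded checkpoint, it can help to confirm the file restored correctly. The sketch below is illustrative only: the path is a placeholder, and the keys shown are the ones PyTorch Lightning checkpoints normally carry.

```python
# Illustrative sketch: inspect a downloaded Lightning checkpoint.
# "path/to/checkpoint.ckpt" is a placeholder path.
import torch

ckpt = torch.load("path/to/checkpoint.ckpt", map_location="cpu")
print(ckpt.get("epoch"), ckpt.get("global_step"))
# Peek at the first few weight tensors and their shapes:
for name, tensor in list(ckpt["state_dict"].items())[:5]:
    print(name, tuple(tensor.shape))
```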
Implementation overview
We implement a modified U-Net style Deep Learning architecture using PyTorch 1.6. We use PyTorch Lightning for code organisation and reducing boilerplate. The mammoth size of the total original dataset (~1TB) means we use extensive GPU acceleration in the code using NVIDIA CUDA Toolkit. For a GeForce RTX 2080 with 12GB memory and 40 vCPUs with 110 GB RAM, this translates to a 25x speedup over using only 8 vCPUs with 52GB RAM.
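For a rough picture of what a U-Net-style encoder-decoder looks like, here is a minimal sketch; the repo's actual model lives in src/model/ and differs in depth, channels and other details, and the channel mapping in the comment is only an assumption:

```python
# Schematic sketch of a U-Net-style encoder-decoder (illustration only).
import torch
import torch.nn as nn

def block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=4, out_ch=1):
        super().__init__()
        self.enc1, self.enc2 = block(in_ch, 32), block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = block(64, 32)   # 64 = 32 (skip) + 32 (upsampled)
        self.head = nn.Conv2d(32, out_ch, 1)

    def forward(self, x):
        e1 = self.enc1(x)          # kept as the skip connection
        e2 = self.enc2(self.pool(e1))
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))
        return self.head(d)

# Four input channels could, for instance, carry t2, tp, wspeed and rh.
out = TinyUNet()(torch.randn(1, 4, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64])
```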
For reading geospatial datasets, we use xarray and netcdf4. The imbalanced-learn library is useful for undersampling to tackle the high data skew. Code linting and formatting is done using black and flake8.
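A minimal sketch of how such NetCDF data can be opened with xarray; the file pattern and the dimension names here are assumptions, not the repo's actual layout:

```python
# Assumed file pattern and dimension names; adjust to the actual dataset.
import xarray as xr

# Lazily open a stack of daily NetCDF files and combine them by coordinates.
ds = xr.open_mfdataset("data/fwi-forcings/*.nc", combine="by_coords")
print(ds.data_vars)
# Reduce over the spatial dimensions for a quick per-timestep summary.
daily_mean = ds.mean(dim=["latitude", "longitude"]).compute()
print(daily_mean)
```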
- The entry point for training is src/train.py. By default, as configured in the master branch, the input variables used for training the model are Temperature, Precipitation, Windspeed and Relative Humidity. Support for the additional variables Leaf Area Index, Volumetric Soil Water Level 1 and Land Skin Temperature is implemented in the respective branches:
  - For training with input variables t2, tp, wspeed and rh + additionally lai, switch to the lai branch. Note: You will additionally require the data for precisely these 5 variables in the /data dir to perform the training/inference for this combination of inputs.
  - For training with input variables t2, tp, wspeed and rh + additionally swvl1, switch to the swvl1 branch. Note: You will additionally require the data for precisely these 5 variables in the /data dir to perform the training/inference for this combination of inputs.
  - For training with input variables t2, tp, wspeed and rh + additionally skt, switch to the skt branch. Note: You will additionally require the data for precisely these 5 variables in the /data dir to perform the training/inference for this combination of inputs.
  - For training with input variables t2, tp, wspeed and rh + additionally skt as well as swvl1, switch to the skt+swvl1 branch. Note: You will additionally require the data for precisely these 6 variables in the /data dir to perform the training/inference for this combination of inputs.
  - Example usage:
python src/train.py [-in-days 4] [-out-days 1] [-forcings-dir ${FORCINGS_DIR}] [-reanalysis-dir ${REANALYSIS_DIR}]
- Dataset: We train our model on 1 year of global data. The gs://deepgeff-data-v0 dataset demonstrated in the various EDA and Inference notebooks is not intended for use with src/train.py; the scripts will fail if used with those small datasets. If you intend to re-run the training, reach out to us for access to a bigger dataset necessary for the scripts.
- Logging: We use Weights & Biases for logging our training. When running the training script, you can either provide a wandb API key or choose to skip logging altogether. W&B logging is free and lets you monitor your training remotely. You can sign up for an account and then use wandb login from inside the environment to supply the key (see the sketch after this list).
- Visualizing Results: Upon completion of training, the results summary JSON from wandb can be visualized in terms of Accuracy %, MSE % and MAE % using the plotting module.
  - Example usage:
python src/plot.py -f <file> -i <in-days> -o <out-days>
- The entry point for inference is src/test.py. Note: When performing inference for a model trained with an additional variable in any of the branches, ensure access to the respective variables in the /data dir.
  - Example usage:
python src/test.py [-in-days 4] [-out-days 1] [-forcings-dir ${FORCINGS_DIR}] [-reanalysis-dir ${REANALYSIS_DIR}] [-checkpoint-file ${CHECKPOINT_FILE}]
- Configuration Details: Optional arguments (default values indicated below):
  -h, --help            show this help message and exit
- The src/ directory contains the architecture implementation.
  - The src/dataloader/ directory contains the implementation specific to the training data.
  - The src/model/ directory contains the model implementation.
  - The src/model/base_model.py script has the common implementation used by every model.
  - The src/config/ directory stores the config files generated via training.
- The data/EDA/ directory contains the Exploratory Data Analysis and preprocessing required for the forcings data, demonstrated via Jupyter Notebooks.
  - Forcings: data/EDA/EDA_forcings_mini_sample.ipynb (Resolution: 0.07 deg x 0.07 deg, 10 days)
  - FWI-Reanalysis: data/EDA/EDA_reanalysis_mini_sample.ipynb (Resolution: 0.1 deg x 0.1 deg, 1 day)
  - FWI-Forecast: data/EDA/EDA_forecast_mini_sample.ipynb (Resolution: 0.1 deg x 0.1 deg, 10 days)
- A walk-through of the codebase is in the Code_Structure_Overview.md.
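As referenced in the Logging item above, here is a minimal sketch of supplying the W&B API key programmatically instead of running wandb login interactively; the key string is a placeholder:

```python
# Placeholder key: supply your own, or set the WANDB_API_KEY environment
# variable before launching src/train.py to skip the interactive prompt.
import wandb

wandb.login(key="your-api-key-here")
```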
Documentation
We use Sphinx for building our docs and host them on Readthedocs. The latest build of the docs can be accessed online here. In order to build the docs from source, you will need sphinx and sphinx-autoapi. Follow the instructions below:
cd docs
make html
Once the docs get built, you can access them inside docs/build/html/.
Acknowledgements
This project tackles Challenge #26 from Stream 2: Machine Learning and Artificial Intelligence, as part of the ECMWF Summer of Weather Code 2020 Program.
Team: Roshni Biswas, Anurag Saha Roy, Tejasvi S Tomar.
Owner metadata
- Name: ECMWF Code for Earth
- Login: ECMWFCode4Earth
- Email:
- Kind: organization
- Description: ECMWF Code for Earth is a collaborative programme where each summer several developer teams work on innovative earth sciences-related software.
- Website: https://codeforearth.ecmwf.int
- Location: Online
- Twitter: ECMWFCode4Earth
- Company:
- Icon url: https://avatars.githubusercontent.com/u/44897980?v=4
- Repositories: 37
- Last synced at: 2023-05-10T13:37:08.293Z
- Profile URL: https://github.com/ECMWFCode4Earth
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers metadata
Last synced: 6 days ago
Total Commits: 234
Total Committers: 4
Avg Commits per committer: 58.5
Development Distribution Score (DDS): 0.368
Commits in past year: 0
Committers in past year: 0
Avg Commits per committer in past year: 0.0
Development Distribution Score (DDS) in past year: 0.0
Name | Email | Commits
---|---|---
Roshni Biswas | r****s@d****u | 148
lazyoracle | c****t@a****e | 79
lazyoracle | a****g@a****m | 6
lazyoracle | c****t@a****y@me | 1
Committer domains:
Issue and Pull Request metadata
Last synced: 2 days ago
Total issues: 8
Total pull requests: 44
Average time to close issues: 8 days
Average time to close pull requests: 1 day
Total issue authors: 5
Total pull request authors: 3
Average comments per issue: 1.5
Average comments per pull request: 0.18
Merged pull requests: 40
Bot issues: 0
Bot pull requests: 0
Past year issues: 0
Past year pull requests: 0
Past year average time to close issues: N/A
Past year average time to close pull requests: N/A
Past year issue authors: 0
Past year pull request authors: 0
Past year average comments per issue: 0
Past year average comments per pull request: 0
Past year merged pull requests: 0
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- lazyoracle (3)
- roshni-b (2)
- 99snowleopards (1)
- cvitolo (1)
- Gunnika (1)
Top Pull Request Authors
- roshni-b (26)
- lazyoracle (17)
- cvitolo (1)
Top Issue Labels
- bug (2)
- enhancement (1)
Top Pull Request Labels
- enhancement (2)
- documentation (2)
Dependencies
- sphinx ==3.1.1
- sphinx-autoapi ==1.4.0
- gsutil ==4.52
- netcdf4 ==1.5.3
- pytorch-lightning ==0.9.0
- wandb ==0.8.36
Score: 5.356586274672012