Air Quality Forecast

This end-to-end machine learning collaboration project predicts ozone and nitrogen dioxide levels in Utrecht, framed as time-series regression.
https://github.com/atodorov284/air-quality-forecast

Category: Natural Resources
Sub Category: Air Quality

Last synced: about 2 hours ago

Repository metadata

This end-to-end machine learning collaboration project predicts ozone and nitrogen dioxide levels in Utrecht, framed as time-series regression. It was developed in the course Machine Learning for Industry at the University of Groningen.

README.md


title: Air Quality Forecasting
emoji: 📈
colorFrom: yellow
colorTo: gray
sdk: streamlit
sdk_version: 1.39.0
app_file: streamlit_src/app.py
pinned: false

Air Quality Forecast

Air pollution is a significant environmental concern, especially in urban areas, where high levels of nitrogen dioxide and ozone can harm human health, ecosystems, and overall quality of life. Given these risks, monitoring and forecasting air pollution levels is important, as it allows timely action to reduce harmful effects.

In the Netherlands, cities like Utrecht face air quality challenges due to urbanization, transportation, and industrial activity. A system that provides accurate, robust real-time air quality monitoring and reliable forecasts of future pollution levels would allow authorities and residents to take preventive measures and plan their activities around expected air quality. This project focuses on time-series forecasting of air pollution levels, specifically NO2 and O3 concentrations, for the next three days. The task is framed as a regression problem: predicting continuous values from historical environmental data. The project also provides infrastructure for real-time prediction based on recent measurements.
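As a sketch of that regression framing (a generic illustration, not the project's actual feature pipeline), pollutant history can be flattened into lagged input features, with the next days' concentrations as the regression targets:

```python
def make_supervised(series, n_lags=3, horizon=3):
    """Turn a univariate series into (lag-features, future-targets) pairs.

    Each sample uses the previous `n_lags` observations as inputs and the
    next `horizon` observations as regression targets.
    """
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])   # e.g. NO2 on days t-3 .. t-1
        y.append(series[t:t + horizon])  # NO2 on days t .. t+2
    return X, y

# Toy daily NO2 concentrations (µg/m³), for illustration only
no2 = [22.0, 25.0, 30.0, 28.0, 26.0, 24.0, 27.0]
X, y = make_supervised(no2, n_lags=3, horizon=3)
```

Any multi-output regressor can then be fit on `(X, y)`; in practice the feature vector would also include meteorological variables alongside the pollutant lags.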


Streamlit Application

Explore the interactive air quality forecast for Utrecht through our Streamlit app on Hugging Face Spaces:

Air Quality Forecasting App

🚀 How to Run the App

To launch the Utrecht Air Quality Monitoring application locally, follow these simple steps:

  1. In your terminal, navigate to the streamlit_src folder, which contains the app files.

  2. Run the Streamlit application by entering the following command:

    streamlit run app.py
    

[!TIP]
Alternative Path: If you are not in the streamlit_src folder, provide the full path to app.py. For example, from the root directory:

  • Windows:
    streamlit run .\streamlit_src\app.py
    
  • macOS/Linux:
    streamlit run ./streamlit_src/app.py
    

🚀 How to Run the Scripts

Setting Up

Clone the Repository: Start by cloning the repository to your local machine.

git clone https://github.com/atodorov284/air-quality-forecast.git
cd air-quality-forecast

Set Up Environment:
Set Up Environment:
Install all dependencies from the requirements.txt file in the repository root:

pip install -r requirements.txt

Running Source Code

First, navigate to the air_quality_forecast folder, which contains the source code for the project:

cd air_quality_forecast

📊 View the MLflow Dashboard:
To track experiments, run model_development.py, which starts an MLflow server on localhost at port 5000.

python model_development.py

[!TIP]
If the server does not start automatically, manually launch the MLflow UI with:

mlflow ui --port 5000

You might need to grant admin permissions for this process.

🔄 Using the parser to retrain the model or make predictions on new data:
Instructions for the retraining protocol and for making predictions on new data can be found in the README.md in the air_quality_forecast directory.

[!NOTE]
The retrain datasets need to be under data/retrain and the prediction dataset needs to be under data/inference.
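The exact flags are documented in that README. Purely as an illustration (the subcommand and option names below are hypothetical, not the project's actual parser_ui.py interface), a retrain/predict command-line switch with those default data directories might look like:

```python
import argparse
from pathlib import Path

# Hypothetical CLI sketch: names are illustrative, not the project's
# actual parser_ui.py interface.
def build_parser():
    parser = argparse.ArgumentParser(description="Retrain or predict.")
    sub = parser.add_subparsers(dest="mode", required=True)

    retrain = sub.add_parser("retrain", help="retrain on new data")
    retrain.add_argument("--data-dir", type=Path, default=Path("data/retrain"))

    predict = sub.add_parser("predict", help="predict on new data")
    predict.add_argument("--data-dir", type=Path, default=Path("data/inference"))
    return parser

args = build_parser().parse_args(["predict"])
```

The defaults mirror the note above: retraining reads from data/retrain, prediction from data/inference.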


[!IMPORTANT]
The notebooks in this project were used as scratch space for analysis and data merging and do not reflect our thorough methodology (the source is under air_quality_forecast). Some extra scripts used to generate the plots in the report can be found under extra_scripts.


📖 Viewing the Documentation

The project documentation is generated using Sphinx and can be viewed as HTML files. To access the documentation:

  1. Navigate to the _build/html/ directory inside the docs folder:
cd docs/_build/html
  2. Open the index.html file in your web browser using the command for your platform:
open index.html  # macOS
xdg-open index.html  # Linux
start index.html  # Windows
  3. Alternatively, navigate to index.html in your file explorer and double-click it.

📂 Project Folder Structure

├── LICENSE               <- Open-source MIT license
├── Makefile              <- Makefile with convenience commands like `make data` or `make train`
├── README.md             <- The top-level README for developers using this project.
├── data                  <- Folder containing data used for training, testing, and inference
│   ├── inference         <- Data for inference predictions
│   ├── model_predictions <- Folder containing model-generated predictions
│   ├── other             <- Additional data or miscellaneous files
│   ├── processed         <- The final, canonical data sets for modeling. Contains the train-test split.
│   └── raw               <- The original, immutable data dump.
│
├── .github               <- Contains automated workflows for reproducibility, flake8 checks, and scheduled updates.
│
├── docs                  <- Contains files to make the HTML documentation for this project using Sphinx
│
├── mlruns                <- Contains all the experiments run using MLflow.
│
├── mlartifacts           <- Contains the artifacts generated by MLflow experiments.
│
├── notebooks             <- Scratch Jupyter notebooks (not to be evaluated, source code is in air-quality-forecast)
│
├── pyproject.toml        <- Project configuration file with package metadata for
│                            air-quality-forecast and configuration for tools like black
│
├── reports               <- Generated analysis as HTML, PDF, LaTeX, etc.
│
├── requirements.txt      <- The requirements file for reproducing the analysis environment, e.g.
│                            generated with `pip freeze > requirements.txt`
│
├── setup.cfg             <- Configuration file for flake8
│
├── configs               <- Configuration folder for the hyperparameter search space (for now)
│
├── saved_models          <- Folder with the saved models in `.pkl` and `.xgb`.
│
├── extra_scripts         <- Some extra scripts in R and .tex to generate figures
│
├── streamlit_src         <- Streamlit application source code
│   ├── controllers       <- Handles application logic and data flow for different app sections
│   ├── json_interactions <- Manages JSON data interactions for configuration and storage
│   ├── models            <- Contains model loading, preprocessing, and prediction logic
│   └── views             <- Manages the UI components for different app sections
│
└── air_quality_forecast  <- Source code used in this project.
    │
    ├── api_caller.py             <- Manages API requests to retrieve air quality and meteorological data
    ├── data_pipeline.py          <- Loads, extracts, and preprocesses the data. Final result is the train-test under data/processed
    ├── get_prediction_data.py    <- Prepares input data required for generating forecasts
    ├── main.py                   <- Main entry point for executing the forecasting pipeline
    ├── model_development.py      <- Trains the models using k-fold CV and Bayesian hyperparameter tuning
    ├── parser_ui.py              <- Manages configuration settings and command-line arguments
    ├── prediction.py             <- Generates forecasts using the trained model
    └── utils.py                  <- Utility functions for common tasks across scripts



Committers metadata

Last synced: about 19 hours ago

Total Commits: 642
Total Committers: 3
Avg Commits per committer: 214.0
Development Distribution Score (DDS): 0.125

Commits in past year: 642
Committers in past year: 3
Avg Commits per committer in past year: 214.0
Development Distribution Score (DDS) in past year: 0.125

Name Email Commits
WorkflowBotAlex a****4@g****m 562
03chrisk c****r@g****m 57
LukaszSawala l****3@g****m 23



Issue and Pull Request metadata

Last synced: 1 day ago

Total issues: 0
Total pull requests: 8
Average time to close issues: N/A
Average time to close pull requests: about 2 hours
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 7
Bot issues: 0
Bot pull requests: 0

Past year issues: 0
Past year pull requests: 8
Past year average time to close issues: N/A
Past year average time to close pull requests: about 2 hours
Past year issue authors: 0
Past year pull request authors: 1
Past year average comments per issue: 0
Past year average comments per pull request: 0.0
Past year merged pull requests: 7
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/atodorov284/air-quality-forecast

Top Pull Request Authors

  • atodorov284 (8)



Dependencies

.github/workflows/running.yml actions
  • actions/checkout v2 composite
.github/workflows/style.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
  • suo/flake8-github-action releases/v1 composite
pyproject.toml pypi
requirements.txt pypi
  • numpy *
Dockerfile docker
  • python 3-slim build
mlartifacts/146369756112246655/6f1b848cc06f4bb0bc7362cc7115f6a5/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
mlartifacts/146369756112246655/75504635158f410eb328c9b66cafcbd2/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scikit-optimize ==0.10.2
  • scipy ==1.14.1
mlartifacts/146369756112246655/b33081c79d9d476da56015976d4ebd46/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
mlartifacts/146369756112246655/d55b4821b4974c04aeb9bb278c16e511/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
mlartifacts/146369756112246655/e7a15b20d5ea4ab9993de2ee987a3d7e/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
mlartifacts/146369756112246655/f975ffd80c794779a8445b2bb23687f5/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scikit-optimize ==0.10.2
  • scipy ==1.14.1
mlartifacts/366942821633260301/0f99ee9734a343f0a27bcd766e3ab1b3/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scikit-optimize ==0.10.2
  • scipy ==1.14.1
mlartifacts/366942821633260301/49c4d16d69124268880814c139255f48/artifacts/model/requirements.txt pypi
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
  • xgboost ==2.1.1
mlartifacts/366942821633260301/58eb8390ad164113b3c869bc59d343b2/artifacts/model/requirements.txt pypi
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
  • xgboost ==2.1.1
mlartifacts/366942821633260301/6370e43914f74bd5957c6070d1f219d3/artifacts/model/requirements.txt pypi
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
  • xgboost ==2.1.1
mlartifacts/366942821633260301/c4fa83acbb0a4e9195b0e3573c933349/artifacts/model/requirements.txt pypi
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
  • xgboost ==2.1.1
mlartifacts/770095492999162530/3b92772b6ee84da484d0a10c329509bf/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
mlartifacts/770095492999162530/5894d229c45e4ed5a24cd8b321962103/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scikit-optimize ==0.10.2
  • scipy ==1.14.1
mlartifacts/770095492999162530/62f57694ba4442debb458bfeb16bab78/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
mlartifacts/770095492999162530/732fae4292724380b62671c206423b30/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
mlartifacts/770095492999162530/78eeb550e9dc4ea892dfbfc20c640966/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scikit-optimize ==0.10.2
  • scipy ==1.14.1
mlartifacts/770095492999162530/8bf8aabe6ec64c92ba1c9e97179051d2/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scikit-optimize ==0.10.2
  • scipy ==1.14.1
mlartifacts/770095492999162530/9b0e2e7215e647eeb517203f99f9d55f/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
mlartifacts/770095492999162530/b13e32dae1e04397b158cdde45e2310c/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scikit-optimize ==0.10.2
  • scipy ==1.14.1
mlartifacts/770095492999162530/bb939ca36f17423bba0d862a4b248ad8/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
notebooks/mlartifacts/371509179810119961/442b56b8dbb5497d8b360a75c6933267/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scikit-optimize ==0.10.2
  • scipy ==1.14.1
notebooks/mlruns/247522349482579112/3cbfdb0b05ee4ebda2b1a998d3371311/artifacts/model/requirements.txt pypi
  • cloudpickle ==3.0.0
  • mlflow ==2.16.2
  • numpy ==1.26.2
  • pandas ==2.2.2
  • psutil ==5.9.0
  • scikit-learn ==1.5.2
  • scipy ==1.14.1
.github/workflows/predict_and_deploy.yml actions
  • actions/checkout v2 composite
  • actions/checkout v3 composite
  • actions/setup-python v2 composite

Score: 2.8903717578961645