s2spy
A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting.
https://github.com/ai4s2s/s2spy
Category: Climate Change
Sub Category: Climate Data Processing and Analysis
Keywords from Contributors
hydrology
Last synced: about 1 hour ago
Repository metadata
A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting
- Host: GitHub
- URL: https://github.com/ai4s2s/s2spy
- Owner: AI4S2S
- License: apache-2.0
- Created: 2022-05-12T13:02:53.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-10-09T08:04:12.000Z (7 months ago)
- Last Synced: 2025-04-22T12:04:06.274Z (8 days ago)
- Language: Python
- Homepage: https://ai4s2s.readthedocs.io/
- Size: 18.4 MB
- Stars: 21
- Watchers: 3
- Forks: 7
- Open Issues: 16
- Releases: 6
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: docs/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: docs/CODE_OF_CONDUCT.md
- Citation: CITATION.cff
README.md
s2spy: Boost (sub) seasonal forecasting with AI
A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting.
Why s2spy?
Producing reliable sub-seasonal to seasonal (S2S) forecasts with machine learning techniques remains a challenge. Currently, these data-driven S2S forecasts generally suffer from a lack of trust because of:
- Opaque data processing and poorly reproducible scientific outcomes
- Technical pitfalls related to machine learning-based predictability (e.g. overfitting)
- Black-box methods without sufficient explanation
To tackle these challenges, we are building s2spy, an open-source, high-level Python package. It provides an interface between artificial intelligence and expert knowledge, to boost predictability and physical understanding of S2S processes. By building on established data-handling and parallel-computing packages, it can run efficiently across different big climate data platforms. Key components will be explainable AI and causal discovery, which will support the classical scientific interplay between theory, hypothesis generation and data-driven hypothesis testing, enabling knowledge mining from data.
Developing this tool will be a community effort. It helps us achieve trustworthy data-driven forecasts by providing:
- Transparent and reproducible analyses
- Best practices in model verification
- Understanding the sources of predictability
Installation
To install the latest release of s2spy, do:
python3 -m pip install s2spy
To install the in-development version from the GitHub repository, do:
python3 -m pip install git+https://github.com/AI4S2S/s2spy.git
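After either install command, it can be useful to confirm which version ended up in the active environment. The following is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and not part of s2spy.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("s2spy")
    print(f"s2spy version: {found}" if found else "s2spy is not installed")
```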
Configure the package for development and testing
For developing and testing the package, please follow the developer guide, which can be found here.
Getting started
s2spy provides end-to-end solutions for machine learning (ML) based S2S forecasting.
Datetime operations & Data processing
In a typical ML-based S2S project, the first step is always data processing. Our calendar-based package, lilio, is used for time operations. For instance, suppose a user is looking for predictors for winter climate at seasonal timescales (~180 days). First, a Calendar object is created using daily_calendar:
>>> calendar = lilio.daily_calendar(anchor="11-30", length='180d')
>>> calendar = calendar.map_years(2020, 2021)
>>> calendar.show()
i_interval                         -1                         1
anchor_year
2021         [2021-06-03, 2021-11-30)  [2021-11-30, 2022-05-29)
2020         [2020-06-03, 2020-11-30)  [2020-11-30, 2021-05-29)
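The interval boundaries shown by the calendar follow from simple date arithmetic around the anchor date. The sketch below reproduces them with the standard library only; it is an illustration of the idea, not lilio's implementation, and the function name anchor_intervals is hypothetical.

```python
from datetime import date, timedelta


def anchor_intervals(anchor_year, month=11, day=30, length_days=180):
    """Return the two half-open intervals [left, right) around the anchor date."""
    anchor = date(anchor_year, month, day)
    span = timedelta(days=length_days)
    before = (anchor - span, anchor)  # corresponds to i_interval = -1
    after = (anchor, anchor + span)   # corresponds to i_interval = 1
    return before, after


for year in (2021, 2020):
    before, after = anchor_intervals(year)
    print(year, [d.isoformat() for d in before], [d.isoformat() for d in after])
```

Each interval is closed on the left and open on the right, so the anchor date itself belongs to the interval that follows it.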
Now, the user can load the data input_data (e.g. a pandas DataFrame) and resample it to the desired timescales configured in the calendar:
>>> calendar = calendar.map_to_data(input_data)
>>> bins = lilio.resample(calendar, input_data)
>>> bins
   anchor_year  i_interval                  interval  mean_data  target
0         2020          -1  [2020-06-03, 2020-11-30)      275.5    True
1         2020           1  [2020-11-30, 2021-05-29)       95.5   False
2         2021          -1  [2021-06-03, 2021-11-30)      640.5    True
3         2021           1  [2021-11-30, 2022-05-29)      460.5   False
Depending on data preparations, we can choose different types of calendars. For more information, see Lilio's documentation.
Cross-validation
Lilio can also generate train/test splits and perform cross-validation. To do that, a splitter is imported from sklearn.model_selection, e.g. ShuffleSplit, and used to split the resampled data:
from sklearn.model_selection import ShuffleSplit
splitter = ShuffleSplit(n_splits=3)
lilio.traintest.split_groups(splitter, bins)
All splitter classes from scikit-learn are supported; a list is available in the scikit-learn model_selection documentation. Users should follow the scikit-learn documentation on how to use a different splitter class.
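When splitting resampled data, a natural requirement is that all intervals belonging to one anchor year stay together, so that a year is never partly in the training set and partly in the test set. The following is a minimal standard-library sketch of that idea. It is an assumption-laden illustration, not the behaviour of lilio.traintest.split_groups or of scikit-learn's ShuffleSplit, and the function name and its parameters are hypothetical.

```python
import random


def shuffle_split_by_year(rows, n_splits=3, test_fraction=0.25, seed=0):
    """Yield (train_rows, test_rows) pairs without ever splitting one anchor year."""
    years = sorted({row["anchor_year"] for row in rows})
    n_test = max(1, round(test_fraction * len(years)))
    rng = random.Random(seed)  # seeded for reproducible splits
    for _ in range(n_splits):
        shuffled = years[:]
        rng.shuffle(shuffled)
        test_years = set(shuffled[:n_test])
        train = [r for r in rows if r["anchor_year"] not in test_years]
        test = [r for r in rows if r["anchor_year"] in test_years]
        yield train, test


# Hypothetical resampled table: two intervals per anchor year, six years.
rows = [
    {"anchor_year": year, "i_interval": i}
    for year in range(2016, 2022)
    for i in (-1, 1)
]
splits = list(shuffle_split_by_year(rows))
for train, test in splits:
    print(sorted({r["anchor_year"] for r in test}))
```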
Dimensionality reduction
With s2spy, we can perform dimensionality reduction on data. For instance, to perform Response Guided Dimensionality Reduction (RGDR), we configure the RGDR operator and fit it to a precursor field. The fitted operator can then be used to transform the data into the reduced clusters:
rgdr = RGDR(eps_km=600, alpha=0.05, min_area_km2=3000**2)
rgdr.fit(precursor_field, target_timeseries)
clustered_data = rgdr.transform(precursor_field)
_ = rgdr.plot_clusters(precursor_field, target_timeseries, lag=1)
(For more information about precursor_field and target_timeseries, check the complete example in this notebook.)
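The general idea behind response-guided reduction can be sketched without any dependencies: correlate each grid cell of the precursor field with the target, keep only strongly correlated cells, and average the kept cells into a few time series. The sketch below is a deliberate simplification under stated assumptions. It uses a plain correlation threshold instead of a significance level such as alpha, and groups cells by the sign of the correlation instead of spatial clustering with eps_km and min_area_km2. All names and data in it are hypothetical and it is not the RGDR implementation in s2spy.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def reduce_field(field, target, threshold=0.8):
    """Average strongly correlated cells into one series per correlation sign.

    `field` maps a grid-cell name to its time series.
    """
    groups = {"positive": [], "negative": []}
    for series in field.values():
        r = pearson(series, target)
        if r >= threshold:
            groups["positive"].append(series)
        elif r <= -threshold:
            groups["negative"].append(series)
    # One averaged time series per non-empty group.
    return {
        sign: [sum(values) / len(values) for values in zip(*members)]
        for sign, members in groups.items()
        if members
    }


# Hypothetical target and four-cell precursor field.
target = [1, 2, 3, 4, 5]
field = {
    "a": [2, 4, 6, 8, 10],  # strongly positive
    "b": [1, 2, 3, 4, 6],   # strongly positive
    "c": [5, 4, 3, 2, 1],   # strongly negative
    "d": [3, 1, 3, 1, 3],   # uncorrelated, dropped
}
reduced = reduce_field(field, target)
print(reduced)
```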
Currently, s2spy supports dimensionality reduction approaches from scikit-learn.
Tutorials
s2spy supports operations that are common in a machine learning pipeline for sub-seasonal to seasonal forecasting research. Tutorials covering supported methods and functionalities are listed in notebooks. To run these notebooks, users need to install JupyterLab. More details about each method can be found in the API reference documentation.
Advanced use cases
You can achieve more by integrating s2spy and lilio into your data-driven S2S forecast workflow! We have a magic cookbook, which includes recipes for complex machine learning based forecasting use cases. These examples show how s2spy and lilio can facilitate your workflow.
Documentation
For detailed information on using the s2spy package, visit the documentation page hosted at Read the Docs.
Contributing
If you want to contribute to the development of s2spy, have a look at the contribution guidelines.
How to cite us
Please use the Zenodo DOI to cite this package if you used it in your research.
Acknowledgements
This package was developed by the Netherlands eScience Center and Vrije Universiteit Amsterdam. Development was supported by the Netherlands eScience Center under grant number NLESC.OEC.2021.005.
This package was created with Cookiecutter and the NLeSC/python-template.
Citation (CITATION.cff)
# YAML 1.2
---
cff-version: "1.1.0"
title: "s2spy"
authors:
  - family-names: Liu
    given-names: Yang
    orcid: "https://orcid.org/0000-0002-1966-8460"
    affiliation: "Netherlands eScience Center"
  - family-names: Kalverla
    given-names: Peter
    orcid: "https://orcid.org/0000-0002-5025-7862"
    affiliation: "Netherlands eScience Center"
  - family-names: Schilperoort
    given-names: Bart
    orcid: "https://orcid.org/0000-0003-4487-9822"
    affiliation: "Netherlands eScience Center"
  - affiliation: "Netherlands eScience Center"
    family-names: Alidoost
    given-names: Fakhereh
    orcid: https://orcid.org/0000-0001-8407-6472
  - family-names: Vijverberg
    given-names: Sem
    orcid: "https://orcid.org/0000-0002-1839-2618"
    affiliation: "Vrije Universiteit Amsterdam"
  - family-names: van Ingen
    given-names: Jannes
    affiliation: "Vrije Universiteit Amsterdam"
  - affiliation: "Netherlands eScience Center"
    family-names: Donnelly
    given-names: Claire
    orcid: https://orcid.org/0000-0002-2546-4528
date-released: 2022-09-02
version: "0.4.1"
repository-code: "https://github.com/AI4S2S/s2spy"
keywords:
  - s2s
  - ai
message: "If you use this software, please cite it using these metadata."
license: Apache-2.0
Owner metadata
- Name: AI4S2S
- Login: AI4S2S
- Email:
- Kind: organization
- Description:
- Website:
- Location:
- Twitter:
- Company:
- Icon url: https://avatars.githubusercontent.com/u/102533890?v=4
- Repositories: 3
- Last synced at: 2023-03-03T20:41:12.234Z
- Profile URL: https://github.com/AI4S2S
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers metadata
Last synced: 1 day ago
Total Commits: 454
Total Committers: 10
Avg Commits per committer: 45.4
Development Distribution Score (DDS): 0.57
Commits in past year: 3
Committers in past year: 2
Avg Commits per committer in past year: 1.5
Development Distribution Score (DDS) in past year: 0.333
Name | Email | Commits |
---|---|---|
Yang | y****u@e****l | 195 |
Bart Schilperoort | b****t@g****m | 182 |
Peter Kalverla | p****a@g****m | 42 |
Bart Schilperoort | b****t@e****l | 14 |
jannesvaningen | j****n@h****m | 12 |
Sem | s****g@v****l | 2 |
Sem Vijverberg | s****g@v****l | 2 |
Claire Donnelly | 8****s | 2 |
jannesvaningen | 8****n | 2 |
NLeSC Python template | n****e | 1 |
Committer domains:
- vu.nl: 2
- esciencecenter.nl: 2
- gmx.com: 1
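The committer statistics above are consistent with a common definition of the Development Distribution Score: one minus the share of commits made by the most active committer. The sketch below checks the reported figures under that assumed definition; the function name is hypothetical, and the past-year split of 2 and 1 commits is inferred from 3 commits by 2 committers.

```python
def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 minus the top committer's share of all commits."""
    return 1 - top_committer_commits / total_commits


# All time: the top committer has 195 of 454 commits.
all_time = development_distribution_score(195, 454)
# Past year: 3 commits by 2 committers, so the top committer has 2.
past_year = development_distribution_score(2, 3)
# Average commits per committer: 454 commits over 10 committers.
average = 454 / 10

print(round(all_time, 2), round(past_year, 3), average)
```

A score near 0 means one committer dominates, while a score near 1 means work is spread across many committers. Committers are counted per identity, so one person committing under two email addresses appears twice in the table.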
Issue and Pull Request metadata
Last synced: 1 day ago
Total issues: 51
Total pull requests: 65
Average time to close issues: 2 months
Average time to close pull requests: 15 days
Total issue authors: 7
Total pull request authors: 7
Average comments per issue: 1.86
Average comments per pull request: 2.83
Merged pull request: 54
Bot issues: 0
Bot pull requests: 0
Past year issues: 1
Past year pull requests: 2
Past year average time to close issues: 14 days
Past year average time to close pull requests: 8 days
Past year issue authors: 1
Past year pull request authors: 1
Past year average comments per issue: 0.0
Past year average comments per pull request: 0.5
Past year merged pull request: 2
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- geek-yang (17)
- BSchilperoort (16)
- semvijverberg (8)
- Peter9192 (7)
- ClaireDons (1)
- jannesvaningen (1)
- pimmeerdink (1)
Top Pull Request Authors
- BSchilperoort (32)
- geek-yang (21)
- semvijverberg (4)
- ClaireDons (2)
- jannesvaningen (2)
- pimmeerdink (2)
- Peter9192 (2)
Top Issue Labels
- RDGR (15)
- enhancement (10)
- Calendar (10)
- bug (4)
- train/test (1)
- question (1)
Top Pull Request Labels
- RDGR (1)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v3 composite
- actions/checkout v3 composite
- citation-file-format/cffconvert-github-action main composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v3 composite
- gaurav-nelson/github-action-markdown-link-check v1 composite
- SonarSource/sonarcloud-github-action master composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
- lilio *
- matplotlib *
- netcdf4 *
- numpy *
- pandas *
- scikit-learn *
- scipy *
- xarray *
Score: 5.913503005638271