Harnessing the Power of Pangeo

Harnessing the Power of Pangeo: Enhancing Your Scientific Data Analysis Workflow with scalable open source tools.
https://github.com/pangeo-data/egu-2025-course

Category: Sustainable Development
Sub Category: Education

Keywords

course egu tutorial

Keywords from Contributors

pangeo

Last synced: about 13 hours ago

Repository metadata

Harnessing the Power of Pangeo: Enhancing Your Scientific Data Analysis Workflow with scalable open source tools

Host: GitHub
URL: https://github.com/pangeo-data/egu-2025-course
Owner: pangeo-data
License: mit
Created: 2025-04-06T12:56:24.000Z (11 months ago)
Default Branch: main
Last Pushed: 2025-05-02T08:18:24.000Z (10 months ago)
Last Synced: 2026-02-23T12:04:39.909Z (5 days ago)
Topics: course, egu, tutorial
Language: Jupyter Notebook
Homepage: https://pangeo-data.github.io/egu-2025-course/
Size: 2.08 MB
Stars: 11
Watchers: 10
Forks: 1
Open Issues: 0
Releases: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

EGU 2025 SC 4.14: Harnessing the Power of Pangeo

Enhancing Your Scientific Data Analysis Workflow with scalable open source tools

This course is made possible thanks to the Pangeo@EOSC platform — a reference deployment of the Pangeo ecosystem on the European Open Science Cloud — developed with the support of [CESNET](https://www.cesnet.cz/en/) through the [EGI-ACE](https://youtu.be/Vc9SZNa2-Os) and [C-SCALE](https://youtu.be/-jBkR_2_vg8) projects. We gratefully acknowledge their contributions.

The analysis and visualisation of data is fundamental to research across the Earth and space sciences. The Pangeo community has built an ecosystem of tools designed to simplify these workflows, centred around the Xarray library for n-dimensional data handling and Dask for parallel computing. This short course offers a gradual introduction to the Pangeo toolkit, through which participants will learn the skills required to scale their local scientific workflows to cloud computing or large HPC systems with minimal changes to existing code.

The course is beginner-friendly but assumes a prior understanding of the Python language. We will guide you through hands-on Jupyter notebooks that showcase scalable analysis of in-situ, satellite observation, and Earth system modelling datasets so that you can apply what you learn. By the end of this course, you will understand how to:

- Efficiently access large public data archives from Cloud storage using the Pangeo ecosystem of open source software and infrastructure.
- Leverage labelled arrays in Xarray to build accessible, reproducible workflows.
- Use chunking to scale a scientific data analysis with Dask.
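
As a rough illustration of the last two outcomes, the sketch below builds a small synthetic array, selects from it by coordinate label, and then chunks it so that Dask can evaluate a reduction lazily. It is not taken from the course notebooks: the variable names, grid, chunk size, and random data are all assumptions chosen for illustration, and it assumes `xarray`, `dask`, `numpy`, and `pandas` are installed (as in the pangeo-notebook image).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily "temperature" field (hypothetical data, not a course dataset).
time = pd.date_range("2024-01-01", periods=366, freq="D")
lat = np.linspace(-90.0, 90.0, 19)   # 10-degree grid
lon = np.linspace(0.0, 350.0, 36)    # 10-degree grid
rng = np.random.default_rng(seed=0)
temperature = xr.DataArray(
    15.0 + rng.standard_normal((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="temperature",
)

# Labelled selection: pick a grid point by coordinate value, not integer index.
point_series = temperature.sel(lat=50.0, lon=10.0, method="nearest")

# Chunking: split the array along time so Dask can work on the pieces in parallel.
# The chunk size of 61 days is an arbitrary choice for this example.
chunked = temperature.chunk({"time": 61})

# Operations on a chunked array are lazy; this only builds a task graph.
monthly_mean = chunked.groupby("time.month").mean("time")

# compute() triggers the evaluation and returns an in-memory result.
result = monthly_mean.compute()
```

With real data the same pattern applies, except that the array would usually come from opening a remote dataset instead of being generated in memory, and the chunk sizes would be chosen to match how the data is stored.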

All the Python packages and training materials used are open-source (e.g., MIT, Apache-2, CC-BY-4). Participants will need a laptop and internet access but will not need to install anything. We will be using the free and open Pangeo@EOSC (European Open Science Cloud) platform for this course. We encourage attendees from all career stages and fields of study (e.g., atmospheric sciences, cryosphere, climate, geodesy, ocean sciences) to join us for this short course. We look forward to an interactive session and will be hosting a Q&A and discussion forum at the end of the course, including opportunities to get more involved in Pangeo and open source software development. Join us to learn about open, reproducible, and scalable Earth science!

Prerequisites

We recommend that learners with no prior knowledge of Python review resources such as the Software Carpentry training material and Project Pythia in advance of this short course. Participants should bring a laptop with an internet connection. No software installation is required, as resources will be accessed online using the Pangeo@EOSC platform. Temporary user accounts will be provided for the course, and we will also teach attendees how to request an account on Pangeo@EOSC so they can continue working on the platform after the training course.

Set up

If you are participating in this short course, you are welcome to register for Pangeo@EOSC.

First, navigate to https://aai.egi.eu/signup to sign up for an account.

Then, navigate to https://aai.egi.eu/auth/realms/id/account/#/enroll?groupPath=/vo.pangeo.eu to request access.

Lastly, access Pangeo@EOSC at https://pangeo-eosc.vm.fedcloud.eu/ and sign in. Select the quay.io/pangeo/pangeo-notebook option.

Owner metadata

Name: Pangeo
Login: pangeo-data
Email:
Kind: organization
Description: A community effort for big data geoscience
Website: http://pangeo.io
Location: earth
Twitter: pangeo_data
Company:
Icon url: https://avatars.githubusercontent.com/u/23299451?v=4
Repositories: 54
Last synced at: 2023-02-27T08:40:21.915Z
Profile URL: https://github.com/pangeo-data

GitHub Events

Total

Delete event: 2
Member event: 4
Pull request event: 10
Watch event: 11
Issue comment event: 2
Push event: 18
Create event: 7

Last Year

Delete event: 2
Member event: 4
Pull request event: 10
Watch event: 11
Issue comment event: 2
Push event: 18
Create event: 7

Committers metadata

Last synced: 4 days ago

Total Commits: 20
Total Committers: 3
Avg Commits per committer: 6.667
Development Distribution Score (DDS): 0.4

Commits in past year: 20
Committers in past year: 3
Avg Commits per committer in past year: 6.667
Development Distribution Score (DDS) in past year: 0.4

Name	Email	Commits
Anne Fouilloux	a**f@s**o	12
Max Jones	1****s	5
Scott Henderson	s**q@g**m	3

Committer domains:

simula.no: 1

Issue and Pull Request metadata

Last synced: 6 months ago

Total issues: 0
Total pull requests: 5
Average time to close issues: N/A
Average time to close pull requests: 21 minutes
Total issue authors: 0
Total pull request authors: 3
Average comments per issue: 0
Average comments per pull request: 0.6
Merged pull request: 5
Bot issues: 0
Bot pull requests: 0

Past year issues: 0
Past year pull requests: 5
Past year average time to close issues: N/A
Past year average time to close pull requests: 21 minutes
Past year issue authors: 0
Past year pull request authors: 3
Past year average comments per issue: 0
Past year average comments per pull request: 0.6
Past year merged pull request: 5
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/pangeo-data/egu-2025-course

Top Issue Authors

Top Pull Request Authors

maxrjones (4)
annefou (4)
scottyhq (2)

Top Issue Labels

Top Pull Request Labels

Dependencies

.github/workflows/deploy.yml actions

actions/checkout v4 composite
actions/deploy-pages v4 composite
actions/upload-pages-artifact v3 composite
mamba-org/setup-micromamba v2 composite

environment.yml conda

black-jupyter
bottleneck 1.4.2.*
cartopy
dask
dask-gateway
folium
fsspec
graphviz
h5netcdf
hvplot
ipykernel
ipyleaflet
jupyter-book
jupyter_server
jupyterlab-myst
kerchunk
mapclassify
matplotlib 3.10.0.*
matplotlib-inline 0.1.7.*
mystmd
netcdf4
numpy 1.26.4.*
pip
pooch
pre-commit
s3fs
scikit-learn
xarray 2025.3.0.*
zarr >=3.0.6

Score: 3.4965075614664807
