Harnessing the Power of Pangeo

Harnessing the Power of Pangeo: Enhancing Your Scientific Data Analysis Workflow with scalable open source tools.
https://github.com/pangeo-data/egu-2025-course

Category: Sustainable Development
Sub Category: Education

Keywords

course egu tutorial

Keywords from Contributors

pangeo

README.md

EGU 2025 SC 4.14: Harnessing the Power of Pangeo

Enhancing Your Scientific Data Analysis Workflow with scalable open source tools

This course is made possible thanks to the Pangeo@EOSC platform — a reference deployment of the Pangeo ecosystem on the European Open Science Cloud — developed with the support of [CESNET](https://www.cesnet.cz/en/) through the [EGI-ACE](https://youtu.be/Vc9SZNa2-Os) and [C-SCALE](https://youtu.be/-jBkR_2_vg8) projects. We gratefully acknowledge their contributions.

The analysis and visualisation of data is fundamental to research across the earth and space sciences. The Pangeo community has built an ecosystem of tools designed to simplify these workflows, centred around the Xarray library for n-dimensional data handling and Dask for parallel computing. This short course offers a gradual introduction to the Pangeo toolkit, through which participants will learn the skills required to scale their local scientific workflows to cloud computing or large HPC systems with minimal changes to existing code.
The course is beginner-friendly but assumes a prior understanding of the Python language. We will guide you through hands-on Jupyter notebooks that showcase scalable analysis of in-situ, satellite observation and earth system modelling datasets, so that you can apply what you learn. By the end of this course, you will understand how to:

  • Efficiently access large public data archives from cloud storage using the Pangeo ecosystem of open source software and infrastructure.
  • Leverage labelled arrays in Xarray to build accessible, reproducible workflows.
  • Use chunking to scale a scientific data analysis with Dask.
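
As a rough illustration of the last two points, the sketch below builds a small synthetic dataset (standing in for a real cloud-hosted archive, which is not accessed here), selects data by coordinate label with Xarray, and then chunks the array so that Dask evaluates a reduction lazily. The variable names, grid, and chunk size are assumptions chosen for illustration and are not taken from the course notebooks.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily "temperature" field; all names and values are illustrative.
time = pd.date_range("2024-01-01", periods=365, freq="D")
lat = np.linspace(-90.0, 90.0, 19)   # 10-degree spacing
lon = np.linspace(0.0, 350.0, 36)    # 10-degree spacing
rng = np.random.default_rng(seed=0)
data = 15.0 + 8.0 * rng.standard_normal((time.size, lat.size, lon.size))

temperature = xr.DataArray(
    data,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="temperature",
    attrs={"units": "degC"},
)

# Labelled selection: pick by coordinate value rather than integer position.
june_tropics = temperature.sel(time="2024-06", lat=slice(-30, 30))

# Chunking: split the array into Dask-backed blocks of 73 days along time.
chunked = temperature.chunk({"time": 73})

# Operations on a chunked array are lazy; this only builds a task graph.
time_mean = chunked.mean("time")

# compute() executes the graph and returns an in-memory result.
result = time_mean.compute()
```

The same `sel`, `chunk`, and `compute` calls apply to a dataset opened from remote storage; only the step that creates the array changes.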

All the Python packages and training materials used are open-source (e.g., MIT, Apache-2, CC-BY-4). Participants will need a laptop and internet access but will not need to install anything. We will be using the free and open Pangeo@EOSC (European Open Science Cloud) platform for this course. We encourage attendees from all career stages and fields of study (e.g., atmospheric sciences, cryosphere, climate, geodesy, ocean sciences) to join us for this short course. We look forward to an interactive session and will be hosting a Q&A and discussion forum at the end of the course, including opportunities to get more involved in Pangeo and open source software development. Join us to learn about open, reproducible, and scalable Earth science!

Prerequisites

We recommend that learners with no prior knowledge of Python review resources such as the Software Carpentry training material and Project Pythia before this short course. Participants should bring a laptop with an internet connection. No software installation is required, as resources will be accessed online using the Pangeo@EOSC platform. Temporary user accounts will be provided for the course, and we will also show attendees how to request a Pangeo@EOSC account so they can continue working on the platform afterwards.

Set up

If you are participating in this short course, you are welcome to register for Pangeo@EOSC.

First, navigate to https://aai.egi.eu/signup to sign up for an account.

Then, navigate to https://aai.egi.eu/auth/realms/id/account/#/enroll?groupPath=/vo.pangeo.eu to request access.

Lastly, access Pangeo@EOSC at https://pangeo-eosc.vm.fedcloud.eu/ and sign in. Select the quay.io/pangeo/pangeo-notebook option.


Committers metadata

Total Commits: 20
Total Committers: 3
Avg Commits per committer: 6.667
Development Distribution Score (DDS): 0.4

Commits in past year: 20
Committers in past year: 3
Avg Commits per committer in past year: 6.667
Development Distribution Score (DDS) in past year: 0.4

| Name | Email | Commits |
|---|---|---|
| Anne Fouilloux | a****f@s****o | 12 |
| Max Jones | 1****s | 5 |
| Scott Henderson | s****q@g****m | 3 |


Issue and Pull Request metadata

Total issues: 0
Total pull requests: 5
Average time to close issues: N/A
Average time to close pull requests: 21 minutes
Total issue authors: 0
Total pull request authors: 3
Average comments per issue: 0
Average comments per pull request: 0.6
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0

Past year issues: 0
Past year pull requests: 5
Past year average time to close issues: N/A
Past year average time to close pull requests: 21 minutes
Past year issue authors: 0
Past year pull request authors: 3
Past year average comments per issue: 0
Past year average comments per pull request: 0.6
Past year merged pull requests: 5
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/pangeo-data/egu-2025-course

Top Pull Request Authors

  • maxrjones (4)
  • annefou (4)
  • scottyhq (2)


Dependencies

.github/workflows/deploy.yml actions
  • actions/checkout v4 composite
  • actions/deploy-pages v4 composite
  • actions/upload-pages-artifact v3 composite
  • mamba-org/setup-micromamba v2 composite
environment.yml conda
  • black-jupyter
  • bottleneck 1.4.2.*
  • cartopy
  • dask
  • dask-gateway
  • folium
  • fsspec
  • graphviz
  • h5netcdf
  • hvplot
  • ipykernel
  • ipyleaflet
  • jupyter-book
  • jupyter_server
  • jupyterlab-myst
  • kerchunk
  • mapclassify
  • matplotlib 3.10.0.*
  • matplotlib-inline 0.1.7.*
  • mystmd
  • netcdf4
  • numpy 1.26.4.*
  • pip
  • pooch
  • pre-commit
  • s3fs
  • scikit-learn
  • xarray 2025.3.0.*
  • zarr >=3.0.6
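
For anyone reproducing the environment outside Pangeo@EOSC, a quick way to compare installed versions against the pinned entries above might look like the sketch below. The `PINS` mapping and the `check_pins` helper are hypothetical names introduced here, and the check assumes each package exposes standard Python distribution metadata (which is usual, but not guaranteed, for conda-installed packages).

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the environment.yml listing above (".*" style only).
PINS = {
    "bottleneck": "1.4.2.*",
    "matplotlib": "3.10.0.*",
    "matplotlib-inline": "0.1.7.*",
    "numpy": "1.26.4.*",
    "xarray": "2025.3.0.*",
}


def check_pins(pins):
    """Return {package: status}: 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, pin in pins.items():
        prefix = pin.removesuffix("*")  # "1.4.2.*" -> "1.4.2."
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # Appending "." lets "1.4.2" match "1.4.2." while "1.4.20" does not.
        if (installed + ".").startswith(prefix):
            report[name] = "ok"
        else:
            report[name] = f"installed {installed}, pinned {pin}"
    return report


if __name__ == "__main__":
    for package, status in check_pins(PINS).items():
        print(f"{package}: {status}")
```

The `zarr >=3.0.6` entry is a lower bound rather than a prefix pin, so it is left out of this simple prefix comparison.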

Score: 3.401197381662156