Open Sustainable Technology

A curated list of open technology projects to sustain a stable climate, energy supply, biodiversity and natural resources.


Zeus

A Framework for Deep Learning Energy Measurement and Optimization.
https://github.com/ml-energy/zeus

deep-learning energy mlsys


Repository metadata

Deep Learning Energy Measurement and Optimization

README

        




Deep Learning Energy Measurement and Optimization


[![NSDI23 paper](https://custom-icon-badges.herokuapp.com/badge/NSDI'23-paper-b31b1b.svg)](https://www.usenix.org/conference/nsdi23/presentation/you)
[![Docker Hub](https://badgen.net/docker/pulls/symbioticlab/zeus?icon=docker&label=Docker%20pulls)](https://hub.docker.com/r/mlenergy/zeus)
[![Slack workspace](https://badgen.net/badge/icon/Join%20workspace/611f69?icon=slack&label=Slack)](https://join.slack.com/t/zeus-ml/shared_invite/zt-1najba5mb-WExy7zoNTyaZZfTlUWoLLg)
[![Homepage build](https://github.com/ml-energy/zeus/actions/workflows/deploy_homepage.yaml/badge.svg)](https://github.com/ml-energy/zeus/actions/workflows/deploy_homepage.yaml)
[![Apache-2.0 License](https://custom-icon-badges.herokuapp.com/github/license/ml-energy/zeus?logo=law)](/LICENSE)

---
**Project News** ⚡

- \[2024/02\] Zeus was selected as a [2024 Mozilla Technology Fund awardee](https://foundation.mozilla.org/en/blog/open-source-AI-for-environmental-justice/). Thanks, Mozilla!
- \[2023/12\] The preprint of the Perseus paper is out [here](https://arxiv.org/abs/2312.06902)!
- \[2023/10\] We released Perseus, an energy optimizer for large model training. Get started [here](https://ml.energy/zeus/perseus/)!
- \[2023/09\] We moved under [`ml-energy`](https://github.com/ml-energy)! Please stay tuned for exciting new projects!
- \[2023/07\] [`ZeusMonitor`](https://ml.energy/zeus/reference/monitor/#zeus.monitor.ZeusMonitor) was used to profile GPU time and energy consumption for the [ML.ENERGY leaderboard & Colosseum](https://ml.energy/leaderboard).
- \[2023/03\] [Chase](https://symbioticlab.org/publications/files/chase:ccai23/chase-ccai23.pdf), an automatic carbon optimization framework for DNN training, will appear at ICLR'23 workshop.
- \[2022/11\] [Carbon-Aware Zeus](https://taikai.network/gsf/hackathons/carbonhack22/projects/cl95qxjpa70555701uhg96r0ek6/idea) won the **second overall best solution award** at Carbon Hack 22.
---

Zeus is a framework for (1) measuring GPU energy consumption and (2) optimizing energy and time for DNN training.

### Measuring GPU energy

```python
from zeus.monitor import ZeusMonitor

monitor = ZeusMonitor(gpu_indices=[0,1,2,3])

monitor.begin_window("heavy computation")
# Four GPUs consuming energy like crazy!
measurement = monitor.end_window("heavy computation")

print(f"Energy: {measurement.total_energy} J")
print(f"Time : {measurement.time} s")
```

### Finding the optimal GPU power limit

Zeus silently profiles different power limits during training and converges to the optimal one.

```python
from zeus.monitor import ZeusMonitor
from zeus.optimizer import GlobalPowerLimitOptimizer

monitor = ZeusMonitor(gpu_indices=[0,1,2,3])
plo = GlobalPowerLimitOptimizer(monitor)

plo.on_epoch_begin()

for x, y in train_dataloader:
    plo.on_step_begin()
    # Learn from x and y!
    plo.on_step_end()

plo.on_epoch_end()
```
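
To make "optimal" concrete, the following is a minimal sketch of one way to pick a power limit from profiled measurements by minimizing a weighted sum of energy and time. It is not the code `GlobalPowerLimitOptimizer` runs: the `PowerLimitProfile` class, the `pick_power_limit` function, the `eta` weight, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerLimitProfile:
    """Hypothetical profiling result for one power limit."""

    power_limit_w: int        # power limit applied while profiling (W)
    energy_per_step_j: float  # measured energy per training step (J)
    time_per_step_s: float    # measured time per training step (s)


def pick_power_limit(profiles, eta=0.5):
    """Return the power limit with the lowest weighted energy/time cost.

    eta = 1.0 considers energy only; eta = 0.0 considers time only.
    Time is scaled by the largest profiled power limit so that both
    terms are expressed in joules.
    """
    if not profiles:
        raise ValueError("at least one profile is required")
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must be within [0, 1]")
    max_power_w = max(p.power_limit_w for p in profiles)

    def cost(p):
        return eta * p.energy_per_step_j + (1 - eta) * max_power_w * p.time_per_step_s

    return min(profiles, key=cost).power_limit_w


# Made-up numbers: lower limits save energy but slow each step down.
profiles = [
    PowerLimitProfile(300, 90.0, 0.30),
    PowerLimitProfile(250, 80.0, 0.31),
    PowerLimitProfile(200, 72.0, 0.34),
    PowerLimitProfile(150, 69.0, 0.42),
]

print(pick_power_limit(profiles, eta=0.5))
```

With these made-up profiles, shifting `eta` toward 1.0 favors the lowest-energy limit, and shifting it toward 0.0 favors the fastest one.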

### CLI power and energy monitor

```console
$ python -m zeus.monitor power
[2023-08-22 22:39:59,787] [PowerMonitor](power.py:134) Monitoring power usage of GPUs [0, 1, 2, 3]
2023-08-22 22:40:00.800576
{'GPU0': 66.176, 'GPU1': 68.792, 'GPU2': 66.898, 'GPU3': 67.53}
2023-08-22 22:40:01.842590
{'GPU0': 66.078, 'GPU1': 68.595, 'GPU2': 66.996, 'GPU3': 67.138}
2023-08-22 22:40:02.845734
{'GPU0': 66.078, 'GPU1': 68.693, 'GPU2': 66.898, 'GPU3': 67.236}
2023-08-22 22:40:03.848818
{'GPU0': 66.177, 'GPU1': 68.675, 'GPU2': 67.094, 'GPU3': 66.926}
^C
Total time (s): 4.421529293060303
Total energy (J):
{'GPU0': 198.52566362297537, 'GPU1': 206.22215216255188, 'GPU2': 201.08565518283845, 'GPU3': 201.79834523367884}
```
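
Power samples like these can be turned into an energy estimate by integrating over time. The sketch below applies the trapezoidal rule to the GPU0 samples from the output above. The function name `integrate_energy_joules` is hypothetical, and this is one common integration method rather than a description of how the CLI computes its total, so the result need not match the reported figure.

```python
from datetime import datetime

# (timestamp, watts) pairs for GPU0, copied from the `power` output above.
samples = [
    ("2023-08-22 22:40:00.800576", 66.176),
    ("2023-08-22 22:40:01.842590", 66.078),
    ("2023-08-22 22:40:02.845734", 66.078),
    ("2023-08-22 22:40:03.848818", 66.177),
]


def integrate_energy_joules(samples):
    """Trapezoidal integral of power (W) over time (s), giving energy (J)."""
    points = [(datetime.fromisoformat(ts), watts) for ts, watts in samples]
    energy = 0.0
    # Walk over consecutive sample pairs and add the area of each trapezoid.
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        dt = (t1 - t0).total_seconds()
        energy += 0.5 * (p0 + p1) * dt
    return energy


gpu0_energy = integrate_energy_joules(samples)
print(f"GPU0 energy over the sampled interval: {gpu0_energy:.1f} J")
```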

```console
$ python -m zeus.monitor energy
[2023-08-22 22:44:45,106] [ZeusMonitor](energy.py:157) Monitoring GPU [0, 1, 2, 3].
[2023-08-22 22:44:46,210] [zeus.utils.framework](framework.py:38) PyTorch with CUDA support is available.
[2023-08-22 22:44:46,760] [ZeusMonitor](energy.py:329) Measurement window 'zeus.monitor.energy' started.
^C[2023-08-22 22:44:50,205] [ZeusMonitor](energy.py:329) Measurement window 'zeus.monitor.energy' ended.
Total energy (J):
Measurement(time=3.4480526447296143, energy={0: 224.2969999909401, 1: 232.83799999952316, 2: 233.3100000023842, 3: 234.53700000047684})
```
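
Since energy is power integrated over time, dividing each GPU's energy by the window duration gives its average power draw. The sketch below does this with the numbers printed above. The helper name `average_power_watts` is hypothetical and not part of Zeus.

```python
# Values copied from the `energy` output above.
window_time_s = 3.4480526447296143
energy_j = {
    0: 224.2969999909401,
    1: 232.83799999952316,
    2: 233.3100000023842,
    3: 234.53700000047684,
}


def average_power_watts(energy_j, time_s):
    """Average power per GPU: energy (J) divided by elapsed time (s)."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    return {gpu: joules / time_s for gpu, joules in energy_j.items()}


avg_power = average_power_watts(energy_j, window_time_s)
total_energy = sum(energy_j.values())

for gpu, watts in avg_power.items():
    print(f"GPU{gpu}: {watts:.1f} W on average")
print(f"Total: {total_energy:.1f} J across {len(energy_j)} GPUs")
```

The per-GPU averages come out in the mid-60 W range, in line with the instantaneous readings from the `power` monitor.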

Please refer to our NSDI’23 [paper](https://www.usenix.org/conference/nsdi23/presentation/you) and [slides](https://www.usenix.org/system/files/nsdi23_slides_chung.pdf) for details.
Check out [Overview](https://ml.energy/zeus/overview/) for a summary.

Zeus is part of [The ML.ENERGY Initiative](https://ml.energy).

## Repository Organization

```
.
├── zeus/ # ⚡ Zeus Python package
│   ├── optimizer/ # - A collection of optimizers for time and energy
│   ├── monitor/ # - Programmatic power and energy measurement tools
│   ├── utils/ # - Utility functions and classes
│   ├── _legacy/ # - Legacy code mostly to keep our papers reproducible
│   ├── device.py # - Abstraction layer over compute devices.
│   └── callback.py # - Base class for HuggingFace-like training callbacks

├── docker/ # 🐳 Dockerfiles and Docker Compose files

├── examples/ # 🛠️ Examples of integrating Zeus

├── capriccio/ # 🌊 A drifting sentiment analysis dataset

└── trace/ # 🗃️ Train and power traces for various GPUs and DNNs
```

## Getting Started

Refer to [Getting started](https://ml.energy/zeus/getting_started) for complete instructions on environment setup, installation, and integration.

### Docker image

We provide a Docker image fully equipped with all dependencies and environments.
The only command you need is:

```sh
docker run -it \
    --gpus all `# Mount all GPUs` \
    --cap-add SYS_ADMIN `# Needed to change the power limit of the GPU` \
    --ipc host `# PyTorch DataLoader workers need enough shm` \
    mlenergy/zeus:latest \
    bash
```

Refer to [Environment setup](https://ml.energy/zeus/getting_started/environment/) for details.

### Examples

We provide working examples for integrating and running Zeus in the `examples/` directory.

## Extending Zeus

You can implement custom policies for batch size and power limit optimization and plug them into Zeus.

Refer to [Extending Zeus](https://ml.energy/zeus/extend/) for details.

## Carbon-Aware Zeus

The use of GPUs for training DNNs results in high carbon emissions and energy consumption. Building on top of Zeus, we introduce *Chase* -- a carbon-aware solution. *Chase* dynamically controls the energy consumption of GPUs and adapts to shifts in carbon intensity during DNN training, reducing the carbon footprint with minimal compromise on training performance. To adapt proactively, a lightweight machine learning algorithm forecasts the carbon intensity of the upcoming time frame. For more details on Chase, please refer to our [paper](https://symbioticlab.org/publications/files/chase:ccai23/chase-ccai23.pdf) and the [chase branch](https://github.com/ml-energy/zeus/tree/chase).
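
Carbon intensity matters because emissions are the product of energy consumed and the grid's carbon intensity at that time. The sketch below shows only this unit conversion. It is not Chase's algorithm, and the function name `emissions_gco2` and the intensity values are hypothetical.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def emissions_gco2(energy_j, carbon_intensity_g_per_kwh):
    """Convert measured energy (J) into emissions (gCO2) for a given
    grid carbon intensity (gCO2 per kWh)."""
    return energy_j / JOULES_PER_KWH * carbon_intensity_g_per_kwh


# Illustrative only: the same 1 kWh of GPU energy under two made-up
# grid conditions.
one_kwh_j = 3.6e6
print(emissions_gco2(one_kwh_j, 400.0))  # hypothetical high-intensity period
print(emissions_gco2(one_kwh_j, 100.0))  # hypothetical low-intensity period
```

The same training energy emits less when it is spent during low-intensity periods, which is why forecasting the upcoming carbon intensity is useful.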

## Citation

```bibtex
@inproceedings{zeus-nsdi23,
title = {Zeus: Understanding and Optimizing {GPU} Energy Consumption of {DNN} Training},
author = {Jie You and Jae-Won Chung and Mosharaf Chowdhury},
booktitle = {USENIX NSDI},
year = {2023}
}
```

## Contact
Jae-Won Chung ([email protected])


Committers metadata

Last synced: 1 day ago

Total Commits: 230
Total Committers: 8
Avg Commits per committer: 28.75
Development Distribution Score (DDS): 0.07

Commits in past year: 120
Committers in past year: 5
Avg Commits per committer in past year: 24.0
Development Distribution Score (DDS) in past year: 0.083

| Name | Email | Commits |
| --- | --- | --- |
| Jae-Won Chung | j****g@u****u | 214 |
| Parth Raut | 6****t | 5 |
| Luoxi Meng | 6****m | 4 |
| Ting Sun | s****k@g****m | 2 |
| Yongseung Lee | 5****1 | 2 |
| Luoxi Meng | l****m@u****u | 1 |
| Zhenning Yang | z****9@g****m | 1 |
| Yu Fan | 4****n | 1 |
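
The all-time summary figures above can be re-derived from the per-committer counts. The sketch below assumes the Development Distribution Score is defined as 1 minus the top committer's share of commits. That definition is an assumption, not something this page states, but it reproduces the listed 0.07.

```python
# Commit counts per committer entry, copied from the list above.
commit_counts = [214, 5, 4, 2, 2, 1, 1, 1]

total_commits = sum(commit_counts)
avg_commits = total_commits / len(commit_counts)

# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commit_counts) / total_commits

print(f"Total commits: {total_commits}")
print(f"Avg commits per committer: {avg_commits}")
print(f"DDS: {dds:.2f}")
```

A score close to 0 means that most commits come from a single committer.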


Issue and Pull Request metadata

Last synced: 2 days ago

Total issues: 26
Total pull requests: 40
Average time to close issues: 5 months
Average time to close pull requests: 4 days
Total issue authors: 4
Total pull request authors: 6
Average comments per issue: 1.38
Average comments per pull request: 0.83
Merged pull requests: 39
Bot issues: 0
Bot pull requests: 0

Past year issues: 24
Past year pull requests: 40
Past year average time to close issues: 3 months
Past year average time to close pull requests: 4 days
Past year issue authors: 3
Past year pull request authors: 6
Past year average comments per issue: 0.58
Past year average comments per pull request: 0.83
Past year merged pull requests: 39
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/ml-energy/zeus

Top Issue Authors

  • jaywonchung (22)
  • Rosie-m (2)
  • FuryMartin (1)
  • Sunt-ing (1)

Top Pull Request Authors

  • jaywonchung (27)
  • parthraut (7)
  • show981111 (2)
  • Sunt-ing (2)
  • FuryMartin (1)
  • fwrrong (1)

Top Issue Labels

  • enhancement (20)
  • good first issue (7)
  • maintenance (1)
  • documentation (1)
  • integration (1)
  • bug (1)


Package metadata

pypi.org: zeus-ml

A framework for deep learning energy measurement and optimization.

  • Homepage: https://ml.energy/zeus
  • Documentation: https://ml.energy/zeus
  • Licenses: Apache 2.0
  • Latest release: 0.9.1 (published 4 days ago)
  • Last Synced: 2024-05-09T08:36:51.809Z (2 days ago)
  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 808 Last month
  • Rankings:
    • Dependent packages count: 6.633%
    • Stargazers count: 8.786%
    • Forks count: 11.1%
    • Average: 16.531%
    • Downloads: 25.525%
    • Dependent repos count: 30.611%
  • Maintainers (2)

Dependencies

capriccio/requirements.txt pypi
  • datasets ==2.3.2
  • numpy ==1.22.3
  • pandas ==1.4.2
docs/requirements.txt pypi
  • black *
  • mkdocs-gen-files ==0.3.5
  • mkdocs-literate-nav ==0.4.1
  • mkdocs-section-index ==0.3.4
  • mkdocstrings ==0.19.0
.github/workflows/check_homepage_build.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v2 composite
.github/workflows/deploy_homepage.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v2 composite
  • cpina/github-action-push-to-another-repository v1.5 composite
.github/workflows/lint.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v2 composite
.github/workflows/publish_pypi.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v2 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/push_docker.yaml actions
  • actions/checkout v3 composite
  • docker/build-push-action v3 composite
  • docker/login-action v2 composite
  • docker/metadata-action v4 composite
  • docker/setup-buildx-action v2 composite
Dockerfile docker
  • nvidia/cuda 11.3.1-devel-ubuntu20.04 build
examples/imagenet/requirements.txt pypi
  • torch *
  • torchvision *
examples/ZeusDataLoader/capriccio/requirements.txt pypi
  • datasets >=1.8.0
  • protobuf *
  • scikit-learn *
  • scipy *
  • sentencepiece *
  • torch >=1.3
  • transformers ==4.17.0
examples/ZeusDataLoader/cifar100/requirements.txt pypi
  • torch *
  • torchvision *
examples/ZeusDataLoader/imagenet/requirements.txt pypi
  • torch *
  • torchvision *
pyproject.toml pypi
  • numpy *
  • nvidia-ml-py *
  • pandas *
  • pydantic *
  • rich *
  • scikit-learn *

Score: 13.738085088998234