Open Sustainable Technology
A curated list of open technology projects to sustain a stable climate, energy supply, biodiversity and natural resources.
Zeus
A Framework for Deep Learning Energy Measurement and Optimization.
https://github.com/ml-energy/zeus
deep-learning energy mlsys
Last synced: about 21 hours ago
Repository metadata
Deep Learning Energy Measurement and Optimization
- Host: GitHub
- URL: https://github.com/ml-energy/zeus
- Owner: ml-energy
- License: apache-2.0
- Created: 2022-08-13T21:20:30.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-05-08T18:03:38.000Z (3 days ago)
- Last Synced: 2024-05-09T22:49:08.724Z (1 day ago)
- Topics: deep-learning, energy, mlsys
- Language: Python
- Homepage: https://ml.energy/zeus
- Size: 20.3 MB
- Stars: 131
- Watchers: 7
- Forks: 18
- Open Issues: 12
- Metadata Files:
  - Readme: README.md
  - Contributing: CONTRIBUTING.md
  - License: LICENSE
README
Deep Learning Energy Measurement and Optimization
[![NSDI23 paper](https://custom-icon-badges.herokuapp.com/badge/NSDI'23-paper-b31b1b.svg)](https://www.usenix.org/conference/nsdi23/presentation/you)
[![Docker Hub](https://badgen.net/docker/pulls/symbioticlab/zeus?icon=docker&label=Docker%20pulls)](https://hub.docker.com/r/mlenergy/zeus)
[![Slack workspace](https://badgen.net/badge/icon/Join%20workspace/611f69?icon=slack&label=Slack)](https://join.slack.com/t/zeus-ml/shared_invite/zt-1najba5mb-WExy7zoNTyaZZfTlUWoLLg)
[![Homepage build](https://github.com/ml-energy/zeus/actions/workflows/deploy_homepage.yaml/badge.svg)](https://github.com/ml-energy/zeus/actions/workflows/deploy_homepage.yaml)
[![Apache-2.0 License](https://custom-icon-badges.herokuapp.com/github/license/ml-energy/zeus?logo=law)](/LICENSE)

---

**Project News** ⚡

- \[2024/02\] Zeus was selected as a [2024 Mozilla Technology Fund awardee](https://foundation.mozilla.org/en/blog/open-source-AI-for-environmental-justice/). Thanks, Mozilla!
- \[2023/12\] The preprint of the Perseus paper is out [here](https://arxiv.org/abs/2312.06902)!
- \[2023/10\] We released Perseus, an energy optimizer for large model training. Get started [here](https://ml.energy/zeus/perseus/)!
- \[2023/09\] We moved to under [`ml-energy`](https://github.com/ml-energy)! Please stay tuned for new exciting projects!
- \[2023/07\] [`ZeusMonitor`](https://ml.energy/zeus/reference/monitor/#zeus.monitor.ZeusMonitor) was used to profile GPU time and energy consumption for the [ML.ENERGY leaderboard & Colosseum](https://ml.energy/leaderboard).
- \[2023/03\] [Chase](https://symbioticlab.org/publications/files/chase:ccai23/chase-ccai23.pdf), an automatic carbon optimization framework for DNN training, will appear at ICLR'23 workshop.
- \[2022/11\] [Carbon-Aware Zeus](https://taikai.network/gsf/hackathons/carbonhack22/projects/cl95qxjpa70555701uhg96r0ek6/idea) won the **second overall best solution award** at Carbon Hack 22.

---

Zeus is a framework for (1) measuring GPU energy consumption and (2) optimizing energy and time for DNN training.

### Measuring GPU energy
```python
from zeus.monitor import ZeusMonitor

monitor = ZeusMonitor(gpu_indices=[0,1,2,3])

monitor.begin_window("heavy computation")
# Four GPUs consuming energy like crazy!
measurement = monitor.end_window("heavy computation")

print(f"Energy: {measurement.total_energy} J")
print(f"Time : {measurement.time} s")
```

### Finding the optimal GPU power limit
Zeus silently profiles different power limits during training and converges to the optimal one.
```python
from zeus.monitor import ZeusMonitor
from zeus.optimizer import GlobalPowerLimitOptimizer

monitor = ZeusMonitor(gpu_indices=[0,1,2,3])
plo = GlobalPowerLimitOptimizer(monitor)

plo.on_epoch_begin()

for x, y in train_dataloader:
    plo.on_step_begin()
    # Learn from x and y!
    plo.on_step_end()

plo.on_epoch_end()
```

### CLI power and energy monitor
```console
$ python -m zeus.monitor power
[2023-08-22 22:39:59,787] [PowerMonitor](power.py:134) Monitoring power usage of GPUs [0, 1, 2, 3]
2023-08-22 22:40:00.800576
{'GPU0': 66.176, 'GPU1': 68.792, 'GPU2': 66.898, 'GPU3': 67.53}
2023-08-22 22:40:01.842590
{'GPU0': 66.078, 'GPU1': 68.595, 'GPU2': 66.996, 'GPU3': 67.138}
2023-08-22 22:40:02.845734
{'GPU0': 66.078, 'GPU1': 68.693, 'GPU2': 66.898, 'GPU3': 67.236}
2023-08-22 22:40:03.848818
{'GPU0': 66.177, 'GPU1': 68.675, 'GPU2': 67.094, 'GPU3': 66.926}
^C
Total time (s): 4.421529293060303
Total energy (J):
{'GPU0': 198.52566362297537, 'GPU1': 206.22215216255188, 'GPU2': 201.08565518283845, 'GPU3': 201.79834523367884}
```

```console
$ python -m zeus.monitor energy
[2023-08-22 22:44:45,106] [ZeusMonitor](energy.py:157) Monitoring GPU [0, 1, 2, 3].
[2023-08-22 22:44:46,210] [zeus.utils.framework](framework.py:38) PyTorch with CUDA support is available.
[2023-08-22 22:44:46,760] [ZeusMonitor](energy.py:329) Measurement window 'zeus.monitor.energy' started.
^C[2023-08-22 22:44:50,205] [ZeusMonitor](energy.py:329) Measurement window 'zeus.monitor.energy' ended.
Total energy (J):
Measurement(time=3.4480526447296143, energy={0: 224.2969999909401, 1: 232.83799999952316, 2: 233.3100000023842, 3: 234.53700000047684})
```

Please refer to our NSDI’23 [paper](https://www.usenix.org/conference/nsdi23/presentation/you) and [slides](https://www.usenix.org/system/files/nsdi23_slides_chung.pdf) for details.
Check out [Overview](https://ml.energy/zeus/overview/) for a summary.

Zeus is part of [The ML.ENERGY Initiative](https://ml.energy).
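
The power samples printed by `python -m zeus.monitor power` above can be turned into a rough energy figure by integrating power over time. The following is a minimal sketch that assumes simple trapezoidal integration; `integrate_energy` is a hypothetical helper and not part of Zeus, and its result need not match the total that the tool itself reports, since Zeus's own accounting may differ.

```python
from datetime import datetime


def integrate_energy(timestamps, watts):
    """Approximate energy in joules from power samples (W) taken at the given datetimes."""
    if len(timestamps) != len(watts):
        raise ValueError("timestamps and watts must have the same length")
    energy_j = 0.0
    for i in range(1, len(timestamps)):
        dt = (timestamps[i] - timestamps[i - 1]).total_seconds()
        # Trapezoid rule: average of two adjacent samples times the elapsed time.
        energy_j += 0.5 * (watts[i] + watts[i - 1]) * dt
    return energy_j


# GPU0 samples copied from the sample CLI output above.
stamps = [
    datetime.fromisoformat(s)
    for s in (
        "2023-08-22 22:40:00.800576",
        "2023-08-22 22:40:01.842590",
        "2023-08-22 22:40:02.845734",
        "2023-08-22 22:40:03.848818",
    )
]
gpu0_watts = [66.176, 66.078, 66.078, 66.177]

gpu0_energy = integrate_energy(stamps, gpu0_watts)
print(f"GPU0: {gpu0_energy:.1f} J over {(stamps[-1] - stamps[0]).total_seconds():.2f} s")
```

For programmatic measurement, `ZeusMonitor` windows as shown earlier are the simpler route; this sketch only illustrates how power samples relate to energy.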
## Repository Organization
```
.
├── zeus/ # ⚡ Zeus Python package
│ ├── optimizer/ # - A collection of optimizers for time and energy
│ ├── monitor/ # - Programmatic power and energy measurement tools
│ ├── utils/ # - Utility functions and classes
│ ├── _legacy/ # - Legacy code mostly to keep our papers reproducible
│ ├── device.py # - Abstraction layer over compute devices.
│ └── callback.py # - Base class for HuggingFace-like training callbacks
│
├── docker/ # 🐳 Dockerfiles and Docker Compose files
│
├── examples/ # 🛠️ Examples of integrating Zeus
│
├── capriccio/ # 🌊 A drifting sentiment analysis dataset
│
└── trace/ # 🗃️ Train and power traces for various GPUs and DNNs
```

## Getting Started
Refer to [Getting started](https://ml.energy/zeus/getting_started) for complete instructions on environment setup, installation, and integration.
### Docker image
We provide a Docker image fully equipped with all dependencies and environments.
The only command you need is:

```sh
docker run -it \
    --gpus all                  `# Mount all GPUs` \
    --cap-add SYS_ADMIN         `# Needed to change the power limit of the GPU` \
    --ipc host                  `# PyTorch DataLoader workers need enough shm` \
    mlenergy/zeus:latest \
    bash
```

Refer to [Environment setup](https://ml.energy/zeus/getting_started/environment/) for details.
### Examples
We provide working examples for integrating and running Zeus in the `examples/` directory.
## Extending Zeus
You can implement custom policies for batch size and power limit optimization and plug them into Zeus.
Refer to [Extending Zeus](https://ml.energy/zeus/extend/) for details.
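
To give a feel for what a power limit policy decides, here is a minimal, self-contained sketch. It deliberately does not use Zeus's actual extension interface, which is described in the linked documentation. `PowerLimitProfile`, `select_min_energy_within_slowdown`, and the profiled numbers are all hypothetical and only illustrate one possible policy: pick the lowest-energy power limit whose step time stays within a tolerated slowdown of the fastest one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerLimitProfile:
    power_limit_w: float  # GPU power limit that was profiled (W)
    energy_j: float       # energy consumed per training step (J)
    time_s: float         # time taken per training step (s)


def select_min_energy_within_slowdown(profiles, max_slowdown=1.1):
    """Return the power limit with the least energy among those that are fast enough."""
    if not profiles:
        raise ValueError("at least one profile is required")
    fastest = min(p.time_s for p in profiles)
    # Keep only power limits whose step time is within the allowed slowdown factor.
    feasible = [p for p in profiles if p.time_s <= fastest * max_slowdown]
    return min(feasible, key=lambda p: p.energy_j).power_limit_w


# Illustrative, made-up measurements (not from any real GPU).
profiles = [
    PowerLimitProfile(300.0, 60.0, 0.20),
    PowerLimitProfile(250.0, 52.0, 0.21),
    PowerLimitProfile(200.0, 47.0, 0.23),
    PowerLimitProfile(150.0, 45.0, 0.30),
]

print(select_min_energy_within_slowdown(profiles, max_slowdown=1.1))
print(select_min_energy_within_slowdown(profiles, max_slowdown=1.2))
```

Loosening `max_slowdown` lets the policy trade more training time for lower energy, which is the kind of knob a custom policy would typically expose.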
## Carbon-Aware Zeus
Training DNNs on GPUs consumes a large amount of energy and results in high carbon emissions. Building on top of Zeus, we introduce *Chase*, a carbon-aware solution. *Chase* dynamically controls the energy consumption of GPUs and adapts to shifts in carbon intensity during DNN training, reducing the carbon footprint with minimal compromise on training performance. To adapt proactively, a lightweight machine learning algorithm forecasts the carbon intensity of the upcoming time frame. For more details on Chase, please refer to our [paper](https://symbioticlab.org/publications/files/chase:ccai23/chase-ccai23.pdf) and the [chase branch](https://github.com/ml-energy/zeus/tree/chase).
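
The link between measured energy and carbon footprint is simple arithmetic: emissions equal energy (in kWh) times the grid's carbon intensity (in gCO2/kWh). The sketch below is not taken from Chase; `carbon_emissions_g` is a hypothetical helper and the intensity values are made-up placeholders. It only shows why the same training energy emits less when it is spent during a low-intensity period.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def carbon_emissions_g(energy_j, carbon_intensity_g_per_kwh):
    """Convert energy in joules to grams of CO2 at a given grid carbon intensity."""
    return energy_j / JOULES_PER_KWH * carbon_intensity_g_per_kwh


energy_j = 7.2e6  # illustrative: 2 kWh of GPU energy

# Same energy, two hypothetical grid conditions.
high = carbon_emissions_g(energy_j, carbon_intensity_g_per_kwh=500.0)
low = carbon_emissions_g(energy_j, carbon_intensity_g_per_kwh=250.0)
print(f"high-intensity period: {high:.0f} gCO2, low-intensity period: {low:.0f} gCO2")
```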
## Citation
```bibtex
@inproceedings{zeus-nsdi23,
title = {Zeus: Understanding and Optimizing {GPU} Energy Consumption of {DNN} Training},
author = {Jie You and Jae-Won Chung and Mosharaf Chowdhury},
booktitle = {USENIX NSDI},
year = {2023}
}
```

## Contact
Jae-Won Chung ([email protected])
Owner metadata
- Name: ML.ENERGY
- Login: ml-energy
- Email:
- Kind: organization
- Description: Making modern ML energy-efficient
- Website: https://ml.energy
- Location: Ann Arbor, MI
- Twitter:
- Company:
- Icon url: https://avatars.githubusercontent.com/u/109987045?v=4
- Repositories: 3
- Last synced at: 2023-07-06T15:38:31.796Z
- Profile URL: https://github.com/ml-energy
GitHub Events
Total
- Fork event: 10
- Create event: 26
- Release event: 3
- Issues event: 28
- Watch event: 44
- Delete event: 20
- Member event: 2
- Issue comment event: 34
- Push event: 269
- Public event: 1
- Pull request review event: 174
- Pull request review comment event: 235
- Pull request event: 34
Last Year
- Create event: 26
- Delete event: 22
- Fork event: 11
- Issue comment event: 37
- Issues event: 30
- Member event: 2
- Pull request event: 36
- Pull request review comment event: 241
- Pull request review event: 182
- Push event: 242
- Release event: 3
- Watch event: 43
Committers metadata
Last synced: 1 day ago
Total Commits: 230
Total Committers: 8
Avg Commits per committer: 28.75
Development Distribution Score (DDS): 0.07
Commits in past year: 120
Committers in past year: 5
Avg Commits per committer in past year: 24.0
Development Distribution Score (DDS) in past year: 0.083
Name | Email | Commits |
---|---|---|
Jae-Won Chung | j****g@u****u | 214 |
Parth Raut | 6****t | 5 |
Luoxi Meng | 6****m | 4 |
Ting Sun | s****k@g****m | 2 |
Yongseung Lee | 5****1 | 2 |
Luoxi Meng | l****m@u****u | 1 |
Zhenning Yang | z****9@g****m | 1 |
Yu Fan | 4****n | 1 |
Committer domains:
- umich.edu: 2
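
The summary figures above can be reproduced from the committer table. This sketch assumes the Development Distribution Score is defined as one minus the top committer's share of all commits; the page does not state the definition, but that assumption reproduces the listed value of 0.07.

```python
# Commit counts per committer, copied from the table above.
commit_counts = [214, 5, 4, 2, 2, 1, 1, 1]

total_commits = sum(commit_counts)
avg_commits = total_commits / len(commit_counts)

# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commit_counts) / total_commits

print(f"Total commits: {total_commits}")
print(f"Avg commits per committer: {avg_commits}")
print(f"DDS: {dds:.2f}")
```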
Issue and Pull Request metadata
Last synced: 2 days ago
Total issues: 26
Total pull requests: 40
Average time to close issues: 5 months
Average time to close pull requests: 4 days
Total issue authors: 4
Total pull request authors: 6
Average comments per issue: 1.38
Average comments per pull request: 0.83
Merged pull request: 39
Bot issues: 0
Bot pull requests: 0
Past year issues: 24
Past year pull requests: 40
Past year average time to close issues: 3 months
Past year average time to close pull requests: 4 days
Past year issue authors: 3
Past year pull request authors: 6
Past year average comments per issue: 0.58
Past year average comments per pull request: 0.83
Past year merged pull request: 39
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- jaywonchung (22)
- Rosie-m (2)
- FuryMartin (1)
- Sunt-ing (1)
Top Pull Request Authors
- jaywonchung (27)
- parthraut (7)
- show981111 (2)
- Sunt-ing (2)
- FuryMartin (1)
- fwrrong (1)
Top Issue Labels
- enhancement (20)
- good first issue (7)
- maintenance (1)
- documentation (1)
- integration (1)
- bug (1)
Top Pull Request Labels
Package metadata
- Total packages: 1
- Total downloads:
  - pypi: 808 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 16
- Total maintainers: 1
pypi.org: zeus-ml
A framework for deep learning energy measurement and optimization.
- Homepage: https://ml.energy/zeus
- Documentation: https://ml.energy/zeus
- Licenses: Apache 2.0
- Latest release: 0.9.1 (published 4 days ago)
- Last Synced: 2024-05-09T08:36:51.809Z (2 days ago)
- Versions: 16
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 808 Last month
- Rankings:
  - Dependent packages count: 6.633%
  - Stargazers count: 8.786%
  - Forks count: 11.1%
  - Average: 16.531%
  - Downloads: 25.525%
  - Dependent repos count: 30.611%
- Maintainers (2)
Dependencies
- datasets ==2.3.2
- numpy ==1.22.3
- pandas ==1.4.2
- black *
- mkdocs-gen-files ==0.3.5
- mkdocs-literate-nav ==0.4.1
- mkdocs-section-index ==0.3.4
- mkdocstrings ==0.19.0
- actions/checkout v3 composite
- actions/setup-python v2 composite
- actions/checkout v3 composite
- actions/setup-python v2 composite
- cpina/github-action-push-to-another-repository v1.5 composite
- actions/checkout v3 composite
- actions/setup-python v2 composite
- actions/checkout v3 composite
- actions/setup-python v2 composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/checkout v3 composite
- docker/build-push-action v3 composite
- docker/login-action v2 composite
- docker/metadata-action v4 composite
- docker/setup-buildx-action v2 composite
- nvidia/cuda 11.3.1-devel-ubuntu20.04 build
- torch *
- torchvision *
- datasets >=1.8.0
- protobuf *
- scikit-learn *
- scipy *
- sentencepiece *
- torch >=1.3
- transformers ==4.17.0
- torch *
- torchvision *
- torch *
- torchvision *
- numpy *
- nvidia-ml-py *
- pandas *
- pydantic *
- rich *
- scikit-learn *
Score: 13.738085088998234