Carbonara
Enrichment pipeline for CUR / FOCUS reports which adds energy and carbon data, allowing you to report on and reduce the impact of your cloud usage.
https://github.com/digitalpebble/carbonara
Category: Consumption
Sub Category: Computation and Communication
Keywords
apachespark aws carbon-emissions climate cloud focus greenops greensoftware sustainability
Last synced: about 17 hours ago
Repository metadata
Enrichment pipeline for CUR / FOCUS reports which adds energy and carbon data, allowing you to report on and reduce the impact of your cloud usage.
- Host: GitHub
- URL: https://github.com/digitalpebble/carbonara
- Owner: DigitalPebble
- License: apache-2.0
- Created: 2025-05-22T14:59:47.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2025-06-25T16:00:51.000Z (1 day ago)
- Last Synced: 2025-06-25T17:21:59.008Z (1 day ago)
- Topics: apachespark, aws, carbon-emissions, climate, cloud, focus, greenops, greensoftware, sustainability
- Language: Java
- Homepage:
- Size: 318 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 8
- Releases: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README.md
CARBONARA
Carbonara helps estimate the environmental impact of your cloud usage. By leveraging open source models and data, it enriches
usage reports generated by cloud providers and allows you to build reports and visualisations. Having the greenops and finops data in the same
place makes it easier to expose your costs and impacts side by side.
Carbonara uses Apache Spark to read and write the usage reports (typically in Parquet format) in a scalable way and, thanks to its modular approach,
splits the enrichment of the data into configurable stages.
A typical sequence of stages would be:
- estimation of embedded emissions from resources used
- estimation of energy used
- application of PUE and other overheads
- application of carbon intensity factors
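The stages above can be sketched as simple arithmetic on a single usage line item. This is a minimal illustration only, not Carbonara's actual implementation; the PUE and grid carbon-intensity values below are hypothetical placeholders.

```python
# Sketch of the enrichment stages for one line item.
# All factors are illustrative placeholders, not Carbonara's real data.

def enrich(usage_kwh: float, embodied_g: float,
           pue: float = 1.2,
           grid_intensity_g_per_kwh: float = 400.0) -> dict:
    """Apply a PUE overhead, then a grid carbon-intensity factor."""
    facility_kwh = usage_kwh * pue  # datacentre overhead (PUE)
    operational_g = facility_kwh * grid_intensity_g_per_kwh  # carbon intensity
    return {
        "energy_usage_kwh": facility_kwh,
        "operational_emissions_co2eq_g": operational_g,
        "embodied_emissions_co2eq_g": embodied_g,
    }

# 0.5 kWh of server energy with a hypothetical embodied share of 10 g
row = enrich(usage_kwh=0.5, embodied_g=10.0)
```

Carbonara runs each such step as a separate configurable Spark stage, so factors like PUE or regional carbon intensity can be swapped or reordered without touching the rest of the pipeline.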
Please note that this is currently a prototype which handles only CUR reports from AWS. Not all AWS services are covered.
One of the benefits of using Apache Spark is that you can use EMR on AWS to enrich
the CURs at scale without having to export or expose any of your data.
Prerequisites
You will need CUR reports as input. These are generated via AWS Data Exports and stored on S3 as Parquet files.
Local install
With Apache Maven, Java and Apache Spark installed locally and added to the $PATH:
mvn clean package
spark-submit --class com.digitalpebble.carbonara.SparkJob --driver-memory 4g ./target/carbonara-1.0.jar ./curs ./output
Docker
Build the Docker image with
docker build -t digitalpebble/carbonara:1.0 .
The command below processes the data locally by mounting the directories containing the CURs and output as volumes:
docker run -it -v ./curs:/curs -v ./output:/output digitalpebble/carbonara:1.0 \
/opt/spark/bin/spark-submit \
--class com.digitalpebble.carbonara.SparkJob \
--driver-memory 4g \
--master 'local[*]' \
/usr/local/lib/carbonara-1.0.jar \
/curs /output/enriched
Explore the output
Using DuckDB
create table enriched_curs as select * from 'output/*/*.parquet';
select line_item_product_code, product_servicecode,
round(sum(operational_emissions_co2eq_g),2) as co2_usage_g,
round(sum(energy_usage_kwh),2) as energy_usage_kwh
from enriched_curs where operational_emissions_co2eq_g > 0.01
group by line_item_product_code, product_servicecode order by co2_usage_g desc;
This should give an output similar to:
line_item_product_code | product_servicecode | co2_usage_g | energy_usage_kwh
---|---|---|---
AmazonS3 | AWSDataTransfer | 659.2 | 3.31
AmazonRDS | AWSDataTransfer | 361.59 | 1.09
AmazonEC2 | AWSDataTransfer | 162.59 | 1.43
AmazonECR | AWSDataTransfer | 88.75 | 0.8
AmazonVPC | AWSDataTransfer | 40.55 | 0.38
AWSELB | AWSDataTransfer | 6.3 | 0.06
Owner metadata
- Name: DigitalPebble Ltd
- Login: DigitalPebble
- Email: [email protected]
- Kind: organization
- Description:
- Website: http://www.digitalpebble.com
- Location: Bristol, UK
- Twitter:
- Company:
- Icon url: https://avatars.githubusercontent.com/u/1726647?v=4
- Repositories: 27
- Last synced at: 2024-11-24T19:46:52.245Z
- Profile URL: https://github.com/DigitalPebble
GitHub Events
Total
- Issues event: 4
- Delete event: 1
- Issue comment event: 1
- Push event: 7
- Public event: 1
- Gollum event: 1
- Pull request event: 2
- Create event: 1
Last Year
- Issues event: 4
- Delete event: 1
- Issue comment event: 1
- Push event: 7
- Public event: 1
- Gollum event: 1
- Pull request event: 2
- Create event: 1
Committers metadata
Last synced: 4 days ago
Total Commits: 17
Total Committers: 1
Avg Commits per committer: 17.0
Development Distribution Score (DDS): 0.0
Commits in past year: 17
Committers in past year: 1
Avg Commits per committer in past year: 17.0
Development Distribution Score (DDS) in past year: 0.0
Name | Email | Commits
---|---|---
Julien Nioche | j****n@d****m | 17
Committer domains:
Issue and Pull Request metadata
Last synced: 1 day ago
Total issues: 9
Total pull requests: 2
Average time to close issues: 20 minutes
Average time to close pull requests: 4 minutes
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.11
Average comments per pull request: 0.5
Merged pull request: 2
Bot issues: 0
Bot pull requests: 0
Past year issues: 9
Past year pull requests: 2
Past year average time to close issues: 20 minutes
Past year average time to close pull requests: 4 minutes
Past year issue authors: 1
Past year pull request authors: 1
Past year average comments per issue: 0.11
Past year average comments per pull request: 0.5
Past year merged pull request: 2
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- jnioche (9)
Top Pull Request Authors
- jnioche (2)
Top Issue Labels
- good first issue (4)
- help wanted (4)
- enhancement (4)
- documentation (1)
Top Pull Request Labels
Dependencies
- apache/spark 4.0.0-java21 build
- maven 3.9.9-eclipse-temurin-21 build
- org.apache.spark:spark-sql_2.13 4.0.0 provided
Score: 2.302585092994046