Carbonara

Enrichment pipeline for CUR / FOCUS reports which adds energy and carbon data, allowing you to report on and reduce the impact of your cloud usage.
https://github.com/digitalpebble/carbonara

Category: Consumption
Sub Category: Computation and Communication

Keywords

apachespark aws carbon-emissions climate cloud focus greenops greensoftware sustainability


Repository metadata

README.md

CARBONARA

Carbonara helps estimate the environmental impact of your cloud usage. By leveraging open-source models and data, it enriches
usage reports generated by cloud providers and allows you to build reports and visualisations. Having the GreenOps and FinOps data in the same
place makes it easier to show your costs and impacts side by side.

Carbonara uses Apache Spark to read and write the usage reports (typically in Parquet format) in a scalable way and, thanks to its modular approach,
splits the enrichment of the data into configurable stages.

A typical sequence of stages would be:

  • estimation of embedded emissions from resources used
  • estimation of energy used
  • application of PUE and other overheads
  • application of carbon intensity factors
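
To make the sequence above concrete, here is a minimal Python sketch of how such stages could be chained. It is not Carbonara's implementation (which runs on Apache Spark). The class LineItem, the stage functions, the field embedded_emissions_co2eq_g and every numeric factor are hypothetical and chosen only for illustration. Only the field names energy_usage_kwh and operational_emissions_co2eq_g are borrowed from the enriched output.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class LineItem:
    """Hypothetical, simplified stand-in for one row of a usage report."""
    usage_hours: float
    embedded_emissions_co2eq_g: float = 0.0   # hypothetical field name
    energy_usage_kwh: float = 0.0
    operational_emissions_co2eq_g: float = 0.0


# A stage takes a line item and returns an enriched copy of it.
Stage = Callable[[LineItem], LineItem]


def embedded_emissions(g_per_hour: float) -> Stage:
    """Embedded emissions as a flat amount per hour of use (assumed model)."""
    return lambda item: replace(
        item, embedded_emissions_co2eq_g=item.usage_hours * g_per_hour)


def energy_estimate(watts: float) -> Stage:
    """Energy from an assumed average power draw in watts."""
    return lambda item: replace(
        item, energy_usage_kwh=item.usage_hours * watts / 1000.0)


def pue_overhead(pue: float) -> Stage:
    """Scale the energy by the data centre's PUE."""
    return lambda item: replace(
        item, energy_usage_kwh=item.energy_usage_kwh * pue)


def carbon_intensity(g_per_kwh: float) -> Stage:
    """Convert energy to operational emissions with a grid intensity factor."""
    return lambda item: replace(
        item, operational_emissions_co2eq_g=item.energy_usage_kwh * g_per_kwh)


def enrich(item: LineItem, stages: List[Stage]) -> LineItem:
    """Apply the configured stages in order."""
    for stage in stages:
        item = stage(item)
    return item


# Illustrative factors only: 2 g/h embedded, 100 W, PUE 1.5, 400 gCO2eq/kWh.
pipeline = [
    embedded_emissions(2.0),
    energy_estimate(100.0),
    pue_overhead(1.5),
    carbon_intensity(400.0),
]
result = enrich(LineItem(usage_hours=10.0), pipeline)
print(result)
```

The order matters in this sketch: the PUE stage must run after the energy estimate and before the carbon intensity stage, otherwise the overhead would not be reflected in the operational emissions.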

Please note that this is currently a prototype which handles only CUR reports from AWS. Not all AWS services are covered.

One of the benefits of using Apache Spark is that you can use EMR on AWS to enrich
the CURs at scale without having to export or expose any of your data.

Prerequisites

You will need CUR reports as inputs. These are generated via AWS Data Exports and stored on S3 as Parquet files.

Local install

With Apache Maven, Java and Apache Spark installed locally and added to the $PATH, build the project and run the job:

mvn clean package
spark-submit --class com.digitalpebble.carbonara.SparkJob --driver-memory 4g ./target/carbonara-1.0.jar ./curs ./output

Docker

Build the Docker image with:

docker build -t digitalpebble/carbonara:1.0 .

The command below processes the data locally by mounting the directories containing the CURs and output as volumes:

docker run -it -v ./curs:/curs -v ./output:/output digitalpebble/carbonara:1.0 \
  /opt/spark/bin/spark-submit \
  --class com.digitalpebble.carbonara.SparkJob \
  --driver-memory 4g \
  --master 'local[*]' \
  /usr/local/lib/carbonara-1.0.jar \
  /curs /output/enriched

Explore the output

Using DuckDB

create table enriched_curs as select * from 'output/*/*.parquet';

select line_item_product_code, product_servicecode,
       round(sum(operational_emissions_co2eq_g),2) as co2_usage_g,
       round(sum(energy_usage_kwh),2) as energy_usage_kwh
from enriched_curs
where operational_emissions_co2eq_g > 0.01
group by line_item_product_code, product_servicecode
order by co2_usage_g desc;

This should give an output similar to:

line_item_product_code  product_servicecode  co2_usage_g  energy_usage_kwh
AmazonS3                AWSDataTransfer      659.2        3.31
AmazonRDS               AWSDataTransfer      361.59       1.09
AmazonEC2               AWSDataTransfer      162.59       1.43
AmazonECR               AWSDataTransfer      88.75        0.8
AmazonVPC               AWSDataTransfer      40.55        0.38
AWSELB                  AWSDataTransfer      6.3          0.06
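
As a quick sanity check on such a result, dividing co2_usage_g by energy_usage_kwh gives the average carbon intensity implied for each row. The Python sketch below does this for the sample figures above. The helper implied_intensity is a hypothetical name, not part of Carbonara. Because the sample values are rounded and the query filters out small line items, the ratios are only approximate.

```python
# (line_item_product_code, co2_usage_g, energy_usage_kwh), copied from the
# sample output above.
rows = [
    ("AmazonS3", 659.2, 3.31),
    ("AmazonRDS", 361.59, 1.09),
    ("AmazonEC2", 162.59, 1.43),
    ("AmazonECR", 88.75, 0.8),
    ("AmazonVPC", 40.55, 0.38),
    ("AWSELB", 6.3, 0.06),
]


def implied_intensity(co2_g: float, energy_kwh: float) -> float:
    """Average emissions per unit of energy, in gCO2eq per kWh."""
    return co2_g / energy_kwh


for code, co2_g, energy_kwh in rows:
    print(f"{code:<10} {implied_intensity(co2_g, energy_kwh):8.1f} gCO2eq/kWh")
```

The ratios need not be identical across rows. One possible reason is usage spread over locations with different carbon intensity factors, but the sample alone does not confirm this.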

Committers metadata

Last synced: 4 days ago

Total Commits: 17
Total Committers: 1
Avg Commits per committer: 17.0
Development Distribution Score (DDS): 0.0

Commits in past year: 17
Committers in past year: 1
Avg Commits per committer in past year: 17.0
Development Distribution Score (DDS) in past year: 0.0

Name Email Commits
Julien Nioche j****n@d****m 17


Issue and Pull Request metadata

Last synced: 1 day ago

Total issues: 9
Total pull requests: 2
Average time to close issues: 20 minutes
Average time to close pull requests: 4 minutes
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.11
Average comments per pull request: 0.5
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Past year issues: 9
Past year pull requests: 2
Past year average time to close issues: 20 minutes
Past year average time to close pull requests: 4 minutes
Past year issue authors: 1
Past year pull request authors: 1
Past year average comments per issue: 0.11
Past year average comments per pull request: 0.5
Past year merged pull requests: 2
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/digitalpebble/carbonara

Top Issue Authors

  • jnioche (9)

Top Pull Request Authors

  • jnioche (2)

Top Issue Labels

  • good first issue (4)
  • help wanted (4)
  • enhancement (4)
  • documentation (1)


Dependencies

Dockerfile docker
  • apache/spark 4.0.0-java21 build
  • maven 3.9.9-eclipse-temurin-21 build
pom.xml maven
  • org.apache.spark:spark-sql_2.13 4.0.0 provided

Score: 2.302585092994046