eGRID

A comprehensive source of data from EPA's Clean Air and Power Division (CAPD) on the environmental characteristics of almost all electric power generated in the United State.
https://github.com/usepa/egrid

Category: Energy Systems
Sub Category: Energy Data Accessibility and Integration

Keywords

egrid oar

Last synced: about 13 hours ago
JSON representation

Repository metadata

Emissions & Generation Resource Integrated Database (eGRID)

README.md

eGRID

This repository includes all necessary scripts and documentation to create the Emissions & Generation Resource Integrated Database (eGRID).

Background

eGRID is a comprehensive source of data from EPA's Clean Air and Power Division (CAPD) on the environmental
characteristics of almost all electric power generated in the United States. eGRID is based on available plant-specific data for all
U.S. electricity generating plants that provide power to the electric grid and report emissions and electricity data to the U.S. government. Data reported include,
but are not limited to, net electric generation; resource mix (the share of generation by resource or fuel type); mass emissions of carbon dioxide
(CO2), nitrogen oxides (NOx), sulfur dioxide (SO2), methane (CH4), and nitrous oxide (N2O); emission rates for CO2, NOx, SO2,
CH4, and N2O; heat input; and nameplate capacity. eGRID reports this information on an annual basis (as well as by ozone season for
heat input and NOx) at different levels of geographic aggregation.

The final eGRID dataset includes eight levels of data aggregation:

  • Generator: A set of equipment that produces electricity and is connected to the U.S. electricity grid.

  • Unit: A set of equipment that either produces electricity and is connected to the U.S electricity grid or
    a set of equipment that is connected to a generator which produces electricity and is connected to the U.S. electricity grid.

  • Plant: A facility with one or more units and/or generators that provide power to the electric grid.

  • State: U.S. states, Puerto Rico (PR), and the District of Columbia (DC).

  • Balancing authority: Regional power system operators that ensure a balance of supply and demand.

  • eGRID subregion: EPA defined subregions designed to limit the impacts of the import and export of electricity (shown in Figure 1).

  • NERC (North American Electric Reliability Corporation) regions: Each NERC region listed in eGRID represents one of nine regional
    portions of the North American electricity transmission grid: six in the contiguous United States, plus Alaska, Hawaii, and
    Puerto Rico (which are not part of the formal NERC regions but are considered so in eGRID).

  • National U.S.: Contains all 50 states, Puerto Rico (PR), and the District of Columbia (DC).

Further information on the eGRID methodology can be found in the eGRID Technical Guide.

The dataset that this code produces is publicly available here.

Figure 1: eGRID subregions.

Architecture

This year EPA will be releasing the methodology to develop eGRID as an RStudio project. Recently, there has been increased interest from users in understanding the methods used to create the eGRID data. To increase transparency in the eGRID production process, EPA has made the R scripts available for users to view and use. EPA used the RStudio project beginning in 2024 to produce eGRID2023.

Figure 2 displays a summary of eGRID architecture, which specifies data sources, inputs, and outputs for creating eGRID.

A data dictionary is provided in eGRID Production Model Data Dictionary.xlsx. This file provides the row number, name, description, imperial units, metric units, source, and calculation method for each column reported in the final eGRID dataset.

Figure 2: eGRID architecture.

Code base organization

This project is structured as an RStudio project. To ensure that all scripts run correctly, load the eGRID_R.Rproj within RStudio to enable the project environment. eGRID_master.qmd is a Quarto document that serves as a master script (i.e., it runs all necessary scripts in the correct order), while also providing documentation for the scripts and steps performed therein.

The code base is structured as follows:

  • scripts/: all scripts to download and clean data, create each data aggregation level, format final dataset, and convert to metric units.

  • scripts/functions/: all helper functions.

  • data/raw_data/: raw data obtained from EPA and EIA sites.

  • data/clean_data/: data created from EPA and EIA data cleaning steps.

  • data/outputs/: outputs generated by this code base.

  • data/static_tables/: static tables used within the code base. These include crosswalks that match data between EIA and EPA data or regions and manual corrections that are made.

The data used to create eGRID are EPA and Energy Information Administration (EIA) electricity data. EPA data are loaded via an application programming interface (API) in scripts/data_load_epa.R and cleaned in scripts/data_clean_epa.R. EIA data are downloaded from EIA's website in scripts/data_load_eia.R and scripts/functions/function_download_eia_files.R and cleaned in scripts/data_clean_eia.R.

Creating eGRID

To create the eGRID dataset:

  1. Obtain an API key for EPA data.
    • Request an API key from https://www.epa.gov/power-sector/cam-api-portal.
    • Create folder api_keys/ within the root of the eGRID.
    • Create a text file named epa_api_key.txt within the folder api_keys/ and save the API key here on a single line.
  2. Load eGRID_R.Rproj within RStudio to enable the project environment.
  3. Render eGRID_master.qmd.
    • Set data year in params (eGRID_year) in the YAML as a string in the format "YYYY" (ex: "2023").
    • Render eGRID_master.qmd. This will run all scripts and build the eGRID dataset.

Outputs

The codebase outputs each data aggregation level in the eGRID dataset as an .RDS file and the final dataset as an Excel sheet in data/outputs/{params$eGRID_year}. Rendering eGRID_master.qmd also creates an HTML file that summarizes the data, methods, and output files used and created throughout the code base.

QA

The codebase contains two QA files as Quarto documents:

  • qa_all.qmd: Annual checks to confirm results are as expected for each file created in the codebase.

  • qa_annual_comparison.qmd: Comparison of output data to previous eGRID years.

Contributing to eGRID

Please submit any questions about the eGRID dataset to this web form.

If you would like to ask a question about or report an issue in the code, review the CONTRIBUTING policy and submit an issue under the "Issues" tab in the GitHub repository. Provide a concise summary as the title of the issue and a clear description, including steps to reproduce the issue.

Disclaimer

The United States Environmental Protection Agency (EPA) GitHub project code is provided on an "as is" basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity , confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.


Owner metadata


GitHub Events

Total
Last Year

Committers metadata

Last synced: 6 days ago

Total Commits: 759
Total Committers: 10
Avg Commits per committer: 75.9
Development Distribution Score (DDS): 0.565

Commits in past year: 7
Committers in past year: 1
Avg Commits per committer in past year: 7.0
Development Distribution Score (DDS) in past year: 0.0

Name Email Commits
ABT\gofortht T****h@a****m 330
ABT\russelle E****l@a****m 159
ABT\BockS s****k@a****m 108
ABT\zhangm m****g@a****m 69
sarasoko s****a@g****m 31
Caroline Watson C****n@a****m 26
Sean Bock b****s@a****l 22
Sean Bock b****s@a****l 11
russelle r****e@a****l 2
Johnathan Tafoya 6****a 1

Committer domains:


Issue and Pull Request metadata

Last synced: 8 days ago

Total issues: 72
Total pull requests: 27
Average time to close issues: 4 months
Average time to close pull requests: 12 days
Total issue authors: 5
Total pull request authors: 4
Average comments per issue: 0.75
Average comments per pull request: 0.78
Merged pull request: 26
Bot issues: 0
Bot pull requests: 0

Past year issues: 8
Past year pull requests: 2
Past year average time to close issues: 5 months
Past year average time to close pull requests: 3 days
Past year issue authors: 2
Past year pull request authors: 1
Past year average comments per issue: 0.38
Past year average comments per pull request: 0.0
Past year merged pull request: 2
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/usepa/egrid

Top Issue Authors

  • teagan-goforth (44)
  • brendan-small (12)
  • seanbock (10)
  • russell-e (3)
  • zhang-madeline (3)

Top Pull Request Authors

  • teagan-goforth (16)
  • zhang-madeline (5)
  • russell-e (4)
  • j-tafoya (2)

Top Issue Labels

Top Pull Request Labels

  • bug (3)

Score: 5.8289456176102075