Global Power Plant Database

A comprehensive, global and open source database of power plants.
https://github.com/wri/global-power-plant-database

Category: Energy Systems
Sub Category: Energy Data Accessibility and Integration

Keywords

climate climate-data energy energy-data free-datasets open-data open-datasets


Repository metadata

A comprehensive, global, open source database of power plants

Host: GitHub
URL: https://github.com/wri/global-power-plant-database
Owner: wri
Created: 2018-04-09T22:00:17.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2022-01-26T16:48:43.000Z (over 3 years ago)
Last Synced: 2025-06-08T11:44:45.602Z (about 1 month ago)
Topics: climate, climate-data, energy, energy-data, free-datasets, open-data, open-datasets
Language: HTML
Size: 281 MB
Stars: 341
Watchers: 48
Forks: 104
Open Issues: 26
Releases: 0
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md

Global Power Plant Database

This project is not currently maintained by WRI. There are no planned updates as of this time (early 2022). The last version of this database is version 1.3.0. If we learn of active forks or maintained versions of the code and database we will attempt to provide links in the future.

This project aims to build an open database of all the power plants in the world. It is the result of a large collaboration involving many partners, coordinated by the World Resources Institute and Google Earth Outreach. If you would like to get involved, please email the team or fork the repo and code! To learn more about how to contribute to this repository, read the CONTRIBUTING document.

The latest database release (v1.3.0) is available in CSV format under a Creative Commons Attribution 4.0 (CC BY 4.0) license. A bleeding-edge version is in the output_database directory of this repo.

All Python source code is available under an MIT license.

This work is made possible and supported by Google, among other organizations.

Database description

The Global Power Plant Database is built in several steps.

The first step involves gathering and processing country-level data. In some cases, these data are read automatically from official government websites; the code to implement this is in the build_databases directory.
In other cases we gather country-level data manually. These data are saved in raw_source_files/WRI and processed with the build_database_WRI.py script in the build_databases directory.
The second step is to integrate data from different sources, particularly for geolocation of power plants and annual total electricity generation. Some of these different sources are multi-national databases. For this step, we rely on offline work to match records; the concordance table mapping record IDs across databases is saved in resources/master_plant_concordance.csv.
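The concordance step can be pictured with a minimal Python 3 sketch. For illustration only, it assumes that resources/master_plant_concordance.csv has one row per plant and one column per source database, where each column holds that source's record ID. The column names wri_id, geo_id and carma_id and the sample IDs are hypothetical, not the file's actual layout.

```python
import csv
import io

# Hypothetical concordance layout (not the real file's columns): one row per
# physical plant, one column per source database, empty cell if unmatched.
SAMPLE_CONCORDANCE = """wri_id,geo_id,carma_id
WRI1000001,G1234,C5678
WRI1000002,,C9012
WRI1000003,,
"""


def load_concordance(text):
    """Map each WRI record ID to the IDs of the same plant in other sources."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(text)):
        wri_id = row.pop("wri_id")
        # Keep only the sources where a confirmed match exists.
        mapping[wri_id] = {source: rid for source, rid in row.items() if rid}
    return mapping


concordance = load_concordance(SAMPLE_CONCORDANCE)
print(concordance["WRI1000001"])
```

With a lookup like this, attributes such as geolocation can be pulled from whichever matched source supplies them.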

Throughout the processing, we represent power plants as instances of the PowerPlant class, defined in powerplant_database.py. The final database is in a flat-file CSV format.
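A flat-file record type of this kind might look like the following Python 3 sketch. SimplePowerPlant is a hypothetical, simplified stand-in, not the PowerPlant class in powerplant_database.py. Its fields are loosely based on the indicators listed in this README, and the field names are assumptions.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class SimplePowerPlant:
    """Hypothetical plant record; NOT the repository's PowerPlant class."""
    idnr: str
    name: str
    country: str
    primary_fuel: str
    capacity_mw: float
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    owner: str = ""
    source: str = ""
    url: str = ""
    year_of_capacity_data: Optional[int] = None


def to_flat_csv(plants):
    """Serialize plant records to one flat CSV string, one row per plant."""
    buf = io.StringIO()
    header = [f.name for f in fields(SimplePowerPlant)]
    writer = csv.DictWriter(buf, fieldnames=header)
    writer.writeheader()
    for plant in plants:
        # csv writes None values as empty cells.
        writer.writerow(asdict(plant))
    return buf.getvalue()


# "ABC" is a placeholder country code used only for this example.
print(to_flat_csv([SimplePowerPlant("ABC0000001", "Example Plant", "ABC", "Coal", 120.5)]))
```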

Key attributes of the database

The database includes the following indicators:

Plant name
Fuel type(s)
Generation capacity
Country
Ownership
Latitude/longitude of plant
Data source & URL
Data source year
Annual generation

We will expand this list in the future as we extend the database.

Fuel Type Aggregation

We define the "Fuel Type" attribute of our database based on common fuel categories. To parse the different fuel types used in our various data sources, we map fuel-name synonyms to our fuel categories. We plan to expand the database in the future to report more disaggregated fuel types.
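The synonym mapping amounts to a lookup from normalized source labels to categories. The following Python 3 sketch uses a hypothetical, abbreviated synonym table. The repository keeps its own, larger mapping, whose contents are not reproduced here.

```python
# Hypothetical, abbreviated synonym table (category -> raw labels).
FUEL_SYNONYMS = {
    "Coal": ["coal", "hard coal", "lignite", "bituminous"],
    "Gas": ["gas", "natural gas", "lng", "ccgt"],
    "Hydro": ["hydro", "hydroelectric", "run-of-river"],
    "Solar": ["solar", "photovoltaic", "pv"],
    "Wind": ["wind", "onshore wind", "offshore wind"],
}

# Invert into a lookup from normalized synonym to category.
SYNONYM_TO_CATEGORY = {
    syn: category for category, syns in FUEL_SYNONYMS.items() for syn in syns
}


def standardize_fuel(raw_name):
    """Return the fuel category for a raw source label, or None if unknown."""
    # Normalize case and collapse runs of whitespace before the lookup.
    key = " ".join(raw_name.strip().lower().split())
    return SYNONYM_TO_CATEGORY.get(key)


print(standardize_fuel("  Natural  Gas "))
```

Returning None for unrecognized labels, rather than a default category, makes unfamiliar synonyms visible so a curator can add them.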

Combining Multiple Data Sources

A major challenge for this project is that data come from a variety of sources, including government ministries, utility companies, equipment manufacturers, crowd-sourced databases, financial reports, and more. The reliability of the data varies, and in many cases there are conflicting values for the same attribute of the same power plant from different data sources. To handle this, we match and de-duplicate records and then develop rules for which data sources to report for each indicator. We provide a clear data lineage for each datum in the database. We plan to ultimately allow users to choose alternative rules for which data sources to draw on.
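One way to express per-indicator source rules while keeping data lineage is a priority list, sketched below in Python 3. The source names and their order in SOURCE_PRIORITY are hypothetical, because this README does not state the database's actual rules. Passing a different priority list mimics the planned option of letting users choose alternative rules.

```python
# Hypothetical source ranking, highest priority first.
SOURCE_PRIORITY = ["national", "WRI", "GEO", "CARMA"]


def select_value(candidates, priority=SOURCE_PRIORITY):
    """Pick one value for an indicator from possibly conflicting sources.

    candidates maps source name -> value (None if the source lacks it).
    Returns (value, source) so the chosen datum keeps its lineage.
    """
    for source in priority:
        value = candidates.get(source)
        if value is not None:
            return value, source
    return None, None


# Capacity reported differently by two sources: the higher-ranked one wins.
print(select_value({"GEO": 100.0, "CARMA": 95.0}))
```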

To the maximum extent possible, we read data automatically from trusted sources and integrate them into the database. Our current strategy involves these steps:

Automate data collection from machine-readable national data sources where possible.
For countries where machine-readable data are not available, gather and curate power plant data by hand, and then match these power plants to plants in other databases, including GEO and CARMA (see below) to determine their geolocation.
For a limited number of countries with small total power-generation capacity, use data directly from Global Energy Observatory (GEO).
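The strategy above reduces to a small per-country decision rule. This Python 3 sketch is illustrative only. The function name and the small_country_threshold_mw cutoff are hypothetical, since the README does not say how "small total power-generation capacity" is defined.

```python
def choose_strategy(has_machine_readable_source, total_capacity_mw,
                    small_country_threshold_mw=1000.0):
    """Pick a data-collection path for one country (illustrative only).

    small_country_threshold_mw is a hypothetical cutoff; the README does not
    define what counts as a small total power-generation capacity.
    """
    # Prefer automated collection from machine-readable national sources.
    if has_machine_readable_source:
        return "automated"
    # A limited number of small-capacity countries use GEO data directly.
    if total_capacity_mw < small_country_threshold_mw:
        return "GEO"
    # Otherwise curate by hand and match to GEO/CARMA for geolocation.
    return "manual+match"


print(choose_strategy(False, 200.0))
```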

A table describing the data source(s) for each country is listed below.

Finally, we are examining ways to automatically incorporate data from the following supra-national data sources:

ID numbers

We assign a unique ID to each line of data that we read from each source. In some cases these represent plant-level data; in other cases they represent unit-level data. For unit-level data, we commonly perform an aggregation step and assign a new, unique plant-level ID to the result. For plants drawn from machine-readable national data sources, the reference ID is formed by a three-letter ISO 3166-1 alpha-3 country code followed by a seven-digit number. For plants drawn from other databases (including the manually maintained WRI dataset), the reference ID is formed by a variable-length prefix code followed by a seven-digit number.
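The ID scheme and the unit-to-plant aggregation step can be sketched in Python 3 as follows. The regular expressions assume that non-national prefixes consist of letters only, and aggregate_units groups units by plant name. Both are assumptions made for illustration, and the repository may group units and number IDs differently. The national pattern checks only for three uppercase letters, not for membership in the ISO 3166-1 list.

```python
import re
from collections import defaultdict

# National-source IDs: three uppercase letters (meant to be an ISO 3166-1
# alpha-3 code; membership in the ISO list is not checked) + seven digits.
NATIONAL_ID = re.compile(r"^[A-Z]{3}\d{7}$")
# Other sources: a variable-length prefix (assumed letters only) + seven digits.
OTHER_ID = re.compile(r"^[A-Za-z]+\d{7}$")


def make_plant_id(prefix, number):
    """Join a prefix and a zero-padded seven-digit number."""
    if not 0 <= number <= 9999999:
        raise ValueError("number must fit in seven digits")
    return "%s%07d" % (prefix, number)


def aggregate_units(units, prefix, start=1):
    """Sum unit-level capacities per plant and give each plant a new ID.

    units is an iterable of (plant_name, unit_capacity_mw) pairs. Grouping by
    name is a simplification made for this sketch.
    """
    capacity = defaultdict(float)
    for plant_name, unit_mw in units:
        capacity[plant_name] += unit_mw
    return {
        make_plant_id(prefix, start + i): (name, mw)
        for i, (name, mw) in enumerate(sorted(capacity.items()))
    }


print(aggregate_units([("Alpha", 50.0), ("Alpha", 50.0), ("Beta", 20.0)], "XYZ"))
```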

Power plant matching

In many cases our data sources do not include power plant geolocation information. To address this, we attempt to match these plants with the GEO and CARMA databases in order to use their geolocation data. We use an elastic search matching technique developed by Enipedia to match plants based on name, country, capacity, and location, with confirmed matches stored in a concordance file. This matching procedure is complex, and the algorithm can sometimes wrongly match two different power plants or fail to match two entries for the same power plant. We are investigating using the Duke framework for matching, which allows us to do the matching offline.
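As a rough illustration of the matching criteria (name, country, capacity), here is a Python 3 sketch. It substitutes a simpler technique, difflib string similarity plus a relative capacity tolerance, for the Enipedia matching technique described above. The thresholds and dictionary keys are hypothetical. Any automated match like this would still need the manual confirmation that the concordance file records.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two case-normalized plant names."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_match(plant, candidates, min_name_score=0.8, capacity_tolerance=0.2):
    """Return the candidate most likely to be the same plant, or None.

    A candidate must be in the same country, have a capacity within
    capacity_tolerance (relative to the larger value), and reach a name
    similarity of at least min_name_score. Thresholds are hypothetical.
    """
    best, best_score = None, min_name_score
    for cand in candidates:
        if cand["country"] != plant["country"]:
            continue
        cap, cand_cap = plant["capacity_mw"], cand["capacity_mw"]
        if abs(cap - cand_cap) > capacity_tolerance * max(cap, cand_cap):
            continue
        score = name_similarity(plant["name"], cand["name"])
        if score >= best_score:
            best, best_score = cand, score
    return best


candidates = [
    {"name": "Riverside Power Stn", "country": "XYZ", "capacity_mw": 480.0},
    {"name": "Riverside Power Station", "country": "ABC", "capacity_mw": 500.0},
    {"name": "Hilltop Wind Farm", "country": "XYZ", "capacity_mw": 500.0},
]
plant = {"name": "Riverside Power Station", "country": "XYZ", "capacity_mw": 500.0}
print(best_match(plant, candidates))
```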

Build Instructions

The build system is as follows:

Create a virtual environment with Python 2.7 and the third-party packages in requirements.txt.
cd into build_databases/.
Run the build_database_*.py file for each data source or processing method that changed (when making a database update).
Run build_global_power_plant_database.py, which reads from the pickled store of sub-databases.
cd into ../utils.
Run database_country_summary.py to produce the summary table.
cd into ../output_database.
Copy global_power_plant_database.csv to the gppd-ai4earth-api repository. Look at the Makefile in that repo to understand where it should be located.
Build new generation estimations as needed, based on plant changes and updates compared to the stored and calculated values. This is not automatic, but there are some helper scripts for making the estimates.
Run the make_gppd.py script in gppd-ai4earth-api to construct a new version of the database with the full estimation data.
Copy the new merged dataset back to this repo, increment the DATABASE_VERSION file, commit, etc.
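The automated part of the build (the per-source build_database_*.py scripts, the global build and the country summary) could be driven from a small script. This Python 3 sketch is hypothetical. It assumes it is run from the repository root, and the python argument should point at the Python 2.7 virtual-environment interpreter that the build expects. The remaining steps involving gppd-ai4earth-api and the generation estimates are manual and are not covered.

```python
import subprocess


def plan_build(changed_sources, python="python"):
    """Return (working_dir, command) pairs for the automated build steps.

    changed_sources lists the suffixes of the build_database_*.py scripts to
    rerun, e.g. ["WRI"] -> build_database_WRI.py. Paths are relative to the
    repository root (an assumption of this sketch).
    """
    steps = [
        ("build_databases", [python, "build_database_%s.py" % src])
        for src in changed_sources
    ]
    # The global build reads the pickled sub-databases written above.
    steps.append(("build_databases", [python, "build_global_power_plant_database.py"]))
    steps.append(("utils", [python, "database_country_summary.py"]))
    return steps


def run_build(changed_sources, python="python", dry_run=True):
    """Print the steps (dry run) or execute them, stopping on the first failure."""
    for workdir, cmd in plan_build(changed_sources, python):
        if dry_run:
            print("(cd %s && %s)" % (workdir, " ".join(cmd)))
        else:
            subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    run_build(["WRI"])
```

The default dry run only prints the commands, which makes it easy to check the order before running anything.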

Related repos

Owner metadata

Name: World Resources Institute
Login: wri
Kind: organization
Description:
Website: https://wri.org
Location: Washington, DC
Twitter:
Company:
Icon url: https://avatars.githubusercontent.com/u/4615146?v=4
Repositories: 207
Last synced at: 2024-04-14T16:16:20.039Z
Profile URL: https://github.com/wri

GitHub Events

Total

Watch event: 19
Issue comment event: 1
Pull request event: 3
Fork event: 6

Last Year

Watch event: 19
Issue comment event: 1
Pull request event: 3
Fork event: 6

Committers metadata

Last synced: 7 days ago

Total Commits: 50
Total Committers: 2
Avg Commits per committer: 25.0
Development Distribution Score (DDS): 0.22

Commits in past year: 0
Committers in past year: 0
Avg Commits per committer in past year: 0.0
Development Distribution Score (DDS) in past year: 0.0

Name	Email	Commits
Logan Byers	5****s	39
Colin McCormick	c**k@g**m	11
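The committer summary can be checked by hand. This assumes the Development Distribution Score is one minus the share of commits made by the most active committer. That definition is an assumption, but it reproduces the 0.22 shown: 1 - 39/50 = 0.22, and 50 commits / 2 committers = 25.0. A short Python 3 sketch:

```python
def development_distribution_score(commits_per_committer):
    """1 minus the most active committer's share of all commits (assumed definition)."""
    total = sum(commits_per_committer)
    if total == 0:
        # No commits (e.g. in the past year): report 0.0, as the page does.
        return 0.0
    return 1 - max(commits_per_committer) / total


commits = [39, 11]  # Logan Byers, Colin McCormick
print(sum(commits) / len(commits))
print(round(development_distribution_score(commits), 2))
```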

Committer domains:

Issue and Pull Request metadata

Last synced: 3 days ago

Total issues: 27
Total pull requests: 5
Average time to close issues: about 1 month
Average time to close pull requests: about 1 hour
Total issue authors: 12
Total pull request authors: 4
Average comments per issue: 0.85
Average comments per pull request: 0.4
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past year issues: 0
Past year pull requests: 2
Past year average time to close issues: N/A
Past year average time to close pull requests: about 1 hour
Past year issue authors: 0
Past year pull request authors: 1
Past year average comments per issue: 0
Past year average comments per pull request: 1.0
Past year merged pull requests: 0
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/wri/global-power-plant-database

Top Issue Authors

jzlcdh (9)
loganbyers (6)
MichaelTiemannOSC (3)
andyfurniss4 (1)
e-kato (1)
duncangeere (1)
paultimothymooney (1)
colinmccormick (1)
adrivsh (1)
simonw (1)
dbaston (1)
AyrtonB (1)

Top Pull Request Authors

theouterlimitz (2)
nicholaskeller (1)
fionaguoguolu (1)
AyrtonB (1)

Top Issue Labels

enhancement (8)
data-addition (4)
data-correction (3)
question (1)

Top Pull Request Labels

Package metadata

Total packages: 1
Total downloads: unknown
Total dependent packages: 0
Total dependent repositories: 0
Total versions: 2

proxy.golang.org: github.com/wri/global-power-plant-database

Homepage:
Documentation: https://pkg.go.dev/github.com/wri/global-power-plant-database#section-documentation
Licenses:
Latest release: v1.1.0 (published about 7 years ago)
Last Synced: 2025-07-10T03:01:36.498Z (1 day ago)
Versions: 2
Dependent Packages: 0
Dependent Repositories: 0
Rankings:
- Dependent packages count: 5.395%
- Average: 5.576%
- Dependent repos count: 5.758%

Score: -Infinity
