{"id":20046,"name":"The-building-data-genome-project","description":"A collection of non-residential buildings for performance analysis and algorithm benchmarking.","url":"https://github.com/buds-lab/the-building-data-genome-project","last_synced_at":"2026-04-06T18:30:25.596Z","repository":{"id":40680440,"uuid":"58021880","full_name":"buds-lab/the-building-data-genome-project","owner":"buds-lab","description":"A collection of non-residential buildings for performance analysis and algorithm benchmarking","archived":false,"fork":false,"pushed_at":"2021-03-30T12:28:32.000Z","size":530331,"stargazers_count":195,"open_issues_count":4,"forks_count":63,"subscribers_count":47,"default_branch":"master","last_synced_at":"2026-03-25T12:43:37.590Z","etag":null,"topics":["commercial-building","electrical-meters","electricity-meter","energy-efficiency","feature-engineering","feature-extraction","jupyter-notebook","open-data","smart-meter","temporal-data"],"latest_commit_sha":null,"homepage":"http://www.buildingdatagenome.org","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/buds-lab.png","metadata":{},"created_at":"2016-05-04T04:07:20.000Z","updated_at":"2026-03-24T04:49:06.000Z","dependencies_parsed_at":"2022-08-23T23:40:53.923Z","dependency_job_id":null,"html_url":"https://github.com/buds-lab/the-building-data-genome-project","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/buds-lab/the-building-data-genome-project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buds-lab%2Fthe-building-data-genome-project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buds-lab%2Fthe-building-data-genome-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buds-lab%2Fthe-building-data-genome-project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buds-lab%2Fthe-building-data-genome-project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/buds-lab","download_url":"https://codeload.github.com/buds-lab/the-building-data-genome-project/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/buds-lab%2Fthe-building-data-genome-project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31102536,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-28T13:41:34.766Z","status":"ssl_error","status_checked_at":"2026-03-28T13:41:05.465Z","response_time":79,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"owner":{"login":"buds-lab","name":"Building and Urban Data Science (BUDS) Group","uuid":"26264086","kind":"organization","description":"Building and Urban Data Science (BUDS) at the National University of Singapore","email":"clayton@nus.edu.sg","website":"www.budslab.org","location":"Singapore","twitter":null,"company":null,"icon_url":"https://avatars.githubusercontent.com/u/26264086?v=4","repositories_count":66,"last_synced_at":"2024-03-26T22:05:47.632Z","metadata":{"has_sponsors_listing":false},"html_url":"https://github.com/buds-lab","funding_links":[],"total_stars":751,"followers":81,"following":0,"created_at":"2022-11-04T13:14:58.515Z","updated_at":"2024-03-26T22:05:50.049Z","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/buds-lab","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/buds-lab/repositories"},"packages":[],"commits":{"id":1254069,"full_name":"buds-lab/the-building-data-genome-project","default_branch":"master","total_commits":86,"total_committers":4,"total_bot_commits":0,"total_bot_committers":0,"mean_commits":21.5,"dds":0.40697674418604646,"past_year_total_commits":0,"past_year_total_committers":0,"past_year_total_bot_commits":0,"past_year_total_bot_committers":0,"past_year_mean_commits":0.0,"past_year_dds":0.0,"last_synced_at":"2026-04-01T16:03:11.497Z","last_synced_commit":"521a6c0f0efe760a96dac656191ed7f4067c4b4d","created_at":"2023-03-27T10:58:03.286Z","updated_at":"2026-04-01T16:02:59.581Z","committers":[{"name":"cmiller8","email":"miller.
clayton@gmail.com","login":"cmiller8","count":51},{"name":"cmiller8","email":"Clayton@Sashas-Old.local","login":null,"count":28},{"name":"Samy","email":"pandarasamya@iiitd.ac.in","login":"samy101","count":6},{"name":"Anjukan Kathirgamanathan","email":"k.anjukan@gmail.com","login":"anjukan","count":1}],"past_year_committers":[],"commits_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub/repositories/buds-lab%2Fthe-building-data-genome-project/commits","host":{"name":"GitHub","url":"https://github.com","kind":"github","last_synced_at":"2026-04-03T00:00:08.542Z","repositories_count":6210897,"commits_count":927111032,"contributors_count":35798399,"owners_count":1145166,"icon_url":"https://github.com/github.png","host_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub/repositories"}},"issues_stats":{"full_name":"buds-lab/the-building-data-genome-project","html_url":"https://github.com/buds-lab/the-building-data-genome-project","last_synced_at":"2026-01-14T02:01:13.519Z","status":"error","issues_count":9,"pull_requests_count":0,"avg_time_to_close_issue":36513560.8,"avg_time_to_close_pull_request":null,"issues_closed_count":5,"pull_requests_closed_count":0,"pull_request_authors_count":0,"issue_authors_count":7,"avg_comments_per_issue":0.4444444444444444,"avg_comments_per_pull_request":null,"merged_pull_requests_count":0,"bot_issues_count":0,"bot_pull_requests_count":0,"past_year_issues_count":0,"past_year_pull_requests_count":0,"past_year_avg_time_to_close_issue":null,"past_year_avg_time_to_close_pull_request":null,"past_year_issues_closed_count":0,"past_year_pull_requests_closed_count":0,"past_year_pull_request_authors_count":0,"past_year_issue_authors_count":0,"past_year_avg_comments_per_issue":null,"past_year_avg_comments_per_pull_request":null,"past_year_bot_issues_count":0,"past_year_bot_pull_requests_count":0,"past_year_merged_pull_requests_count":0,"created_at":"2023-05-09T10:37:08.486Z","upd
ated_at":"2026-01-14T02:01:13.519Z","repository_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories/buds-lab%2Fthe-building-data-genome-project","issues_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories/buds-lab%2Fthe-building-data-genome-project/issues","issue_labels_count":{},"pull_request_labels_count":{},"issue_author_associations_count":{"NONE":5,"MEMBER":3,"COLLABORATOR":1},"pull_request_author_associations_count":{},"issue_authors":{"cmiller8":3,"corymosiman12":1,"asdf567":1,"jgunstone":1,"KGBUSH":1,"AlejandroBaron":1,"fmeggers":1},"pull_request_authors":{},"host":{"name":"GitHub","url":"https://github.com","kind":"github","last_synced_at":"2026-04-01T00:00:08.271Z","repositories_count":14037064,"issues_count":34584035,"pull_requests_count":113175312,"authors_count":11213073,"icon_url":"https://github.com/github.png","host_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories","owners_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/owners","authors_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors"},"past_year_issue_labels_count":{},"past_year_pull_request_labels_count":{},"past_year_issue_author_associations_count":{},"past_year_pull_request_author_associations_count":{},"past_year_issue_authors":{},"past_year_pull_request_authors":{},"maintainers":[{"login":"cmiller8","count":3,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/cmiller8"},{"login":"fmeggers","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/fmeggers"}],"active_maintainers":[]},"events":{"total":{"ForkEvent":2,"WatchEvent":10},"last_year":{"WatchEvent":5}},"keywords":["commercial-building","electrical-meters","electricity-meter","energy-efficiency","feature-engineering","feature-extraction","jupyter-notebook","open-data","smart-meter","temporal-data"],"dependencies":[],"score":6.679599185844383,"created_at":"2023-09-11T14:52:09.250Z","
updated_at":"2026-04-06T18:30:25.610Z","avatar_url":"https://github.com/buds-lab.png","language":"Jupyter Notebook","category":"Consumption","sub_category":"Buildings and Heating","monthly_downloads":0,"total_dependent_repos":0,"total_dependent_packages":0,"readme":"# Check out the Building Data Genome 2 - the latest version that supersedes this one: https://github.com/buds-lab/building-data-genome-project-2\n\n\u003c!-- A repository of whole building electrical meters from non-residential buildings\n============================== --\u003e\n\n![building data genome logo](https://raw.githubusercontent.com/buds-lab/the-building-data-genome-project/master/figures/buildingdatagenome1.png)\n\n- Does your data science technique actually scale across hundreds of buildings?\n- Is it actually faster or more accurate?\n\nThese are questions that researchers should ask when developing data-driven methods. Building performance prediction, classification, and clustering algorithms are becoming an essential part of analysis for anomaly detection, control optimization, and demand response. But how do we actually compare each individual technique against previously created methods?\n\nThe time-series data mining community identified this problem as early as 2003: “Much of this work has very little utility because the contribution made”...“offer an amount of improvement that would have been completely dwarfed by the variance that would have been observed by testing on many real world datasets, or the variance that would have been observed by changing minor (unstated) implementation details.” ([Keogh, E. and Kasetty, S.: On the need for time series data mining benchmarks: A survey and empirical demonstration. Data Mining and Knowledge Discovery, 7(4):349–371, Oct. 2003.](https://link.springer.com/article/10.1023/A:1024988512476))\n\n[They created the time-series data benchmarking set](http://www.cs.ucr.edu/~eamonn/time_series_data/). 
This data set enables testing of new techniques on an assortment of real world data sets. For commercial buildings data, we are doing the same!\n\n## The need for a Benchmarking Data Set for Non-residential Building Data Analytics\n\n### Most of the existing building performance data science studies rely on each individual researcher creating their own methods, finding a case study data set and determining efficacy on their own. Not surprisingly, most of those researchers find positive, yet questionably meaningful results.\n\n![old way](https://raw.githubusercontent.com/buds-lab/the-building-data-genome-project/master/figures/Oldway.png)\n\n\n### Using a large, consistent benchmark data set from hundreds (or thousands) of buildings, a researcher can determine how well their methods actually perform across a heterogeneous data set. If multiple researchers use the same data set, then there can be meaningful comparisons of accuracy, speed and ease-of-use.\n\n![new way](https://raw.githubusercontent.com/buds-lab/the-building-data-genome-project/master/figures/NewWay.png)\n\n## Introducing the Building Data Genome Project\nIt is an open data set from 507 non-residential buildings that includes hourly whole building electrical meter data for one year. Each of the buildings has meta data such as floor area, weather, and primary use type. This data set can be used to benchmark various statistical learning algorithms and other data science techniques. It can also be used simply as a teaching or learning tool to practice dealing with measured performance data from large numbers of non-residential buildings. 
The charts below illustrate the breakdown of the buildings according to location, building industry, sub-industry, and primary use type.\n\n![meta data](https://raw.githubusercontent.com/buds-lab/the-building-data-genome-project/master/figures/allbars.png)\n\n### Please contribute new data sets or provide analysis examples in Jupyter or R markdown using the data\n\n\nCitation of Data-Set\n------------\n\n[Clayton Miller, Forrest Meggers, The Building Data Genome Project: An open, public data set from non-residential building electrical meters, Energy Procedia, Volume 122, September 2017, Pages 439-444, ISSN 1876-6102, https://doi.org/10.1016/j.egypro.2017.07.400.](http://www.sciencedirect.com/science/article/pii/S1876610217330047) \n\n[ResearchGate](https://www.researchgate.net/publication/319507342_The_Building_Data_Genome_Project_An_open_public_data_set_from_non-residential_building_electrical_meters)\n\n```\nBibTex:\n@article{Miller2017439,\ntitle = \"The Building Data Genome Project: An open, public data set from non-residential building electrical meters \",\njournal = \"Energy Procedia \",\nvolume = \"122\",\nnumber = \"\",\npages = \"439 - 444\",\nyear = \"2017\",\nnote = \"\\{CISBAT\\} 2017 International ConferenceFuture Buildings \u0026amp; Districts – Energy Efficiency from Nano to Urban Scale \",\nissn = \"1876-6102\",\ndoi = \"https://doi.org/10.1016/j.egypro.2017.07.400\",\nurl = \"http://www.sciencedirect.com/science/article/pii/S1876610217330047\",\nauthor = \"Clayton Miller and Forrest Meggers\",\nkeywords = \"Open Data\",\nkeywords = \"Non-Residential Building Meter Data\",\nkeywords = \"Benchmark Data Set\",\nkeywords = \"Big Data\",\nkeywords = \"Machine Learning \",\nabstract = \"Abstract As of 2015, there are over 60 million smart meters installed in the United States; these meters are at the forefront of big data analytics in the building industry. 
However, only a few public data sources of hourly non-residential meter data exist for the purpose of testing algorithms. This paper describes the collection, cleaning, and compilation of several such data sets found publicly on-line, in addition to several collected by the authors. There are 507 whole building electrical meters in this collection, and a majority are from buildings on university campuses. This group serves as a primary repository of open, non-residential data sources that can be built upon by other researchers. An overview of the data sources, subset selection criteria, and details of access to the repository are included. Future uses include the application of new, proposed prediction and classification models to compare performance to previously generated techniques. \"\n}\n```\n\nGetting Started\n------------\n\nWe recommend you download the [Anaconda Python Distribution](https://www.continuum.io/downloads) and use Jupyter to get an understanding of the data.\n- Raw temporal and meta data are found in `/data/raw/`\n\nExample notebooks are found in `/notebooks/` -- a few good overview examples:\n- [Meta data overview](https://github.com/buds-lab/the-building-data-genome/blob/master/notebooks/00_Meta%20Data%20Exploration.ipynb)\n- [Temporal data overview](https://github.com/buds-lab/the-building-data-genome/blob/master/notebooks/00_Temporal%20Data%20Exploration%20--%20Subset.ipynb)\n\nPublications or Projects that use this data-set:\n------------\n\nPlease update this list if you add notebooks or R-Markdown files to the ``notebook`` folder.\n\n- [Miller, Clayton. 
“Screening Meter Data: Characterization of Temporal Energy Data from Large Groups of Non-Residential Buildings.” ETH Zürich, 2017.](https://www.research-collection.ethz.ch/handle/20.500.11850/125778) - [ResearchGate](https://www.researchgate.net/publication/313720565_Screening_Meter_Data_Characterization_of_Temporal_Energy_Data_from_Large_Groups_of_Non-Residential_Buildings)\n- [Temporal Data Mining Library for Buildings](https://github.com/buds-lab/temporal-features-for-nonres-buildings-library)\n\n\n# Contact -- (Add yours if you contribute to the data set)\nDr. Clayton Miller\nBuilding and Urban Data Science (BUDS) Group \nNational University of Singapore\nclayton@nus.edu.sg \nhttp://budslab.org/\n\n\nDr. Forrest Meggers\nCooling and Heating for Architecturally Optimized System (CHAOS) Lab\nPrinceton University\nfmeggers@princeton.edu\nhttp://chaos.princeton.edu/\n\n\nAnjukan Kathirgamanathan\nPhD Student, Energy Institute\nUniversity College Dublin\nanjukan.kathirgamanathan@ucdconnect.ie\nhttps://energyinstitute.ucd.ie/\n\n\nProject Organization\n------------\n\n    ├── LICENSE\n    ├── Makefile           \u003c- Makefile with commands like `make data` or `make train`\n    ├── README.md          \u003c- The top-level README for developers using this project.\n    ├── data\n    │   ├── external       \u003c- Data from third party sources.\n    │   ├── interim        \u003c- Intermediate data that has been transformed.\n    │   ├── processed      \u003c- The final, canonical data sets for modeling.\n    │   └── raw            \u003c- The original, immutable data dump.\n    │    │    │\n    ├── notebooks          \u003c- Jupyter notebooks. 
Naming convention is a number (for ordering),\n    │                         the creator's initials, and a short `-` delimited description, e.g.\n    │                         `1.0-jqp-initial-data-exploration`.\n    │\n    ├── references         \u003c- Data dictionaries, manuals, and all other explanatory materials.\n    ├── requirements.txt   \u003c- The requirements file for reproducing the analysis environment, e.g.\n                              generated with `pip freeze \u003e requirements.txt`\n\n\nLicense\n------------\nThe MIT License (MIT)\nCopyright (c) 2016, Clayton Miller\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n\n","funding_links":[],"readme_doi_urls":["https://doi.org/10.1016/j.egypro.2017.07.400"],"works":{},"citation_counts":{},"total_citations":0,"keywords_from_contributors":["ashrae","building-energy","kaggle","kaggle-competition"],"project_url":"https://ost.ecosyste.ms/api/v1/projects/20046","html_url":"https://ost.ecosyste.ms/projects/20046"}