{"id":20064,"name":"NYCBuildingEnergyUse","description":"Predict the emission of greenhouse gases from buildings by looking at their age, and water consumption as well as other energy consumption metrics.","url":"https://github.com/mdh266/NYCBuildingEnergyUse","last_synced_at":"2026-04-14T22:30:20.646Z","repository":{"id":29490134,"uuid":"85988046","full_name":"mdh266/NYCBuildingEnergyUse","owner":"mdh266","description":"Creating Regression Models Of Building Emissions On Google Cloud","archived":false,"fork":false,"pushed_at":"2025-09-01T01:27:31.000Z","size":25156,"stargazers_count":19,"open_issues_count":1,"forks_count":6,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-04-11T21:06:27.173Z","etag":null,"topics":["bokeh","data-science","energy-efficiency","exploratory-data-analysis","google-app-engine","missing-data","missing-values","outlier-detection","outlier-removal","regression","regression-models","scikit-learn","xgboost"],"latest_commit_sha":null,"homepage":"http://michael-harmon.com/blog/GreenBuildings1.html","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mdh266.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2017-03-23T19:11:56.000Z","updated_at":"2025-09-01T01:27:34.000Z","dependencies_parsed_at":"2025-09-09T12:02:07.148Z","dependency_job_id":"27698bfc-ce40-4ebc-9154-443ed2fc9a30","html_url":"https://github.com/mdh266/NYCBuildingEnergyUse","commit_stats":{"total_commits":43,"total_committers":3,"mean_commits":"14.333333333333334","dds":"0.13953488372093026","last_synced_commit":"4b35f0729acc7c29af541ca2e17cead8d077b3bd"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mdh266/NYCBuildingEnergyUse","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdh266%2FNYCBuildingEnergyUse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdh266%2FNYCBuildingEnergyUse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdh266%2FNYCBuildingEnergyUse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdh266%2FNYCBuildingEnergyUse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mdh266","download_url":"https://codeload.github.com/mdh266/NYCBuildingEnergyUse/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdh266%2FNYCBuildingEnergyUse/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","k
ind":"github","repositories_count":286080680,"owners_count":31772642,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T20:17:16.280Z","status":"ssl_error","status_checked_at":"2026-04-13T20:17:08.216Z","response_time":93,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"owner":{"login":"mdh266","name":"Mike Harmon","uuid":"1434517","kind":"user","description":"Applied Machine Learning Lead,\r\nPhD in Computational Applied Mathematics","email":"","website":"michael-harmon.com","location":"Brooklyn, New 
York","twitter":null,"company":null,"icon_url":"https://avatars.githubusercontent.com/u/1434517?u=90557d935733b365ee976df3ab188e9452f2b887\u0026v=4","repositories_count":55,"last_synced_at":"2024-06-11T15:44:28.273Z","metadata":{"has_sponsors_listing":false},"html_url":"https://github.com/mdh266","funding_links":[],"total_stars":369,"followers":111,"following":57,"created_at":"2022-11-12T01:20:05.811Z","updated_at":"2024-06-11T15:44:29.981Z","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mdh266","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mdh266/repositories"},"packages":[],"commits":{"id":1254082,"full_name":"mdh266/NYCBuildingEnergyUse","default_branch":"master","total_commits":49,"total_committers":3,"total_bot_commits":0,"total_bot_committers":0,"mean_commits":16.333333333333332,"dds":0.12244897959183676,"past_year_total_commits":6,"past_year_total_committers":1,"past_year_total_bot_commits":0,"past_year_total_bot_committers":0,"past_year_mean_commits":6.0,"past_year_dds":0.0,"last_synced_at":"2026-04-11T21:02:32.193Z","last_synced_commit":"e82116a5ad432ad497c2d12d11b2e90779394af9","created_at":"2023-03-27T10:58:11.464Z","updated_at":"2026-04-11T21:02:30.183Z","committers":[{"name":"Mike","email":"mdh266@gmail.com","login":"mdh266","count":43},{"name":"Michael Harmon","email":"mike@Michaels-MacBook-Air.local","login":null,"count":5},{"name":"Michael Harmon","email":"mukeharmon@Michaels-MacBook-Air.local","login":null,"count":1}],"past_year_committers":[{"name":"Mike 
Harmon","email":"mdh266@gmail.com","login":"mdh266","count":6}],"commits_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdh266%2FNYCBuildingEnergyUse/commits","host":{"name":"GitHub","url":"https://github.com","kind":"github","last_synced_at":"2026-04-13T00:00:06.408Z","repositories_count":6213067,"commits_count":903857731,"contributors_count":34932923,"owners_count":1144142,"icon_url":"https://github.com/github.png","host_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub/repositories"}},"issues_stats":{"full_name":"mdh266/NYCBuildingEnergyUse","html_url":"https://github.com/mdh266/NYCBuildingEnergyUse","last_synced_at":"2025-09-01T08:04:08.546Z","status":"error","issues_count":0,"pull_requests_count":5,"avg_time_to_close_issue":null,"avg_time_to_close_pull_request":9277760.5,"issues_closed_count":0,"pull_requests_closed_count":4,"pull_request_authors_count":2,"issue_authors_count":0,"avg_comments_per_issue":null,"avg_comments_per_pull_request":0.4,"merged_pull_requests_count":2,"bot_issues_count":0,"bot_pull_requests_count":3,"past_year_issues_count":0,"past_year_pull_requests_count":0,"past_year_avg_time_to_close_issue":null,"past_year_avg_time_to_close_pull_request":null,"past_year_issues_closed_count":0,"past_year_pull_requests_closed_count":0,"past_year_pull_request_authors_count":0,"past_year_issue_authors_count":0,"past_year_avg_comments_per_issue":null,"past_year_avg_comments_per_pull_request":null,"past_year_bot_issues_count":0,"past_year_bot_pull_requests_count":0,"past_year_merged_pull_requests_count":0,"created_at":"2023-05-09T10:35:33.488Z","updated_at":"2025-09-01T08:04:08.546Z","repository_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdh266%2FNYCBuildingEnergyUse","issues_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories/mdh266%2FNYCBuildingEnergyUse/issues","issue_labels_count":{},"pull_request_labels_count":{"dependencies
":3},"issue_author_associations_count":{},"pull_request_author_associations_count":{"NONE":3,"OWNER":2},"issue_authors":{},"pull_request_authors":{"dependabot[bot]":3,"mdh266":2},"host":{"name":"GitHub","url":"https://github.com","kind":"github","last_synced_at":"2026-04-09T00:00:10.509Z","repositories_count":14168959,"issues_count":34547286,"pull_requests_count":112988799,"authors_count":11231467,"icon_url":"https://github.com/github.png","host_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories","owners_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/owners","authors_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors"},"past_year_issue_labels_count":{},"past_year_pull_request_labels_count":{},"past_year_issue_author_associations_count":{},"past_year_pull_request_author_associations_count":{},"past_year_issue_authors":{},"past_year_pull_request_authors":{},"maintainers":[{"login":"mdh266","count":2,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/mdh266"}],"active_maintainers":[]},"events":{"total":{"PushEvent":2},"last_year":{"PushEvent":2}},"keywords":["bokeh","data-science","energy-efficiency","exploratory-data-analysis","google-app-engine","missing-data","missing-values","outlier-detection","outlier-removal","regression","regression-models","scikit-learn","xgboost"],"dependencies":[{"ecosystem":"pypi","filepath":"requirements.txt","sha":null,"kind":"manifest","created_at":"2022-08-07T14:30:14.734Z","updated_at":"2022-08-07T14:30:14.734Z","repository_link":"https://github.com/mdh266/NYCBuildingEnergyUse/blob/master/requirements.txt","dependencies":[{"id":599710663,"package_name":"bokeh","ecosystem":"pypi","requirements":"==1.0.2","direct":true,"kind":"runtime","optional":false},{"id":599710664,"package_name":"xlrd","ecosystem":"pypi","requirements":"==1.1.0","direct":true,"kind":"runtime","optional":false},{"id":599710665,"package_name":"google-cloud-big
query","ecosystem":"pypi","requirements":"==1.24.0","direct":true,"kind":"runtime","optional":false},{"id":599710666,"package_name":"pandas-gbq","ecosystem":"pypi","requirements":"==0.13.1","direct":true,"kind":"runtime","optional":false},{"id":599710667,"package_name":"seaborn","ecosystem":"pypi","requirements":"==0.10.1","direct":true,"kind":"runtime","optional":false},{"id":599710668,"package_name":"scikit-learn","ecosystem":"pypi","requirements":"==0.20.1","direct":true,"kind":"runtime","optional":false},{"id":599710669,"package_name":"pyxgboost","ecosystem":"pypi","requirements":"==1.0.9","direct":true,"kind":"runtime","optional":false},{"id":599710670,"package_name":"mlflow","ecosystem":"pypi","requirements":"==1.8.0","direct":true,"kind":"runtime","optional":false}]},{"ecosystem":"docker","filepath":"Dockerfile","sha":null,"kind":"manifest","created_at":"2023-09-21T19:56:39.574Z","updated_at":"2023-09-21T19:56:39.574Z","repository_link":"https://github.com/mdh266/NYCBuildingEnergyUse/blob/master/Dockerfile","dependencies":[{"id":13857051164,"package_name":"jupyter/base-notebook","ecosystem":"docker","requirements":"python-3.7.6","direct":true,"kind":"build","optional":false}]}],"score":4.0943445622221,"created_at":"2023-09-11T14:52:09.371Z","updated_at":"2026-04-14T22:30:20.656Z","avatar_url":"https://github.com/mdh266.png","language":"Jupyter Notebook","category":"Consumption","sub_category":"Buildings and Heating","monthly_downloads":0,"total_dependent_repos":0,"total_dependent_packages":0,"readme":"# About\n-------------\nI originally started this project a while back with the goal of taking the 2016 NYC Benchmarking Law data on building energy usage and doing something interesting with it. After a few iterations I thought it might be interesting to see if I could predict the emission of greenhouse gases from buildings by looking at their age and water consumption, as well as other energy consumption metrics. 
In the end, the point of this project was to build and deploy a model in the cloud using a real-world dataset with outliers and missing values, using state-of-the-art tools such as:\n\n* [Seaborn](http://seaborn.pydata.org/)\n* [Scikit-Learn](https://scikit-learn.org)\n* [XGBoost](https://xgboost.readthedocs.io/en/latest/)\n* [BigQuery](https://cloud.google.com/bigquery)\n* [MLflow](https://www.mlflow.org/)\n* [Docker](https://www.docker.com/)\n* [Google App Engine](https://cloud.google.com/appengine)\n\n\n## Notebook Overviews\n--------------------------\n\n\n### GreenBuildings1 : Exploratory Analysis \u0026 Outlier Removal\n---------------------\nIn this first blogpost I will cover the basics of data cleaning, including:\n\n- Exploratory data analysis\n- Identifying and removing outliers\n\nFor identifying outliers I will cover both visual inspection and a machine learning method called [Isolation Forests](https://en.wikipedia.org/wiki/Isolation_forest).  Since I will be completing this project over multiple days using [Google Cloud](https://cloud.google.com/), I will go over the basics of using [BigQuery](https://cloud.google.com/bigquery) to store the datasets so I won't have to start all over again each time I work on it. At the end of this blogpost I will summarize the findings and give some specific recommendations to reduce multifamily and office building energy usage.\n\n\n\n### GreenBuildings2 : Imputing Missing Values With Scikit-Learn\n---------------------\nIn this second post I cover [imputation techniques](https://en.wikipedia.org/wiki/Imputation_(statistics)#Regression) for missing data with Scikit-Learn's [impute module](https://scikit-learn.org/stable/modules/impute.html), using both point estimates (e.g. mean, median) via the **[SimpleImputer](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html)** class and more complex regression models (e.g. 
KNN) via the **[IterativeImputer](https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html)** class. The latter relies on the features being correlated with one another.  This is indeed the case for our dataset, although we also need to [transform](https://en.wikipedia.org/wiki/Data_transformation_(statistics)) the features in order to uncover a more meaningful and predictive relationship between them. As we will see, transforming the features also gives much better results when imputing missing values.\n\n\n### GreenBuildings3: Build \u0026 Deploy Models With MLflow, Docker \u0026 Google App Engine\n---------------------\nThis last post deals with model building and model deployment. Specifically, I will build a model of New York City building greenhouse gas emissions based on the buildings' energy usage metrics. After building a sufficiently accurate model, I will convert it to a [REST API](https://restfulapi.net/) for serving and then deploy the REST API to the cloud. The [MLflow](https://mlflow.org/) library makes model development and deployment a lot easier. Specifically, I will cover using the [MLflow Tracking](https://www.mlflow.org/docs/latest/tracking.html) framework to log all the different models I developed as well as their performance. MLflow Tracking is a great way to memorialize and document the development process. I will then use [MLflow Models](https://www.mlflow.org/docs/latest/models.html) to convert the selected model into a [REST API](https://restfulapi.net/) for model serving and show how to deploy the API to the cloud using [Docker](https://www.docker.com/) and [Google App Engine](https://cloud.google.com/appengine). 
\n\n\n### Using The Notebooks\n----------------------\n\nYou can install the dependencies and access the first two notebooks (`GreenBuildings1` \u0026 `GreenBuildings2`) using \u003ca href=\"https://www.docker.com/\"\u003eDocker\u003c/a\u003e by building the Docker image with the following:\n\n\tdocker build -t greenbuildings .\n\nThen run the container with:\n\n\tdocker run -ip 8888:8888 -v `pwd`:/home/jovyan -t greenbuildings\n\nSee \u003ca href=\"https://jupyter-docker-stacks.readthedocs.io/en/latest/index.html\"\u003ehere\u003c/a\u003e for more info.  Otherwise, without Docker, make sure to use Python 3.7 and install \u003ca href=\"http://geopandas.org/\"\u003eGeoPandas\u003c/a\u003e (0.3.0) using \u003ca href=\"https://conda.io/en/latest/\"\u003eConda\u003c/a\u003e as well as the additional libraries listed in \u003ccode\u003erequirements.txt\u003c/code\u003e.  These can be installed with the command:\n\n\tpip install -r requirements.txt\n\nI ran the last notebook (`GreenBuildings3`) locally on my machine with the dependencies in `requirements.txt`.\n\n\n### The Dataset \n------------------\n\nThe NYC Benchmarking Law requires owners of large buildings to annually measure their energy and water consumption in a process called benchmarking. The law standardizes this process by requiring building owners to enter their annual energy and water use in the U.S. Environmental Protection Agency's (EPA) online tool, ENERGY STAR Portfolio Manager®, and use the tool to submit data to the City. This data gives building owners information about a building's energy and water consumption compared to similar buildings, and tracks progress year over year to help with energy efficiency planning.\n\nI used the 2016 Benchmarking data, which is publicly disclosed and can be found \u003ca href=\"http://www.nyc.gov/html/gbee/html/plan/ll84_scores.shtml\"\u003ehere\u003c/a\u003e.  
\n\n","funding_links":[],"readme_doi_urls":[],"works":{},"citation_counts":{},"total_citations":0,"keywords_from_contributors":[],"project_url":"https://ost.ecosyste.ms/api/v1/projects/20064","html_url":"https://ost.ecosyste.ms/projects/20064"}