Wastewater Catchment Areas in Great Britain
This repository provides code to consolidate wastewater catchment areas in Great Britain and evaluate their spatial overlap with statistical reporting units, such as Lower Layer Super Output Areas.
https://github.com/tillahoffmann/wastewater-catchment-areas
Category: Natural Resources
Sub Category: Water Supply and Quality
Keywords
geospatial-analysis open-data wastewater-based-epidemiology wastewater-surveillance
Repository metadata
8,185 wastewater catchment areas in Great Britain covering more than 99% of the population.
- Host: GitHub
- URL: https://github.com/tillahoffmann/wastewater-catchment-areas
- Owner: tillahoffmann
- License: mit
- Created: 2021-01-14T12:20:43.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2026-02-10T02:10:24.000Z (3 months ago)
- Last Synced: 2026-04-29T05:05:01.227Z (15 days ago)
- Topics: geospatial-analysis, open-data, wastewater-based-epidemiology, wastewater-surveillance
- Language: Makefile
- Homepage:
- Size: 1.44 MB
- Stars: 11
- Watchers: 1
- Forks: 4
- Open Issues: 0
- Releases: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
README.md
Wastewater Catchment Areas in Great Britain
This repository provides code to consolidate wastewater catchment areas in Great Britain and evaluate their spatial overlap with statistical reporting units, such as Lower Layer Super Output Areas (LSOAs). Please see the accompanying publication for a detailed description of the analysis. If you have questions about the analysis, code, or accessing the data, please create a new issue.
🏁 Just give me the dataset
If you are interested in the consolidated dataset of wastewater catchment areas rather than reproducing the analysis, check out our releases.
💾 Data
We obtained wastewater catchment area data from sewerage service providers under Environmental Information Regulations 2004. We consolidated these geospatial data and matched catchments to wastewater treatment works data collected under the Urban Wastewater Treatment Directive of the European Union. After analysis, the data comprise
- catchments_consolidated.*: geospatial data as a shapefile in the British National Grid projection, including auxiliary files. Each feature has the following attributes:
  - identifier: a unique identifier for the catchment based on its geometry. These identifiers are stable across different versions of the data provided the geometry of the associated catchment remains unchanged.
  - company: the water company that contributed the feature.
  - name: the name of the catchment as provided by the water company.
  - comment (optional): an annotation providing additional information about the catchment, e.g. overlaps with other catchments.
- waterbase_consolidated.csv: wastewater treatment plant metadata reported under the UWWTD between 2006 and 2018. See here for the original data. The columns comprise:
  - uwwState: whether the treatment works is active or inactive.
  - rptMStateKey: key of the member state (should be UK or GB for all entries).
  - uwwCode: unique treatment works identifier in the UWWTD database.
  - uwwName: name of the treatment works.
  - uwwLatitude and uwwLongitude: GPS coordinates of the treatment works in degrees.
  - uwwLoadEnteringUWWTP: actual load entering the treatment works measured in BOD person equivalents, corresponding to an "organic biodegradable load having a five-day biochemical oxygen demand (BOD5) of 60 g of oxygen per day".
  - uwwCapacity: potential treatment capacity measured in BOD person equivalents.
  - version: the reporting version (incremented with each reporting cycle, corresponding to two years).
  - year: the reporting year.
  Note that there are some data quality issues, e.g. treatment works UKENNE_YW_TP000055 and UKENNE_YW_TP000067 are both named Doncaster (Bentley) in 2006.
- waterbase_catchment_lookup.csv: lookup table to walk between catchments and treatment works. The columns comprise:
  - identifier and name: catchment identifier and name as used in catchments_consolidated.*.
  - uwwCode and uwwName: treatment works identifier and name as used in waterbase_consolidated.csv.
  - distance: distance between the catchment and treatment works in British National Grid projection (approximately metres).
- lsoa_catchment_lookup.csv: lookup table to walk between catchments and Lower Layer Super Output Areas (LSOAs). The columns comprise:
  - identifier: catchment identifier as used in catchments_consolidated.*.
  - LSOA11CD: LSOA identifier as used in the 2011 census.
  - intersection_area: area of the intersection between the catchment and LSOA in British National Grid projection (approximately square metres).
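As an illustration of how the LSOA lookup table might be used, the following minimal sketch assigns each LSOA to the catchment with which it shares the largest intersection area. It relies only on the three column names documented above; the sample rows, the identifiers in them, and the function name dominant_catchment are hypothetical and exist only to make the example self-contained.

```python
import csv
import io

# Hypothetical sample rows with the documented columns; in practice you
# would open the released lsoa_catchment_lookup.csv instead.
SAMPLE = """identifier,LSOA11CD,intersection_area
aaa111,E01000001,120000.0
bbb222,E01000001,30000.0
bbb222,E01000002,250000.0
"""


def dominant_catchment(handle):
    """Map each LSOA code to the (catchment identifier, area) pair with
    the largest intersection area found in the lookup table."""
    best = {}
    for row in csv.DictReader(handle):
        lsoa = row["LSOA11CD"]
        area = float(row["intersection_area"])
        # Keep the row only if it beats the best area seen so far.
        if lsoa not in best or area > best[lsoa][1]:
            best[lsoa] = (row["identifier"], area)
    return best


result = dominant_catchment(io.StringIO(SAMPLE))
for lsoa, (identifier, area) in sorted(result.items()):
    print(lsoa, identifier, area)
```

To run this against the released data, replace io.StringIO(SAMPLE) with an open file handle, e.g. open("lsoa_catchment_lookup.csv", newline=""). Picking a single dominant catchment discards partial overlaps, so it is only appropriate when a one-to-one assignment is needed.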
Environmental Information Requests
Details of the submitted Environmental Information Requests can be found here:
- 🟢 Anglian Water: data provided and publicly accessible.
- 🔴 Northern Ireland Water: request refused.
- 🟢 Northumbrian Water: data provided and publicly accessible.
- 🟢 Scottish Water: data provided and publicly accessible.
- 🟢 Severn Trent Water: data provided and publicly accessible.
- 🟢 Southern Water: data provided and publicly accessible.
- 🟢 South West Water: data provided and publicly accessible.
- 🟢 Thames Water: data provided and publicly accessible.
- 🟢 United Utilities: data provided and publicly accessible.
- 🟢 Welsh Water: data provided and publicly accessible.
- 🟢 Wessex Water: data provided and publicly accessible.
- 🟢 Yorkshire Water: data provided and publicly accessible.
You can use the following template to request the raw data directly from water companies.
Dear EIR Team,
Could you please provide the geospatial extent of wastewater catchment areas served by wastewater treatment plants owned or operated by your company as an attachment in response to this request? Could you please provide these data at the highest spatial resolution available in a machine-readable vector format (see below for a non-exhaustive list of suitable formats)? Catchment areas served by different treatment plants should be distinguishable.
For example, geospatial data could be provided as shapefile (https://en.wikipedia.org/wiki/Shapefile), GeoJSON (https://en.wikipedia.org/wiki/GeoJSON), or GeoPackage (https://en.wikipedia.org/wiki/GeoPackage) formats. Other commonly used geospatial file formats may also be suitable, but rasterised file formats are not suitable.
This request was previously submitted directly to the EIR team, and I trust I will receive the same response via the whatdotheyknow.com platform. Thank you for your time and I look forward to hearing from you.
All the best,
[your name here]
🔎 Reproducing the Analysis
- Install GDAL, for example using Homebrew on macOS, by running
$ brew install gdal
- Set up a clean Python environment (this code has only been tested using Python 3.9 on an Apple Silicon MacBook Pro), ideally using a virtual environment. Then install the required dependencies by running
$ pip install -r requirements.txt
- Download the data (including data on Lower Layer Super Output Areas (LSOAs) and population in LSOAs from the ONS, Urban Wastewater Treatment Directive Data from the European Environment Agency, and wastewater catchment area data from whatdotheyknow.com) by running the following command.
$ make data
- Validate all the data are in place and that you have the correct input data by running
$ make data/validation
- Run the analysis by executing
$ make analysis
The last command will execute the following notebooks in sequence and generate both the data products listed above as well as the figures in the accompanying manuscript. The analysis will take between 15 and 30 minutes depending on your computer.
- consolidate_waterbase.ipynb: load the UWWTD data, extract all treatment work information, and write the waterbase_consolidated.csv file.
- conslidate_catchments.ipynb: load all catchments, remove duplicates, annotate, and write the catchments_consolidated.* files.
- match_waterbase_and_catchments.ipynb: match UWWTD treatment works to catchments based on distances, names, and manual review. Writes the waterbase_catchment_lookup.csv file.
- match_catchments_and_lsoas.ipynb: match catchments to LSOAs to evaluate their spatial overlap. Writes the files lsoa_catchment_lookup.csv and lsoa_coverage.csv.
- estimate_population.ipynb: estimate the population resident within catchments, and write the geospatial_population_estimates.csv file.
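To give a feel for how intersection areas can feed into a population estimate, here is a minimal sketch of simple area weighting: each LSOA contributes its population in proportion to the fraction of its area that falls inside a catchment. This is one common approach and is an assumption for illustration, not necessarily the method implemented in estimate_population.ipynb. The LSOA areas, populations, identifiers, and the function name estimate_population are all hypothetical.

```python
from collections import defaultdict


def estimate_population(lookup_rows, lsoa_area, lsoa_population):
    """Area-weighted population estimate per catchment.

    lookup_rows: iterable of (identifier, LSOA11CD, intersection_area).
    lsoa_area: total area of each LSOA, in the same units as the
        intersection areas (hypothetical input, not part of the lookup).
    lsoa_population: resident population of each LSOA (hypothetical input).
    """
    totals = defaultdict(float)
    for identifier, lsoa, intersection_area in lookup_rows:
        # Fraction of the LSOA inside the catchment, capped at 1 to guard
        # against small numerical inconsistencies between area sources.
        fraction = min(intersection_area / lsoa_area[lsoa], 1.0)
        totals[identifier] += fraction * lsoa_population[lsoa]
    return dict(totals)


# Hypothetical inputs for illustration only.
rows = [
    ("aaa111", "E01000001", 120000.0),
    ("bbb222", "E01000001", 30000.0),
    ("bbb222", "E01000002", 250000.0),
]
areas = {"E01000001": 200000.0, "E01000002": 250000.0}
populations = {"E01000001": 1500, "E01000002": 1800}

estimates = estimate_population(rows, areas, populations)
for identifier, population in sorted(estimates.items()):
    print(identifier, round(population))
```

Area weighting assumes residents are spread uniformly within each LSOA, which is rarely true in sparsely populated areas, so the results should be read as rough estimates.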
Acknowledgements
This research is part of the Data and Connectivity National Core Study, led by Health Data Research UK in partnership with the Office for National Statistics and funded by UK Research and Innovation (grant ref MC_PC_20029).
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Hoffmann"
given-names: "Till"
orcid: "https://orcid.org/0000-0003-4403-0722"
- family-names: "Bunney"
given-names: "Sarah"
orcid: "https://orcid.org/0000-0002-0953-4776"
- family-names: "Kasprzyk-Hordern"
given-names: "Barbara"
orcid: "https://orcid.org/0000-0002-6809-2875"
- family-names: "Singer"
given-names: "Andrew"
orcid: "https://orcid.org/0000-0003-4705-6063"
title: "Wastewater catchment areas in Great Britain"
url: "https://github.com/tillahoffmann/wastewater-catchment-areas"
preferred-citation:
type: article
authors:
- family-names: "Hoffmann"
given-names: "Till"
orcid: "https://orcid.org/0000-0003-4403-0722"
- family-names: "Bunney"
given-names: "Sarah"
orcid: "https://orcid.org/0000-0002-0953-4776"
- family-names: "Kasprzyk-Hordern"
given-names: "Barbara"
orcid: "https://orcid.org/0000-0002-6809-2875"
- family-names: "Singer"
given-names: "Andrew"
orcid: "https://orcid.org/0000-0003-4705-6063"
doi: "10.1002/essoar.10510612.2"
journal: "ESSOAr"
title: "Wastewater catchment areas in Great Britain"
year: 2022
Owner metadata
- Name: Till Hoffmann
- Login: tillahoffmann
- Email:
- Kind: user
- Description: Building network models at @HarvardChanSchool with a focus on open and reproducible science. Formerly @imperial, @spotify.
- Website: http://tillahoffmann.github.io/
- Location: Boston, MA
- Twitter: tillahoffmann
- Company: Harvard T.H. Chan School of Public Health
- Icon url: https://avatars.githubusercontent.com/u/966348?u=82bb9374e009773db4eb24192e11b2ea80f258a8&v=4
- Repositories: 89
- Last synced at: 2024-03-23T13:30:47.689Z
- Profile URL: https://github.com/tillahoffmann
GitHub Events
Total
- Release event: 2
- Delete event: 3
- Pull request event: 5
- Watch event: 3
- Push event: 17
- Create event: 6
Last Year
- Pull request event: 1
- Push event: 1
- Create event: 1
Committers metadata
Last synced: 3 days ago
Total Commits: 75
Total Committers: 2
Avg Commits per committer: 37.5
Development Distribution Score (DDS): 0.027
Commits in past year: 1
Committers in past year: 1
Avg Commits per committer in past year: 1.0
Development Distribution Score (DDS) in past year: 0.0
| Name | Email | Commits |
|---|---|---|
| Till Hoffmann | t****n@g****m | 73 |
| Robert Sparks | r****s@g****m | 2 |
Issue and Pull Request metadata
Last synced: 3 months ago
Total issues: 3
Total pull requests: 12
Average time to close issues: about 2 months
Average time to close pull requests: about 15 hours
Total issue authors: 2
Total pull request authors: 2
Average comments per issue: 0.33
Average comments per pull request: 0.42
Merged pull request: 12
Bot issues: 0
Bot pull requests: 0
Past year issues: 0
Past year pull requests: 2
Past year average time to close issues: N/A
Past year average time to close pull requests: about 11 hours
Past year issue authors: 0
Past year pull request authors: 1
Past year average comments per issue: 0
Past year average comments per pull request: 0.0
Past year merged pull request: 2
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- tillahoffmann (2)
- khoroo (1)
Top Pull Request Authors
- tillahoffmann (11)
- khoroo (1)
Top Issue Labels
- bug (1)
Top Pull Request Labels
- enhancement (1)
- bug (1)
Dependencies
- chardet *
- fiona *
- flake8 *
- geopandas *
- jupyter *
- matplotlib *
- numpy *
- pandas *
- pyproj *
- rtree *
- scipy *
- shapely *
- sphinx *
- tqdm *
- 109 dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
Score: 3.091042453358316