saqgetr
Import Air Quality Monitoring Data in a Fast and Easy Way.
https://github.com/skgrange/saqgetr
Category: Natural Resources
Sub Category: Air Quality
Last synced: about 23 hours ago
Repository metadata
Import Air Quality Monitoring Data in a Fast and Easy Way
- Host: GitHub
- URL: https://github.com/skgrange/saqgetr
- Owner: skgrange
- License: gpl-3.0
- Created: 2019-03-07T13:34:24.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-09-12T12:45:46.000Z (8 months ago)
- Last Synced: 2025-04-10T05:05:00.737Z (17 days ago)
- Language: R
- Homepage:
- Size: 95.7 KB
- Stars: 10
- Watchers: 2
- Forks: 3
- Open Issues: 1
- Releases: 0
Metadata Files:
- Readme: README.md
- Changelog: NEWS.md
- License: LICENSE
README.md
saqgetr
saqgetr is an R package to import air quality monitoring data in a fast and easy way. Currently, only European data are available, but the package is generic and therefore data from other areas may be included in the future. For documentation on what data sources are accessible, please see saqgetr's technical note.
saqgetr has been made possible with the help of Ricardo Energy & Environment.
Retirement note
saqgetr will be retired in mid-2024. There are several reasons for the retirement, but the main points are that I no longer have the scope to ensure I catch all issues when they arise, access to the remote servers used by saqgetr has become progressively more difficult due to my relocation and stricter security policies, and the near-real-time (E2a) data flow contains far more unreliable observations than in the past that are not being fixed or updated by the member states. Therefore, the database underlying saqgetr requires more maintenance than I can provide. The final update of observations was conducted on 2024-02-17.
Installation
saqgetr is available on CRAN and can be installed in the normal way:
# Install saqgetr package
install.packages("saqgetr")
If desired, the development version can be installed with the remotes (or devtools) package like this:
# Install development version of saqgetr
remotes::install_github("skgrange/saqgetr")
Framework
saqgetr acts as an interface to pre-prepared data files located on a web server. For each monitoring site serviced, there is a single file containing all observations for each year. A collection of metadata tables is also available, which enables users to understand the locations and types of observations that are available. The data files are compressed text files (.csv.gz), which allows for simple and fast importing; if other interfaces are to be developed, this should be straightforward.
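The .csv.gz format is worth dwelling on: readr (which saqgetr imports) reads and writes gzip-compressed CSV transparently, which is what keeps importing simple for any interface built on these files. A minimal self-contained demonstration with illustrative data (not a real saqgetr file):

```r
# Load package
library(readr)

# Write a tiny illustrative table to a compressed .csv.gz file; readr
# compresses automatically based on the file extension
file_gz <- tempfile(fileext = ".csv.gz")
write_csv(tibble::tibble(site = "gb0919a", value = 21.6), file_gz)

# Reading the compressed file back needs no extra decompression step
data_file <- read_csv(file_gz, show_col_types = FALSE)
data_file$site
#> [1] "gb0919a"
```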
Usage
Sites
To import data with saqgetr, functions with the get_saq_* prefix are used. A monitoring site must be supplied to get observations. To find what sites are available, use get_saq_sites:
# Load packages
library(dplyr)
library(saqgetr)
# Import site information
data_sites <- get_saq_sites()
# Glimpse tibble
glimpse(data_sites)
#> Observations: 9,016
#> Variables: 16
#> $ site <chr> "ad0942a", "ad0944a", "ad0945a", "al0201a", "a…
#> $ site_name <chr> "Fixa", "Fixa oz", "Estacional oz Envalira", "…
#> $ latitude <dbl> 42.50969, 42.51694, 42.53488, 41.33027, 41.345…
#> $ longitude <dbl> 1.539138, 1.565250, 1.716986, 19.821772, 19.85…
#> $ elevation <dbl> 1080, 1637, 2515, 162, 207, 848, 25, 1, 13, 15…
#> $ country <chr> "andorra", "andorra", "andorra", "albania", "a…
#> $ country_iso_code <chr> "AD", "AD", "AD", "AL", "AL", "AL", "AL", "AL"…
#> $ site_type <chr> "background", "background", "background", NA, …
#> $ site_area <chr> "urban", "rural", "rural", NA, NA, "suburban",…
#> $ date_start <dttm> 2013-12-31 23:00:00, 2013-12-31 23:00:00, 201…
#> $ date_end <dttm> 2019-04-27 14:00:00, 2019-04-27 14:00:00, 201…
#> $ network <chr> "NET-AD001A", "NET-AD001A", "NET-AD001A", NA, …
#> $ eu_code <chr> "STA-AD0942A", "STA-AD0944A", "STA-AD0945A", N…
#> $ eoi_code <chr> "AD0942A", "AD0944A", "AD0945A", NA, NA, "AL02…
#> $ observation_count <dbl> 309037, 45174, 18268, 168983, 140812, 247037, …
#> $ data_source <chr> "aqer:e1a; aqer:e2a", "aqer:e1a; aqer:e2a", "a…
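Because the sites table is an ordinary tibble, standard dplyr verbs can be used to narrow it down before requesting observations. A runnable sketch using a tiny stand-in tibble (the real table comes from get_saq_sites(); values here are illustrative only):

```r
# Load packages
library(dplyr)
library(tibble)

# A two-row stand-in for get_saq_sites() output
data_sites <- tribble(
  ~site,     ~country_iso_code, ~site_type,   ~site_area,
  "gb0036r", "GB",              "background", "rural",
  "fr0001a", "FR",              "background", "urban"
)

# Narrow the sites table to rural background sites in Great Britain
data_sites_gb_rural <- data_sites %>%
  filter(
    country_iso_code == "GB",
    site_type == "background",
    site_area == "rural"
  )

data_sites_gb_rural$site
#> [1] "gb0036r"
```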
Observations
Sites are represented by a code prefixed with the country's ISO code; for example, a site in York, England, United Kingdom is identified as gb0919a (the United Kingdom's ISO code is non-standard: GB stands for Great Britain). To get observations for this site, use get_saq_observations:
# Get air quality monitoring data for a York site
data_york <- get_saq_observations(site = "gb0919a", start = 2005)
# Glimpse tibble
glimpse(data_york)
#> Observations: 370,235
#> Variables: 10
#> $ date <dttm> 2008-01-01, 2008-01-02, 2008-01-03, 2008-01-04, 2008-…
#> $ date_end <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ site <chr> "gb0919a", "gb0919a", "gb0919a", "gb0919a", "gb0919a",…
#> $ variable <chr> "pm10", "pm10", "pm10", "pm10", "pm10", "pm10", "pm10"…
#> $ process <int> 62392, 62392, 62392, 62392, 62392, 62392, 62392, 62392…
#> $ summary <int> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20…
#> $ validity <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, …
#> $ unit <chr> "µg/m3", "µg/m3", "µg/m3", "µg/m3", "µg/m3", "µg/m3", …
#> $ value <dbl> 21.625, 22.708, 24.667, 21.833, 24.000, 29.875, 16.833…
get_saq_observations takes a vector of sites to import many sites at once. Beware that if a user stacks sites, a lot of data can be returned. For example, using the two sites below returns a tibble/data frame/table with almost 10 million observations.
# Get almost 10 million observations; verbose is used to give an
# indication of what is occurring
data_large_ish <- get_saq_observations(
  site = c("gb0036r", "gb0682a"),
  start = 1960,
  verbose = TRUE
)
# Glimpse tibble
glimpse(data_large_ish)
#> Observations: 9,981,977
#> Variables: 9
#> $ date <dttm> 1995-09-11, 1995-09-12, 1995-09-13, 1995-09-14, 1995-…
#> $ date_end <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ site <chr> "gb0036r", "gb0036r", "gb0036r", "gb0036r", "gb0036r",…
#> $ variable <chr> "so2", "so2", "so2", "so2", "so2", "so2", "so2", "so2"…
#> $ process <int> 57295, 57295, 57295, 57295, 57295, 57295, 57295, 57295…
#> $ summary <int> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20…
#> $ validity <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ unit <chr> "µg/m3", "µg/m3", "µg/m3", "µg/m3", "µg/m3", "µg/m3", …
#> $ value <dbl> 0.983, 0.792, 1.362, 0.483, 14.633, 1.171, 0.821, 15.2…
Cleaning observations
Once data are imported, valid data for a certain averaging period/summary can be isolated with saq_clean_observations, which can also "spread" data so that the variables/pollutants become columns:
# Get only valid hourly data and reshape (spread)
data_york_spread <- data_york %>%
  saq_clean_observations(summary = "hour", valid_only = TRUE, spread = TRUE)
# Glimpse tibble
glimpse(data_york_spread)
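Once spread, pollutants are columns, so summaries such as monthly means reduce to a group-and-summarise. A runnable sketch using a tiny stand-in for the spread table (the pm10 column and the values are illustrative only):

```r
# Load packages
library(dplyr)
library(lubridate)
library(tibble)

# A three-row stand-in for a spread observation table, where pollutants
# such as pm10 have become columns
data_spread <- tibble(
  date = as.POSIXct(
    c("2018-01-01 01:00", "2018-01-15 01:00", "2018-02-01 01:00"),
    tz = "UTC"
  ),
  site = "gb0919a",
  pm10 = c(20, 30, 40)
)

# Calculate monthly means by flooring the dates to month-start
data_monthly <- data_spread %>%
  mutate(month = floor_date(date, "month")) %>%
  group_by(site, month) %>%
  summarise(pm10 = mean(pm10, na.rm = TRUE), .groups = "drop")

data_monthly$pm10
#> [1] 25 40
```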
Processes
Information on the specific time series/processes can also be retrieved.
# Get processes
data_processes <- get_saq_processes()
# Glimpse tibble
glimpse(data_processes)
#> Observations: 171,992
#> Variables: 15
#> $ process <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
#> $ site <chr> "al0201a", "al0201a", "al0201a", "al0201a", "a…
#> $ variable <chr> "so2", "so2", "pm10", "pm10", "o3", "o3", "o3"…
#> $ variable_long <chr> "Sulphur dioxide (air)", "Sulphur dioxide (air…
#> $ period <chr> "day", "hour", "day", "hour", "day", "dymax", …
#> $ unit <chr> "ug.m-3", "ug.m-3", "ug.m-3", "ug.m-3", "ug.m-…
#> $ date_start <dttm> NA, 2011-01-01 00:00:00, 2011-01-01 00:00:00,…
#> $ date_end <dttm> NA, 2011-12-31 23:00:00, 2012-12-30 00:00:00,…
#> $ sample <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ sampling_point <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ sampling_process <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ observed_property <int> 1, 1, 5, 5, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 10…
#> $ group_code <int> 100, 100, 100, 100, 100, 100, 100, 100, 100, 1…
#> $ data_source <chr> "airbase", "airbase", "airbase", "airbase", "a…
#> $ observation_count <dbl> 0, 6806, 729, 17336, 352, 352, 16413, 8358, 69…
Other metadata
Other helper tables are also available:
# Get other helper tables
# Summary integers
data_summary_integers <- get_saq_summaries() %>%
  print(n = Inf)
#> # A tibble: 20 x 2
#> averaging_period summary
#> <chr> <int>
#> 1 hour 1
#> 2 day 20
#> 3 week 90
#> 4 var 91
#> 5 month 92
#> 6 fortnight 93
#> 7 3month 94
#> 8 2month 95
#> 9 2day 96
#> 10 3day 97
#> 11 2week 98
#> 12 4week 99
#> 13 3hour 100
#> 14 8hour 101
#> 15 hour8 101
#> 16 year 102
#> 17 dymax 21
#> 18 quarter 103
#> 19 other 91
#> 20 n-hour 104
# Validity integers
data_validity_integers <- get_saq_validity() %>%
  print(n = Inf)
#> # A tibble: 6 x 4
#> validity valid description notes
#> <int> <lgl> <chr> <chr>
#> 1 NA FALSE data is considered to be invalid due to the… from aqer
#> 2 -1 FALSE invalid due to other circumstances or data … from aqer
#> 3 0 FALSE invalid smonitor nom…
#> 4 1 TRUE <NA> from aqer
#> 5 2 TRUE valid but below detection limit measurement… from aqer
#> 6 3 TRUE valid but below detection limit and number … from aqer
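The validity table can be joined onto observations to keep only rows flagged as valid, replicating part of what saq_clean_observations(valid_only = TRUE) does. A runnable sketch with stand-in tables (the real ones come from get_saq_observations() and get_saq_validity(); values here are illustrative only):

```r
# Load packages
library(dplyr)
library(tibble)

# Stand-ins for a few observations and the key columns of the validity
# helper table
observations <- tibble(
  site = "gb0919a",
  validity = c(1L, 0L, 2L),
  value = c(21.6, 99.9, 0.4)
)
validity_table <- tibble(
  validity = c(-1L, 0L, 1L, 2L, 3L),
  valid = c(FALSE, FALSE, TRUE, TRUE, TRUE)
)

# Join the validity flags onto the observations and keep only valid rows
observations_valid <- observations %>%
  left_join(validity_table, by = "validity") %>%
  filter(valid)

observations_valid$value
#> [1] 21.6  0.4
```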
Simple annual and monthly means of observations
Simple annual and monthly means of the daily and hourly processes have also been generated. These summaries are often useful for trend analysis or mapping.
# Get annual means
data_annual <- get_saq_simple_summaries(summary = "annual_mean")
# Glimpse tibble
glimpse(data_annual)
#> Observations: 655,362
#> Variables: 8
#> $ date <dttm> 2013-01-01, 2014-01-01, 2015-01-01, 2016-01-01, …
#> $ date_end <dttm> 2013-12-31 23:59:59, 2014-12-31 23:59:59, 2015-1…
#> $ site <chr> "ad0942a", "ad0942a", "ad0942a", "ad0942a", "ad09…
#> $ variable <chr> "co", "co", "co", "co", "co", "co", "co", "no", "…
#> $ summary_source <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ summary <int> 102, 102, 102, 102, 102, 102, 102, 102, 102, 102,…
#> $ count <dbl> 1, 8438, 8385, 8171, 8441, 8217, 5990, 1, 8310, 8…
#> $ value <dbl> 0.5000000, 0.3224579, 0.3582230, 0.3168768, 0.259…
# What was York Fishergate's (hourly) PM10 concentration in 2017?
data_annual %>%
  filter(
    site == "gb0682a",
    lubridate::year(date) == 2017L,
    variable == "pm10",
    summary_source == 1L
  ) %>%
  select(date, site, variable, count, value)
#> # A tibble: 1 x 5
#> date site variable count value
#> <dttm> <chr> <chr> <dbl> <dbl>
#> 1 2017-01-01 00:00:00 gb0682a pm10 8442 23.8
Owner metadata
- Name: Stuart Grange
- Login: skgrange
- Email:
- Kind: user
- Description: An environmental and data scientist with an air quality focus. Programmatic skills and atmospheric knowledge are used to answer tricky questions.
- Website: http://skgrange.github.io/
- Location: Dübendorf, Switzerland
- Twitter:
- Company:
- Icon url: https://avatars.githubusercontent.com/u/8771129?u=7eb49d6263395936a62865e574434136c6b89971&v=4
- Repositories: 30
- Last synced at: 2024-06-11T15:38:24.771Z
- Profile URL: https://github.com/skgrange
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers metadata
Last synced: 6 days ago
Total Commits: 44
Total Committers: 2
Avg Commits per committer: 22.0
Development Distribution Score (DDS): 0.068
Commits in past year: 3
Committers in past year: 1
Avg Commits per committer in past year: 3.0
Development Distribution Score (DDS) in past year: 0.0
Name | Email | Commits
---|---|---
skgrange | s****e@g****m | 41 |
skgrange | s****e@g****m | 3 |
Committer domains:
Issue and Pull Request metadata
Last synced: 2 days ago
Total issues: 13
Total pull requests: 0
Average time to close issues: 23 days
Average time to close pull requests: N/A
Total issue authors: 9
Total pull request authors: 0
Average comments per issue: 3.23
Average comments per pull request: 0
Merged pull request: 0
Bot issues: 0
Bot pull requests: 0
Past year issues: 1
Past year pull requests: 0
Past year average time to close issues: 1 day
Past year average time to close pull requests: N/A
Past year issue authors: 1
Past year pull request authors: 0
Past year average comments per issue: 3.0
Past year average comments per pull request: 0
Past year merged pull request: 0
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- SverreSolberg (3)
- BlaiseKelly (3)
- skgrange (1)
- pamvgomes (1)
- AdaCaoShiyu (1)
- t0270293 (1)
- MohoWu (1)
- FlorentinBulotAQ (1)
- FinkLea (1)
Top Pull Request Authors
Top Issue Labels
- question (1)
Top Pull Request Labels
Package metadata
- Total packages: 1
- Total downloads:
- cran: 193 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
- Total maintainers: 1
cran.r-project.org: saqgetr
Import Air Quality Monitoring Data in a Fast and Easy Way
- Homepage: https://github.com/skgrange/saqgetr
- Documentation: http://cran.r-project.org/web/packages/saqgetr/saqgetr.pdf
- Licenses: GPL-3 | file LICENSE
- Latest release: 0.2.21 (published over 4 years ago)
- Last Synced: 2025-04-25T12:30:28.153Z (2 days ago)
- Versions: 5
- Dependent Packages: 0
- Dependent Repositories: 0
- Downloads: 193 Last month
- Rankings:
- Forks count: 17.77%
- Stargazers count: 17.865%
- Average: 28.819%
- Dependent packages count: 29.797%
- Dependent repos count: 35.455%
- Downloads: 43.207%
- Maintainers (1)
Dependencies
- R >= 3.2.0 depends
- dplyr * imports
- httr * imports
- lubridate * imports
- magrittr * imports
- purrr * imports
- readr * imports
- stringr * imports
- tidyr * imports
- openair * suggests
Score: 8.358900612421644