A curated list of open technology projects to sustain a stable climate, energy supply, biodiversity and natural resources.

OpenAQ Data Ingest Pipeline

A tool to collect data for the OpenAQ platform.
https://github.com/openaq/openaq-fetch

Category: Natural Resources
Sub Category: Air Quality

Keywords

air-quality

Keywords from Contributors

mapbox-gl openaq aqi

README.md

Overview

This is the main data ingest pipeline for the OpenAQ project.

The ingest mechanism, which starts in index.js, gathers global air quality measurements from a variety of sources. It currently runs every 10 minutes and saves all unique measurements to a database.

openaq-api-v2 powers the API, and more information on the data format can be found in openaq-data-format.

For more info, see the OpenAQ-Fetch documentation index.

Installing & Running

To run the pipeline locally, you will need Node.js installed.

Install the necessary Node.js packages by running:

npm install

Now you can get started with:

node index.js --help

For production deployment, you will need to have certain environment variables set as in the table below:

| Name | Description | Default |
| --- | --- | --- |
| API_URL | URL of openaq-api | http://localhost:3004/v1/webhooks |
| WEBHOOK_KEY | Secret key to interact with openaq-api | '123' |
| EEA_TOKEN | API token for EEA API | not set |
| DATA_GOV_IN_TOKEN | API token for data.gov.in | not set |
| EPA_VICTORIA_TOKEN | API token for portal.api.epa.vic.gov.au | not set |
| EEA_GLOBAL_TIMEOUT | How long to check for EEA async results before quitting, in seconds | 360 |
| EEA_ASYNC_RECHECK | How long to wait to recheck for EEA async results, in seconds | 60 |
| SAVE_TO_S3 | Does the process save the measurements to an AWS S3 bucket | not set |

For the full list of environment variables and process arguments, see the environment documentation.
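As a rough illustration of how the variables in the table above might be consumed, here is a minimal sketch. The variable names and defaults come from the table; the `loadConfig` function, its property names, and the treatment of unset tokens as `undefined` are assumptions for illustration, not the project's actual configuration code.

```javascript
// Sketch only: `loadConfig` and its return shape are hypothetical.
// Variable names and default values are taken from the table above.
function loadConfig(env = process.env) {
  return {
    apiUrl: env.API_URL || 'http://localhost:3004/v1/webhooks',
    webhookKey: env.WEBHOOK_KEY || '123',
    // Tokens have no default; they stay undefined when not set.
    eeaToken: env.EEA_TOKEN,
    dataGovInToken: env.DATA_GOV_IN_TOKEN,
    epaVictoriaToken: env.EPA_VICTORIA_TOKEN,
    // Environment values are strings, so convert the timings to numbers.
    eeaGlobalTimeout: Number(env.EEA_GLOBAL_TIMEOUT || 360),
    eeaAsyncRecheck: Number(env.EEA_ASYNC_RECHECK || 60),
    // Assumption: only the literal string 'true' enables S3 output.
    saveToS3: env.SAVE_TO_S3 === 'true'
  };
}

// Usage: pass an explicit object instead of process.env to try it out.
console.log(loadConfig({ SAVE_TO_S3: 'true', EEA_GLOBAL_TIMEOUT: '120' }));
```

Passing the environment as an argument keeps the function easy to exercise without touching the real process environment.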

Pushing to AWS S3

If you want to push results to an S3 bucket as well for further processing, the environment variable SAVE_TO_S3 should be set to the value true. Additionally, you have to set the following environment variables (or be running in a process with a suitable IAM role):

| Name | Description | Default |
| --- | --- | --- |
| AWS_BUCKET_NAME | AWS Bucket to store the results | not set |
| AWS_ACCESS_KEY_ID | AWS Credentials key ID | not set |
| AWS_SECRET_ACCESS_KEY | AWS Credentials secret key | not set |

The measurements will be stored using the structure bucket_name/fetches/yyyy-mm-dd/unixtime.ndjson for each fetch.
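To make the key layout concrete, the following sketch builds an object key of the form fetches/yyyy-mm-dd/unixtime.ndjson and serializes measurements as newline-delimited JSON. The helper names `buildFetchKey` and `toNdjson` are hypothetical, and the choice of a UTC date and a Unix time in seconds is an assumption; the README does not state which time zone or resolution the pipeline uses.

```javascript
// Hypothetical helper: builds the object key for one fetch run.
// Assumptions: the date part is in UTC and unixtime is in whole seconds.
function buildFetchKey(fetchStartedAt) {
  const day = fetchStartedAt.toISOString().slice(0, 10); // yyyy-mm-dd
  const unixtime = Math.floor(fetchStartedAt.getTime() / 1000);
  return `fetches/${day}/${unixtime}.ndjson`;
}

// Newline-delimited JSON: one measurement object per line.
function toNdjson(measurements) {
  return measurements.map((m) => JSON.stringify(m)).join('\n') + '\n';
}

// Usage with a fixed timestamp so the result is reproducible.
const startedAt = new Date(Date.UTC(2020, 0, 15, 12, 0, 0));
console.log(buildFetchKey(startedAt)); // fetches/2020-01-15/1579089600.ndjson
```

The bucket name itself is not part of the key; it would be supplied separately (for example from AWS_BUCKET_NAME) when uploading.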

Tests

To confirm that everything is working as expected, you can run the tests with

npm test

To test an individual adapter, you can use something like:

node index.js --dryrun --source 'Beijing US Embassy'

For a more detailed description of the command line options available, use: node index.js --help

Deployment

Deployment is being built from the lambda-deployment branch. Any development for openaq-fetch should be branched/merged from/to the lambda-deployment branch until further notice.

Deployments rely on a JSON object that contains the different deployments. The scheduler is then used to loop through that object and post a message that will trigger a lambda to run that deployment. A deployment consists of a set of arguments that are passed to the fetch script to limit the sources that are run.

You can test the deployments with the following commands:

Show all deployments, but don't submit and don't run the fetcher
node index.js --dryrun --deployments all --nofetch

Only the japan deployment, but don't run the fetcher
node index.js --dryrun --deployments japan --nofetch

Only the japan deployment; don't submit a file, but run the fetcher
node index.js --dryrun --deployments japan
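The scheduling loop described in this section could look roughly like the sketch below. Everything in it is illustrative: the shape of the deployments object, the `scheduleDeployments` function, and the `publish` callback are assumptions, not the project's actual scheduler or its message format.

```javascript
// Hypothetical deployments object: the names and argument keys are
// illustrative and are not taken from the openaq-fetch source.
const deployments = [
  { name: 'japan', adapter: 'example-adapter-a' },
  { name: 'example', source: 'Example Source' }
];

// Select deployments by name ('all' selects every entry), then hand one
// message per deployment to `publish`. In a dry run nothing is published.
function scheduleDeployments(all, { only = 'all', dryrun = false, publish }) {
  const selected = all.filter((d) => only === 'all' || d.name === only);
  if (!dryrun) {
    selected.forEach((d) => publish(JSON.stringify(d)));
  }
  return selected;
}

// Usage: collect messages in an array instead of posting to a real queue.
const sent = [];
scheduleDeployments(deployments, { only: 'japan', publish: (msg) => sent.push(msg) });
console.log(sent);
```

Injecting `publish` as a callback keeps the selection logic separate from whatever messaging service actually triggers the lambda.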

Data Source Criteria

This section lists the key criteria for air quality data aggregated onto the platform. A full explanation can be accessed here. OpenAQ is an ever-evolving process that is shaped by its community: your feedback and questions are actively invited on the criteria listed in this section.

  1. Data must be of one of these pollutant types: PM10, PM2.5, sulfur dioxide (SO2), carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), and black carbon (BC).

  2. Data must be from an official-level outdoor air quality source, defined as data produced by a government entity or international organization. We do not, at this stage, include data from low-cost, temporary, and/or indoor sensors.

  3. Data must be ‘raw’ and reported in physical concentrations on their originating site. Data cannot be shared in an 'Air Quality Index' or equivalent (e.g. AQI, PSI, API) format.

  4. Data must be at the ‘station-level,’ associable with geographic coordinates, not aggregated into a higher (e.g. city) level.

  5. Data must be from measurements averaged between 10 minutes and 24 hours.
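Criteria 1, 4, and 5 lend themselves to an automated check, sketched below; criteria 2 and 3 depend on the nature of the source and are not covered. The measurement shape (`parameter`, `coordinates`, `averagingMinutes`), the lowercase parameter codes, and the `checkCriteria` function are assumptions made for this example and do not reproduce the OpenAQ data format or the project's validation code.

```javascript
// Pollutant types from criterion 1, written as hypothetical lowercase codes.
const ALLOWED_PARAMETERS = new Set(['pm10', 'pm25', 'so2', 'co', 'no2', 'o3', 'bc']);

// Returns a list of failed checks; an empty list means the measurement
// passes the three machine-checkable criteria.
function checkCriteria(m) {
  const problems = [];
  // Criterion 1: supported pollutant type.
  if (!ALLOWED_PARAMETERS.has(m.parameter)) problems.push('parameter');
  // Criterion 4: station-level data with geographic coordinates.
  const c = m.coordinates;
  if (!c || !Number.isFinite(c.latitude) || !Number.isFinite(c.longitude)) {
    problems.push('coordinates');
  }
  // Criterion 5: averaging period between 10 minutes and 24 hours.
  if (!(m.averagingMinutes >= 10 && m.averagingMinutes <= 24 * 60)) {
    problems.push('averaging');
  }
  return problems;
}

// Usage with an illustrative measurement object.
console.log(checkCriteria({
  parameter: 'pm25',
  coordinates: { latitude: 39.95, longitude: 116.47 },
  averagingMinutes: 60
}));
```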

Contributing

There are many ways to contribute to this project; more details can be found in the contributing guide.

Committers metadata

Last synced: 6 days ago

Total Commits: 664
Total Committers: 31
Avg Commits per committer: 21.419
Development Distribution Score (DDS): 0.711

Commits in past year: 114
Committers in past year: 3
Avg Commits per committer in past year: 38.0
Development Distribution Score (DDS) in past year: 0.404

| Name | Email | Commits |
| --- | --- | --- |
| Joe Flasher | j****r@g****m | 192 |
| Gabriel Fosse | 6****o | 90 |
| Rub21 | r****n@d****g | 69 |
| Dolugen Buuralda | d****n@g****m | 56 |
| Christian Parker | c****r@g****m | 45 |
| Andrew Harvey | a****w@a****u | 42 |
| sruti | s****i@o****g | 33 |
| Christa Hasenkopf | i****o@O****g | 19 |
| Christian Parker | c****r@t****m | 17 |
| Russ Biggs | r****s@g****m | 14 |
| Christa Hasenkopf | c****a@o****g | 13 |
| yunica | f****1@g****m | 13 |
| Olaf Veerman | o****n@g****m | 13 |
| Budleigh Salterton | cz@s****m | 10 |
| magsyg | 5****g | 7 |
| Aimee Barciauskas | a****e@d****g | 6 |
| sethvincent | s****t@g****m | 5 |
| Max Grossman | m****n | 3 |
| T K Sourabh | s****7@g****m | 2 |
| Marc Farra | m****a@g****m | 2 |
| Dan Butvinik | d****k@g****m | 2 |
| AreteY | A****Y | 2 |
| Alex Dunn | a****e@g****m | 1 |
| Brian Seok | b****k@f****v | 1 |
| Bryce Christensen | b****e@e****m | 1 |
| Chris Hagerbaumer | 1****a | 1 |
| Eli Litwack | e****k | 1 |
| Jo Torsmyr | t****t | 1 |
| John Huang | y****g@f****w | 1 |
| nishadhka | n****a@g****m | 1 |
and 1 more...

Issue and Pull Request metadata

Last synced: 1 day ago

Total issues: 381
Total pull requests: 736
Average time to close issues: about 1 year
Average time to close pull requests: about 1 month
Total issue authors: 45
Total pull request authors: 40
Average comments per issue: 2.91
Average comments per pull request: 0.88
Merged pull request: 588
Bot issues: 0
Bot pull requests: 30

Past year issues: 1
Past year pull requests: 0
Past year average time to close issues: N/A
Past year average time to close pull requests: N/A
Past year issue authors: 1
Past year pull request authors: 0
Past year average comments per issue: 0.0
Past year average comments per pull request: 0
Past year merged pull request: 0
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/openaq/openaq-fetch

Top Issue Authors

  • RocketD0g (125)
  • jflasher (68)
  • russbiggs (33)
  • dolugen (20)
  • sruti (18)
  • majesticio (14)
  • maelle (11)
  • caparker (10)
  • andrewharvey (9)
  • magsyg (9)
  • olafveerman (8)
  • nishadhka (5)
  • jobonaf (5)
  • urbanemissions (5)
  • Rub21 (3)

Top Pull Request Authors

  • jflasher (272)
  • majesticio (133)
  • Rub21 (53)
  • sruti (48)
  • dolugen (34)
  • dependabot[bot] (30)
  • olafveerman (25)
  • RocketD0g (23)
  • magsyg (22)
  • russbiggs (14)
  • caparker (13)
  • MichalCz (13)
  • andrewharvey (11)
  • yunica (6)
  • maxgrossman (5)

Top Issue Labels

  • new data (133)
  • help wanted (97)
  • bug (60)
  • high priority (43)
  • needs investigation (42)
  • enhancement (36)
  • ready for dev (25)
  • good for new contributors (23)
  • needs review (19)
  • on hold (15)
  • question (9)
  • medium priority (6)
  • WRI-NASA 2020 Project (5)
  • Covid-19 Priority (4)
  • failing source (3)
  • has-pr (2)
  • invalid (2)
  • Hacktoberfest2020 (2)
  • City of LA - PWWB Project (2)
  • wontfix (1)

Top Pull Request Labels

  • dependencies (30)
  • on hold (3)

Dependencies

package-lock.json npm
  • 547 dependencies
package.json npm
  • chai ^4.1.2 development
  • eslint ^4.19.1 development
  • eslint-config-standard ^10.2.1 development
  • eslint-plugin-import ^2.14.0 development
  • eslint-plugin-node ^5.2.1 development
  • eslint-plugin-promise ^3.8.0 development
  • eslint-plugin-standard ^3.1.0 development
  • mocha ^5.2.0 development
  • shell-escape ^0.2.0 development
  • JSONStream ^1.3.4
  • adm-zip ^0.4.11
  • async ^2.6.1
  • aws-sdk ^2.305.0
  • babel-preset-node8 ^1.2.0
  • babel-register ^6.26.0
  • bottleneck ^2.19.5
  • byline ^5.0.0
  • cheerio ^1.0.0-rc.2
  • coordinate-parser ^1.0.2
  • csv-parse ^3.0.0
  • ftp ^0.3.10
  • iconv ^2.3.0
  • jsonschema ^1.2.0
  • knex ^0.15.2
  • knex-postgis ^0.2.2
  • lodash ^4.17.10
  • moment ^2.22.2
  • moment-timezone ^0.5.21
  • pg ^7.4.3
  • proj4 ^2.3.14
  • proj4js-defs 0.0.1
  • request ^2.88.0
  • request-promise-native ^1.0.5
  • require-dir ^1.0.0
  • s3-upload-stream ^1.0.7
  • scramjet ^4.19.0
  • ssl-root-cas ^1.3.1
  • transliteration ^1.6.6
  • tz-lookup ^6.1.8
  • winston ^2.4.4
  • winston-papertrail ^1.0.2
  • yargs ^12.0.1
Dockerfile docker
  • ubuntu 14.04 build
.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • thollander/actions-comment-pull-request v2 composite
.github/workflows/deploy.yml actions
  • actions/checkout v3 composite
  • actions/setup-node v2 composite
  • aws-actions/configure-aws-credentials master composite
cdk/package.json npm
  • @types/node ^18.15.11 development
  • aws-cdk-lib ^2.72.1 development
  • constructs ^10.0.0 development
  • ts-node ^10.9.1 development
  • typescript ^5.0.3 development
cdk/yarn.lock npm
  • @aws-cdk/asset-awscli-v1 2.2.129
  • @aws-cdk/asset-kubectl-v20 2.1.1
  • @aws-cdk/asset-node-proxy-agent-v5 2.0.105
  • @balena/dockerignore 1.0.2
  • @cspotcode/source-map-support 0.8.1
  • @jridgewell/resolve-uri 3.1.0
  • @jridgewell/sourcemap-codec 1.4.14
  • @jridgewell/trace-mapping 0.3.9
  • @tsconfig/node10 1.0.9
  • @tsconfig/node12 1.0.11
  • @tsconfig/node14 1.0.3
  • @tsconfig/node16 1.0.3
  • @types/node 18.15.11
  • acorn 8.8.2
  • acorn-walk 8.2.0
  • ajv 8.12.0
  • ansi-regex 5.0.1
  • ansi-styles 4.3.0
  • arg 4.1.3
  • astral-regex 2.0.0
  • at-least-node 1.0.0
  • aws-cdk-lib 2.72.1
  • balanced-match 1.0.2
  • brace-expansion 1.1.11
  • case 1.6.3
  • color-convert 2.0.1
  • color-name 1.1.4
  • concat-map 0.0.1
  • constructs 10.1.302
  • create-require 1.1.1
  • diff 4.0.2
  • emoji-regex 8.0.0
  • fast-deep-equal 3.1.3
  • fs-extra 9.1.0
  • graceful-fs 4.2.11
  • ignore 5.2.4
  • is-fullwidth-code-point 3.0.0
  • json-schema-traverse 1.0.0
  • jsonfile 6.1.0
  • jsonschema 1.4.1
  • lodash.truncate 4.4.2
  • lru-cache 6.0.0
  • make-error 1.3.6
  • minimatch 3.1.2
  • punycode 2.3.0
  • require-from-string 2.0.2
  • semver 7.3.8
  • slice-ansi 4.0.0
  • string-width 4.2.3
  • strip-ansi 6.0.1
  • table 6.8.1
  • ts-node 10.9.1
  • typescript 5.0.3
  • universalify 2.0.0
  • uri-js 4.4.1
  • v8-compile-cache-lib 3.0.1
  • yallist 4.0.0
  • yaml 1.10.2
  • yn 3.1.1
src/yarn.lock npm
  • 738 dependencies
yarn.lock npm
  • 771 dependencies
docker-compose.yml docker

Score: 8.766705997750515