GHCNData.jl

Helps access the Global Historical Climatological Network's daily data.
https://github.com/willtebbutt/ghcndata.jl

Category: Climate Change
Sub Category: Climate Data Access and Visualization

Keywords from Contributors

open-science

Last synced: about 2 hours ago

Repository metadata

Host: GitHub
URL: https://github.com/willtebbutt/ghcndata.jl
Owner: willtebbutt
License: mit
Created: 2021-01-16T14:30:49.000Z (over 4 years ago)
Default Branch: master
Last Pushed: 2023-09-20T19:07:09.000Z (almost 2 years ago)
Last Synced: 2025-05-31T16:51:24.797Z (about 1 month ago)
Language: Julia
Homepage:
Size: 34.2 KB
Stars: 3
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 6
Metadata Files:
- Readme: README.md
- License: LICENSE

GHCNData

Utility functionality to help get hold of daily data from the Global Historical Climatology Network archive.

If you use this data, you should acknowledge it appropriately. Instructions for doing so can be found at the top of NOAA's readme.

Why Bother?

While the GHCN data is fairly straightforward, it's not as simple as just downloading a single file and opening it as a DataFrame.
There are a few different kinds of files that you need to be aware of, each of which has a well-documented but non-standard format.
As such, it makes sense to implement the functionality to load the files in a format more amenable to standard workflows.
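To give a flavour of what "non-standard" means here, the following is a minimal sketch of parsing a single record from a daily `.dly` file by hand. The column positions follow the layout described in NOAA's readme (an 11-character station ID, year, month, a 4-character element code, then 31 repeats of a 5-character value followed by three 1-character flags, with -9999 marking a missing value). `parse_dly_line` is a hypothetical helper written for illustration and is not part of this package; the station ID and values are made up.

```julia
# Hypothetical helper, not part of GHCNData.jl: parse one fixed-width record.
function parse_dly_line(line::AbstractString)
    id = line[1:11]                      # station identifier
    year = parse(Int, line[12:15])
    month = parse(Int, line[16:17])
    element = line[18:21]                # e.g. TMAX, TMIN, PRCP
    vals = Vector{Union{Missing, Int}}(undef, 31)
    for day in 1:31
        start = 22 + 8 * (day - 1)       # each day occupies 8 characters
        raw = parse(Int, strip(line[start:start + 4]))
        vals[day] = raw == -9999 ? missing : raw   # -9999 means "no observation"
    end
    return (id = id, year = year, month = month, element = element, values = vals)
end

# Build a made-up record: two observed days, the rest missing, blank flags.
toy_values = [day <= 2 ? 10 * day : -9999 for day in 1:31]
line = "XX000000001" * "2020" * "01" * "TMAX" *
       join(lpad(v, 5) * "   " for v in toy_values)

rec = parse_dly_line(line)
```

The flags are skipped in this sketch; real analyses will often want the quality flag in particular.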

Usage

Data Loading

This package provides helper functions to download and load the data offered by NOAA. There are four core functions that you should be aware of:

load_station_metadata
load_inventories
load_data_file
load_countries_metadata

Each of these functions downloads the corresponding data using DataDeps.jl if it's not already available, and parses it into a DataFrame.

NOAA's documentation is the best place to look to understand these files, but the docstrings in this package provide a brief overview.

Typical Workflows

Commonly, you'll want to load all of the data associated with a particular collection of stations in a particular region of the world. There are two steps to this:

Use load_inventories() to find out which stations exist at which latitudes and longitudes, and their corresponding IDs.
Use load_data_file(station_id) to load the data for each station that you've found in your region of interest.

For an example of this kind of thing, see the code for select_data in dataset_loading.jl.
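The two-step workflow can also be sketched on toy data, without downloading anything. Everything below is an assumption made for illustration: the field names (`id`, `latitude`, `longitude`, `element`), the helper names `stations_in_box` and `load_all`, and the made-up station IDs are not taken from this package, and the real table returned by load_inventories() may use different column names. The loader is passed in as an argument so that load_data_file could be substituted when working with real data.

```julia
# Toy stand-in for the inventory table: one row per station and element,
# so station IDs repeat. Field names are hypothetical.
inventory = [
    (id = "XX000000001", latitude = 51.5, longitude = -0.1, element = "TMAX"),
    (id = "XX000000001", latitude = 51.5, longitude = -0.1, element = "PRCP"),
    (id = "XX000000002", latitude = 40.7, longitude = -74.0, element = "TMAX"),
]

# Step 1: find the IDs of the stations inside a latitude / longitude box.
function stations_in_box(rows; lat, lon)
    inside(r) = lat[1] <= r.latitude <= lat[2] && lon[1] <= r.longitude <= lon[2]
    return unique(r.id for r in rows if inside(r))
end

# Step 2: load each station with whichever loader is supplied.
load_all(ids, loader) = Dict(id => loader(id) for id in ids)

ids = stations_in_box(inventory; lat = (50.0, 55.0), lon = (-5.0, 2.0))

# A fake loader keeps the sketch runnable offline.
fake_loader(id) = "data for $id"
data = load_all(ids, fake_loader)
```

With the package installed, `load_all(ids, load_data_file)` would be the corresponding call, at the cost of downloading the data on first use.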

You might also be interested in, for example, the properties of the station in question (e.g. its elevation). For that data, use load_station_metadata().

Helper Functions

This package presently provides two helper functions for processing the data once it's been loaded.

select_data pretty much implements the workflow discussed above.

convert_to_time_series "stacks" the output of load_data_file, converting from a format in which 1 row == 1 month (each day's data lives in a different column in the raw data) to a format in which 1 row == 1 day.
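As a rough illustration of what this kind of stacking involves, here is a minimal sketch on a single made-up monthly record. It is not the implementation of convert_to_time_series: the function name `stack_month` and the field names are hypothetical, and the choice to drop day slots that do not exist in the calendar month (for example 30 February) is an assumption of this sketch.

```julia
using Dates

# Hypothetical helper: turn one month-wide record (31 day slots) into
# one row per calendar day.
function stack_month(rec)
    ndays = daysinmonth(Date(rec.year, rec.month))   # 28 to 31, leap years handled
    return [
        (id = rec.id, date = Date(rec.year, rec.month, day),
         element = rec.element, value = rec.values[day])
        for day in 1:ndays
    ]
end

# Made-up record for February 2021: slots 29 to 31 cannot hold real data.
rec = (id = "XX000000001", year = 2021, month = 2, element = "PRCP",
       values = [day <= 28 ? day : missing for day in 1:31])

rows = stack_month(rec)
```

Each element of `rows` is a named tuple with a proper `Date`, which is usually easier to sort, join and plot than the month-wide layout.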

Both functions are quite opinionated, so while they're hopefully helpful examples of things that you might want to do with the GHCN data, you'll probably need to tweak them a bit for your use-case.

Missing Functionality and Contributing

If you build on this functionality, please consider contributing back so that we can make all of our lives easier! Similarly, please open an issue (or, even better, a PR) if you feel that something that would be useful is missing.

Development has been driven on an as-needed basis, so while this package will grab most (all?) of the daily data for you, it is a little sparse on utility functionality.
In particular, please note that convert_to_time_series and select_data may make assumptions about the data that are not appropriate for your use case. If in doubt, I would recommend using the functionality in dataset_loading.jl, as it just provides helpful functionality to extract the data.

Moreover, it doesn't currently implement anything to grab or process the monthly data, but it should be a straightforward extension of the existing functionality to do so.

Bug Reporting

If you either find a bug, or think something looks suspicious, please open an issue / PR. When considering whether or not to open an issue / PR, note that it's generally better to open an issue erroneously (no harm is done if it turns out there wasn't a problem after all) than it is for a problem to slip by (data-related bugs cause papers to be retracted and generally hold back progress). If in doubt, open an issue.

Why are there so few tests?

Three of the four core functions listed above are lightly tested -- load_data_file has yet to be tested because, as presently implemented, the CI runner would need to download the entire collection of daily data for each run, which seems impractical. If you have any suggestions for how to alleviate this, please open an issue / PR!

Related Work

Scott Hosking provides similar functionality in a Python package.

Owner metadata

Name: Will Tebbutt
Login: willtebbutt
Email:
Kind: user
Description:
Website:
Location:
Twitter:
Company:
Icon url: https://avatars.githubusercontent.com/u/3628294?u=7263774afa695a7c72442f09af6c0ce995778cd2&v=4
Repositories: 8
Last synced at: 2023-03-08T23:08:52.965Z
Profile URL: https://github.com/willtebbutt

Committers metadata

Last synced: 7 days ago

Total Commits: 24
Total Committers: 5
Avg Commits per committer: 4.8
Development Distribution Score (DDS): 0.375

Commits in past year: 1
Committers in past year: 1
Avg Commits per committer in past year: 1.0
Development Distribution Score (DDS) in past year: 0.0

Name	Email	Commits
wt	w**t@i**k	15
WT	w**1@m**k	4
Lyndon White	l**e@i**k	3
willtebbutt	w**3@c**k	1
Lyndon White	o**x@u**u	1

Issue and Pull Request metadata

Last synced: 3 days ago

Total issues: 2
Total pull requests: 7
Average time to close issues: 31 minutes
Average time to close pull requests: 4 months
Total issue authors: 2
Total pull request authors: 3
Average comments per issue: 4.5
Average comments per pull request: 1.0
Merged pull request: 6
Bot issues: 0
Bot pull requests: 1

Past year issues: 0
Past year pull requests: 0
Past year average time to close issues: N/A
Past year average time to close pull requests: N/A
Past year issue authors: 0
Past year pull request authors: 0
Past year average comments per issue: 0
Past year average comments per pull request: 0
Past year merged pull request: 0
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/willtebbutt/ghcndata.jl

Top Issue Authors

willtebbutt (1)
JuliaTagBot (1)

Top Pull Request Authors

willtebbutt (5)
github-actions[bot] (1)
oxinabox (1)

Top Pull Request Labels

needs version bump (1)

Package metadata

Total packages: 3
Total downloads: unknown
Total dependent packages: 0 (may contain duplicates)
Total dependent repositories: 0 (may contain duplicates)
Total versions: 18

proxy.golang.org: github.com/willtebbutt/GHCNData.jl

Homepage:
Documentation: https://pkg.go.dev/github.com/willtebbutt/GHCNData.jl#section-documentation
Licenses: mit
Latest release: v0.1.5 (published almost 2 years ago)
Last Synced: 2025-07-10T02:01:31.401Z (1 day ago)
Versions: 6
Dependent Packages: 0
Dependent Repositories: 0
Rankings:
- Dependent packages count: 5.395%
- Average: 5.576%
- Dependent repos count: 5.758%

proxy.golang.org: github.com/willtebbutt/ghcndata.jl

Homepage:
Documentation: https://pkg.go.dev/github.com/willtebbutt/ghcndata.jl#section-documentation
Licenses: mit
Latest release: v0.1.5 (published almost 2 years ago)
Last Synced: 2025-07-10T02:01:33.165Z (1 day ago)
Versions: 6
Dependent Packages: 0
Dependent Repositories: 0
Rankings:
- Dependent packages count: 5.395%
- Average: 5.576%
- Dependent repos count: 5.758%

juliahub.com: GHCNData

Helps access the Global Historical Climatological Network's daily data

Homepage:
Documentation: https://docs.juliahub.com/General/GHCNData/stable/
Licenses: MIT
Latest release: 0.1.5 (published almost 2 years ago)
Last Synced: 2025-07-06T03:08:34.821Z (5 days ago)
Versions: 6
Dependent Packages: 0
Dependent Repositories: 0
Rankings:
- Dependent repos count: 9.94%
- Average: 35.529%
- Dependent packages count: 38.915%
- Forks count: 40.373%
- Stargazers count: 52.888%

Dependencies

.github/workflows/CompatHelper.yml actions

.github/workflows/TagBot.yml actions

JuliaRegistries/TagBot v1 composite

.github/workflows/VersionVigilante_pull_request.yml actions

actions/checkout v1.0.0 composite
actions/github-script 0.3.0 composite
julia-actions/setup-julia latest composite

.github/workflows/ci.yml actions

actions/cache v1 composite
actions/checkout v2 composite
codecov/codecov-action v1 composite
julia-actions/julia-buildpkg v1 composite
julia-actions/julia-processcoverage v1 composite
julia-actions/julia-runtest v1 composite
julia-actions/setup-julia v1 composite

Score: -Infinity
