Recent Releases of weather-tools
weather-tools - v0.3.2
weather-mv is now much faster and supports ingestion of Cloud-Optimized GeoTIFFs (COGs). weather-dl now supports MARS syntax in JSON config files and caps the maximum number of workers.
We're happy to welcome @deepgabani8 to the weather-tools dev team!
Current Status
weather-dl: Fixes and parser system improvements
- Fixed an error while parsing newline-separated date values.
- JSON config files now support MARS syntax.
- New syntax supported: users can now specify MARS range syntax in reverse order as well (e.g. 2020-01-01/to/2018-01-01/by/-1).
- Prevent exhaustion of quotas: given the downloader's current approach, we've capped the maximum number of workers at N, i.e. the number of possible simultaneous requests plus a fudge factor.
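To make the reverse-order range syntax concrete, here is a minimal sketch of how such a range could be expanded into individual dates. The helper name `expand_mars_date_range` is hypothetical, and the sketch assumes the step is counted in days and defaults to 1 when no `by` clause is given; it is not the parser that weather-dl actually uses.

```python
from datetime import date, timedelta
from typing import List


def expand_mars_date_range(value: str) -> List[str]:
    """Expand 'START/to/END[/by/STEP]' into ISO dates (illustrative only)."""
    parts = value.split("/")
    if len(parts) == 3 and parts[1] == "to":
        step = 1  # assumed default when no "by" clause is given
    elif len(parts) == 5 and parts[1] == "to" and parts[3] == "by":
        step = int(parts[4])
    else:
        raise ValueError(f"not a MARS date range: {value!r}")
    if step == 0:
        raise ValueError("step must be non-zero")

    start = date.fromisoformat(parts[0])
    end = date.fromisoformat(parts[2])
    # A negative step only makes sense when the range runs backwards.
    if (end - start).days * step < 0:
        raise ValueError("step sign does not match the range direction")

    days = []
    current = start
    while (current <= end) if step > 0 else (current >= end):
        days.append(current.isoformat())
        current += timedelta(days=step)
    return days


print(expand_mars_date_range("2020-01-03/to/2020-01-01/by/-1"))
# ['2020-01-03', '2020-01-02', '2020-01-01']
```

The sketch rejects a step whose sign disagrees with the range direction rather than silently returning an empty list, which is one possible design choice among several.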
weather-mv: Performance improvements and support for COGs ingestion
- Substantial performance improvement!
- Added a flag to control in-memory copying of the dataset. By default the dataset is opened in memory; users can disable this by passing the --disable_in_memory_copy flag.
- Added validation to alert the user earlier that the BigQuery table and temp location (cloud bucket) need to be in the same region. Users can skip this validation by passing the -s, --skip-region-validation flag.
- Added support for ingestion of COGs into BigQuery.
- Updated the tool's documentation (README.md) to remove duplicate flags from sample examples.
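The region validation can be pictured as a fail-fast check that runs before any pipeline work starts. The sketch below is an assumption-laden illustration: `check_same_region` is a hypothetical function, and the two region strings are passed in directly because these notes do not describe how weather-mv looks them up.

```python
def check_same_region(table_region: str, bucket_region: str,
                      skip_validation: bool = False) -> None:
    """Raise early if the BigQuery table and temp bucket regions differ.

    Hypothetical helper; `skip_validation` mirrors the idea of the
    --skip-region-validation flag.
    """
    if skip_validation:
        return
    # Compare case-insensitively, since locations may be reported as "US" or "us".
    if table_region.strip().lower() != bucket_region.strip().lower():
        raise ValueError(
            f"BigQuery table region {table_region!r} does not match "
            f"temp location region {bucket_region!r}"
        )


check_same_region("US", "us")  # passes silently
check_same_region("europe-west1", "us-central1", skip_validation=True)  # skipped
```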
General
- Fixed typo in contribution guide (CONTRIBUTING.md).
What's Changed
- Validates non-compatible regions scenarios in weather_mv tool. by @mahrsee1997 in https://github.com/google/weather-tools/pull/155
- Changes for #75, CLI-support for user controlled in-memory copy operation by @ksic8 in https://github.com/google/weather-tools/pull/154
- Candidate implementation to speed up row extraction. by @alxmrs in https://github.com/google/weather-tools/pull/146
- Fixed typo in CONTRIBUTING.md by @mahrsee1997 in https://github.com/google/weather-tools/pull/156
- Fix parsing newline separated date values by @mahrsee1997 in https://github.com/google/weather-tools/pull/161
- Support ingestion of COGs into BigQuery by @mahrsee1997 in https://github.com/google/weather-tools/pull/158
- DL: Add support for reverse order in MARS range syntax by @deepgabani8 in https://github.com/google/weather-tools/pull/162
- DL: Add support for MARS syntax in JSON config by @deepgabani8 in https://github.com/google/weather-tools/pull/163
- Removed duplicate flag from sample examples given in Readme.md of weather_mv tool. by @mahrsee1997 in https://github.com/google/weather-tools/pull/168
- Cap max number of workers in weather-dl. by @mahrsee1997 in https://github.com/google/weather-tools/pull/170
New Contributors
- @deepgabani8 made their first contribution in https://github.com/google/weather-tools/pull/162
Full Changelog: https://github.com/google/weather-tools/compare/v0.3.1...v0.3.2
Climate Change - Climate Data Processing and Analysis
- Python
Published by mahrsee1997 almost 3 years ago

weather-tools - v0.3.1
Improvements to the weather-dl parser.
- Fixed a bug where numbers with leading zeros were not parsed (useful for date ranges).
- Corrected an additional issue for singleton partition values (e.g. get only one day of every month).
- New syntax added: users can now specify day=all to get all the days in a month.
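As an illustration of what day=all implies, the following sketch expands the value into zero-padded day numbers for a given month. The function `expand_days` is hypothetical, and passing the year and month explicitly is an assumption; these notes do not say how weather-dl resolves them.

```python
import calendar
from typing import List


def expand_days(year: int, month: int, day: str) -> List[str]:
    """Expand a `day` selection into zero-padded day strings (illustrative)."""
    if day.strip().lower() == "all":
        # monthrange returns (weekday of the 1st, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        return [f"{d:02d}" for d in range(1, last_day + 1)]
    # int() accepts leading zeros, so "01" and "1" normalize to the same value.
    return [f"{int(token):02d}" for token in day.split()]


print(len(expand_days(2020, 2, "all")))  # 29 (leap year)
print(expand_days(2021, 1, "01 15"))     # ['01', '15']
```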
What's Changed
- New Syntax for download configs: day=all by @alxmrs in https://github.com/google/weather-tools/pull/150
- Proper handling of singleton partition dimensions. by @alxmrs in https://github.com/google/weather-tools/pull/151
- Incrementing weather-dl version to cover recent parser changes. by @alxmrs in https://github.com/google/weather-tools/pull/153
Full Changelog: https://github.com/google/weather-tools/compare/v0.3.0...v0.3.1
Published by alxmrs almost 3 years ago

weather-tools - v0.3.0
The weather splitter has a new API that allows for partitioning weather data by any dimension (we intentionally exclude lat/lngs). weather-dl now has a simpler, more Pythonic interface for expressing target paths. The weather-mv tool now supports dry runs and BigQuery geopoints.
We're happy to welcome @mahrsee1997 and @ksic8 to the weather-tools dev team!
Current Status
weather-dl: Fixes and DSL usability improvements
- Specifying templates is much simpler. Only target_path is needed, and we fully support Python string formatting syntax.
- A significant error was fixed, and now downloads have better skipping and retry logic.
- Log ergonomics were improved by adding timestamps and removing needless warnings (thanks, @pbattaglia!).
- Internal code refactors were included to improve maintainability.
- Data source clients (currently only ECMWF) include important license information regarding terms of data use.
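For readers unfamiliar with Python string formatting, this is roughly what a target_path template looks like when it is filled in. The sketch uses plain `str.format`; the bucket name, the field names, and the helper `render_target_path` are all made up for illustration, and which placeholder styles weather-dl accepts in practice should be checked against its documentation.

```python
def render_target_path(template: str, *values, **named) -> str:
    """Fill a path template using Python's standard str.format rules."""
    return template.format(*values, **named)


# Positional placeholders, filled in order.
print(render_target_path("gs://example-bucket/era5/{}/{}.nc", "2020", "01"))
# gs://example-bucket/era5/2020/01.nc

# Named placeholders with a format spec (zero-padded month).
print(render_target_path(
    "gs://example-bucket/era5/{year}/{month:02d}/{variable}.nc",
    year=2020, month=1, variable="temperature",
))
# gs://example-bucket/era5/2020/01/temperature.nc
```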
weather-mv: Schema & usage improvements
- The default schemas were improved to include BigQuery Geography-type columns. Now, lat/lngs will be represented as POINTs.
- The weather mover now has dry runs! Users will be able to preview their data ingestion into BigQuery before making use of infrastructure.
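A geography POINT is typically written in well-known text (WKT) with longitude first. The sketch below shows one way a latitude/longitude pair could be turned into such a string; `to_wkt_point` is a hypothetical helper, and the wrapping of 0..360 longitudes into -180..180 is an assumption about the input data rather than something these notes describe.

```python
def to_wkt_point(lat: float, lng: float) -> str:
    """Return a WKT POINT string, longitude first (illustrative only)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    # Assumption: some datasets use 0..360 longitudes; wrap them into -180..180.
    wrapped = ((lng + 180.0) % 360.0) - 180.0
    return f"POINT({wrapped} {lat})"


print(to_wkt_point(10.0, 190.0))  # POINT(-170.0 10.0)
```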
weather-sp: Flexible splits
- A new version of the splitter was introduced to allow for flexible splits of weather data: now, you can divide Grib and NetCDF data by any dimension except latitude and longitude (great work, @uhager!).
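Conceptually, splitting by arbitrary dimensions is a group-by over the chosen keys. The sketch below demonstrates the idea on plain dictionaries standing in for Grib messages or NetCDF slices; `split_by` and the record fields are hypothetical, and the real splitter operates on actual weather files rather than in-memory dicts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

EXCLUDED_DIMS = {"latitude", "longitude"}  # never used as split keys


def split_by(records: Iterable[dict],
             dims: Sequence[str]) -> Dict[Tuple, List[dict]]:
    """Group records by the values of the chosen dimensions (illustrative)."""
    excluded = EXCLUDED_DIMS.intersection(dims)
    if excluded:
        raise ValueError(f"cannot split by {sorted(excluded)}")
    groups: Dict[Tuple, List[dict]] = defaultdict(list)
    for record in records:
        key = tuple(record[dim] for dim in dims)
        groups[key].append(record)
    return dict(groups)


records = [
    {"variable": "t", "level": 500, "value": 1.0},
    {"variable": "t", "level": 850, "value": 2.0},
    {"variable": "u", "level": 500, "value": 3.0},
]
print(sorted(split_by(records, ["variable"])))           # [('t',), ('u',)]
print(sorted(split_by(records, ["variable", "level"])))  # three groups
```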
General
- Pip install instructions include debugging advice for long installs.
- We've removed open meetings from our contributing guide due to low attendance.
What's Changed
- Fixes issue (127) where append_date_dirs wasn't working properly. by @pbattaglia in https://github.com/google/weather-tools/pull/130
- Added support for dry-runs to weather-mv tool by @mahrsee1997 in https://github.com/google/weather-tools/pull/132
- CONTRIBUTING guide changes for pip install command by @ksic8 in https://github.com/google/weather-tools/pull/137
- Code-changes for #49, added S2_LOCATION column by @ksic8 in https://github.com/google/weather-tools/pull/133
- License information added in client and documentation by @ksic8 in https://github.com/google/weather-tools/pull/136
- Added logger timestamp. by @pbattaglia in https://github.com/google/weather-tools/pull/131
- Robust download from clients to VMs. by @alxmrs in https://github.com/google/weather-tools/pull/143
- Suppress urllib3 warning by @pbattaglia in https://github.com/google/weather-tools/pull/134
- Unscheduling developer meetings. by @alxmrs in https://github.com/google/weather-tools/pull/145
- Flexible splits by @uhager in https://github.com/google/weather-tools/pull/125
- Changes to make 'target_path' and 'target_filename' compliant with Python's standard string formatting by @mahrsee1997 in https://github.com/google/weather-tools/pull/144
- Converted Config dict into dataclass by @mahrsee1997 in https://github.com/google/weather-tools/pull/142
- Netcdf splits by @uhager in https://github.com/google/weather-tools/pull/147
New Contributors
- @mahrsee1997 made their first contribution in https://github.com/google/weather-tools/pull/132
- @ksic8 made their first contribution in https://github.com/google/weather-tools/pull/137
Full Changelog: https://github.com/google/weather-tools/compare/v0.2.2...v0.3.0
Published by alxmrs about 3 years ago

weather-tools - v0.2.2
A re-release of v0.2.1.
Published by alxmrs about 3 years ago

weather-tools - v0.2.1
Improvements and bugfixes for all weather tools. weather-dl is much faster & more robust. weather-mv now uses a pluggable infrastructure, which makes iterations faster. weather-sp is mid-transition to arbitrary splits.
Thanks to our new OSS contributors, @pranay101 and @pbattaglia!
Current Status
weather-dl: Major fixes
- This release introduces a fix to #98, which makes the downloader faster and more robust. With this change, there is no need to override the autoscaling algorithm, so it now has fewer moving parts.
- Uploads have better retry logic. Users should experience fewer crashes from network errors in the pipeline.
- The downloader structure has been refactored to be more testable.
- Examples of JSON configs were added.
- Addressed a critical NameError bug that occurred during a refactor.
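Retry logic for flaky network operations commonly takes the form of exponential backoff. The sketch below shows that general pattern and is not the downloader's actual implementation; `with_retries`, its parameters, and the choice to retry on `OSError` (which includes socket timeouts) are assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[], T],
                 attempts: int = 3,
                 base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, retrying with exponential backoff on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... the base delay
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the delays instead of actually waiting.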
weather-mv: Refactor
- The mover has been refactored to use a pluggable infrastructure. This makes it easier to develop local runs, and to write weather data to other sinks besides BQ.
weather-sp: Skipping logic
- A non-API-changing feature has been added to the splitter: data that has already been split will now be skipped. Users can override this feature with -f, --force.
- The documentation for the splitter has been improved.
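The skipping behaviour can be thought of as a simple existence check on the expected outputs, bypassed when the force option is set. This is a minimal sketch for local paths only; `should_split` is a hypothetical function, and the real tool also needs to handle cloud storage paths.

```python
import os
from typing import Iterable


def should_split(output_paths: Iterable[str], force: bool = False) -> bool:
    """Return True if the input still needs splitting (illustrative only)."""
    paths = list(output_paths)
    if force or not paths:
        return True
    # Skip only when every expected output file is already present.
    return not all(os.path.exists(path) for path in paths)
```

Requiring all outputs to exist means a partially completed split is redone rather than skipped, which is the conservative choice.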
General
- The release process now produces smaller binaries (we're now ignoring test data).
What's Changed
- Added example JSON config by @pranay101 in https://github.com/google/weather-tools/pull/52
- Refactored weather-mv to work with pluggable Data Sinks. by @alxmrs in https://github.com/google/weather-tools/pull/101
- Update weather-sp's templating system to allow users to specify level and shortname. by @alxmrs in https://github.com/google/weather-tools/pull/105
- Upload to cloud is robust to socket timeout errors. by @alxmrs in https://github.com/google/weather-tools/pull/110
- Fix wrong output file example in weather-sp readme. by @uhager in https://github.com/google/weather-tools/pull/111
- Added skipping logic to weather-sp by @alxmrs in https://github.com/google/weather-tools/pull/108
- Lower default num-requests for MARS to make it more robust. by @alxmrs in https://github.com/google/weather-tools/pull/113
- New data-oriented task distribution strategy. by @alxmrs in https://github.com/google/weather-tools/pull/116
- Downloader refactor: extracted out partitioning; tested pipeline args. by @alxmrs in https://github.com/google/weather-tools/pull/117
- Fix minor bug: main session needs to be saved. by @alxmrs in https://github.com/google/weather-tools/pull/120
- (#123) Fixed beam not being able to access global namespace + minor related bug. by @pbattaglia in https://github.com/google/weather-tools/pull/124
- Shrinking the size of the package release artifacts by @alxmrs in https://github.com/google/weather-tools/pull/122
New Contributors
- @pranay101 made their first contribution in https://github.com/google/weather-tools/pull/52
- @pbattaglia made their first contribution in https://github.com/google/weather-tools/pull/124
Full Changelog: https://github.com/google/weather-tools/compare/v0.2.0...v0.2.1
Published by alxmrs about 3 years ago

weather-tools - v0.2.0
New version of weather-sp. Fixes and improvements to weather-dl and weather-mv.
Thanks to our volunteer open source contributors and Google 20%ers!
Current State
All three tools are still in their beta and alpha stages. In this release, the stability of weather-mv was especially improved. We've been able to execute streaming ingestion of Grib data into BigQuery. Users of weather-sp will now have greater control to express the output location of split files through a file pattern template.
weather-dl: Minor fixes
- We fixed GCS timeout issues experienced intermittently.
- Issue with mandatory partition keys was fixed.
weather-mv: Major fixes for tool stability
- Grib support added.
- Row extraction is faster by loading weather data into memory.
- Log messages were improved.
- Writes to BigQuery will use the most efficient method (streaming vs file upload).
- XArray Open step is made generic.
- Several fixes were introduced.
- JSON serialization fixes.
- The Dataflow environment will now have ecCodes installed so we can run cfgrib.
- Tarballs are smaller / faster to upload to Dataflow (or another Beam runner).
- BigQuery write errors were fixed.
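JSON serialization problems of this kind usually arise because values such as time deltas have no default JSON representation. The sketch below shows the general technique of supplying a `default` hook to `json.dumps`, using only the standard library. The helper `to_json_value` is hypothetical, and converting a timedelta to seconds is an assumed choice of unit, not necessarily the one weather-mv uses.

```python
import datetime
import json


def to_json_value(value):
    """Fallback converter for types json.dumps cannot encode (illustrative)."""
    if isinstance(value, datetime.timedelta):
        return value.total_seconds()  # assumption: represent durations in seconds
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    raise TypeError(f"cannot serialize {type(value).__name__}")


row = {"step": datetime.timedelta(hours=6), "time": datetime.date(2020, 1, 1)}
print(json.dumps(row, default=to_json_value))
# {"step": 21600.0, "time": "2020-01-01"}
```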
weather-sp: New version
The splitter now supports flexible specification of output files.
General project improvements
- Documentation was groomed.
- Windows developer pathway was documented.
- Fixed developer scripts (we can now better dev-test different branches of the project) and slow CI.
- Announced open developer meetings.
What's Changed
- weather-dl: Fix GCS timeout issues the pipelines intermittently experiences. by @alxmrs in https://github.com/google/weather-tools/pull/72
- Improve grib file processing speed by @pramodg in https://github.com/google/weather-tools/pull/74
- Default behavior is better by @lakshmanok in https://github.com/google/weather-tools/pull/77
- Updating script to use new package name by @CillianFn in https://github.com/google/weather-tools/pull/79
- Better progress logs for weather-mv. by @alxmrs in https://github.com/google/weather-tools/pull/82
- weather-mv fix: Serializing all numpy float and int types to JSON. by @alxmrs in https://github.com/google/weather-tools/pull/83
- Documented windows workaround. by @alxmrs in https://github.com/google/weather-tools/pull/85
- Updated weather-mv install process to setup ecCodes on worker machine. by @alxmrs in https://github.com/google/weather-tools/pull/86
- Groomed documentation by @alxmrs in https://github.com/google/weather-tools/pull/88
- Coercing timedelta to float by @alxmrs in https://github.com/google/weather-tools/pull/89
- weather-mv: Allow users to pass in keyword arguments to xarray.open_dataset by @alxmrs in https://github.com/google/weather-tools/pull/87
- weather-splitter: allow for more flexible output files by @uhager in https://github.com/google/weather-tools/pull/65
- Fix slow test runs by @CillianFn in https://github.com/google/weather-tools/pull/92
- Add check for partition_keys when using append_date_dirs by @CillianFn in https://github.com/google/weather-tools/pull/90
- Exclude test data from tarball by @CillianFn in https://github.com/google/weather-tools/pull/93
- weather-mv: Fixed error writing to BigQuery: Excluding non-coordinate indexes if they don't appear in the Schema by @alxmrs in https://github.com/google/weather-tools/pull/95
- Updating tool versions in prep for release. by @alxmrs in https://github.com/google/weather-tools/pull/97
- Announcing open developer meetings. by @alxmrs in https://github.com/google/weather-tools/pull/96
New Contributors
- @uhager made their first contribution in https://github.com/google/weather-tools/pull/65
Full Changelog: https://github.com/google/weather-tools/compare/v0.1.1...v0.2.0
Published by alxmrs about 3 years ago

weather-tools - Hotfix for issue found in `weather-mv`.
What's Changed
- weather-mv: Fixed variable referenced before assignment. by @alxmrs in https://github.com/google/weather-tools/pull/71
Full Changelog: https://github.com/google/weather-tools/compare/v0.1.0...v0.1.1
Published by alxmrs over 3 years ago

weather-tools - Initial Release of weather-tools
The inaugural release of weather-tools.
Current State
Currently, there are three tools in development: weather-dl, weather-mv, and weather-sp. The first tool is in its beta stage, and the latter two are in alpha. Since this is the start of the project's changelog, here is a quick summary of each tool's features:
weather-dl: the Weather Downloader
Weather Downloader ingests weather data to cloud buckets.
- Downloads weather data from ECMWF through their MARS and CDS APIs.
- Supports pipeline dry runs.
- Downloads are filesystem agnostic. Data can be ingested to GCS, S3, Azure Blobstore, or a local filesystem.
- Manifests of downloads are recorded in Firebase.
- A ConfigParser-based DSL lets users select data to download and control how data is sharded in a general manner.
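To give a feel for how a ConfigParser-based DSL can drive sharding, the sketch below parses a small config and produces one download request per combination of partition values. The section names, key names, bucket, and the function `plan_downloads` are illustrative assumptions; consult the weather-dl documentation for the actual config schema.

```python
import configparser
import itertools
from typing import Dict, List, Tuple

# Hypothetical config: names and values are for illustration only.
EXAMPLE_CONFIG = """
[parameters]
dataset = example-dataset
target_path = gs://example-bucket/{}/{}.nc
partition_keys =
    year
    month

[selection]
year =
    2019
    2020
month =
    01
    02
variable = temperature
"""


def plan_downloads(text: str) -> List[Tuple[str, Dict[str, List[str]]]]:
    """Return (target path, selection) pairs, one per shard (illustrative)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    keys = parser["parameters"]["partition_keys"].split()
    template = parser["parameters"]["target_path"]
    # Multi-line values become whitespace-separated lists.
    selection = {key: value.split() for key, value in parser["selection"].items()}

    shards = []
    for combo in itertools.product(*(selection[key] for key in keys)):
        shard = dict(selection)
        # Narrow each partition key to a single value for this shard.
        shard.update({key: [value] for key, value in zip(keys, combo)})
        shards.append((template.format(*combo), shard))
    return shards


for path, _ in plan_downloads(EXAMPLE_CONFIG):
    print(path)
# gs://example-bucket/2019/01.nc
# gs://example-bucket/2019/02.nc
# gs://example-bucket/2020/01.nc
# gs://example-bucket/2020/02.nc
```

Keys that are not listed as partition keys (here, `variable`) are passed through unchanged to every shard.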
weather-mv: the Weather Mover
Weather Mover loads weather data from cloud storage into Google BigQuery.
- Weather data from any filesystem can be uploaded in batch to Google BigQuery.
- Both NetCDF and Grib data are explicitly supported. Later, any XArray-readable dataset will be supported.
- All rows include an "import time" to keep track of when the data was ingested.
- Weather data can be filtered by geographic area or by variable type.
- Supports inference of the BigQuery schema from parts of the dataset.
- Streaming pipelines for ingesting real-time data into BigQuery are supported.
weather-sp: the Weather Splitter
Splits NetCDF and Grib files into several files by variable.
- NetCDF and Grib data splitting is supported.
- Grib data is split by variable and level type.
- Buckets with mixtures of data types (Grib and NetCDF) can be processed at once.
- The root of the output path is computed for you; users have control over the parent directory.
- Dry runs of splits are supported.
Recent Changes
- Adding back an example config. by @alxmrs in https://github.com/google/weather-tools/pull/30
- Handle NaNs in data by @pramodg in https://github.com/google/weather-tools/pull/33
- Add utf-8 encoding to file read in setup by @CillianFn in https://github.com/google/weather-tools/pull/36
- Bump urllib3 from 1.25.11 to 1.26.5 in /weather_dl by @dependabot in https://github.com/google/weather-tools/pull/37
- Support un-indexed / single valued coordinates. by @pramodg in https://github.com/google/weather-tools/pull/39
- Set up empty dataset, not table by @lakshmanok in https://github.com/google/weather-tools/pull/41
- Docs fix - typo & stale links by @CillianFn in https://github.com/google/weather-tools/pull/44
- Basic support for grib files. by @pramodg in https://github.com/google/weather-tools/pull/40
- Test example configs by @CillianFn in https://github.com/google/weather-tools/pull/56
- Read the docs config by @CillianFn in https://github.com/google/weather-tools/pull/61
- weather-mv: Now using Streaming Inserts into BQ by @alxmrs in https://github.com/google/weather-tools/pull/62
- weather-mv: Implemented streaming import of data into BigQuery. by @alxmrs in https://github.com/google/weather-tools/pull/58
- Added script to help contributors test each other's updates to weather-tools. by @alxmrs in https://github.com/google/weather-tools/pull/63
- Github action to publish package by @saveriogzz in https://github.com/google/weather-tools/pull/31
- Updated python package name to google-weather-tools. by @alxmrs in https://github.com/google/weather-tools/pull/67
- Updated the standard example configs to use Reanalyses instead of Ensemble Means. by @alxmrs in https://github.com/google/weather-tools/pull/66
- Setting initial versions of each weather-tool. by @alxmrs in https://github.com/google/weather-tools/pull/68
New Contributors
- @pramodg made their first contribution in https://github.com/google/weather-tools/pull/33
- @CillianFn made their first contribution in https://github.com/google/weather-tools/pull/36
- @dependabot made their first contribution in https://github.com/google/weather-tools/pull/37
- @lakshmanok made their first contribution in https://github.com/google/weather-tools/pull/41
- @saveriogzz made their first contribution in https://github.com/google/weather-tools/pull/31
Full Changelog: https://github.com/google/weather-tools/commits/v0.1.0
Published by alxmrs over 3 years ago
