Recent Releases of MetObs-toolkit

MetObs-toolkit - v1.1.0

Metobs-toolkit v1.1.0

This release strengthens the toolkit’s quality-control core, with important changes to how outliers are stored and merged, and a substantial update to the buddy check algorithm and behavior. It also includes CI hardening, developer workflow improvements, and a few targeted user-facing fixes.

What's Changed

Quality control core

  • Outlier storage was revised to be more robust and less duplicative

    • QC results from repeated runs of the same check are now merged instead of creating duplicate entries.
    • Existing QC results are updated when newly flagged timestamps were previously marked as passed or unchecked.
    • Outlier details are updated only where needed.
    • Only genuinely new outlier timestamps are added to the internal outlier bin.
  • Buddy check was reworked at its core

    • The buddy check now evaluates each station individually as the center of its own buddy group, rather than flagging only the worst observation from a larger grouped context.
    • This is a meaningful behavior change: the same dataset can produce a different outlier set than in older releases.
    • The default scoring method is now the robust z-score approach using median / MAD, instead of the classical mean / standard deviation workflow.
    • Previous-iteration outliers are converted to NaN before the next iteration, so they no longer influence subsequent samples.
    • The implementation now builds a wide observation table, applies optional lapse-rate correction, and then iterates station-by-station through buddy-group testing.
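The merge rules for repeated QC runs can be illustrated with a small stdlib sketch. It is not the toolkit's implementation: the per-timestamp dict layout, the label strings and the name merge_qc_results are hypothetical, chosen only to show how a repeated run updates existing results and adds only genuinely new timestamps to the outlier bin instead of duplicating entries.

```python
from datetime import datetime

# Hypothetical label set; the toolkit's actual labels may differ.
PASSED = "passed"
UNCHECKED = "unchecked"
FLAGGED = "flagged"


def merge_qc_results(existing, new_outliers, outlier_bin):
    """Merge one QC run into earlier results of the same check (illustrative sketch).

    existing:     dict timestamp -> label from earlier runs of this check
    new_outliers: dict timestamp -> detail string for timestamps flagged in this run
    outlier_bin:  dict timestamp -> detail string, the shared outlier store
    Returns the set of timestamps that were newly added to the outlier bin.
    """
    added = set()
    for ts, detail in new_outliers.items():
        previous = existing.get(ts, UNCHECKED)
        if previous in (PASSED, UNCHECKED):
            # Update the existing result in place; no duplicate entry is created.
            existing[ts] = FLAGGED
        if ts not in outlier_bin:
            # Only genuinely new outlier timestamps enter the bin.
            outlier_bin[ts] = detail
            added.add(ts)
        elif outlier_bin[ts] != detail:
            # Touch the details only where they actually changed.
            outlier_bin[ts] = detail
    return added


# Example: a second run flags t1 (previously passed) and t2 (already an outlier).
t1 = datetime(2024, 1, 1, 0, 0)
t2 = datetime(2024, 1, 1, 0, 10)
results = {t1: PASSED, t2: FLAGGED}
outliers = {t2: "value above threshold"}
added = merge_qc_results(
    results, {t1: "value above threshold", t2: "value above threshold"}, outliers
)
print(added == {t1}, results[t1], len(outliers))  # True flagged 2
```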

Buddy check improvements and extensions

  • Added support for more explicit buddy-group controls:
    • min_buddy_distance
    • max_sample_size
    • max_alt_diff
  • Safety nets remain available and are applied to the currently flagged outliers after each iteration.
  • White-listed records now explicitly:
    • participate in buddy-check calculations
    • but are not flagged as outliers in the final result
  • Added tests for:
    • buddy-group filtering behavior
    • robust vs non-robust z-score behavior
    • no-outlier scenarios
    • edge cases in safety-net and sample handling
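One way to picture the new buddy-group controls is as successive filters on the neighbours of a centre station. The sketch below is illustrative and assumption-laden: the Station record, the haversine helper, the radius argument and the rule that the nearest buddies are kept when max_sample_size truncates the group are all hypothetical. Only the names min_buddy_distance, max_sample_size and max_alt_diff come from these notes. White-listed records are represented here simply by station names that still count in the sample but are removed from the final flags.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Station:
    name: str
    lat: float  # degrees
    lon: float  # degrees
    alt: float  # metres


def haversine_m(a: Station, b: Station) -> float:
    """Great-circle distance in metres (mean Earth radius of 6371 km)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(h))


def select_buddies(
    center: Station,
    candidates: List[Station],
    radius: float,
    min_buddy_distance: float = 0.0,
    max_alt_diff: Optional[float] = None,
    max_sample_size: Optional[int] = None,
) -> List[Station]:
    """Return the buddy group of `center` (hypothetical filtering order)."""
    kept = []
    for other in candidates:
        if other.name == center.name:
            continue  # the centre station is not its own buddy
        dist = haversine_m(center, other)
        if dist > radius or dist < min_buddy_distance:
            continue  # too far away, or too close to count as an independent buddy
        if max_alt_diff is not None and abs(other.alt - center.alt) > max_alt_diff:
            continue  # altitude difference too large
        kept.append((dist, other))
    kept.sort(key=lambda pair: pair[0])  # nearest buddies first
    if max_sample_size is not None:
        kept = kept[:max_sample_size]
    return [station for _, station in kept]


def final_flags(candidate_outliers, white_list):
    """White-listed records take part in the statistics but are never flagged."""
    return {name for name in candidate_outliers if name not in white_list}


center = Station("A", 51.00, 4.00, 10.0)
others = [
    Station("B", 51.00, 4.001, 12.0),  # about 70 m away
    Station("C", 51.01, 4.00, 15.0),   # about 1.1 km away
    Station("D", 51.02, 4.00, 400.0),  # about 2.2 km away, much higher
    Station("E", 51.03, 4.00, 20.0),   # about 3.3 km away
]
group = select_buddies(center, others, radius=5000, min_buddy_distance=100,
                       max_alt_diff=100, max_sample_size=2)
print([s.name for s in group])  # ['C', 'E']
```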

User-facing fixes

  • Fixed analysis.plot_diurnal_cycle_with_reference_station() when using the default colorby="name".
  • Added manual post-import site coordinate setters:
    • Site.set_latitude()
    • Site.set_longitude()
  • Improved handling and messaging around Google Earth Engine authentication.

Compatibility, CI and maintenance

  • Added an explicit backward compatibility check in CI.
  • Added checksum verification for stored compatibility baseline pickle artifacts.
  • Added dedicated tests for:
    • module sanity
    • backward compatibility
  • Improved development workflow logging and pipeline structure.
  • Dependency audit workflow updated.
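Checksum verification of a stored pickle baseline usually means comparing a recorded digest with a freshly computed one before the file is unpickled. A minimal sketch, assuming SHA-256 and a hypothetical load_verified_baseline helper; the hash algorithm and file layout used in the toolkit's CI are not described in these notes.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large baselines need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_verified_baseline(path: Path, expected_sha256: str):
    """Unpickle `path` only if its checksum matches the recorded value."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"Checksum mismatch for {path}: {actual} != {expected_sha256}")
    with path.open("rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    baseline = Path(tmp) / "baseline.pkl"
    baseline.write_bytes(pickle.dumps({"n_outliers": 3}))
    checksum = sha256_of(baseline)  # recorded once, e.g. alongside the baseline
    loaded = load_verified_baseline(baseline, checksum)
    try:
        load_verified_baseline(baseline, "0" * 64)  # stale or tampered checksum
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True

print(loaded, mismatch_detected)  # {'n_outliers': 3} True
```

A checksum only proves the file is the one that was recorded; it does not make unpickling data from untrusted sources safe.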

Docs and maintenance

  • Updated notebooks and documentation configuration.
  • Removed the DOI badge from the README.
  • Removed the legacy draft PDF workflow.

Important behavior changes

  • Buddy check results may differ from previous releases because the algorithm now evaluates every station independently as the center of its buddy group.
  • Robust z-score is now the default for buddy checking.
    • Use use_z_robust_method=False if you want the classical mean/std behavior.
  • min_std was renamed to min_sample_spread.
  • The new min_buddy_distance parameter can exclude stations that are too close from both spatial and safety-net buddy groups.
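The switch from mean/standard deviation to median/MAD changes how strongly one extreme buddy distorts the reference statistics. The sketch below contrasts both scores. The function name buddy_z_score is hypothetical; the keyword names mirror use_z_robust_method and min_sample_spread from these notes, and the 1.4826 MAD scaling (which makes the MAD comparable to a standard deviation for normally distributed data) is a common convention that the toolkit may or may not apply.

```python
import math
import statistics
from typing import Sequence


def buddy_z_score(
    center_value: float,
    buddy_values: Sequence[float],
    use_z_robust_method: bool = True,
    min_sample_spread: float = 0.1,
) -> float:
    """z-score of the centre value relative to its buddy sample (illustrative)."""
    # Outliers from a previous iteration are NaN and are dropped from the sample.
    sample = [v for v in buddy_values if not math.isnan(v)]
    if use_z_robust_method:
        centre = statistics.median(sample)
        mad = statistics.median([abs(v - centre) for v in sample])
        spread = 1.4826 * mad
    else:
        centre = statistics.mean(sample)
        spread = statistics.stdev(sample)
    # Floor the spread so a near-constant sample cannot blow up the score.
    spread = max(spread, min_sample_spread)
    return (center_value - centre) / spread


buddies = [10.0, 10.2, 9.8, 10.1, 25.0, float("nan")]  # one faulty buddy, one masked outlier
robust = buddy_z_score(13.0, buddies)
classical = buddy_z_score(13.0, buddies, use_z_robust_method=False)
print(f"robust z = {robust:.1f}, classical z = {classical:.3f}")
```

With these numbers the faulty buddy (25.0) drags the mean to about 13.02 and inflates the standard deviation, so the classical score is close to zero, while the robust score stays large.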

Included pull requests

  • #628 Include a backwards compatibility check
  • #627 Fix deps vulnerabilities
  • #624 Fix KeyError when colorby='name' in plot_diurnal_cycle_with_reference_station
  • #621 Add manual Site latitude/longitude setters for stations imported without metadata
  • #619 Avoid calling the ._credentials attribute on gee obj
  • #617 Dev
  • #616 Fixed typo in the quality control demo

Highlights

  • Revised outlier storage so repeated QC runs merge results instead of duplicating them
  • Reworked the buddy check core to evaluate every station as the center of its own buddy group
  • Changed buddy check default statistics to robust median/MAD z-scores
  • Improved handling of iterative outlier removal, white-listed records, and safety nets
  • Added CI backward compatibility protection and checksum validation
  • Fixed plot_diurnal_cycle_with_reference_station(..., colorby="name")
  • Added Site.set_latitude() and Site.set_longitude()

What's Changed

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v1.0.3...v1.1.0

Atmosphere - Meteorological Observation and Forecast - Python
Published by vergauwenthomas 5 days ago

MetObs-toolkit - v1.0.3

What's Changed

  • Fixed the AttributeError: module 'ee.data' has no attribute '_credentials' bug (see #618).

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v1.0.2...v1.0.3

Published by vergauwenthomas 3 months ago

MetObs-toolkit - v1.0.2

What's Changed

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.4.7...v1.0.2

Published by vergauwenthomas 3 months ago

MetObs-toolkit - v1.0.0

What's Changed

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.4.7...v1.0.0

Published by vergauwenthomas 6 months ago

MetObs-toolkit - v0.4.7

Summary

This release contains a set of improvements, bug fixes, API refinements and tests. Key themes: gap-filling refactor and robustness improvements, new plotting helpers (pandas-backed), better modeldata filtering and selection, GEE authentication/test helpers, logging hardening, improved distance-matrix/buddy-check logic, and added min/max constraints to gap-fills.
Version bump: 0.4.6 → 0.4.7

Highlights

Gap handling refactor

  • A gap overview API (gap_overview_df / gap_status_overview_df) provides concise, one-row-per-gap summaries at SensorData / Station / Dataset levels.
  • Default and validation behaviour changed: gap-size checks were added, and the new parameter max_gap_duration_to_fill controls whether a gap is allowed to be filled (defaults adjusted to make behaviour more intuitive).
  • New gap statuses and logic: a "partially successful gapfill" status is introduced, and the gap flagging logic was updated to treat partially successful gaps more intuitively during sequential gapfilling.
  • Many gapfill methods were refactored to accept/propagate max_gap_duration_to_fill and optional min_value / max_value constraints.

Gap-filling value constraints

  • New support for min_value and max_value in the core gap-fill paths (raw, debiased, diurnal debiased, weighted diurnal).
  • Filled values can be clipped to prevent unphysical results; tests were added for these constraints.
  • Internal fill functions were updated to accept min/max (e.g. fill_regular_debias, fill_with_diurnal_debias, fill_with_weighted_diurnal_debias).
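The gap-size check and the new value constraints can be sketched together. fill_gap_if_allowed is a hypothetical helper, not a toolkit function: it refuses gaps longer than max_gap_duration_to_fill and clips the candidate fill values to [min_value, max_value]. Measuring a gap's duration as first-to-last missing timestamp plus one time step is an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def clip(value: float, min_value: Optional[float], max_value: Optional[float]) -> float:
    """Constrain a single value to the optional [min_value, max_value] range."""
    if min_value is not None:
        value = max(value, min_value)
    if max_value is not None:
        value = min(value, max_value)
    return value


def fill_gap_if_allowed(
    gap_timestamps: List[datetime],
    candidate_values: List[float],
    dt: timedelta,
    max_gap_duration_to_fill: timedelta,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> Optional[List[float]]:
    """Return clipped fill values, or None when the gap is too long to fill."""
    duration = gap_timestamps[-1] - gap_timestamps[0] + dt
    if duration > max_gap_duration_to_fill:
        return None  # leave the gap unfilled
    return [clip(v, min_value, max_value) for v in candidate_values]


start = datetime(2024, 7, 1, 12, 0)
dt = timedelta(minutes=10)
gap = [start + i * dt for i in range(6)]  # 6 missing records = 1 hour
model_values = [-0.5, 1.2, 3.4, 101.0, 99.0, 50.0]  # e.g. relative humidity in %
filled = fill_gap_if_allowed(gap, model_values, dt, timedelta(hours=3), 0.0, 100.0)
refused = fill_gap_if_allowed(gap, model_values, dt, timedelta(minutes=30))
print(filled)   # [0.0, 1.2, 3.4, 100.0, 99.0, 50.0]
print(refused)  # None
```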

Model-data selection & plotting

  • New helper filter_modeldatadf for robust filtering of the modeldata DataFrame by obstype, modelname and modelvariable; used internally by plotting functions.
  • Station/Dataset plotting improvements:
    • modeldata_name and modeldata_kwargs were added to make_plot to select specific modeldata series for plotting.
    • A new parameter, modeltype, makes it possible to select a model data "type" different from the obstype if needed (defaults to obstype).
  • New convenient pandas-backed plotting helpers:
    • ModelTimeSeries.pd_plot (wrapper around pandas.Series.plot for model time series)
    • SensorData.pd_plot (wrapper around pandas.Series.plot for sensor data, with label filtering support)
  • Plotting internals were refactored to expose these simpler pd_plot entry points.
  • Tests and baselines were added for the new pd_plot helpers and modeldata plotting.

New and improved utilities

  • convert_to_numeric_series was added (and integrated into the dataset and sensordata import paths) to handle values that use a comma as decimal separator.
  • Timestamp and xarray conversion fixes: timedelta and timestamp attributes are serialized in xarray conversions; improved netCDF engine handling (netcdf4 selected by default unless overridden) avoids Unicode issues.
  • New dev/test tooling for GEE: a script to test the GEE authentication environment (deployment/test_gee_auth.py) and updates to the CI/dev pipeline scripts.
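Handling a comma as decimal separator comes down to normalising the string before numeric conversion. A stdlib-only sketch with a hypothetical to_number helper; it does not reproduce convert_to_numeric_series, whose exact rules are not described here, and it assumes the values contain no thousands separators (otherwise "1,234" would be ambiguous).

```python
import math
from typing import Union


def to_number(raw: Union[str, float, int, None]) -> float:
    """Convert one raw value to float, accepting ',' as decimal separator."""
    if raw is None:
        return math.nan
    if isinstance(raw, (int, float)):
        return float(raw)
    text = raw.strip()
    if text == "":
        return math.nan
    text = text.replace(",", ".")  # assumes no thousands separators are present
    try:
        return float(text)
    except ValueError:
        return math.nan  # unparseable values become missing instead of raising


values = [to_number(v) for v in ["12,5", "13.0", " -0,75 ", "", "n/a", 7]]
print(values)  # [12.5, 13.0, -0.75, nan, nan, 7.0]
```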

GEE and geemap

  • GEE initialization/auth flow improved: default initialization is tried first; if that fails, the toolkit falls back to authentication. Added handling/tests for known EarthEngine/gee changes in the test pipeline.
  • Dependency pinning: earthengine-api pinned to <=1.6.11 for compatibility with geemap 0.35.3.
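The initialize-first flow can be expressed independently of Earth Engine by passing both steps in as callables. In real code these would wrap ee.Initialize and ee.Authenticate from the earthengine-api package; the connect_with_fallback name, the broad exception handling and the fake callables below are assumptions for illustration only.

```python
from typing import Callable


def connect_with_fallback(
    initialize: Callable[[], None],
    authenticate: Callable[[], None],
) -> str:
    """Try default initialization first; authenticate only if that fails."""
    try:
        initialize()
        return "initialized"
    except Exception:
        # Typically missing or expired credentials: run the auth flow, then retry once.
        authenticate()
        initialize()
        return "authenticated"


# Fake callables that simulate a machine without stored credentials.
state = {"has_credentials": False, "auth_calls": 0}


def fake_initialize() -> None:
    if not state["has_credentials"]:
        raise RuntimeError("no credentials found")


def fake_authenticate() -> None:
    state["auth_calls"] += 1
    state["has_credentials"] = True


print(connect_with_fallback(fake_initialize, fake_authenticate))  # authenticated
print(connect_with_fallback(fake_initialize, fake_authenticate))  # initialized
print(state["auth_calls"])  # 1
```

Catching a narrower exception type than Exception would avoid triggering authentication for unrelated failures such as network errors.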

Logging improvements

  • The logging module now avoids creating duplicate FileHandlers / StreamHandlers: existing handlers are checked for a duplicate filepath/level before new handlers are added.
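The duplicate-handler check can be sketched with the standard logging module alone. The helper names add_file_handler_once and add_stream_handler_once are hypothetical; the sketch treats a FileHandler as a duplicate when its absolute path and level match, mirroring the filepath/level comparison described above.

```python
import logging
import os
import tempfile


def add_file_handler_once(logger: logging.Logger, filepath: str, level: int) -> bool:
    """Attach a FileHandler unless one with the same path and level exists."""
    target = os.path.abspath(filepath)
    for handler in logger.handlers:
        if (
            isinstance(handler, logging.FileHandler)
            and handler.baseFilename == target
            and handler.level == level
        ):
            return False  # duplicate: keep the existing handler
    handler = logging.FileHandler(target)
    handler.setLevel(level)
    logger.addHandler(handler)
    return True


def add_stream_handler_once(logger: logging.Logger, level: int) -> bool:
    """Attach a StreamHandler unless a non-file StreamHandler with this level exists."""
    for handler in logger.handlers:
        # FileHandler subclasses StreamHandler, so it is excluded explicitly.
        if (
            isinstance(handler, logging.StreamHandler)
            and not isinstance(handler, logging.FileHandler)
            and handler.level == level
        ):
            return False
    handler = logging.StreamHandler()
    handler.setLevel(level)
    logger.addHandler(handler)
    return True


logger = logging.getLogger("metobs_sketch")
logfile = os.path.join(tempfile.mkdtemp(), "toolkit.log")
print(add_file_handler_once(logger, logfile, logging.DEBUG))  # True
print(add_file_handler_once(logger, logfile, logging.DEBUG))  # False
print(add_stream_handler_once(logger, logging.INFO))          # True
print(add_stream_handler_once(logger, logging.INFO))          # False
print(len(logger.handlers))                                   # 2
```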

Buddy check & distance matrix

  • Buddy-check fixes: duplicate messages in the buddy-check loop are now joined and messaging was improved; new tests cover edge cases.
  • The distance matrix now uses BallTree with the haversine metric for better performance and correctness at scale. A separate helper, generate_distance_matrix, was added.

Docs, examples and tests

Bug fixes (representative)

  • Fixed gap-filling logic edge cases and gap-size validation (overly large gaps are no longer filled by default).
  • Handled Unicode / netCDF engine issues when saving netCDF (default to netcdf4).
  • Fixed a bug in the filtering of the model data frame used for plotting and selection.
  • Fixed the buddy-check duplicate message/iteration bug and added tests that reproduce the triggering cases.
  • Fixed handling of comma-as-decimal values when importing datasets.
  • Fixed geemap-related test and notebook display issues (closing figures after comparison).

API / Behaviour changes (important for users)

  • Station.modeldata: function/return types and usage were adjusted. Model data selection APIs were improved; a helper, filter_modeldatadf, was added to reliably extract model rows from the model datadf. Check your code if you iterate over station.modeldata or rely on its type.
  • New parameters and renamed arguments:
    • Most gapfill and interpolation methods changed from "max_consec_fill" (count-based) to "max_gap_duration_to_fill" (duration-based, independent of the dt resolution). Defaults changed (common defaults are 3h for interpolation and 12h for model-based fills).
    • Many Dataset/Station/SensorData gapfill methods now accept optional min_value and max_value arguments (to constrain filled values).
  • Dataset/Station/SensorData now expose gap_overview_df methods (returning a compact per-gap summary).
  • ModelTimeSeries.pd_plot and SensorData.pd_plot now exist as convenience wrappers.
  • GEE: the connect_to_gee flow attempts initialization first and authenticates only if necessary. Tests were added to check for local credentials.

Migration guide (suggested)

  • If you previously used max_consec_fill:
    • Replace usages with max_gap_duration_to_fill; pass a pandas Timedelta or a string like "3h" (e.g. max_gap_duration_to_fill="3h" or pd.Timedelta("3h")).
    • Example: dataset.interpolate_gaps(..., max_gap_duration_to_fill="3h")
  • To limit filled values:
    • Pass min_value and/or max_value to fill_gaps_with_raw_modeldata, fill_gaps_with_debiased_modeldata, fill_gaps_with_diurnal_debiased_modeldata or fill_gaps_with_weighted_diurnal_debiased_modeldata.
  • For plotting:
    • Use the new pd_plot helpers for quick plots: my_modeltimeseries.pd_plot(...) and my_sensordata.pd_plot(show_labels=["ok"], **kwargs).
    • To choose specific model data series in make_plot, use modeldata_name or modeldata_kwargs.
  • For selecting modeldata rows from the combined DataFrame:
    • Use filter_modeldatadf(modeldatadf, trgobstype, modelname, modelvariable) to robustly get the intended subset.
  • If you relied on the old Dataset/Station.gaps API for "singular_gaps", switch to gap_overview_df/gap_status_overview_df for one-row-per-gap summaries.
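Because max_consec_fill counted records while max_gap_duration_to_fill is a duration, an old setting translates into a duration by multiplying it by the record resolution (dt). A small stdlib sketch of that arithmetic; the helper names are hypothetical, and whether the toolkit measures gap length exactly as count × dt is an assumption worth checking against your own data.

```python
from datetime import timedelta


def consec_fill_to_duration(max_consec_fill: int, dt: timedelta) -> timedelta:
    """Translate an old record-count limit into an equivalent duration limit."""
    return max_consec_fill * dt


def as_hours_string(duration: timedelta) -> str:
    """Format a whole-hour duration as a string such as '3h'."""
    hours, remainder = divmod(duration, timedelta(hours=1))
    if remainder:
        raise ValueError("duration is not a whole number of hours")
    return f"{hours}h"


# The same old setting corresponds to different durations at different resolutions.
for dt in (timedelta(minutes=10), timedelta(minutes=5), timedelta(hours=1)):
    print(dt, "->", consec_fill_to_duration(18, dt))

print(as_hours_string(consec_fill_to_duration(18, timedelta(minutes=10))))  # 3h
```

At a 10-minute resolution, max_consec_fill=18 therefore corresponds to max_gap_duration_to_fill="3h"; at 5-minute resolution the same count would only have covered 1.5 hours, which is why the duration-based parameter is independent of dt.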

Dependency notes

  • earthengine-api: pinned to <= 1.6.11 for geemap compatibility (geemap 0.35.3).
  • geemap >= 0.35.3 required.
  • Minor updates across docs/testing tooling.
Developer / internal notes

Contributors (from commit co-authors)

Thomas Vergauwen
Leon Adriaensen (@ADRIE-A3)
Copilot / automated/code-assist contributions mentioned in commit history

Published by vergauwenthomas 8 months ago

MetObs-toolkit - v0.4.6

Release Notes - MetObs_toolkit v0.4.6

Note: v0.4.5 does not exist. (It was skipped because of an installation bug on Python 3.10; PyPI restrictions forced me to skip that release.)

Release Highlights:
This release delivers enhancements, bug fixes, and improved robustness for the MetObs_toolkit. It focuses on better data handling, new plotting functionality and fixes for various edge cases.

🚀 New Features & Enhancements

  • Data Import Robustness:
    • Added support for comma as a decimal symbol when importing data.
    • Introduced convert_to_numeric_series for safer numeric conversions, replacing direct .astype calls.
  • Plotting Improvements:
    • The make_plot() method of the Station class now supports:
      • a modeldata_name argument for easier model series selection.
      • modeldata_kwargs to select specific modeldata series.
      • New modeltype parameter to plot different types of modeldata independently from obstype.
  • Site Metadata Enrichment:
    • Added lcz (Local Climate Zone) and altitude as attributes of Site.
    • These are now included in the API documentation.
  • Quality Control (QC) Improvements:
    • Enhanced buddy check:
      • More informative error messages with iteration reference.
      • Fix for duplicate messages by joining them.
      • Added tests for relevant edge cases.

πŸ› Bug Fixes

  • Fixed bug in test baselines and ensured correct location for baseline data.
  • Fixed bug where altitude being NaN could cause processing errors.
  • Fixed bugs in tests and improved test coverage.
  • Addressed Sphinx warnings in the documentation.
  • Resolved several grammar errors in code comments.

🧪 Testing & Maintenance

  • Added and improved tests for plotting and QC edge-cases.
  • Updated test baselines for more robust regression checking.
  • Black formatting and code style improvements across multiple modules.

🔒 Versioning

  • Version set to v0.4.6.

New Contributors

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.4.4...v0.4.6

Published by vergauwenthomas 8 months ago

MetObs-toolkit - v0.4.5

Release Notes - MetObs_toolkit v0.4.5

Release Highlights:
This release delivers enhancements, bug fixes, and improved robustness for the MetObs_toolkit. It focuses on better data handling, new plotting functionality and fixes for various edge cases.

🚀 New Features & Enhancements

  • Data Import Robustness:
    • Added support for comma as a decimal symbol when importing data.
    • Introduced convert_to_numeric_series for safer numeric conversions, replacing direct .astype calls.
  • Plotting Improvements:
    • The make_plot() method of the Station class now supports:
      • a modeldata_name argument for easier model series selection.
      • modeldata_kwargs to select specific modeldata series.
      • New modeltype parameter to plot different types of modeldata independently from obstype.
  • Site Metadata Enrichment:
    • Added lcz (Local Climate Zone) and altitude as attributes of Site.
    • These are now included in the API documentation.
  • Quality Control (QC) Improvements:
    • Enhanced buddy check:
      • More informative error messages with iteration reference.
      • Fix for duplicate messages by joining them.
      • Added tests for relevant edge cases.

πŸ› Bug Fixes

  • Fixed bug in test baselines and ensured correct location for baseline data.
  • Fixed bug where altitude being NaN could cause processing errors.
  • Fixed bugs in tests and improved test coverage.
  • Addressed Sphinx warnings in the documentation.
  • Resolved several grammar errors in code comments.

🧪 Testing & Maintenance

  • Added and improved tests for plotting and QC edge-cases.
  • Updated test baselines for more robust regression checking.
  • Black formatting and code style improvements across multiple modules.

🔒 Versioning

  • Version set to v0.4.5.

New Contributors

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.4.4...v0.4.5

Published by vergauwenthomas 8 months ago

MetObs-toolkit - v0.4.4

Release name: v0.4.4
Tag: v0.4.4
Compare: changes since v0.4.3

Highlights

  • Data IO and formats
    • Parquet reader support added. (#557)
    • New to_parquet and to_csv methods for the Dataset and Station classes. (#556)
    • CF-compliant netCDF serialization for xarray Datasets with nested attributes. (#558)
  • Model data improvements
    • ModelTimeseries unit conversion handling and ModelObstype renaming for clearer semantics and consistency. (#543, #545)
  • Robustness and correctness
    • Fix for a NaTType error in frequency estimation when the variable list is empty. (#562) Thanks to @ADRIE-A3 for reporting (#561).
    • Safer gapfilling invocation by checking stations for the obstype when GF is called on a Dataset. (#566)
    • Runtime warnings standardized by converting them to structured logging. (#565)
    • Improved QC error handling on Dataset. (#560)
  • Developer experience and docs
    • Human-readable repr methods for the main classes to aid debugging and inspection. (#568)
    • README updated to include conda install instructions and badge. (#555)
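netCDF attributes must be flat name/value pairs (strings, numbers or 1-D arrays), so nested metadata has to be flattened before it can be written. The sketch below shows one common approach: joining keys with a separator and JSON-encoding anything that is not a plain string or number. The separator, the flatten_attrs name and the JSON fallback are assumptions and not necessarily what #558 implements.

```python
import json
from typing import Any, Dict


def flatten_attrs(attrs: Dict[str, Any], sep: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dict attributes into netCDF-friendly key/value pairs."""
    flat: Dict[str, Any] = {}
    for key, value in attrs.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_attrs(value, sep=sep, prefix=name))
        elif isinstance(value, (str, int, float)) and not isinstance(value, bool):
            flat[name] = value
        else:
            # netCDF has no boolean attribute type; booleans, None, lists, etc.
            # are stored as JSON strings here for simplicity.
            flat[name] = json.dumps(value)
    return flat


attrs = {
    "station": {"name": "station_A", "lat": 50.98, "lon": 3.82},
    "qc": {"buddy_check": {"radius": 10000, "applied": True}},
    "units": "degC",
}
flat = flatten_attrs(attrs)
print(flat)
```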

Potential behavior changes

Renamed/standardized “ModelObstype” naming and unit-conversion handling for model time series. Downstream user code referencing the old name or relying on implicit conversions may need to adapt. (#543, #545)

Closed issues addressed in this release window

  • error importing data, NaTType in frequency for empty variable list: reported by @ADRIE-A3, fixed via #562. (#561)
  • template_build_prompt() to accept arguments: opened by @pratiman-91. (#551)
  • Use of pint for units and conversion: opened by @pratiman-91. (#549)
  • Update docs to latest version: opened by @pratiman-91. (#547)
  • Update Repo About information: opened by @pratiman-91. (#548)

Contributors (thank you!)

Code contributions:

@vergauwenthomas (#543, #545, #560, #566)
@pratiman-91 (#555, #557)
@Copilot (app/bot) (#556, #558, #562, #565, #568)

Issue reporters:

@ADRIE-A3 (#561)
@pratiman-91 (#547, #548, #549, #551)

Included pull requests (since v0.4.3)

  • #543 Modeltimeseries unit conv handling and modelobstype renaming. (@vergauwenthomas)
  • #545 Modeltimeseries unit conv. (@vergauwenthomas)
  • #555 Update README.md to include conda install and badge. (@pratiman-91)
  • #556 Add to_parquet and to_csv methods for Dataset and Station classes. (@Copilot)
  • #557 Parquet reader. (@pratiman-91)
  • #558 Implement CF-compliant netCDF serialization for xarray Datasets with nested attributes. (@Copilot)
  • #560 Qc on dataset error handling. (@vergauwenthomas)
  • #562 Fix NaTType error in frequency estimation for empty variable lists. (@Copilot)
  • #565 Standardize warning formatting by converting operational warnings to logging. (@Copilot)
  • #566 Check stations for obstype when GF is called on Dataset. (@vergauwenthomas)
  • #568 Implement human-readable repr methods for all main classes. (@Copilot)

Published by vergauwenthomas 9 months ago

MetObs-toolkit - v0.4.3

What's Changed

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.4.0...v0.4.3

Published by vergauwenthomas 9 months ago

MetObs-toolkit - v0.4.0

What's Changed

New Contributors

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.3.0...v0.4.0

Published by vergauwenthomas about 1 year ago

MetObs-toolkit - v0.4.0a

What's Changed

New Contributors

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.3.0...v0.4.0a

Published by vergauwenthomas about 1 year ago

MetObs-toolkit - v0.3.0

The following parts were substantially revised:

  • Gaps: There are no separate missing observations anymore; everything that is missing is considered a gap.
  • Gap filling: Multiple methods of varying complexity for filling gaps with model data.
  • Template: Templates are now stored as JSON files and handled by a dedicated class.
  • Modeldata: Modeldata has a specific class for static and dynamic datasets.
  • Documentation: The API now has examples for all user-accessible functions and methods.
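Treating everything that is missing as a gap amounts to comparing the observed timestamps with the expected regular series and grouping consecutive missing timestamps. A minimal stdlib sketch; the fixed frequency and the find_gaps name are assumptions, and the toolkit's own gap detection (including frequency estimation) is more involved.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def find_gaps(
    observed: List[datetime], start: datetime, end: datetime, freq: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Return (first_missing, last_missing) pairs for each run of missing records."""
    present = set(observed)
    gaps: List[Tuple[datetime, datetime]] = []
    run_start: Optional[datetime] = None
    previous: Optional[datetime] = None
    t = start
    while t <= end:
        if t not in present:
            if run_start is None:
                run_start = t  # a new gap begins
            previous = t
        elif run_start is not None:
            gaps.append((run_start, previous))  # the gap ended just before t
            run_start = None
        t += freq
    if run_start is not None:
        gaps.append((run_start, previous))  # the gap runs to the end of the period
    return gaps


start = datetime(2024, 1, 1, 0, 0)
freq = timedelta(minutes=10)
expected = [start + i * freq for i in range(12)]  # 00:00 .. 01:50
observed = [t for i, t in enumerate(expected) if i not in {3, 4, 5, 10, 11}]
for first, last in find_gaps(observed, expected[0], expected[-1], freq):
    print(first.strftime("%H:%M"), "->", last.strftime("%H:%M"))
# 00:30 -> 00:50
# 01:40 -> 01:50
```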

What's Changed

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.2.1...v0.3.0

Published by vergauwenthomas over 1 year ago

MetObs-toolkit - v0.2.1

Templates are now handled by the Template() class, and a JSON file is used to store them.

What's Changed

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.2.0...v0.2.1

Published by vergauwenthomas almost 2 years ago

MetObs-toolkit - JOSS release

The official MetObs-toolkit version as published by JOSS

What's Changed

New Contributors

Special Credits

We want to thank @Zeitsperre and @ashwinvis for their thorough review of this package. Their comments, remarks, and suggestions have taken this package to the next level!

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.1.3...v0.2.0

Published by vergauwenthomas about 2 years ago

MetObs-toolkit - JOSS release

The official MetObs-toolkit version as published by JOSS

What's Changed

New Contributors

Special Credits

We want to thank @Zeitsperre and @ashwinvis for their thorough review of this package. Their comments, remarks, and suggestions have taken this package to a new level!

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.1.2...v0.1.3-joss

Published by vergauwenthomas about 2 years ago

MetObs-toolkit - v0.1.3

Main Changes

  • New framework for observation types:
    • Users can create new observation types
    • Interaction of observation types with model data (including vector-field observation types)
  • A toolkit version of the buddy check (equivalent to TITAN's buddy check, but implemented in Python)
  • Bug fixes

What's Changed

New Contributors

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.1.2...v0.1.3

Published by vergauwenthomas over 2 years ago

MetObs-toolkit - v0.1.2

What's Changed

Full Changelog: https://github.com/vergauwenthomas/MetObs_toolkit/compare/v0.1.1...v0.1.2

Published by vergauwenthomas over 2 years ago

MetObs-toolkit - Initial Release

init: GitHub release to sync the releases with PyPI.

Published by vergauwenthomas almost 3 years ago