Recent Releases of openair

openair - openair 3.1.0

Dependency Changes

  • {openair} now suggests {rnaturalearth} over {rnaturalearthdata}. {rnaturalearthdata} is still required for a medium map resolution and {rnaturalearthhires} for a high map resolution, but these are now managed by {rnaturalearth} directly.

Breaking Changes

  • strip.position, x.relation, and y.relation are no longer function-level arguments; they are now passed via ... instead.

  • The labels argument (as paired with breaks) is deprecated. If passed through ... it will be automatically mapped to its new place in the breakOpts() function.

New Features

  • Refinements to how parameters are passed via ... to plotting functions:

    • Graphical parameters are now defined using ggplot2 conventions (e.g., shape over pch).

    • Base/lattice parameters are automatically remapped to their ggplot2 equivalents with a warning.

    • nrow and ncol can now be provided to control facet layout. As above, layout is automatically unpacked into ncol and nrow with a warning.

    • scales can now be provided to control facet scale restrictions. As above, x.relation and y.relation are automatically combined into scales with a warning.

    • space, axes, axis.labels, strip.position and switch are now passed to ggplot2::facet_wrap() or ggplot2::facet_grid().

    • title, subtitle, tag, and caption can be used throughout openair (title replacing main and caption replacing sub). All are passed through quickText() if auto.text = TRUE.

    • lineend, linejoin and linemitre tweak the appearance of line plots; see ggplot2::geom_line() for more information.
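
The remapping of legacy parameter names described above can be pictured with a small base R sketch. This is not the package's implementation: remap_dots() and legacy_map are hypothetical names, and the lookup only contains pairs that are named somewhere in these notes (pch/shape, lty/linetype, lwd/linewidth, main/title, sub/caption).

```r
# Hypothetical lookup of legacy (base/lattice) names and their ggplot2-style
# replacements; only pairs mentioned in these release notes are listed.
legacy_map <- c(
  pch = "shape", lty = "linetype", lwd = "linewidth",
  main = "title", sub = "caption"
)

# Hypothetical helper: rename legacy arguments captured through `...`,
# warning once per renamed argument.
remap_dots <- function(...) {
  args <- list(...)
  old <- intersect(names(args), names(legacy_map))
  for (nm in old) {
    new <- legacy_map[[nm]]
    warning(sprintf("'%s' is a legacy name; using '%s' instead.", nm, new),
            call. = FALSE)
    # Only fill the new name if the user has not already supplied it
    if (is.null(args[[new]])) args[[new]] <- args[[nm]]
    args[[nm]] <- NULL
  }
  args
}

# pch and main are renamed; alpha is passed through untouched
remapped <- suppressWarnings(remap_dots(pch = 16, main = "NOx", alpha = 0.5))
str(remapped)
```

In this sketch an explicitly supplied new-style argument wins over its legacy twin, which is one plausible design choice rather than documented behaviour.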

  • Additional interactions between ... parameters and various plots:

    • timePlot() now takes shape to add markers to the line chart. This can be a vector to vary with pollutant/group.

    • timeProp() now takes linewidth and linetype to control border style.

    • TheilSen() now takes linewidth, linetype, shape and alpha.

    • smoothTrend() now takes linetype, linewidth, shape and size. These can be vectors to vary based on pollutant. The linewidth of the data will always be half of that of the model.

  • Refinements to how ref.x and ref.y behave throughout {openair}:

    • ref.x and ref.y can now take a vector rather than a list, which will just use those values as the x/y intercept with default graphical parameters.

    • ref.x and ref.y have been added to timeProp() and TheilSen().

    • ref.x and ref.y can now take {ggplot2}-style parameters (intercept, alpha, colour, linetype, linewidth). The old parameter names (h/v, cols, lty and lwd) are automatically remapped so still work.

    • The non-intercept arguments passed to ref.x and ref.y (e.g., alpha) are now automatically recycled to the length of intercept, similar to how ... parameters are recycled.

    • Added refOpts() to help construct values for ref.x and ref.y, similar to windflowOpts().
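
As a rough illustration of the vector input and the recycling rule for ref.x and ref.y, the base R sketch below turns either a bare numeric vector or a list into one row per reference line. normalise_ref() and its default styling values are hypothetical; the arguments of refOpts() itself are not given in these notes, so it is not called here.

```r
# Hypothetical helper: normalise a ref.x/ref.y-style input into a data frame
# with one row per reference line. The default styling values are assumptions.
normalise_ref <- function(ref,
                          defaults = list(colour = "black", linetype = 1,
                                          linewidth = 0.5, alpha = 1)) {
  # A bare vector is treated as intercepts drawn with default styling
  if (!is.list(ref)) ref <- list(intercept = ref)
  n <- length(ref$intercept)
  # User-supplied styling overrides the defaults
  opts <- utils::modifyList(defaults, ref[setdiff(names(ref), "intercept")])
  # Recycle every styling option to the number of intercepts
  opts <- lapply(opts, rep_len, length.out = n)
  data.frame(intercept = ref$intercept, opts, stringsAsFactors = FALSE)
}

# Two colours recycled across three reference lines
normalise_ref(list(intercept = c(10, 20, 30), colour = c("red", "blue")))

# A plain vector uses the defaults for everything else
normalise_ref(c(40, 100))
```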

  • Refinements to how breaks are implemented in functions like trendLevel():

    • If breaks doesn't cover the full range of the data being binned, the maximum and minimum breaks will be overwritten so that they do.

    • If breaks is of length 1, the colour range will be split into breaks categories, defaulting to using the same logic as running cutData() on a numeric column.

    • breaks can now take a named list, defined using the new breakOpts() function. Most significantly, this allows for the method of binning to change for single-value breaks (quantiles, equal range bins, user-defined bin widths, approximate 'pretty' breaks and wind direction binning at time of writing).

    • polarPlot(), polarAnnulus(), corPlot() and trajLevel() have gained breaks.

    • labels is no longer a top-level argument and can be defined by passing a list to breaks. labels given to ... will be converted with a warning.
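
The first point above, where the outermost breaks are overwritten so that they span the data, can be sketched in base R as follows. cover_range() is a hypothetical helper written for illustration; the exact rule the package applies may differ in detail.

```r
# Sketch: widen the outermost breaks so that every value falls into a bin.
cover_range <- function(x, breaks) {
  breaks <- sort(breaks)
  rng <- range(x, na.rm = TRUE)
  n <- length(breaks)
  if (rng[1] < breaks[1]) breaks[1] <- rng[1]
  if (rng[2] > breaks[n]) breaks[n] <- rng[2]
  breaks
}

x <- c(2, 15, 48, 73, 120)

# The supplied breaks (10 to 100) miss both the smallest and largest value
b <- cover_range(x, breaks = c(10, 25, 50, 100))

# With the widened breaks, no observation is left unbinned
bins <- cut(x, breaks = b, include.lowest = TRUE)
table(bins, useNA = "ifany")
```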

  • All functions which take breaks also now contain the trans argument to perform scale transforms on continuous colour scales. This can take FALSE (no scale transform), TRUE (an appropriate default transform - usually "log10"), or a {scales} transform object (or string shorthand).

  • Refinements to colours in {openair}:

    • openColours() gains direction, alpha, begin, end, lightness and saturation for better control over colour palettes.

    • Added colourOpts(). Any cols argument in {openair} can now take the colourOpts() function, which tells each plotting function how to use the new openColours() arguments.

    • Added openSchemes() which returns a table of available colour palettes.

    • New palettes:

      • Completed the set of "viridis" palettes with "rocket" and "mako".

      • Added additional palettes by Paul Tol; "tol.highcontrast", "tol.vibrant", "tol.mediumcontrast", "tol.pale" and "tol.dark".

      • Added various palettes based on the work of Fabio Crameri - see openColours() for more details.

    • Added openColors() and colorOpts() which are synonymous with their British English equivalents openColours() and colourOpts().

  • New Kolmogorov-Zurbenko (KZ) Filter functions kzFilter() and kzaFilter(). These functions significantly enhance the capability of {openair} by allowing different time components to be separated and analysed separately. The kzFilter() function is considered a good default for a wide range of problems whereas the kzaFilter() function is the adaptive version that is well-suited to capturing abrupt changes, e.g., through an intervention. The range of uses of the filters will be covered in the {openair} book.
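
For readers unfamiliar with the technique, a KZ filter is an iterated centred moving average: a window of width m applied k times. The base R sketch below shows that idea only. It does not call kzFilter() or kzaFilter(), whose arguments are not described in these notes; kz_sketch() and moving_average() are hypothetical names, and averaging whatever observations are available at the edges is a simplifying assumption.

```r
# One pass of a centred moving average of (odd) width m. Missing values and
# series edges are handled by averaging the observations that are available.
moving_average <- function(x, m) {
  half <- (m - 1) %/% 2
  n <- length(x)
  vapply(seq_len(n), function(i) {
    window <- x[max(1, i - half):min(n, i + half)]
    if (all(is.na(window))) NA_real_ else mean(window, na.rm = TRUE)
  }, numeric(1))
}

# KZ(m, k): apply the moving average k times in succession
kz_sketch <- function(x, m, k) {
  stopifnot(m %% 2 == 1, k >= 1)
  for (pass in seq_len(k)) x <- moving_average(x, m)
  x
}

# Noisy sine wave: the filter should recover a much smoother series
set.seed(1)
time_idx <- 1:200
noisy <- sin(2 * pi * time_idx / 100) + rnorm(200, sd = 0.5)
smooth <- kz_sketch(noisy, m = 15, k = 3)
```

Larger m or k removes more of the short-term variation, which is what allows different time components of a series to be separated.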

  • TheilSen() and smoothTrend() now use a more straightforward way to impute missing data when deseason = TRUE and some monthly data are missing, using a linear regression for each month. The user is alerted to the imputation, and the monthly plot shows the imputed data as filled grey circles.

  • smoothTrend() will now use loess when it has insufficient data to fit a GAM.

  • cutData() gains the wd.res argument, which can take one of 4, 8, or 16, defaulting to 8. 4 cuts the data into North, East, South and West. 16 cuts the data into N, NNE, NE, ENE, etc. The type argument of every plotting function now responds to this, showing only four panels when wd.res = 4 and a 5x5 grid of sixteen panels when wd.res = 16.
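
The sector logic behind wd.res can be sketched in base R. This is an illustration under assumptions, not the cutData() source: cut_wd_sketch() is a hypothetical function, the abbreviated labels for the four-sector case are a guess, and sectors are assumed to be centred on their bearing.

```r
# Sketch: bin wind direction (degrees) into 4, 8 or 16 compass sectors.
cut_wd_sketch <- function(wd, wd.res = 8) {
  labels <- switch(
    as.character(wd.res),
    "4" = c("N", "E", "S", "W"),
    "8" = c("N", "NE", "E", "SE", "S", "SW", "W", "NW"),
    "16" = c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
             "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"),
    stop("wd.res must be one of 4, 8 or 16")
  )
  width <- 360 / wd.res
  # Shift by half a sector so that each sector is centred on its bearing,
  # e.g. "N" covers 337.5 to 22.5 degrees when wd.res = 8
  idx <- floor(((wd + width / 2) %% 360) / width) + 1
  factor(labels[idx], levels = labels)
}

wd <- c(0, 44, 90, 200, 350)
cut_wd_sketch(wd, wd.res = 4)
cut_wd_sketch(wd, wd.res = 8)
cut_wd_sketch(wd, wd.res = 16)
```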

  • trajPlot() and trajLevel() regain the map.res argument. This is passed to rnaturalearth::ne_countries() so can take three different resolutions.

  • polarPlot() will now annotate the identity of the radial axis as a caption, if annotate = TRUE.

  • ws and/or wd have been added to percentileRose(), polarAnnulus(), polarDiff() and polarFreq() in line with polarPlot().

  • angle.scale has been added to polarFreq() in line with all other polar coordinate plots.

  • The default key.position of corPlot() is now "right".

  • timePlot() has gained the key.title argument.

Bug Fixes

  • timeAverage() now has a more robust approach to multi-period averaging times such as "3 day". Bin boundaries are at fixed periods of time, which should ensure consistency across data sets that start at different times. This also means there is less need to use the start.date argument unless the start date of the data needs to be extended for some reason, e.g., to the beginning of a year. This change may result in slight differences in returned output but should not affect periods such as "day", "hour" and "month".
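
To see why fixed bin boundaries give consistent results, consider the base R sketch below, which assigns dates to 3-day bins counted from a fixed origin. bin_start() is a hypothetical helper and the origin of 1970-01-01 is an assumption made for illustration; these notes do not state which origin timeAverage() uses.

```r
# Sketch: assign dates to fixed n-day bins counted from a fixed origin.
# The origin used here is an assumption for illustration only.
bin_start <- function(dates, n_days = 3, origin = as.Date("1970-01-01")) {
  offset <- as.numeric(dates - origin)
  origin + (offset %/% n_days) * n_days
}

# Two data sets covering the same period but starting one day apart
site_a <- seq(as.Date("2024-01-01"), as.Date("2024-01-10"), by = "day")
site_b <- seq(as.Date("2024-01-02"), as.Date("2024-01-10"), by = "day")

bins_a <- bin_start(site_a)
bins_b <- bin_start(site_b)

# Any shared date lands in the same bin regardless of where the data start
all(bins_a[-1] == bins_b)
```

If the bins were instead counted from the first observation of each data set, the two sites would end up with boundaries offset by a day.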

  • calendarPlot() will no longer duplicate windflow arrows when type != "default".

  • calendarPlot() now supports type = "wd".

  • Fix intercept calculation in TheilSen(). Updated the intercept formula from median(y) - slope * median(x) to median(y - slope * x). This resolves a visual bug where trend lines (especially annual aggregations with negative slopes) appeared horizontally displaced due to inaccurate intercept estimates on sparse data. This fix affects the visual display of the trend lines but not the calculated slopes or uncertainties.
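
The difference between the two intercept formulas quoted above is easy to reproduce in base R. The sketch below is a bare-bones Theil-Sen estimate (median of pairwise slopes) written for illustration; theil_sen_sketch() is a hypothetical function, not the TheilSen() implementation, and the data are made up.

```r
# Sketch: median-of-pairwise-slopes estimate with both intercept formulas.
theil_sen_sketch <- function(x, y) {
  pairs <- utils::combn(length(x), 2)
  dx <- x[pairs[2, ]] - x[pairs[1, ]]
  dy <- y[pairs[2, ]] - y[pairs[1, ]]
  slope <- median(dy[dx != 0] / dx[dx != 0])
  c(
    slope = slope,
    intercept_old = median(y) - slope * median(x), # previous formula
    intercept_new = median(y - slope * x)          # formula after the fix
  )
}

# Made-up sparse annual series: downward trend with one outlying year
x <- c(2015, 2016, 2017, 2018, 2019)
y <- c(50, 46, 60, 38, 34)

fit <- theil_sen_sketch(x, y)
fit

# Residuals from the new intercept are centred on zero (median residual is 0),
# so the line passes through the bulk of the points
median(y - (fit[["intercept_new"]] + fit[["slope"]] * x))
```

The slope is identical under both formulas; only the vertical placement of the line changes, which matches the note that slopes and uncertainties are unaffected.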

  • quickText() is once again tolerant of apostrophes.

  • windflowOpts() will no longer overwrite the default range of functions like calendarPlot() and trendLevel() if range is not supplied by the user.

  • polarPlot() will no longer produce a square-shaped surface when exclude.missing = FALSE.

  • fontsize is now correctly passed to scatterPlot(), polarAnnulus(), variationPlot(), and timePlot().

  • Strings with line breaks (e.g., the result of cutData(type = "seasonyear")) will no longer error when used for facet labels.

  • name.pol will now correctly map onto the names of pollutants in timeVariation().

  • windRose() now respects cols when ws2 and wd2 are provided.

  • xlim is now correctly passed to the coordinates of trajLevel().

Natural Resources - Air Quality - R
Published by jack-davison 14 days ago

openair - openair 3.0.0

Dependency Changes

  • openair now depends on R v4.1 and, internally, uses the base R pipe (|>).

  • openair now imports ggplot2 and scales and suggests sf, geomtextpath, legendry and rnaturalearthdata.

  • openair no longer imports lattice, latticeExtra, hexbin or mapproj nor suggests mapdata or maps.

Breaking Changes

  • All plotting functions are now written in ggplot2. lattice specific options and annotations will no longer work, but many can now be achieved using ggplot2::theme() and ggplot2::annotate().

  • trajPlot(), trajLevel() and trajCluster() have had their three projection related arguments removed and replaced with a single crs argument, which defaults to lat/lng (4326).

  • As the above three functions no longer call scatterPlot(), scatterPlot() no longer has the map argument.

  • drawOpenKey() has been removed due to being lattice-specific.

  • linearRelation() and calcFno2() have been removed from openair due to using outdated methodology and assumptions.

  • summaryPlot() has been removed from openair. This function was very old and inconsistent with the rest of openair. It is planned to be replaced in the future with new summary functions.

  • key.header and key.footer have been replaced with a single key.title. This is due to ggplot2 not supporting a separate "header" and "footer" for guides.

  • The key argument has been deprecated, as it now only exists to overwrite key.position when it is FALSE. Please use key.position = "none" going forward.

  • Argument names have been standardised throughout openair. For example, instances of col have been replaced with cols. This may cause some existing code to break, but will ensure each function behaves more similarly going into the future.

New Features

  • timeVariation() has been almost completely rewritten. It is now a thin wrapper around the new variationPlot(), which can take any arbitrary x value - passed to cutData() - to use for its x-axis. Furthermore, it has gained the following changes:

    • Gained the panels argument. This allows for panels other than "hour.weekday", "hour", "month", and "weekday" to be represented in the plot assembly.

    • When key is FALSE, no key is shown for any of the four timeVariation() plots. Previously, any value passed to key would cause all four plots to display a key.

    • (!) BREAKING: The order of xlab and ylim now matches the order of panels. month.last has also been deprecated; if used and TRUE, this will override panels with a warning. The names in output$data now vary based on panels, and the type column is named {type}_type (e.g., "hour_type").

    • (!) BREAKING: The plot and data objects returned by timeVariation() are now named after panels and have a more consistent structure.

  • timePlot() refinements:

    • group can now take a character string, passed to cutData() via timeAverage(). This works similarly to group in timeVariation() in that it colours traces within the panel, rather than splitting them into multiple panels.

    • Gained the x.relation argument, allowing for different x ranges on different panels.

  • smoothTrend() refinements:

    • Gained the x.relation, date.format, and key.position arguments, in line with timePlot().

    • Gained the progress argument, passed to timeAverage().

    • avg.time is also no longer restricted to just three options (any timeAverage() option is permitted), although too fine a time resolution may obscure the smooth trend for long running data.

  • calendarPlot() refinements:

    • Gained the type argument. This can take a single type and creates a 2D matrix using month and the selected type. type = "year" has special handling.

    • Gained the windflow argument, which deprecates passing "ws" or "wd" to annotate.

    • Gained the percentile argument, passed on to timeAverage().

    • Gained the show.year argument, defaulting to TRUE. When FALSE and only one year of data is given, the strip titles will only read, e.g., "January" instead of "January-2000". This can create cleaner plots, as well as being useful for certain edge cases (e.g., if the calendarPlot is showing day-of-year averages over multiple years).

    • When statistic == "min" and annotate %in% c("ws", "wd"), the ws/wd returned will correspond to the minimum daily pollutant, rather than the minimum daily ws/wd.

  • timeProp() refinements:

    • proportion is now treated more like type internally. For a user, this means it can now be passed "default" to avoid any conditioning and create a regular period average barchart.

    • sub can now be defined via ...; set sub = NA to remove the text annotation which appears by default at the bottom of a timeProp() plot.

    • Gained the key argument to remove a legend.

    • "season" is now a permitted avg.time option in timeProp(), better aligning it with the options in timeAverage().

    • ... is now correctly passed to cutData() when using type/proportion.

  • corPlot() refinements:

    • Added the annotate argument which can change the correlation annotation to a p-value marker or stars, or remove it entirely.

    • Added two new arguments triangle and diagonal for controlling the plot appearance.

    • Added arguments key and key.title for adding and refining a plot legend.

  • trendLevel() refinements:

    • (!) BREAKING: type now defaults to "default", in line with other openair functions.

    • Added windflow and min.bin arguments, in line with similar functions.

    • Two type values are now supported.

  • TaylorDiagram() refinements:

    • Added the pos.cor argument which controls whether the negative correlation quadrant is shown.

  • New function WhittakerSmooth() to do Whittaker-Eilers Smoothing. This is a fast and general smoothing technique, well-suited to a wide range of problems. The function can be used to flexibly smooth and interpolate missing data. Additionally, the function can flexibly define a baseline (and hence increment) for a time series.

  • New function windflowOpts() which can be passed to the windflow argument of various openair functions to thoroughly customise the "windflow" arrows.

  • All openair plotting functions have gained strip.position to control the placement of the facet strip.

  • trajPlot() and trajLevel() have gained the grid.nx and grid.ny arguments which can be used to control the number of ticks on the coordinate grid, or remove it altogether.

  • cutData() now contains the drop argument. This allows for greater control over factor levels for appended columns. For example, consider a situation in which data only contains dates in March and May and type = "month" is used:

    • drop = "empty" will ensure the resulting vector only has factor levels "March" and "May".

    • drop = "none" will ensure the vector has all twelve months (January, February, March, etc.).

    • drop = "outside" will retain 'inclusive' factor levels within the range of the data - in this case "March", "April", and "May".

    • drop = "default" is the existing cutData() behaviour - in the case of type = "month", it is equivalent to drop = "empty".
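
The March and May example above can be reproduced with plain factors. The sketch below only mimics the three explicit drop behaviours for months; month_levels() is a hypothetical helper and does not call cutData().

```r
# Sketch: build a month factor whose levels follow the three `drop` rules.
month_levels <- function(dates, drop = c("empty", "none", "outside")) {
  drop <- match.arg(drop)
  idx <- as.POSIXlt(dates)$mon + 1 # month number, 1 to 12
  keep <- switch(
    drop,
    empty = sort(unique(idx)),        # only months present in the data
    none = 1:12,                      # all twelve months
    outside = seq(min(idx), max(idx)) # every month within the data range
  )
  factor(month.name[idx], levels = month.name[keep])
}

dates <- as.Date(c("2024-03-04", "2024-03-20", "2024-05-11"))

levels(month_levels(dates, "empty"))   # "March" "May"
levels(month_levels(dates, "outside")) # "March" "April" "May"
levels(month_levels(dates, "none"))    # all twelve month names
```

Keeping unused levels matters mainly for plotting, where an empty "April" level can reserve a panel or an axis position.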

  • cutData() also gains the "quarter" and "quarteryear"/"yearquarter" type options. These split a year cleanly into quarters, as an alternative to "season" and "seasonyear"/"yearseason". While seasons better align with meteorology, quarters more cleanly fit into a single calendar year and may better align with other relevant periods (e.g., reporting schedules, ratification calendars, economic activity, etc.).

  • is.axis now has an effect on weekday, season, seasonyear and monthyear.

  • quickText() now converts air_temp (a common worldmet variable) into "temperature".

  • timeAverage() is much faster with the bulk of the calculations made using C++.

  • runRegression() is now much faster with a new algorithm.

Bug Fixes

  • timePlot() now allows duplicate dates when avg.time is used. The user will still receive a warning from timeAverage(), which is used internally, but the plot will still be created.

  • The windflow argument of timePlot() now works when "ws" and/or "wd" are in pollutant.

  • importUKAQ() now closes its url() connections and generally fails more gracefully when data_type %in% c("annual", "monthly", "daqi"). This was already the case for other data types.

  • timeAverage() will no longer leave Uu and Vv columns behind when statistic = "data.cap".

  • timeAverage() now correctly passes ... to cutData().

  • timeAverage() now properly calculates wind speed and direction when vector.ws = TRUE.

  • selectByDate() now correctly handles the end date if supplied when in a date format (i.e., dd/mm/yyyy) and selects all hours in that day if present.

Published by jack-davison 2 months ago

openair - openair 2.19.0

Deprecations

importEurope() relies on the same back-end database as the saqgetr package (https://github.com/skgrange/saqgetr), which was retired in February 2024. importEurope() will now warn users of this, and outright error if year >= 2025. Users are instead encouraged to use the EEA Air Quality Download Service https://eeadmz1-downloads-webapp.azurewebsites.net to obtain European data for the time being. An R package, https://github.com/openair-project/euroaq, has been developed to facilitate its use.

New Features

Data Access

  • The source argument of importUKAQ() now defaults to NULL. This option allows the function to assign the source of each site itself, with some caveats:

    • Ambiguous codes (e.g., "AD1", which corresponds to both a SAQN site and a locally managed site) will preferentially import from the national networks (AURN, then AQE/SAQN/WAQN/NIAQN) over locally managed networks. To override this, users should manually define source.

    • Incorrect codes not found in importMeta() will error if importUKAQ() is left to assign the source.

    • When data_type is one of the aggregate types (e.g., "annual") and a site isn't defined, a source must be provided.

    • It is likely slightly slower for the function to assign source itself than for users to specify it themselves.

  • The specific metadata columns appended when importUKAQ(meta = TRUE) can now be controlled using the meta_columns argument. For example, setting meta_columns to c("zone", "agglomeration") will append the zone/agglomeration information instead of the default site type/latitude/longitude.

  • DAQI information imported using importUKAQ(data_type = "daqi") will be returned with the relevant DAQI band appended as an additional factor column; either "Low" (1-3), "Moderate" (4-6), "High" (7-9), or "Very High" (10). See https://uk-air.defra.gov.uk/air-pollution/daqi for more information.
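
The banding of DAQI index values quoted above maps directly onto base R's cut(). The sketch below is illustrative only; daqi_band() is a hypothetical helper, and the name and exact type of the column that importUKAQ() appends are not given in these notes.

```r
# Sketch: convert a DAQI index (1 to 10) into its band.
daqi_band <- function(index) {
  cut(
    index,
    breaks = c(0, 3, 6, 9, 10), # right-closed: (0,3], (3,6], (6,9], (9,10]
    labels = c("Low", "Moderate", "High", "Very High")
  )
}

daqi_band(c(1, 3, 4, 6, 7, 9, 10))
```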

  • importImperial() has been added, superseding importKCL(). They are functionally identical, but reflect that londonair is now managed by Imperial College London. Function arguments have been renamed in importImperial() to better match importUKAQ().

Utility Functions

  • cutData() gained numerous new features:

    • Added the names argument to specify the name of the appended columns. For example, cutData(mydata, "wd", names = c("windDir")) will append a column named "windDir".

    • Added the suffix argument as an alternative to names. If a new column would otherwise overwrite an existing column, suffix will be appended. For example, cutData(mydata, c("nox", "o3"), suffix = "_cuts") would append nox_cuts and o3_cuts columns.

    • cutData() is now less destructive and better cleans up after itself. For example, when type = "yearseason", it will no longer leave 'year' and 'season' columns behind, or overwrite existing 'year' and 'season' columns.

    • cutData() will now give an informative error message if the user provides a type which is neither an in-built option nor a column in their dataframe.

  • calcPercentile() gained the following arguments:

    • Added the type argument, in line with timeAverage().

    • Added the prefix argument to control the naming of the returned columns.

  • binData() gained the following arguments:

    • Added the type argument, passed to cutData().

    • Added the B and conf.int arguments, passed to bootMeanDF().

  • selectRunning() gained the following arguments:

    • Added the type argument, passed to cutData().

    • Added the name argument, which changes the name of the new column appended by the function.

    • Added the mode argument, which allows selectRunning() to filter the dataset rather than append a column.

  • rollingMean() has gained the type argument. This will likely be of most use for distinguishing between - and calculating separate statistics for - different monitoring stations within the same data frame.

  • splitByDate() can now more consistently take Date / POSIXct inputs as well as characters, and provides more flexibility over inputs with a new format argument.

  • aqStats() gained the progress argument, in line with timeAverage().

  • Many 'data utility' functions will now either warn or error if duplicate dates are detected, which is suggestive of a mix of either sites or averaging times within the same dataframe. The following functions have new behaviour:

    • selectRunning() and rollingMean() will error (duplicate dates break the logic of 'rolling window' functions).

    • aqStats() will also error, as it relies on rollingMean().

    • timeAverage() will warn the user but proceed with calculations, as averaging across different sites may be a legitimate action.

    • Functions which rely on timeAverage() will also warn but not error (notably calcPercentile() but also many plotting functions with avg.time arguments).
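
A duplicate-date check of the kind described above might look like the following base R sketch. check_duplicate_dates() is a hypothetical helper, not a function exported by the package, and it assumes the data frame has a column named date.

```r
# Sketch: warn or error when a data frame contains repeated timestamps,
# which often signals mixed sites or averaging periods.
check_duplicate_dates <- function(data, action = c("warn", "error")) {
  action <- match.arg(action)
  n_dup <- sum(duplicated(data$date))
  if (n_dup > 0) {
    msg <- sprintf(
      "%d duplicate date(s) detected; does the data contain multiple sites?",
      n_dup
    )
    if (action == "error") stop(msg, call. = FALSE)
    warning(msg, call. = FALSE)
  }
  invisible(data)
}

# Two sites stacked in one data frame share the same three timestamps
hours <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + 3600 * 0:2
stacked <- data.frame(
  date = rep(hours, times = 2),
  site = rep(c("A", "B"), each = 3),
  nox = c(10, 12, 11, 30, 28, 33)
)

# Averaging-style functions would warn and carry on...
res <- tryCatch(check_duplicate_dates(stacked, "warn"),
                warning = function(w) conditionMessage(w))

# ...while rolling-window functions would stop
err <- tryCatch(check_duplicate_dates(stacked, "error"),
                error = function(e) conditionMessage(e))
```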

Plotting Functions

  • Added new features for openColours():

    • Added new qualitative colour palettes: the "tol" family are colour-blind friendly palettes based on the work of Paul Tol, and "tableau" and "observable" provide access to the "Tableau10" and "Observable10" palettes to aid in consistency with plots made in those platforms.

    • When n isn't defined for a qualitative palette (e.g., "Dark2"), the full qualitative palette will be returned. Previously this errored with the default of 100.

    • openColours() will now check whether the provided scheme is either a known scheme name or a vector of valid R colours, and provide an informative error if this is not the case.

  • polarDiff() has gained the type argument, and correctly responds to main, key.footer and key.header via the ... options.

  • trendLevel() has gained new statistic types to match timeAverage(), including "mean", "median", "min", "max", "sd", "sum", "frequency" and "percentile".

  • trendLevel() will now automatically generate appropriate labels if breaks are provided. The labels argument can still be used to provide custom labels per break.

  • The formula.label argument of polarPlot() will now control whether concentration information is printed when statistic = "cpf".

  • Added calm.thresh as an option to windRose(). This change allows users to set a non-zero wind speed threshold that is considered as calm.

  • Added the map.lwd, map.lty and map.border arguments to trajPlot(), trajLevel() and trajCluster() for greater control over the 'basemap' of each plot.

Bug Fixes

  • Fixed repeated day number in calendarPlot() when statistic = "max".

  • Fixed annotate = FALSE in windRose() where axes and labels were not shown.

  • Fixed an issue wherein importUKAQ() would drop sites if importing from local sites and another network.

  • polarCluster() will no longer error with multiple pollutants and a single n.clusters.

  • importUKAQ() will correctly append site meta data when meta = TRUE, source is a length greater than 1, and a single site is repeated in more than one source (e.g., importUKAQ(source = c("waqn", "aurn"), data_type = "daqi", year = 2024L)).

  • calcPercentile() will now correctly pass its arguments (e.g., start.date) to timeAverage().

  • timeAverage() will now more consistently return NA values rather than NaN or Inf when all values are NA. This specifically affects the "mean" and "min" statistics.
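
The NaN and Inf values mentioned above come from base R itself: mean() and min() with na.rm = TRUE return NaN and Inf respectively when nothing is left to summarise. A minimal guard, written here as a hypothetical safe_stat() helper rather than the package's own code, looks like this:

```r
# Sketch: return NA instead of NaN/Inf when every value is missing.
safe_stat <- function(x, fun) {
  if (all(is.na(x))) return(NA_real_)
  fun(x, na.rm = TRUE)
}

all_missing <- c(NA_real_, NA_real_)

# Base R behaviour without the guard
mean(all_missing, na.rm = TRUE)                  # NaN
suppressWarnings(min(all_missing, na.rm = TRUE)) # Inf

# With the guard, both statistics return NA
safe_stat(all_missing, mean)
safe_stat(all_missing, min)

# Partially missing input is summarised as usual
safe_stat(c(1, NA, 3), mean)
```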

  • importUKAQ() will now correctly label a measurement as ratified when it is on the day of ratified_to. i.e., if a site is ratified to 2020/01/01, the measurement at 2020/01/01 23:00 will now be labelled as ratified.
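
The corrected behaviour amounts to comparing calendar days rather than timestamps. The base R sketch below shows one way such an off-by-a-day label can arise and how a date-level comparison avoids it; is_ratified() is a hypothetical helper, and the "timestamp" comparison is an assumed illustration, not the previous importUKAQ() code.

```r
# Sketch: a measurement counts as ratified if its calendar day is on or
# before the ratified_to date.
is_ratified <- function(date, ratified_to) {
  as.Date(date, tz = "UTC") <= as.Date(ratified_to)
}

obs <- as.POSIXct(
  c("2020-01-01 00:00:00", "2020-01-01 23:00:00", "2020-01-02 00:00:00"),
  tz = "UTC"
)

# Comparing against midnight on ratified_to misses the rest of that day
obs <= as.POSIXct("2020-01-01 00:00:00", tz = "UTC") # TRUE FALSE FALSE

# Comparing calendar days labels the whole of 2020-01-01 as ratified
is_ratified(obs, "2020-01-01")                       # TRUE TRUE FALSE
```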

  • Fixed importImperial() URLs.

Published by jack-davison 9 months ago