Recent Releases of vak

vak - 1.0.3

What's Changed

Full Changelog: https://github.com/vocalpy/vak/compare/1.0.2...1.0.3

Biosphere - Bioacoustics and Acoustic Data Analysis - Python
Published by NickleDave 6 months ago

vak - 1.0.2

What's Changed

Full Changelog: https://github.com/vocalpy/vak/compare/1.0.1...1.0.2

Published by NickleDave 6 months ago

vak - 1.0.1

What's Changed

Full Changelog: https://github.com/vocalpy/vak/compare/1.0.0...1.0.1

Published by NickleDave 10 months ago

vak - 1.0.0

What's Changed

New Contributors

Full Changelog: https://github.com/vocalpy/vak/compare/0.8.2...1.0.0

Published by NickleDave 12 months ago

vak - 0.8.2

What's Changed

New Contributors

Full Changelog: https://github.com/vocalpy/vak/compare/0.8.1...0.8.2

Published by NickleDave over 1 year ago

vak - 0.8.1

What's Changed

Full Changelog: https://github.com/vocalpy/vak/compare/0.8.0...0.8.1

Published by NickleDave over 1 year ago

vak - 1.0.0a3

What's Changed

Full Changelog: https://github.com/vocalpy/vak/compare/1.0.0a2...1.0.0a3

Published by NickleDave over 1 year ago

vak - 1.0.0a2

What's Changed

New Contributors

Full Changelog: https://github.com/vocalpy/vak/compare/0.8.0...1.0.0a2

Published by NickleDave over 1 year ago

vak - 1.0.0a1

What's Changed

Full Changelog: https://github.com/vocalpy/vak/compare/0.8.0...1.0.0a1

Published by NickleDave over 1 year ago

vak -

0.8.0 release notes

2023-02-09

Added

  • Add options for how audio.to_spect calls dask.bag, to help with memory issues when processing large files
    #611. Fixes #580.
  • Add ability to run evaluation of models with and without post-processing transforms. This is done by specifying an option post_tfm_kwargs in the [EVAL] or [LEARNCURVE] sections of a .toml configuration file. If the option is not specified, then models are evaluated as they were previously, by converting the predicted label for each time bin to a label for each continuous segment, represented as a string. If the option is specified, then the post-processing is applied to the model predictions before converting to strings. Metrics are computed for outputs with and without post-processing, to be able to compare the two. #621. Fixes #472.
  • vak.core.eval now logs computed evaluation metrics so they can be quickly inspected in the terminal or log files before full analysis
    #621. Fixes #471.
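
The evaluation with and without post-processing described above can be illustrated with a small, self-contained sketch. It does not use vak's API: the function names are hypothetical, and the "majority vote" clean-up is only an assumed example of a post-processing transform. It shows per-time-bin predictions being converted to a string of segment labels, once directly and once after post-processing, so the two results can be compared.

```python
from collections import Counter
from itertools import groupby


def to_segments(timebin_labels):
    """Collapse per-time-bin labels into (label, start, stop) runs; stop is exclusive."""
    segments = []
    start = 0
    for label, run in groupby(timebin_labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def majority_vote(timebin_labels, background=0):
    """Hypothetical post-processing step: within each stretch of non-background
    time bins, relabel every bin with the most common label in that stretch."""
    out = list(timebin_labels)
    i = 0
    while i < len(out):
        if out[i] == background:
            i += 1
            continue
        j = i
        while j < len(out) and out[j] != background:
            j += 1
        winner = Counter(out[i:j]).most_common(1)[0][0]
        out[i:j] = [winner] * (j - i)
        i = j
    return out


def to_string(timebin_labels, labelmap, background=0):
    """One character per continuous segment, skipping background segments."""
    return "".join(
        labelmap[label]
        for label, _, _ in to_segments(timebin_labels)
        if label != background
    )


predicted = [0, 1, 1, 2, 1, 1, 0, 0, 2, 2, 2, 0]  # made-up model output
labelmap = {1: "a", 2: "b"}                        # integer class -> string label

without_post = to_string(predicted, labelmap)
with_post = to_string(majority_vote(predicted), labelmap)
print(without_post, with_post)
```

A metric such as an edit distance against the reference label string could then be computed for both strings to see how much the post-processing helps.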

Changed

  • Rewrite post-processing transforms applied to network outputs as transforms, with functional and class implementations,
    to make it possible to compose these transforms, and more easily evaluate model performance with and without them
    #621. Fixes #537.

Published by NickleDave about 2 years ago

vak -

vak 0.7.0 release notes

vak 0.7.0 is a maintenance release, but it does include some new features and bug fixes.
Highlights:

  • For annotation formats that have one annotation file per annotated file, vak can now recognize when
    the annotation files are named by removing the annotated file extension (e.g., .wav or .npz)
    and replacing it with the annotation format extension, e.g. .txt or .csv. (Other ways of relating annotations
    and annotated files are still valid, e.g. by including the original source audio file in both filenames.)
  • The transform that normalizes spectrograms is now fit only to the training set; previously no split was specified and in some cases the entire dataset was used, which could potentially reduce the error on the test set because of dataset leakage (the model "knows" about the distribution of the test set because the parameters used to normalize the spectrograms take it into account). For training sets large enough to achieve good performance with current models, there is probably not a big enough difference between their distribution and that of the test set for this to seriously impact evaluation, but we have not tested this extensively.
  • Several other clean ups, additional unit tests, and minor bug fixes that should not have impacted performance but do make the library more efficient and robust.
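
As a rough illustration of the file naming convention in the first highlight, the sketch below pairs each annotated file with an annotation file whose name is the same except for the extension. The function name and the example paths are hypothetical, and this is not how vak itself implements the mapping.

```python
from pathlib import Path


def map_by_replaced_extension(annotated_files, annot_files, annot_ext=".csv"):
    """Pair each annotated file with the annotation file that has the same
    name except that the extension is replaced, e.g. bird1.wav -> bird1.csv."""
    annot_by_name = {Path(p).name: p for p in annot_files}
    pairs = {}
    for annotated in annotated_files:
        expected = Path(annotated).with_suffix(annot_ext).name
        if expected not in annot_by_name:
            raise ValueError(
                f"no annotation file named {expected!r} for {annotated!r}"
            )
        pairs[annotated] = annot_by_name[expected]
    return pairs


pairs = map_by_replaced_extension(
    ["data/bird1.wav", "data/bird2.wav"],
    ["annot/bird2.csv", "annot/bird1.csv"],
)
print(pairs)
```

Raising an error for a missing match, rather than silently skipping the file, is a design choice made here for the sketch.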

Added

  • Add unit tests for csv.has_unlabeled
    #541.
    Fixes #102.
  • Add unit tests for __main__
    #542.
    Fixes #337.
  • Add validation of labels argument to vak.split.algorithms.brute_force,
    to prevent conditions where algorithm can fail to converge
    because of bad input
    #562.
    Fixes #288.
  • Add a "Frequently Asked Questions" page to the documentation,
    and a page to the "Reference" section on file naming conventions
    #564.
    Fixes #524
    and #424.
  • Add a new way for vak to map annotation files to annotated files
    when preparing datasets, e.g. for training models.
    For annotation formats that have one annotation file per
    annotated file, vak can now recognize when
    the annotation files are named by removing the
    annotated file extension (e.g., .wav or .npz)
    and replacing it with the annotation format extension,
    e.g. .txt or .csv. (Other ways of relating annotations
    and annotated files are still valid, e.g. by including
    the original source audio file in both filenames.)
    #572.
    Fixes #563.
  • Have runs from command-line interface log version to logfile
    #587.
    Fixes #216.

Changed

  • Rewrite unit tests in tests/test_cli/ to use mocks for vak.core functions
    #544.
    Fixes #543.
  • It is now possible to load configuration files
    and work with them programmatically even if the paths
    they point to do not exist.
    The core functions handle validation instead.
    E.g., the PrepConfig class does not check whether
    output_dir exists and is a directory, but vak.core.prep does.
    #550.
    Fixes #459.
  • Refactor and speed up logic for determining whether a
    dataset with sequence annotations has unlabeled segments
    that should be assigned a "background" label
    #559.
    Fixes #243.
    • Adds a new sub-sub-package, datasets.seq
      with a validators module, which is where the
      re-written has_unlabeled function now lives.
      Replaces the vak.csv module which was not well named.
    • Also adds a has_unlabeled function to vak.annotation
      that is used by vak.datasets.seq.validators.has_unlabeled;
      this function handles edge cases outlined in
      #243.
  • Rename and refactor functions in vak.annotation
    that map annotations to the files that they annotate,
    so that the purpose of the functions is clearer,
    and add clearer error messages with links to documentation
    about file naming conventions
    #566.
    Fixes #525.
  • Revise "autoannotate" tutorial to use .wav audio and .csv
    annotation files from new release of Bengalese Finch Song
    Repository, and to suggest that Windows users unpack
    archives with tar, not other programs such as WinZip
    #578.
    Fixes #560
    and #576.
  • Change vak.files.find_fname and vak.files.spect.find_audio_fname
    so they work when spaces are in filename and/or path
    #594.
    Fixes #589.
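
The check for unlabeled segments mentioned above can be sketched as follows. This is a simplified stand-in, not the code in vak.annotation or vak.datasets.seq.validators: the signature is hypothetical, onsets and offsets are assumed to be sorted and given in seconds, and treating a file with no annotated segments as entirely unlabeled is an assumption of this sketch.

```python
def has_unlabeled(onsets_s, offsets_s, total_dur_s):
    """Return True if any part of the file is not covered by an annotated segment.

    Assumes segments are sorted by onset and do not overlap.
    """
    if len(onsets_s) != len(offsets_s):
        raise ValueError("onsets_s and offsets_s must have the same length")
    if len(onsets_s) == 0:
        # no annotated segments at all: everything is unlabeled
        return total_dur_s > 0
    # unlabeled time before the first segment or after the last one
    if onsets_s[0] > 0 or offsets_s[-1] < total_dur_s:
        return True
    # a gap between one segment's offset and the next segment's onset
    return any(
        next_on > off for off, next_on in zip(offsets_s[:-1], onsets_s[1:])
    )


print(has_unlabeled([0.0, 1.0], [1.0, 2.0], total_dur_s=2.0))  # fully covered
print(has_unlabeled([0.1, 1.0], [0.9, 2.0], total_dur_s=2.0))  # gaps present
```

When such a check returns True, a "background" label is needed for the unlabeled stretches.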

Fixed

  • Fix how vak.core.prep handles labelset parameter.
    Add pre-condition that raises a ValueError
    when labelset is None but the .toml config is one of
    {'train', 'learncurve', 'eval'}
    #545.
    Avoids running computationally expensive step of generating
    and validating spectrograms before crashing when trying to
    split the dataset using labelset. Also avoids silent
    failures for datasets that do not require splitting,
    e.g., an 'eval' set that could contain labels not in the
    training set.
    Fixes #468.
  • Fix how cli and core functions that have the csv_path parameter
    handle it. The parameter points to a dataset .csv generated by vak prep
    that other core/cli functions use: train, learncurve, eval, predict.
    They now validate that it exists, and if it doesn't, the cli functions
    politely suggest running vak prep first; the core functions
    raise a FileNotFoundError.
    #546.
    Fixes #469.
  • Fix bug where labelmap_path parameter was ignored by core.train.
    Change function so that either labelmap_path or labelset must
    be passed in, but passing in both will raise an error.
    Also change cli.train to only pass in one of those and set the other
    to None.
    #552.
    Fixes #547.
  • Fix vak.annotation.has_unlabeled to handle the edge case where an
    annotation file has no annotated segments
    #583.
    Fixes #378.
  • Fix StandardizeSpect method fit_df so that it computes
    parameters for standardization from a specific
    split of the dataset--the training split, by default--instead
    of using the entire dataset, which could technically give rise
    to data leakage
    #584.
    Fixes #575.
  • Fix error message in vak.core.eval
    #589.
    Fixes #588.
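
To make the data-leakage fix to StandardizeSpect concrete, here is a minimal sketch of fitting standardization parameters on one split only. It is not vak's StandardizeSpect: the class, the row layout, and the single scalar mean and standard deviation are simplifying assumptions (a spectrogram transform would more plausibly compute one value per frequency bin).

```python
from statistics import fmean, pstdev


class Standardizer:
    """Hypothetical stand-in: learns mean/std from a single dataset split."""

    def fit(self, rows, split="train"):
        # use only values from the requested split, never the whole dataset
        values = [v for row in rows if row["split"] == split for v in row["values"]]
        if not values:
            raise ValueError(f"no rows found for split {split!r}")
        self.mean = fmean(values)
        self.std = pstdev(values)
        if self.std == 0:
            raise ValueError("cannot standardize: standard deviation is zero")
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]


rows = [
    {"split": "train", "values": [1.0, 3.0]},
    {"split": "train", "values": [1.0, 3.0]},
    {"split": "test", "values": [100.0, 200.0]},  # must not influence the fit
]

scaler = Standardizer().fit(rows)  # defaults to the training split
print(scaler.mean, scaler.std)
print(scaler.transform([2.0, 4.0]))
```

Because the test rows are excluded from the fit, their much larger values leave the learned parameters unchanged.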

Published by NickleDave over 2 years ago

vak -

0.4.0 -- 2021-12-29

Added

  • add a CITATION.cff file
    #407.
  • add an all-contributors table to README,
    using their bot to adopt the spec.
    E.g., #395.
    Fixes #387.
  • add description of command-line interface to reference section of documentation.
    #417.
    Fixes #270.
  • add how-to on using an annotation format that's not built in
    #421.
    Fixes #397.
  • add how-to on using custom spectrograms
    #421.
    Fixes #413.

Changed

  • updated the .toml configuration files in the tutorial
    to match what was used for TweetyNet paper.
    #416.
    Fixes #414.
  • move tutorial into "getting started" section of docs,
    and revise landing page of docs
    #419.
  • revise the documentation for the configuration file format.
    Show valid options for each section by including docstrings from the classes
    that represent the different sections
    #428.
    Fixes #271.

Fixed

  • make further fixes + add unit tests for handling predictions where all timebins
    are the background "unlabeled" class #409.
    Fixes bug in remove_short_segments #403.
    Related to #393
    and #386.
  • fix docs so entries appear in navbar
    #427.
    Fixes #426.

Published by NickleDave over 2 years ago

vak -

0.6.0 -- 2022-07-07

Added

  • better document conda install
    #528.
    Fixes #527.
  • Add tests for console script, i.e., the command-line interface
    #533.
    Fixes #369.

Changed

  • switch from using make to nox for running tasks
    #532.
    Fixes #440.
  • Refactor logging so that it can be configured by cli functions
    when running vak through command-line interface, and by users
    that are working with the API directly
    #535.

Fixed

  • Fix bug that prevented creating spectrogram files with non-default keys
    (e.g. 'spect' instead of the default 's'). Needed to pass keys from spect_params
    into spect.to_dataframe inside vak.io.dataframe.from_files.
    #531.
    Fixes #412.
  • Fix logging so a single message is not logged multiple times.
    #535.
    Fixes #258.
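
The release notes do not say what caused messages to be logged more than once. One common cause in Python programs is attaching a new handler every time a setup function runs, so the sketch below guards against that. The function name and logger name are hypothetical; this is not vak's logging code.

```python
import logging


def config_logger(name="example", level=logging.INFO):
    """Configure a named logger so that repeated calls do not duplicate output."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # attach a handler only if this logger does not have one yet
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(message)s")
        )
        logger.addHandler(handler)
    # do not also pass records up to the root logger's handlers
    logger.propagate = False
    return logger


first = config_logger()
second = config_logger()  # e.g. called again by another entry point
second.info("this message is emitted once")
```

Keeping configuration in one function like this also lets both a command-line entry point and a user working with the API set up logging the same way.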

Published by NickleDave almost 3 years ago

vak -

Published by NickleDave almost 3 years ago

vak -

Published by NickleDave almost 3 years ago

vak -

Published by NickleDave about 3 years ago

vak -

Published by NickleDave over 3 years ago

vak -

Published by NickleDave over 3 years ago

vak -

Published by NickleDave over 3 years ago

vak -

Published by NickleDave about 4 years ago

vak -

Published by NickleDave about 4 years ago

vak -

Published by NickleDave about 4 years ago

vak -

Published by NickleDave about 4 years ago

vak -

Published by NickleDave over 4 years ago

vak -

Published by NickleDave over 4 years ago

vak -

Published by NickleDave over 4 years ago

vak -

Published by NickleDave over 4 years ago

vak -

Published by NickleDave over 4 years ago

vak -

Published by NickleDave over 4 years ago

vak -

Published by NickleDave about 5 years ago

vak -

Published by NickleDave about 5 years ago

vak -

Published by NickleDave about 5 years ago

vak -

Published by NickleDave about 5 years ago

vak -

Published by NickleDave about 5 years ago

vak - bullfinches-blathering

Published by NickleDave over 5 years ago

vak - bullfinches-bettering

Published by NickleDave over 5 years ago

vak - bullfinches-bellowing

Published by NickleDave over 5 years ago

vak - sparrows-gathering

Added

  • add helper function to TestLearncurve that multiple unit tests can use to assert all outputs
    were generated. Now being used to make sure bug fixed in 0.1.0a8 stays fixed.
  • error checking in cli that raises ValueError when cli command is learncurve and the option
    'results_dir_made_by_main_script' is already defined in [OUTPUT] section, since running
    'learncurve' would overwrite it.
  • dataset subpackage that houses VocalizationDataset and related classes that facilitate creating data sets for training neural networks from heterogeneous data: audio files, files of arrays containing spectrograms, different annotation types, etc.
    • also includes modules for handling each data source
      • e.g. audio.to_spect creates spectrograms from audio files
      • spect.from_files creates a VocalizationDataset from spectrogram files
  • core sub-package that contains / will contain functions that do heavy lifting: learning_curve, train, predict
    • learning_curve is a sub-sub-module that does both train and test of models, instead of having a separate learncurve and summary function (i.e. train and test). Still will confuse some ML/AI people that this "learning curve" has a test data step but whatevs
    • cli sub-package calls / will call these functions and handle any command-line-interface specific logic
      (e.g. making changes to config.ini files)

Changed

  • change name of vak.cli.make_data to vak.cli.prep
  • structure of config.ini file
    • now specify either audio_format or spect_format in [DATA] section
    • and annot_format for annotations
  • refactor utils sub-package
    • move several functions from data and general into a labels module

Removed

  • remove unused options from command-line interface: --glob, --txt, --dataset
  • skip_files_with_labels_not_in_labelset option
    • now happens whenever labelset is specified; if no labelset is given then no filtering is done
  • summary command-line option, since learncurve now trains models and also tests them on a separate data set
  • silent_label_gap option, because VocalizationDataset class determines if a label for unlabeled segments between other segments is needed, and if so automatically assigns this a label of 0 when mapping user labels to consecutive integers
    • this way user does not have to think about it
    • and program doesn't have to keep track of a labels_mapping file that saves what user specified
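
The automatic label mapping described in the last item can be sketched as below. The function name, the map_unlabeled flag, and the "unlabeled" key are hypothetical choices for this illustration, not necessarily the names used by VocalizationDataset.

```python
def to_labelmap(labelset, map_unlabeled=True):
    """Map user labels to consecutive integers.

    If map_unlabeled is True, 0 is reserved for unlabeled segments and
    user labels start at 1; otherwise user labels start at 0.
    """
    labels = sorted(set(labelset))
    if map_unlabeled and "unlabeled" in labels:
        raise ValueError("'unlabeled' is reserved in this sketch")
    offset = 1 if map_unlabeled else 0
    labelmap = {label: i + offset for i, label in enumerate(labels)}
    if map_unlabeled:
        labelmap["unlabeled"] = 0
    return labelmap


print(to_labelmap("abc"))                       # gaps between segments exist
print(to_labelmap("abc", map_unlabeled=False))  # no gaps, no extra class
```

Sorting the labels makes the mapping deterministic, so the same labelset always yields the same integers without saving a separate mapping file.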

Published by NickleDave almost 6 years ago

vak -

Fixed

  • Fix how main loop in learncurve re-loads indices for grabbing subsets of training data after
    generating them, and do so in a way that still allows for re-using subsets from previous runs

Published by NickleDave about 6 years ago

vak -

Added

  • vak.cli.summary has a save_transformed_data parameter, and vak.cli passes the value from
    config.data.save_transformed_data as the argument when calling vak.cli.summary

Changed

  • vak.cli.summary only saves transformed train/test data if save_transformed_data is True
  • move a test from tests/unit_tests/test_utils.py into tests/unit_tests/test_utils/test_data.py

Removed

  • vak.cli.summary no longer saves copy of test data in results directory

Published by NickleDave about 6 years ago

vak -

Added

  • add test for utils.data.get_inds_for_dur

Changed

  • learncurve gets indices for all train data subsets before starting training

Published by NickleDave about 6 years ago

vak -

Fixed

  • add missing 'save_transformed_data' option to Data config parsing

Published by NickleDave about 6 years ago

vak -

Added

  • Use attrs-based classes to represent sections of config.ini files

Changed

  • rewrite vak.cli so it can deal with state of config.ini files
    • e.g. doesn't throw an error if train_data_path not declared as an option in [TRAIN] when running vak prep
      (since training data won't exist yet, doesn't make sense to throw an error).

Removed

  • remove code about freq_bins in a couple of places, since the number of frequency bins
    in spectrograms is now just determined programmatically
    • vak.config.data no longer has freq_bins field in DataConfig namedtuple
    • make_data no longer adds freq_bins option to [DATA] section after making data sets

Published by NickleDave about 6 years ago

vak -

Changed

  • checkpoints saved in individual directories by learncurve so they are more cleanly segregated,
    e.g. if user wants to point to a specific checkpoint when calling predict
  • calling vak prep config.ini will run vak.cli.make_data function
    • so to generate a learning curve, the three steps now are:
      vak prep config.ini
      vak learncurve config.ini
      vak summary config.ini

Fixed

  • vak.cli.train runs all the way through, passes basic "does not crash" test
  • vak.cli.predict runs all the way through, passes basic "does not crash" test

Published by NickleDave about 6 years ago