AVEX

An API for model loading and inference, and a Python-based system for training and evaluating bioacoustics representation learning models.
https://github.com/earthspecies/avex

Category: Biosphere
Sub Category: Bioacoustics and Acoustic Data Analysis

Keywords

audio bioacoustics representation-learning

Repository metadata

Animal Vocalization Encoder Library

Host: GitHub
URL: https://github.com/earthspecies/avex
Owner: earthspecies
License: mit
Created: 2025-04-15T05:18:50.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2026-07-01T10:58:53.000Z (6 days ago)
Last Synced: 2026-07-01T11:04:35.435Z (6 days ago)
Topics: audio, bioacoustics, representation-learning
Language: Python
Homepage:
Size: 43.7 MB
Stars: 36
Watchers: 5
Forks: 4
Open Issues: 13
Releases: 6
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
- Support: docs/supported_models.md

AVEX - Animal Vocalization Encoder Library

An API for model loading and inference, and a Python-based system for training and evaluating bioacoustics representation learning models.

Description

The Animal Vocalization Encoder library (AVEX) provides a unified interface for working with pre-trained bioacoustics representation learning models, with support for:

Model Loading: Load pre-trained models with checkpoints and class mappings
Embedding Extraction: Extract features from audio for downstream tasks
Probe System: Flexible probe heads (linear, MLP, LSTM, attention, transformer) for transfer learning
Training & Evaluation: Scripts for supervised learning experiments
Plugin Architecture: Register and use custom models seamlessly

Installation

Prerequisites

Python 3.10, 3.11, 3.12, or 3.13 (see requires-python in pyproject.toml; CI runs 3.13 on Ubuntu)
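
As a quick sanity check before installing, the version requirement above can be tested from the interpreter itself. This is a minimal sketch using only the standard library; the helper name is_supported_python is hypothetical and not part of avex, and the authoritative constraint remains requires-python in pyproject.toml.

```python
import sys

# Supported (major, minor) pairs as listed in the prerequisites above.
SUPPORTED_VERSIONS = {(3, 10), (3, 11), (3, 12), (3, 13)}


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) pair is one of the listed versions."""
    return (version_info[0], version_info[1]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[0], sys.version_info[1]
    status = "listed as supported" if is_supported_python() else "not listed as supported"
    print(f"Python {major}.{minor} is {status}")
```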

Install with pip

pip install avex

Install with uv

uv add avex

For development installation with training/evaluation tools, see the Contributing guide.

Quick Start

import torch
import librosa
from avex import load_model, list_models

# List available models
print(list_models().keys())

# Load a pre-trained model
model = load_model("esp_aves2_sl_beats_all", device="cpu")

# Load and preprocess audio (BEATs expects 16kHz)
audio, sr = librosa.load("your_audio.wav", sr=16000)
audio_tensor = torch.tensor(audio).unsqueeze(0)  # Shape: (1, num_samples)

# Run inference
with torch.no_grad():
    logits = model(audio_tensor)
    predicted_class = logits.argmax(dim=-1).item()

# Get human-readable label
if model.label_mapping:
    label = model.label_mapping.get(str(predicted_class), predicted_class)
    print(f"Predicted: {label}")

Embedding Extraction

# Load for embedding extraction (no classifier head)
model = load_model("esp_aves2_sl_beats_all", return_features_only=True, device="cpu")

with torch.no_grad():
    embeddings = model(audio_tensor)
    # Shape: (batch, time_steps, 768) for BEATs

# Pool to get fixed-size embedding
embedding = embeddings.mean(dim=1)  # Shape: (batch, 768)
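
To make the pooling step concrete, here is a minimal sketch of mean pooling over the time axis written without any tensor library. It mirrors what embeddings.mean(dim=1) does for a single clip; the helper mean_pool and the toy values are illustrative assumptions, not part of avex.

```python
def mean_pool(frames):
    """Average a (time_steps, dim) list of frame embeddings into one vector."""
    if not frames:
        raise ValueError("need at least one frame to pool")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


# Toy frame embeddings for one clip: 3 time steps, 4 dimensions
# (a BEATs backbone would produce 768 dimensions per frame).
clip = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]
pooled = mean_pool(clip)
print(pooled)  # [2.0, 1.0, 2.0, 2.0]
```

The result has one value per embedding dimension regardless of clip length, which is what makes it usable as a fixed-size input for downstream classifiers.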

Layer-wise Embedding Extraction (Multiple Layers)

For some analyses and probes it can help to extract embeddings from multiple internal layers.
You can select layers by index (0-based, negative indices allowed) instead of long module names.

import torch
from avex import load_model

model = load_model("esp_aves2_naturelm_audio_v1_beats", device="cpu")
print(model.get_model_layer_map())  # {0: "...", 1: "...", ...}

audio = torch.randn(1, 16000 * 5)
_ = model.register_hooks_for_layers([0, -1])
emb = model.extract_embeddings(audio, aggregation="mean")
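
The index convention described above (0-based, negative indices counting from the end) can be sketched as follows. This is not the library's implementation: resolve_layer_indices is a hypothetical helper, the layer names are made up, and the sketch assumes the layer map uses contiguous integer keys starting at 0, as the printed example suggests.

```python
def resolve_layer_indices(layer_map, indices):
    """Map 0-based (possibly negative) indices to layer names.

    layer_map is a dict of the form {0: "name", 1: "name", ...},
    similar in shape to what get_model_layer_map() prints.
    """
    num_layers = len(layer_map)
    resolved = []
    for idx in indices:
        # Negative indices count back from the last layer, as in Python lists.
        position = idx + num_layers if idx < 0 else idx
        if position not in layer_map:
            raise IndexError(f"layer index {idx} out of range for {num_layers} layers")
        resolved.append(layer_map[position])
    return resolved


# Hypothetical layer map; real names come from model.get_model_layer_map().
toy_map = {0: "encoder.layers.0", 1: "encoder.layers.1", 2: "encoder.layers.2"}
print(resolve_layer_indices(toy_map, [0, -1]))
# ['encoder.layers.0', 'encoder.layers.2']
```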

Transfer Learning with Probes

from avex import load_model
from avex.models.probes import build_probe_from_config
from avex.configs import ProbeConfig

# Load backbone for feature extraction
base = load_model("esp_aves2_sl_beats_all", return_features_only=True, device="cpu")

# Define a probe head for your task
probe_config = ProbeConfig(
    probe_type="linear",
    target_layers=["last_layer"],
    aggregation="mean",
    freeze_backbone=True,
    online_training=True,
)

probe = build_probe_from_config(
    probe_config=probe_config,
    base_model=base,
    num_classes=10,  # Your number of classes
    device="cpu",
)

Documentation

Full documentation: docs/index.md

Core Documentation

API Reference - Complete API documentation for model loading, registry, and management functions
Architecture - Framework architecture, core components, and plugin system
Supported Models - List of supported models and their configurations
Configuration - ModelSpec parameters, audio requirements, and configuration options

Usage Guides

Training and Evaluation - Guide to training and evaluating models
Embedding Extraction - Working with feature representations and embeddings
Examples - Comprehensive examples and use cases

Advanced Topics

Probe System - Understanding and using probes for transfer learning
API Probes - API reference for probe-related functionality
Custom Model Registration - Guide on registering custom model classes and loading pre-trained models

Examples: See the examples/ directory:

00_quick_start.py - Basic model loading
01_basic_model_loading.py - Loading models with different configurations
02_checkpoint_loading.py - Working with checkpoints
03_custom_model_registration.py - Custom model registration
04_training_and_evaluation.py - Training and evaluation examples
05_embedding_extraction.py - Feature extraction
06_classifier_head_loading.py - Classifier head behavior

Supported Models

The framework supports the following audio representation learning models:

EfficientNet - EfficientNet-based models for audio classification
BEATs - BEATs transformer models for audio representation learning
EAT - Efficient Audio Transformer models
AVES - AVES model for bioacoustics
BirdMAE - BirdMAE masked autoencoder for bioacoustic representation learning
ATST - Audio Teacher-Student Transformer models
ResNet - ResNet models (ResNet18, ResNet50, ResNet152)
CLIP - Contrastive Language-Audio Pretraining models
BirdNet - BirdNet models for bioacoustic classification (external TensorFlow model; some features might not be available)
Perch - Perch models for bioacoustics (external TensorFlow model; some features might not be available)
SurfPerch - SurfPerch models (external TensorFlow model; some features might not be available)

See Supported Models for detailed information and configuration examples.

Supported Probes

The framework provides flexible probe heads for transfer learning:

Linear - Simple linear classifier (fastest, most memory-efficient)
MLP - Multi-layer perceptron with configurable hidden layers
LSTM - Long Short-Term Memory network for sequence modeling
Attention - Self-attention mechanism for sequence modeling
Transformer - Full transformer encoder architecture

Probes can be trained:

Online: End-to-end with the backbone (raw audio input)
Offline: On pre-computed embeddings

See Probe System and API Probes for detailed documentation.
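
To illustrate the offline mode conceptually, the sketch below fits a linear probe on embeddings that were computed ahead of time, so the backbone is never touched during training. It is a plain-Python logistic regression on toy 2-D vectors, not the avex probe API; train_linear_probe, predict, and the data are all hypothetical stand-ins.

```python
import math


def train_linear_probe(embeddings, labels, epochs=200, lr=0.5):
    """Fit a binary logistic-regression probe on fixed embedding vectors."""
    dim = len(embeddings[0])
    n = len(embeddings)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            logit = sum(w * xi for w, xi in zip(weights, x)) + bias
            prob = 1.0 / (1.0 + math.exp(-logit))
            error = prob - y  # gradient of the log loss w.r.t. the logit
            for d in range(dim):
                grad_w[d] += error * x[d]
            grad_b += error
        # Full-batch gradient descent step on the averaged gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return the predicted class (0 or 1) for one embedding."""
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias > 0.0)


# Toy pre-computed embeddings (stand-ins for pooled backbone outputs).
train_x = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7]]
train_y = [0, 0, 1, 1]
w, b = train_linear_probe(train_x, train_y)
print([predict(w, b, x) for x in train_x])
```

Because the embeddings are fixed inputs, several probe configurations can be trained on the same cached features without re-running the backbone, which is the usual motivation for offline probing.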

Citing

If you use this framework in your research, please cite:

@inproceedings{miron2025matters,
  title={What Matters for Bioacoustic Encoding},
  author={Miron, Marius and Robinson, David and Alizadeh, Milad and Gilsenan-McMahon, Ellen and Narula, Gagan and Chemla, Emmanuel and Cusimano, Maddie and Effenberger, Felix and Hagiwara, Masato and Hoffman, Benjamin and Keen, Sara and Kim, Diane and Lawton, Jane K. and Liu, Jen-Yu and Raskin, Aza and Pietquin, Olivier and Geist, Matthieu},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}

Related ESP papers:

@inproceedings{miron2026probing,
  title={Multi-layer attentive probing improves transfer of audio representations for bioacoustics},
  author={Miron, Marius and Robinson, David and Hagiwara, Masato and Parcollet, Titouan and Cauzinille, Jules and Narula, Gagan and Alizadeh, Milad and Gilsenan-McMahon, Ellen and Keen, Sara and Chemla, Emmanuel and Hoffman, Benjamin and Cusimano, Maddie and Kim, Diane and Effenberger, Felix and Lawton, Jane K. and Raskin, Aza and Pietquin, Olivier and Geist, Matthieu},
  booktitle={ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2026},
  organization={IEEE}
}
@inproceedings{hagiwara2023aves,
  title={Aves: Animal vocalization encoder based on self-supervision},
  author={Hagiwara, Masato},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built on top of PyTorch
ICLR 2026 and ICASSP 2026 reviewers for their feedback
Titouan Parcollet for templating and engineering feedback
Bioacoustics community (IBAC, BioDCASE, ABS)

Owner metadata

Name: Earth Species Project
Login: earthspecies
Email: humans@earthspecies.org
Kind: organization
Description: An open-source collaborative and nonprofit dedicated to decoding animal communication.
Website: https://earthspecies.org
Location:
Twitter:
Company:
Icon url: https://avatars.githubusercontent.com/u/36208630?v=4
Repositories: 25
Last synced at: 2024-05-14T00:09:51.904Z
Profile URL: https://github.com/earthspecies

GitHub Events

Total

Delete event: 2
Member event: 1
Pull request event: 3
Issues event: 4
Watch event: 4
Push event: 51
Pull request review comment event: 4
Pull request review event: 8
Create event: 19

Last Year

Delete event: 2
Member event: 1
Pull request event: 3
Issues event: 4
Watch event: 4
Push event: 51
Pull request review comment event: 4
Pull request review event: 8
Create event: 19

Committers metadata

Last synced: 4 days ago

Total Commits: 162
Total Committers: 6
Avg Commits per committer: 27.0
Development Distribution Score (DDS): 0.432

Commits in past year: 67
Committers in past year: 5
Avg Commits per committer in past year: 13.4
Development Distribution Score (DDS) in past year: 0.194

Name	Email	Commits
Marius Miron	m**s@g**m	92
David	d**d@e**g	38
Milad Alizadeh	g**t@m**d	26
Gagan Narula	g**n@e**g	4
Benjamin Hoffman	7****n	1
CheekySparrow	c**y@s**m	1
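
The summary figures above can be reproduced from the commit counts in the table. The sketch below assumes the Development Distribution Score is defined as 1 minus the top committer's share of all commits; the page does not state the formula, but this definition matches the listed total value. The function name is illustrative.

```python
def development_distribution_score(commit_counts):
    """Return 1 minus the top committer's share of all commits."""
    total = sum(commit_counts)
    if total == 0:
        raise ValueError("no commits to score")
    return 1.0 - max(commit_counts) / total


# Commit counts taken from the committer table above.
commits = [92, 38, 26, 4, 1, 1]

total_commits = sum(commits)
avg_per_committer = round(total_commits / len(commits), 1)
dds = round(development_distribution_score(commits), 3)

print(total_commits)      # 162
print(avg_per_committer)  # 27.0
print(dds)                # 0.432
```

A score near 0 means one person wrote almost everything; a score closer to 1 means commits are spread across many contributors.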

Issue and Pull Request metadata

Last synced: 4 days ago

Total issues: 9
Total pull requests: 26
Average time to close issues: 27 days
Average time to close pull requests: about 1 month
Total issue authors: 6
Total pull request authors: 4
Average comments per issue: 0.67
Average comments per pull request: 1.0
Merged pull request: 14
Bot issues: 0
Bot pull requests: 0

Past year issues: 9
Past year pull requests: 26
Past year average time to close issues: 27 days
Past year average time to close pull requests: about 1 month
Past year issue authors: 6
Past year pull request authors: 4
Past year average comments per issue: 0.67
Past year average comments per pull request: 1.0
Past year merged pull request: 14
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/earthspecies/avex

Top Issue Authors

nkundiushuti (4)
ilyassmoummad (1)
sodaJar (1)
aharmax (1)
eharris128 (1)
benjaminsshoffman (1)

Top Pull Request Authors

nkundiushuti (22)
david-rx (2)
chrispla (1)
mil-ad (1)

Top Issue Labels

bug (1)

Package metadata

Total packages: 1
Total downloads:
- pypi: 559 last-month
Total dependent packages: 0
Total dependent repositories: 0
Total versions: 5
Total maintainers: 1

pypi.org: avex

A comprehensive Python-based system for training, evaluating, and analyzing audio representation learning models with support for both supervised and self-supervised learning paradigms

Homepage: https://github.com/earthspecies/avex
Documentation: https://github.com/earthspecies/avex#readme
Licenses: MIT
Latest release: 1.2.0 (published 21 days ago)
Last Synced: 2026-07-03T13:01:44.449Z (4 days ago)
Versions: 5
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 559 Last month
Rankings:
- Dependent packages count: 7.872%
- Average: 26.189%
- Dependent repos count: 44.505%
Maintainers (1)
- earthspecies

Score: 12.011516551067876
