KSO

The Koster Seafloor Observatory is an open-source, citizen science and machine learning approach to analyse subsea movies.
https://github.com/ocean-data-factory-sweden/kso

Category: Biosphere
Sub Category: Marine Life and Fishery

Keywords

citizen-science deep-learning marine-protected-areas object-detection

Keywords from Contributors

transformer optimize archiving measur language-model compose observation conversion generic animals

Last synced: about 9 hours ago

Repository metadata

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.

README.md

KSO System

The KSO System is an open-source machine learning framework for underwater video analysis, developed from the Koster Seafloor Observatory research initiative and the Swedish Platform for Subsea Image Analysis (SUBSIM). It is optimized for GPU-accelerated HPC environments, particularly LUMI, and integrates with MLflow for experiment tracking.

📘 New to KSO? Each notebook contains step-by-step instructions with clearly marked EDIT THIS cells. This README provides an overview; the notebooks will guide you through each stage.

System Overview

[Figure: KSO System Overview]

Quick Start

1) Choose your environment

See the Installation section below.

2) Run the notebooks

Use the table below to choose the first stage that matches what you already have.

You have… | Start at
Raw footage only, OR BIIGLE-annotated images but need a YOLO dataset | 00 Data Preparation
A YOLO dataset (data.yaml + train/val/test splits) | 01 Project Setup
A trained model (or weights) and you want to fine-tune on a dataset | 02 Training & Eval
A trained model and you want to run inference on images/video | 03 Inference + 04 Analysis
A validated model that you want to publish along with your dataset | 05 Publish Model

Note: Notebooks 00 and 03 are still in development, and notebooks 04 and 05 are planned. For the recommended working path today, see the Standalone Notebooks section below.
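
Before starting at stage 01, it can help to confirm that a folder really looks like a YOLO dataset. The sketch below is a minimal check under an assumed layout (`data.yaml` at the root, with `images/<split>` and `labels/<split>` folders); that layout and the helper name `check_yolo_dataset` are illustrative assumptions, not something KSO prescribes, so adjust them to match your own dataset.

```python
from pathlib import Path


def check_yolo_dataset(root, splits=("train", "val", "test")):
    """Return a list of problems found in a YOLO dataset folder (empty if none).

    Assumed layout (hypothetical, adjust to your dataset):
        root/data.yaml
        root/images/<split>/   and   root/labels/<split>/
    """
    root = Path(root)
    problems = []
    if not (root / "data.yaml").is_file():
        problems.append("missing data.yaml")
    for split in splits:
        for kind in ("images", "labels"):
            if not (root / kind / split).is_dir():
                problems.append(f"missing folder {kind}/{split}")
    return problems


if __name__ == "__main__":
    # Check the current directory and print anything that is missing.
    for problem in check_yolo_dataset("."):
        print(problem)
```

An empty result suggests the folder is ready for 01 Project Setup; otherwise, starting at 00 Data Preparation or the standalone BIIGLE conversion notebook is probably the safer route.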

Notebook workflow

KSO is transitioning to a clear, staged notebook pipeline. Stages 01–02 are stable today; later stages are under active development.

Official Pipeline (00–05)

# | Notebook | Description | Status
00 | 00_Data_Preparation.ipynb | Transfer footage to LUMI (optional), extract frames, build your image set for annotation in BIIGLE, convert annotation CSV → YOLO. Skip if you already have a YOLO dataset. | 🔜 In development
01 | 01_Project_Setup.ipynb | Create a KSO2 project (.project.yaml), attach your YOLO dataset, configure tracking, and optionally run offline augmentation. | ✅ Stable
02 | 02_Train_and_Eval_Models.ipynb | Train or fine-tune a YOLO model, track runs with MLflow, and evaluate on the test set. | ✅ Stable
03 | 03_Inference.ipynb | Run inference or batch inference on new images or video; export detections (CSV + annotated media). | 🔜 In development
04 | 04_Analysis.ipynb | Summary statistics, maxN, per-class summaries, and visualizations. | 🚧 Planned
05 | 05_Publish_Models.ipynb | Package models and metadata; publish to Zenodo or Researchdata.se. | 🚧 Planned
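
The maxN statistic mentioned for stage 04 is commonly defined as the largest number of individuals of a class seen in any single frame. Because that notebook is still planned, the following is only a sketch of the idea and not KSO's implementation; the input shape (one `(frame_id, class_name)` pair per detected individual) and the class names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def max_n(detections):
    """Compute MaxN per class.

    detections: iterable of (frame_id, class_name) pairs, one per detected individual.
    Returns {class_name: MaxN}, the largest per-frame count seen for each class.
    """
    per_frame = defaultdict(Counter)
    for frame_id, class_name in detections:
        per_frame[frame_id][class_name] += 1

    result = {}
    for counts in per_frame.values():
        for class_name, n in counts.items():
            result[class_name] = max(result.get(class_name, 0), n)
    return result


# Hypothetical detections: two cod and one crab in frame 0, and so on.
detections = [
    (0, "cod"), (0, "cod"), (0, "crab"),
    (1, "cod"),
    (2, "crab"), (2, "crab"), (2, "crab"),
]
print(max_n(detections))  # {'cod': 2, 'crab': 3}
```

Counting per frame rather than summing over the video avoids counting the same individual again each time it reappears.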

Standalone Notebooks

While the official pipeline is being finalized, these notebooks provide a working path for new users, covering dataset preparation in BIIGLE and end-to-end model training.

Notebook | Path | Covers
Biigle_to_YOLO.ipynb | notebooks/setup/Biigle_to_YOLO.ipynb | BIIGLE CSV → YOLO conversion (data preparation for BIIGLE users)
Train_models.ipynb | notebooks/analyse/Train_models.ipynb | YOLO model training and fine-tuning using Ultralytics
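
The core of any annotation-to-YOLO conversion is turning pixel coordinates into a YOLO label line: class id, then box centre and size, each normalized by the image dimensions. The sketch below illustrates that step only. It assumes each annotation arrives as a flat list of vertex coordinates `[x1, y1, x2, y2, ...]` in pixels, and the function name `polygon_to_yolo` is hypothetical; the Biigle_to_YOLO notebook may handle the BIIGLE CSV differently.

```python
def polygon_to_yolo(points, img_width, img_height, class_id):
    """Convert pixel vertices [x1, y1, x2, y2, ...] to one YOLO label line.

    The enclosing axis-aligned box is used, so rotated rectangles and
    polygons are approximated by their bounding box.
    """
    xs = points[0::2]
    ys = points[1::2]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    # YOLO expects the box centre and size as fractions of the image size.
    x_center = (x_min + x_max) / 2 / img_width
    y_center = (y_min + y_max) / 2 / img_height
    width = (x_max - x_min) / img_width
    height = (y_max - y_min) / img_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200 x 100 pixel rectangle in a 1000 x 500 image, class 0.
print(polygon_to_yolo([100, 50, 300, 50, 300, 150, 100, 150], 1000, 500, 0))
# 0 0.200000 0.200000 0.200000 0.200000
```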

Available YOLO models

The training notebook supports several Ultralytics model families, including YOLO11. See the notebook itself for the authoritative model list and parameters.

Practical guidance:

  • Small datasets (~100–250 images): prefer nano or small; larger models may overfit.
  • Medium datasets (~250–750 images): use medium for a good balance.
  • Large datasets (750+ images): consider large or xlarge if resources allow.
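
The guidance above can be written down as a small helper. This is a sketch only: the hard cut-offs at 250 and 750 images simplify the approximate ranges in the list, and the function names are illustrative. The `train` helper uses the public Ultralytics API (`YOLO(...)` and `.train(...)`) with YOLO11 weight names such as `yolo11n.pt`; the epoch count and image size are placeholder values, and the training notebook remains the authoritative reference for parameters.

```python
def suggest_model_sizes(n_images):
    """Map a dataset size to candidate YOLO model sizes, following the guidance above.

    The exact cut-offs (250 and 750 images) are a simplification of the
    approximate ranges in the text, not a rule enforced by KSO.
    """
    if n_images < 250:
        return ["n", "s"]  # nano or small
    if n_images < 750:
        return ["m"]       # medium
    return ["l", "x"]      # large or xlarge


def train(data_yaml, n_images, epochs=50):
    """Fine-tune the smallest suggested YOLO11 model (requires the ultralytics package)."""
    from ultralytics import YOLO  # imported lazily so the helper above has no dependencies

    size = suggest_model_sizes(n_images)[0]
    model = YOLO(f"yolo11{size}.pt")  # pretrained Ultralytics weights, e.g. yolo11n.pt
    return model.train(data=data_yaml, epochs=epochs, imgsz=640)


print(suggest_model_sizes(180))  # ['n', 's']
```

Starting from the smallest suggested size keeps training time short; moving up a size is worthwhile mainly when validation metrics plateau and resources allow.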

Installation

System requirements

  • Minimum: Python 3.11, 16 GB RAM, ≈10 GB free disk space.
  • Recommended: CUDA/ROCm-capable GPU (≥8 GB VRAM) and access to an HPC system (e.g. LUMI).
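
A quick way to compare a machine against these requirements is a short script like the one below. It is a rough sketch, not a KSO tool: it checks only the Python version, free disk space, and whether PyTorch can see a GPU. RAM and VRAM are not checked because the standard library offers no portable way to read them. `torch.cuda.is_available()` also reports ROCm devices on ROCm builds of PyTorch.

```python
import shutil
import sys


def check_environment(path=".", min_python=(3, 11), min_free_gb=10):
    """Report whether this machine meets the basic requirements listed above."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "disk_ok": shutil.disk_usage(path).free / 1e9 >= min_free_gb,
    }
    try:
        import torch  # optional: only present once KSO's dependencies are installed

        report["gpu_available"] = torch.cuda.is_available()
    except ImportError:
        report["gpu_available"] = None  # unknown, PyTorch is not installed yet
    return report


print(check_environment())
```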

Option 1: LUMI (recommended)

KSO is primarily developed and tested on the LUMI supercomputer, running via a Singularity/Apptainer container on GPU nodes.

If you're a first-time user, start here:

Option 2: Other HPC systems

git clone -b dev https://github.com/ocean-data-factory-sweden/kso.git
cd kso
# Follow your HPC's recommended way to launch Jupyter or batch jobs

Use your center's standard GPU modules or containers, and bind project and scratch storage as appropriate.

Option 3: Local development

For local use without HPC access. Training without a GPU will be slow; smaller models and lower batch sizes are recommended.

Docker

Note: The instructions below run the notebooks inside the container.
Any changes you make will be lost when the container stops unless you save them outside the container
(e.g., by mounting a volume: -v $(pwd):/opt/kso/notebooks).

# Pull kso with a suitable backend
docker pull ghcr.io/ocean-data-factory-sweden/kso:dev-ubuntu24.04             # CPU only
# docker pull ghcr.io/ocean-data-factory-sweden/kso:dev-cuda12.9-ubuntu24.04  # CUDA / NVIDIA GPUs
# docker pull ghcr.io/ocean-data-factory-sweden/kso:dev-rocm6.4-ubuntu24.04   # ROCm / AMD GPUs

# Run the notebooks
docker run -it --rm -p 8888:8888 ghcr.io/ocean-data-factory-sweden/kso:dev-ubuntu24.04 notebooks/
# docker run -it --rm -p 8888:8888 --gpus all ghcr.io/ocean-data-factory-sweden/kso:dev-cuda12.9-ubuntu24.04 notebooks/
# docker run -it --rm -p 8888:8888 --device /dev/kfd --device /dev/dri ghcr.io/ocean-data-factory-sweden/kso:dev-rocm6.4-ubuntu24.04 notebooks/

# Then open http://localhost:8888 in your browser and use the token printed out

pip in venv

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Fetch the repository
git clone -b dev https://github.com/ocean-data-factory-sweden/kso.git
cd kso

# Install kso with a suitable backend
pip install -e .[dev] --extra-index-url https://download.pytorch.org/whl/cpu        # CPU only
# pip install -e .[dev] --extra-index-url https://download.pytorch.org/whl/cu129    # CUDA / NVIDIA GPUs
# pip install -e .[dev] --extra-index-url https://download.pytorch.org/whl/rocm6.4  # ROCm / AMD GPUs

# Run the notebooks
jupyter lab notebooks/

Developer Instructions

We welcome contributions!

  1. Work from the dev branch; create feature branches off dev.
  2. Format Python code with Black:
    black filename.py
    
  3. Use Conventional Commits for messages: feat:, fix:, docs:, refactor:, test:.
  4. Keep commit history clean and logical (squash where appropriate) and rebase onto dev (never merge).
  5. Open a Pull Request targeting dev and request at least 2 reviewers.
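
To catch malformed commit messages before review, the commit convention in step 3 can be checked with a short script. This is an illustrative sketch, not part of the KSO tooling: it accepts only the five types listed above, plus the optional scope and `!` marker from the Conventional Commits format, and the function name `is_conventional` is hypothetical.

```python
import re

# Commit types listed in the contribution guidelines above.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test")

# type, optional (scope), optional "!", then ": " and a non-empty description.
PATTERN = re.compile(r"^(%s)(\([\w\-./]+\))?!?: \S.*" % "|".join(ALLOWED_TYPES))


def is_conventional(message):
    """Return True if the first line of a commit message follows the convention."""
    first_line = message.splitlines()[0] if message else ""
    return bool(PATTERN.match(first_line))


print(is_conventional("feat: add maxN summary"))              # True
print(is_conventional("fix(training): handle empty labels"))  # True
print(is_conventional("Updated stuff"))                       # False
```

A check like this could be wired into a local `commit-msg` Git hook, so that problems surface before a Pull Request is opened.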

Citation

If this code or its trained models contribute to your research, please cite:

Anton V, Germishuys J, Bergström P, Lindegarth M, Obst M (2021). An open-source, citizen science and machine learning approach to analyse subsea movies. Biodiversity Data Journal 9: e60548. https://doi.org/10.3897/BDJ.9.e60548

Support & Contact

We are always excited to collaborate with marine scientists. Feel free to reach out with questions or ideas!

Legacy Notebooks (Zooniverse workflow)

These notebooks implement the original Zooniverse citizen-science pipeline and are maintained for existing projects. For new work, use the main workflow above.

Task | Notebook | Description | Colab
Check Zooniverse metadata | Check_metadata | Check format of footage, sites, media and species CSV files | Open In Colab
Classify | Upload_subjects_to_Zooniverse | Prepare footage and upload clips to Zooniverse | Open In Colab
Classify | Process_classifications | Pull and process classifications from Zooniverse | Open In Colab
Analyse | Evaluate_models | Standalone model evaluation | Open In Colab
Publish | Publish_models | Publish model to a public repository | Open In Colab
Publish | Publish_observations | Export observations to GBIF/OBIS | Open In Colab

License

SUBSIM/KSO is released under the GPL-3.0 license. See LICENSE.txt for details.

Committers metadata

Last synced: 3 days ago

Total Commits: 1,032
Total Committers: 26
Avg Commits per committer: 39.692
Development Distribution Score (DDS): 0.401

Commits in past year: 235
Committers in past year: 20
Avg Commits per committer in past year: 11.75
Development Distribution Score (DDS) in past year: 0.702

Name Email Commits
Jurie Germishuys j****s@c****e 618
Diewertje11 d****r@c****e 100
Victor 5****e 91
Ghaith g****h@G****l 70
Tuomas Rossi t****i@c****i 43
Louis Fiorina 1****f 20
Ghaith g****h@w****z 11
Ghaith g****h@v****z 11
nithador t****c@v****z 11
Jannes 3****g 10
Pablo Correa Gómez p****z@c****e 10
PilarNavarro p****r@h****s 5
Ghaith g****h@v****z 5
Ghaith g****h@w****z 4
Ghaith g****h@w****z 3
Ghaith g****h@w****z 3
Ghaith g****h@v****z 3
Ghaith g****h@v****z 3
Jurie Germishuys j****g@a****e 2
Ghaith g****h@v****z 2
Ghaith g****h@v****z 2
Ghaith g****h@v****z 1
Ghaith g****h@v****z 1
Ghaith g****h@w****z 1
Ghaith g****h@w****z 1
dependabot[bot] 4****] 1
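
The summary figures above can be reproduced from the table. The sketch below assumes the usual definition of the Development Distribution Score, namely one minus the share of commits made by the most active committer; that definition is an assumption here, although it matches the reported all-time value.

```python
def development_distribution_score(top_committer_commits, total_commits):
    """DDS under the assumed definition: 1 - (top committer's commits / total commits)."""
    return 1 - top_committer_commits / total_commits


total_commits = 1032   # "Total Commits" above
committers = 26        # "Total Committers" above
top_committer = 618    # largest single entry in the table above

print(round(total_commits / committers, 3))                                    # 39.692
print(round(development_distribution_score(top_committer, total_commits), 3))  # 0.401
print(235 / 20)                                                                # 11.75 (past year)
```

A score near 0 means one person wrote almost all commits, and a score near 1 means work is spread widely, so the rise from 0.401 overall to 0.702 in the past year suggests recent development is more evenly shared.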

Issue and Pull Request metadata

Last synced: 3 days ago

Total issues: 195
Total pull requests: 159
Average time to close issues: about 1 month
Average time to close pull requests: 11 days
Total issue authors: 13
Total pull request authors: 10
Average comments per issue: 1.26
Average comments per pull request: 1.92
Merged pull request: 94
Bot issues: 0
Bot pull requests: 25

Past year issues: 17
Past year pull requests: 26
Past year average time to close issues: about 2 months
Past year average time to close pull requests: 19 days
Past year issue authors: 6
Past year pull request authors: 6
Past year average comments per issue: 0.71
Past year average comments per pull request: 3.04
Past year merged pull request: 17
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/ocean-data-factory-sweden/kso

Top Issue Authors

  • Bergylta (68)
  • victor-wildlife (43)
  • jannesgg (37)
  • Diewertje11 (21)
  • donkyjohn (5)
  • louisrf (4)
  • Nithador (4)
  • ShrimpFather7 (3)
  • trossi (3)
  • KalindiFonda (2)
  • pabloyoyoista (2)
  • pilarnavarro (2)
  • XhD98 (1)

Top Pull Request Authors

  • victor-wildlife (58)
  • Diewertje11 (25)
  • dependabot[bot] (25)
  • jannesgg (19)
  • trossi (11)
  • louisrf (9)
  • pilarnavarro (5)
  • GhaithChaabane (4)
  • pabloyoyoista (2)
  • Nithador (1)

Top Issue Labels

  • bug (87)
  • enhancement (35)
  • Development (18)
  • good first issue (10)
  • Enhancement (5)
  • Spyfish (3)
  • Documentation (2)
  • Support (2)
  • Dependencies (2)
  • documentation (1)
  • help wanted (1)
  • question (1)
  • dependencies (1)

Top Pull Request Labels

  • Enhancement (2)
  • Documentation (1)
  • Bug (1)
  • bug (1)

Dependencies

requirements.txt pypi
  • PIMS ==0.6.1
  • PyYAML >=5.3.1
  • av ==8.1.0
  • boto3 ==1.26.64
  • dataclass-csv ==1.4.0
  • easydict ==1.9.0
  • fastapi ==0.73.0
  • ffmpeg-python ==0.2.0
  • gdown ==3.13.0
  • imagesize ==1.4.1
  • ipyfilechooser ==0.4.4
  • itables ==0.3.0
  • jupyter ==1.0.0
  • jupyter_bbox_widget ==0.5.0
  • matplotlib >=3.2.2
  • moviepy ==1.0.3
  • natsort ==8.1.0
  • numpy >=1.18.5
  • opencv-contrib-python *
  • opencv-python ==4.6.0.66
  • opencv-python-headless *
  • openpyxl ==3.1.0
  • pandas ==1.1.4
  • panoptes-client ==1.5.0
  • protobuf ==3.15.8
  • pyopenssl >=23
  • python-magic ==0.4.24
  • python-multipart ==0.0.5
  • scipy >=1.4.1
  • scp ==0.14.1
  • seaborn >=0.11.0
  • split-folders ==0.5.1
  • tensorboard >=2.4.1
  • thop *
  • tqdm >=4.41.0
  • uvicorn ==0.17.2
  • wandb *
.github/workflows/detect-unused-code.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/code-formatting.yml actions
  • actions/checkout v3 composite
  • psf/black stable composite
containers/base/Dockerfile docker
  • ${BASE_IMAGE} latest build
  • ${FFMPEG_BASE_IMAGE} latest build
requirements_cpu.txt pypi
  • torch ==2.8.0
  • torchvision ==0.23.0
requirements_rocm6.4.txt pypi
  • torch ==2.8.0
  • torchvision ==0.23.0
.github/actions/increase-docker-space/action.yml actions
requirements_cuda12.9.txt pypi
  • torch ==2.8.0
  • torchvision ==0.23.0
containers/final/Dockerfile docker
  • ${BASE_IMAGE} latest build
.github/workflows/build-and-test.yml actions
  • ./.github/actions/increase-docker-space * composite
  • actions/checkout v6 composite
  • docker/build-push-action v6 composite
  • docker/login-action v3 composite
  • docker/setup-buildx-action v3 composite
  • docker/setup-qemu-action v3 composite

Score: 6.436150368369428