KSO

The Koster Seafloor Observatory is an open-source, citizen science and machine learning approach to analyse subsea movies.
https://github.com/ocean-data-factory-sweden/kso

Category: Biosphere
Sub Category: Marine Life and Fishery

Keywords

citizen-science deep-learning marine-protected-areas object-detection

Keywords from Contributors

transformer optimize archiving measur language-model compose observation conversion generic animals

Last synced: about 9 hours ago

Repository metadata

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.

README.md

KSO System

The KSO System is an open-source machine learning framework for underwater video analysis, developed from the Koster Seafloor Observatory research initiative and the Swedish Platform for Subsea Image Analysis (SUBSIM). It is optimized for GPU-accelerated HPC environments, particularly LUMI, and integrates with MLflow for experiment tracking.


📘 New to KSO? Each notebook contains step-by-step instructions with clearly marked EDIT THIS cells. This README provides an overview; the notebooks guide you through each stage.

System Overview

[Figure: KSO System Overview]

Quick Start

1. Clone the repository

git clone -b dev https://github.com/ocean-data-factory-sweden/kso.git
cd kso

2. Choose your environment

For LUMI (recommended): see docs/LUMI_SETUP.md for container setup and Jupyter session configuration.

For local development:

pip install -r requirements.txt
jupyter lab

3. Run the notebooks

Use the table below to choose the first stage that matches what you already have.

You have | Start at
Raw footage only, or images already annotated in Biigle that still need to become a dataset | 00 Data Preparation
A YOLO dataset (data.yaml + train/val/test splits) | 01 Project Setup
A trained model (or weights) that you want to fine-tune on a dataset | 02 Training & Eval
A trained model that you want to run on new images or video | 03 Inference + 04 Analysis
A validated model that you want to publish along with your dataset | 05 Publish Models
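For orientation, here is a minimal sketch of a YOLO dataset's data.yaml, written with the Python standard library only. The images/train, images/val and images/test layout, the class names, and both helper functions are hypothetical. The layout KSO actually expects is handled in 01 Project Setup.

```python
import tempfile
from pathlib import Path

SPLITS = ("train", "val", "test")


def write_data_yaml(root: Path, class_names: list[str]) -> Path:
    """Write a minimal Ultralytics-style data.yaml for a dataset rooted at `root`.

    Assumes images live in images/train, images/val and images/test (illustrative layout).
    """
    lines = [f"path: {root.resolve()}"]
    lines += [f"{split}: images/{split}" for split in SPLITS]
    lines.append("names:")
    # Class indices must match the integer IDs used in the YOLO label files.
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    out = root / "data.yaml"
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out


def missing_splits(root: Path) -> list[str]:
    """List the splits whose image folder does not exist yet."""
    return [s for s in SPLITS if not (root / "images" / s).is_dir()]


# Demo in a throwaway directory with hypothetical class names and no test split.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for split in ("train", "val"):
        (root / "images" / split).mkdir(parents=True)
    yaml_text = write_data_yaml(root, ["cod", "pollock"]).read_text(encoding="utf-8")
    missing = missing_splits(root)

print(yaml_text)
print("Missing splits:", missing)  # Missing splits: ['test']
```

A missing split is a quick sign that the dataset is not yet ready for 01 Project Setup.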

Note: Notebooks 00 and 03 are still in development, and 04 and 05 are planned. For a working path today, see the Standalone Notebooks section below.

Official Pipeline (00–05)

# | Notebook | Description | Status
00 | 00_Data_Preparation.ipynb | Transfer footage to LUMI (optional), extract frames, build your image set for annotation in Biigle, and convert the annotation CSV to YOLO format. Skip if you already have a YOLO dataset. | 🔜 In development
01 | 01_Project_Setup.ipynb | Create a KSO2 project (.project.yaml), attach your YOLO dataset, configure tracking, and optionally run offline augmentation. | ✅ Stable
02 | 02_Train_and_Eval_Models.ipynb | Train or fine-tune a YOLO model, track runs with MLflow, and evaluate on the test set. | ✅ Stable
03 | 03_Inference.ipynb | Run inference or batch inference on new images or video; export detections (CSV + annotated media). | 🔜 In development
04 | 04_Analysis.ipynb | Summary statistics, maxN, per-class summaries, and visualizations. | 🚧 Planned
05 | 05_Publish_Models.ipynb | Package models and metadata; publish to Zenodo or Researchdata.se. | 🚧 Planned
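Notebook 04 is still planned, so the following is only a sketch of how a maxN summary could be computed, not the project's implementation. It assumes detections are available as (frame, species) pairs, for example parsed from the detections CSV that 03 Inference is described as exporting, and takes maxN to be the highest number of individuals of a species detected in any single frame. The `max_n` helper and the sample data are hypothetical.

```python
from collections import Counter


def max_n(detections):
    """Return {species: maxN} from an iterable of (frame, species) detections.

    maxN is taken here as the largest number of individuals of a species
    detected in any single frame.
    """
    per_frame = Counter(detections)  # (frame, species) -> count in that frame
    result = {}
    for (_frame, species), count in per_frame.items():
        result[species] = max(result.get(species, 0), count)
    return result


# Hypothetical detections from three frames of one video.
detections = [
    (0, "cod"), (0, "cod"), (0, "crab"),
    (1, "cod"), (1, "crab"), (1, "crab"),
    (2, "cod"), (2, "cod"), (2, "cod"),
]
summary = max_n(detections)
print(summary)  # {'cod': 3, 'crab': 2}
```

Using the per-frame maximum rather than a total count avoids counting the same individual repeatedly as it stays in view across frames.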

Standalone Notebooks

While the official pipeline is being finalized, these notebooks provide a working path for new users, covering dataset preparation from Biigle annotations and end-to-end model training.

Notebook | Path | Covers
Biigle_to_YOLO.ipynb | notebooks/setup/Biigle_to_YOLO.ipynb | Biigle CSV → YOLO conversion (data preparation for our Biigle users)
Train_models.ipynb | notebooks/analyse/Train_models.ipynb | YOLO model training and fine-tuning using Ultralytics
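The core of a Biigle-to-YOLO conversion is rewriting each bounding box as a YOLO label line: the class index followed by the box centre, width and height, all normalized by the image size. The sketch below assumes an axis-aligned box given as pixel corner coordinates. It does not cover how Biigle's CSV export encodes shapes, so parsing is left out. `to_yolo_line` is a hypothetical helper, not the notebook's code.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box (corner coordinates) to a YOLO label line:
    'class x_center y_center width height', all normalized to [0, 1]."""
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image and have positive size")
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# A 480x270 px box on a 1920x1080 frame.
line = to_yolo_line(0, 480, 270, 960, 540, img_w=1920, img_h=1080)
print(line)  # 0 0.375000 0.375000 0.250000 0.250000
```

Each image gets one .txt file containing one such line per object. This is why image dimensions are needed during conversion; the `imagesize` package in requirements.txt is one way to read them.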

Available YOLO models

The training notebook supports several Ultralytics model families. See the notebook itself for full model tables.

Supported families include YOLOv8, YOLOv9, YOLOv10, and YOLO11, with sizes typically ranging from nano (n) through xlarge (x). Check the notebook for the exact variants available in each family.

Practical guidance:

  • Small datasets (~100–250 images): prefer nano or small, since larger models are more likely to overfit.
  • Medium datasets (~250–750 images): use medium for a good balance.
  • Large datasets (750+ images): consider large or xlarge if resources allow.
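The size guidance above can be expressed as a small lookup. This is a hypothetical helper. The thresholds are the approximate ones listed, and the weight filenames assume Ultralytics' naming for YOLOv8 and YOLO11 (other families, such as YOLOv9, use different size suffixes).

```python
def suggest_weights(n_images: int, family: str = "yolo11") -> list[str]:
    """Suggest Ultralytics weight files following the rough size guidance.

    Thresholds are approximate; datasets under ~100 images are treated as small.
    """
    if n_images <= 0:
        raise ValueError("n_images must be positive")
    if n_images < 250:
        sizes = ["n", "s"]  # small dataset: larger models tend to overfit
    elif n_images < 750:
        sizes = ["m"]  # medium dataset: balance capacity and overfitting
    else:
        sizes = ["l", "x"]  # large dataset: only if GPU resources allow
    return [f"{family}{size}.pt" for size in sizes]


print(suggest_weights(180))  # ['yolo11n.pt', 'yolo11s.pt']
print(suggest_weights(500, family="yolov8"))  # ['yolov8m.pt']
```

Treat the result as a starting point. Validation metrics from 02 Training & Eval should decide the final choice.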

Installation

System requirements

  • Minimum: Python 3.12, 16 GB RAM, ≈10 GB free disk space.
  • Recommended: CUDA/ROCm-capable GPU (≥8 GB VRAM) and access to an HPC system (e.g. LUMI).
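A quick pre-flight check against the minimum requirements might look like this hypothetical sketch. It only checks the Python version and free disk space. RAM and GPU are left out because the standard library has no portable way to query them.

```python
import shutil
import sys


def check_environment(path=".", min_python=(3, 12), min_free_gb=10.0):
    """Return a list of problems found against the minimum requirements.

    Only the Python version and free disk space at `path` are checked.
    """
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {found}")
    free_gb = shutil.disk_usage(path).free / 1e9  # decimal GB
    if free_gb < min_free_gb:
        problems.append(f"{min_free_gb:g} GB free disk space required, found {free_gb:.1f} GB")
    return problems


issues = check_environment()
print("OK" if not issues else "\n".join(issues))
```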

Option 1: LUMI (recommended)

KSO is primarily developed and tested on the LUMI supercomputer, running via a Singularity/Apptainer container on GPU nodes. If you're a first-time user, see docs/LUMI_SETUP.md for container setup and Jupyter session configuration.

Option 2: Other HPC systems

git clone -b dev https://github.com/ocean-data-factory-sweden/kso.git
cd kso
# Follow your HPC's recommended way to launch Jupyter or batch jobs

Use your center's standard GPU modules or containers, and bind project and scratch storage as appropriate.

Option 3: Local development

For local use without HPC access. Note that training without a GPU will be slow, and smaller models with lower batch sizes are recommended.

Docker:

⚠️ Note: The Docker image may not be up to date with the current codebase. Verify it works for your use case before relying on it.

docker pull ghcr.io/ocean-data-factory-sweden/kso:dev
docker run --gpus all -it -p 8888:8888 ghcr.io/ocean-data-factory-sweden/kso:dev
# Then open http://localhost:8888 in your browser

pip:

git clone -b dev https://github.com/ocean-data-factory-sweden/kso.git
cd kso
pip install -r requirements.txt
jupyter lab

Developer Instructions

We welcome contributions!

  1. Work from the dev branch; create feature branches off dev.
  2. Format Python code with Black:
    black filename.py
    
  3. Use Conventional Commits for messages: feat:, fix:, docs:, refactor:, test:.
  4. Keep commit history clean and logical (squash where appropriate) and rebase onto dev (never merge).
  5. Open a Pull Request targeting dev and request at least 2 reviewers.
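To catch malformed messages before pushing, a hypothetical check for the Conventional Commits subject line could look like the following. It accepts only the five types listed above, with an optional scope and `!`. The specification itself allows other types, and nothing here states that the project enforces this check. One option is to call it from a local `commit-msg` git hook.

```python
import re

# Types listed in the contribution guidelines; the full spec allows others.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test")

# <type>[optional (scope)][optional !]: <description>
SUBJECT_RE = re.compile(
    r"^(?:" + "|".join(ALLOWED_TYPES) + r")(?:\([\w.-]+\))?!?: \S.*$"
)


def is_conventional(subject: str) -> bool:
    """Check only the first line (subject) of a commit message."""
    return SUBJECT_RE.match(subject) is not None


for msg in ["feat: add maxN summary", "fix(inference): handle empty frames", "Fixed stuff"]:
    print(is_conventional(msg), msg)
```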

Citation

If this code or its trained models contribute to your research, please cite:

Anton V, Germishuys J, Bergström P, Lindegarth M, Obst M (2021). An open-source, citizen science and machine learning approach to analyse subsea movies. Biodiversity Data Journal 9: e60548. https://doi.org/10.3897/BDJ.9.e60548

Support & Contact

We are always excited to collaborate with marine scientists. Feel free to reach out with questions or ideas!

Legacy Notebooks (Zooniverse workflow)

These notebooks implement the original Zooniverse citizen-science pipeline and are maintained for existing projects. For new work, use the main workflow above.

Task | Notebook | Description
Check Zooniverse metadata | Check_metadata | Check the format of footage, sites, media and species CSV files
Classify | Upload_subjects_to_Zooniverse | Prepare footage and upload clips to Zooniverse
Classify | Process_classifications | Pull and process classifications from Zooniverse
Analyse | Evaluate_models | Standalone model evaluation
Publish | Publish_models | Publish a model to a public repository
Publish | Publish_observations | Export observations to GBIF/OBIS

License

SUBSIM/KSO is released under the GPL-3.0 license. See LICENSE.txt for details.



Committers metadata

Last synced: 7 days ago

Total Commits: 985
Total Committers: 21
Avg Commits per committer: 46.905
Development Distribution Score (DDS): 0.373

Commits in past year: 234
Committers in past year: 16
Avg Commits per committer in past year: 14.625
Development Distribution Score (DDS) in past year: 0.688

Name | Email | Commits
Jurie Germishuys | j****s@c****e | 618
Diewertje11 | d****r@c****e | 100
Victor | 5****e | 91
Ghaith | g****h@G****l | 62
Tuomas Rossi | t****i@c****i | 31
Louis Fiorina | 1****f | 14
Ghaith | g****h@v****z | 11
Pablo Correa Gómez | p****z@c****e | 10
Jannes | 3****g | 10
nithador | t****c@v****z | 8
PilarNavarro | p****r@h****s | 5
Ghaith | g****h@v****z | 5
Ghaith | g****h@w****z | 4
Ghaith | g****h@w****z | 3
Ghaith | g****h@v****z | 3
Ghaith | g****h@v****z | 3
Jurie Germishuys | j****g@a****e | 2
Ghaith | g****h@v****z | 2
Ghaith | g****h@v****z | 1
Ghaith | g****h@v****z | 1
dependabot[bot] | 4****] | 1
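The summary figures above can be reproduced from the commit counts in this table, assuming DDS is defined as 1 minus the top committer's share of all commits. Each name/email row counts as a separate committer, which is why Ghaith appears several times and the committer total is 21.

```python
# Commit counts per name/email row, copied from the committer table.
commits = [618, 100, 91, 62, 31, 14, 11, 10, 10, 8, 5, 5, 4, 3, 3, 3, 2, 2, 1, 1, 1]

total = sum(commits)
avg_per_committer = total / len(commits)
# DDS: share of commits NOT made by the single most active committer.
dds = 1 - max(commits) / total

print(total, round(avg_per_committer, 3), round(dds, 3))  # 985 46.905 0.373
```

A DDS near 0 means one person makes almost every commit. The rise from 0.373 overall to 0.688 in the past year indicates that recent work is spread more evenly across contributors.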



Issue and Pull Request metadata

Last synced: 11 days ago

Total issues: 190
Total pull requests: 151
Average time to close issues: about 1 month
Average time to close pull requests: 11 days
Total issue authors: 12
Total pull request authors: 10
Average comments per issue: 1.24
Average comments per pull request: 1.91
Merged pull request: 85
Bot issues: 0
Bot pull requests: 25

Past year issues: 13
Past year pull requests: 22
Past year average time to close issues: about 1 month
Past year average time to close pull requests: about 2 months
Past year issue authors: 6
Past year pull request authors: 7
Past year average comments per issue: 0.23
Past year average comments per pull request: 3.32
Past year merged pull request: 10
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/ocean-data-factory-sweden/kso

Top Issue Authors

  • Bergylta (68)
  • victor-wildlife (43)
  • jannesgg (37)
  • Diewertje11 (21)
  • donkyjohn (5)
  • ShrimpFather7 (3)
  • trossi (3)
  • Nithador (3)
  • KalindiFonda (2)
  • pabloyoyoista (2)
  • pilarnavarro (2)
  • XhD98 (1)

Top Pull Request Authors

  • victor-wildlife (58)
  • Diewertje11 (25)
  • dependabot[bot] (25)
  • jannesgg (19)
  • trossi (8)
  • pilarnavarro (5)
  • louisrf (5)
  • GhaithChaabane (3)
  • pabloyoyoista (2)
  • Nithador (1)

Top Issue Labels

  • bug (87)
  • enhancement (35)
  • Development (18)
  • good first issue (10)
  • Enhancement (4)
  • Spyfish (3)
  • Documentation (2)
  • Support (2)
  • question (1)
  • dependencies (1)
  • documentation (1)
  • help wanted (1)

Top Pull Request Labels

  • Enhancement (2)
  • bug (1)
  • Bug (1)
  • Documentation (1)

Dependencies

requirements.txt pypi
  • PIMS ==0.6.1
  • PyYAML >=5.3.1
  • av ==8.1.0
  • boto3 ==1.26.64
  • dataclass-csv ==1.4.0
  • easydict ==1.9.0
  • fastapi ==0.73.0
  • ffmpeg-python ==0.2.0
  • gdown ==3.13.0
  • imagesize ==1.4.1
  • ipyfilechooser ==0.4.4
  • itables ==0.3.0
  • jupyter ==1.0.0
  • jupyter_bbox_widget ==0.5.0
  • matplotlib >=3.2.2
  • moviepy ==1.0.3
  • natsort ==8.1.0
  • numpy >=1.18.5
  • opencv-contrib-python *
  • opencv-python ==4.6.0.66
  • opencv-python-headless *
  • openpyxl ==3.1.0
  • pandas ==1.1.4
  • panoptes-client ==1.5.0
  • protobuf ==3.15.8
  • pyopenssl >=23
  • python-magic ==0.4.24
  • python-multipart ==0.0.5
  • scipy >=1.4.1
  • scp ==0.14.1
  • seaborn >=0.11.0
  • split-folders ==0.5.1
  • tensorboard >=2.4.1
  • thop *
  • tqdm >=4.41.0
  • uvicorn ==0.17.2
  • wandb *
.github/workflows/detect-unused-code.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/code-formatting.yml actions
  • actions/checkout v3 composite
  • psf/black stable composite
containers/base/Dockerfile docker
  • ${BASE_IMAGE} latest build
  • ${FFMPEG_BASE_IMAGE} latest build
requirements_cpu.txt pypi
  • torch ==2.8.0
  • torchvision ==0.23.0
requirements_rocm6.4.txt pypi
  • torch ==2.8.0
  • torchvision ==0.23.0
.github/actions/increase-docker-space/action.yml actions
requirements_cuda12.9.txt pypi
  • torch ==2.8.0
  • torchvision ==0.23.0
containers/final/Dockerfile docker
  • ${BASE_IMAGE} latest build
.github/workflows/build-and-test.yml actions
  • ./.github/actions/increase-docker-space * composite
  • actions/checkout v6 composite
  • docker/build-push-action v6 composite
  • docker/login-action v3 composite
  • docker/setup-buildx-action v3 composite
  • docker/setup-qemu-action v3 composite

Score: 6.222576268071369