KSO
The Koster Seafloor Observatory is an open-source, citizen science and machine learning approach to analyse subsea movies.
https://github.com/ocean-data-factory-sweden/kso
Category: Biosphere
Sub Category: Marine Life and Fishery
Keywords
citizen-science deep-learning marine-protected-areas object-detection
Keywords from Contributors
transformer optimize archiving measur language-model compose observation conversion generic animals
Last synced: about 9 hours ago
Repository metadata
Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
- Host: GitHub
- URL: https://github.com/ocean-data-factory-sweden/kso
- Owner: ocean-data-factory-sweden
- License: gpl-3.0
- Created: 2021-07-01T14:47:48.000Z (almost 5 years ago)
- Default Branch: dev
- Last Pushed: 2026-05-24T13:23:09.000Z (8 days ago)
- Last Synced: 2026-05-24T14:08:49.345Z (8 days ago)
- Topics: citizen-science, deep-learning, marine-protected-areas, object-detection
- Language: Python
- Homepage:
- Size: 16.4 MB
- Stars: 8
- Watchers: 2
- Forks: 17
- Open Issues: 16
- Releases: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README.md
KSO System
The KSO System is an open-source machine learning framework for underwater video analysis, developed from the Koster Seafloor Observatory research initiative and the Swedish Platform for Subsea Image Analysis (SUBSIM). It is optimized for GPU-accelerated HPC environments, particularly LUMI, and integrates with MLflow for experiment tracking.
New to KSO? Each notebook contains step-by-step instructions with clearly marked EDIT THIS cells. This README provides an overview; the notebooks will guide you through each stage.
System Overview

Quick Start
1) Choose your environment
See installation.
2) Run the notebooks
Use the table below to choose the first stage that matches what you already have.
| You have… | Start at |
|---|---|
| Raw footage only, or BIIGLE-annotated images that still need a YOLO dataset | 00 Data Preparation |
| A YOLO dataset (data.yaml + train/val/test splits) | 01 Project Setup |
| A trained model (or weights) and you want to fine-tune on a dataset | 02 Training & Eval |
| A trained model and you want to run inference on images/video | 03 Inference + 04 Analysis |
| A validated model that you want to publish along with your dataset | 05 Publish Model |
Note: Notebooks 00, 03, 04, and 05 are still in development. For the recommended working path today, see the Standalone notebooks section below.
Notebook workflow
KSO is transitioning to a clear, staged notebook pipeline. Stages 01–02 are stable today; later stages are under active development.
Official Pipeline (00–05)
| # | Notebook | Description | Status |
|---|---|---|---|
| 00 | 00_Data_Preparation.ipynb | Transfer footage to LUMI (optional), extract frames, build your image set for annotation in BIIGLE, convert annotation CSV → YOLO. Skip if you already have a YOLO dataset. | In development |
| 01 | 01_Project_Setup.ipynb | Create a KSO2 project (.project.yaml), attach your YOLO dataset, configure tracking, and optionally run offline augmentation. | Stable |
| 02 | 02_Train_and_Eval_Models.ipynb | Train or fine-tune a YOLO model, track runs with MLflow, and evaluate on the test set. | Stable |
| 03 | 03_Inference.ipynb | Run inference or batch inference on new images or video; export detections (CSV + annotated media). | In development |
| 04 | 04_Analysis.ipynb | Summary statistics, maxN, per-class summaries, and visualizations. | Planned |
| 05 | 05_Publish_Models.ipynb | Package models and metadata; publish to Zenodo or Researchdata.se. | Planned |
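The maxN statistic named for stage 04 is commonly defined in underwater video surveys as the largest number of individuals of a class visible in any single frame. The following is a minimal sketch of that definition, not the KSO implementation: the function name `max_n` and the `(frame_index, class_name)` input layout are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def max_n(detections):
    """Return {class_name: MaxN} for an iterable of (frame_index, class_name) pairs.

    Hypothetical helper: the input layout is an assumption, not KSO's export format.
    """
    # Count detections of each class within each frame.
    per_frame = defaultdict(Counter)
    for frame_index, class_name in detections:
        per_frame[frame_index][class_name] += 1

    # MaxN is the highest per-frame count observed for each class.
    result = {}
    for counts in per_frame.values():
        for class_name, count in counts.items():
            result[class_name] = max(result.get(class_name, 0), count)
    return result


example = [
    (0, "cod"), (0, "cod"), (0, "crab"),
    (1, "cod"),
    (2, "crab"), (2, "crab"), (2, "crab"),
]
print(max_n(example))
```

Counting per frame rather than summing across the video avoids counting the same individual repeatedly as it stays in view.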
Standalone Notebooks
While the official pipeline is being finalized, these notebooks provide a working path for new users, covering dataset preparation from BIIGLE annotations and end-to-end model training.
| Notebook | Path | Covers |
|---|---|---|
| Biigle_to_YOLO.ipynb | notebooks/setup/Biigle_to_YOLO.ipynb | BIIGLE CSV → YOLO conversion (data preparation for BIIGLE users) |
| Train_models.ipynb | notebooks/analyse/Train_models.ipynb | YOLO model training and fine-tuning using Ultralytics |
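The core of any conversion to YOLO labels is turning a pixel-space box into the normalized `class x_center y_center width height` line that YOLO detection datasets use. The sketch below shows only that step. It does not reproduce the notebook's code or the BIIGLE CSV column layout, which is not described here; the function name and its arguments are hypothetical.

```python
def box_to_yolo(class_id, xmin, ymin, xmax, ymax, image_width, image_height):
    """Convert a pixel-space box to a YOLO label line.

    Hypothetical helper: assumes the box has already been parsed from the
    annotation export into corner coordinates in pixels.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")

    # YOLO stores the box centre and size, each divided by the image size.
    x_center = (xmin + xmax) / 2 / image_width
    y_center = (ymin + ymax) / 2 / image_height
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x200 px box in a 1000x800 px image.
print(box_to_yolo(0, 100, 200, 300, 400, 1000, 800))
```

Each image gets one text file with one such line per annotated object.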
Available YOLO models
The training notebook supports several Ultralytics model families, including YOLO11. See the notebook itself for the authoritative model list and parameters.
Practical guidance:
- Small datasets (~100–250 images): prefer nano or small, since larger models may overfit.
- Medium datasets (~250–750 images): use medium for a good balance.
- Large datasets (750+ images): consider large or xlarge if resources allow.
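The guidance above can be written as a small lookup. This is an illustrative sketch only: the function name is hypothetical, the thresholds are the approximate figures from the list, and treating datasets below about 100 images like the small tier is an assumption.

```python
def suggest_model_size(num_images):
    """Map a training-set size to the model size suggested in the guidance.

    Hypothetical helper; the cut-offs are approximate, not hard limits.
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    if num_images <= 250:
        # Assumption: very small datasets are treated like the small tier.
        return "nano or small"
    if num_images <= 750:
        return "medium"
    return "large or xlarge"


for count in (150, 500, 2000):
    print(count, "->", suggest_model_size(count))
```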
Installation
System requirements
- Minimum: Python 3.11, 16 GB RAM, ≈10 GB free disk space.
- Recommended: CUDA/ROCm-capable GPU (≥8 GB VRAM) and access to an HPC system (e.g. LUMI).
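Part of the minimum requirements can be checked before installing anything. The sketch below uses only the Python standard library; it is not a KSO tool, the function name is hypothetical, and it reads "Python 3.11" as "3.11 or newer", which is an assumption. RAM and GPU memory are not checked because the standard library has no portable way to query them.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)      # assumption: 3.11 or newer is acceptable
MIN_FREE_DISK_GB = 10     # approximate figure from the requirements list


def check_minimum_requirements(path="."):
    """Report whether the interpreter and free disk space meet the stated minimums."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "disk_ok": free_gb >= MIN_FREE_DISK_GB,
        "free_disk_gb": round(free_gb, 1),
    }


print(check_minimum_requirements())
```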
Option 1: LUMI (recommended)
KSO is primarily developed and tested on the LUMI supercomputer, running via a Singularity/Apptainer container on GPU nodes.
If you're a first-time user, start here:
Option 2: Other HPC systems
git clone -b dev https://github.com/ocean-data-factory-sweden/kso.git
cd kso
# Follow your HPC's recommended way to launch Jupyter or batch jobs
Use your center's standard GPU modules or containers, and bind project and scratch storage as appropriate.
Option 3: Local development
For local use without HPC access. Training without a GPU will be slow; smaller models and lower batch sizes are recommended.
Docker
Note: The instructions below run the notebooks inside the container.
Any changes you make will be lost when the container stops unless you save them outside the container
(e.g., using a mounted volume: -v $(pwd):/opt/kso/notebooks).
# Pull kso with a suitable backend
docker pull ghcr.io/ocean-data-factory-sweden/kso:dev-ubuntu24.04 # CPU only
# docker pull ghcr.io/ocean-data-factory-sweden/kso:dev-cuda12.9-ubuntu24.04 # CUDA / NVIDIA GPUs
# docker pull ghcr.io/ocean-data-factory-sweden/kso:dev-rocm6.4-ubuntu24.04 # ROCm / AMD GPUs
# Run the notebooks
docker run -it --rm -p 8888:8888 ghcr.io/ocean-data-factory-sweden/kso:dev-ubuntu24.04 notebooks/
# docker run -it --rm -p 8888:8888 --gpus all ghcr.io/ocean-data-factory-sweden/kso:dev-cuda12.9-ubuntu24.04 notebooks/
# docker run -it --rm -p 8888:8888 --device /dev/kfd --device /dev/dri ghcr.io/ocean-data-factory-sweden/kso:dev-rocm6.4-ubuntu24.04 notebooks/
# Then open http://localhost:8888 in your browser and use the token printed out
pip in venv
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate
# Fetch the repository
git clone -b dev https://github.com/ocean-data-factory-sweden/kso.git
cd kso
# Install kso with a suitable backend
pip install -e .[dev] --extra-index-url https://download.pytorch.org/whl/cpu # CPU only
# pip install -e .[dev] --extra-index-url https://download.pytorch.org/whl/cu129 # CUDA / NVIDIA GPUs
# pip install -e .[dev] --extra-index-url https://download.pytorch.org/whl/rocm6.4 # ROCm / AMD GPUs
# Run the notebooks
jupyter lab notebooks/
Developer Instructions
We welcome contributions!
- Work from the dev branch; create feature branches off dev.
- Format Python code with Black: black filename.py
- Use Conventional Commits for messages: feat:, fix:, docs:, refactor:, test:.
- Keep commit history clean and logical (squash where appropriate) and rebase onto dev (never merge).
- Open a Pull Request targeting dev and request at least 2 reviewers.
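A commit message can be checked against the prefixes listed above before pushing. The following is a rough sketch rather than a project-provided hook: the function name is hypothetical, and restricting the check to exactly the five listed types is an assumption, since the Conventional Commits convention also allows other types.

```python
import re

# Assumption: only the types named in the contribution guidelines are accepted.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test")

# type, optional (scope), optional "!", then ": " followed by a description.
PATTERN = re.compile(r"^(?:%s)(?:\([\w./-]+\))?!?: \S" % "|".join(ALLOWED_TYPES))


def is_conventional(message):
    """Return True if the first line of the message starts with an allowed prefix."""
    first_line = message.splitlines()[0] if message else ""
    return PATTERN.match(first_line) is not None


for msg in ("feat: add maxN summary", "fix(docker): pin base image", "Updated stuff"):
    print(f"{msg!r}: {is_conventional(msg)}")
```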
Citation
If this code or its trained models contribute to your research, please cite:
Anton V, Germishuys J, Bergström P, Lindegarth M, Obst M (2021). An open-source, citizen science and machine learning approach to analyse subsea movies. Biodiversity Data Journal 9: e60548. https://doi.org/10.3897/BDJ.9.e60548
Support & Contact
- Website: https://subsim.se
- Issues: GitHub Issues
- Contact: matthias.obst(at)marine.gu.se
We are always excited to collaborate with marine scientists. Feel free to reach out with questions or ideas!
Legacy Notebooks (Zooniverse workflow)
These notebooks implement the original Zooniverse citizen-science pipeline and are maintained for existing projects. For new work, use the main workflow above.
License
SUBSIM/KSO is released under the GPL-3.0 license. See LICENSE.txt for details.
Owner metadata
- Name: Ocean Data Factory Sweden
- Login: ocean-data-factory-sweden
- Email: torsten.linders@gu.se
- Kind: organization
- Description:
- Website:
- Location:
- Twitter:
- Company:
- Icon url: https://avatars.githubusercontent.com/u/54248548?v=4
- Repositories: 4
- Last synced at: 2023-03-03T19:53:11.188Z
- Profile URL: https://github.com/ocean-data-factory-sweden
GitHub Events
Total
- Delete event: 8
- Member event: 2
- Pull request event: 21
- Fork event: 3
- Issues event: 77
- Watch event: 3
- Issue comment event: 53
- Push event: 195
- Pull request review comment event: 54
- Pull request review event: 31
- Create event: 18
Last Year
- Delete event: 8
- Pull request event: 14
- Fork event: 2
- Issues event: 57
- Issue comment event: 26
- Push event: 169
- Pull request review comment event: 51
- Pull request review event: 27
- Create event: 11
Committers metadata
Last synced: 3 days ago
Total Commits: 1,032
Total Committers: 26
Avg Commits per committer: 39.692
Development Distribution Score (DDS): 0.401
Commits in past year: 235
Committers in past year: 20
Avg Commits per committer in past year: 11.75
Development Distribution Score (DDS) in past year: 0.702
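The all-time figures above can be reproduced from the commit counts. The sketch below assumes the DDS is defined as one minus the most active committer's share of all commits; the page does not state the formula, but that definition matches the reported value when combined with the top committer's 618 commits from the committer table. Function names are hypothetical.

```python
def average_commits(total_commits, committers):
    """Mean number of commits per committer."""
    return total_commits / committers


def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    return 1 - top_committer_commits / total_commits


# All-time figures: 1,032 commits, 26 committers, top committer with 618 commits.
print(round(average_commits(1032, 26), 3))
print(round(development_distribution_score(618, 1032), 3))

# Past-year average: 235 commits across 20 committers.
print(round(average_commits(235, 20), 2))
```

Under this definition a score near 0 means one person wrote almost everything, and a score near 1 means work is spread across many committers.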
| Name | Commits | |
|---|---|---|
| Jurie Germishuys | j****s@c****e | 618 |
| Diewertje11 | d****r@c****e | 100 |
| Victor | 5****e | 91 |
| Ghaith | g****h@G****l | 70 |
| Tuomas Rossi | t****i@c****i | 43 |
| Louis Fiorina | 1****f | 20 |
| Ghaith | g****h@w****z | 11 |
| Ghaith | g****h@v****z | 11 |
| nithador | t****c@v****z | 11 |
| Jannes | 3****g | 10 |
| Pablo Correa Gómez | p****z@c****e | 10 |
| PilarNavarro | p****r@h****s | 5 |
| Ghaith | g****h@v****z | 5 |
| Ghaith | g****h@w****z | 4 |
| Ghaith | g****h@w****z | 3 |
| Ghaith | g****h@w****z | 3 |
| Ghaith | g****h@v****z | 3 |
| Ghaith | g****h@v****z | 3 |
| Jurie Germishuys | j****g@a****e | 2 |
| Ghaith | g****h@v****z | 2 |
| Ghaith | g****h@v****z | 2 |
| Ghaith | g****h@v****z | 1 |
| Ghaith | g****h@v****z | 1 |
| Ghaith | g****h@w****z | 1 |
| Ghaith | g****h@w****z | 1 |
| dependabot[bot] | 4****] | 1 |
Committer domains:
- combine.se: 3
- csc.fi: 1
- w115-35.vsb.cz: 1
- vpns378.vsb.cz: 1
- vsb.cz: 1
- hotmail.es: 1
- vpnp119.vsb.cz: 1
- w116-81.vsb.cz: 1
- w113-176.vsb.cz: 1
- w113-138.vsb.cz: 1
- vpns32.vsb.cz: 1
- vpns162.vsb.cz: 1
- alvis1.int.private: 1
- vpns479.vsb.cz: 1
- vpns27.vsb.cz: 1
- vpns122.vsb.cz: 1
- vpns447.vsb.cz: 1
- w116-135.vsb.cz: 1
- w118-8.vsb.cz: 1
Issue and Pull Request metadata
Last synced: 3 days ago
Total issues: 195
Total pull requests: 159
Average time to close issues: about 1 month
Average time to close pull requests: 11 days
Total issue authors: 13
Total pull request authors: 10
Average comments per issue: 1.26
Average comments per pull request: 1.92
Merged pull request: 94
Bot issues: 0
Bot pull requests: 25
Past year issues: 17
Past year pull requests: 26
Past year average time to close issues: about 2 months
Past year average time to close pull requests: 19 days
Past year issue authors: 6
Past year pull request authors: 6
Past year average comments per issue: 0.71
Past year average comments per pull request: 3.04
Past year merged pull request: 17
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- Bergylta (68)
- victor-wildlife (43)
- jannesgg (37)
- Diewertje11 (21)
- donkyjohn (5)
- louisrf (4)
- Nithador (4)
- ShrimpFather7 (3)
- trossi (3)
- KalindiFonda (2)
- pabloyoyoista (2)
- pilarnavarro (2)
- XhD98 (1)
Top Pull Request Authors
- victor-wildlife (58)
- Diewertje11 (25)
- dependabot[bot] (25)
- jannesgg (19)
- trossi (11)
- louisrf (9)
- pilarnavarro (5)
- GhaithChaabane (4)
- pabloyoyoista (2)
- Nithador (1)
Top Issue Labels
- bug (87)
- enhancement (35)
- Development (18)
- good first issue (10)
- Enhancement (5)
- Spyfish (3)
- Documentation (2)
- Support (2)
- Dependencies (2)
- documentation (1)
- help wanted (1)
- question (1)
- dependencies (1)
Top Pull Request Labels
- Enhancement (2)
- Documentation (1)
- Bug (1)
- bug (1)
Dependencies
- PIMS ==0.6.1
- PyYAML >=5.3.1
- av ==8.1.0
- boto3 ==1.26.64
- dataclass-csv ==1.4.0
- easydict ==1.9.0
- fastapi ==0.73.0
- ffmpeg-python ==0.2.0
- gdown ==3.13.0
- imagesize ==1.4.1
- ipyfilechooser ==0.4.4
- itables ==0.3.0
- jupyter ==1.0.0
- jupyter_bbox_widget ==0.5.0
- matplotlib >=3.2.2
- moviepy ==1.0.3
- natsort ==8.1.0
- numpy >=1.18.5
- opencv-contrib-python *
- opencv-python ==4.6.0.66
- opencv-python-headless *
- openpyxl ==3.1.0
- pandas ==1.1.4
- panoptes-client ==1.5.0
- protobuf ==3.15.8
- pyopenssl >=23
- python-magic ==0.4.24
- python-multipart ==0.0.5
- scipy >=1.4.1
- scp ==0.14.1
- seaborn >=0.11.0
- split-folders ==0.5.1
- tensorboard >=2.4.1
- thop *
- tqdm >=4.41.0
- uvicorn ==0.17.2
- wandb *
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- psf/black stable composite
- ${BASE_IMAGE} latest build
- ${FFMPEG_BASE_IMAGE} latest build
- torch ==2.8.0
- torchvision ==0.23.0
- torch ==2.8.0
- torchvision ==0.23.0
- torch ==2.8.0
- torchvision ==0.23.0
- ${BASE_IMAGE} latest build
- ./.github/actions/increase-docker-space * composite
- actions/checkout v6 composite
- docker/build-push-action v6 composite
- docker/login-action v3 composite
- docker/setup-buildx-action v3 composite
- docker/setup-qemu-action v3 composite
Score: 6.436150368369428