KSO
The Koster Seafloor Observatory is an open-source, citizen science and machine learning approach to analyse subsea movies.
https://github.com/ocean-data-factory-sweden/kso
Category: Biosphere
Sub Category: Marine Life and Fishery
Keywords
citizen-science deep-learning marine-protected-areas object-detection
Keywords from Contributors
transformer optimize archiving measur language-model compose observation conversion generic animals
Last synced: about 9 hours ago
Repository metadata
Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
- Host: GitHub
- URL: https://github.com/ocean-data-factory-sweden/kso
- Owner: ocean-data-factory-sweden
- License: gpl-3.0
- Created: 2021-07-01T14:47:48.000Z (almost 5 years ago)
- Default Branch: dev
- Last Pushed: 2026-04-02T14:58:19.000Z (6 days ago)
- Last Synced: 2026-04-06T00:09:35.930Z (2 days ago)
- Topics: citizen-science, deep-learning, marine-protected-areas, object-detection
- Language: Python
- Homepage:
- Size: 16.1 MB
- Stars: 8
- Watchers: 2
- Forks: 17
- Open Issues: 16
- Releases: 2
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README.md
KSO System
The KSO System is an open-source machine learning framework for underwater video analysis, developed from the Koster Seafloor Observatory research initiative and the Swedish Platform for Subsea Image Analysis (SUBSIM). It is optimized for GPU-accelerated HPC environments, particularly LUMI, and integrates with MLflow for experiment tracking.
New to KSO? Each notebook contains step-by-step instructions with clearly marked EDIT THIS cells. This README provides an overview; the notebooks will guide you through each stage.
System Overview

Quick Start
1. Clone the repository
git clone -b dev https://github.com/ocean-data-factory-sweden/kso.git
cd kso
2. Choose your environment
For LUMI (recommended): see docs/LUMI_SETUP.md for container setup and Jupyter session configuration.
For local development:
pip install -r requirements.txt
jupyter lab
3. Run the notebooks
Use the table below to choose the first stage that matches what you already have.
| You have… | Start at |
|---|---|
| Raw footage only, or images already annotated in Biigle that still need converting into a dataset | 00 Data Preparation |
| A YOLO dataset (data.yaml + train/val/test splits) | 01 Project Setup |
| A trained model (or weights) and you want to fine-tune on a dataset | 02 Training & Eval |
| A trained model and you want to run inference on images/video | 03 Inference + 04 Analysis |
| A validated model that you want to publish along with your dataset | 05 Publish Model |
Note: Notebooks 00, 03, 04, and 05 are still in development. For a working path today, see the Standalone Notebooks section below.
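To decide whether you can start at 01 Project Setup, it can help to confirm that the dataset folder actually contains the pieces named in the table. The sketch below is not part of KSO. It assumes a hypothetical layout in which `data.yaml` and the `train`, `val` and `test` folders sit directly under one root directory; adjust the names if your dataset is organised differently.

```python
from pathlib import Path

# Assumed split folder names; change these if your dataset uses others.
EXPECTED_SPLITS = ("train", "val", "test")


def missing_dataset_parts(root):
    """Return the expected dataset entries that are absent under `root`."""
    root = Path(root)
    missing = []
    if not (root / "data.yaml").is_file():
        missing.append("data.yaml")
    for split in EXPECTED_SPLITS:
        if not (root / split).is_dir():
            missing.append(split)
    return missing
```

An empty list means the assumed layout is complete; anything else names what still has to be produced, for example by the data preparation stage.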
Official Pipeline (00–05)
| # | Notebook | Description | Status |
|---|---|---|---|
| 00 | 00_Data_Preparation.ipynb | Transfer footage to LUMI (optional), extract frames, build your image set for annotation in Biigle, convert annotation CSV to YOLO format. Skip if you already have a YOLO dataset. | In development |
| 01 | 01_Project_Setup.ipynb | Create a KSO2 project (.project.yaml), attach your YOLO dataset, configure tracking, and optionally run offline augmentation. | Stable |
| 02 | 02_Train_and_Eval_Models.ipynb | Train or fine-tune a YOLO model, track runs with MLflow, and evaluate on the test set. | Stable |
| 03 | 03_Inference.ipynb | Run inference or batch inference on new images or video; export detections (CSV + annotated media). | In development |
| 04 | 04_Analysis.ipynb | Summary statistics, maxN, per-class summaries, and visualizations. | Planned |
| 05 | 05_Publish_Models.ipynb | Package models and metadata; publish to Zenodo or Researchdata.se. | Planned |
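Notebook 04 lists maxN among its planned outputs. MaxN is commonly defined as the largest number of individuals of a class visible in any single frame. Since that notebook is not yet available, the following is only an illustrative sketch of the metric, not KSO's implementation; the input format (one `(frame_index, class_name)` pair per detection) is an assumption.

```python
from collections import Counter


def max_n(detections):
    """Compute MaxN per class.

    `detections` is an iterable of (frame_index, class_name) pairs,
    one pair per detected object (assumed format).
    """
    # Number of detections of each class within each frame.
    per_frame = Counter(detections)
    result = {}
    for (_frame, cls), count in per_frame.items():
        # Keep the highest single-frame count seen for this class.
        result[cls] = max(result.get(cls, 0), count)
    return result
```

For example, three cod in one frame and two in another give a MaxN of 3 for cod, regardless of how many frames contain cod.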
Standalone Notebooks
While the official pipeline is being finalized, these notebooks provide a working path for new users, covering dataset preparation in Biigle and end-to-end model training.
| Notebook | Path | Covers |
|---|---|---|
| Biigle_to_YOLO.ipynb | notebooks/setup/Biigle_to_YOLO.ipynb | Biigle CSV to YOLO conversion (data preparation for our Biigle users) |
| Train_models.ipynb | notebooks/analyse/Train_models.ipynb | YOLO model training and fine-tuning using Ultralytics |
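The core of any conversion to YOLO format is rewriting each bounding box as a class index followed by the box centre, width and height, all normalised by the image size. The sketch below shows that step only. It is not taken from `Biigle_to_YOLO.ipynb`, and it assumes the box corners have already been read from the annotation export as pixel coordinates; how the Biigle CSV stores them is not covered here.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Format one pixel-space box as a YOLO label line.

    Output: "<class> <x_center> <y_center> <width> <height>",
    with all four box values scaled to the 0-1 range.
    """
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

Each image gets one text file with one such line per object, which is the label layout the Ultralytics training tools read.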
Available YOLO models
The training notebook supports several Ultralytics model families. See the notebook itself for full model tables.
Supported families include YOLOv8, YOLOv9, YOLOv10, and YOLO11. Each offers sizes from nano (n) through xlarge (x).
Practical guidance:
- Small datasets (~100–250 images): prefer nano or small; larger models tend to overfit.
- Medium datasets (~250–750 images): use medium for a good balance.
- Large datasets (750+ images): consider large or xlarge if resources allow.
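The guidance above can be written as a small helper. This is a sketch, not a function from the KSO codebase: the exact cut-offs at 250 and 750 images and the treatment of datasets under 100 images are assumptions, since the README gives approximate ranges only. The letters follow the Ultralytics size suffixes (n, s, m, l, x).

```python
def suggest_model_sizes(n_images):
    """Suggest Ultralytics size suffixes for a dataset of `n_images` images."""
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    if n_images < 250:
        # Small dataset: larger models are likely to overfit.
        return ["n", "s"]
    if n_images < 750:
        # Medium dataset: balance between capacity and overfitting risk.
        return ["m"]
    # Large dataset: bigger models, provided the hardware allows it.
    return ["l", "x"]
```

Treat the result as a starting point and compare validation metrics across sizes before settling on one.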
Installation
System requirements
- Minimum: Python 3.12, 16 GB RAM, about 10 GB free disk space.
- Recommended: CUDA/ROCm-capable GPU (≥8 GB VRAM) and access to an HPC system (e.g. LUMI).
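A quick way to check a machine against the minimum Python version and disk space is sketched below, using only the standard library. The function names are hypothetical, the 10 GB threshold is the approximate figure from the list above, and RAM and GPU are not checked because the standard library has no portable call for them.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)       # minimum Python version from the list above
MIN_FREE_DISK_GB = 10      # approximate free-disk figure from the list above


def meets_minimum(python_version, free_disk_gb):
    """Compare a (major, minor, ...) version and free disk space to the minimums."""
    return {
        "python_ok": tuple(python_version[:2]) >= MIN_PYTHON,
        "disk_ok": free_disk_gb >= MIN_FREE_DISK_GB,
    }


def check_this_machine(path="."):
    """Run the check for the current interpreter and the disk holding `path`."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return meets_minimum(sys.version_info, free_gb)
```

Calling `check_this_machine()` from the cloned repository folder reports both results as booleans.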
Option 1: LUMI (recommended)
KSO is primarily developed and tested on the LUMI supercomputer, running via a Singularity/Apptainer container on GPU nodes. If you're a first-time user, head to contrib/lumi/ and check contrib/lumi/README.md to get started.
Option 2: Other HPC systems
git clone -b dev https://github.com/ocean-data-factory-sweden/kso.git
cd kso
# Follow your HPC's recommended way to launch Jupyter or batch jobs
Use your center's standard GPU modules or containers, and bind project and scratch storage as appropriate.
Option 3: Local development
For local use without HPC access. Note that training without a GPU will be slow, and smaller models with lower batch sizes are recommended.
Docker:
Note: The Docker image may not be up to date with the current codebase. Verify it works for your use case before relying on it.
docker pull ghcr.io/ocean-data-factory-sweden/kso:dev
docker run --gpus all -it -p 8888:8888 ghcr.io/ocean-data-factory-sweden/kso:dev
# Then open http://localhost:8888 in your browser
pip:
git clone -b dev https://github.com/ocean-data-factory-sweden/kso.git
cd kso
pip install -r requirements.txt
jupyter lab
Developer Instructions
We welcome contributions!
- Work from the dev branch; create feature branches off dev.
- Format Python code with Black: black filename.py
- Use Conventional Commits for messages: feat:, fix:, docs:, refactor:, test:.
- Keep commit history clean and logical (squash where appropriate) and rebase onto dev (never merge).
- Open a Pull Request targeting dev and request at least 2 reviewers.
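If you want to check commit messages before pushing, the prefix rule above can be tested with a short script. This is a sketch rather than a tool shipped with KSO: the accepted types are the five listed above, and allowing an optional scope such as `fix(train):` and an optional `!` follows the general Conventional Commits format, which the project may or may not use.

```python
import re

# Commit types named in the contribution guidelines above.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test")

# type, optional "(scope)", optional "!", then ": " and a non-empty description.
_PATTERN = re.compile(
    r"^(?:%s)(?:\([\w\-./]+\))?!?: \S" % "|".join(ALLOWED_TYPES)
)


def is_conventional(message):
    """Return True if the first line of `message` has an accepted prefix."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return bool(_PATTERN.match(first_line))
```

A function like this could be called from a local commit-msg hook, so that a badly formed message is caught before review.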
Citation
If this code or its trained models contribute to your research, please cite:
Anton V, Germishuys J, Bergström P, Lindegarth M, Obst M (2021). An open-source, citizen science and machine learning approach to analyse subsea movies. Biodiversity Data Journal 9: e60548. https://doi.org/10.3897/BDJ.9.e60548
Support & Contact
- Website: https://subsim.se
- Issues: GitHub Issues
- Contact: matthias.obst(at)marine.gu.se
We are always excited to collaborate with marine scientists. Feel free to reach out with questions or ideas!
Legacy Notebooks (Zooniverse workflow)
These notebooks implement the original Zooniverse citizen-science pipeline and are maintained for existing projects. For new work, use the main workflow above.
License
SUBSIM/KSO is released under the GPL-3.0 license. See LICENSE.txt for details.
Owner metadata
- Name: Ocean Data Factory Sweden
- Login: ocean-data-factory-sweden
- Email: torsten.linders@gu.se
- Kind: organization
- Description:
- Website:
- Location:
- Twitter:
- Company:
- Icon url: https://avatars.githubusercontent.com/u/54248548?v=4
- Repositories: 4
- Last synced at: 2023-03-03T19:53:11.188Z
- Profile URL: https://github.com/ocean-data-factory-sweden
GitHub Events
Total
- Delete event: 7
- Member event: 2
- Pull request event: 21
- Fork event: 3
- Issues event: 77
- Watch event: 3
- Issue comment event: 52
- Push event: 176
- Pull request review comment event: 52
- Pull request review event: 30
- Create event: 14
Last Year
- Delete event: 7
- Pull request event: 17
- Fork event: 2
- Issues event: 58
- Watch event: 2
- Issue comment event: 27
- Push event: 171
- Pull request review comment event: 52
- Pull request review event: 30
- Create event: 9
Committers metadata
Last synced: 7 days ago
Total Commits: 985
Total Committers: 21
Avg Commits per committer: 46.905
Development Distribution Score (DDS): 0.373
Commits in past year: 234
Committers in past year: 16
Avg Commits per committer in past year: 14.625
Development Distribution Score (DDS) in past year: 0.688
| Name | Email | Commits |
|---|---|---|
| Jurie Germishuys | j****s@c****e | 618 |
| Diewertje11 | d****r@c****e | 100 |
| Victor | 5****e | 91 |
| Ghaith | g****h@G****l | 62 |
| Tuomas Rossi | t****i@c****i | 31 |
| Louis Fiorina | 1****f | 14 |
| Ghaith | g****h@v****z | 11 |
| Pablo Correa Gómez | p****z@c****e | 10 |
| Jannes | 3****g | 10 |
| nithador | t****c@v****z | 8 |
| PilarNavarro | p****r@h****s | 5 |
| Ghaith | g****h@v****z | 5 |
| Ghaith | g****h@w****z | 4 |
| Ghaith | g****h@w****z | 3 |
| Ghaith | g****h@v****z | 3 |
| Ghaith | g****h@v****z | 3 |
| Jurie Germishuys | j****g@a****e | 2 |
| Ghaith | g****h@v****z | 2 |
| Ghaith | g****h@v****z | 1 |
| Ghaith | g****h@v****z | 1 |
| dependabot[bot] | 4****] | 1 |
Committer domains:
- combine.se: 3
- vpns447.vsb.cz: 1
- vpns122.vsb.cz: 1
- vpns479.vsb.cz: 1
- alvis1.int.private: 1
- vpns162.vsb.cz: 1
- vpns32.vsb.cz: 1
- w113-176.vsb.cz: 1
- w116-81.vsb.cz: 1
- vpnp119.vsb.cz: 1
- hotmail.es: 1
- vsb.cz: 1
- vpns378.vsb.cz: 1
- csc.fi: 1
Issue and Pull Request metadata
Last synced: 11 days ago
Total issues: 190
Total pull requests: 151
Average time to close issues: about 1 month
Average time to close pull requests: 11 days
Total issue authors: 12
Total pull request authors: 10
Average comments per issue: 1.24
Average comments per pull request: 1.91
Merged pull request: 85
Bot issues: 0
Bot pull requests: 25
Past year issues: 13
Past year pull requests: 22
Past year average time to close issues: about 1 month
Past year average time to close pull requests: about 2 months
Past year issue authors: 6
Past year pull request authors: 7
Past year average comments per issue: 0.23
Past year average comments per pull request: 3.32
Past year merged pull request: 10
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- Bergylta (68)
- victor-wildlife (43)
- jannesgg (37)
- Diewertje11 (21)
- donkyjohn (5)
- ShrimpFather7 (3)
- trossi (3)
- Nithador (3)
- KalindiFonda (2)
- pabloyoyoista (2)
- pilarnavarro (2)
- XhD98 (1)
Top Pull Request Authors
- victor-wildlife (58)
- Diewertje11 (25)
- dependabot[bot] (25)
- jannesgg (19)
- trossi (8)
- pilarnavarro (5)
- louisrf (5)
- GhaithChaabane (3)
- pabloyoyoista (2)
- Nithador (1)
Top Issue Labels
- bug (87)
- enhancement (35)
- Development (18)
- good first issue (10)
- Enhancement (4)
- Spyfish (3)
- Documentation (2)
- Support (2)
- question (1)
- dependencies (1)
- documentation (1)
- help wanted (1)
Top Pull Request Labels
- Enhancement (2)
- bug (1)
- Bug (1)
- Documentation (1)
Dependencies
- PIMS ==0.6.1
- PyYAML >=5.3.1
- av ==8.1.0
- boto3 ==1.26.64
- dataclass-csv ==1.4.0
- easydict ==1.9.0
- fastapi ==0.73.0
- ffmpeg-python ==0.2.0
- gdown ==3.13.0
- imagesize ==1.4.1
- ipyfilechooser ==0.4.4
- itables ==0.3.0
- jupyter ==1.0.0
- jupyter_bbox_widget ==0.5.0
- matplotlib >=3.2.2
- moviepy ==1.0.3
- natsort ==8.1.0
- numpy >=1.18.5
- opencv-contrib-python *
- opencv-python ==4.6.0.66
- opencv-python-headless *
- openpyxl ==3.1.0
- pandas ==1.1.4
- panoptes-client ==1.5.0
- protobuf ==3.15.8
- pyopenssl >=23
- python-magic ==0.4.24
- python-multipart ==0.0.5
- scipy >=1.4.1
- scp ==0.14.1
- seaborn >=0.11.0
- split-folders ==0.5.1
- tensorboard >=2.4.1
- thop *
- tqdm >=4.41.0
- uvicorn ==0.17.2
- wandb *
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- psf/black stable composite
- ${BASE_IMAGE} latest build
- ${FFMPEG_BASE_IMAGE} latest build
- torch ==2.8.0
- torchvision ==0.23.0
- torch ==2.8.0
- torchvision ==0.23.0
- torch ==2.8.0
- torchvision ==0.23.0
- ${BASE_IMAGE} latest build
- ./.github/actions/increase-docker-space * composite
- actions/checkout v6 composite
- docker/build-push-action v6 composite
- docker/login-action v3 composite
- docker/setup-buildx-action v3 composite
- docker/setup-qemu-action v3 composite
Score: 6.222576268071369