KSO

The Koster Seafloor Observatory is an open-source, citizen science and machine learning approach to analyse subsea movies.
https://github.com/ocean-data-factory-sweden/kso

Category: Biosphere
Sub Category: Marine Life and Fishery

Keywords

citizen-science deep-learning marine-protected-areas object-detection

Keywords from Contributors

transforms archiving measur generic optimize observation conversion compose projection animals

Last synced: about 7 hours ago

Repository metadata

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.

Host: GitHub
URL: https://github.com/ocean-data-factory-sweden/kso
Owner: ocean-data-factory-sweden
License: gpl-3.0
Created: 2021-07-01T14:47:48.000Z (about 4 years ago)
Default Branch: dev
Last Pushed: 2025-05-16T13:55:19.000Z (about 2 months ago)
Last Synced: 2025-05-17T00:03:41.846Z (about 2 months ago)
Topics: citizen-science, deep-learning, marine-protected-areas, object-detection
Language: Python
Homepage:
Size: 14.8 MB
Stars: 7
Watchers: 2
Forks: 12
Open Issues: 60
Releases: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE

KSO System

The Koster Seafloor Observatory is an open-source, citizen science and machine learning approach to analyse subsea movies.

KSO overview

The KSO system has been developed to:

move and process underwater footage and its associated data (e.g. location, date, sampling device).
make this data available to citizen scientists in Zooniverse to annotate the data.
train and evaluate machine learning models (customised YOLOv5 or YOLOv8 models).


The system is built around a series of easy-to-use Jupyter Notebooks. Each notebook allows users to perform a specific task of the system (e.g. upload footage to the citizen science platform or analyse the classified data).

Users can run these notebooks via Google Colab (by clicking on the Colab links in the table below), locally or on a high-performance computing (HPC) environment.

Notebooks

Our notebooks are modular and grouped into four main task categories: Set up, Classify, Analyse and Publish.

Task	Notebook	Description
Set up	Check_metadata	Check format and contents of footage and sites, media and species csv files
Classify	Upload_subjects_to_Zooniverse	Prepare original footage and upload short clips to Zooniverse, extract frames of interest from the original footage and upload them to Zooniverse
Classify	Process_classifications	Pull and process up-to-date classifications from Zooniverse
Analyse	Train_models	Prepare the training and test data, set model parameters and train models
Analyse	Evaluate_models	Use ecologically relevant metrics to test the models
Publish	Publish_models	Publish the model to a public repository
Publish	Publish_observations	Automatically classify new footage and export observations to GBIF

Local Installation

Docker Installation

Requirements

Docker

Pull KSO Docker image

docker pull ghcr.io/ocean-data-factory-sweden/kso:dev

Conda Installation

Requirements

Download this repository

Clone this repository using:

git clone https://github.com/ocean-data-factory-sweden/kso.git

Prepare your system

Depending on your system (Windows/Linux/macOS), you might need to install some extra tools. If so, the next steps will print a message telling you what to install.
For example, Windows systems require Microsoft C++ Build Tools with a version higher than 14.0.

Set up the environment with Conda

1. Open the Anaconda Prompt.
2. Navigate to the folder where you cloned the repository or unzipped the manually downloaded repository. Then go into the kso folder.

cd kso

3. Create an Anaconda environment with Python 3.8, replacing <name env> with a name of your choice.

conda create -n <name env> python=3.8

4. Enter the environment:

conda activate <name env>

5. Specify your GPU details.

5a. Find out which PyTorch installation you need. Navigate to the system options and select your device/platform details.

5b. Add the recommended command to KSO's gpu_requirements_user.txt file.

6. Install all the requirements:

pip install -r requirements.txt -r gpu_requirements_user.txt
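
After installing, it can help to confirm that the environment matches what the steps above set up. The following is a minimal sketch, not part of KSO: the helper names `missing_packages` and `python_matches` are hypothetical, and the package names checked are simply the ones listed in gpu_requirements_user.txt. It relies only on the Python standard library.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(major, minor):
    """Return True if the running interpreter is exactly Python major.minor."""
    return sys.version_info[:2] == (major, minor)


# Hypothetical usage: run inside the activated Conda environment.
print("Running Python 3.8:", python_matches(3, 8))
print("Not installed:", missing_packages(["torch", "torchvision", "torchaudio"]))
```

An empty list on the last line suggests the GPU requirements were picked up; any names that are printed still need to be installed.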

Cloudina

Cloudina is a hosted version of KSO (powered by JupyterHub) on NAISS Science Cloud. It allows users to scale and automate larger workflows using a powerful processing backend. This is currently an invitation-only service. To access the platform, please contact jurie.germishuys[at]combine.se.

The current portals are accessible as:

Console (object storage) - storage
Album (JupyterHub) - notebooks
Vendor (MLFlow) - mlflow

Starting a new project

To start a new project you will need to:

Create initial information for the database: Input the information about the underwater footage files, sites and species of interest. You can use a template of the csv files and move the directory to the "db_starter" folder.
Link your footage to the database: You will need files of underwater footage to run this system. You can download some samples and move them to db_starter. You can also store your own files and specify their directory in the notebooks.

Please remember the format of the underwater media is standardised (typically .mp4 or .jpg) and the associated metadata captured in three CSV files (“movies”, “sites” and “species”) should follow the Darwin Core standards (DwC).
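
Before opening the notebooks, you may want to confirm that the three CSV files are in place. The sketch below is an illustration under stated assumptions, not KSO code: it assumes each file name contains the word "movies", "sites" or "species" and ends in .csv, and the helper names `find_metadata_csvs` and `read_header` are hypothetical. It does not check the columns against Darwin Core.

```python
import csv
from pathlib import Path

# Names taken from the text above; the exact file names are an assumption.
EXPECTED = ("movies", "sites", "species")


def find_metadata_csvs(folder):
    """Map each expected name to the first matching CSV in `folder`, or None."""
    found = {}
    for name in EXPECTED:
        matches = sorted(Path(folder).glob(f"*{name}*.csv"))
        found[name] = matches[0] if matches else None
    return found


def read_header(csv_path):
    """Return the column names from the first row of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])
```

Calling `find_metadata_csvs("db_starter")` returns a dictionary in which any value of `None` marks a file that still needs to be added. `read_header` can then be used to inspect the columns of the files that were found.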

Developer instructions

If you would like to expand and improve the KSO capabilities, please follow the instructions above to set the project up on your local computer.

When you make changes, please create your branch on top of the current 'dev' branch. Before submitting a Pull Request, please:

Run Black on the code you have edited

black filename

Clean up the commit history on your branch so that every commit represents a logical change. Squash and edit commits so that the history is understandable for others.
For the commit messages, please follow the conventional commits guidelines (table below) to facilitate code sharing. Also, please describe the logic behind the commit in the body of the message.

Commit types

Commit Type	Title	Description	Emoji
`feat`	Features	A new feature	✨
`fix`	Bug Fixes	A bug fix	🐛
`docs`	Documentation	Documentation only changes	📚
`style`	Styles	Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)	💎
`refactor`	Code Refactoring	A code change that neither fixes a bug nor adds a feature	📦
`perf`	Performance Improvements	A code change that improves performance	🚀
`test`	Tests	Adding missing tests or correcting existing tests	🚨
`build`	Builds	Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)	🛠
`ci`	Continuous Integrations	Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)	⚙️
`chore`	Chores	Other changes that don't modify src or test files	♻️
`revert`	Reverts	Reverts a previous commit	🗑

Rebase on top of dev (never merge, only rebase).
Submit a Pull Request and link at least 2 reviewers
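
To check commit messages against the commit types listed above before pushing, a small script can help. This is a minimal sketch and not a tool shipped with KSO: the function name `check_commit_header` is hypothetical, and the pattern covers only the common `type(scope): subject` header form of conventional commits, with an optional scope and an optional `!` marker.

```python
import re

# Commit types copied from the table above.
COMMIT_TYPES = (
    "feat", "fix", "docs", "style", "refactor", "perf",
    "test", "build", "ci", "chore", "revert",
)

# type, optional (scope), optional "!", then ": " and a non-empty subject.
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<subject>\S.*)$"
)


def check_commit_header(message):
    """Return the commit type if the first line is well formed, else None."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    match = HEADER_RE.match(first_line)
    return match.group("type") if match else None
```

For example, `check_commit_header("fix: handle empty csv")` returns `"fix"`, while a header such as `"Update stuff"` returns `None`.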

Citation

If you use this code or its models in your research, please cite:

Anton V, Germishuys J, Bergström P, Lindegarth M, Obst M (2021) An open-source, citizen science and machine learning approach to analyse subsea movies. Biodiversity Data Journal 9: e60548. https://doi.org/10.3897/BDJ.9.e60548

Collaborations/Questions

You can find out more about the project at https://subsim.se.

We are always excited to collaborate and help other marine scientists. Please feel free to contact us (matthias.obst(at)marine.gu.se) with your questions.

Troubleshooting

Problems importing panoptes_client on Windows are a known issue with the libmagic package. Pmason's suggestions in the Talk board of Zooniverse can be useful for troubleshooting it.

Owner metadata

Name: Ocean Data Factory Sweden
Login: ocean-data-factory-sweden
Email: [email protected]
Kind: organization
Description:
Website:
Location:
Twitter:
Company:
Icon url: https://avatars.githubusercontent.com/u/54248548?v=4
Repositories: 4
Last synced at: 2023-03-03T19:53:11.188Z
Profile URL: https://github.com/ocean-data-factory-sweden

GitHub Events

Total

Create event: 9
Issues event: 21
Watch event: 3
Delete event: 1
Member event: 2
Issue comment event: 28
Push event: 28
Pull request review comment event: 9
Pull request review event: 9
Pull request event: 8
Fork event: 1

Last Year

Create event: 9
Issues event: 21
Watch event: 3
Delete event: 1
Member event: 2
Issue comment event: 28
Push event: 28
Pull request review comment event: 9
Pull request review event: 9
Pull request event: 8
Fork event: 1

Committers metadata

Last synced: 1 day ago

Total Commits: 797
Total Committers: 8
Avg Commits per committer: 99.625
Development Distribution Score (DDS): 0.225

Commits in past year: 173
Committers in past year: 5
Avg Commits per committer in past year: 34.6
Development Distribution Score (DDS) in past year: 0.335

Name	Email	Commits
Jurie Germishuys	j**s@c**e	618
Victor	5****e	88
Diewertje11	d**r@c**e	63
Jannes	3****g	10
Pablo Correa Gómez	p**z@c**e	10
PilarNavarro	p**r@h**s	5
Jurie Germishuys	j**g@a**e	2
dependabot[bot]	4****]	1
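
The summary figures above can be reproduced from the per-committer table. The sketch below assumes the Development Distribution Score is defined as one minus the share of commits made by the most active committer. The page does not state this definition, but it agrees with the reported all-time values. The function names are hypothetical.

```python
# Commit counts copied from the committers table above (one entry per row).
commits = [618, 88, 63, 10, 10, 5, 2, 1]


def average_commits(counts):
    """Mean number of commits per committer."""
    return sum(counts) / len(counts)


def development_distribution_score(counts):
    """Assumed definition: 1 - (commits by top committer / total commits)."""
    return 1 - max(counts) / sum(counts)


print("Total commits:", sum(commits))
print("Avg commits per committer:", average_commits(commits))
print("DDS:", round(development_distribution_score(commits), 3))
```

Under this assumed definition, a score near 0 means one committer wrote almost everything, and higher scores mean the work is spread across more people.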


Issue and Pull Request metadata

Last synced: about 8 hours ago

Total issues: 194
Total pull requests: 119
Average time to close issues: about 1 month
Average time to close pull requests: 7 days
Total issue authors: 10
Total pull request authors: 7
Average comments per issue: 1.48
Average comments per pull request: 1.65
Merged pull request: 68
Bot issues: 0
Bot pull requests: 26

Past year issues: 37
Past year pull requests: 7
Past year average time to close issues: 20 days
Past year average time to close pull requests: 17 days
Past year issue authors: 6
Past year pull request authors: 4
Past year average comments per issue: 0.73
Past year average comments per pull request: 1.57
Past year merged pull request: 3
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/ocean-data-factory-sweden/kso

Top Issue Authors

Bergylta (72)
jannesgg (45)
victor-wildlife (42)
Diewertje11 (20)
donkyjohn (5)
ShrimpFather7 (3)
KalindiFonda (2)
pabloyoyoista (2)
pilarnavarro (2)
XhD98 (1)

Top Pull Request Authors

victor-wildlife (48)
dependabot[bot] (26)
Diewertje11 (23)
jannesgg (15)
pilarnavarro (5)
trossi (1)
pabloyoyoista (1)

Top Issue Labels

bug (96)
enhancement (42)
Development (24)
good first issue (11)
Support (10)
Spyfish (4)
Template (4)
help wanted (3)
GU (2)
question (1)
dependencies (1)
documentation (1)
Research (1)
test (1)

Top Pull Request Labels

bug (1)

Dependencies

.github/workflows/linting.yml actions

actions/checkout v3 composite
psf/black stable composite

Dockerfile docker

nvcr.io/nvidia/pytorch 21.05-py3 build

requirements.txt pypi

PIMS ==0.6.1
PyYAML >=5.3.1
av ==8.1.0
boto3 ==1.26.64
dataclass-csv ==1.4.0
easydict ==1.9.0
fastapi ==0.73.0
ffmpeg-python ==0.2.0
gdown ==3.13.0
imagesize ==1.4.1
ipyfilechooser ==0.4.4
itables ==0.3.0
jupyter ==1.0.0
jupyter_bbox_widget ==0.5.0
matplotlib >=3.2.2
moviepy ==1.0.3
natsort ==8.1.0
numpy >=1.18.5
opencv-contrib-python *
opencv-python ==4.6.0.66
opencv-python-headless *
openpyxl ==3.1.0
pandas ==1.1.4
panoptes-client ==1.5.0
protobuf ==3.15.8
pyopenssl >=23
python-magic ==0.4.24
python-multipart ==0.0.5
scipy >=1.4.1
scp ==0.14.1
seaborn >=0.11.0
split-folders ==0.5.1
tensorboard >=2.4.1
thop *
tqdm >=4.41.0
uvicorn ==0.17.2
wandb *

.github/workflows/build-and-test-container.yml actions

actions/checkout v3 composite
docker/login-action v2 composite
tj-actions/changed-files v37 composite

.github/workflows/detect-unused-code.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

gpu_requirements_user.txt pypi

torch *
torchaudio *
torchvision *

pyproject.toml pypi

boto3 1.26.64
csv-diff ^1.1
dataclass-csv 1.4.0
ffmpeg 1.4
ffmpeg-python 0.2.0
folium 0.12.1
ftfy 6.1.1
gdown 4.6.4
imagesize 1.4.1
ipyfilechooser 0.4.4
ipysheet 0.4.4
ipython 8.11.0
ipywidgets 7.7.2
jupyter-bbox-widget 0.5.0
natsort 8.1.0
opencv-python 4.5.4.60
pandas 1.5.3
panoptes-client 1.6.0
pillow 9.4.0
pims 0.6.1
python ^3.8
pyyaml 6.0
requests 2.28.2
scikit-learn 1.2.2
scp 0.14.1
split-folders 0.5.1
torch 1.8.0
tqdm 4.64.1
wandb 0.13.2

requirements_cdn.txt pypi

PIMS ==0.6.1
PyYAML ==6.0
SQLAlchemy ==2.0.20
av ==8.1.0
boto3 ==1.26.64
boxmot ==10.0.43
csv-diff ==1.1
dataclass-csv ==1.4.0
ffmpeg ==1.4
ffmpeg-python ==0.2.0
fiftyone ==0.20.0
fiftyone_db ==0.4.0
folium ==0.12.1
ftfy ==6.1.1
gdown ==4.7.1
imagesize ==1.4.1
ipyfilechooser ==0.4.4
ipysheet ==0.7.0
ipython ==8.11.0
ipywidgets ==8.1.1
jupyter ==1.0.0
jupyter-bbox-widget ==0.5.0
jupyter_contrib_nbextensions ==0.7.0
mlflow ==2.7.1
more-itertools ==9.1.0
moviepy ==1.0.3
natsort ==8.1.0
notebook ==7.0.4
numpy >=1.22.0,<1.24.1
opencv-contrib-python ==4.6.0.66
opencv-python ==4.6.0.66
opencv-python-headless ==4.6.0.66
pandas ==1.4.0
scikit_learn ==1.3.0
scp ==0.14.1
setuptools ==67.6.1
split-folders ==0.5.1
tqdm ==4.64.1
traitlets ==5.9.0
ultralytics ==8.0.200
wandb ==0.15.11
yolov5 ==7.0.13

requirements_colab.txt pypi

av ==10.0.0
boto3 ==1.28.80
csv_diff ==1.1
dataclass_csv ==1.4.0
ffmpeg ==1.4
fiftyone ==0.22.3
ipysheet ==0.7.0
jupyter_bbox_widget ==0.5.0
lida ==0.0.10
mlflow ==2.8.0
pims ==0.6.1
torch ==2.1.1
typing-extensions ==4.5.0
ultralytics ==8.0.200
wandb ==0.16.0
yolov5 ==7.0.13

Score: 6.284134161070801
