FishGlob_data

An integrated database of fish biodiversity sampled with scientific bottom trawl surveys.
https://github.com/fishglob/FishGlob_data

Category: Biosphere
Sub Category: Marine Life and Fishery

Repository metadata

Database and methods related to the manuscript "An integrated database of fish biodiversity sampled with scientific bottom trawl surveys"

Host: GitHub
URL: https://github.com/fishglob/FishGlob_data
Owner: fishglob
License: cc-by-4.0
Created: 2022-12-19T19:59:00.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2026-01-22T20:14:37.000Z (about 1 month ago)
Last Synced: 2026-02-07T09:56:37.115Z (25 days ago)
Language: R
Homepage:
Size: 3.86 GB
Stars: 26
Watchers: 5
Forks: 9
Open Issues: 1
Releases: 5
Metadata Files:
- Readme: README.md
- Changelog: NEWS.md
- License: LICENSE

FishGlob_data

This repository contains the FishGlob database, including the methods to load, clean, and process the public bottom trawl surveys in it. The database is described in the manuscript, "An integrated database of fish biodiversity sampled with scientific bottom trawl surveys" by Aurore A. Maureaud, Juliano Palacios-Abrantes, Zoë Kitchel, Laura Mannocci, Malin L. Pinsky, Alexa Fredston, Esther Beukhof, Daniel L. Forrest, Romain Frelat, Maria L.D. Palomares, Laurene Pecuchet, James T. Thorson, P. Daniël van Denderen, and Bastien Mérigot.

This database is a product of the CESAB working group, FishGlob: Fish biodiversity under global change – a worldwide assessment from scientific trawl surveys.

Main contacts: Aurore A. Maureaud, Juliano Palacios-Abrantes, Zoë J. Kitchel, and Malin L. Pinsky

Anyone interested in reusing this data or its outputs should read this readme as well as our Data Disclaimer in full.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Structure of the repository

- cleaning_codes includes all scripts to process and perform quality control on the trawl surveys.
- data_descriptor_figures contains the R script to construct figures 2-4 for the data descriptor manuscript.
- functions contains helper functions used in other scripts.
- length_weight contains the length-weight relationships for surveys where weights have to be calculated from abundance-at-length data (including NOR-BTS and DATRAS).
- metadata_docs has a README with notes about each survey. This is a place to document changes in survey methods, quirks, etc. It is a growing list. If you have information to add, please open an Issue.
- outputs contains all processed survey data as .RData files, along with the flagging outputs.
- QAQC contains the additional QAQC performed on surveys that required supplementary checks (DATRAS-sourced surveys).
- standard_formats includes definitions of file formats in the FishGlob database, including survey ID codes.
- standardization_steps contains the R code to run a full survey standardization and a cross-survey summary of flagging methods.
- summary contains the quality check plots for each survey.

Survey data processing steps

Data processing and cleaning are done on a per-survey basis unless formats are similar across a group of surveys. The current repository can process 29 scientific bottom-trawl surveys, according to the following steps.

Steps

1. Merge the data files for one survey.
2. Clean and homogenize column names following the format described in standard_formats/fishglob_data_columns.xlsx.
3. Create missing columns and standardize units using the standard format standard_formats/fishglob_data_columns.xlsx.
4. Integrate the cleaned taxonomy by applying the function clean_taxa() and apply expert knowledge on taxonomic treatments.
5. Perform quality checks, including the output in the summary folder and specific QAQC for other surveys detailed in the QAQC folder.
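
The column renaming and missing-column steps above can be sketched in base R. This is a minimal illustration, not the repository's cleaning code: the raw column names, the mapping, the target column set, and the minutes-to-hours conversion are all hypothetical, and the authoritative column definitions are those in standard_formats/fishglob_data_columns.xlsx.

```r
# Hypothetical raw survey extract; these column names are illustrative only.
raw <- data.frame(
  HaulID     = c("A1", "A2"),
  Lat        = c(56.1, 57.3),
  Lon        = c(3.2, 4.8),
  HaulDurMin = c(30, 60),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from raw names to standardized names.
name_map <- c(HaulID = "haul_id", Lat = "latitude",
              Lon = "longitude", HaulDurMin = "haul_dur")

# Hypothetical target column set (the real one lives in the .xlsx file).
standard_cols <- c("survey", "haul_id", "latitude",
                   "longitude", "haul_dur", "depth")

standardize_columns <- function(df, name_map, standard_cols) {
  # Rename only the columns that appear in the mapping.
  hits <- names(df) %in% names(name_map)
  names(df)[hits] <- unname(name_map[names(df)[hits]])
  # Create any missing standard columns, filled with NA.
  for (col in setdiff(standard_cols, names(df))) {
    df[[col]] <- NA
  }
  # Return the columns in the standard order.
  df[, standard_cols, drop = FALSE]
}

clean <- standardize_columns(raw, name_map, standard_cols)

# Example unit standardization (assumed): haul duration from minutes to hours.
clean$haul_dur <- clean$haul_dur / 60
```

Keeping the mapping and the target column set as data, rather than hard-coding them in the function, lets the same function be reused when several surveys share a format.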

Survey data standardization and flags

Data standardization and flagging are done per survey and per survey_unit (integrating seasons and quarters). Flags are applied both to the temporal occurrence of taxa and to the spatio-temporal sampling footprint, according to the following steps.

Steps

1. Taxonomic quality control: run flag_spp() for each survey region.
2. Apply methods to identify a standard spatial footprint through time for each survey-season/quarter (the survey_unit column). Use the functions apply_trimming_per_survey_unit_method1() and apply_trimming_per_survey_unit_method2().
3. Display and integrate results in the summary files.
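
To illustrate what a flag on the temporal occurrence of taxa might look like, here is a toy sketch in base R. It is not the logic of flag_spp(), whose arguments and rules are not described here. The function flag_rare_taxa(), the min_prop threshold, and the example records are all hypothetical.

```r
# Toy occurrence records: one row per taxon recorded in a given survey year.
obs <- data.frame(
  year  = c(2001, 2002, 2003, 2004, 2001, 2003, 2004, 2002),
  taxon = c("Gadus morhua", "Gadus morhua", "Gadus morhua", "Gadus morhua",
            "Pleuronectes platessa", "Pleuronectes platessa",
            "Pleuronectes platessa", "Zeus faber"),
  stringsAsFactors = FALSE
)

# Hypothetical rule: flag taxa recorded in fewer than `min_prop` of survey years.
flag_rare_taxa <- function(obs, min_prop = 0.5) {
  n_years <- length(unique(obs$year))
  # Number of distinct years in which each taxon was recorded.
  years_present <- tapply(obs$year, obs$taxon, function(y) length(unique(y)))
  prop <- as.numeric(years_present) / n_years
  data.frame(
    taxon      = names(years_present),
    prop_years = prop,
    flagged    = prop < min_prop,
    stringsAsFactors = FALSE
  )
}

flags <- flag_rare_taxa(obs)
print(flags)
```

In this toy data only the taxon seen in a single year falls below the threshold. A flag column of this kind leaves the decision to drop or keep records with the user.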

Final data products

Options
Users can either:
- use the single-survey data products in outputs/Cleaned_data/, working with the survey .RData files with or without flags (files that include flags are identified by the name XX_std_clean.RData), or
- generate their own compiled version of the data by running cleaning_codes/merge.R, which writes local versions of the database to outputs/Compiled_data/.
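
A minimal sketch of loading one survey file in R. Because the names of the objects stored inside each .RData file are not listed here, the hypothetical helper load_rdata() loads the file into a separate environment and returns whatever it contains as a named list. The demonstration uses a temporary file so that it runs anywhere; for real data, point the path at a file in outputs/Cleaned_data/.

```r
# Hypothetical helper: load an .RData file without touching the global
# workspace, and return its objects as a named list.
load_rdata <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the object names
  mget(obj_names, envir = env)
}

# Self-contained demonstration with a temporary file.
demo <- data.frame(haul_id = c("h1", "h2"), stringsAsFactors = FALSE)
tmp <- tempfile(fileext = ".RData")
save(demo, file = tmp)

objs <- load_rdata(tmp)
names(objs)  # names of the objects stored in the file

# With the repository cloned, the same call would look like this
# (XX stands for a survey code, as in the text above):
# objs <- load_rdata("outputs/Cleaned_data/XX_std_clean.RData")
```

Loading into a separate environment avoids silently overwriting objects of the same name when several survey files are opened in one session.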

Author contributions

Contributors to code

Cleaning taxonomy: Juliano Palacios-Abrantes
Cleaning surveys: Juliano Palacios-Abrantes, Aurore Maureaud, Zoë Kitchel, Dan Forrest, Daniël van Denderen, Laurene Pecuchet, Esther Beukhof
Summary of surveys: Juliano Palacios-Abrantes, Aurore Maureaud, Zoë Kitchel, Laura Mannocci
Merge surveys: Aurore Maureaud
Standardize surveys: Laura Mannocci, Malin Pinsky, Aurore Maureaud, Zoë Kitchel, Alexa Fredston
QAQC of DATRAS surveys: Aurore Maureaud, Daniël van Denderen, Esther Beukhof, Laurene Pecuchet
QAQC of the Barents Sea surveys: Laurene Pecuchet
QAQC of North American surveys: Zoë Kitchel, Malin Pinsky, Daniel Forrest

Credit and citation

Our full citation policy is described in the Fishglob_data disclaimer. Briefly, users should cite Maureaud et al. 2021, Maureaud et al. 2024, and relevant primary SBTS sources referenced in the FISHGLOB data files and source data tables of the two Maureaud et al. papers. Users integrating multiple surveys are encouraged to cite additional studies on data integration.

⚠️ Important updates ⚠️

05/06/2024: A warning about CSVs
Datasets are available for download in outputs/Cleaned_data/ as .RData files. We do not recommend saving FishGlob data in .csv format. For at least some surveys, the haul_id column is a long string of digits, which is incorrectly rounded when loaded from a .csv programmatically in R (with read_csv() or read.csv()). As documented in issue #49, this corrupts the haul_id column and may occur regardless of the "class" assigned to the column. The most robust way to prevent this error is to write to and read from other file formats, such as .RData or .rds. Packages exist for importing these formats into Python and other programming languages.
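
One plausible mechanism behind this rounding, sketched in base R: a double-precision number cannot represent every integer above 2^53, so two long identifiers that differ only in their last digits can collapse to the same value once parsed as numbers. The identifiers below are hypothetical, and the actual failure reported in issue #49 may involve further details.

```r
# Two distinct, hypothetical 19-digit identifiers.
id_a <- "1234567890123456781"
id_b <- "1234567890123456782"

# Parsed as numbers, they become the same double.
as.numeric(id_a) == as.numeric(id_b)  # TRUE

# Kept as character and stored as .rds, they survive a round trip unchanged.
hauls <- data.frame(haul_id = c(id_a, id_b), stringsAsFactors = FALSE)
tmp <- tempfile(fileext = ".rds")
saveRDS(hauls, tmp)
restored <- readRDS(tmp)
identical(restored$haul_id, hauls$haul_id)  # TRUE
```

Binary R formats store the column with its type, so no text-to-number parsing takes place on reload.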

23/11/2023: FishGlob_data v2.0

05/09/2023: The Norwegian survey is erroneous and will be replaced with a Barents Sea-centered survey covering 2004 onwards, which will change the spatio-temporal coverage of the region (coordinated by Laurene Pecuchet with IMR); see issue #29.

Owner metadata

Name: FISHGLOB
Login: fishglob
Email: fishglobconsortium@gmail.com
Kind: organization
Description:
Website: fishglob.sites.ucsc.edu
Location:
Twitter:
Company:
Icon url: https://avatars.githubusercontent.com/u/226361955?v=4
Repositories: 1
Last synced at: 2025-08-20T11:47:58.393Z
Profile URL: https://github.com/fishglob

GitHub Events

Total

Pull request event: 6
Issues event: 5
Issue comment event: 6
Push event: 12
Pull request review event: 2
Create event: 7

Last Year

Pull request event: 6
Issues event: 5
Issue comment event: 6
Push event: 12
Pull request review event: 2
Create event: 7

Committers metadata

Last synced: 17 days ago

Total Commits: 280
Total Committers: 12
Avg Commits per committer: 23.333
Development Distribution Score (DDS): 0.475

Commits in past year: 45
Committers in past year: 7
Avg Commits per committer in past year: 6.429
Development Distribution Score (DDS) in past year: 0.378
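
The all-time figures above can be reproduced with simple arithmetic. This sketch assumes that DDS is one minus the top committer's share of all commits. The formula is not stated on this page, but it matches the reported value when the largest single entry in the committer table (147 commits) is used.

```r
total_commits    <- 280
total_committers <- 12
top_committer    <- 147  # largest single entry in the committer table

# Average commits per committer.
round(total_commits / total_committers, 3)  # 23.333

# Assumed definition: DDS = 1 - (top committer's commits / total commits).
dds <- 1 - top_committer / total_commits
round(dds, 3)  # 0.475

# Past-year average, from the past-year totals.
round(45 / 7, 3)  # 6.429
```

The past-year DDS (0.378) cannot be rechecked in the same way, because per-committer counts for the past year are not listed.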

Name	Email	Commits
Aurore Maureaud	a**a@g**m	147
Juliano Palacios Abrantes	j**s@o**a	63
Aurore Maureaud	4**a@u**m	28
Malin Pinsky	m**y@u**u	11
Zoë Kitchel	3**l@u**m	11
zoekitchel	z**l@g**m	7
Alexa Fredston	a**n@g**m	4
Laurene Pecuchet	l**t@g**m	3
Alexa Fredston	a**n@g**m	2
LaurenePecuchet	7**t@u**m	2
Esther Beukhof	e**b@a**k	1
Malin Pinsky	m**y@g**m	1

Issue and Pull Request metadata

Last synced: about 1 month ago

Total issues: 3
Total pull requests: 6
Average time to close issues: about 1 month
Average time to close pull requests: 15 days
Total issue authors: 2
Total pull request authors: 4
Average comments per issue: 2.0
Average comments per pull request: 0.5
Merged pull request: 3
Bot issues: 0
Bot pull requests: 0

Past year issues: 3
Past year pull requests: 6
Past year average time to close issues: about 1 month
Past year average time to close pull requests: 15 days
Past year issue authors: 2
Past year pull request authors: 4
Past year average comments per issue: 2.0
Past year average comments per pull request: 0.5
Past year merged pull request: 3
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/fishglob/FishGlob_data

Top Issue Authors

jepa (2)
mpinsky (1)

Top Pull Request Authors

afredston (2)
zoekitchel (2)
mpinsky (1)
jepa (1)

Top Issue Labels

documentation (1)
GMEX (1)
function (1)

Top Pull Request Labels

function (1)

Score: 5.780743515792329
