{"id":46070,"name":"FishGlob_data","description":"An integrated database of fish biodiversity sampled with scientific bottom trawl survey.","url":"https://github.com/fishglob/FishGlob_data","last_synced_at":"2026-05-20T06:30:24.464Z","repository":{"id":100893964,"uuid":"580133169","full_name":"fishglob/FishGlob_data","owner":"fishglob","description":"Database and methods related to the manuscript \"An integrated database of fish biodiversity sampled with scientific bottom trawl surveys\"","archived":false,"fork":false,"pushed_at":"2026-03-19T22:31:40.000Z","size":4317509,"stargazers_count":27,"open_issues_count":3,"forks_count":9,"subscribers_count":6,"default_branch":"main","last_synced_at":"2026-04-23T01:04:45.312Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fishglob.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-12-19T19:59:00.000Z","updated_at":"2026-03-19T22:31:53.000Z","dependencies_parsed_at":"2024-01-14T19:22:50.722Z","dependency_job_id":"550a9f27-04ba-4ab0-a4de-ad6863a4eb22","html_url":"https://github.com/fishglob/FishGlob_data","commit_stats":{"total_commits":235,"total_committers":10,"mean_commits":23.5,"dds":"0.37446808510638296","last_synced_commit":"4423296c121183010e903ed782dadcd7283b408a"},"previous_names":["fishglob/fishglob_data","aquaauma/fishglob_data"],"tags_count":8,"template":false,"template_full_name":null,"purl
":"pkg:github/fishglob/FishGlob_data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fishglob%2FFishGlob_data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fishglob%2FFishGlob_data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fishglob%2FFishGlob_data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fishglob%2FFishGlob_data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fishglob","download_url":"https://codeload.github.com/fishglob/FishGlob_data/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fishglob%2FFishGlob_data/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32449172,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T22:27:22.272Z","status":"ssl_error","status_checked_at":"2026-04-29T22:10:49.234Z","response_time":110,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"owner":{"login":"fishglob","name":"FISHGLOB","uuid":"226361955","kind":"organization","description":"","email":"fishglobconsortium@gmail.com","website":"fishglob.sites.ucsc.edu","location":null,"twitter":null,"company":null,"icon_url":"https://avatars.githubusercontent.com/u/226361955?v=4","repositories_count":1,"last_synced_at":"2025-08-20T11:47:58.393Z","metadata":{"has_sponsors_listing":false},"html_url":"https://github.com/fishglob","funding_links":[],"total_stars":23,"followers":0,"following":0,"created_at":"2025-08-20T11:47:58.417Z","updated_at":"2025-08-20T11:47:58.417Z","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fishglob","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fishglob/repositories"},"packages":[],"commits":{"id":10803473,"full_name":"fishglob/FishGlob_data","default_branch":"master","total_commits":324,"total_committers":8,"total_bot_commits":0,"total_bot_committers":0,"mean_commits":40.5,"dds":0.4598765432098766,"past_year_total_commits":79,"past_year_total_committers":6,"past_year_total_bot_commits":0,"past_year_total_bot_committers":0,"past_year_mean_commits":13.166666666666666,"past_year_dds":0.5063291139240507,"last_synced_at":"2026-05-13T03:07:29.123Z","last_synced_commit":"972e7a5e80d4caddd4557d877e2435916abd66b3","created_at":"2025-08-20T11:31:00.505Z","updated_at":"2026-05-02T12:44:24.844Z","committers":[{"name":"Aurore Maureaud","email":"aurore.aqua@gmail.com","login":"AquaAuma","count":175},{"name":"Juliano Palacios 
Abrantes","email":"j.palacios@oceans.ubc.ca","login":"jepa","count":66},{"name":"Malin Pinsky","email":"malin.pinsky@gmail.com","login":"mpinsky","count":51},{"name":"Zoë Kitchel","email":"31512830+zoekitchel","login":"zoekitchel","count":18},{"name":"Alexa Fredston","email":"alexa.fredston@gmail.com","login":"afredston","count":6},{"name":"Laurene Pecuchet","email":"laurene.pecuchet@gmail.com","login":"LaurenePecuchet","count":5},{"name":"Sean Anderson","email":"sean@seananderson.ca","login":"seananderson","count":2},{"name":"Esther Beukhof","email":"estb@aqua.dtu.dk","login":"eshdb","count":1}],"past_year_committers":[{"name":"Malin Pinsky","email":"malin.pinsky@gmail.com","login":"mpinsky","count":39},{"name":"Juliano Palacios Abrantes","email":"j.palacios@oceans.ubc.ca","login":"jepa","count":30},{"name":"Zoë Kitchel","email":"31512830+zoekitchel","login":"zoekitchel","count":4},{"name":"Alexa Fredston","email":"alexa.fredstonhermann@gmail.com","login":"afredston","count":3},{"name":"Sean Anderson","email":"sean@seananderson.ca","login":"seananderson","count":2},{"name":"Esther 
Beukhof","email":"estb@aqua.dtu.dk","login":"eshdb","count":1}],"commits_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub/repositories/fishglob%2FFishGlob_data/commits","host":{"name":"GitHub","url":"https://github.com","kind":"github","last_synced_at":"2026-05-15T00:00:35.990Z","repositories_count":6234168,"commits_count":894487224,"contributors_count":34899301,"owners_count":1153082,"icon_url":"https://github.com/github.png","host_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub/repositories"}},"issues_stats":{"full_name":"fishglob/FishGlob_data","html_url":"https://github.com/fishglob/FishGlob_data","last_synced_at":"2026-05-11T02:05:54.800Z","status":"error","issues_count":5,"pull_requests_count":8,"avg_time_to_close_issue":2705590.0,"avg_time_to_close_pull_request":788822.4,"issues_closed_count":3,"pull_requests_closed_count":5,"pull_request_authors_count":4,"issue_authors_count":3,"avg_comments_per_issue":1.4,"avg_comments_per_pull_request":0.5,"merged_pull_requests_count":5,"bot_issues_count":0,"bot_pull_requests_count":0,"past_year_issues_count":5,"past_year_pull_requests_count":8,"past_year_avg_time_to_close_issue":2705590.0,"past_year_avg_time_to_close_pull_request":788822.4,"past_year_issues_closed_count":3,"past_year_pull_requests_closed_count":5,"past_year_pull_request_authors_count":4,"past_year_issue_authors_count":3,"past_year_avg_comments_per_issue":1.4,"past_year_avg_comments_per_pull_request":0.5,"past_year_bot_issues_count":0,"past_year_bot_pull_requests_count":0,"past_year_merged_pull_requests_count":5,"created_at":"2025-08-20T11:31:00.890Z","updated_at":"2026-05-11T02:05:54.800Z","repository_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories/fishglob%2FFishGlob_data","issues_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories/fishglob%2FFishGlob_data/issues","issue_labels_count":{"function":1,"documentation":1,"enhancement":1,"GMEX":1},"p
ull_request_labels_count":{"function":1},"issue_author_associations_count":{"CONTRIBUTOR":3,"COLLABORATOR":2},"pull_request_author_associations_count":{"CONTRIBUTOR":5,"COLLABORATOR":3},"issue_authors":{"mpinsky":2,"jepa":2,"LaurenePecuchet":1},"pull_request_authors":{"mpinsky":3,"afredston":2,"zoekitchel":2,"jepa":1},"host":{"name":"GitHub","url":"https://github.com","kind":"github","last_synced_at":"2026-05-13T00:00:11.310Z","repositories_count":14585372,"issues_count":34322651,"pull_requests_count":112368086,"authors_count":11260295,"icon_url":"https://github.com/github.png","host_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories","owners_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/owners","authors_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors"},"past_year_issue_labels_count":{"documentation":1,"enhancement":1,"function":1,"GMEX":1},"past_year_pull_request_labels_count":{"function":1},"past_year_issue_author_associations_count":{"CONTRIBUTOR":3,"COLLABORATOR":2},"past_year_pull_request_author_associations_count":{"CONTRIBUTOR":5,"COLLABORATOR":3},"past_year_issue_authors":{"jepa":2,"mpinsky":2,"LaurenePecuchet":1},"past_year_pull_request_authors":{"mpinsky":3,"afredston":2,"zoekitchel":2,"jepa":1},"maintainers":[{"login":"jepa","count":3,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/jepa"},{"login":"zoekitchel","count":2,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/zoekitchel"}],"active_maintainers":[{"login":"jepa","count":3,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/jepa"},{"login":"zoekitchel","count":2,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/zoekitchel"}]},"events":{"total":{"PullRequestEvent":6,"IssuesEvent":5,"IssueCommentEvent":6,"PushEvent":16,"PullRequestReviewEvent":2,"CreateEvent":8},"last_year":{"PullRequestEvent":6,"IssuesEvent":5,"IssueCommentEvent":6,"PushEvent":16,"Pul
lRequestReviewEvent":2,"CreateEvent":8}},"keywords":[],"dependencies":[],"score":5.480638923341991,"created_at":"2023-09-13T08:45:11.282Z","updated_at":"2026-05-20T06:30:24.475Z","avatar_url":"https://github.com/fishglob.png","language":"R","category":"Biosphere","sub_category":"Marine Life and Fishery","monthly_downloads":0,"total_dependent_repos":0,"total_dependent_packages":0,"readme":"# FishGlob_data\n\n[![DOI](https://zenodo.org/badge/580133169.svg)](https://zenodo.org/badge/latestdoi/580133169)\n\nThis repository contains the FishGlob database, including the methods to load, clean, and process the public bottom trawl surveys in it. The database is described in the manuscript, \"An integrated database of fish biodiversity sampled with scientific bottom trawl surveys\" by Aurore A. Maureaud, Juliano Palacios-Abrantes, Zoë Kitchel, Laura Mannocci, Malin L. Pinsky, Alexa Fredston, Esther Beukhof, Daniel L. Forrest, Romain Frelat, Maria L.D. Palomares, Laurene Pecuchet, James T. Thorson, P. Daniël van Denderen, and Bastien Mérigot.\n\nThis database is a product of the CESAB working group, [FishGlob: Fish biodiversity under global change – a worldwide assessment from scientific trawl surveys](https://www.fondationbiodiversite.fr/en/the-frb-in-action/programs-and-projects/le-cesab/fishglob/).\n\n\u003cimg src =\"https://github.com/AquaAuma/FishGlob_data/blob/main/fishglob_logo.png\" width =\"200\"\u003e\n\nMain contacts: [Aurore A. Maureaud](mailto:aurore.aqua@gmail.com),  [Juliano Palacios-Abrantes](mailto:j.palacios@oceans.ubc.ca), [Zoë J. Kitchel](mailto:zoe.j.kitchel@gmail.com), and [Malin L. 
Pinsky](mailto:mpinsky@ucsc.edu)\n\n**Anyone interested in reusing this data or its outputs should read this readme as well as our [Data Disclaimer](https://docs.google.com/document/d/1uiEIcUugCf-dOSvio6hB1r8xFf0sm1Ip2IzjbMu9I4o/edit) in full.**\n\n[![CC BY 4.0][cc-by-shield]][cc-by]\n\nThis work is licensed under a\n[Creative Commons Attribution 4.0 International License][cc-by].\n\n[![CC BY 4.0][cc-by-image]][cc-by]\n\n[cc-by]: http://creativecommons.org/licenses/by/4.0/\n[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png\n[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg\n\n### Structure of the repository\n\n* **cleaning_codes** includes all scripts to process and perform quality control on the trawl surveys.\n* **data_descriptor_figures** contains the R script to construct figures 2-4 for the data descriptor manuscript. \n* **functions** contains useful functions used in other scripts\n* **length_weight** contains the length-weight relationships for surveys where weights have to be calculated from abundance at length data (including NOR-BTS and DATRAS)\n* **metadata_docs** has a README with notes about each survey. This is a place to document changes in survey methods, quirks, etc. It is a growing list. If you have information to add, please open an Issue.\n* **outputs** contains all survey data processed .RData files and flagging outputs\n* **QAQC** contains the additional QAQC performed on surveys that required supplementary checks (DATRAS-sourced surveys)\n* **standard_formats** includes definitions of file formats in the FishGlob database, including survey ID codes.\n* **standardization_steps** contains the R codes to run a full survey standardization and a cross-survey summary of flagging methods\n* **summary** contains the quality check plots for each survey\n\n### Survey data processing steps\n\nData processing and cleaning is done on a per survey basis unless formats are similar across a group of surveys. 
The current repository can process 29 scientific bottom-trawl surveys, according to the following steps.\n\n**Steps**\n1. Merge the data files for one survey\n2. Clean \u0026 homogenize column names following the format described in *standard_formats/fishglob_data_columns.xlsx*\n3. Create missing columns and standardize units using the standard format *standard_formats/fishglob_data_columns.xlsx*\n4. Integrate the cleaned taxonomy by applying the function *clean_taxa()* and apply expert knowledge on taxonomic treatments\n5. Perform quality checks: the outputs are saved in the *summary* folder, and the additional QAQC for surveys that required supplementary checks is detailed in the *QAQC* folder\n\n### Survey data standardization and flags\n\nData standardization and flagging are done per survey and per survey_unit (integrating seasons and quarters). Flags are applied to both the temporal occurrence of taxa and the spatio-temporal sampling footprint, according to the following steps.\n\n**Steps**\n1. Taxonomic quality control: run flag_spp() for each survey region\n2. Apply methods to identify a standard spatial footprint through time for each survey-season/quarter (the survey_unit column). Use the functions apply_trimming_per_survey_unit_method1() and apply_trimming_per_survey_unit_method2()\n3. 
Display and integrate results in the summary files\n\n### Final data products\n\n**Options**\nUsers can either use the single survey data products in **outputs/Cleaned_data/** and work with survey .RData files including flags or not (inclusion of flags is specified by XX_std_clean.RData), or generate their own compiled version of the data by running the **cleaning_codes/merge.R** which will write local versions of the database in **outputs/Compiled_data/**\n\n### Author contributions\n*Contributors to code*\n- **Cleaning taxonomy**: Juliano Palacios-Abrantes \n- **Cleaning surveys**: Juliano Palacios-Abrantes, Aurore Maureaud, Zoë Kitchel, Dan Forrest, Daniël van Denderen, Laurene Pecuchet, Esther Beukhof\n- **Summary of surveys**: Juliano Palacios-Abrantes, Aurore Maureaud, Zoë Kitchel, Laura Mannocci\n- **Merge surveys**: Aurore Maureaud\n- **Standardize surveys**: Laura Mannocci, Malin Pinsky, Aurore Maureaud, Zoë Kitchel, Alexa Fredston\n- **QAQC of DATRAS surveys**: Aurore Maureaud, Daniël van Denderen, Esther Beukhof, Laurene Pecuchet\n- **QAQC of the Barents Sea surveys**: Laurene Pecuchet\n- **QAQC of North American surveys**: Zoë Kitchel, Malin Pinsky, Daniel Forrest\n\n### Credit and citation\n\nOur full citation policy is described in the [Fishglob_data disclaimer](https://docs.google.com/document/d/1uiEIcUugCf-dOSvio6hB1r8xFf0sm1Ip2IzjbMu9I4o/). Briefly, users should cite [Maureaud *et al.* 2021](https://doi.org/10.1111/gcb.15404), [Maureaud *et al.* 2024](https://www.nature.com/articles/s41597-023-02866-w), and relevant primary SBTS sources referenced in the FISHGLOB data files and source data tables of the two Maureaud *et al.* papers. Users integrating multiple surveys are encouraged to cite additional studies on data integration. \n\n### :warning: Important updates :warning:\n\n\u003e **5/06/2024**: A warning about CSVs\nDatasets are available for download in **outputs/Cleaned_data/** as .Rdata files. 
*We do not recommend saving FishGlob data in .csv format.* For at least some surveys, the `haul_id` column is composed of a long string of numerics, which is incorrectly rounded if loaded from a .csv programmatically in R (with `read_csv()` or `read.csv()`). As documented in [issue #49](https://github.com/AquaAuma/FishGlob_data/issues/49), this leads to errors in the `haul_id` column, and may occur regardless of the \"class\" assigned to this column. The most robust way to prevent this error is to write to / read from other data types such as .Rdata or .rds. Packages exist for users to import these into Python and other programming languages. \n\n\u003e **23/11/2023**: FishGlob_data v2.0\n\n\u003e **05/09/2023**: Norwegian survey is erroneous and will be replaced with a Barents Sea centered survey over 2004-onwards which will change the spatio-temporal coverage of the region (coordinated by Laurene Pecuchet with IMR), see [issue #29](https://github.com/AquaAuma/FishGlob_data/issues/29)\n","funding_links":[],"readme_doi_urls":["https://doi.org/10.1111/gcb.15404"],"works":{},"citation_counts":{},"total_citations":0,"keywords_from_contributors":[],"project_url":"https://ost.ecosyste.ms/api/v1/projects/46070","html_url":"https://ost.ecosyste.ms/projects/46070"}