Global Names Verifier
Verifies scientific names against more than 100 biodiversity databases.
https://github.com/gnames/gnverifier
Category: Biosphere
Sub Category: Biodiversity Data Cleaning and Standardization
Keywords
biodiversity bioinformatics go golang reconciliation resolution scientific-names verification
Last synced: about 18 hours ago
Repository metadata
GNverifier verifies scientific names against more than 100 biodiversity databases
- Host: GitHub
- URL: https://github.com/gnames/gnverifier
- Owner: gnames
- License: mit
- Created: 2020-09-21T11:49:47.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2025-04-02T17:52:43.000Z (25 days ago)
- Last Synced: 2025-04-17T22:43:55.214Z (9 days ago)
- Topics: biodiversity, bioinformatics, go, golang, reconciliation, resolution, scientific-names, verification
- Language: Go
- Homepage: https://verifier.globalnames.org
- Size: 2.94 MB
- Stars: 25
- Watchers: 8
- Forks: 1
- Open Issues: 11
- Releases: 31
- Metadata Files:
  - Readme: README.md
  - Changelog: CHANGELOG.md
  - License: LICENSE
  - Citation: CITATION.cff
README.md
Global Names Verifier
Warning: Version v1.2.0 introduces some backward-incompatible changes. Some flags of the command line application have changed, and CSV output now returns TaxonomicStatus instead of IsSynonym. The term isSynonym stays in the JSON format for backward compatibility, but is deprecated.
Try GNverifier online.
Takes a scientific name or a list of scientific names and verifies them against
a variety of biodiversity Data Sources. Includes an advanced
search feature.
- Citing
- Features
- Installation
- Using Homebrew on Mac OS X, Linux, and Linux on Windows (WSL2)
- MS Windows
- Linux and Mac (without Homebrew)
- Compile from source
- Usage
- Copyright
Citing
If you want to cite GNverifier, use the DOI generated by Zenodo (10.5281/zenodo.10070488, as listed in CITATION.cff).
Features
- Small and fast app to verify scientific names against many biodiversity databases. The app is a client to a verifier API.
- It provides different match levels:
  - Exact: complete match with a canonical form or a full name-string from a data source.
  - Fuzzy: if exact match did not happen, it tries to match name-strings assuming spelling errors.
  - FuzzyRelaxed: if exact match did not happen, it tries to match name-strings using 'relaxed' fuzzy-matching rules.
  - Partial: strips middle or last epithets from bi- or multi-nomial names and tries to match what is left.
  - PartialFuzzy: the same as Partial but assuming spelling mistakes.
  - PartialFuzzyRelaxed: the same as PartialFuzzy but with relaxed fuzzy-matching rules.
  - Virus: verification of virus names.
  - FacetedSearch: marks advanced-search queries.
- Fuzzy matching that tries to balance the number of false positives and false negatives (more information on: fuzzy-matching).
- Taxonomic resolution. If a database contains taxonomic information, it returns the currently accepted name for the provided name-string.
- Best match is returned according to the match score. Data sources with some manual curation have priority over auto-curated and uncurated datasets. For example, Catalogue of Life or WoRMS are considered curated, GBIF auto-curated, uBio not curated.
- Fine-tuning the match score by matching authors, years, ranks etc.
- It is possible to map any name-strings checklist to any of the registered Data Sources.
- If a Data Source provides a classification for a name, it will be returned in the output.
- The app works for checking just one name-string, or multiple ones written in a file.
- Advanced search uses a simple but powerful query language to find abbreviated names, search by author, year etc.
- Supports feeding data via pipes of an operating system. This feature allows chaining the program together with other tools.
- GNverifier includes a web-based graphical user interface identical to its "official" web-service.
Installation
Using Homebrew on Mac OS X, Linux, and Linux on Windows (WSL2)
Homebrew is a popular package manager for Open Source software originally
developed for Mac OS X. Now it is also available on Linux, and can easily
be used on Windows 10 or 11, if Windows Subsystem for Linux (WSL) is
installed.
To use GNverifier with Homebrew:
- Install Homebrew
- Open terminal and run the following commands:
brew tap gnames/gn
brew install gnverifier
MS Windows
Download the latest release from GitHub and unzip it.
One possible way would be to create a default folder for executables and place GNverifier there.
Use the Windows+R key combination and type "cmd". In the terminal window that appears, type:
mkdir C:\Users\your_username\bin
copy path_to\gnverifier.exe C:\Users\your_username\bin
Add the C:\Users\your_username\bin directory to your PATH user and/or system environment variable.
Another, simpler way would be to use the cd C:\Users\your_username\bin command in the cmd terminal window. The GNverifier program will then be found automatically by the Windows operating system when you run its commands from that directory.
You can also read a more detailed guide for Windows users in a PDF document.
Linux and Mac (without Homebrew)
If Homebrew is not installed, download the latest release from GitHub,
untar it, and install the binary somewhere in your PATH.
tar xvf gnverifier-linux-0.1.0.tar.xz
# or tar xvf gnverifier-mac-0.1.0.tar.gz
sudo mv gnverifier /usr/local/bin
Compile from source
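Install Go according to its installation instructions, then run:
go get github.com/gnames/gnverifier/gnverifier
Note that since Go 1.18 the go get command no longer builds or installs binaries. With a recent toolchain the equivalent is go install with an explicit version suffix such as @latest.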
Usage
GNverifier takes one name-string or a text file with one name-string per
line as an argument, sends a query with these data to a remote GNames
server to match the name-strings against many biodiversity
databases and returns results to STDOUT either in JSON, CSV or TSV format.
The app can also take a query string like g:M. sp:galloprovincialis au:Olivier to perform advanced searching, if the full scientific name is undetermined.
As a web service
gnverifier -p 8080
After running this command, you should be able to access web-based user
interface via a browser at http://localhost:8080
As a RESTful API
Refer to the RESTful API docs to learn how to use the same
functionality via scripts.
One name-string
gnverifier "Monohamus galloprovincialis"
Many name-strings in a file
gnverifier /path/to/names.txt
The app assumes that a file contains a simple list of names, one per line.
It is also possible to feed data via STDIN:
cat /path/to/names.txt | gnverifier
Advanced search
Advanced search uses a simple but powerful query language to find names
by abbreviated genus, a year or a range of years. See the detailed description
in the Advanced Search Query Language section.
gnverifier "g:B. sp:bubo au:Linn. y:1700-"
Options and flags
Flags and options can be given either before or after the name-string or
file name, following POSIX/GNU-style flag conventions.
help
gnverifier -h
# or
gnverifier --help
# or
gnverifier
version
gnverifier -V
# or
gnverifier --version
port
Starts GNverifier as a web service using entered port
gnverifier -p 8080
This command will run a user interface accessible by a browser at http://localhost:8080
all_matches
To see all matches instead of only the best one, use the --all_matches flag (-M).
WARNING: for some names the result will be excessively large.
gnverifier -s '1,12' -M file.txt
gnverifier --all_matches "Pardosa moesta"
This flag is ignored by advanced search.
capitalize
If your names do not have uninomials or genera capitalized according to the rules of nomenclature, you can still verify them using this option. If the capitalize flag is set, the first character of every name-string will be capitalized (when appropriate). This flag is ignored by advanced search.
gnverifier -c "bubo bubo"
# or
gnverifier --capitalize "bubo bubo"
species group
If the species_group flag is on, a search for Aus bus would also search for Aus bus bus and vice versa. This flag expands the search to the species group of a name, if applicable. This means that the search also covers botanical autonyms and coordinated names in zoology.
gnverifier -G "Bubo bubo"
gnverifier --species_group "Bubo bubo"
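The expansion described above can be illustrated with a small sketch. It is not GNverifier's implementation (the real expansion happens on the server and handles authorship, ranks and other details). The function name `species_group_variants` is hypothetical, and the input is assumed to be a bare canonical name without authors.

```python
from __future__ import annotations


def species_group_variants(name: str) -> list[str]:
    """Return the name plus its species-group counterpart, if one applies.

    Assumes a canonical name of plain words: no authorship, no rank markers.
    """
    words = name.split()
    if len(words) == 2:
        # Binomial: add the trinomial that repeats the specific epithet.
        genus, epithet = words
        return [name, f"{genus} {epithet} {epithet}"]
    if len(words) == 3 and words[1] == words[2]:
        # Trinomial with a repeated epithet: add the matching binomial.
        return [name, f"{words[0]} {words[1]}"]
    # Anything else has no species-group counterpart in this sketch.
    return [name]


print(species_group_variants("Aus bus"))      # ['Aus bus', 'Aus bus bus']
print(species_group_variants("Aus bus bus"))  # ['Aus bus bus', 'Aus bus']
```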
relaxed fuzzy-match
Relaxes fuzzy-matching rules, allowing fuzzy match for words of any size, and
increasing maximum edit distance (for stems) to two. This creates many more
false positives, but increases recall. It is recommended to check results by
hand if this feature is enabled. The maximum number of names allowed when this
option is enabled is 50.
gnverifier -R "Bbo bbo"
gnverifier --fuzzy_relaxed "Bbo bbo"
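To make the term "edit distance" concrete, here is a plain Levenshtein distance in Python. This is only an illustration of the general idea: GNverifier's matching runs on the remote server, works on stems, and applies its own rules, none of which are reproduced here. The function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance: insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when equal
            ))
        prev = cur
    return prev[-1]


print(edit_distance("Bbo", "Bubo"))            # 1
print(edit_distance("Pomatmus", "Pomatomus"))  # 1
```

Both misspellings used in the examples of this document are one edit away from the intended word, so a limit of two leaves room for noticeably worse typos, at the cost of more false positives.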
fuzzy-match of uninomial names
When the fuzzy_uninomial flag is on, uninomials are allowed to go through
fuzzy matching, if needed. Normally this flag is off because fuzzy-matched
uninomials create a significant number of false positives.
gnverifier -U "Pomatmus"
gnverifier --fuzzy_uninomial "Pomatmus"
format
Selects the output format. Supported formats are:
- compact: one-liner JSON.
- pretty: prettified JSON with new lines and tabs for easier reading.
- tsv: returns tab-separated values representation.
- csv: (DEFAULT) returns comma-separated values representation.
# short form for compact JSON format
gnverifier -f compact file.txt
# or long form for "pretty" JSON format
gnverifier --format="pretty" file.csv
# tsv format
gnverifier -f tsv file.csv
Note that a separate JSON "document" is returned for each separate record,
instead of returning one big JSON document for all records. For large lists it
significantly speeds up parsing of the JSON on the user side.
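Because each record is its own JSON document, compact output can be consumed line by line. The sketch below assumes the compact format (one document per line) and uses an invented `name` field in the sample data purely as a placeholder. The real field names should be taken from actual output or the API docs.

```python
from __future__ import annotations

import json


def parse_records(output: str) -> list[dict]:
    """Parse line-delimited JSON: one document per non-empty line."""
    records = []
    for line in output.splitlines():
        line = line.strip()
        if line:  # skip blank lines between documents
            records.append(json.loads(line))
    return records


# Hypothetical sample; the "name" key is a placeholder, not a documented field.
sample = '{"name": "Bubo bubo"}\n{"name": "Pardosa moesta"}\n'
for record in parse_records(sample):
    print(record["name"])
```

This approach does not apply to the pretty format, where a single document spans several lines.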
jobs
If the list of names is very large, it is possible to tell GNverifier to
run requests in parallel. In this example GNverifier will run 8 processes
simultaneously. The order of returned names will be somewhat randomized.
gnverifier -j 8 file.txt
# or
gnverifier --jobs=8 file.tsv
Sometimes it is important to return names in exactly the same order. For such
cases set the jobs flag to 1.
gnverifier -j 1 file.txt
This option is ignored by advanced search.
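If parallel jobs are needed and the order still matters, the results can be re-sorted afterwards. The sketch below assumes that every parsed result carries the submitted name-string under a key called `name`. That key and the function `restore_input_order` are hypothetical and need to be adapted to the actual output columns or fields.

```python
from __future__ import annotations


def restore_input_order(names: list[str], results: list[dict],
                        key: str = "name") -> list[dict]:
    """Sort results so they follow the order of the submitted names."""
    position: dict[str, int] = {}
    for index, name in enumerate(names):
        # For duplicated names the first occurrence defines the position.
        position.setdefault(name, index)
    # sorted() is stable, so results for the same name keep their order.
    return sorted(results, key=lambda row: position[row[key]])


names = ["Bubo bubo", "Puma concolor", "Pardosa moesta"]
shuffled = [{"name": "Pardosa moesta"}, {"name": "Bubo bubo"},
            {"name": "Puma concolor"}]
print([row["name"] for row in restore_input_order(names, shuffled)])
# ['Bubo bubo', 'Puma concolor', 'Pardosa moesta']
```

A result whose name is missing from the input list raises a KeyError in this sketch, which makes mismatches visible instead of silently misplacing rows.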
quiet
Removes log messages from the output. Note that results of verification go
to STDOUT, while log messages go to STDERR. So instead of using the -q flag,
STDERR can be redirected to /dev/null:
gnverifier "Puma concolor" -q >verif-results.csv
# or
gnverifier "Puma concolor" 2>/dev/null >verif-results.csv
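The same separation of streams can be used when calling the program from a script. The sketch below captures STDOUT and discards STDERR with Python's subprocess module. To stay runnable without GNverifier installed, it uses a small stand-in child process that prints one line to each stream. The helper name `run_quietly` is hypothetical.

```python
from __future__ import annotations

import subprocess
import sys


def run_quietly(cmd: list[str]) -> str:
    """Run a command, return its STDOUT as text, and drop its STDERR."""
    done = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,     # results are captured
        stderr=subprocess.DEVNULL,  # log messages are discarded
        text=True,
        check=True,                 # raise if the command fails
    )
    return done.stdout


# Stand-in for gnverifier: one line to STDOUT, one line to STDERR.
stand_in = [sys.executable, "-c",
            "import sys; print('result row'); print('log line', file=sys.stderr)"]
print(run_quietly(stand_in).strip())  # result row
```

With GNverifier on the PATH, the call would look like `run_quietly(["gnverifier", "Puma concolor"])`.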
sources
By default GNverifier returns only one "best" result of a match. If a user
has a particular interest in a data set, they can set it with this option, and
all matches that exist for this source will be returned as well. You need to
provide a data source ID for a dataset. IDs can be found at the following
URL. Some of them are provided in the GNverifier help output as well.
Data from such sources will be returned in the preferred_results section of JSON
output, or in CSV/TSV rows that start with the "PreferredMatch" string.
gnverifier file.csv -s "1,11,172"
# or
gnverifier file.tsv --sources="12"
# or
cat file.txt | gnverifier -s '1,12'
If all matched sources need to be returned, set the flag to "0".
WARNING: the result might be excessively large.
gnverifier "Bubo bubo" -s 0
# potentially even more results get returned by adding --all_matches flag
gnverifier "Bubo bubo" -s 0 -M
The sources option overrides the ds: setting in the case of advanced search.
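When CSV output mixes best matches with rows from preferred sources, the two groups can be separated by looking at the first field of each row. This is a minimal sketch: the sample columns (`Kind`, `ScientificName`) and the `BestMatch` value are invented for illustration, and only the "PreferredMatch" prefix comes from the description above.

```python
from __future__ import annotations

import csv
import io


def split_preferred(csv_text: str) -> tuple[list[list[str]], list[list[str]]]:
    """Split CSV rows into (other rows, rows starting with PreferredMatch)."""
    other, preferred = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # ignore empty lines
        if row[0].startswith("PreferredMatch"):
            preferred.append(row)
        else:
            other.append(row)  # header and best-match rows end up here
    return other, preferred


# Hypothetical sample; real output has different and more numerous columns.
sample_csv = ("Kind,ScientificName\n"
              "BestMatch,Bubo bubo\n"
              "PreferredMatch,Bubo bubo\n")
other_rows, preferred_rows = split_preferred(sample_csv)
print(len(other_rows), len(preferred_rows))  # 2 1
```

For TSV output the same idea works with `csv.reader(..., delimiter="\t")`.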
Configuration file
If you find yourself using the same flags over and over again, it makes sense
to edit the configuration file instead. It is located at
$HOME/.config/gnverifier.yaml. After that you do not need to use command line
options and flags. The configuration file is self-documented; the default
gnverifier.yaml is located on GitHub.
gnverifier file.txt
If GNverifier runs as a web-based user interface, it is also
possible to use environment variables for configuration.
| Env. Var. | Configuration |
|---|---|
| GNV_FORMAT | Format |
| GNV_DATA_SOURCES | DataSources |
| GNV_WITH_ALL_MATCHES | WithAllMatches |
| GNV_WITH_CAPITALIZATION | WithCapitalization |
| GNV_VERIFIER_URL | VerifierURL |
| GNV_JOBS | Jobs |
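The mapping in the table can be expressed as a small lookup, for example to check which settings a deployment environment would override. This sketch only mirrors the table. It is not GNverifier code, the function name `config_from_env` is hypothetical, and values are kept as raw strings because their expected syntax is not specified here.

```python
from __future__ import annotations

import os
from typing import Mapping

# Environment variable -> configuration field, copied from the table above.
ENV_TO_CONFIG = {
    "GNV_FORMAT": "Format",
    "GNV_DATA_SOURCES": "DataSources",
    "GNV_WITH_ALL_MATCHES": "WithAllMatches",
    "GNV_WITH_CAPITALIZATION": "WithCapitalization",
    "GNV_VERIFIER_URL": "VerifierURL",
    "GNV_JOBS": "Jobs",
}


def config_from_env(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the configuration fields that are set in the environment."""
    return {field: environ[var]
            for var, field in ENV_TO_CONFIG.items()
            if var in environ}


print(config_from_env({"GNV_FORMAT": "compact", "GNV_JOBS": "4"}))
# {'Format': 'compact', 'Jobs': '4'}
```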
Advanced Search Query Language
Example: g:M. sp:gallop. au:Oliv. y:1750-1799 or n:M. gallop. Oliv. 1750-1799
The query language allows searching for scientific names using name components
like genus name, specific epithet, infraspecific epithet, author, year.
It includes the following operators:
g:
: Genus name, can be abbreviated (for example g:Bubo, g:B.).
sp:
: Specific epithet, can be abbreviated (for example sp:galloprovincialis, sp:gallop.).
isp:
: Infraspecific epithet, can be abbreviated (for example isp:auspicalis, isp:ausp.).
asp:
: Either specific, or infraspecific epithet (for example asp:bubo).
au:
: One of the authors of a name, can be abbreviated (for example au:Linn., au:Linnaeus).
y:
: Year. Can be one year, or a year range (for example y:1888, y:1800-1802, y:1756-, y:-1880).
ds:
: Limit results to one or more data sources (for example ds:1,2,172). Note that the command line sources option, if given, overrides this setting.
tx:
: Parent taxon. Limit results to names that contain a particular higher taxon in their classification. If ds: is given, uses the classification of the first data source in the setting. If ds: is not given, uses the managerial classification of the Catalogue of Life (for example tx:Hemiptera, tx:Animalia, tx:Magnoliopsida).
all:
: If true, GNverifier will show all results, not only the best ones. The setting can be true or false (all:t, all:f). This setting also becomes true if the sources command line option is set to 0.
n:
: A "name" setting. It allows combining several query components for convenience. Note that it is not a 'real' scientific name, but a shortcut to enter several settings at once, loosely following the rules of nomenclature (n:B. bubo Linn. 1758). For example, in contrast with GNparser results, it is possible to have abbreviated specific epithets or a range of years: n:Mono. gall. Oliv. 1750-1800.
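The four year forms accepted by y: can be read as a closed or half-open interval. The sketch below shows one way to interpret them on the client side, for example to validate input before building a query. The function `parse_year_range` is hypothetical, and treating a single year as a range with equal bounds is an assumption of this sketch.

```python
from __future__ import annotations

from typing import Optional, Tuple


def parse_year_range(value: str) -> Tuple[Optional[int], Optional[int]]:
    """Turn '1888', '1800-1802', '1756-' or '-1880' into (start, end).

    None marks an open end of the range.
    """
    if "-" not in value:
        year = int(value)
        return (year, year)  # assumption: one year equals a one-year range
    start, end = value.split("-", 1)
    return (int(start) if start else None, int(end) if end else None)


for text in ("1888", "1800-1802", "1756-", "-1880"):
    print(text, parse_year_range(text))
# 1888 (1888, 1888)
# 1800-1802 (1800, 1802)
# 1756- (1756, None)
# -1880 (None, 1880)
```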
The gender of species epithets is often given incorrectly. Because of that, the
search tries to detect names in any gender that correspond to the epithet.
The search requires either an sp:, isp: or asp: setting, or their analogs in the n: setting.
Examples of searches
gnverifier "n:Pom. saltator tx:Animalia y:1750-"
gnverifier "g:Plantago asp:major au:Linn."
gnverifier "g:Cara. isp:daurica ds:1,12"
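Query strings like the ones above can also be assembled programmatically. The following sketch joins operator and value pairs in a fixed order and enforces the rule that an epithet setting (or n:) must be present. The helper `build_query` is hypothetical. It does not validate the values themselves and cannot check whether an n: value actually contains an epithet.

```python
# Operators of the query language, in the order used for the output string.
OPERATORS = ["n", "g", "sp", "isp", "asp", "au", "y", "ds", "tx", "all"]


def build_query(**parts: str) -> str:
    """Build an advanced-search query string from operator=value pairs."""
    unknown = set(parts) - set(OPERATORS)
    if unknown:
        raise ValueError(f"unknown operators: {sorted(unknown)}")
    # The search needs sp:, isp: or asp:, or their analogs given through n:.
    if not any(op in parts for op in ("n", "sp", "isp", "asp")):
        raise ValueError("one of n, sp, isp or asp is required")
    return " ".join(f"{op}:{parts[op]}" for op in OPERATORS if op in parts)


print(build_query(g="B.", sp="bubo", au="Linn.", y="1700-"))
# g:B. sp:bubo au:Linn. y:1700-
```

The resulting string can be passed to the command line as a single quoted argument, as in the examples above.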
Copyright
Authors: Dmitry Mozzherin
Copyright © 2020-2024 Dmitry Mozzherin. See LICENSE for further details.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "GNverifier -- a reconciler and resolver of scientific names against more than 100 data sources."
version: v1.2.2
authors:
  - family-names: "Mozzherin"
    given-names: "Dmitry"
    orcid: "https://orcid.org/0000-0003-1593-1417"
repository-code: "https://github.com/gnames/gnverifier"
date-released: 2024-11-04
doi: 10.5281/zenodo.10070488
license: MIT
Owner metadata
- Name: gnames
- Login: gnames
- Email:
- Kind: organization
- Description:
- Website:
- Location:
- Twitter:
- Company:
- Icon url: https://avatars.githubusercontent.com/u/11817407?v=4
- Repositories: 30
- Last synced at: 2023-02-27T19:45:42.073Z
- Profile URL: https://github.com/gnames
GitHub Events
Total
- Issues event: 23
- Watch event: 4
- Issue comment event: 59
- Push event: 9
- Create event: 4
Last Year
- Issues event: 23
- Watch event: 4
- Issue comment event: 59
- Push event: 9
- Create event: 4
Committers metadata
Last synced: 7 days ago
Total Commits: 156
Total Committers: 1
Avg Commits per committer: 156.0
Development Distribution Score (DDS): 0.0
Commits in past year: 16
Committers in past year: 1
Avg Commits per committer in past year: 16.0
Development Distribution Score (DDS) in past year: 0.0
| Name | Email | Commits |
|---|---|---|
| Dmitry Mozzherin | d****n@g****m | 156 |
Issue and Pull Request metadata
Last synced: 2 days ago
Total issues: 132
Total pull requests: 0
Average time to close issues: 24 days
Average time to close pull requests: N/A
Total issue authors: 19
Total pull request authors: 0
Average comments per issue: 2.48
Average comments per pull request: 0
Merged pull request: 0
Bot issues: 0
Bot pull requests: 0
Past year issues: 24
Past year pull requests: 0
Past year average time to close issues: 15 days
Past year average time to close pull requests: N/A
Past year issue authors: 9
Past year pull request authors: 0
Past year average comments per issue: 3.42
Past year average comments per pull request: 0
Past year merged pull request: 0
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- dimus (74)
- abubelinha (15)
- BenMerSci (11)
- Adafede (7)
- dshorthouse (5)
- larsgw (3)
- Jegelewicz (3)
- mjy (2)
- thompsonmj (2)
- aguilbau (1)
- Rekyt (1)
- p-schaefer (1)
- ka7eh (1)
- cpauvert (1)
- frousseu (1)
Top Pull Request Authors
Top Issue Labels
- bug (10)
- question (5)
- help wanted (4)
- wontfix (3)
- duplicate (2)
- Epic (1)
Top Pull Request Labels
Package metadata
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 2
- Total dependent repositories: 2
- Total versions: 58
proxy.golang.org: github.com/gnames/gnverifier
Copyright © 2020 Dmitry Mozzherin

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- Homepage: https://github.com/gnames/gnverifier
- Documentation: https://pkg.go.dev/github.com/gnames/gnverifier#section-documentation
- Licenses: MIT
- Latest release: v1.2.5 (published 25 days ago)
- Last Synced: 2025-04-25T13:04:30.885Z (2 days ago)
- Versions: 58
- Dependent Packages: 2
- Dependent Repositories: 2
- Rankings:
  - Dependent repos count: 3.495%
  - Dependent packages count: 4.172%
  - Average: 7.788%
  - Stargazers count: 10.056%
  - Forks count: 13.428%
Dependencies
- github.com/davecgh/go-spew v1.1.1
- github.com/dnaeon/go-vcr v1.2.0
- github.com/dustin/go-humanize v1.0.0
- github.com/fsnotify/fsnotify v1.5.4
- github.com/gnames/gnfmt v0.2.0
- github.com/gnames/gnlib v0.14.0
- github.com/gnames/gnquery v0.3.3
- github.com/gnames/gnstats v0.1.0
- github.com/gnames/gnsys v0.2.2
- github.com/gnames/gnuuid v0.1.1
- github.com/golang-jwt/jwt v3.2.2+incompatible
- github.com/golang/snappy v0.0.4
- github.com/google/uuid v1.3.0
- github.com/hashicorp/hcl v1.0.0
- github.com/inconshreveable/mousetrap v1.0.0
- github.com/json-iterator/go v1.1.12
- github.com/kr/pretty v0.3.0
- github.com/labstack/echo/v4 v4.7.2
- github.com/labstack/gommon v0.3.1
- github.com/magiconair/properties v1.8.6
- github.com/mattn/go-colorable v0.1.12
- github.com/mattn/go-isatty v0.0.14
- github.com/maxbrunsfeld/counterfeiter/v6 v6.5.0
- github.com/mitchellh/mapstructure v1.5.0
- github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd
- github.com/modern-go/reflect2 v1.0.2
- github.com/nsqio/go-nsq v1.1.0
- github.com/pelletier/go-toml v1.9.5
- github.com/pelletier/go-toml/v2 v2.0.1
- github.com/pmezard/go-difflib v1.0.0
- github.com/pointlander/compress v1.1.1-0.20190518213731-ff44bd196cc3
- github.com/pointlander/jetset v1.0.1-0.20190518214125-eee7eff80bd4
- github.com/pointlander/peg v1.0.1
- github.com/rs/zerolog v1.26.1
- github.com/sfgrp/lognsq v0.1.1
- github.com/spf13/afero v1.8.2
- github.com/spf13/cast v1.4.1
- github.com/spf13/cobra v1.4.0
- github.com/spf13/jwalterweatherman v1.1.0
- github.com/spf13/pflag v1.0.5
- github.com/spf13/viper v1.11.0
- github.com/stretchr/testify v1.7.1
- github.com/subosito/gotenv v1.2.0
- github.com/valyala/bytebufferpool v1.0.0
- github.com/valyala/fasttemplate v1.2.1
- golang.org/x/crypto v0.0.0-20220507011949-2cf3adece122
- golang.org/x/mod v0.6.0-dev.0.20220106191415-9b9b3d81d5e3
- golang.org/x/net v0.0.0-20220425223048-2871e0cb64e4
- golang.org/x/sys v0.0.0-20220503163025-988cb79eb6c6
- golang.org/x/text v0.3.7
- golang.org/x/time v0.0.0-20220411224347-583f2d630306
- golang.org/x/tools v0.1.10
- golang.org/x/xerrors v0.0.0-20220411194840-2f41105eb62f
- gopkg.in/ini.v1 v1.66.4
- gopkg.in/yaml.v2 v2.4.0
- gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b
- 523 dependencies
- alpine 3.14 build
Score: 4.969813299576001