{"id":349062,"name":"GBIF Name Parser","description":"The core GBIF scientific name parser library.","url":"https://github.com/gbif/name-parser","last_synced_at":"2026-06-23T21:00:28.815Z","repository":{"id":13510928,"uuid":"16201889","full_name":"gbif/name-parser","owner":"gbif","description":"The core GBIF scientific name parser library","archived":false,"fork":false,"pushed_at":"2026-06-15T09:50:30.000Z","size":3212,"stargazers_count":19,"open_issues_count":52,"forks_count":4,"subscribers_count":18,"default_branch":"master","last_synced_at":"2026-06-17T18:04:41.007Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gbif.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2014-01-24T10:44:23.000Z","updated_at":"2026-06-13T14:21:08.000Z","dependencies_parsed_at":"2026-03-31T05:00:29.606Z","dependency_job_id":null,"html_url":"https://github.com/gbif/name-parser","commit_stats":null,"previous_names":[],"tags_count":73,"template":false,"template_full_name":null,"purl":"pkg:github/gbif/name-parser","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gbif%2Fname-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gbif%2Fname-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gbif%2Fname-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gbif%2Fname-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gbif","download_url":"https://codeload.github.com/gbif/name-parser/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gbif%2Fname-parser/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34544413,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-19T02:00:06.005Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"owner":{"login":"gbif","name":"Global Biodiversity Information Facility","uuid":"1963797","kind":"organization","description":"","email":null,"website":"https://www.gbif.org","location":"Copenhagen, Denmark","twitter":null,"company":null,"icon_url":"https://avatars.githubusercontent.com/u/1963797?v=4","repositories_count":288,"last_synced_at":"2024-04-14T06:45:04.085Z","metadata":{"has_sponsors_listing":false},"html_url":"https://github.com/gbif","funding_links":[],"total_stars":760,"followers":101,"following":0,"created_at":"2022-11-05T14:41:11.499Z","updated_at":"2024-04-14T06:46:12.959Z","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gbif","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gbif/repositories"},"packages":[],"commits":{"id":11691961,"full_name":"gbif/name-parser","default_branch":"master","total_commits":660,"total_committers":14,"total_bot_commits":2,"total_bot_committers":1,"mean_commits":47.142857142857146,"dds":0.2621212121212121,"past_year_total_commits":90,"past_year_total_committers":2,"past_year_total_bot_commits":0,"past_year_total_bot_committers":0,"past_year_mean_commits":45.0,"past_year_dds":0.1333333333333333,"last_synced_at":"2026-06-21T20:02:14.555Z","last_synced_commit":"20f7d6e6b603ef40927087f62a6ce625d4723217","created_at":"2026-03-23T01:01:37.089Z","updated_at":"2026-06-21T20:01:08.165Z","committers":[{"name":"Markus Döring","email":"mdoering@gbif.org","login":"mdoering","count":487},{"name":"gbif-jenkins","email":"dev@gbif.org","login":"gbif-jenkins","count":96},{"name":"gbif-jenkins","email":"jenkins@rancor.gbif.org","login":null,"count":30},{"name":"gbif-jenkins","email":"jenkins@jenkins-vh.gbif.org","login":null,"count":18},{"name":"pal155","email":"Doug.Palmer@csiro.au","login":"charvolant","count":8},{"name":"Federico Mendez","email":"fmendez@gbif.org","login":"fmendezh","count":7},{"name":"Oliver Meyn","email":"oliver@mineallmeyn.com","login":"omeyn","count":3},{"name":"Kyle Braak","email":"kbraak@gbif.org","login":null,"count":3},{"name":"dependabot[bot]","email":"49699333+dependabot[bot]","login":"dependabot[bot]","count":2},{"name":"Matthew Blissett","email":"mblissett@gbif.org","login":"MattBlissett","count":2},{"name":"Thomas Stjernegaard Jeppesen","email":"tsjeppesen@gbif.org","login":"thomasstjerne","count":1},{"name":"Nikolay Volik","email":"nvolik@gbif.org","login":"muttcg","count":1},{"name":"Christian Gendreau","email":"cgendreau","login":"cgendreau","count":1},{"name":"Jorrit Poelen","email":"jhpoelen@gmail.com","login":null,"count":1}],"past_year_committers":[{"name":"Markus Döring","email":"mdoering@gbif.org","login":"mdoering","count":78},{"name":"gbif-jenkins","email":"dev@gbif.org","login":"gbif-jenkins","count":12}],"commits_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub/repositories/gbif%2Fname-parser/commits","host":{"name":"GitHub","url":"https://github.com","kind":"github","last_synced_at":"2026-06-23T00:00:10.068Z","repositories_count":6266913,"commits_count":874849903,"contributors_count":35100938,"owners_count":1170691,"icon_url":"https://github.com/github.png","host_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub/repositories"}},"issues_stats":{"full_name":"gbif/name-parser","html_url":"https://github.com/gbif/name-parser","last_synced_at":"2026-06-15T17:00:25.904Z","status":"active","issues_count":4,"pull_requests_count":3,"avg_time_to_close_issue":114712.66666666667,"avg_time_to_close_pull_request":57980611.0,"issues_closed_count":3,"pull_requests_closed_count":2,"pull_request_authors_count":2,"issue_authors_count":3,"avg_comments_per_issue":1.25,"avg_comments_per_pull_request":0.3333333333333333,"merged_pull_requests_count":0,"bot_issues_count":0,"bot_pull_requests_count":3,"past_year_issues_count":3,"past_year_pull_requests_count":1,"past_year_avg_time_to_close_issue":169850.5,"past_year_avg_time_to_close_pull_request":null,"past_year_issues_closed_count":2,"past_year_pull_requests_closed_count":0,"past_year_pull_request_authors_count":1,"past_year_issue_authors_count":3,"past_year_avg_comments_per_issue":1.3333333333333333,"past_year_avg_comments_per_pull_request":0.0,"past_year_bot_issues_count":0,"past_year_bot_pull_requests_count":1,"past_year_merged_pull_requests_count":0,"created_at":"2025-08-29T18:55:33.165Z","updated_at":"2026-06-15T17:00:25.904Z","repository_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories/gbif%2Fname-parser","issues_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories/gbif%2Fname-parser/issues","issue_labels_count":{"bug":1},"pull_request_labels_count":{"dependencies":2,"java":1},"issue_author_associations_count":{"NONE":3,"MEMBER":1},"pull_request_author_associations_count":{"CONTRIBUTOR":2,"NONE":1},"issue_authors":{"djtfmartin":2,"CecSve":1,"mdoering":1},"pull_request_authors":{"dependabot[bot]":2,"renovate[bot]":1},"host":{"name":"GitHub","url":"https://github.com","kind":"github","last_synced_at":"2026-06-17T00:00:21.091Z","repositories_count":14814369,"issues_count":33099693,"pull_requests_count":109301442,"authors_count":11309610,"icon_url":"https://github.com/github.png","host_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories","owners_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/owners","authors_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors"},"past_year_issue_labels_count":{"bug":1},"past_year_pull_request_labels_count":{"dependencies":1,"java":1},"past_year_issue_author_associations_count":{"MEMBER":1,"NONE":1},"past_year_pull_request_author_associations_count":{"CONTRIBUTOR":1},"past_year_issue_authors":{"djtfmartin":1,"mdoering":1},"past_year_pull_request_authors":{"dependabot[bot]":1},"maintainers":[{"login":"mdoering","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/mdoering"}],"active_maintainers":[{"login":"mdoering","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/mdoering"}]},"events":{"total":{"DeleteEvent":3,"PullRequestEvent":1,"ForkEvent":1,"IssuesEvent":2,"WatchEvent":1,"IssueCommentEvent":4,"PushEvent":61,"CreateEvent":6},"last_year":{"DeleteEvent":1,"PullRequestEvent":1,"IssuesEvent":1,"WatchEvent":1,"IssueCommentEvent":2,"PushEvent":44,"CreateEvent":3}},"keywords":[],"dependencies":[{"ecosystem":"maven","filepath":"name-parser/pom.xml","sha":null,"kind":"manifest","created_at":"2022-09-05T02:42:14.134Z","updated_at":"2022-09-05T02:42:14.134Z","repository_link":"https://github.com/gbif/name-parser/blob/master/name-parser/pom.xml","dependencies":[{"id":3858246688,"package_name":"org.gbif:name-parser-api","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246689,"package_name":"org.slf4j:slf4j-api","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246690,"package_name":"commons-io:commons-io","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246691,"package_name":"org.apache.commons:commons-lang3","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246692,"package_name":"com.google.guava:guava","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246693,"package_name":"org.gbif:name-parser-api","ecosystem":"maven","requirements":null,"direct":true,"kind":"test","optional":false},{"id":3858246694,"package_name":"junit:junit","ecosystem":"maven","requirements":null,"direct":true,"kind":"test","optional":false},{"id":3858246695,"package_name":"ch.qos.logback:logback-classic","ecosystem":"maven","requirements":null,"direct":true,"kind":"test","optional":false}]},{"ecosystem":"maven","filepath":"name-parser-api/pom.xml","sha":null,"kind":"manifest","created_at":"2022-09-05T02:42:14.292Z","updated_at":"2022-09-05T02:42:14.292Z","repository_link":"https://github.com/gbif/name-parser/blob/master/name-parser-api/pom.xml","dependencies":[{"id":3858246799,"package_name":"com.google.code.findbugs:jsr305","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246800,"package_name":"com.google.guava:guava","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246801,"package_name":"org.apache.commons:commons-lang3","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246802,"package_name":"org.slf4j:slf4j-api","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246803,"package_name":"junit:junit","ecosystem":"maven","requirements":null,"direct":true,"kind":"test","optional":false},{"id":3858246804,"package_name":"commons-io:commons-io","ecosystem":"maven","requirements":null,"direct":true,"kind":"test","optional":false},{"id":3858246805,"package_name":"ch.qos.logback:logback-classic","ecosystem":"maven","requirements":null,"direct":true,"kind":"test","optional":false}]},{"ecosystem":"maven","filepath":"name-parser-v1/pom.xml","sha":null,"kind":"manifest","created_at":"2022-09-05T02:42:14.390Z","updated_at":"2022-09-05T02:42:14.390Z","repository_link":"https://github.com/gbif/name-parser/blob/master/name-parser-v1/pom.xml","dependencies":[{"id":3858246865,"package_name":"org.gbif:gbif-api","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246866,"package_name":"org.gbif:name-parser","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246867,"package_name":"org.gbif:name-parser-api","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246868,"package_name":"org.slf4j:slf4j-api","ecosystem":"maven","requirements":null,"direct":true,"kind":"runtime","optional":false},{"id":3858246869,"package_name":"junit:junit","ecosystem":"maven","requirements":null,"direct":true,"kind":"test","optional":false},{"id":3858246870,"package_name":"ch.qos.logback:logback-classic","ecosystem":"maven","requirements":null,"direct":true,"kind":"test","optional":false}]},{"ecosystem":"maven","filepath":"pom.xml","sha":null,"kind":"manifest","created_at":"2022-09-05T02:42:14.530Z","updated_at":"2022-09-05T02:42:14.530Z","repository_link":"https://github.com/gbif/name-parser/blob/master/pom.xml","dependencies":[{"id":3858248593,"package_name":"com.google.code.findbugs:jsr305","ecosystem":"maven","requirements":"3.0.2","direct":true,"kind":"runtime","optional":false},{"id":3858248594,"package_name":"org.gbif:name-parser","ecosystem":"maven","requirements":"3.7.3-SNAPSHOT","direct":true,"kind":"runtime","optional":false},{"id":3858248595,"package_name":"org.gbif:name-parser-api","ecosystem":"maven","requirements":"3.7.3-SNAPSHOT","direct":true,"kind":"runtime","optional":false},{"id":3858248596,"package_name":"org.gbif:name-parser-gbif","ecosystem":"maven","requirements":"3.7.3-SNAPSHOT","direct":true,"kind":"runtime","optional":false},{"id":3858248597,"package_name":"org.gbif:gbif-api","ecosystem":"maven","requirements":"0.166","direct":true,"kind":"runtime","optional":false},{"id":3858248598,"package_name":"org.slf4j:slf4j-api","ecosystem":"maven","requirements":"1.7.24","direct":true,"kind":"runtime","optional":false},{"id":3858248599,"package_name":"commons-io:commons-io","ecosystem":"maven","requirements":"2.8.0","direct":true,"kind":"runtime","optional":false},{"id":3858248600,"package_name":"org.apache.commons:commons-lang3","ecosystem":"maven","requirements":"3.12.0","direct":true,"kind":"runtime","optional":false},{"id":3858248601,"package_name":"com.google.guava:guava","ecosystem":"maven","requirements":"28.0-jre","direct":true,"kind":"runtime","optional":false},{"id":3858248602,"package_name":"org.gbif:name-parser-api","ecosystem":"maven","requirements":"3.7.3-SNAPSHOT","direct":true,"kind":"test","optional":false},{"id":3858248603,"package_name":"junit:junit","ecosystem":"maven","requirements":"4.12","direct":true,"kind":"test","optional":false},{"id":3858248604,"package_name":"ch.qos.logback:logback-classic","ecosystem":"maven","requirements":"1.2.3","direct":true,"kind":"test","optional":false}]}],"score":6.901737206656573,"created_at":"2026-03-21T00:13:30.378Z","updated_at":"2026-06-23T21:00:28.821Z","avatar_url":"https://github.com/gbif.png","language":"Java","category":"Biosphere","sub_category":"Biodiversity Data Cleaning and Standardization","monthly_downloads":0,"total_dependent_repos":0,"total_dependent_packages":0,"readme":"# GBIF Name Parser\n\nA library and command-line tool that parses scientific names — including the\nauthorship, rank, hybrid markers and nomenclatural notes — into a structured\n[`ParsedName`](name-parser-api/src/main/java/org/gbif/nameparser/api/ParsedName.java)\nmodel.\n\n## Modules\n\n| Module | Purpose |\n|---|---|\n| `name-parser-api`  | Pure model + interface module: `ParsedName`, `Authorship`, `Rank`, `NomCode`, `NameType`, the `NameParser` interface, plus formatter / Unicode utilities. Depend on this if you only need the data model. |\n| `name-parser`      | The parser implementation. Single public entry point: `org.gbif.nameparser.NameParserImpl`. |\n| `name-parser-cli`  | Command-line tools (`parse`, `compare`, `benchmark`) wrapping the parser, packaged as an executable shaded jar. |\n\nBuild everything with `mvn install` from the repo root.\n\n## Library use\n\n```xml\n\u003cdependency\u003e\n  \u003cgroupId\u003eorg.gbif\u003c/groupId\u003e\n  \u003cartifactId\u003ename-parser\u003c/artifactId\u003e\n  \u003cversion\u003e4.0.0-SNAPSHOT\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n```java\nNameParser parser = new NameParserImpl();\nParsedName pn = parser.parse(\"Vulpes vulpes silaceus Miller, 1907\", null, null, null);\n```\n\n## Command-line interface\n\nAfter `mvn install`, the executable jar is at\n`name-parser-cli/target/name-parser-cli-\u003cversion\u003e-shaded.jar`.\n\n```\njava -jar name-parser-cli-\u003cversion\u003e-shaded.jar \u003ccommand\u003e [options]\n```\n\n| Command | What it does |\n|---|---|\n| `parse`     | Stream a text file with one name per row through the parser and write a JSONL file (one JSON object per row). |\n| `compare`   | Stream two JSONL files in lockstep, report aggregate metrics and a per-row dump of every differing parsed value. |\n| `benchmark` | Measure parser throughput against a name-per-line input file (count, total / avg / min / p50 / p95 / max). |\n\nRun `\u003ccommand\u003e --help` for the full per-command option list.\n\nAll commands stream their input — memory use stays flat regardless of input size,\nso multi-million-row inputs are fine.\n\n### Bundled sample corpora\n\nSample inputs ship in `name-parser-cli/data/`:\n\n* `benchmark-data.txt` — ~8k mixed names (hand-picked + test-assertion inputs +\n  random Catalogue of Life rows with authorship) used for throughput benchmarking.\n  Top up with more random names anytime via:\n  ```sh\n  python3 name-parser-cli/scripts/append-colnames-sample.py [-n 2000] [--seed 17]\n  ```\n  The script reservoir-samples col-names.tsv in a single pass and appends rows\n  as `scientificName authorship` — manual edits to the benchmark file are\n  preserved.\n* `col-names.tsv` — the full Catalogue of Life names dump (~6.3M rows, ~340 MB,\n  not tracked in git — drop your own copy here)\n\nEach command's `--input` defaults assume you run it from the repo root.\n\n### `parse`\n\n```\nUsage: name-parser-cli parse [options]\n\nOptions:\n  --input=PATH    source file (default: data/col-names.tsv; '-' = stdin)\n  --output=PATH   target file (default: \u003cinput\u003e.\u003cformat-ext\u003e; '-' = stdout)\n  --format=FMT    output format: jsonl (default), json, csv, tsv\n                  csv / tsv produce a flat ColDP Name file with header\n  --quiet         suppress progress output\n  -h --help       print this message and exit\n```\n\nUse `-` as the input or output path to stream from stdin / to stdout — the\ncommand is fully unix-pipe friendly. Progress messages and the final summary\nare written to **stderr** so stdout stays a clean data stream:\n\n```sh\ncat names.txt | name-parser-cli parse --input=- --output=- --format=tsv | head\nxz -dc col-names.tsv.xz | name-parser-cli parse --input=- --output=- --format=jsonl \u003e col.jsonl\n```\n\n#### Input\n\nThe input format is auto-detected from the first non-blank, non-comment line:\n\n* **ColDP Name file** (TSV or CSV) — recognised when the header row contains\n  any [`ColdpTerm`](https://github.com/CatalogueOfLife/coldp/blob/master/README.md#name)\n  property names (looked up via `ColdpTerm.find`). Only the columns the parser\n  interface accepts are honoured: `ID`, `scientificName`, `authorship`, `rank`,\n  `code`. Other columns are read but ignored.\n* **Plain text** — one name per line. If a line contains a tab, only the\n  substring before the first tab is treated as the name (so `col-names.tsv` is\n  usable both as ColDP-style TSV and as bare plain text).\n\nLines starting with `#` and blank lines are skipped.\n\n#### Output formats\n\n| Format | Description |\n|---|---|\n| `jsonl` (default) | One self-contained JSON object per line; consumed by `compare`. |\n| `json` | Single document containing a JSON array of all rows (streamed; not held in memory). |\n| `csv` / `tsv` | Flat [ColDP Name file](https://github.com/CatalogueOfLife/coldp/blob/master/README.md#name) with header row. |\n\nJSON / JSONL rows look like:\n\n```json\n{\"line\":42,\"id\":\"42\",\"input\":\"Felis catus\",\"parsed\":{ ...full ParsedName... }}\n{\"line\":99,\"id\":\"99\",\"input\":\"Iridoviridae\",\"error\":{\"type\":\"VIRUS\",\"message\":\"...\"}}\n```\n\nThe `id` field is populated from the ColDP `ID` column when present; otherwise\nit is omitted.\n\n#### ColDP CSV/TSV column mapping\n\nEvery structural `ParsedName` field maps to a ColDP column. Where the ColDP\n`Name` entity lacks a column but the `NameUsage` entity defines one, that\nNameUsage term is used (`nameStatus`, `namePhrase`, `namePublishedInPage`,\n`provisional`, `extinct`). Parser-only fields without a ColDP equivalent are\nwritten into custom columns prefixed with `np:` — strict ColDP readers ignore\nunknown columns, so the file stays valid ColDP.\n\nMulti-value rules: author lists join with `|` (the ColDP convention); `notho`\nparts join with `,`.\n\n| `ParsedName` field | ColDP column |\n|---|---|\n| `id` (from input) | `ID` (falls back to verbatim scientificName when absent) |\n| `canonicalNameWithoutAuthorship()` (`Candidatus ` prefixed when applicable) | `scientificName` |\n| `authorshipComplete()` | `authorship` |\n| `rank`, `code` | `rank`, `code` (lower-cased) |\n| `nomenclaturalNote` (or `manuscript` flag) | `nameStatus` |\n| `uninomial`, `genus`, `infragenericEpithet`, `specificEpithet`, `infraspecificEpithet`, `cultivarEpithet` | same column names |\n| `notho` (every flagged part, comma-joined) | `notho` |\n| `originalSpelling` | `originalSpelling` |\n| `combinationAuthorship.{authors,exAuthors,year}` | `combinationAuthorship`, `combinationExAuthorship`, `combinationAuthorshipYear` (authors joined with `\\|`) |\n| `basionymAuthorship.{authors,exAuthors,year}` | `basionymAuthorship`, `basionymExAuthorship`, `basionymAuthorshipYear` (authors joined with `\\|`) |\n| `publishedIn` (free text) | `namePublishedInPage` |\n| `extinct` | `extinct` |\n| `phrase` | `namePhrase` |\n| `doubtful` | `provisional` |\n| `type` (when not `SCIENTIFIC`) | `np:type` |\n| `sanctioningAuthor` | `np:sanctioningAuthor` |\n| `taxonomicNote` (sensu) | `np:taxonomicNote` |\n| `unparsed` | `np:unparsed` |\n| `warnings` (joined with `\\|`) | `np:warnings` |\n| (parser failure message) | `np:error` |\n\nUnparsable rows are still written: `ID`, `scientificName` (the verbatim input)\nand the `np:type` / `np:error` columns are populated.\n\n### `compare`\n\n```\nUsage: name-parser-cli compare [options] \u003ca.jsonl\u003e \u003cb.jsonl\u003e [diffs.txt]\n\nOptions:\n  --a=PATH              first JSONL file (alt. to first positional arg)\n  --b=PATH              second JSONL file (alt. to second positional arg)\n  --output=PATH         write per-row diffs here (default: stdout)\n  --ignore-whitespace   strip whitespace from string leaves before compare\n  --max-diffs=N         cap per-row diff dump at N rows (default: 100)\n  -h --help             print this message and exit\n```\n\nBoth inputs are expected to come from the same source file (matching line\nnumbers, same row order). The summary reports rows compared / identical /\ndiffering, status transitions (`PARSED→ERROR`, `ERROR→PARSED`, …) and the top\ndiffering field paths. Whitespace inside parsed string values is significant by\ndefault — pass `--ignore-whitespace` to suppress whitespace-only differences in\nparsed values (the JSON formatting itself is ignored either way).\n\n### `benchmark`\n\n```\nUsage: name-parser-cli benchmark [options]\n\nOptions:\n  --input=PATH    source file (default: data/benchmark-data.txt)\n  --warmup        do an extra untimed pass over the input first to warm the JIT\n  -h --help       print this message and exit\n```\n\nPure throughput measurement — every input row is parsed and timed. JIT warmup\nis opt-in via `--warmup`, in which case the input is streamed through the\nparser once without timing before the timed pass; on subsequent runs the\nHotSpot-warmed numbers tend to be ~10× lower. Nothing is written to disk; the\nreport goes to stdout.\n\n## License\n\nApache 2.0.\n","funding_links":[],"readme_doi_urls":[],"works":{},"citation_counts":{},"total_citations":0,"keywords_from_contributors":["biodiversity-informatics","darwin-core","taxonomy","gbif","tdwg","biodiversity","species","snapshot","interest-group"],"project_url":"https://ost.ecosyste.ms/api/v1/projects/349062","html_url":"https://ost.ecosyste.ms/projects/349062"}