https://github.com/alasdairforsythe/tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
Topics: text-tokenization, tokenisation, tokenization, tokenize, tokenizer, tokenizing, vocabulary, vocabulary-builder, vocabulary-generator
Added: over 1 year ago - Last Synced: 11 months ago - Created: May 12, 2023

  • Relevant topics? true
  • External users? true
  • Open source license? true
  • Active? true
  • Fork? false
  • Main Language: Go
  • Commits: 168
  • Committers: 1
  • Issues: 25
  • Pull Requests: 3