https://github.com/allenai/smashed

SMASHED is a toolkit designed to apply transformations to samples in datasets, such as fields extraction, tokenization, prompting, batching, and more. Supports datasets from Huggingface, torchdata iterables, or simple lists of dictionaries.
dataset datasets dict huggingface in-context-learning mappers natural-language-processing nlp pipeline prefix prefix-tuning preprocessing prompting pytorch text torchdata transformer transformers
Added: over 1 year ago - Last Synced: 11 months ago - Created: July 21, 2022

  • Relevant topics? true
  • External users? true
  • Open source license? true
  • Active? true
  • Fork? false
  • Main Language: Python
  • Commits: 145
  • Committers: 6
  • Issues: 1
  • Pull Requests: 65
  • Owner: allenai
  • Stars: 30
  • Forks: 3
  • Packages: 1
  • Downloads: 11,380