https://github.com/allenai/dolma

Data and tools for generating and inspecting OLMo pre-training data.
data-processing large-language-models llm machine-learning nlp
Added: over 1 year ago - Last Synced: about 1 year ago - Created: June 20, 2023

  • Relevant topics? true
  • External users? true
  • Open source license? true
  • Active? true
  • Fork? false
  • Main Language: Python
  • Commits: 233
  • Committers: 13
  • Issues: 87
  • Pull Requests: 135
  • Owner: allenai
  • Stars: 800
  • Forks: 78
  • Packages: 1
  • Downloads: 17,721

https://github.com/allenai/allenact

An open source framework for research in Embodied-AI from AI2.
ai ai2 computer-vision deep-learning embodied-agent python reinforcement-learning research
Added: over 1 year ago - Last Synced: about 1 year ago - Created: January 14, 2020

  • Relevant topics? true
  • External users? true
  • Open source license? true
  • Active? true
  • Fork? false
  • Main Language: Python
  • Commits: 1,833
  • Committers: 26
  • Issues: 44
  • Pull Requests: 106
  • Owner: allenai
  • Stars: 296
  • Forks: 46
  • Packages: 2
  • Downloads: 292

https://github.com/allenai/smashed

SMASHED is a toolkit for applying transformations to dataset samples, such as field extraction, tokenization, prompting, and batching. It supports Hugging Face datasets, torchdata iterables, and plain lists of dictionaries.
dataset datasets dict huggingface in-context-learning mappers natural-language-processing nlp pipeline prefix prefix-tuning preprocessing prompting pytorch text torchdata transformer transformers
Added: over 1 year ago - Last Synced: about 1 year ago - Created: July 21, 2022

  • Relevant topics? true
  • External users? true
  • Open source license? true
  • Active? true
  • Fork? false
  • Main Language: Python
  • Commits: 145
  • Committers: 6
  • Issues: 1
  • Pull Requests: 65
  • Owner: allenai
  • Stars: 30
  • Forks: 3
  • Packages: 1
  • Downloads: 11,380