https://github.com/allenai/dolma

Data and tools for generating and inspecting OLMo pre-training data.
data-processing large-language-models llm machine-learning nlp
Added: over 1 year ago - Last Synced: 11 months ago - Created: June 20, 2023

  • Relevant topics? true
  • External users? true
  • Open source license? true
  • Active? true
  • Fork? false
  • Main Language: Python
  • Commits: 233
  • Committers: 13
  • Issues: 87
  • Pull Requests: 135
  • Owner: allenai
  • Stars: 800
  • Forks: 78
  • Packages: 1
  • Downloads: 17,721