https://github.com/allenai/dolma
Data and tools for generating and inspecting OLMo pre-training data.
data-processing
large-language-models
llm
machine-learning
nlp
Added: over 1 year ago - Last Synced: about 1 year ago - Created: June 20, 2023

https://github.com/allenai/allenact
An open source framework for research in Embodied-AI from AI2.
ai
ai2
computer-vision
deep-learning
embodied-agent
python
reinforcement-learning
research
Added: over 1 year ago - Last Synced: about 1 year ago - Created: January 14, 2020

https://github.com/allenai/smashed
SMASHED is a toolkit for applying transformations to samples in datasets, such as field extraction, tokenization, prompting, batching, and more. It supports datasets from Hugging Face, torchdata iterables, or simple lists of dictionaries.
dataset
datasets
dict
huggingface
in-context-learning
mappers
natural-language-processing
nlp
pipeline
prefix
prefix-tuning
preprocessing
prompting
pytorch
text
torchdata
transformer
transformers
Added: over 1 year ago - Last Synced: about 1 year ago - Created: July 21, 2022
