https://github.com/allenai/smashed

SMASHED is a toolkit designed to apply transformations to samples in datasets, such as field extraction, tokenization, prompting, batching, and more. It supports datasets from Hugging Face, torchdata iterables, or simple lists of dictionaries.
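The following minimal plain-Python sketch shows the kind of per-sample transformation pipeline described above: extract fields, add a prompt, tokenize, then batch. It does not use SMASHED and is not its API. `extract_fields`, `add_prompt`, `whitespace_tokenize`, `run_pipeline`, and `batch` are hypothetical helpers, and whitespace splitting stands in for a real tokenizer. Samples are plain dictionaries, matching the "simple lists of dictionaries" input mentioned above.

```python
from typing import Callable, Dict, Iterable, List

Sample = Dict[str, object]
# Hypothetical: a "mapper" here is just a function from one sample to a new sample.
Mapper = Callable[[Sample], Sample]


def extract_fields(*fields: str) -> Mapper:
    """Hypothetical mapper: keep only the listed fields."""
    def _map(sample: Sample) -> Sample:
        return {k: sample[k] for k in fields}
    return _map


def add_prompt(field: str, template: str) -> Mapper:
    """Hypothetical mapper: wrap a text field in a prompt template."""
    def _map(sample: Sample) -> Sample:
        out = dict(sample)
        out[field] = template.format(sample[field])
        return out
    return _map


def whitespace_tokenize(field: str) -> Mapper:
    """Hypothetical mapper: split a text field on whitespace (stand-in for a real tokenizer)."""
    def _map(sample: Sample) -> Sample:
        out = dict(sample)
        out[f"{field}_tokens"] = str(sample[field]).split()
        return out
    return _map


def run_pipeline(samples: Iterable[Sample], mappers: List[Mapper]) -> List[Sample]:
    """Apply each mapper, in order, to every sample."""
    results = []
    for sample in samples:
        for mapper in mappers:
            sample = mapper(sample)
        results.append(sample)
    return results


def batch(samples: List[Sample], size: int) -> List[List[Sample]]:
    """Group transformed samples into fixed-size batches; the last batch may be smaller."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


data = [
    {"id": 1, "text": "the cat sat", "source": "a"},
    {"id": 2, "text": "dogs bark loudly", "source": "b"},
    {"id": 3, "text": "hello", "source": "c"},
]
processed = run_pipeline(data, [
    extract_fields("id", "text"),
    add_prompt("text", "Classify: {}"),
    whitespace_tokenize("text"),
])
batches = batch(processed, size=2)
print(processed[0]["text_tokens"])  # prints: ['Classify:', 'the', 'cat', 'sat']
```

Each mapper returns a new dictionary rather than mutating its input, so the same raw samples can be reused across different pipelines.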

Keywords

dataset datasets dict huggingface in-context-learning mappers natural-language-processing nlp pipeline prefix prefix-tuning preprocessing prompting pytorch text torchdata transformer transformers

Keywords from Contributors

archiving measur observation generic compose large-language-models animals conversion language-model optimize

Last synced: 11 months ago





Committers metadata

Last synced: about 1 year ago

Total Commits: 145
Total Committers: 6
Avg Commits per committer: 24.167
Development Distribution Score (DDS): 0.31

Commits in past year: 15
Committers in past year: 4
Avg Commits per committer in past year: 3.75
Development Distribution Score (DDS) in past year: 0.6

Name Email Commits
Luca Soldaini l****s@a****g 100
Luca Soldaini l****a@s****t 21
kyleclo k****o@u****u 14
dependabot[bot] 4****] 7
Maksym Del m****u@g****m 2
Ben Newman b****9 1
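The summary figures above can be reproduced from the committer table. The sketch below assumes the Development Distribution Score is defined as 1 minus the top committer's share of all commits. This page does not state the definition, but it matches the reported 0.31 (1 - 100/145). It also assumes the two "Luca Soldaini" rows (different e-mail addresses) are counted as separate committers, since merging them would not reproduce that value. The dictionary keys are illustrative labels, not real identifiers.

```python
# Commit counts copied from the committer table; the two "Luca Soldaini"
# rows are kept separate (assumption), which reproduces the reported DDS.
commits = {
    "Luca Soldaini (email 1)": 100,
    "Luca Soldaini (email 2)": 21,
    "kyleclo": 14,
    "dependabot[bot]": 7,
    "Maksym Del": 2,
    "Ben Newman": 1,
}


def dds(counts):
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    return 1 - max(counts) / sum(counts)


total = sum(commits.values())         # 145
avg = total / len(commits)            # 145 / 6, about 24.167
score = dds(list(commits.values()))   # 1 - 100/145, about 0.31

# Merging the two Luca Soldaini identities would give a much lower score.
merged_score = dds([100 + 21, 14, 7, 2, 1])   # 1 - 121/145, about 0.166

# Past year: 15 commits by 4 committers (15 / 4 = 3.75 on average). Under the
# same definition, a DDS of 0.6 implies the top committer made (1 - 0.6) * 15 = 6.
past_year_avg = 15 / 4
top_past_year = round((1 - 0.6) * 15)

print(f"total={total} avg={avg:.3f} dds={score:.2f}")  # prints: total=145 avg=24.167 dds=0.31
```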



Issue and Pull Request metadata

Last synced: 12 months ago

Total issues: 1
Total pull requests: 65
Average time to close issues: N/A
Average time to close pull requests: 3 days
Total issue authors: 1
Total pull request authors: 5
Average comments per issue: 0.0
Average comments per pull request: 0.17
Merged pull requests: 54
Bot issues: 0
Bot pull requests: 14

Past year issues: 0
Past year pull requests: 8
Past year average time to close issues: N/A
Past year average time to close pull requests: 16 days
Past year issue authors: 0
Past year pull request authors: 3
Past year average comments per issue: 0
Past year average comments per pull request: 0.13
Past year merged pull requests: 4
Past year bot issues: 0
Past year bot pull requests: 3

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/allenai/smashed

Top Issue Authors

  • soldni (1)

Top Pull Request Authors

  • soldni (40)
  • dependabot[bot] (14)
  • kyleclo (7)
  • MaksymDel (3)
  • bnewm0609 (1)

Top Issue Labels

  • documentation (1)

Top Pull Request Labels

  • dependencies (14)
  • github_actions (12)
  • python (2)

Package metadata

pypi.org: smashed


  • Homepage: https://github.com/allenai/smashed
  • Documentation: https://smashed.readthedocs.io/
  • Licenses: Apache-2.0
  • Latest release: 0.21.5 (published over 1 year ago)
  • Last Synced: 2024-05-26T14:42:19.169Z (12 months ago)
  • Versions: 64
  • Dependent Packages: 2
  • Dependent Repositories: 1
  • Downloads: 11,380 last month
  • Rankings:
    • Dependent packages count: 3.137%
    • Downloads: 6.124%
    • Stargazers count: 11.541%
    • Average: 11.854%
    • Forks count: 16.824%
    • Dependent repos count: 21.642%
  • Maintainers (2)
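The "Average" ranking above appears to be the arithmetic mean of the other five percentile rankings. This is an inference from the numbers, not a definition stated on the page. The check below recomputes it:

```python
from statistics import mean

# Percentile rankings copied from the package metadata above.
rankings = {
    "Dependent packages count": 3.137,
    "Downloads": 6.124,
    "Stargazers count": 11.541,
    "Forks count": 16.824,
    "Dependent repos count": 21.642,
}

# Assumption: "Average" is the plain (unweighted) mean of the five rankings.
average = mean(rankings.values())
print(f"Average: {average:.3f}%")  # prints: Average: 11.854%
```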

Dependencies

.github/workflows/ci.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/setup-python v4.6.1 composite
pyproject.toml pypi
  • Jinja2 >=3.0.3
  • ftfy >=6.1.1
  • glom >=21.0.0
  • necessary >=0.4.3
  • numpy >=1.19.5
  • platformdirs >=2.5.0
  • trouting >=0.3.3

Score: 14.715330386182005