https://github.com/allenai/smashed
SMASHED is a toolkit designed to apply transformations to samples in datasets, such as field extraction, tokenization, prompting, batching, and more. It supports Hugging Face datasets, torchdata iterables, and simple lists of dictionaries.
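The idea of chaining per-sample transformations over a list of dictionaries can be illustrated with a minimal plain-Python sketch. This is not the SMASHED API: `keep_fields`, `whitespace_tokenize`, and `run_pipeline` are hypothetical names invented for illustration, and the whitespace split stands in for a real tokenizer.

```python
from typing import Callable, Dict, List

Sample = Dict[str, object]
Mapper = Callable[[List[Sample]], List[Sample]]


def keep_fields(*fields: str) -> Mapper:
    """Hypothetical mapper: keep only the named fields of each sample."""
    def apply(samples: List[Sample]) -> List[Sample]:
        return [{name: sample[name] for name in fields} for sample in samples]
    return apply


def whitespace_tokenize(field: str) -> Mapper:
    """Hypothetical mapper: add a 'tokens' field by splitting on whitespace."""
    def apply(samples: List[Sample]) -> List[Sample]:
        return [{**sample, "tokens": str(sample[field]).split()} for sample in samples]
    return apply


def run_pipeline(samples: List[Sample], mappers: List[Mapper]) -> List[Sample]:
    """Apply each mapper in order, feeding the output of one into the next."""
    for mapper in mappers:
        samples = mapper(samples)
    return samples


# A dataset expressed as a simple list of dictionaries.
dataset = [
    {"id": 1, "text": "hello world", "source": "a"},
    {"id": 2, "text": "field extraction then tokenization", "source": "b"},
]

processed = run_pipeline(
    dataset,
    [keep_fields("id", "text"), whitespace_tokenize("text")],
)
print(processed)
```

For the library's actual mapper classes and how they are composed, see the project documentation at https://smashed.readthedocs.io/.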
Keywords
dataset datasets dict huggingface in-context-learning mappers natural-language-processing nlp pipeline prefix prefix-tuning preprocessing prompting pytorch text torchdata transformer transformers
Keywords from Contributors
archiving measur observation generic compose large-language-models animals conversion language-model optimize
Last synced: 11 months ago
Acceptance Criteria
- Relevant topics? true
- External users? true
- Open source license? true
- Active? true
- Fork? false
Repository metadata
- Host: GitHub
- URL: https://github.com/allenai/smashed
- Owner: allenai
- License: apache-2.0
- Created: 2022-07-21T17:11:17.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-02T12:23:22.000Z (over 1 year ago)
- Last Synced: 2024-05-03T07:15:29.777Z (about 1 year ago)
- Topics: dataset, datasets, dict, huggingface, in-context-learning, mappers, natural-language-processing, nlp, pipeline, prefix, prefix-tuning, preprocessing, prompting, pytorch, text, torchdata, transformer, transformers
- Language: Python
- Homepage: https://pypi.org/project/smashed
- Size: 4.55 MB
- Stars: 30
- Watchers: 6
- Forks: 3
- Open Issues: 6
- Releases: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
Owner metadata
- Name: AI2
- Login: allenai
- Email: [email protected]
- Kind: organization
- Description:
- Website: http://www.allenai.org
- Location: Seattle, WA
- Twitter:
- Company:
- Icon url: https://avatars.githubusercontent.com/u/5667695?v=4
- Repositories: 454
- Last synced at: 2024-04-14T22:06:46.803Z
- Profile URL: https://github.com/allenai
GitHub Events
Total
- Fork event: 3
- Create event: 101
- Release event: 39
- Issues event: 1
- Watch event: 32
- Delete event: 44
- Member event: 1
- Issue comment event: 11
- Public event: 1
- Push event: 264
- Pull request review comment event: 7
- Pull request review event: 20
- Pull request event: 114
Last Year
- Create event: 8
- Delete event: 2
- Issue comment event: 1
- Pull request event: 9
- Pull request review event: 1
- Push event: 7
- Release event: 1
- Watch event: 7
Committers metadata
Last synced: about 1 year ago
Total Commits: 145
Total Committers: 6
Avg Commits per committer: 24.167
Development Distribution Score (DDS): 0.31
Commits in past year: 15
Committers in past year: 4
Avg Commits per committer in past year: 3.75
Development Distribution Score (DDS) in past year: 0.6
Name | Email | Commits |
---|---|---|
Luca Soldaini | l****s@a****g | 100 |
Luca Soldaini | l****a@s****t | 21 |
kyleclo | k****o@u****u | 14 |
dependabot[bot] | 4****] | 7 |
Maksym Del | m****u@g****m | 2 |
Ben Newman | b****9 | 1 |
Committer domains:
- uw.edu: 1
- soldaini.net: 1
- allenai.org: 1
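The summary figures in this section can be reproduced from the per-committer counts in the table above. The sketch below assumes the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is not stated on this page, but it agrees with the reported all-time values. The past-year DDS is not recomputed because per-committer counts for that period are not listed.

```python
# Commit counts per committer entry, copied from the table above.
commits = [100, 21, 14, 7, 2, 1]

total_commits = sum(commits)
avg_per_committer = total_commits / len(commits)

# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits) / total_commits

print(total_commits, round(avg_per_committer, 3), round(dds, 2))
# prints: 145 24.167 0.31
```

Note that the two "Luca Soldaini" rows are counted as separate committers here, which is what the reported total of 6 committers implies.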
Issue and Pull Request metadata
Last synced: 12 months ago
Total issues: 1
Total pull requests: 65
Average time to close issues: N/A
Average time to close pull requests: 3 days
Total issue authors: 1
Total pull request authors: 5
Average comments per issue: 0.0
Average comments per pull request: 0.17
Merged pull request: 54
Bot issues: 0
Bot pull requests: 14
Past year issues: 0
Past year pull requests: 8
Past year average time to close issues: N/A
Past year average time to close pull requests: 16 days
Past year issue authors: 0
Past year pull request authors: 3
Past year average comments per issue: 0
Past year average comments per pull request: 0.13
Past year merged pull request: 4
Past year bot issues: 0
Past year bot pull requests: 3
Top Issue Authors
- soldni (1)
Top Pull Request Authors
- soldni (40)
- dependabot[bot] (14)
- kyleclo (7)
- MaksymDel (3)
- bnewm0609 (1)
Top Issue Labels
- documentation (1)
Top Pull Request Labels
- dependencies (14)
- github_actions (12)
- python (2)
Package metadata
- Total packages: 1
- Total downloads:
  - pypi: 11,380 last month
- Total dependent packages: 2
- Total dependent repositories: 1
- Total versions: 64
- Total maintainers: 2
pypi.org: smashed
- Homepage: https://github.com/allenai/smashed
- Documentation: https://smashed.readthedocs.io/
- Licenses: Apache-2.0
- Latest release: 0.21.5 (published over 1 year ago)
- Last Synced: 2024-05-26T14:42:19.169Z (12 months ago)
- Versions: 64
- Dependent Packages: 2
- Dependent Repositories: 1
- Downloads: 11,380 last month
- Rankings:
  - Dependent packages count: 3.137%
  - Downloads: 6.124%
  - Stargazers count: 11.541%
  - Average: 11.854%
  - Forks count: 16.824%
  - Dependent repos count: 21.642%
- Maintainers (2)
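The "Average" ranking above appears to be the arithmetic mean of the other five ranking percentages. That is an assumption about how the index computes it, but the numbers agree, as this short check shows:

```python
# Ranking percentiles copied from the list above (excluding "Average").
rankings = {
    "dependent_packages_count": 3.137,
    "downloads": 6.124,
    "stargazers_count": 11.541,
    "forks_count": 16.824,
    "dependent_repos_count": 21.642,
}

# Assumed definition: unweighted mean of the individual rankings.
average = sum(rankings.values()) / len(rankings)
print(round(average, 3))
# prints: 11.854
```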
Dependencies
- actions/cache v3 composite
- actions/checkout v3 composite
- actions/setup-python v4.6.1 composite
- Jinja2 >=3.0.3
- ftfy >=6.1.1
- glom >=21.0.0
- necessary >=0.4.3
- numpy >=1.19.5
- platformdirs >=2.5.0
- trouting >=0.3.3
Score: 14.715330386182005