OSDG Community Dataset
A public dataset of thousands of text excerpts, validated by OSDG Community Platform citizen scientists with respect to the Sustainable Development Goals.
https://github.com/osdg-ai/osdg-data
Category: Sustainable Development
Sub Category: Sustainable Development Goals
Keywords
citizen-science citsci crowdsourcing dataset digital-public-goods machine-learning open-data public-good public-goods sdg sdg-data sdgs sustainability sustainable-development-goals united-nations
Repository metadata
The OSDG Community Dataset (OSDG-CD) is a public dataset of thousands of text excerpts, validated by OSDG Community Platform (OSDG-CP) citizen scientists with respect to the Sustainable Development Goals (SDGs). The dataset is updated every quarter and published on Zenodo.
- Host: GitHub
- URL: https://github.com/osdg-ai/osdg-data
- Owner: osdg-ai
- License: gpl-3.0
- Created: 2021-07-08T06:41:57.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2023-10-02T06:33:47.000Z (over 1 year ago)
- Last Synced: 2025-04-17T22:43:08.211Z (9 days ago)
- Topics: citizen-science, citsci, crowdsourcing, dataset, digital-public-goods, machine-learning, open-data, public-good, public-goods, sdg, sdg-data, sdgs, sustainability, sustainable-development-goals, united-nations
- Homepage:
- Size: 6.86 MB
- Stars: 31
- Watchers: 0
- Forks: 8
- Open Issues: 1
- Releases: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README.md
Dataset Information
The OSDG Community Dataset (OSDG-CD) is the direct result of the work of hundreds of volunteers who have contributed to our understanding of Sustainable Development Goals (SDGs) via the OSDG Community Platform (OSDG-CP). It contains thousands of text excerpts which were labelled by the community volunteers with respect to SDGs. The data can be used to derive insights into the nature of SDGs using either ontology-based or machine learning approaches. The OSDG Community Dataset will be updated on a quarterly basis.
Please note that all versions of the dataset are hosted on Zenodo. This repository is only intended to provide examples of how the dataset can be used in practice. You can access different versions of the dataset using the DOI handles in the table below. The Most Recent handle always resolves to the latest version.
Version | DOI Handle |
---|---|
Most Recent | |
Version 2023.10 | |
Version 2023.07 | |
Version 2023.04 | |
Version 2023.01 | |
Version 2022.10 | |
Version 2022.07 | |
Version 2022.04 | |
Version 2022.01 | |
Version 2021.09 | |
Methodology
The OSDG Community Platform is an ambitious attempt to bring together volunteers and subject matter experts from around the world to create a large and accurate source of textual information on SDGs. It uses publicly available texts such as publications, reports and other written data sources. Each text is broken down into paragraph-length pieces, which are then labelled by the Community volunteers. The texts we collect come with suggested labels; these usually originate from the data source and do not necessarily reflect the content of a particular paragraph. Each volunteer is therefore presented with a single simple question: is the suggested label relevant to the short text at hand? Each text is labelled by multiple volunteers to ensure a high degree of quality.
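The labelling step described above can be pictured as a simple tally of yes/no answers per text. The sketch below is only an illustration of that idea, not the platform's actual implementation: the function name `tally_votes` and the example answers are hypothetical, and the output keys are chosen to mirror the `labels_positive` and `labels_negative` columns of the dataset.

```python
from collections import Counter


def tally_votes(votes):
    """Count accept/reject answers for one (text, suggested SDG) pair.

    `votes` is an iterable of booleans: True means the volunteer accepted
    the suggested label, False means the volunteer rejected it.
    """
    counts = Counter(bool(v) for v in votes)
    return {
        "labels_positive": counts[True],
        "labels_negative": counts[False],
    }


# Hypothetical answers from five volunteers for a single text excerpt.
example_votes = [True, True, False, True, True]
tally = tally_votes(example_votes)
print(tally)  # {'labels_positive': 4, 'labels_negative': 1}
```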
Documentation
The OSDG-CD dataset is provided in a .csv format on Zenodo. It is a flat tabular dataset that contains the following columns:
- doi: Digital Object Identifier of the original document;
- text_id: unique text identifier;
- text: text excerpt from the document;
- sdg: the SDG the text is validated against;
- labels_negative: the number of volunteers who rejected the suggested SDG label;
- labels_positive: the number of volunteers who accepted the suggested SDG label;
- agreement: agreement score based on the formula $\text{agreement} = \frac{|labels_{positive} - labels_{negative}|}{labels_{positive} + labels_{negative}}$.
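As a worked illustration of the agreement formula, the sketch below recomputes the score from the two vote-count columns using only the Python standard library. The rows are made up for the example and do not come from the dataset; the comma delimiter and the choice to return `None` when a text has no votes are assumptions, so check the delimiter of the file you download from Zenodo before reusing this.

```python
import csv
import io


def agreement(labels_positive, labels_negative):
    """Return |positive - negative| / (positive + negative).

    Returning None for a text with no votes is an assumption made here;
    the formula itself is undefined in that case.
    """
    total = labels_positive + labels_negative
    if total == 0:
        return None
    return abs(labels_positive - labels_negative) / total


# Made-up rows that only mimic the documented column layout.
SAMPLE = """doi,text_id,text,sdg,labels_negative,labels_positive
10.0000/example-a,t1,Illustrative excerpt about clean water.,6,1,8
10.0000/example-b,t2,Illustrative excerpt about education.,4,3,3
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
scores = {
    row["text_id"]: agreement(
        int(row["labels_positive"]), int(row["labels_negative"])
    )
    for row in rows
}
print(scores)  # t1 scores 7/9, t2 scores 0.0
```

Note that the score is symmetric: a text rejected by every volunteer scores 1.0, just like a text accepted by every volunteer. The score measures how strongly the volunteers agree with each other, not whether the suggested label was accepted.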
Relevant Papers
Pukelis, L., Bautista-Puig, N., Statulevičiūtė, G., Stančiauskas, V., Dikmener, G., & Akylbekova, D. (2022, November 21). OSDG 2.0: A multilingual tool for classifying text data by UN Sustainable Development Goals (SDGs). arXiv.org. https://doi.org/10.48550/arXiv.2211.11252
Pukelis, L., Bautista-Puig, N., Skrynik, M., & Stančiauskas, V. (2020, May 29). OSDG -- Open-source approach to classify text data by UN Sustainable Development Goals (SDGs). arXiv.org. https://arxiv.org/abs/2005.14569
Usage Examples
Examples of text classification using OSDG-CD can be found under the examples directory:
- osdg-cd-example-classifier-sklearn.ipynb
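Before training a classifier on the data, it is often useful to keep only the texts whose suggested label was clearly accepted. The sketch below shows one possible filter and is not taken from the example notebook: the function name `select_reliable`, the 0.6 threshold and the sample rows are all hypothetical, and the rows only mimic the documented columns.

```python
def select_reliable(rows, min_agreement=0.6):
    """Keep (text, sdg) pairs that volunteers mostly accepted.

    A row is kept only if more volunteers accepted than rejected the
    suggested SDG and the agreement score reaches `min_agreement`.
    The default threshold is an illustrative assumption.
    """
    kept = []
    for row in rows:
        pos = int(row["labels_positive"])
        neg = int(row["labels_negative"])
        total = pos + neg
        if total == 0:
            continue  # no votes, so nothing to judge the label by
        score = abs(pos - neg) / total
        if pos > neg and score >= min_agreement:
            kept.append((row["text"], int(row["sdg"])))
    return kept


# Made-up rows for illustration only.
sample_rows = [
    {"text": "Illustrative excerpt about clean water.", "sdg": "6",
     "labels_positive": "8", "labels_negative": "1"},
    {"text": "Illustrative excerpt about education.", "sdg": "4",
     "labels_positive": "3", "labels_negative": "3"},
    {"text": "Illustrative excerpt with a wrong label.", "sdg": "13",
     "labels_positive": "0", "labels_negative": "7"},
]

training_pairs = select_reliable(sample_rows)
print(training_pairs)  # [('Illustrative excerpt about clean water.', 6)]
```

The `pos > neg` check matters because the agreement score alone does not distinguish unanimous acceptance from unanimous rejection.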
Share Your Work
The OSDG Community Dataset (OSDG-CD) is made available for research purposes. We are making the data open in the hope of enabling researchers to discover new insights into and meaningful connections among Sustainable Development Goals.
We would like to know what you discover in the data. Do not hesitate to share your outputs with us, be it a research paper, a machine learning model, a blog post, or just an interesting observation. Send us an email at [email protected].
If you are using the dataset in a research paper, please cite the original version as follows:
OSDG, UNDP IICPSD SDG AI Lab, & PPMI. (2021). OSDG Community Dataset (OSDG-CD) (2021.09) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5550238.
To cite a specific version, use the template provided on Zenodo.
Contribute to OSDG
This dataset is made possible by a large community effort. We would be glad to see your contribution to the project too. You can join our Community Platform to help us collect more labelled data. If you have a more technical background, you can also contribute to the OSDG Labelling Tool. If you want to contribute to the project in some other way, do let us know via the contact form.
To learn more about the OSDG project, visit osdg.ai.
Owner metadata
- Name: OSDG
- Login: osdg-ai
- Email: [email protected]
- Kind: organization
- Description: Open Source SDG Classification Tool
- Website: https://osdg.ai
- Location:
- Twitter: OSDG_ai
- Company:
- Icon url: https://avatars.githubusercontent.com/u/74975722?v=4
- Repositories: 2
- Last synced at: 2023-03-04T07:26:46.899Z
- Profile URL: https://github.com/osdg-ai
GitHub Events
Total
- Watch event: 4
Last Year
- Watch event: 4
Committers metadata
Last synced: 7 days ago
Total Commits: 10
Total Committers: 3
Avg Commits per committer: 3.333
Development Distribution Score (DDS): 0.5
Commits in past year: 0
Committers in past year: 0
Avg Commits per committer in past year: 0.0
Development Distribution Score (DDS) in past year: 0.0
Name | Email | Commits |
---|---|---|
mykolaskrynnyk | 4****k | 5 |
guste55 | 6****5 | 4 |
Jonas | 4****l | 1 |
Issue and Pull Request metadata
Last synced: 1 day ago
Total issues: 2
Total pull requests: 2
Average time to close issues: 13 days
Average time to close pull requests: about 1 month
Total issue authors: 2
Total pull request authors: 1
Average comments per issue: 1.5
Average comments per pull request: 0.0
Merged pull request: 1
Bot issues: 0
Bot pull requests: 0
Past year issues: 0
Past year pull requests: 0
Past year average time to close issues: N/A
Past year average time to close pull requests: N/A
Past year issue authors: 0
Past year pull request authors: 0
Past year average comments per issue: 0
Past year average comments per pull request: 0
Past year merged pull request: 0
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- filippo82 (1)
- jonas-nothnagel (1)
Top Pull Request Authors
- jonas-nothnagel (2)
Score: 4.564348191467836