{"id":39120,"name":"QuotaClimat","description":"The aim of this work is to deliver a tool to a consortium around QuotaClimat, Climat Medias allowing them to quantify the media coverage of the climate crisis.","url":"https://github.com/dataforgoodfr/quotaclimat","last_synced_at":"2026-05-28T08:01:58.839Z","repository":{"id":64012247,"uuid":"546002600","full_name":"dataforgoodfr/quotaclimat","owner":"dataforgoodfr","description":"Observatoire des Médias sur l'Ecologie","archived":false,"fork":false,"pushed_at":"2026-05-22T12:51:14.000Z","size":8692846,"stargazers_count":38,"open_issues_count":14,"forks_count":9,"subscribers_count":5,"default_branch":"main","last_synced_at":"2026-05-22T16:48:32.099Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://observatoiremediaecologie.fr/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dataforgoodfr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-10-05T10:57:06.000Z","updated_at":"2026-05-22T12:51:32.000Z","dependencies_parsed_at":"2023-10-11T06:00:34.157Z","dependency_job_id":"e9009f6e-d1c0-4103-9e16-4252c848ca49","html_url":"https://github.com/dataforgoodfr/quotaclimat","commit_stats":{"total_commits":1435,"total_committers":19,"mean_commits":75.52631578947368,"dds":0.4801393728222997,"last_synced_commit":"e92e1b472ae92a0c6892f4179c8d76a91ff06ef2"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dataforgoodfr/quotaclimat","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataforgoodfr%2Fquotaclimat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataforgoodfr%2Fquotaclimat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataforgoodfr%2Fquotaclimat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataforgoodfr%2Fquotaclimat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dataforgoodfr","download_url":"https://codeload.github.com/dataforgoodfr/quotaclimat/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataforgoodfr%2Fquotaclimat/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33423284,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-23T22:14:44.296Z","status":"online","status_checked_at":"2026-05-24T02:00:06.296Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"owner":{"login":"dataforgoodfr","name":"Data For Good France","uuid":"11797105","kind":"organization","description":"","email":"hellodataforgood@gmail.com","website":"http://www.dataforgood.fr","location":"France","twitter":null,"company":null,"icon_url":"https://avatars.githubusercontent.com/u/11797105?v=4","repositories_count":119,"last_synced_at":"2024-04-24T05:37:38.645Z","metadata":{"has_sponsors_listing":false},"html_url":"https://github.com/dataforgoodfr","funding_links":[],"total_stars":338,"followers":241,"following":0,"created_at":"2022-11-12T03:41:41.220Z","updated_at":"2024-04-24T05:38:16.432Z","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dataforgoodfr","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dataforgoodfr/repositories"},"packages":[],"commits":{"id":1349848,"full_name":"dataforgoodfr/quotaclimat","default_branch":"main","total_commits":2193,"total_committers":26,"total_bot_commits":6,"total_bot_committers":1,"mean_commits":84.34615384615384,"dds":0.6598267213862289,"past_year_total_commits":626,"past_year_total_committers":9,"past_year_total_bot_commits":0,"past_year_total_bot_committers":0,"past_year_mean_commits":69.55555555555556,"past_year_dds":0.5990415335463259,"last_synced_at":"2026-05-26T07:06:23.591Z","last_synced_commit":"a8d2e1ddf5d78def6cb1f2e76e0a1465ac9f35fb","created_at":"2023-09-12T10:44:54.030Z","updated_at":"2026-05-26T07:04:20.292Z","committers":[{"name":"github-actions","email":"github-actions@github.com","login":"invalid-email-address","count":746},{"name":"barometre-github-actions","email":"barometre-github-actions@github.com","login":null,"count":464},{"name":"Paul Leclercq","email":"paleclercq@gmail.com","login":"polomarcus","count":273},{"name":"gmguarino","email":"gmguarino1@gmail.com","login":"gmguarino","count":257},{"name":"Rambier Estelle","email":"38014937+estellerambier","login":"estellerambier","count":232},{"name":"vmateos1","email":"vincent.mateos1@gmail.com","login":"vmateos1","count":77},{"name":"Pib","email":"alexis.pibrac@gmail.com","login":"apibrac","count":35},{"name":"Arnaud","email":"benoits.arnaud@gmail.com","login":"arnaudbenoits","count":20},{"name":"Theo Alves Da Costa","email":"theo.alvesdacosta@ekimetrics.com","login":"TheoLvs","count":15},{"name":"c.chin.elise@gmail.com","email":"elise.chin@hotmail.fr","login":"elise-chin","count":12},{"name":"Beef","email":"bast.gauthier@gmail.com","login":null,"count":11},{"name":"Bastien Gauthier","email":"bastien.gauthier@ntymail.com","login":"BastienGauthier","count":10},{"name":"camille","email":"camille.borrett@dgccrf.finances.gouv.fr","login":null,"count":8},{"name":"dependabot[bot]","email":"49699333+dependabot[bot]","login":"dependabot[bot]","count":6},{"name":"AwaSacko","email":"89972887+AwaSacko","login":"AwaSacko","count":5},{"name":"greg-lep","email":"gregoire.lepault@gmail.com","login":"greg-lep","count":5},{"name":"TheoSchwartz","email":"theo.schwartz@orange.fr","login":"TheoSchwartz","count":4},{"name":"Thibault Jauneau","email":"thibault.jauneau@hec.edu","login":"thibault-jauneau","count":3},{"name":"ArnaudWald","email":"arnaudwald@gmail.com","login":"ArnaudWald","count":2},{"name":"Sebastien Bourgeois","email":"sebastienbourgeois60@gmail.com","login":"sebastienbourgeois","count":2},{"name":"Hanane","email":"116921007+HananeMaghlazi","login":"HananeMaghlazi","count":1},{"name":"JeanSauvignon","email":"56596801+JeanSauvignon","login":"JeanSauvignon","count":1},{"name":"Rémi","email":"113794754+RR-DataSciences","login":"RR-DataSciences","count":1},{"name":"Thibault Jauneau","email":"thibault.jauneau@aircall.io","login":null,"count":1},{"name":"Claude","email":"noreply@anthropic.com","login":null,"count":1},{"name":"mikaml","email":"michael.laidet@gmail.com","login":"Thenewnative","count":1}],"past_year_committers":[{"name":"gmguarino","email":"gmguarino1@gmail.com","login":"gmguarino","count":251},{"name":"barometre-github-actions","email":"barometre-github-actions@github.com","login":null,"count":214},{"name":"vmateos1","email":"vincent.mateos1@gmail.com","login":"vmateos1","count":77},{"name":"Pib","email":"alexis.pibrac@gmail.com","login":"apibrac","count":35},{"name":"Arnaud","email":"benoits.arnaud@gmail.com","login":"arnaudbenoits","count":20},{"name":"Paul Leclercq","email":"paleclercq@gmail.com","login":"polomarcus","count":19},{"name":"camille","email":"camille.borrett@dgccrf.finances.gouv.fr","login":null,"count":8},{"name":"JeanSauvignon","email":"56596801+JeanSauvignon","login":"JeanSauvignon","count":1},{"name":"Claude","email":"noreply@anthropic.com","login":null,"count":1}],"commits_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataforgoodfr%2Fquotaclimat/commits","host":{"name":"GitHub","url":"https://github.com","kind":"github","last_synced_at":"2026-05-28T00:00:09.340Z","repositories_count":6243779,"commits_count":883362401,"contributors_count":34951131,"owners_count":1158498,"icon_url":"https://github.com/github.png","host_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://commits.ecosyste.ms/api/v1/hosts/GitHub/repositories"}},"issues_stats":{"full_name":"dataforgoodfr/quotaclimat","html_url":"https://github.com/dataforgoodfr/quotaclimat","last_synced_at":"2026-05-24T06:02:26.859Z","status":"active","issues_count":53,"pull_requests_count":652,"avg_time_to_close_issue":2996011.775,"avg_time_to_close_pull_request":1075010.7272727273,"issues_closed_count":40,"pull_requests_closed_count":583,"pull_request_authors_count":23,"issue_authors_count":4,"avg_comments_per_issue":0.39622641509433965,"avg_comments_per_pull_request":0.3205521472392638,"merged_pull_requests_count":401,"bot_issues_count":0,"bot_pull_requests_count":189,"past_year_issues_count":15,"past_year_pull_requests_count":119,"past_year_avg_time_to_close_issue":821278.2,"past_year_avg_time_to_close_pull_request":367278.56043956045,"past_year_issues_closed_count":10,"past_year_pull_requests_closed_count":91,"past_year_pull_request_authors_count":8,"past_year_issue_authors_count":2,"past_year_avg_comments_per_issue":0.9333333333333333,"past_year_avg_comments_per_pull_request":0.17647058823529413,"past_year_bot_issues_count":0,"past_year_bot_pull_requests_count":21,"past_year_merged_pull_requests_count":78,"created_at":"2023-09-12T10:45:24.822Z","updated_at":"2026-05-24T06:02:26.859Z","repository_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataforgoodfr%2Fquotaclimat","issues_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories/dataforgoodfr%2Fquotaclimat/issues","issue_labels_count":{"enhancement":9,"wontfix":1,"help wanted":1,"good first issue":1,"documentation":1,"bug":1},"pull_request_labels_count":{"dependencies":189,"python":41,"wontfix":9},"issue_author_associations_count":{"COLLABORATOR":53},"pull_request_author_associations_count":{"COLLABORATOR":457,"CONTRIBUTOR":188,"NONE":6,"OWNER":1},"issue_authors":{"polomarcus":36,"gmguarino":15,"RDiPiazza":1,"apibrac":1},"pull_request_authors":{"polomarcus":295,"dependabot[bot]":189,"estellerambier":60,"gmguarino":37,"apibrac":29,"arnaudbenoits":12,"BastienGauthier":7,"HananeMaghlazi":3,"SprinTech":2,"btst-ai":2,"RDiPiazza":2,"elise-chin":2,"TheoSchwartz":2,"JeanSauvignon":1,"thibault-jauneau":1,"TheoLvs":1,"RR-DataSciences":1,"vmateos1":1,"Thenewnative":1,"greg-lep":1,"sebastienbourgeois":1,"err53":1,"camilleborrett":1},"host":{"name":"GitHub","url":"https://github.com","kind":"github","last_synced_at":"2026-05-26T00:00:23.324Z","repositories_count":14705142,"issues_count":33981483,"pull_requests_count":111325339,"authors_count":11274478,"icon_url":"https://github.com/github.png","host_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/repositories","owners_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/owners","authors_url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors"},"past_year_issue_labels_count":{"enhancement":1},"past_year_pull_request_labels_count":{"dependencies":19,"python":19},"past_year_issue_author_associations_count":{"COLLABORATOR":15},"past_year_pull_request_author_associations_count":{"COLLABORATOR":97,"CONTRIBUTOR":19,"NONE":1},"past_year_issue_authors":{"gmguarino":14,"apibrac":1},"past_year_pull_request_authors":{"gmguarino":35,"apibrac":29,"dependabot[bot]":19,"polomarcus":19,"arnaudbenoits":12,"camilleborrett":1,"JeanSauvignon":1,"vmateos1":1},"maintainers":[{"login":"polomarcus","count":330,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/polomarcus"},{"login":"estellerambier","count":60,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/estellerambier"},{"login":"gmguarino","count":52,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/gmguarino"},{"login":"apibrac","count":30,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/apibrac"},{"login":"arnaudbenoits","count":12,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/arnaudbenoits"},{"login":"BastienGauthier","count":7,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/BastienGauthier"},{"login":"HananeMaghlazi","count":3,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/HananeMaghlazi"},{"login":"RDiPiazza","count":3,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/RDiPiazza"},{"login":"btst-ai","count":2,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/btst-ai"},{"login":"TheoSchwartz","count":2,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/TheoSchwartz"},{"login":"elise-chin","count":2,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/elise-chin"},{"login":"Thenewnative","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/Thenewnative"},{"login":"greg-lep","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/greg-lep"},{"login":"sebastienbourgeois","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/sebastienbourgeois"},{"login":"err53","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/err53"},{"login":"JeanSauvignon","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/JeanSauvignon"},{"login":"thibault-jauneau","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/thibault-jauneau"},{"login":"RR-DataSciences","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/RR-DataSciences"},{"login":"vmateos1","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/vmateos1"}],"active_maintainers":[{"login":"gmguarino","count":49,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/gmguarino"},{"login":"apibrac","count":30,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/apibrac"},{"login":"polomarcus","count":19,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/polomarcus"},{"login":"arnaudbenoits","count":12,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/arnaudbenoits"},{"login":"JeanSauvignon","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/JeanSauvignon"},{"login":"vmateos1","count":1,"url":"https://issues.ecosyste.ms/api/v1/hosts/GitHub/authors/vmateos1"}]},"events":{"total":{"DeleteEvent":156,"MemberEvent":1,"PullRequestEvent":264,"ForkEvent":1,"IssuesEvent":27,"WatchEvent":6,"IssueCommentEvent":45,"PushEvent":635,"PullRequestReviewCommentEvent":17,"PullRequestReviewEvent":44,"CreateEvent":172,"CommitCommentEvent":298},"last_year":{"DeleteEvent":77,"PullRequestEvent":91,"ForkEvent":1,"IssuesEvent":10,"WatchEvent":1,"IssueCommentEvent":19,"PushEvent":338,"PullRequestReviewEvent":13,"PullRequestReviewCommentEvent":6,"CreateEvent":86,"CommitCommentEvent":121}},"keywords":[],"dependencies":[{"ecosystem":"actions","filepath":".github/workflows/homepage_lemonde.yml","sha":null,"kind":"manifest","created_at":"2023-02-18T23:01:51.813Z","updated_at":"2023-02-18T23:01:51.813Z","repository_link":"https://github.com/dataforgoodfr/quotaclimat/blob/main/.github/workflows/homepage_lemonde.yml","dependencies":[{"id":7772708063,"package_name":"actions/setup-python","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":7772708074,"package_name":"actions/checkout","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":7772708075,"package_name":"snok/install-poetry","ecosystem":"actions","requirements":"v1","direct":true,"kind":"composite","optional":false}]},{"ecosystem":"actions","filepath":".github/workflows/scrap_youtube.yml","sha":null,"kind":"manifest","created_at":"2023-02-18T23:01:51.980Z","updated_at":"2023-02-18T23:01:51.980Z","repository_link":"https://github.com/dataforgoodfr/quotaclimat/blob/main/.github/workflows/scrap_youtube.yml","dependencies":[{"id":7772710729,"package_name":"actions/setup-python","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":7772710736,"package_name":"actions/checkout","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":7772710740,"package_name":"snok/install-poetry","ecosystem":"actions","requirements":"v1","direct":true,"kind":"composite","optional":false}]},{"ecosystem":"actions","filepath":".github/workflows/scrap_sitemap.yml","sha":null,"kind":"manifest","created_at":"2022-12-01T12:31:28.593Z","updated_at":"2022-12-01T12:31:28.593Z","repository_link":"https://github.com/dataforgoodfr/quotaclimat/blob/main/.github/workflows/scrap_sitemap.yml","dependencies":[{"id":6776239550,"package_name":"actions/setup-python","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":6776239551,"package_name":"actions/checkout","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":6776239552,"package_name":"snok/install-poetry","ecosystem":"actions","requirements":"v1","direct":true,"kind":"composite","optional":false}]},{"ecosystem":"actions","filepath":".github/workflows/scrap_tv_program.yml","sha":null,"kind":"manifest","created_at":"2022-12-01T12:31:28.614Z","updated_at":"2022-12-01T12:31:28.614Z","repository_link":"https://github.com/dataforgoodfr/quotaclimat/blob/main/.github/workflows/scrap_tv_program.yml","dependencies":[{"id":6776239553,"package_name":"actions/setup-python","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":6776239554,"package_name":"actions/checkout","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":6776239555,"package_name":"snok/install-poetry","ecosystem":"actions","requirements":"v1","direct":true,"kind":"composite","optional":false}]},{"ecosystem":"actions","filepath":".github/workflows/check_integration.yml","sha":null,"kind":"manifest","created_at":"2023-09-24T02:24:01.418Z","updated_at":"2023-09-24T02:24:01.418Z","repository_link":"https://github.com/dataforgoodfr/quotaclimat/blob/main/.github/workflows/check_integration.yml","dependencies":[{"id":13908667744,"package_name":"actions/setup-python","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":13908667745,"package_name":"actions/checkout","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":13908667746,"package_name":"snok/install-poetry","ecosystem":"actions","requirements":"v1","direct":true,"kind":"composite","optional":false}]},{"ecosystem":"actions","filepath":".github/workflows/db_backup_on_scaleway.yml","sha":null,"kind":"manifest","created_at":"2023-09-24T02:24:01.446Z","updated_at":"2023-09-24T02:24:01.446Z","repository_link":"https://github.com/dataforgoodfr/quotaclimat/blob/main/.github/workflows/db_backup_on_scaleway.yml","dependencies":[{"id":13908667747,"package_name":"actions/setup-python","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":13908667748,"package_name":"actions/checkout","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":13908667749,"package_name":"snok/install-poetry","ecosystem":"actions","requirements":"v1","direct":true,"kind":"composite","optional":false}]},{"ecosystem":"actions","filepath":".github/workflows/main.yml","sha":null,"kind":"manifest","created_at":"2023-09-24T02:24:01.467Z","updated_at":"2023-09-24T02:24:01.467Z","repository_link":"https://github.com/dataforgoodfr/quotaclimat/blob/main/.github/workflows/main.yml","dependencies":[{"id":13908667750,"package_name":"actions/setup-python","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":13908667751,"package_name":"actions/checkout","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":13908667752,"package_name":"snok/install-poetry","ecosystem":"actions","requirements":"v1","direct":true,"kind":"composite","optional":false}]},{"ecosystem":"actions","filepath":".github/workflows/scrap_sitemap_and_ingest_db.yml","sha":null,"kind":"manifest","created_at":"2023-09-24T02:24:01.482Z","updated_at":"2023-09-24T02:24:01.482Z","repository_link":"https://github.com/dataforgoodfr/quotaclimat/blob/main/.github/workflows/scrap_sitemap_and_ingest_db.yml","dependencies":[{"id":13908667753,"package_name":"actions/setup-python","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":13908667754,"package_name":"actions/checkout","ecosystem":"actions","requirements":"v2","direct":true,"kind":"composite","optional":false},{"id":13908667755,"package_name":"snok/install-poetry","ecosystem":"actions","requirements":"v1","direct":true,"kind":"composite","optional":false}]}],"score":7.20934025660291,"created_at":"2023-09-12T07:49:20.264Z","updated_at":"2026-05-28T08:01:58.841Z","avatar_url":"https://github.com/dataforgoodfr.png","language":"Jupyter Notebook","category":"Sustainable Development","sub_category":"Knowledge Platforms","monthly_downloads":0,"total_dependent_repos":0,"total_dependent_packages":0,"readme":"# L’Observatoire des médias sur l’écologie (OME) ![badge](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/polomarcus/579237daab71afbb359338e2706b7f36/raw/test.json)\n\n\n![](quotaclimat/utils/coverquotaclimat.png)\nThe OME is run by a consortium of organisations comprising [Data for Good](https://dataforgood.fr/), [Eleven Strategy](https://eleven-strategy.fr/), [Expertises Climat](https://expertisesclimat.fr/), [Mediatree](https://mediatree.fr/), [Pour plus de climat dans les médias](https://climatmedias.org/), [QuotaClimat](https://quotaclimat.org/) and [éclaircies](https://eclaircies.co/). \n\nIt is supported and guided by [ADEME](https://www.ademe.fr/) as part of a call for proposals on ‘Communs Sobriété et résilience des territoires’ launched in 2022. This call for proposals aims to bring together all willing stakeholders to produce open resources, which make a significant contribution to climate change mitigation and adaptation by creating and sharing these resources.\nThe Observatory receives methodological guidance and financial support from the [Arcom](https://www.arcom.fr/).\n\nThe OME aims to measure the proportion of environmental topics in media news content. It provides the general public with objective, reliable and standardised data, enabling them to analyse how the media covers environmental issues.\n\nMore at https://observatoiremediaecologie.fr/\n\n\n### Introduction Videos\n- 2022-09-28, Introduction by Eva Morel (Quota Climat): from 14:10 to 32:00 https://www.youtube.com/watch?v=GMrwDjq3rYs\n- 2022-11-29 Project status and prospects by Estelle Rambier (Data): from 09:00 to 25:00 https://www.youtube.com/watch?v=cLGQxHJWwYA\n- 2024-03 Project tech presentation by Paul Leclercq (Data) : https://www.youtube.com/watch?v=zWk4WLVC5Hs\n\n# Development and Contribution\n## Index\n- [I want to contribute! Where do I start?](#contrib)\n- [Development](#wrench-development)\n  - [File Structure](#file_folder-file-structure)\n  - [Setting up the environment](#nut_and_bolt-setting-up-the-environment)\n\n# 🤱 I want to contribute! Where do I start?\n\n1. Learn about the project by watching the introduction videos mentioned above.\n2. Create an issue or/and join https://dataforgood.fr/join and the Slack #offseason_quotaclimat.\n3. Introduce yourself on Slack #offseason_quotaclimat\n\n##  :wrench: Development\n\n## Contributing\n\n### :nut_and_bolt: Setting up the environment\nDoing the following step will enable your local environement to be aligned with the one of any other collaborator.\n\nFirst install pyenv:\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e OS \u003c/td\u003e \u003ctd\u003e Command \u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e MacOS \u003c/td\u003e\n\u003ctd\u003e\n\n```bash\ncd -\nbrew install pyenv # pyenv itself\nbrew install pyenv-virtualenv # integration with Python virtualenvsec\n```\n\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e Ubuntu \u003c/td\u003e\n\u003ctd\u003e\n\n```bash\nsudo apt-get update; sudo apt-get install make build-essential libssl-dev zlib1g-dev \\\nlibbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \\\nlibncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev\n\ncurl https://pyenv.run | bash\n```\n\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e Windows \u003c/td\u003e\n\u003ctd\u003e\nAn installation using miniconda is generally simpler than a pyenv one on Windows.\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\nMake the shell pyenv aware:\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e OS \u003c/td\u003e \u003ctd\u003e Command \u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e MacOS \u003c/td\u003e\n\u003ctd\u003e\n\n```bash\neval \"$(pyenv init --path)\"\neval \"$(pyenv init -)\"\neval \"$(pyenv virtualenv-init -)\"\n```\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e Ubuntu \u003c/td\u003e\n\u003ctd\u003e\n\n```bash\nexport PYENV_ROOT=\"$HOME/.pyenv\"\ncommand -v pyenv \u003e/dev/null || export PATH=\"$PYENV_ROOT/bin:$PATH\"\neval \"$(pyenv init -)\"\neval \"$(pyenv virtualenv-init -)\"\n```\n\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003e Windows \u003c/td\u003e\n\u003ctd\u003e\n\n:fr: Dans Propriétés systèmes \u003e Paramètres système avancés \u003e  Variables d'environnement...\nChoisissez la variable \"Path\" \u003e Modifier... et ajoutez le chemin de votre installation python, où se trouve le python.exe. (par défaut, C:\\Users\\username\\AppData\\Roaming\\Python\\Scripts\\ )\n\n:uk: In System Properties \u003e Advanced \u003e  Environment Variables...\nChoose the variable \"Path\" \u003e Edit... et add the path to your python's installation, where is located the pyhton.exe (by default, this should be at C:\\Users\\username\\AppData\\Roaming\\Python\\Scripts\\ )\n\nIn the console, you can now try :\n```bash\npoetry --version\n```\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n\n\nLet's install a python version (for windows, this step have been done with miniconda):\n```bash\npyenv install 3.11.6 # this will take time\n```\nCheck if it works properly, this command:\n```bash\npyenv versions\n```\nshould return:\n```bash\n  system\n  3.11.6\n```\n\nThen you are ready to create a virtual environment. Go in the project folder, and run:\n```bash\n  pyenv virtualenv 3.11.6 quotaclimat\n  pyenv local quotaclimat\n```\n\nIn case of a version upgrade you can perform this command to switch\n```\neval \"$(pyenv init --path)\"\npyenv activate 3.11.6/envs/quotaclimat\n```\n\nYou now need a tool to manage dependencies. Let's use poetry.\nOn windows, if not already installed, you will need a VS installation.\n\nLink : https://wiki.python.org/moin/WindowsCompilers#Microsoft_Visual_C.2B-.2B-_14.x_with_Visual_Studio_2022_.28x86.2C_x64.2C_ARM.2C_ARM64.29\n\n```bash\npip install poetry\npoetry update\npoetry lock\n```\nNLDA : I have not been able to work with wordcloud on windows. \n\nWhen you need to install a new dependency (use a new package, e.g. nltk), run \n```bash\npoetry add ntlk\n```\nUpdate dependencies\n```\npoetry self update\n```\n\nAfter commiting to the repo, other team members will be able to use the exact same environment you are using. \n\n## Docker\nFirst, have docker and compose [installed on your computer](https://docs.docker.com/compose/install/#installation-scenarios)\n\nThen to start the different services\n```\n## To run only one service, have a look to docker-compose.yml and pick one service :\ndocker compose up metabase\ndocker compose up ingest_to_db\ndocker compose up mediatree\ndocker compose up test\n```\n\n### docker secrets\ninside the \"secrets\" folder, you should have these 4 files, you can put dummy values or ask Quota Climat team for the real ones.\n```\nsecrets: # https://docs.docker.com/compose/use-secrets/\n  pwd_api:\n    file: secrets/pwd_api.txt\n  username_api:\n    file: secrets/username_api.txt\n  bucket:\n    file: secrets/scw_bucket.txt\n  bucket_secret:\n    file: secrets/scw_bucket_secret.txt\n```\n\nIf you add a new dependency, don't forget to rebuild\n```\ndocker compose build test # or ingest_to_db, mediatree etc\n```\n### Explore postgres data using Metabase - a BI tool\n```\ndocker compose up metabase -d\n```\nWill give you access to Metabase to explore the SQL table `sitemap table` or `keywords` here : http://localhost:3000/\n\nTo connect to it you have use the variables used inside `docker-compose.yml` :\n* password: password\n* username: user\n* db: barometre\n* host : postgres_db\n\n#### Production metabase\nIf we encounter [a OOM error](https://www.metabase.com/docs/latest/troubleshooting-guide/running.html#heap-space-outofmemoryerrors), we can set this env variable : `JAVA_OPTS=-Xmx2g`\n\n### Web Press - How to scrap \nThe scrapping of sitemap.xml is done using the library [advertools.](https://advertools.readthedocs.io/en/master/advertools.sitemaps.html#)\n\nA great way to discover sitemap.xml is to check robots.txt page of websites : https://www.midilibre.fr/robots.txt\n\nWhat medias to parse ? This [document](https://www.culture.gouv.fr/Thematiques/Presse-ecrite/Tableaux-des-titres-de-presse-aides2) is a good start.\n\nLearn more about [site maps here](https://developers.google.com/search/docs/crawling-indexing/sitemaps/news-sitemap?visit_id=638330401920319694-749283483\u0026rd=1\u0026hl=fr).\n\n#### Scrap every sitemaps\nBy default, we use a env variable `ENV` to only parse from localhost. If you set this value to another thing that `docker` or `dev`, it will parse everything.\n\n## Test\nThanks to the nginx container, we can have a local server for sitemap :\n* http://localhost:8000/sitemap_news_figaro_3.xml\n\n```\ndocker compose up -d nginx # used to scrap sitemap locally - a figaro like website with only 3 news\n# docker compose up test with entrypoint modified to sleep\n# docker exec test bash\npytest -vv --log-level DEBUG test # \"test\" is the folder containing tests\n# Only one test\npytest -vv --log-level DEBUG -k detect\n# OR\ndocker compose up test # test is the container name running pytest test\n```\n\n## Deploy\nEvery commit on the `main` branch will build an deploy to the Scaleway container registry a new image that will be deployed. Have a look to `.github/deploy-main.yml`.\n\nLearn [more here.](https://www.scaleway.com/en/docs/tutorials/use-container-registry-github-actions/)\n\n## Monitoring\nWith Sentry, with env variable `SENTRY_DSN`.\n\nLearn more here : https://docs.sentry.io/platforms/python/configuration/options/\n\n### Send logs to Sentry\nBy setting `SENTRY_LOGGING` to `\"true\"` we can send the job logs to sentry (as Scaleway Cockpit is not always reliable).\n\nBe aware that the first 5GB/month of logs are free, after that it's 0,5$/GB.\n\n## Mediatree - Import data\nMediatree Documentation API : https://keywords.mediatree.fr/docs/\n\nYou must contact QuotaClimat team to 2 files with the API's username and password inside : \n* secrets/pwd_api.txt\n* secrets/username_api.txt\n\nOtherwise, a mock api response is available at https://github.com/dataforgoodfr/quotaclimat/blob/main/test/sitemap/mediatree.json\n\nYou can check the API with\n```\ncurl -X POST https://keywords.mediatree.fr/api/auth/token/ \\\n               -H \"Content-Type: application/x-www-form-urlencoded\" \\\n               -d \"grant_type=password\" \\\n               -d \"username=USERNAME\" \\\n               -d \"password=PASSWORD\"\n```\n\n```\ncurl -X GET \"https://keywords.mediatree.fr/api/epg/?channel=tf1\u0026start_gte=2024-09-01T00:00:00\u0026start_lte=2024-09-01T23:59:59\u0026token=TOKEN_RECEIVED_FROM_PREVIOUS_QUERY\"\n```\n\n\n### Run\n```\ndocker compose up mediatree\n```\n\n### Configuration - Batch import\n### Based on time\nIf our media perimeter evolves, we have to reimport it all using env variable `START_DATE` like in docker compose (epoch second format : 1705409797). By default, it will import 1 day, you can modify it with `NUMBER_OF_PREVIOUS_DAYS` (integer).\n\nOtherwise, default is yesterday midnight date (default cron job).\n\n#### Production safety nets\nAs Scaleway Serverless service can be down, if some dates are missing until today, it will start back from the latest date saved until today.\n\n### Replay data\nWhen dictionary change, we have to replay our data to update already saved data.\n**As pandas to_sql with a little tweak can use upsert (update/insert)**, if we want to update already saved rows, we have to use :\n* `START_DATE` \n* `NUMBER_OF_PREVIOUS_DAYS` \n\nFor example to replay data from 2024-05-30 to 2024-05-01 we do from docker compose job \"mediatree\" (or scaleway job):\n* `START_DATE` with unix timestamp of 2024-05-30 (1717020556)\n* `NUMBER_OF_PREVIOUS_DAYS` to 30 to get back to 2024-05-01.\n\n**Warning**: it might take several hours.\n\n### Based on channel\nUse env variable `CHANNEL` like in docker compose (string: tf1)\n\nOtherwise, default is all channels\n\n### Update without querying Mediatre API\nIn case we have a new word detection logic - and already saved data from Mediatree inside our DB (otherwise see Batch import based on time or channel) - we can re-apply it to all saved keywords inside our database.\n\n⚠️ in this case, as we won't requery Mediatree API so we can miss some chunks, but it's faster. Choose wisely between importing/updating.\n\nWe should use env variable `UPDATE`  like in docker compose (should be set to \"true\")\n\nIn order to see actual change in the local DB, run the test first `docker compose up test` and then these commands :\n```\ndocker exec -ti quotaclimat-postgres_db-1 bash # or docker compose exec postgres_db bash\npsql -h localhost --port 5432 -d barometre -U user\n--\u003e enter password : password\nUPDATE keywords set number_of_keywords=1000 WHERE id = '71b8126a50c1ed2e5cb1eab00e4481c33587db478472c2c0e74325abb872bef6';\nUPDATE keywords set number_of_keywords=1000 WHERE id = '975b41e76d298711cf55113a282e7f11c28157d761233838bb700253d47be262';\n```\n\nAfter having updated `UPDATE` env variable to true inside docker-compose.yml and running `docker compose up mediatree` you should see these logs : \n```\n update_pg_keywords.py:20 | Difference old 1000 - new_number_of_keywords 0\n```\n\nWe can adjust batch update with these env variables (as in the docker-compose.yml): \n```\nBATCH_SIZE: 50000 # number of records to update in one batch\n```\n### Update only one channel \nUse env variable `CHANNEL` like in docker compose (string: tf1) with `UPDATE` to true\n\n### Batch program data\n`UPDATE_PROGRAM_ONLY` to true will only update program metadata, otherwise, it will update program metadata and all theme/keywords calculations.\n\n`UPDATE_PROGRAM_CHANNEL_EMPTY_ONLY` to true will only update program metadata with empty value : \"\".\n\n### Batch update from a date\nWith +1 millions rows, we can update from an offset to fix a custom logic by using `START_DATE_UPDATE` (YYYY-MM-DD - default first day of the current month), the default will use the end of the month otherwise you can specify `END_DATE` (optional) (YYYY-MM-DD) to batch update PG from a date range.\n\nEnv variables list : \n* START_DATE_UPDATE : string (YYYY-MM-DD ) - default to today - minus NUMBER_OF_DAYS (date is included in the query)\n* END_DATE : string (YYYY-MM-DD ) - default to end of the month (date is included in the query)\n* NUMBER_OF_DAYS : integer default to 7 days - number of days to update from (START_DATE_UPDATE - NUMBER_OF_DAYS) until START_DATE_UPDATE if START_DATE_UPDATE is empty\n* STOP_WORD_KEYWORD_ONLY: boolean, default to False. If true will only update rows whose plaintext match top stop words' keyword. It uses to speed up update.\n* BIODIVERSITY_ONLY: boolean (default=false), if true will only update rows that have at least one number_of_biodiversity_* \u003e 0\n\nExample inside the docker-compose.yml mediatree service -\u003e START_DATE_UPDATE: 2024-04-01 - default END_DATE will be 2024-04-30\n \nWe can use [a Github actions to start multiple update operations with different date, set it using the matrix](https://github.com/dataforgoodfr/quotaclimat/blob/main/.github/workflows/scaleway-start-import-job-update.yml)\n\n\n#### Production executions\n~55 minutes to update 50K rows on a mVCPU 2240 - 4Gb RAM on Scaleway.\nEvery month has ~80K rows.\n\n## SQL Tables evolution\nUsing [Alembic](https://alembic.sqlalchemy.org/en/latest/autogenerate.html) Auto Generating Migrations¶ we can add a new column inside `models.py` and it will automatically make the schema evolution :\n\n```\n# If changes have already been applied (on your feature vranch) and you have to recreate your alembic file by doing :\n# 1. change to your main branch \ngit  switch main\n# 2. start test container (docker compose up testconsole -d / docker compose exec testconsole bash) and run \"pytest -vv -k api\" to rebuild the state of the DB (or drop table the table you want) - just let it run a few seconds.\n# 3. rechange to your WIP branch \ngit switch -\n# 4. connect to the test container : docker compose up testconsole -d / docker compose exec testconsole bash\n# 5. reapply the latest saved state : \npoetry run alembic stamp head\n# 6. Save the new columns\npoetry run alembic revision --autogenerate -m \"Add new column test for table keywords\"\n# this should generate a file to commit inside \"alembic/versions\"\n# 7. to apply it we need to run, from our container\npoetry run alembic upgrade head\n```\n\nInside our Dockerfile, we call this line \n```\n# to migrate SQL tables schema if needed\nRUN alembic upgrade head\n```\n### Channel metadata\nIn order to maintain channel perimeter (weekday, hours) up to date, we save the current version inside `postgres/channel_metadata.json`, if we modify this file the next deploy will update every lines of inside Postgresql table `channel_metadata`.\n\n## Keywords\n## Produce keywords list from Excel files\nHow to update `quotaclimat/data_processing/mediatree/keyword/keyword.py` from shared excel files ?\nDownload files locally to \"document-experts\" from Google Drive (ask on Slack) then :\n\nMacro category sheet must be downloaded as a TSV as Dictionnaire - OME.xlsx - Catégories Transversales.tsv.\n\n```\n# Be sure to have updated the folder \"document-experts\" before running it :\npoetry run python3 transform_excel_to_json.py\n```\n\n## Program Metadata table\nThe media perimeter is defined here : \"quotaclimat/data_processing/mediatree/channel_program_data.py\"\n\nTo evolve the media perimeter, we use `program_grid_start` and `program_grid_end` columns to version all evolutions.\n\nTo calculate the right total duration for each channel, after updating \"quotaclimat/data_processing/mediatree/channel_program_data.py\" you need to execute this command to update `postgres/program_metadata.json` \n```\npoetry run python3 transform_program.py\n```\nThe SQL queries are based on this file that generate the Program Metadata table.\n\nProgram data will not be updated to avoid lock concurrent issues when using `UPDATE=true` for keywords logic. Note: The default case will update them.\n\n**With the docker-entrypoint.sh this command is done automatically, so for production uses, you will not have to run this command.**\n\n# Mediatre to S3\nFor a security nets, we have configured at data pipeline from Mediatree API to S3 (Object Storage Scaleway) with partition :\n* country/year/month/day/channel\nIf France, country code is None for legacy purposes.\n\nEnv variable used :\n* START_DATE (integer) (unixtimestamp such as mediatree service)\n* NUMBER_OF_PREVIOUS_DAYS (integer): default 7 days to check if something missing\n* CHANNEL: (such as mediatree service)\n* BUCKET : Scaleway Access key\n* BUCKET_SECRET : Scaleway Secret key\n* BUCKET_NAME\n* DEFAULT_WINDOW_DURATION: int (default=20), the time window to divide the mediatree's 2 minute chunk (must be 120 secondes / DEFAULT_WINDOW_DURATION == 0)\n* COUNTRY : 3 letter country code (default = fra - [Source](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3)), see country.py to see them all - to get all countries the code is \"all\". \n\n# Stop words\nTo prevent advertising keywords to blow up statistics, we remove stop words based on the number of times a keyword is said in the same context.\n\nThe result will be saved inside postgresql table: stop_word.\n\nThis table is read by the service \"mediatree\" to remove stop words from the field \"plaintext\" to avoid to count them.\n\nEnv variables used : \n* START_DATE (integer) (unixtimestamp such as mediatree service)\n* NUMBER_OF_PREVIOUS_DAYS (integer): default 7 days\n* MIN_REPETITION (integer) : default 15 - Number of minimum repetition of a stop word\n* CONTEXT_TOTAL_LENGTH (integer) : default 80 - the length of the advertising context (sentence) saved\n* FILTER_DAYS_STOP_WORD (integer): default 30 - number of days to filter the last stop words saved from - to speed up update execution\n \n## Remove a stop word\nTo remove a false positive, we set to false the `validated` attribute :\n```\ndocker exec -ti quotaclimat-postgres_db-1 bash # or docker compose exec postgres_db bash\npsql -h localhost --port 5432 -d barometre -U user\n--\u003e enter password : password\nUPDATE stop_word set validated=false WHERE id = 'MY_ID';\n```\n\n## Production monitoring\n* Use scaleway\n* Use [Ray dashboard] on port 8265\n\n## Bump version\n[poetry bump](https://python-poetry.org/docs/cli/#version)\n```\npoetry version minor\n```\n\n## Materialized view - dbt\nWe can define some slow queries to make them efficient with materialized views using [DBT](https://www.getdbt.com/), used via docker :\n```\ndocker compose up testconsole -d\ndocker compose exec testconsole bash\n\u003e dbt debug  # check if this works\n# caution: this seed will reinit the keywords and program_metadata tables\n\u003e dbt seed --select program_metadata --select keywords --full-refresh  # will empty your local db - order is important\n\u003e dbt run --models homepage_environment_by_media_by_month # change by your file name\n\u003e poetry run pytest --log-level DEBUG -vv my_dbt_project/pytest_tests # unit test \n```\n\n**Protips**: [Explore these data with postgres data using Metabase locally](https://github.com/dataforgoodfr/quotaclimat?tab=readme-ov-file#explore-postgres-data-using-metabase---a-bi-tool)\n\n### DBT production\nTo update monthly our materialized view in production we have to use this command ([automatically done inside our docker-entrypoint](https://github.com/dataforgoodfr/quotaclimat/blob/main/docker-entrypoint.sh#L17)) that is run on every deployement of api-import (daily) :\n```\npoetry run dbt run --full-refresh\n```\n\n#### Causal query - too slow\nBecause this query is too massive, we set it month by month and avoid using a full-refresh. See units tests and docker-entrypoint.sh to see how.\n\nIf we change the DBT code, we have to relaunch this command to have a refreshed view (or wait the next daily cron).\n\n### SRT to Mediatree Format\nSome Speech to Text data come from other sources than Mediatree, so we have to transform those source into the mediatree format to process them.\n\n#### Run germany\nOnly for german data using parquet\n```\ndocker compose up srt\n```\nor\n```bash\ndocker compose up testconsole -d\ndocker compose exec testconsole bash\n/app/ cd i8n/\n/app/i8n# poetry run python3 srt-to-mediatree-format-parquet.py\n```\n#### Run belgium\nOnly for belgian data using .csv\n\nWarning: this job is not automated as the process depending on getting the data is manual (emails), so we have to modify [the script here](https://github.com/dataforgoodfr/quotaclimat/blob/01e5ede5152d4113c68bcf994f13c7b2baa30dd6/i8n/srt-to-mediatree-format.py#L259-L262).\n```\ndocker compose up testconsole -d\ndocker compose exec testconsole bash\n/app/ cd i8n/\n/app/i8n# poetry run python3 srt-to-mediatree-format.py\n```\n\n### Fix linting\nBefore committing, make sure that the line of codes you wrote are conform to PEP8 standard by running:\n```bash\npoetry run black .\npoetry run isort .\npoetry run flake8 .\n```\nThere is a debt regarding the cleanest of the code right now. Let's just not make it worth for now.\n\n# Labelstudio\nFor the Climate Safeguards project we ingest the data present in the Labelstudio databases into the Barometre database. This way we can analyse the annotations of the factcheckers and extract key insights. There are two main tables that we need to ingest:\n* `task`\n* `task_completion`\nThe `task` table contains an item (a 2 minute segment) with some metadata and an ID, and the `task_completion` table contains the annotations for that task. As the Labelstudio databases are separate from each other, we create a `labelstudio_task_aggregate` table and `labelstudio_task_completion_aggregate` table to perform a union of all the tasks and annotations. We also create `task_aggregate_id` and `task_completion_aggregate_id` to uniquely identify each task and annotation based on a hash of the table's id, project_id column and country column.\n## SQLAlchemy models\nThe models for the two tables can be found in `quotaclimat/data_ingestion/labelstudio/models.py`, any update to these models can be tracked with `alembic`, as the migration tool has been setup to track `TargetBase` as well as the already existing tables.\n## Source configuration\nThe sources for the Labelstudio ingestion can be found in `quotaclimat/data_ingestion/labelstudio/configs.py`. The `db_config` variable consists of a list of record (python dictionaries) with the source database name and a mapping of the project ids to countries:\n```python\ndb_config = [\n  {\n    \"database\": \"\u003clabelstudio_db\u003e\", \n    \"countries\": {\n      1: \"\u003ccountry_1\u003e\", \n      2: \"\u003ccountry_2\u003e\", \n      3: \"\u003ccountry_3\u003e\",\n    }\n  },\n]\n```\nWhen a new source is added (if a new Labelstudio instance is deployed) it suffices to add the source to the record list. (This assumes that all sources are on the same DB instace, as is the case at the time of writing).\n## Local execution\nIn order to execute the ingestion script locally you will need either have a working labelstudio locally, or to connect to the remote labelstudio with a read-only user.\nSet your credentials in the `docker-compose.yml` file:\n```\nLABELSTUDIO_INGESTION_POSTGRES_USER: \u003cuser\u003e\nLABELSTUDIO_INGESTION_POSTGRES_PASSWORD: \u003cpassword\u003e\n```\nand run the script via the test console:\n```bash\ndocker compose up testconsole -d \ndocker compose exec testconsole bash\npoetry run python -m quotaclimat.data_ingestion.labelstudio.ingest_labelstudio\n```\n\n# Analytics\nIn order to improve the performance of the dashboards hosted on Metabase, intermediate tables are calculated using `dbt` in the `analytics` schema. These can be found in `my_dbt_project/models/analytics`. The idea is to add a second layer to our database where we will store the more elaborated data used for our visualization. A schema of this evolution is seen below:\n![Data tiers diagram](docs/images/data_tiers.png \"Data Tiers\")\nThese dbt models need to be run using the `--target analytics` command. You can test these locally using the test console:\n```bash\ndocker compose up testconsole -d \ndocker compose exec testconsole bash\n# Seed the labelstudio tables\npoetry run dbt seed --select program_metadata --select labelstudio_task_aggregate --select labelstudio_task_completion_aggregate\n# run the dbt model on the analytics target\npoetry run dbt run --target analytics --select task_global_completion\n```\n## Thanks\n* [Paul Leclercq] (https://www.epauler.fr/)\n* [Eleven-Strategy](https://www.welcometothejungle.com/fr/companies/eleven-strategy)\n* [Kevin Tessier](https://kevintessier.fr)","funding_links":[],"readme_doi_urls":[],"works":{},"citation_counts":{},"total_citations":0,"keywords_from_contributors":["climate-change"],"project_url":"https://ost.ecosyste.ms/api/v1/projects/39120","html_url":"https://ost.ecosyste.ms/projects/39120"}