Landbruget.dk

Organizes data from 18+ Danish government sources into a single, queryable platform. The project collects, cleans, and publishes agricultural, environmental, and regulatory data so that journalists, researchers, and citizens can hold the industry accountable.
https://github.com/klimabevaegelsen/landbruget.dk

Category: Consumption
Sub Category: Agriculture and Nutrition

Repository metadata

A project to communicate data about Danish agriculture

README.md

Landbruget.dk

Making Danish agricultural data transparent and universally accessible.

Landbruget.dk organizes data from 18+ Danish government sources into a single, queryable platform. We collect, clean, and publish agricultural, environmental, and regulatory data so that journalists, researchers, and citizens can hold the industry accountable.

Project Structure

landbruget.dk/
├── frontend/               # Next.js 16 — interactive map + data visualization
├── frontend-pesticide/     # Next.js 16 — PFAS/pesticide exposure maps
├── data-explorer/          # Next.js 16 — browser-based SQL queries on Parquet files
├── backend/                # Python data pipelines (medallion architecture)
│   ├── pipelines/          # 12 data pipelines (CHR, unified, climate, etc.)
│   └── common/             # Shared utilities (DuckDB, R2, CRS)
├── supabase/               # PostgreSQL migrations + Edge Functions
├── schema/                 # Data catalog (183 datasets) + relationship docs
├── docs/                   # Pipeline index, data lineage, troubleshooting
├── scripts/                # Utility scripts (worktree setup)
└── .github/                # CI/CD workflows (30+ GitHub Actions)

Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS 4 | Web applications |
| Maps | MapLibre GL JS, PMTiles | Geospatial visualization |
| Backend | Python 3.11+, DuckDB, ibis-framework | Data pipelines |
| Database | Supabase (PostgreSQL 15 + PostGIS) | Storage + API |
| Data Storage | Cloudflare R2 | Raw + processed data |
| CI/CD | GitHub Actions | Pipeline orchestration |
| Deployment | Vercel | Frontend hosting |
| Linting | oxlint (frontend), ruff (backend) | Code quality |
| Testing | Playwright (frontend), pytest (backend) | Quality assurance |

Quick Start

Prerequisites

  • Node.js 20.9+ (the minimum supported by Next.js 16)
  • Python 3.11
  • uv
  • Supabase CLI

Workspace Setup

./scripts/setup-worktree.sh

The script installs frontend dependencies, Playwright browsers, and the shared Python workspace via uv (pinned to Python 3.11 by .python-version). It then verifies that npm test, npm run lint, and uv run pytest can resolve their local tooling.

Frontend

cd frontend
cp .env.example .env.local    # Configure Supabase credentials
npm ci
npm run dev                   # http://localhost:3000

Data Explorer

cd data-explorer
cp .env.local.example .env.local  # Add Google API key for Gemini
npm install
npm run dev                       # http://localhost:3000

Backend Pipelines

uv sync --python 3.11 --all-packages --group dev

# Run a specific pipeline
cd backend/pipelines/unified_pipeline
uv run python -m unified_pipeline bronze --source cadastral

Data Architecture

Medallion Architecture

All data flows through three layers:

  • Bronze — Raw data preserved exactly as received from sources. No transformations.
  • Silver — Cleaned, validated, and standardized. Type coercion, deduplication, format normalization.
  • Gold — Analysis-ready. Joins across datasets on CVR/CHR/BFE identifiers, derived metrics.
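
The three layers can be illustrated with a small sketch that uses only the Python standard library. The sample rows, the field names (`cvr`, `herd_size`, `name`), and the function names are hypothetical; the actual pipelines use DuckDB and ibis rather than plain lists, so treat this as an outline of the idea and not the project's implementation.

```python
from collections import defaultdict

# Bronze: raw rows kept exactly as received (hypothetical sample data).
bronze_herds = [
    {"cvr": " 12345678", "herd_size": "120"},
    {"cvr": "12345678", "herd_size": "120"},  # duplicate once whitespace is trimmed
    {"cvr": "87654321", "herd_size": "45"},
    {"cvr": "87654321", "herd_size": "n/a"},  # fails validation
]
bronze_companies = [
    {"cvr": "12345678", "name": "Farm A"},
    {"cvr": "87654321", "name": "Farm B"},
]


def to_silver(rows):
    """Normalize format, validate, coerce types, and deduplicate bronze rows."""
    seen, silver = set(), []
    for row in rows:
        cvr = row["cvr"].strip()
        size = row["herd_size"].strip()
        if not (cvr.isdigit() and len(cvr) == 8 and size.isdigit()):
            continue  # drop rows that fail validation
        record = (cvr, int(size))
        if record not in seen:
            seen.add(record)
            silver.append({"cvr": cvr, "herd_size": int(size)})
    return silver


def to_gold(herds, companies):
    """Join herds to companies on CVR and derive total animals per company."""
    totals = defaultdict(int)
    for herd in herds:
        totals[herd["cvr"]] += herd["herd_size"]
    names = {company["cvr"]: company["name"] for company in companies}
    return [
        {"cvr": cvr, "name": names.get(cvr), "total_animals": total}
        for cvr, total in sorted(totals.items())
    ]


silver_herds = to_silver(bronze_herds)
gold = to_gold(silver_herds, bronze_companies)
```

Note that `bronze_herds` is never modified: the silver and gold layers are derived copies, which is what allows a pipeline to be re-run from the raw data.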

Coordinate Reference System

All geospatial processing uses EPSG:25832 (UTM 32N, meters). Data is transformed to EPSG:4326 (WGS84) only at the final Supabase upload step.
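
To make the final conversion step concrete, here is a standard-library sketch of the inverse transverse Mercator series (after Snyder) for UTM zone 32N on the GRS80 ellipsoid. The function name `utm32n_to_lonlat` is hypothetical, and the sketch treats ETRS89 and WGS84 as identical, which is an approximation. In practice a library such as pyproj or a spatial database extension would normally perform this transformation; this is only meant to show what the step computes.

```python
import math

# GRS80 ellipsoid (used by ETRS89) and UTM zone 32N parameters.
A = 6378137.0                 # semi-major axis in metres
F = 1 / 298.257222101         # flattening
K0 = 0.9996                   # scale factor on the central meridian
FALSE_EASTING = 500000.0      # metres
CENTRAL_MERIDIAN_DEG = 9.0    # zone 32 central meridian


def utm32n_to_lonlat(easting, northing):
    """Convert EPSG:25832 metres to (lon, lat) in degrees (northern hemisphere)."""
    e2 = F * (2 - F)                      # first eccentricity squared
    ep2 = e2 / (1 - e2)                   # second eccentricity squared
    e1 = (1 - math.sqrt(1 - e2)) / (1 + math.sqrt(1 - e2))

    # Footpoint latitude from the meridional arc length.
    m = northing / K0
    mu = m / (A * (1 - e2 / 4 - 3 * e2**2 / 64 - 5 * e2**3 / 256))
    phi1 = (
        mu
        + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
        + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
        + (151 * e1**3 / 96) * math.sin(6 * mu)
        + (1097 * e1**4 / 512) * math.sin(8 * mu)
    )

    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    c1 = ep2 * cos1**2
    t1 = tan1**2
    n1 = A / math.sqrt(1 - e2 * sin1**2)
    r1 = A * (1 - e2) / (1 - e2 * sin1**2) ** 1.5
    d = (easting - FALSE_EASTING) / (n1 * K0)

    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * ep2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * ep2 - 3 * c1**2) * d**6 / 720
    )
    dlon = (
        d
        - (1 + 2 * t1 + c1) * d**3 / 6
        + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * ep2 + 24 * t1**2) * d**5 / 120
    ) / cos1
    return CENTRAL_MERIDIAN_DEG + math.degrees(dlon), math.degrees(lat)
```

Keeping all intermediate work in EPSG:25832 means distances and areas are computed in metres; converting to degrees only at upload avoids mixing units mid-pipeline.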

Data Identifiers

All datasets join on one or more of:

| Identifier | Name | Format | Purpose |
| --- | --- | --- | --- |
| CVR | Company Registration | 8 digits | Links to companies |
| CHR | Central Herd Register | 6 digits | Links to livestock herds |
| BFE | Cadastral ID | Variable | Links to land parcels |
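
Because every join depends on these identifiers, it can help to check their format before joining. The sketch below is a hypothetical helper, not code from the repository. The CVR and CHR patterns follow the formats listed above; the BFE pattern assumes a non-empty run of digits, which is an assumption, since the format is only described as variable.

```python
import re

# CVR and CHR lengths come from the identifier table.
# The BFE pattern (digits only, any length) is an assumption for illustration.
PATTERNS = {
    "CVR": re.compile(r"[0-9]{8}"),
    "CHR": re.compile(r"[0-9]{6}"),
    "BFE": re.compile(r"[0-9]+"),
}


def is_valid_identifier(kind, value):
    """Return True if `value` matches the expected format for `kind`."""
    pattern = PATTERNS.get(kind.upper())
    if pattern is None:
        raise ValueError(f"unknown identifier type: {kind!r}")
    return pattern.fullmatch(value.strip()) is not None
```

For example, `is_valid_identifier("CVR", "12345678")` passes while a seven-digit value does not. Identifiers are handled as strings so that leading zeros are not lost.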

Data Sources

We collect from 18+ official Danish government sources including:

  • Landbrugsstyrelsen — Field boundaries, crop data, agricultural subsidies
  • Fødevarestyrelsen (FVM) — Livestock registry (CHR), veterinary data, pig movements
  • Miljøstyrelsen — Pesticide database (BMD), environmental company registry (DMA)
  • Geodatastyrelsen — Cadastral data, administrative boundaries
  • Danmarks Statistik — Agricultural statistics
  • DMI — Weather and climate data
  • Arbejdstilsynet — Workplace safety inspections
  • Datafordeleren — Property ownership data
  • GEUS — Borehole pesticide data (Dataverse)

See docs/PIPELINE_INDEX.md for the full pipeline documentation.

Pipelines

| Pipeline | Purpose | Schedule |
| --- | --- | --- |
| unified_pipeline | 18+ government data sources | Weekly (Mon 2 AM UTC) |
| chr_pipeline | Livestock registry + veterinary data | Weekly |
| svineflytning_pipeline | Pig movement tracking | Weekly (Wed 2 AM UTC) |
| climate | Farm-level CO2e emissions | On demand |
| bmd_scraper | Pesticide database | Monthly |
| dma_scraper | Environmental company registry | Monthly |
| drive_data_pipeline | Google Drive regulatory docs | On demand |
| bbr_buildings | Building registry | Monthly |
| arbejdstilsynet_inspections | Workplace safety inspections | On demand |
| h3_pfas_exposure_pipeline | PFAS exposure mapping | Weekly |
| property_owners_sftp | Property ownership data | Manual |
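
As an illustration of the weekly schedules, the sketch below computes the next run time for a "weekday at a fixed UTC hour" schedule. The function name `next_weekly_run` is hypothetical. A schedule such as Mon 2 AM UTC corresponds to the cron expression `0 2 * * 1`; whether the workflows use exactly that expression is an assumption.

```python
from datetime import datetime, timedelta, timezone


def next_weekly_run(now, weekday, hour):
    """Return the next run strictly after `now` (weekday 0 is Monday, UTC)."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's slot has already passed
    return candidate


# Example: from Wednesday 2024-01-03 12:00 UTC.
now = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
next_unified = next_weekly_run(now, weekday=0, hour=2)         # Mon 2 AM UTC
next_svineflytning = next_weekly_run(now, weekday=2, hour=2)   # Wed 2 AM UTC
```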

Development

Running Tests

# Frontend
cd frontend && npm test         # Playwright E2E
cd frontend && npm run lint     # oxlint

# Backend
uv run --all-packages --group dev pytest   # pytest
uv run --all-packages --group dev ruff check backend
uv run --all-packages --group dev ruff format backend

Branch Naming

Format: <type>/<short-description> — e.g. feat/map-view, fix/chr-data-load

Types: feat, fix, docs, refactor, test, chore, ci, perf, build

Commit Messages

<type>(<scope>): <subject>

Examples: feat(frontend): add interactive map view, fix(pipeline): correct CHR transformation
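
The naming conventions above can be checked mechanically, for instance in a local Git hook. The sketch below is hypothetical and not part of the repository. It assumes descriptions and scopes are lowercase kebab-case and that commit messages use the same type list as branches; neither rule is stated explicitly, so adjust the patterns if the project is more permissive.

```python
import re

# Types listed under Branch Naming; reusing them for commits is an assumption.
TYPES = ("feat", "fix", "docs", "refactor", "test", "chore", "ci", "perf", "build")
_TYPE_GROUP = "|".join(TYPES)

# <type>/<short-description>, with a lowercase kebab-case description (assumed).
BRANCH_RE = re.compile(rf"(?:{_TYPE_GROUP})/[a-z0-9]+(?:-[a-z0-9]+)*")

# <type>(<scope>): <subject>, with a non-empty subject.
COMMIT_RE = re.compile(rf"(?:{_TYPE_GROUP})\([a-z0-9-]+\): \S.*")


def is_valid_branch(name):
    """Return True if the branch name follows <type>/<short-description>."""
    return BRANCH_RE.fullmatch(name) is not None


def is_valid_commit(message):
    """Return True if the first line follows <type>(<scope>): <subject>."""
    lines = message.splitlines()
    return bool(lines) and COMMIT_RE.fullmatch(lines[0]) is not None
```

Both examples given above pass these checks, while a branch such as `feature/map` is rejected because `feature` is not one of the listed types.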

Contributing

  1. Create a branch from main following the naming convention above
  2. Make your changes
  3. Run all tests (npm test + uv run --all-packages --group dev pytest)
  4. Run linters (npm run lint + uv run --all-packages --group dev ruff check backend)
  5. Open a pull request — all PRs require review before merge

License

Code

The source code in this repository is licensed under the MIT License.

Data

The MIT license does not cover the data. This includes both:

  • Upstream data ingested from Danish public-sector sources (Landbrugsstyrelsen, CHR Registry, Geodatastyrelsen, Miljøstyrelsen, Danmarks Statistik, DMI, and others — see docs/PIPELINE_INDEX.md), and
  • Derived datasets we publish to the Cloudflare R2 CDN (JSON, Parquet, PMTiles).

Each dataset retains the licensing terms of its original source — typically Danish government open-data terms or, where applicable, a Creative Commons license. Where the source is not openly licensed, the original copyright and conditions of the issuing public authority apply. Reusing data from this project requires complying with the upstream source's terms.

See /om-os for the project's overall data policy and /kilder for per-source attribution.

The data is provided "as is and as available" — no warranty is given as to completeness, accuracy, or timeliness.


Committers metadata

Total Commits: 1,435
Total Committers: 9
Avg Commits per committer: 159.444
Development Distribution Score (DDS): 0.172

Commits in past year: 1,258
Committers in past year: 8
Avg Commits per committer in past year: 157.25
Development Distribution Score (DDS) in past year: 0.163

| Name | Email | Commits |
| --- | --- | --- |
| Martin Collignon | 2****n | 1188 |
| dependabot[bot] | 4****] | 150 |
| Alexander Lindkjær | 3****r | 50 |
| Rahul Sahoo | r****6@g****m | 16 |
| aleksanderbl29 | g****b@a****k | 14 |
| Việt Hoàng | 4****o | 11 |
| EdNg115 | n****s@g****m | 4 |
| Jameleddine Amri | 4****a | 1 |
| Claude Code | n****y@a****m | 1 |

Issue and Pull Request metadata

Total issues: 79
Total pull requests: 158
Average time to close issues: about 1 month
Average time to close pull requests: 3 days
Total issue authors: 4
Total pull request authors: 10
Average comments per issue: 0.51
Average comments per pull request: 1.09
Merged pull requests: 98
Bot issues: 0
Bot pull requests: 19

Past year issues: 46
Past year pull requests: 110
Past year average time to close issues: 18 days
Past year average time to close pull requests: about 19 hours
Past year issue authors: 3
Past year pull request authors: 9
Past year average comments per issue: 0.3
Past year average comments per pull request: 1.22
Past year merged pull requests: 60
Past year bot issues: 0
Past year bot pull requests: 19

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/klimabevaegelsen/landbruget.dk

Top Issue Authors

  • martincollignon (71)
  • AlexanderLindkjaer (5)
  • EdwardNgo (2)
  • LilMonk (1)

Top Pull Request Authors

  • martincollignon (104)
  • dependabot[bot] (19)
  • aleksanderbl29 (11)
  • EdwardNgo (10)
  • AlexanderLindkjaer (6)
  • LilMonk (4)
  • gbrian (1)
  • azafoura (1)
  • STAR-173 (1)
  • p-leena-reddy-111 (1)

Top Issue Labels

  • data (15)
  • v1 (9)
  • help wanted (6)
  • source (5)
  • enhancement (5)
  • ready for implementation (2)
  • good first issue (2)
  • api (1)
  • bug (1)
  • documentation (1)
  • question (1)

Top Pull Request Labels

  • python:uv (19)
  • dependencies (19)

Score: 6.6039438246004725