ccvi-data
Establish a scientifically informed tool that enables policymakers and researchers to assess and map current global risks to human security arising from climate and conflict hazards, their intersections and the potential for harmful interactions.
https://github.com/ccew-unibw/ccvi-data
Category: Climate Change
Sub Category: Climate Data Processing and Analysis
Last synced: about 19 hours ago
Repository metadata
Data Generation for the Climate-Conflict-Vulnerability Index (CCVI)
- Host: GitHub
- URL: https://github.com/ccew-unibw/ccvi-data
- Owner: ccew-unibw
- License: gpl-3.0
- Created: 2025-05-23T09:08:30.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2025-06-02T10:02:56.000Z (26 days ago)
- Last Synced: 2025-06-19T08:11:44.926Z (9 days ago)
- Language: Python
- Homepage: https://climate-conflict.org
- Size: 326 KB
- Stars: 8
- Watchers: 2
- Forks: 1
- Open Issues: 2
- Releases: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README.md
> [!NOTE]
> This repository does not reflect the previous data generation visible on climate-conflict.org. We plan to switch to this new data pipeline for the 2025-Q2 release, with testing currently still under way.
ccvi-data: Data processing for the Climate-Conflict-Vulnerability Index (CCVI)
The Climate Conflict Vulnerability Index (CCVI) is the result of a joint research project between the Center for Crisis Early Warning (CCEW) at University of the Bundeswehr Munich, the FutureLab "Security, Ethnic Conflicts and Migration" at the Potsdam Institute for Climate Impact Research (PIK), and the German Federal Foreign Office.
The goal of the project is to establish a scientifically informed tool that enables policymakers and researchers to assess and map current global risks to human security arising from climate and conflict hazards, their intersections and the potential for harmful interactions. Additionally, the CCVI reveals how vulnerabilities can amplify the impacts of climate and conflict hazards, increasing risks to human security.
The data and documentation of our conceptual and technical approach are available at https://climate-conflict.org.
The data is updated quarterly and gridded to 0.5 degrees (ca. 55km by 55km at the equator).
Table of Contents
Overview
The repository is broadly structured along the pillars of the CCVI, with corresponding `climate`, `conflict` and `vulnerability` folders containing the respective indicators. Additionally, the `base` folder contains base classes providing the implementation structure and classes for each dataset used, while the `utils` folder contains functionality shared across indicators and dimensions. The main `ccvi.py` script initializes all components and orchestrates running the CCVI data pipeline. Available configuration can be set in `config.yaml`.
To run the CCVI data pipeline, use `uv run ccvi.py` after the setup.
Setup
The project can be set up and run on a single workstation.
Technical Requirements
- Compute: CPU only. Some steps are parallelized, though good single-core performance is a plus.
- Memory: < 128GB
- Storage: < 2TB
Data Requirements
All data used in the CCVI is publicly available. The project depends both on locally downloaded input data and on data accessed via APIs, which is downloaded automatically as part of the data pipeline.
Locally required input files need to be downloaded into the input subfolder of the storage directory defined in the config. See the comments in the .yaml for which files are currently required and where to download them.
Both data downloaded via APIs and local input data may require registering with the data providers. API keys and other required secrets are read from a `.env` file, which must be set up according to the `.env.template`.
Environment
Python
The project was developed and tested on Python 3.12.
R
The project uses R for some of its calculations via the rpy2 bridge. For this to work, R needs to be installed in your environment. The project was tested on R 4.3.2.
Python Dependencies:
This project uses `uv` for Python package, version, and virtual environment management. uv needs to be installed before it can be used to set up the environment.
To set up the virtual environment, run `uv sync` after cloning the repository. This installs the required packages and the Python version.
Additional packages can be installed, similar to pip, with `uv add <package-name>`.
For more information, see the official `uv` documentation.
The repository is formatted with `ruff`.
Configuration
The `config.yaml` is the central configuration file for the project. For a full description of the settings, see the file itself.
The following configurations are available:
- `global`: Settings applicable everywhere. Includes the regeneration config, enabling forced regeneration of processing steps despite cached versions.

  ```yaml
  global:
    # Start year for data processing and indicator generation.
    start_year: 2015
    # Path under which the input/processing/output storage folders are contained/will be created.
    storage_path: data
    # IDs added to regenerate force regeneration of indicator calculation, preprocessing or data
    # loading even if current versions are in storage. Aggregate scores are always regenerated.
    regenerate:
      indicator:
        - pillar_dimension_id
      preprocessing:
        - data_key
      data:
        - data_key
  ```

- `data`: Maps data source keys to their filenames relative to the `input/` directory.

  ```yaml
  data:
    vdem: V-Dem-CY-Full+Others-v14.rds
    countries: geoBoundariesCGAZ_ADM0.gpkg
    land_mask: ne_50m_land.zip
    # ... other data sources
  ```

- `indicators`: Nested dictionary structure (pillar > dimension > id) containing parameters specific to each indicator, defined individually by each class.

  ```yaml
  indicators:
    CON:  # Pillar
      level:  # Dimension
        intensity:  # Indicator ID
          # Specific parameters for CON_level_intensity
          normalization_quantile: 0.99
          # ...
  ```

- `aggregation`: Nested dictionary structure (pillar > dimension) containing parameters for risk score, pillar, and dimension aggregations.

  ```yaml
  aggregation:
    CON:  # Pillar
      level:  # Dimension
        method: mean  # standard methods are mean, gmean or pmean
        weights: None  # None (equal), or weights for all components as composite_id: weight pairs
        normalize: True  # whether to re-normalize to 0-1 after aggregation step
      # ...
    RISK:  # Risk scores
      CCVI:
        method: mean
        weights: None
        normalize: True
      # ...
  ```
Framework & Architecture
The project framework was designed with modularity in mind. Individual indicators and data sources can easily be modified and replaced without affecting the whole project.
The architecture follows the composite index logic of the CCVI. Base classes were designed to handle the core functionality and provide a uniform interface for `Datasets`, `Indicators`, and our two main aggregation levels, `Dimensions` and `Pillars`. Additionally, shared `ConfigParser`, `StorageManager` and `GlobalBaseGrid` classes provide the framework for the geospatial resolution, for reading the config, and for caching processing steps and storing results.
Data structure
All CCVI scores are stored as `.parquet` files from pandas DataFrames with a `('pgid', 'year', 'quarter')` MultiIndex, where `pgid` stands for the PRIO-GRID id, a unique identifier for each grid cell.
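A minimal sketch of this storage format (the pgid values and the column name are made up for illustration):

```python
# Build a DataFrame with the ('pgid', 'year', 'quarter') MultiIndex
# described above; values and column name are illustrative.
import pandas as pd

scores = pd.DataFrame(
    {"score": [0.12, 0.34, 0.56]},
    index=pd.MultiIndex.from_tuples(
        [(150837, 2024, 4), (150837, 2025, 1), (150838, 2025, 1)],
        names=["pgid", "year", "quarter"],
    ),
)

# The MultiIndex makes per-cell time series easy to select:
cell = scores.loc[150837]  # all quarters for one grid cell
```

Such a frame round-trips to Parquet via `DataFrame.to_parquet()` / `pd.read_parquet()` with the index preserved.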
Datasets
`base.objects.Dataset`
The `Dataset` class provides the basic framework to add datasets to the CCVI. Datasets are responsible for encapsulating all logic related to accessing, downloading, and performing initial preprocessing of specific external data sources. Each Dataset subclass is tailored to a particular source and needs to implement at minimum a `load_data()` method and set its `data_key` class attribute. The `local` attribute distinguishes between local file sources and API-based sources, with required local files defined by the corresponding data_key(s) in the config. Each dataset is initialized with the shared `ConfigParser` instance and sets up its own `StorageManager` instance. Caching of preprocessing steps is handled in a subfolder within the processing/ directory, named after the data_key, which is created based on the `needs_storage` attribute. Dataset classes often include further methods for more specific data processing.
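The pattern can be sketched as follows. This is a hypothetical, simplified stand-in for `base.objects.Dataset` (the real class also wires up `ConfigParser` and `StorageManager`); the subclass name and the returned columns are made up:

```python
# Hypothetical sketch of the Dataset pattern, not the actual base class API.
from abc import ABC, abstractmethod
import pandas as pd

class Dataset(ABC):
    data_key: str = ""           # key matching the config's data section
    local: bool = True           # local file source vs. API-based source
    needs_storage: bool = False  # whether a processing/<data_key>/ cache is created

    @abstractmethod
    def load_data(self) -> pd.DataFrame:
        """Access the source and return initially preprocessed data."""

class VDemDataset(Dataset):
    data_key = "vdem"  # would match the filename entry in config.yaml
    local = True

    def load_data(self) -> pd.DataFrame:
        # The real pipeline would read the downloaded .rds file here;
        # we return a stand-in frame for illustration.
        return pd.DataFrame({"country": ["A"], "v2x_polyarchy": [0.8]})

df = VDemDataset().load_data()
```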
Indicators
`base.objects.Indicator`
The `Indicator` class provides a framework for the processing steps each indicator needs to implement and orchestrates them. Each Indicator subclass creates one or more Dataset instances, loads the data, and applies specific calculations to transform this data into a 0-1 score. Each Indicator is initialized with pillar, dim, and id identifiers, along with shared `ConfigParser` and `GlobalBaseGrid` instances, and sets up its own `StorageManager`. Caching of preprocessing steps is handled in a subfolder within the processing/ directory, named after the indicator's `composite_id` attribute, depending on the `requires_processing_storage` attribute. Finished indicator scores and any raw values are stored as .parquet files in the output/ directory, named after the `composite_id`. An internal generated flag, checked at initialization via `StorageManager.check_component_generated()`, determines whether an up-to-date version of the indicator's output already exists in storage.
The core workflow is defined by a series of abstract methods which subclasses implement:
- `load_data()`
- `preprocess_data()`
- `create_indicator()`
- `normalize()`
An optional `add_raw_value()` method can also be overridden. The workflow, (re-)generation checks and storage are orchestrated in the `run()` method.
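A hypothetical, simplified sketch of this orchestration (not the actual `base.objects.Indicator`; the toy subclass, its data and the simple min-max normalization are invented for illustration):

```python
# Hypothetical sketch: abstract workflow steps orchestrated by run().
from abc import ABC, abstractmethod

class Indicator(ABC):
    def __init__(self, pillar: str, dim: str, id: str):
        self.composite_id = f"{pillar}_{dim}_{id}"  # e.g. "CON_level_intensity"
        self.generated = False  # the real class checks storage for up-to-date output

    @abstractmethod
    def load_data(self): ...
    @abstractmethod
    def preprocess_data(self, data): ...
    @abstractmethod
    def create_indicator(self, data): ...
    @abstractmethod
    def normalize(self, scores): ...

    def run(self):
        if self.generated:  # skip work if output already exists in storage
            return None
        data = self.preprocess_data(self.load_data())
        scores = self.normalize(self.create_indicator(data))
        return scores       # the real class also writes the .parquet output

class ToyIndicator(Indicator):
    def load_data(self): return [2.0, 4.0, 6.0]
    def preprocess_data(self, data): return data
    def create_indicator(self, data): return data
    def normalize(self, scores):  # min-max scale to the 0-1 range
        lo, hi = min(scores), max(scores)
        return [(s - lo) / (hi - lo) for s in scores]

result = ToyIndicator("CON", "level", "intensity").run()  # [0.0, 0.5, 1.0]
```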
Dimensions and Pillars
`base.objects.AggregateScore`, `base.objects.Dimension`, `base.objects.Pillar`, `ccvi.CCVI`
The `Dimension` and `Pillar` classes represent the aggregation levels within the CCVI structure, with the top-level `CCVI` class performing the final risk score aggregations. Dimension classes aggregate multiple Indicator scores, while Pillar classes aggregate Dimension scores. The common logic for these aggregations is provided by the `AggregateScore` base class, which is inherited by the Dimension, Pillar, and CCVI classes.
Aggregate score classes are initialized with a list of their constituent objects (e.g., a list of Indicator instances for a Dimension) and the shared ConfigParser. They retrieve their specific aggregation parameters from the config and create their own StorageManager, using their composite_id (e.g., CON_level) as the output filename. Aggregate scores are always regenerated in the data pipeline unless the same instance is run twice.
Similar to indicators, the `run()` method orchestrates the aggregation process: it
- validates the input components and checks if they have been generated,
- runs any missing components,
- loads the data from these components via `load_components()`,
- calculates aggregate scores via `aggregate()`,
- saves the final aggregated score as a `.parquet` file to the output/ folder.
An optional `add_exposure()` method modifies the data before aggregation depending on the `has_exposure` attribute; this is implemented for the climate pillar of the CCVI in the `climate.shared.ClimateDimension` subclass.
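The standard aggregation methods named in the config (mean, gmean, pmean) can be sketched as below. This is not the actual `AggregateScore` implementation; the function signature, the epsilon clip in the geometric mean, and the power-mean order `p` are our own assumptions:

```python
# Hypothetical sketch of the weighted aggregation step with optional
# re-normalization to 0-1, mirroring the config options described above.
import numpy as np

def aggregate(components, method="mean", weights=None, normalize=True, p=2):
    """components: array of shape (n_cells, n_components), scores in [0, 1]."""
    components = np.asarray(components, dtype=float)
    n = components.shape[1]
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights)
    if method == "mean":       # weighted arithmetic mean
        agg = components @ w
    elif method == "gmean":    # weighted geometric mean (clip to avoid log(0))
        agg = np.exp(np.log(components.clip(min=1e-9)) @ w)
    elif method == "pmean":    # weighted power mean of order p
        agg = ((components ** p) @ w) ** (1.0 / p)
    else:
        raise ValueError(f"unknown method: {method}")
    if normalize:              # re-normalize to 0-1 after the aggregation step
        agg = (agg - agg.min()) / (agg.max() - agg.min())
    return agg

scores = np.array([[0.2, 0.4], [0.6, 0.8], [1.0, 0.0]])
out = aggregate(scores, method="mean")  # [0.0, 1.0, 0.5]
```

With equal weights the row means are [0.3, 0.7, 0.5]; re-normalization then stretches them to [0.0, 1.0, 0.5].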
The `CCVI` top-level class does not store its own scores directly, but creates a DataFrame with all CCVI components and stores it in a versioned 'YYYY-Q#' subfolder (e.g. '2025-Q1'). It also creates and stores data recency metadata, denoting when the underlying data sources for each indicator were last updated.
Utilities
`base.objects.ConfigParser`, `base.objects.StorageManager`, `base.objects.GlobalBaseGrid`
The framework relies on three core utility classes for its fundamental operations:
- The `ConfigParser` loads and validates the config.yaml file and provides structured access to global (including regeneration), data source, indicator, and aggregation configurations.
- The `StorageManager` handles the output structure, all indicator score I/O and some caching. This class creates the standard input/, processing/, and output/ directory structure and manages component-specific subfolders within processing/. It offers methods to save and load pandas DataFrames (as Parquet files), build file paths, and check for file existence or up-to-date generation of component outputs. It also manages the composite_id of the indicator and aggregate scores.
- The `GlobalBaseGrid` defines and manages the standard 0.5°x0.5° geospatial grid for the index. It handles the creation or loading of this grid, preprocesses country boundaries, filters water areas, and matches grid cells to countries, providing the spatial resolution for all gridded indicators. The generation and caching of the base grid is orchestrated in the `run()` method.
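For intuition on the grid identifiers, a cell's pgid can be derived from coordinates. The sketch below assumes the published PRIO-GRID convention (ids counted from 1 at the south-west corner, west to east, then northward over a 360 x 720 grid); the function name is our own and this is not code from the repository:

```python
# Sketch: mapping coordinates to a PRIO-GRID cell id on the 0.5° grid,
# assuming the standard PRIO-GRID numbering convention.
import math

def pgid_from_coords(lat: float, lon: float) -> int:
    row = math.floor((lat + 90) / 0.5)    # 0..359, south to north
    col = math.floor((lon + 180) / 0.5)   # 0..719, west to east
    return row * 720 + col + 1

# The grid spans 360 rows x 720 columns = 259200 cells:
pgid_from_coords(-89.9, -179.9)  # 1, south-west corner cell
pgid_from_coords(89.9, 179.9)    # 259200, north-east corner cell
```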
Contributions
We welcome bug reports through issues. While the version found on https://climate-conflict.org is developed internally, with this repository we want to enable anyone to extend and adapt the CCVI to their needs and requirements, and to create their own custom versions.
License
This project is licensed under the GNU General Public License v3.0.
See the LICENSE file for details.
Disclaimer
The project is funded by the German Federal Foreign Office. The views and opinions expressed in this project, such as country assignments and boundaries, are those of the author(s) and do not necessarily reflect the official policy or position of any agency of the German government.
Owner metadata
- Name: Center for Crisis Early Warning
- Login: ccew-unibw
- Email: [email protected]
- Kind: organization
- Description: Projects developed at the Center for Crisis Early Warning @ UniBw M
- Website: https://www.unibw.de/ciss-en/ccew
- Location:
- Twitter:
- Company:
- Icon url: https://avatars.githubusercontent.com/u/212884315?v=4
- Repositories: 1
- Last synced at: 2025-05-23T11:07:46.104Z
- Profile URL: https://github.com/ccew-unibw
GitHub Events
Total
- Issues event: 1
- Watch event: 5
- Public event: 1
- Push event: 5
- Pull request event: 3
- Fork event: 1
- Create event: 3
Last Year
- Issues event: 1
- Watch event: 5
- Public event: 1
- Push event: 5
- Pull request event: 3
- Fork event: 1
- Create event: 3
Committers metadata
Last synced: 7 days ago
Total Commits: 9
Total Committers: 3
Avg Commits per committer: 3.0
Development Distribution Score (DDS): 0.444
Commits in past year: 9
Committers in past year: 3
Avg Commits per committer in past year: 3.0
Development Distribution Score (DDS) in past year: 0.444
Name | Email | Commits
---|---|---
DaMitti | d****i@p****e | 5
Moritz Stefaner | m****z@s****u | 3
Daniel Mittermaier | d****r@u****e | 1
Committer domains:
- unibw.de: 1
- stefaner.eu: 1
- posteo.de: 1
Issue and Pull Request metadata
Last synced: 2 days ago
Total issues: 3
Total pull requests: 7
Average time to close issues: N/A
Average time to close pull requests: 23 minutes
Total issue authors: 1
Total pull request authors: 4
Average comments per issue: 0.0
Average comments per pull request: 0.43
Merged pull request: 4
Bot issues: 0
Bot pull requests: 0
Past year issues: 3
Past year pull requests: 7
Past year average time to close issues: N/A
Past year average time to close pull requests: 23 minutes
Past year issue authors: 1
Past year pull request authors: 4
Past year average comments per issue: 0.0
Past year average comments per pull request: 0.43
Past year merged pull request: 4
Past year bot issues: 0
Past year bot pull requests: 0
Top Issue Authors
- DaMitti (3)
Top Pull Request Authors
- DaMitti (3)
- MoritzStefaner (2)
- stefanosedano (1)
- MittermaierUniBW (1)
Top Issue Labels
- enhancement (2)
Top Pull Request Labels
Dependencies
- cartopy >=0.24.1
- cdsapi >=0.7.6
- clean-text >=0.6.0
- concentrationmetrics >=0.6.0
- country-converter >=1.3
- dask >=2025.5.0
- dotenv >=0.9.9
- duckdb >=1.2.2
- earthengine-api >=1.5.15
- geopandas >=1.0.1
- h5netcdf >=1.6.1
- joblib >=1.4.2
- jupyter >=1.1.1
- netcdf4 >=1.7.2
- numpy >=2.2.4
- openpyxl >=3.1.5
- pandas >=2.2.3
- panel-imputer >=0.7.0
- pendulum >=3.0.0
- pyarrow >=19.0.1
- pyeto *
- pyproj >=3.7.1
- pyreadr >=0.5.3
- python-dotenv >=1.1.0
- pyyaml >=6.0.2
- rich >=14.0.0
- rioxarray >=0.19.0
- rpy2 ==3.5.9
- schedule >=1.2.2
- scikit-learn >=1.6.1
- scipy >=1.15.2
- shapely >=2.1.0
- swifter >=1.4.0
- tqdm >=4.67.1
- tropycal >=1.3
- typer >=0.15.4
- wbgapi >=1.0.12
- xarray >=2025.3.1
- 194 dependencies
Score: 3.401197381662156