GeoTessera

A foundation model that can process time-series satellite imagery for applications such as land classification and canopy height prediction.
https://github.com/ucam-eo/geotessera

Category: Natural Resources
Sub Category: Soil and Land

Keywords from Contributors

standards


Repository metadata

Python library for the Tessera embeddings

README.md

GeoTessera

Python library for accessing and working with Tessera geospatial foundation model embeddings.

Overview

GeoTessera provides access to geospatial embeddings from the Tessera
foundation model, which processes Sentinel-1 and Sentinel-2 satellite imagery
to generate 128-channel representation maps at 10m resolution. These embeddings
compress a full year of temporal-spectral features into dense representations
optimized for downstream geospatial analysis tasks. Read more details about the model.

Request missing embeddings

This repo provides precomputed embeddings for multiple years and regions.
Embeddings are generated by randomly sampling tiles within each region to ensure broad spatial coverage.

If the years (2017–2024) or areas you need are still missing, please submit an Embedding Request:

  • 👉 Open an Embedding Request
  • Please include: your organization, intended use, ROI as a bounding box with four points (lon,lat, 4 decimals), and the year(s).

After you submit the request, we will prioritize your ROI and notify you via a comment in the issue once the embeddings are ready.

Important Notice ⚠️

On 20 August 2025, we updated the GeoTessera data processing pipeline to resolve tiling artifacts. We have retained the embeddings generated before 20 August 2025, as they remain effective over small areas. Once the 2024 embedding generation is complete, we will reprocess the tiles affected by tiling artifacts. If you observe such artifacts and they significantly affect performance, please open an issue and we will prioritize reprocessing the affected tiles.

Please note that if the artifacts you observe are slanted, this is not a bug in the pipeline but rather a result of the Sentinel-1/2 satellite trajectories. Currently, Tessera cannot completely eliminate such artifacts, as they reflect the inherent characteristics of the raw data. However, we have observed that they have minimal impact on downstream tasks.

Installation

pip install geotessera

For development:

git clone https://github.com/ucam-eo/geotessera
cd geotessera
pip install -e .

Architecture

Core Concepts

GeoTessera is built around a simple two-step workflow:

  1. Retrieve embeddings: Fetch raw numpy arrays for a geographic bounding box
  2. Export to desired format: Save as raw numpy arrays or convert to georeferenced GeoTIFF files

Coordinate System and Tile Grid

The Tessera embeddings use a 0.1-degree grid system:

  • Tile size: Each tile covers 0.1° × 0.1° (approximately 11km × 11km at the equator)
  • Tile naming: Tiles are named by their center coordinates (e.g., grid_0.15_52.05)
  • Tile bounds: A tile at center (lon, lat) covers:
    • Longitude: [lon - 0.05°, lon + 0.05°]
    • Latitude: [lat - 0.05°, lat + 0.05°]
  • Resolution: 10m per pixel (variable number of pixels per tile depending on latitude)
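
The grid rules above can be sketched in a few lines of plain Python. This is an illustration rather than the library's own code: the helper names (tile_center, tile_name, tile_bounds) are hypothetical, and it assumes that tile edges fall on multiples of 0.1°, so that centers end in .05 as in the grid_0.15_52.05 example.

```python
import math

TILE_SIZE_DEG = 0.1  # tile edge length in degrees, from the grid description


def tile_center(lon: float, lat: float) -> tuple[float, float]:
    """Return the center of the 0.1-degree tile containing (lon, lat).

    Assumption: tile edges sit on multiples of 0.1 degrees.
    """
    def snap(value: float) -> float:
        # Round before flooring so that inputs such as 0.3 are not pushed
        # into the neighbouring cell by floating-point error in the division.
        cell = math.floor(round(value / TILE_SIZE_DEG, 6))
        return round(cell * TILE_SIZE_DEG + TILE_SIZE_DEG / 2, 2)

    return snap(lon), snap(lat)


def tile_name(center_lon: float, center_lat: float) -> str:
    """Build a name in the grid_<lon>_<lat> pattern shown above."""
    return f"grid_{center_lon:.2f}_{center_lat:.2f}"


def tile_bounds(center_lon: float, center_lat: float) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for a tile center."""
    half = TILE_SIZE_DEG / 2
    return (
        round(center_lon - half, 2),
        round(center_lat - half, 2),
        round(center_lon + half, 2),
        round(center_lat + half, 2),
    )


center = tile_center(0.12, 52.03)
print(tile_name(*center), tile_bounds(*center))
```

A point near Cambridge (lon 0.12, lat 52.03) maps to the tile centered on (0.15, 52.05), which matches the naming example in the list above.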

File Structure and Downloads

When you request embeddings, GeoTessera downloads files directly via HTTP to temporary locations:

Embedding Files (via fetch_embedding)

  1. Quantized embeddings (grid_X.XX_Y.YY.npy):

    • Shape: (height, width, 128)
    • Data type: int8 (quantized for storage efficiency)
    • Contains the compressed embedding values
  2. Scale files (grid_X.XX_Y.YY_scales.npy):

    • Shape: (height, width) or (height, width, 128)
    • Data type: float32
    • Contains scale factors for dequantization
  3. Dequantization: final_embedding = quantized_embedding * scales

  4. Temporary Storage: Files are downloaded to temp locations and automatically cleaned up after processing
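
The dequantization step can be sketched with NumPy as follows. The function name dequantize is hypothetical and the arrays are synthetic stand-ins for a downloaded tile; the only facts taken from the description above are the shapes, the data types, and the multiplication. The sketch expands a (height, width) scale array with a trailing axis so that it broadcasts across all 128 channels.

```python
import numpy as np


def dequantize(quantized: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Multiply int8 embeddings by their float32 scale factors.

    quantized: (height, width, 128) int8 array
    scales:    (height, width) or (height, width, 128) float32 array
    """
    if scales.ndim == 2:
        # One scale per pixel: add a channel axis so it broadcasts over 128 channels.
        scales = scales[..., np.newaxis]
    # Cast first so the product is computed in float32 rather than int8.
    return quantized.astype(np.float32) * scales


# Synthetic example data (not real Tessera values)
quantized = np.full((4, 5, 128), 2, dtype=np.int8)
scales = np.full((4, 5), 0.5, dtype=np.float32)
embedding = dequantize(quantized, scales)
print(embedding.shape, embedding.dtype)
```

Casting to float32 before multiplying matters: multiplying in int8 would overflow or truncate the result.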

Landmask Files (for GeoTIFF export)

When exporting to GeoTIFF, additional landmask files are fetched:

  • Landmask tiles (grid_X.XX_Y.YY.tiff):
    • Provide UTM projection information
    • Define precise geospatial transforms
    • Contain land/water masks
    • Also downloaded to temp locations and cleaned up after use

Data Flow

User Request (lat/lon bbox)
Parquet Registry Lookup (find available tiles from registry.parquet)
Direct HTTP Downloads to Temp Files
    ├── embedding.npy (quantized) → temp file
    └── embedding_scales.npy → temp file
Dequantization (multiply arrays)
Automatic Cleanup (delete temp files)
Output Format
    ├── NumPy arrays → Direct analysis
    └── GeoTIFF → GIS integration

Storage Note: Only the Parquet registry (a few MB) is cached locally. All embedding data is downloaded on demand to temporary files and cleaned up immediately after use, so tile data adds no persistent storage overhead.

Quick Start

Check Available Data

Before downloading, check what data is available:

# Generate a coverage map showing all available tiles
geotessera coverage --output coverage_map.png

# Generate a coverage map for the UK
geotessera coverage --country uk

# View coverage for a specific year
geotessera coverage --year 2024 --output coverage_2024.png

# Customize the visualization
geotessera coverage --year 2024 --tile-color blue --tile-alpha 0.3 --dpi 150

Download Embeddings

Download embeddings as either numpy arrays or GeoTIFF files:

# Download as GeoTIFF (default, with georeferencing)
geotessera download \
  --bbox "-0.2,51.4,0.1,51.6" \
  --year 2024 \
  --output ./london_tiffs

# Download as raw numpy arrays (with metadata JSON)
geotessera download \
  --bbox "-0.2,51.4,0.1,51.6" \
  --format npy \
  --year 2024 \
  --output ./london_arrays

# Download using a GeoJSON/Shapefile region
geotessera download \
  --region-file cambridge.geojson \
  --format tiff \
  --year 2024 \
  --output ./cambridge_tiles

# Download specific bands only
geotessera download \
  --bbox "-0.2,51.4,0.1,51.6" \
  --bands "0,1,2" \
  --year 2024 \
  --output ./london_rgb

Create Visualizations

Generate web maps from downloaded GeoTIFFs:

# Create an interactive web map
geotessera visualize \
  ./london_tiffs \
  --type web \
  --output ./london_web

# Create an RGB mosaic
geotessera visualize \
  ./london_tiffs \
  --type rgb \
  --bands "30,60,90" \
  --output ./london_rgb

# Serve the web map locally
geotessera serve ./london_web --open

Python API

Core Methods

The library provides two main methods for retrieving embeddings:

from geotessera import GeoTessera

# Initialize the client
gt = GeoTessera()

# Method 1: Fetch a single tile
embedding, crs, transform = gt.fetch_embedding(lon=0.15, lat=52.05, year=2024)
print(f"Shape: {embedding.shape}")  # e.g., (1200, 1200, 128)
print(f"CRS: {crs}")  # Coordinate reference system from landmask

# Method 2: Fetch all tiles in a bounding box
bbox = (-0.2, 51.4, 0.1, 51.6)  # (min_lon, min_lat, max_lon, max_lat)
tiles_to_fetch = gt.registry.load_blocks_for_region(bounds=bbox, year=2024)
embeddings = gt.fetch_embeddings(tiles_to_fetch)

for year, tile_lon, tile_lat, embedding_array, crs, transform in embeddings:
    print(f"Tile ({tile_lat}, {tile_lon}): {embedding_array.shape}")

Export Formats

Export as GeoTIFF

# Export embeddings for a region as individual GeoTIFF files
# Step 1: Get the tiles for the region
bbox = (-0.2, 51.4, 0.1, 51.6)
tiles_to_fetch = gt.registry.load_blocks_for_region(bounds=bbox, year=2024)

# Step 2: Export those tiles as GeoTIFFs
files = gt.export_embedding_geotiffs(
    tiles_to_fetch=tiles_to_fetch,
    output_dir="./output",
    bands=None,  # Export all 128 bands (default)
    compress="lzw"  # Compression method
)

print(f"Created {len(files)} GeoTIFF files")

# Export specific bands only (e.g., first 3 for RGB visualization)
files = gt.export_embedding_geotiffs(
    tiles_to_fetch=tiles_to_fetch,
    output_dir="./rgb_output",
    bands=[0, 1, 2]  # Only export first 3 bands
)

Work with NumPy Arrays

import numpy as np

# Fetch and process embeddings directly
bbox = (-0.2, 51.4, 0.1, 51.6)  # (min_lon, min_lat, max_lon, max_lat)
tiles_to_fetch = gt.registry.load_blocks_for_region(bounds=bbox, year=2024)
embeddings = gt.fetch_embeddings(tiles_to_fetch)

for year, tile_lon, tile_lat, embedding, crs, transform in embeddings:
    # Compute statistics
    mean_values = np.mean(embedding, axis=(0, 1))  # Mean per channel
    std_values = np.std(embedding, axis=(0, 1))    # Std per channel

    # Extract specific pixels
    center_pixel = embedding[embedding.shape[0]//2, embedding.shape[1]//2, :]

    # Apply custom processing (your_analysis_function is a placeholder for your own code)
    processed = your_analysis_function(embedding)
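
As one possible shape for the custom processing step, the sketch below computes a cosine-similarity map between every pixel and a chosen reference pixel. It is only an illustration: cosine_similarity_map is a hypothetical helper, not part of GeoTessera, and the input here is random data standing in for a (height, width, 128) embedding tile.

```python
import numpy as np


def cosine_similarity_map(embedding: np.ndarray, row: int, col: int) -> np.ndarray:
    """Return a (height, width) map of cosine similarity to the pixel at (row, col)."""
    emb = embedding.astype(np.float32)
    reference = emb[row, col, :]
    # Dot product of every pixel vector with the reference vector -> (height, width)
    dots = emb @ reference
    norms = np.linalg.norm(emb, axis=-1) * np.linalg.norm(reference)
    # Guard against division by zero for all-zero pixels.
    return dots / np.maximum(norms, 1e-12)


# Random stand-in for a dequantized tile
rng = np.random.default_rng(0)
tile = rng.normal(size=(6, 7, 128)).astype(np.float32)
similarity = cosine_similarity_map(tile, row=3, col=3)
print(similarity.shape)
```

Pixels with values close to 1 have embedding vectors pointing in nearly the same direction as the reference pixel.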

Visualization Functions

from geotessera.visualization import (
    create_rgb_mosaic,
    visualize_global_coverage
)
from geotessera.web import (
    create_coverage_summary_map,
    geotiff_to_web_tiles
)

# Create an RGB mosaic from multiple GeoTIFF files
create_rgb_mosaic(
    geotiff_paths=["tile1.tif", "tile2.tif"],
    output_path="mosaic.tif",
    bands=(0, 1, 2)  # RGB bands
)

# Generate web tiles for interactive maps
geotiff_to_web_tiles(
    geotiff_path="mosaic.tif",
    output_dir="./web_tiles",
    zoom_levels=(8, 15)
)

# Create a global coverage visualization
visualize_global_coverage(
    tessera_client=gt,
    output_path="global_coverage.png",
    year=2024,  # Or None for all years
    width_pixels=2000,
    tile_color="red",
    tile_alpha=0.6
)
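
Embedding channels are not reflectance values, so three selected bands usually need rescaling before they can be viewed as an RGB image. The sketch below shows one common approach, a per-band percentile stretch to 8-bit values. It is an assumption for illustration only: to_rgb_uint8 is a hypothetical helper, and nothing here describes how create_rgb_mosaic or the --normalize option actually scale the data.

```python
import numpy as np


def to_rgb_uint8(embedding: np.ndarray, bands=(0, 1, 2),
                 low: float = 2.0, high: float = 98.0) -> np.ndarray:
    """Stretch three embedding bands to 0-255 using per-band percentiles."""
    rgb = embedding[..., list(bands)].astype(np.float32)
    # Per-band lower and upper cut-offs, kept as (1, 1, 3) for broadcasting.
    lo = np.percentile(rgb, low, axis=(0, 1), keepdims=True)
    hi = np.percentile(rgb, high, axis=(0, 1), keepdims=True)
    scaled = (rgb - lo) / np.maximum(hi - lo, 1e-12)
    return (np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)


# Random stand-in for a dequantized tile
rng = np.random.default_rng(0)
tile = rng.normal(size=(8, 8, 128)).astype(np.float32)
preview = to_rgb_uint8(tile, bands=(30, 60, 90))
print(preview.shape, preview.dtype)
```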

CLI Reference

download

Download embeddings for a region in your preferred format:

geotessera download [OPTIONS]

Options:
  -o, --output PATH         Output directory [required]
  --bbox TEXT              Bounding box: 'min_lon,min_lat,max_lon,max_lat'
  --region-file PATH       GeoJSON/Shapefile to define region
  -f, --format TEXT        Output format: 'tiff' or 'npy' (default: tiff)
  --year INT               Year of embeddings (default: 2024)
  --bands TEXT             Comma-separated band indices (default: all 128)
  --compress TEXT          Compression for TIFF format (default: lzw)
  --list-files             List all created files with details
  -v, --verbose            Verbose output

Output formats:

  • tiff: Georeferenced GeoTIFF files with UTM projection
  • npy: Raw numpy arrays with metadata.json file
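
When building download commands from a script, it can help to validate the --bbox string before calling the CLI. The following is a minimal sketch of such a check; parse_bbox is a hypothetical helper and is not how GeoTessera itself parses the option. It assumes the 'min_lon,min_lat,max_lon,max_lat' order documented above.

```python
def parse_bbox(text: str) -> tuple[float, float, float, float]:
    """Parse 'min_lon,min_lat,max_lon,max_lat' and check basic validity."""
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated values, got {len(parts)}")
    min_lon, min_lat, max_lon, max_lat = (float(part) for part in parts)
    if not -180.0 <= min_lon < max_lon <= 180.0:
        raise ValueError("longitudes must satisfy -180 <= min_lon < max_lon <= 180")
    if not -90.0 <= min_lat < max_lat <= 90.0:
        raise ValueError("latitudes must satisfy -90 <= min_lat < max_lat <= 90")
    return min_lon, min_lat, max_lon, max_lat


print(parse_bbox("-0.2,51.4,0.1,51.6"))
```

This catches the most common mistake, swapping latitude and longitude, whenever the swapped values fall outside the valid ranges or ordering.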

visualize

Create visualizations from GeoTIFF files:

geotessera visualize INPUT_PATH [OPTIONS]

Options:
  -o, --output PATH        Output directory [required]
  --type TEXT              Visualization type: rgb, web, coverage
  --bands TEXT             Comma-separated band indices for RGB
  --normalize              Normalize bands
  --min-zoom INT           Min zoom for web tiles (default: 8)
  --max-zoom INT           Max zoom for web tiles (default: 15)
  --force                  Force regeneration of tiles

coverage

Generate a world map showing data availability:

geotessera coverage [OPTIONS]

Options:
  -o, --output PATH        Output PNG file (default: tessera_coverage.png)
  --year INT               Specific year to visualize
  --tile-color TEXT        Color for tiles (default: red)
  --tile-alpha FLOAT       Transparency 0-1 (default: 0.6)
  --tile-size FLOAT        Size multiplier (default: 1.0)
  --dpi INT                Output resolution (default: 100)
  --width INT              Figure width in inches (default: 20)
  --height INT             Figure height in inches (default: 10)
  --no-countries           Don't show country boundaries

serve

Serve web visualizations locally:

geotessera serve DIRECTORY [OPTIONS]

Options:
  -p, --port INT           Port number (default: 8000)
  --open/--no-open         Auto-open browser (default: open)
  --html TEXT              Specific HTML file to serve

info

Display information about GeoTIFF files or the library:

geotessera info [OPTIONS]

Options:
  --geotiffs PATH          Analyze GeoTIFF files/directory
  --dataset-version TEXT   Tessera dataset version
  -v, --verbose            Verbose output

Registry System

Overview

GeoTessera uses a Parquet-based registry system to efficiently manage and access the large Tessera dataset:

  • Single Parquet file: All tile metadata stored in one efficient registry.parquet file
  • Fast queries: Uses pandas DataFrames for efficient spatial and temporal filtering
  • Block-based organization: Internal 5×5 degree geographic blocks for efficient queries
  • Minimal storage: Registry file is a few MB and cached locally
  • Integrity checking: SHA256 checksums ensure data integrity during downloads
    • Embedding files verified using hash column
    • Scales files verified using scales_hash column
    • Landmask files verified using landmasks registry hash column
    • Enabled by default for data integrity and security
    • Can be disabled with verify_hashes=False, --skip-hash CLI flag, or GEOTESSERA_SKIP_HASH=1 environment variable
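
The idea behind the integrity check can be illustrated with the standard library alone. This is a generic sketch of SHA256 file verification, not GeoTessera's internal code: sha256_of_file and verify_file are hypothetical names, and the expected hash would in practice come from the registry columns listed above.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected_hash: str) -> bool:
    """Compare a file's digest with an expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hash.lower()


# Demonstration on a throwaway temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example tile bytes")
    tmp_path = tmp.name
try:
    expected = hashlib.sha256(b"example tile bytes").hexdigest()
    print(verify_file(tmp_path, expected))
finally:
    os.remove(tmp_path)
```

Reading in chunks keeps memory use flat even for large embedding files.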

Registry Sources

The registry can be loaded from multiple sources (in priority order):

  1. Local file (via --registry-path or registry_path parameter)
  2. Local directory (via --registry-dir or registry_dir parameter, looks for registry.parquet)
  3. Remote URL (via --registry-url or registry_url parameter)
  4. Default remote (from https://dl2.geotessera.org/{version}/registry.parquet)

# Use local registry file
gt = GeoTessera(registry_path="/path/to/registry.parquet")

# Use local registry directory
gt = GeoTessera(registry_dir="/path/to/registry-dir")

# Use custom remote registry
gt = GeoTessera(registry_url="https://example.com/registry.parquet")

# Use default remote registry (downloads and caches automatically)
gt = GeoTessera()  # Default behavior

Registry Structure

The Parquet registry contains columns for:

  • Coordinates: lon, lat (tile center coordinates)
  • Year: year (data year, 2017-2024)
  • Hash: sha256 (file integrity checksum)
  • Paths: File paths for embeddings, scales, and landmasks
  • Block info: Internal 5×5 degree block identifiers for efficient queries

# Example registry query
import pandas as pd
registry = pd.read_parquet("registry.parquet")
print(registry.head())
#     lon    lat  year     sha256  ...
# 0  0.15  52.05  2024  abc123...  ...

How Registry Loading Works

  1. Load Parquet registry → Download and cache registry file (if not local)
  2. Request tiles for bbox → Query DataFrame for tiles in region
  3. Filter by year → Select tiles matching requested year
  4. Find available tiles → Return list of matching tiles
  5. Direct HTTP download → Fetch tiles on-demand to temp files with hash verification
  6. Automatic cleanup → Delete temp files after processing
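
Steps 2 to 4 amount to a filter over the registry DataFrame. The sketch below shows how such a query could look with pandas on a toy table. It is an assumption-laden illustration: tiles_in_bbox is a hypothetical function, the toy rows are invented, and only the lon, lat, and year column names and the ±0.05° tile extent come from this document. The library's own load_blocks_for_region may work differently.

```python
import pandas as pd

HALF_TILE_DEG = 0.05  # tiles extend 0.05 degrees either side of their center


def tiles_in_bbox(registry: pd.DataFrame, bbox, year: int) -> pd.DataFrame:
    """Return registry rows whose tile footprint overlaps bbox for a given year."""
    min_lon, min_lat, max_lon, max_lat = bbox
    overlaps = (
        (registry["lon"] + HALF_TILE_DEG > min_lon)
        & (registry["lon"] - HALF_TILE_DEG < max_lon)
        & (registry["lat"] + HALF_TILE_DEG > min_lat)
        & (registry["lat"] - HALF_TILE_DEG < max_lat)
    )
    return registry[overlaps & (registry["year"] == year)]


# Toy registry with invented rows (not real registry contents)
toy_registry = pd.DataFrame(
    {
        "lon": [0.15, 0.15, 5.05, -0.15],
        "lat": [52.05, 52.05, 45.05, 51.45],
        "year": [2024, 2023, 2024, 2024],
    }
)
print(tiles_in_bbox(toy_registry, (-0.2, 51.4, 0.1, 51.6), 2024))
```

The comparison tests footprint overlap rather than whether the tile center lies inside the box, so tiles that only partly cover the region are still returned.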

Data Organization

Tessera Data Structure

Remote Server (https://dl2.geotessera.org)
├── v1/                              # Dataset version
│   ├── registry.parquet             # Parquet registry with all metadata
│   ├── 2024/                        # Year
│   │   ├── grid_0.15_52.05/         # Tile (named by center coords)
│   │   │   ├── grid_0.15_52.05.npy              # Quantized embeddings
│   │   │   └── grid_0.15_52.05_scales.npy       # Scale factors
│   │   └── ...
│   └── landmasks/
│       ├── grid_0.15_52.05.tiff     # Landmask with projection info
│       └── ...
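
For orientation, the layout in the tree above implies URLs of the following form. This sketch simply mirrors that tree: tile_urls is a hypothetical helper, and in practice the file paths stored in the registry should be treated as authoritative rather than URLs assembled by hand.

```python
BASE_URL = "https://dl2.geotessera.org"


def tile_urls(lon: float, lat: float, year: int, version: str = "v1") -> dict[str, str]:
    """Assemble URLs for one tile, following the directory tree shown above."""
    name = f"grid_{lon:.2f}_{lat:.2f}"
    tile_dir = f"{BASE_URL}/{version}/{year}/{name}"
    return {
        "embedding": f"{tile_dir}/{name}.npy",
        "scales": f"{tile_dir}/{name}_scales.npy",
        "landmask": f"{BASE_URL}/{version}/landmasks/{name}.tiff",
    }


for kind, url in tile_urls(0.15, 52.05, 2024).items():
    print(kind, url)
```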

Local Cache Structure

~/.cache/geotessera/                 # Default cache location
└── registry.parquet                  # Cached Parquet registry (~few MB)

# Note: Embedding and landmask tiles are NOT cached persistently.
# They are downloaded to temporary files and immediately cleaned up after use.

Coordinate Reference Systems

  • Embeddings: Stored in simple arrays, referenced by center coordinates
  • GeoTIFF exports: Use UTM projection from corresponding landmask tiles
  • Web visualizations: Reprojected to Web Mercator (EPSG:3857)
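
Because GeoTIFF exports use a UTM projection, neighbouring tiles can fall in different UTM zones. The sketch below estimates the standard WGS 84 / UTM EPSG code for a tile center. It is a rough guide only: utm_epsg is a hypothetical helper, it ignores the special zone exceptions around Norway and Svalbard, and the CRS actually used comes from the landmask tile.

```python
import math


def utm_epsg(lon: float, lat: float) -> int:
    """Return the WGS 84 / UTM EPSG code for a point, using the standard 6-degree zones."""
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(max(zone, 1), 60)  # keep lon == 180 inside zone 60
    # EPSG:326xx is the northern hemisphere, EPSG:327xx the southern.
    return (32600 if lat >= 0 else 32700) + zone


print(utm_epsg(0.15, 52.05))   # tile east of the Greenwich meridian
print(utm_epsg(-0.15, 51.45))  # tile west of the Greenwich meridian
```

The two example tiles sit on either side of the Greenwich meridian and therefore land in different zones, which is worth keeping in mind when mosaicking exported GeoTIFFs.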

Cache Configuration

GeoTessera caches only the Parquet registry file (a few MB). Embedding and landmask tiles are downloaded to temporary files and immediately cleaned up after use.

Python API

from geotessera import GeoTessera

# Use custom cache directory for registry
gt = GeoTessera(cache_dir="/path/to/cache")

# Use default cache location (recommended)
gt = GeoTessera()

CLI

# Specify custom cache directory
geotessera download --cache-dir /path/to/cache ...

# Use default cache location
geotessera download ...

Default Cache Locations

When cache_dir is not specified, the registry is cached in platform-appropriate locations:

  • Linux/macOS: $XDG_CACHE_HOME/geotessera or ~/.cache/geotessera
  • Windows: %LOCALAPPDATA%/geotessera
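
The lookup order above can be expressed as a small function. This is a sketch of the documented behaviour, not the library's implementation: default_cache_dir is a hypothetical name, its parameters exist only to make the logic testable, and the Windows fallback used when LOCALAPPDATA is unset is an assumption.

```python
import os
import sys
from pathlib import Path


def default_cache_dir(platform: str = sys.platform, env=None, home=None) -> Path:
    """Resolve the registry cache directory following the rules listed above."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        local_app_data = env.get("LOCALAPPDATA")
        if local_app_data:
            return Path(local_app_data) / "geotessera"
        # Assumed fallback, not documented by GeoTessera
        return home / "AppData" / "Local" / "geotessera"
    xdg_cache_home = env.get("XDG_CACHE_HOME")
    if xdg_cache_home:
        return Path(xdg_cache_home) / "geotessera"
    return home / ".cache" / "geotessera"


print(default_cache_dir())
```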

Hash Verification

GeoTessera verifies SHA256 checksums for all downloaded files (embeddings, scales, and landmasks) by default to ensure data integrity. You can disable this verification if needed:

Python API

from geotessera import GeoTessera

# Disable hash verification via parameter
gt = GeoTessera(verify_hashes=False)

# Or use environment variable
import os
os.environ['GEOTESSERA_SKIP_HASH'] = '1'
gt = GeoTessera()

CLI

# Disable hash verification with flag
geotessera download --bbox "0,52,0.2,52.2" --year 2024 -o ./data --skip-hash

# Or use environment variable
GEOTESSERA_SKIP_HASH=1 geotessera download --bbox "0,52,0.2,52.2" --year 2024 -o ./data

Note: Hash verification is enabled by default for security. Only disable it in trusted environments or for testing purposes.

Contributing

Contributions are welcome! Please see our Contributing Guide for details.

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use Tessera in your research, please cite the arXiv paper:

@misc{feng2025tesseratemporalembeddingssurface,
      title={TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis}, 
      author={Zhengpeng Feng and Clement Atzberger and Sadiq Jaffer and Jovana Knezevic and Silja Sormunen and Robin Young and Madeline C Lisaius and Markus Immitzer and David A. Coomes and Anil Madhavapeddy and Andrew Blake and Srinivasan Keshav},
      year={2025},
      eprint={2506.20380},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2506.20380}, 
}

Committers metadata

Last synced: 3 months ago

Total Commits: 169
Total Committers: 9
Avg Commits per committer: 18.778
Development Distribution Score (DDS): 0.189

Commits in past year: 169
Committers in past year: 9
Avg Commits per committer in past year: 18.778
Development Distribution Score (DDS) in past year: 0.189

Name Email Commits
Anil Madhavapeddy a****l@r****g 137
frankfeng f****g@d****e 14
Robin Young 5****g 6
Sadiq Jaffer s****q@t****m 3
Frank Feng 6****3 3
E-Ping Rau e****s@g****m 2
Nicolas Karasiak n****k@e****m 2
Srinivasan Keshav 6****8 1
GitHub Actions Bot a****s@g****m 1


Issue and Pull Request metadata

Last synced: 3 months ago

Total issues: 44
Total pull requests: 8
Average time to close issues: 5 days
Average time to close pull requests: 4 days
Total issue authors: 39
Total pull request authors: 6
Average comments per issue: 1.11
Average comments per pull request: 0.75
Merged pull request: 5
Bot issues: 0
Bot pull requests: 0

Past year issues: 44
Past year pull requests: 8
Past year average time to close issues: 5 days
Past year average time to close pull requests: 4 days
Past year issue authors: 39
Past year pull request authors: 6
Past year average comments per issue: 1.11
Past year average comments per pull request: 0.75
Past year merged pull request: 5
Past year bot issues: 0
Past year bot pull requests: 0

More stats: https://issues.ecosyste.ms/repositories/lookup?url=https://github.com/ucam-eo/geotessera

Top Issue Authors

  • epingchris (3)
  • barbarametzler (2)
  • kt-sa7716 (2)
  • ratsakatika (2)
  • rbnyng (1)
  • DalelanW (1)
  • sampathyetiraj-cpu (1)
  • CBonannella (1)
  • JBehanRio (1)
  • jfprieur (1)
  • RossDF (1)
  • Rudigithub12345 (1)
  • jdoblas (1)
  • miquel-espinosa (1)
  • yoshitos (1)

Top Pull Request Authors

  • avsm (3)
  • sadiqj (1)
  • epingchris (1)
  • nkarasiak (1)
  • olli4 (1)
  • rbnyng (1)

Top Issue Labels

  • embedding-request (32)
  • enhancement (2)
  • bug (1)


Package metadata

pypi.org: geotessera

Python library interface to the Tessera geofoundation model embeddings

  • Homepage: https://github.com/ucam-eo/geotessera
  • Documentation: https://geotessera.readthedocs.io
  • Licenses: ISC License Copyright 2025 Anil Madhavapeddy <anil@recoil.org> Copyright 2025 Frank Feng Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies. THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  • Latest release: 0.6.0 (published 5 months ago)
  • Last Synced: 2025-10-30T08:52:17.060Z (3 months ago)
  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 698 Last month
  • Rankings:
    • Dependent packages count: 8.836%
    • Average: 29.312%
    • Dependent repos count: 49.788%
  • Maintainers (2)

Dependencies

pyproject.toml pypi
  • matplotlib *
  • numpy *
  • pooch *
  • tqdm >=4.67.1
.github/workflows/update_map.yml actions
  • actions/checkout v4 composite
  • astral-sh/setup-uv v5 composite
  • stefanzweifel/git-auto-commit-action v5 composite
uv.lock pypi
  • affine 2.4.0
  • alabaster 1.0.0
  • attrs 25.3.0
  • babel 2.17.0
  • certifi 2025.6.15
  • charset-normalizer 3.4.2
  • click 8.2.1
  • click-plugins 1.1.1.2
  • cligj 0.7.2
  • colorama 0.4.6
  • contourpy 1.3.2
  • cycler 0.12.1
  • docutils 0.21.2
  • fonttools 4.58.5
  • geopandas 1.1.1
  • geotessera 0.2.0
  • idna 3.10
  • imagesize 1.4.1
  • jinja2 3.1.6
  • kiwisolver 1.4.8
  • markdown-it-py 3.0.0
  • markupsafe 3.0.2
  • matplotlib 3.10.3
  • mdurl 0.1.2
  • numpy 2.3.1
  • packaging 25.0
  • pandas 2.3.0
  • pillow 11.3.0
  • platformdirs 4.3.8
  • pooch 1.8.2
  • pygments 2.19.2
  • pyogrio 0.11.0
  • pyparsing 3.2.3
  • pyproj 3.7.1
  • python-dateutil 2.9.0.post0
  • pytz 2025.2
  • rasterio 1.4.3
  • requests 2.32.4
  • rich 14.0.0
  • roman-numerals-py 3.1.0
  • shapely 2.1.1
  • six 1.17.0
  • snowballstemmer 3.0.1
  • sphinx 8.2.3
  • sphinxcontrib-applehelp 2.0.0
  • sphinxcontrib-devhelp 2.0.0
  • sphinxcontrib-htmlhelp 2.1.0
  • sphinxcontrib-jsmath 1.0.1
  • sphinxcontrib-qthelp 2.0.0
  • sphinxcontrib-serializinghtml 2.0.0
  • tqdm 4.67.1
  • tzdata 2025.2
  • urllib3 2.5.0

Score: 14.164405314584045