Recent Releases of GeoTessera
GeoTessera - Reduced startup time, improved coordinate clamping, and smaller coverage data for the globe viewer
This release reduces startup time for the library, improves coordinate clamping, and reduces the size of coverage data for the globe viewer.
- Auto-snap coordinates to valid tile centers in `fetch_embedding` and `download_tile`, so callers no longer need to compute exact 0.05-offset grid centers themselves (#166 #164 @avsm, reported by @tonyboston-au)
- Replaced tile/landmask dictionary caches with direct pandas MultiIndex lookups on `(year, lon_i, lat_i)` and `(lon_i, lat_i)`, simplifying the registry internals (#176 @avsm, reported by @sk818 in #175)
- Coverage JSON output split into per-year files (`coverage_YYYY.json`) to reduce payload size for the globe viewer (@avsm)
- Globe viewer now detects land vs ocean from the coverage texture pixels instead of storing `no_coverage`/`landmasks` lists in JSON (@avsm)
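As a rough illustration of what snapping to a 0.05-offset grid center means, the sketch below maps an arbitrary point to the center of the 0.1-degree tile that contains it. The function name `snap_to_tile_center` is hypothetical and this is not the library's implementation; in particular, how GeoTessera treats points that sit exactly on a tile boundary is not stated in these notes.

```python
import math


def snap_to_tile_center(lon: float, lat: float) -> tuple[float, float]:
    """Return the center of the 0.1-degree tile containing (lon, lat).

    Hypothetical helper for illustration only, not part of GeoTessera.
    """

    def snap(value: float) -> float:
        # Index of the tile's lower edge, counted in tenths of a degree.
        index = math.floor(value * 10)
        # Center = lower edge + 0.05, computed in hundredths to limit float drift.
        return (index * 10 + 5) / 100

    return snap(lon), snap(lat)


print(snap_to_tile_center(0.12, 52.07))    # (0.15, 52.05)
print(snap_to_tile_center(-0.12, -33.91))  # (-0.15, -33.95)
```

With auto-snapping in the library, callers should not need a helper like this; it only shows the arithmetic that used to be the caller's job.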
Natural Resources - Soil and Land
- Python
Published by avsm 4 months ago
GeoTessera - Querying single tiles and MIT LICENSE clarification
This release adds convenience options for querying single tiles.
- New `--tile` option added to `download` and `coverage` commands for single-tile queries by any point within the tile (@avsm)
- Enhanced `--bbox` option to support both single-tile and bounding box formats (@avsm)
Licensing and docs clarifications as well:
- Fixed the license mismatch between README and LICENSE and clarified that the project is MIT licensed (reported by @adamjstewart in torchgeo/torchgeo#3243, fix by @avsm)
- Removed support request section due to resource limitations (@sk818)
Published by avsm 5 months ago
GeoTessera - Registry and embeddings scanning improvements
This release contains registry tooling improvements.
- Retired Pooch text manifest generation in favour of Parquet manifests (@avsm)
- Added tolerance for incomplete embedding directories during registry scans (@avsm)
- Improved warning grouping and diagnostics output (@avsm)
- Missing embeddings now written to a file for easier debugging (@avsm)
Published by avsm 6 months ago
GeoTessera - Windows fixes and more robust embeddings discovery
This release adds Windows platform support, more robust tolerance to interrupted scripts leaving temporary files around, and documentation fixes for coordinate printing and tile discovery.
Windows Support
Added Windows testing infrastructure in CI and applied code fixes (@avsm):
- New conda-based CI workflow for Windows runners
- PowerShell test suite (`tests/cli.ps1`) for Windows compatibility
- Cross-platform path handling improvements throughout the codebase
Bug Fixes
- Fixed coordinate printing order: CLI output now uses a standardized lon/lat order throughout. (Reported by @GieziJo, fix by @avsm)
- Fixed tile discovery false negatives arising from temporary files by removing pattern pre-filtering in `discover_tiles()`. (Reported by @sadiqj, fix by @avsm)
- Fixed Windows file handling by closing temporary files before overwriting. (Fix from @dra27)
Published by avsm 6 months ago
GeoTessera - v0.7.1: Zarr support
This release adds Zarr format support for efficient cloud-native data access and includes improvements to registry management tools. Thanks to @mayrajeo for the Zarr feature contribution!
Zarr Format Support
- New `--format zarr` option for `download` command: Download embeddings as Zarr archives for efficient chunked access
  - Cloud-native format that's optimised for both local and cloud storage with built-in compression
  - xarray integration for analysis workflows
  - Metadata preservation includes CRS, scales, and georeferencing information
  - Usage: `geotessera download --bbox '...' --format zarr --output embeddings.zarr`
Registry Improvements
- New `scan` command for `geotessera-registry`: Utility to scan directories of embeddings and build registry metadata
  - Efficiently indexes large collections of embedding files, validates file integrity, and extracts metadata. Only for registry maintainers.
Bug Fixes
- Fixed antimeridian handling in country point-in-polygon tests for accurate tile-country mapping, in the global coverage maps.
Published by avsm 7 months ago
GeoTessera - v0.7.0
v0.7.0 (2025-11-11)
This release moves to a Parquet-based registry for more efficient handling of the growing embeddings metadata for TESSERA. It no longer maintains a central cache, instead preferring the user to specify an embeddings directory within which the remote registry tiles are mirrored (as npy files) and additional mosaics and GeoTIFFs are generated. This helps make efficient use of disk space due to the large size of the embeddings.
There are also new APIs for efficiently sampling embeddings for point data, and to generate mosaics for classifiers over ROIs.
Note that there are significant interface changes throughout this release compared to 0.6; please read the migration notes below. The library will continue to evolve as we add more use cases, so please create issues on https://github.com/ucam-eo/geotessera with your wishlists!
- GeoParquet registry support: Transitioned from text-based manifests to Parquet files (`registry.parquet`, `landmasks.parquet`) for all tile metadata
- Remove caching layer for tiles: All embedding and landmask tiles are now directly downloaded to temporary files and only the Parquet registry is cached, since users were finding that embeddings storage was being duplicated in the old tile cache. This leads to a significant reduction in disk space.
- Enhanced hash verification: SHA256 verification now covers all downloaded files:
  - Embedding files (`.npy`) verified using `hash` column from registry
  - Scales files are also verified using the `scales_hash` column from the registry
  - Landmask files (`.tiff`) verified using `hash` column from landmasks registry
  - Can be disabled via `verify_hashes=False` parameter, `--skip-hash` CLI flag, or the `GEOTESSERA_SKIP_HASH=1` environment variable
  - Hash verification is enabled by default for data integrity
- Lazy iterators for reducing memory usage for large ROIs.
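For readers who want to check a downloaded file by hand, the sketch below shows one way to compare a file's SHA256 digest against a registry hash using only the standard library. The names `sha256_of` and `check_download` are hypothetical, and the way the skip switches are combined here (either `verify_hashes=False` or `GEOTESSERA_SKIP_HASH=1` disables the check) is an assumption about behaviour, not a description of GeoTessera's internals.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA256 so large tiles are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_download(path, expected_hash, verify_hashes=True, environ=os.environ):
    """Return True if the file matches expected_hash or verification is skipped.

    Assumption: the environment variable acts as a global off switch.
    """
    if not verify_hashes or environ.get("GEOTESSERA_SKIP_HASH") == "1":
        return True
    return sha256_of(Path(path)) == expected_hash.lower()


# Small demonstration with a stand-in file; a real caller would pass a
# downloaded tile and the hash read from the registry.
with tempfile.TemporaryDirectory() as tmp:
    tile = Path(tmp) / "tile.npy"
    tile.write_bytes(b"abc")
    expected = hashlib.sha256(b"abc").hexdigest()
    matches = check_download(tile, expected, environ={})
    mismatch = check_download(tile, "0" * 64, environ={})
    skipped = check_download(tile, "0" * 64, environ={"GEOTESSERA_SKIP_HASH": "1"})
```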
Note that the default registry hosting is now at https://dl2.geotessera.org/v1/ instead of the older server, as we had to upgrade our hosting to support the large number of embeddings being generated for global coverage. We plan on bringing more diverse hosting options online before the end of 2025.
CLI Changes
- New global options:
  - `--registry-path` - Specify registry.parquet file
  - `--registry-url` - Specify registry URL
  - `--cache-dir` - Control registry cache location (replaces `TESSERA_DATA_DIR`)
  - Removed `--auto-update` and `--manifests-repo-url`
- Enhanced `info` command: Shows tiles per year and total landmask counts using fast pandas operations
- Enhanced `coverage` command: Generate a 3D globegl globe with coverage textures for HTML viewing.
- New `--dry-run` option for `download` command: Calculate total download size without downloading
  - Shows file count, total size, number of tiles, year, and format
  - Accounts for existing files (resume capability) - only counts files that would be downloaded
  - For NPY format: calculates exact sizes from registry for embeddings, scales, and landmasks
  - For TIFF format: provides size estimates (4x quantized size due to float32 conversion)
  - Useful for planning downloads and estimating bandwidth/storage requirements
  - Usage: `geotessera download --bbox '...' --dry-run`
- New `--skip-hash` option for `download` command: Skip SHA256 hash verification
  - Disables hash verification for embedding, scales, and landmask files
  - Can also be controlled via `GEOTESSERA_SKIP_HASH=1` environment variable
  - Hash verification is enabled by default for security
  - Usage: `geotessera download --bbox '...' --skip-hash`
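The arithmetic behind a dry run can be sketched as follows: sum the registry sizes of the files that are not already on disk, and multiply by four for the TIFF estimate. `TileFile` and `estimate_download` are hypothetical names, and the flat file list is a simplification; the CLI's actual bookkeeping may differ.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TileFile:
    relative_path: str  # where the file would be written under the output directory
    size_bytes: int     # size recorded in the registry (quantized data)


def estimate_download(files, output_dir, fmt="npy"):
    """Return (file_count, total_bytes) for files that still need downloading."""
    pending = [f for f in files if not (Path(output_dir) / f.relative_path).exists()]
    total = sum(f.size_bytes for f in pending)
    if fmt == "tiff":
        # Estimate only: dequantized float32 data is taken as 4x the quantized size.
        total *= 4
    return len(pending), total


with tempfile.TemporaryDirectory() as out:
    (Path(out) / "a.npy").write_bytes(b"x")  # pretend this file was already fetched
    files = [TileFile("a.npy", 100), TileFile("b.npy", 250)]
    npy_estimate = estimate_download(files, out)           # only b.npy is pending
    tiff_estimate = estimate_download(files, out, "tiff")  # same file, 4x estimate
```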
Registry CLI Changes
- New `export-manifests` command: Convert Parquet registry files to Pooch-format text manifests for backwards compatibility
  - Reads `registry.parquet` and `landmasks.parquet` files
  - Generates block-based text registry files in `registry/embeddings/` and `registry/landmasks/` subdirectories
  - Creates separate entries for `.npy` and `_scales.npy` files with their respective hashes
  - Useful for maintaining the tessera-manifests repository
  - Usage: `geotessera-registry export-manifests /path/to/v1 --output-dir ~/src/git/ucam-eo/tessera-manifests`
Infrastructure Improvements
- CRAM test suite: Added comprehensive CLI tests using CRAM (Command-line Regression Acceptance Testing)
- Dumb terminal support: Added `TERM=dumb` support for non-interactive environments and CI pipelines
- Logging system: Migrated from print statements to Python's standard `logging` module for better integration
Breaking Changes
- NPY Download Format: `geotessera download --format npy` now saves quantized embeddings with scales instead of dequantized embeddings
  - New structure: Files saved in `embeddings/{year}/grid_{lon}_{lat}.npy` (quantized) and `_scales.npy` (float32 scales)
  - Landmasks included: Saved in `landmasks/landmask_{lon}_{lat}.tif` structure
  - No JSON metadata: Removed JSON metadata files (use registry for metadata)
  - Resume capability: Can interrupt and restart downloads without re-downloading existing files
  - If you have existing NPY downloads, re-download with new version. Downloaded directories can now be reused with `GeoTessera(embeddings_dir=...)`
- Registry API Changes: Internal registry methods now return tuple for better resource management
  - `Registry.fetch()` now returns `(file_path, needs_cleanup)` tuple instead of just path
  - `Registry.fetch_landmask()` now returns `(file_path, needs_cleanup)` tuple instead of just path
  - These are internal changes - most users won't be affected
- Registry Format Requirements: Updated schema for Parquet registry files
  - `registry.parquet` now requires both `file_size` and `scales_hash` columns
  - `landmasks.parquet` requires `file_size` column
  - `file_size` used for accurate download progress reporting with total sizes
  - `scales_hash` stores SHA256 hash for scales files separately from embedding hash
  - Registry validation will fail if required columns are missing
  - Regenerate registries with latest `geotessera-registry scan` to include new columns
- Environment variables: `TESSERA_REGISTRY_DIR` and `TESSERA_DATA_DIR` deprecated in favor of CLI parameters
- Registry format: Completely new backend that migrates from text manifests to GeoParquet.
- Cache behavior: Only the registry is now cached, and not tile data, to allow clients to manage their own disk usage.
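To make the new NPY directory layout concrete, here is a small sketch that builds the three expected paths for one tile. `tile_paths` is a hypothetical helper, and two details are assumptions rather than facts from these notes: that coordinates are formatted with two decimal places, and that the scales file is named by appending `_scales` to the grid name.

```python
from pathlib import Path


def tile_paths(root, year, lon, lat):
    """Return the expected embedding, scales, and landmask paths for one tile."""
    # Assumption: coordinates appear in file names with two decimal places.
    name = f"grid_{lon:.2f}_{lat:.2f}"
    year_dir = Path(root) / "embeddings" / str(year)
    return {
        "embedding": year_dir / f"{name}.npy",      # quantized values
        "scales": year_dir / f"{name}_scales.npy",  # float32 scales
        "landmask": Path(root) / "landmasks" / f"landmask_{lon:.2f}_{lat:.2f}.tif",
    }


paths = tile_paths("tiles", 2024, 0.15, 52.05)
for kind, path in paths.items():
    print(kind, path.as_posix())
```

A check along these lines can help confirm that an existing download directory has the shape expected before passing it to `GeoTessera(embeddings_dir=...)`.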
New API Features
- `Tiles` class: New abstraction for working with Tessera tiles
  - Provides unified interface for tile manipulation as either GeoTIFF or dequantized NumPy arrays
  - Simplifies conversion between formats
  - Accessible via `from geotessera.tiles import Tiles`
- `GeoTessera(embeddings_dir=...)`: New constructor parameter for local tile reuse
  - Points to directory containing pre-downloaded tiles
  - Expected structure: `embeddings/{year}/grid_{lon}_{lat}.npy` and `_scales.npy`, `landmasks/landmask_{lon}_{lat}.tif`
  - Automatically uses local files when available, downloads only if missing
- `sample_embeddings_at_points(points, year, embeddings_dir=None, refresh=False)`: Efficient point sampling
  - Extract embedding values at arbitrary lon/lat coordinates
  - Supports multiple input formats: list of tuples, GeoJSON FeatureCollection, GeoPandas GeoDataFrame
  - Automatically groups points by tile for efficient batch processing
  - Optional metadata return (tile info, pixel coords, CRS)
  - Can override instance `embeddings_dir` per call
  - Example: `embeddings = gt.sample_embeddings_at_points([(lon, lat), ...], year=2024)`
- `fetch_embedding(..., refresh=False)`: New parameter to force re-download
  - When `refresh=True`, re-downloads even if local tiles exist in `embeddings_dir`
  - Useful for updating tiles or verifying data integrity
- New Registry size query methods: Public API for querying file sizes from registry
  - `registry.get_tile_file_size(year, lon, lat)` - Get size of an embedding tile in bytes
  - `registry.get_landmask_file_size(lon, lat)` - Get size of a landmask tile in bytes
  - `registry.calculate_download_requirements(tiles, output_dir, format_type)` - Calculate total download size for a list of tiles
  - These methods replace direct registry DataFrame access and provide proper error handling
  - Used internally by CLI `--dry-run` option and available for programmatic use
  - Example: `size = gt.registry.get_tile_file_size(2024, 0.15, 52.05)`
- `embeddings_count(bbox, year)`: Get count of tiles in a bounding box
  - Returns total number of embedding tiles within a geographic region
  - Useful for planning downloads and estimating processing requirements
  - Example: `count = gt.embeddings_count((min_lon, min_lat, max_lon, max_lat), 2024)`
- `export_coverage_map(output_file)`: Export coverage data to JSON
  - Generates global coverage map showing which tiles have embeddings for which years
  - Returns dictionary with tile coverage information
  - Optionally saves to JSON file for use in visualizations
- `generate_coverage_texture(coverage_data, output_file)`: Generate coverage texture for globe visualization
  - Creates 3600x1800 pixel equirectangular projection texture
  - Each pixel represents a 0.1-degree tile, colored by coverage status
  - Used with `coverage` command for 3D globe visualizations, but also for your own visualisations
- `dequantize_embedding(quantized_embedding, scales)`: Public utility function for dequantization
  - Converts quantized embeddings to float32 by multiplying with scale factors
  - Useful when working directly with downloaded quantized NPY files, but use the `Tiles` class for normal usage.
  - Example: `embedding = dequantize_embedding(quantized, scales)`
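The dequantization step described above (quantized values multiplied by scale factors to give float32) can be illustrated with plain NumPy. This is a sketch, not the library's `dequantize_embedding`: the function name `dequantize_sketch` is hypothetical, and the array shapes (int8 values of shape height x width x channels with one scale per pixel) are assumptions made for the example.

```python
import numpy as np


def dequantize_sketch(quantized: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Multiply quantized values by their scale factors, returning float32."""
    scales = np.asarray(scales, dtype=np.float32)
    if scales.ndim == quantized.ndim - 1:
        # Assumed layout: one scale per pixel, broadcast across the channel axis.
        scales = scales[..., np.newaxis]
    return quantized.astype(np.float32) * scales


# Toy data: 1 x 2 pixels with 2 channels each (real tiles are much larger).
quantized = np.array([[[2, -4], [10, 0]]], dtype=np.int8)
scales = np.array([[0.5, 0.25]], dtype=np.float32)
embedding = dequantize_sketch(quantized, scales)
print(embedding.dtype, embedding.shape)
```

For files on disk, `np.load` on the `.npy` and `_scales.npy` pair would supply the two inputs; as the notes say, the `Tiles` class is the intended route for normal usage.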
Migration Notes
From v0.6.0 to v0.7.0:
- Update initialization code to use new `cache_dir` parameter instead of environment variables
- Remove any custom `TESSERA_DATA_DIR` or `TESSERA_REGISTRY_DIR` environment variable usage
- Expect reduced disk usage, as tiles are no longer cached, but potentially more downloads.
- If using NPY downloads: Re-download tiles with new format to get quantized structure
- To reuse downloaded tiles: Use `GeoTessera(embeddings_dir="path/to/tiles")` when initializing
- For point sampling: Replace manual tile iteration with `sample_embeddings_at_points()`
Published by avsm 7 months ago