Correctness Contract

This page describes Rasteret's user-visible correctness guarantees. It is the contract that contributors must preserve.

Rasteret aims to be GDAL/rasterio-aligned for supported inputs. When Rasteret cannot safely match that behavior, it fails loudly with actionable errors instead of guessing.

Scope (what Rasteret supports)

Remote, tiled GeoTIFFs / COGs that support byte-range reads (HTTP/S3/GCS/Azure).
Local tiled GeoTIFFs are supported for development/testing, but Rasteret is optimized for object stores.

Rasteret does not try to read everything GDAL can:

Non-tiled (striped) TIFFs are rejected (no TileOffsets/TileByteCounts).
Some TIFF encodings/layouts are intentionally unsupported and raise NotImplementedError.
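To illustrate the tiled-vs-striped distinction, the sketch below reads the tag IDs from the first IFD of a classic TIFF header (for example, bytes obtained with a byte-range read) and rejects files that lack TileOffsets (tag 324) and TileByteCounts (tag 325). The function names, the ValueError type, and the synthetic headers are assumptions for demonstration only; this is not Rasteret's parser, and it ignores BigTIFF.

```python
import struct

# TIFF 6.0 tag IDs
TILE_OFFSETS = 324
TILE_BYTE_COUNTS = 325
STRIP_OFFSETS = 273


def first_ifd_tags(header: bytes) -> set:
    """Return the tag IDs in the first IFD of a classic (non-BigTIFF) TIFF."""
    if header[:2] == b"II":
        endian = "<"  # little-endian
    elif header[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", header[2:8])
    if magic != 42:
        raise NotImplementedError("this sketch only handles classic TIFF (magic 42)")
    (count,) = struct.unpack(endian + "H", header[ifd_offset:ifd_offset + 2])
    tags = set()
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes; the tag ID comes first
        (tag,) = struct.unpack(endian + "H", header[entry:entry + 2])
        tags.add(tag)
    return tags


def require_tiled(header: bytes) -> None:
    """Hypothetical check: reject striped TIFFs instead of guessing."""
    tags = first_ifd_tags(header)
    if not {TILE_OFFSETS, TILE_BYTE_COUNTS} <= tags:
        raise ValueError("requires a tiled GeoTIFF")


def synthetic_header(tag_ids) -> bytes:
    """Build a minimal little-endian TIFF whose first IFD lists tag_ids (values are dummies)."""
    body = struct.pack("<H", len(tag_ids))
    for tag in sorted(tag_ids):
        body += struct.pack("<HHII", tag, 4, 1, 0)  # tag, type=LONG, count=1, value=0
    body += struct.pack("<I", 0)  # offset of next IFD: none
    return b"II" + struct.pack("<HI", 42, 8) + body


striped = synthetic_header([256, 257, STRIP_OFFSETS, 279])  # StripOffsets + StripByteCounts
tiled = synthetic_header([256, 257, 322, 323, TILE_OFFSETS, TILE_BYTE_COUNTS])
require_tiled(tiled)  # passes; require_tiled(striped) would raise ValueError
```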

Georeferencing

Transform semantics are PixelIsArea-normalized (GDAL-style): GeoTIFFs that use the PixelIsPoint convention are corrected so their pixel grids align with rasterio's.
Rotated/sheared transforms are not supported in the core reader.
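The arithmetic behind PixelIsPoint normalization can be sketched as follows, assuming rasterio's (a, b, c, d, e, f) affine ordering. A PixelIsPoint tie point refers to the center of the top-left pixel, whereas PixelIsArea refers to its outer corner, so the origin moves back by half a pixel along each axis (GDAL applies the same half-pixel shift). The Affine6 tuple and the function name are hypothetical stand-ins, not Rasteret's code; the sketch rejects rotation/shear to mirror the restriction above.

```python
from typing import NamedTuple


class Affine6(NamedTuple):
    """Hypothetical affine in rasterio order: x = a*col + b*row + c, y = d*col + e*row + f."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float


def to_pixel_is_area(t: Affine6, pixel_is_point: bool) -> Affine6:
    """Return a PixelIsArea transform; reject rotated/sheared grids."""
    if t.b != 0 or t.d != 0:
        raise NotImplementedError("rotated/sheared transforms are not supported")
    if not pixel_is_point:
        return t
    # (c, f) pointed at the center of pixel (0, 0); move it half a pixel to the outer corner.
    return t._replace(c=t.c - 0.5 * t.a, f=t.f - 0.5 * t.e)


# A 10 m north-up grid (e < 0): the origin moves 5 m west and 5 m north.
point_grid = Affine6(10.0, 0.0, 500000.0, 0.0, -10.0, 4000000.0)
area_grid = to_pixel_is_area(point_grid, pixel_is_point=True)
```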

Read semantics (AOI/window)

Rasteret's defaults are rasterio-aligned:

all_touched=False by default for polygon masking.
When filled=True, pixels outside the requested AOI/window or outside raster coverage are filled with the COG nodata value when present, otherwise with 0 (preserving the native dtype).
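The difference between the default all_touched=False and all_touched=True can be sketched for the simplest AOI, an axis-aligned rectangle in fractional pixel coordinates: by default a pixel is selected only if its center lies inside the shape, while all_touched=True selects every pixel the shape overlaps. rect_mask is a hypothetical helper, not Rasteret's or rasterio's implementation, and its boundary tie-breaking is an assumption; GDAL's rasterizer has its own edge rules.

```python
from typing import List


def rect_mask(n_rows: int, n_cols: int, x0: float, y0: float, x1: float, y1: float,
              all_touched: bool = False) -> List[List[bool]]:
    """Mask for the rectangle [x0, x1) x [y0, y1) given in (col, row) pixel coordinates."""
    mask = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            if all_touched:
                # any overlap between the pixel square [c, c+1) x [r, r+1) and the rectangle
                hit = c < x1 and c + 1 > x0 and r < y1 and r + 1 > y0
            else:
                # the pixel center must fall inside the rectangle
                cx, cy = c + 0.5, r + 0.5
                hit = x0 <= cx < x1 and y0 <= cy < y1
            row.append(hit)
        mask.append(row)
    return mask


default_mask = rect_mask(3, 3, 0.6, 0.6, 2.4, 2.4)
touched_mask = rect_mask(3, 3, 0.6, 0.6, 2.4, 2.4, all_touched=True)
```

With the rectangle spanning 0.6 to 2.4 on a 3x3 grid, only the center pixel qualifies by default, while all_touched=True selects all nine.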

`valid_mask` (ML-safe outputs)

All primary reads return a boolean valid_mask that is True only where a pixel is both:

1) inside the requested AOI/window, and
2) inside actual raster coverage (not padding/fill pixels).

This lets ML pipelines avoid training on filled pixels without changing the read dtype/values.
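The two conditions can be sketched together with the fill rule from the read-semantics section: the hypothetical helper below reads a window that may overhang the raster, optionally restricted by an AOI predicate, and returns both the filled data and the mask. Names and the list-of-rows representation are assumptions for illustration, not Rasteret's API.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]


def read_with_valid_mask(
    raster: Grid,
    row_off: int,
    col_off: int,
    height: int,
    width: int,
    in_aoi: Optional[Callable[[int, int], bool]] = None,
    nodata: Optional[int] = None,
) -> Tuple[Grid, List[List[bool]]]:
    """Read a window that may overhang the raster; return (filled data, valid_mask)."""
    fill = nodata if nodata is not None else 0  # nodata when present, otherwise 0
    n_rows = len(raster)
    n_cols = len(raster[0]) if raster else 0
    data, valid = [], []
    for r in range(row_off, row_off + height):
        data_row, valid_row = [], []
        for c in range(col_off, col_off + width):
            covered = 0 <= r < n_rows and 0 <= c < n_cols  # inside actual raster coverage
            inside = in_aoi is None or in_aoi(r, c)  # inside the requested AOI
            ok = covered and inside
            data_row.append(raster[r][c] if ok else fill)
            valid_row.append(ok)
        data.append(data_row)
        valid.append(valid_row)
    return data, valid


def masked_mean(data: Grid, valid: List[List[bool]]) -> float:
    """Average only the pixels flagged valid, e.g. when computing a loss or a statistic."""
    total, count = 0, 0
    for data_row, valid_row in zip(data, valid):
        for value, ok in zip(data_row, valid_row):
            if ok:
                total += value
                count += 1
    return total / count


raster = [[0, 5], [7, 0]]  # genuine zeros in the data
data, valid = read_with_valid_mask(raster, 0, 0, 2, 3)  # window overhangs by one column
```

Because the raster contains genuine zeros, filled pixels cannot be told apart from real data by value alone: masked_mean(data, valid) gives 3.0, while a naive mean over the filled window gives 2.0.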

Data types

Rasteret preserves the native COG dtype by default (e.g., Sentinel-2 uint16, AEF int8).
Masking/filling does not silently promote to a floating dtype; use valid_mask to distinguish fill from real data.
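The dtype rule can be illustrated with Python's standard array module standing in for a raster buffer: typecode "H" plays the role of uint16 (Sentinel-2) and "b" of int8 (AEF). Filling keeps the typecode, and an integer buffer refuses a NaN fill outright instead of being promoted to float (unlike, for instance, NumPy's np.where with a NaN fill value, which returns a floating-point array). fill_invalid is a hypothetical helper, not Rasteret's code.

```python
from array import array
from typing import Optional, Sequence


def fill_invalid(values: array, invalid: Sequence[bool], nodata: Optional[int] = None) -> array:
    """Return a copy of values with invalid positions set to nodata (or 0), keeping the typecode."""
    if len(invalid) != len(values):
        raise ValueError("mask length must match data length")
    fill = nodata if nodata is not None else 0
    out = values[:]  # slicing an array yields a new array with the same typecode
    for i, bad in enumerate(invalid):
        if bad:
            out[i] = fill  # an integer array raises TypeError for float fills such as NaN
    return out


s2_like = array("H", [1200, 65535, 0, 340])  # uint16-like buffer
filled = fill_invalid(s2_like, [False, True, False, True])
```

After filling, the real 0 at index 2 and the filled zeros are indistinguishable in the data, which is why valid_mask is returned alongside it.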

TorchGeo interop

Collection.to_torchgeo_dataset() provides pipeline-level interop: it returns a standard TorchGeo GeoDataset so samplers/DataLoader/training code stay in TorchGeo, while Rasteret provides fast pixel I/O underneath.

Pixel placement follows rasterio.merge.merge(bounds=..., res=...) semantics, the same function TorchGeo's own RasterDataset._merge_or_stack() calls. This is handled by rio_semantics.py, which delegates placement entirely to rasterio rather than reimplementing merge/warp logic. South-up rasters (e.g. AEF, where transform.e > 0) are normalized to north-up before merging, consistent with TorchGeo's WarpedVRT approach in _load_warp_file().
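For the axis-aligned case, normalizing a south-up raster amounts to reversing the row order and moving the origin to the northern edge. The sketch below shows only that geometry, using plain lists and a rasterio-ordered (a, b, c, d, e, f) tuple; to_north_up is a hypothetical name, and the mechanics in Rasteret (or TorchGeo's WarpedVRT path) may differ.

```python
from typing import List, Tuple

Transform = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f), rasterio order


def to_north_up(rows: List[list], transform: Transform) -> Tuple[List[list], Transform]:
    """Flip a south-up raster (e > 0) so that row 0 is the northernmost row."""
    a, b, c, d, e, f = transform
    if b != 0 or d != 0:
        raise NotImplementedError("rotated/sheared transforms are not supported")
    if e <= 0:
        return rows, transform  # already north-up
    height = len(rows)
    # In a south-up raster f is the southern edge; the northern edge is f + e * height.
    return rows[::-1], (a, b, c, d, -e, f + e * height)


south_up_rows = [[1, 2], [3, 4], [5, 6]]
south_up_t = (10.0, 0.0, 0.0, 0.0, 10.0, 100.0)
north_rows, north_t = to_north_up(south_up_rows, south_up_t)
```

Each pixel keeps its ground position: the value 1 has its center at y = 105 both before the flip (row 0 of the south-up grid) and after it (row 2 of the north-up grid).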

If requested bands have different resolutions, Rasteret fails fast by default. To opt into resampling bands onto a common grid in the TorchGeo adapter, pass allow_resample=True to Collection.to_torchgeo_dataset(...).
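The fail-fast rule can be sketched as a pre-flight check over per-band resolutions. needs_resampling and its error text are hypothetical; the actual check in Rasteret's TorchGeo adapter, and the target grid it resamples onto, are not described here. Sentinel-2's 10 m B02 and 20 m B8A bands are a typical mixed-resolution pair.

```python
from typing import Dict, Tuple

Resolution = Tuple[float, float]  # (x_res, y_res) in CRS units


def needs_resampling(band_res: Dict[str, Resolution], allow_resample: bool = False) -> bool:
    """Return True if bands must be resampled onto a common grid; fail fast unless allowed."""
    if len(set(band_res.values())) <= 1:
        return False
    if not allow_resample:
        details = ", ".join(f"{name}={res}" for name, res in sorted(band_res.items()))
        raise ValueError(
            f"bands have different resolutions ({details}); "
            "pass allow_resample=True to resample onto a common grid"
        )
    return True


mixed = {"B02": (10.0, 10.0), "B8A": (20.0, 20.0)}
```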

What "fail loudly" means

For unsupported inputs, Rasteret raises explicit errors like:

"requires a tiled GeoTIFF"
"unsupported TIFF compression"
"chunky multi-sample TIFFs require an explicit band_index"

These errors are preferred over partial reads or heuristic fallbacks that could silently produce wrong pixels.
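A validator in this spirit checks the relevant TIFF tags up front and raises before any pixels are decoded. The sketch below uses tag IDs from the TIFF 6.0 specification (Compression 259, SamplesPerPixel 277, PlanarConfiguration 284, where 1 means chunky/contiguous). The codec set, exception types, and function name are assumptions chosen for illustration and do not describe Rasteret's actual codec support.

```python
from typing import Dict, Optional

COMPRESSION, SAMPLES_PER_PIXEL, PLANAR_CONFIG = 259, 277, 284
CHUNKY = 1  # PlanarConfiguration=1: samples interleaved per pixel
ILLUSTRATIVE_CODECS = {1: "none", 5: "LZW", 8: "Deflate"}  # assumption, not Rasteret's list


def validate_layout(tags: Dict[int, int], band_index: Optional[int] = None) -> None:
    """Raise an explicit error for layouts this sketch cannot read safely."""
    codec = tags.get(COMPRESSION, 1)  # TIFF default: no compression
    if codec not in ILLUSTRATIVE_CODECS:
        raise NotImplementedError(f"unsupported TIFF compression: {codec}")
    samples = tags.get(SAMPLES_PER_PIXEL, 1)  # TIFF default: one sample per pixel
    planar = tags.get(PLANAR_CONFIG, CHUNKY)  # TIFF default: chunky
    if samples > 1 and planar == CHUNKY and band_index is None:
        raise ValueError("chunky multi-sample TIFFs require an explicit band_index")
```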

Contributor validation checklist

When you change anything in the read pipeline (fetch/, core/, or the TorchGeo adapter), run:

uv run pytest -q
uv run pytest --network -q (when the change affects real cloud reads)
uv run mkdocs build --strict