Benchmarks¶

Apples-to-apples time-series comparison using docs/tutorials/05_torchgeo_comparison.ipynb.

Environment: Ubuntu Linux, Python 3.13, us-west-2 EC2 instance.

Data source: Sentinel-2 L2A via Earth Search v1 (COGs on S3 us-west-2).

Controlled variables: Same scenes, same sampler, same DataLoader, same stack_samples collate. Both paths output identical [batch, T, C, H, W] tensors.

TorchGeo runs with its recommended GDAL settings (rasterio_best_practices from Pangeo COG best practices) for best-case remote COG performance:

GDAL_DISABLE_READDIR_ON_OPEN = "EMPTY_DIR"
AWS_NO_SIGN_REQUEST = "YES"
GDAL_MAX_RAW_BLOCK_CACHE_SIZE = "200000000"  # 200 MB
GDAL_SWATH_SIZE = "200000000"
VSI_CURL_CACHE_SIZE = "200000000"

This gives TorchGeo its best-case scenario for remote COG reads.

What "Index/header" time measures¶

TorchGeo: Time spent in rasterio.open("/vsicurl/...") calls. Each call triggers HTTP HEAD + 1-3 range requests to parse the IFD (Image File Directory) that contains tile offsets and byte counts. For T timesteps with ~3 HTTP round-trips each at ~100 ms, this adds up to seconds of pure header overhead before any pixel data flows.
Rasteret: Time to read the pre-built Parquet index from local disk. Tile offsets and byte counts are already cached, so no network I/O is needed.

Both are measuring the same logical step: making tile layout metadata available before pixel reads begin.

Backends tested¶

Backend	Transport	Install
TorchGeo	GDAL `/vsicurl/` (sequential)	`uv pip install torchgeo`
Rasteret	Rust-native HTTP via obstore (concurrent)	`uv pip install rasteret`

Cold start (first run)¶

Section	TorchGeo	Rasteret	Speedup	Shape
1: Single AOI, 15 scenes	9.08 s	1.14 s	8.0x	`[2, 15, 1, 256, 256]`
2: Multi-AOI, 30 scenes	42.05 s	2.25 s	18.7x	`[4, 30, 1, 256, 256]`
3: Cross-CRS, 12 scenes	12.47 s	0.59 s	21.3x	`[2, 12, 1, 256, 256]`

Warm cache (immediate re-run)¶

Section	TorchGeo	Rasteret	Speedup	Shape
1: Single AOI, 15 scenes	9.14 s	0.81 s	11.3x	`[2, 15, 1, 256, 256]`
2: Multi-AOI, 30 scenes	29.68 s	2.60 s	11.4x	`[4, 30, 1, 256, 256]`
3: Cross-CRS, 12 scenes	3.61 s	1.06 s	3.4x	`[2, 12, 1, 256, 256]`

Key observations¶

The difference grows with scene count: the rasterio/GDAL path re-parses headers over HTTP per file (sequential), while Rasteret reads cached headers from disk and fetches pixels concurrently.
Cross-CRS adds reprojection overhead (~0.1-0.3 s) to both paths.

Where the difference comes from¶

	rasterio/GDAL path	Rasteret path
Index	`rasterio.open()` per COG over HTTP	Pre-built GeoParquet (disk read)
Time-series read	Sequential `rasterio.merge()` per timestep	All T timesteps via `asyncio.gather`
HTTP per timestep	HEAD + IFD + pixel ranges	Pixel ranges only (headers cached)
Concurrency	Sequential	`asyncio.gather` across T x C reads

Reproducibility¶

# Fresh run
uv run python -m nbconvert --execute docs/tutorials/05_torchgeo_comparison.ipynb

# Immediate re-run (warm cache)
uv run python -m nbconvert --execute docs/tutorials/05_torchgeo_comparison.ipynb

Results vary with network conditions and EC2 instance placement.

Why cold-start numbers matter¶

The "cold start" row above is the number that matters most. Every new notebook kernel, every new VM, every Kubernetes pod restart, every CI runner, every colleague cloning your repo starts cold. There is no warm HTTP cache, no OS page cache, no pre-opened file handles.

TorchGeo re-parses every COG header over HTTP on each cold start. Rasteret reads the same headers from a local Parquet file in milliseconds. The gap widens with scene count: 30 scenes means 120+ HTTP round-trips that Rasteret skips entirely.

These benchmarks cover 12-30 Sentinel-2 scenes. The speedup grows with scene count, so larger workloads should widen the gap. If you're running Rasteret on bigger collections, different sensors, or production pipelines, share your timings on GitHub Discussions or Discord - it helps the community understand where the real ceiling is.