Migrating From Rasterio¶
Rasteret changes the shape of remote COG workflows: build a reusable collection, filter metadata first, then read only the pixels you need. For the measured TorchGeo/rasterio comparison, see Benchmarks.
The Mental Shift¶
In a rasterio + STAC API workflow, you manage files:
- find STAC items
- pick asset URLs
- open each raster
- compute windows
- handle CRS and resolution differences
- schedule concurrent reads
- stack the results
In Rasteret, you manage a Collection.
The collection is a table of raster records. It stores metadata and COG header information; the pixels stay in the original COGs. You filter the table first, then choose the output surface you need.
A Common Task¶
Suppose you want to read imagery for an AOI across many scenes and stack the results for analysis or model input.
Manual rasterio shape¶
This code is intentionally incomplete, because real code also needs retries, provider auth, nodata handling, and edge cases. The shape is the point.
```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pystac_client
import rasterio
from rasterio.windows import from_bounds

catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=(-122.5, 37.7, -122.4, 37.8),
    datetime="2024-01-01/2024-06-30",
)
items = list(search.items())

# The AOI bounds, here reused from the search bbox.
my_bounds = (-122.5, 37.7, -122.4, 37.8)

def read_band(item, bounds):
    href = item.assets["red"].href
    with rasterio.open(href) as src:
        # Real code also needs CRS alignment and bounds checks here.
        window = from_bounds(*bounds, transform=src.transform)
        return src.read(1, window=window)

with ThreadPoolExecutor(max_workers=10) as pool:
    arrays = list(pool.map(lambda item: read_band(item, my_bounds), items))

data = np.stack(arrays)
```
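The CRS comment above is where much of the manual work hides: the window math only makes sense once the request bounds are in the raster's own CRS. As a rough illustration of what `from_bounds` computes for a north-up raster, here is the arithmetic in plain Python (no rasterio; the grid origin and pixel size below are made-up example values):

```python
def window_from_bounds(left, bottom, right, top, origin_x, origin_y, px, py):
    """Pixel window (col_off, row_off, width, height) for a north-up raster.

    origin_x, origin_y: coordinates of the raster's top-left corner.
    px: pixel width (positive); py: pixel height (negative for north-up).
    """
    col_off = (left - origin_x) / px
    row_off = (top - origin_y) / py   # py < 0, so this comes out positive
    width = (right - left) / px
    height = (top - bottom) / -py
    return col_off, row_off, width, height

# A 10 m Sentinel-2-like grid with a made-up UTM origin:
w = window_from_bounds(600100, 4199500, 600600, 4199900,
                       origin_x=600000, origin_y=4200000, px=10, py=-10)
print(w)  # (10.0, 10.0, 50.0, 40.0)
```

Note that the bounds must already be in the raster's CRS: for a lon/lat AOI against UTM-gridded COGs you first have to reproject the bounds, which is exactly the per-file plumbing the manual workflow accumulates.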
Rasteret shape¶
```python
import rasteret

collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="s2-training",
    bbox=(-122.5, 37.7, -122.4, 37.8),
    date_range=("2024-01-01", "2024-06-30"),
)

# OR, if you have your own COGs:
collection = rasteret.build_from_table(
    "path/to/table_with_cogs_metadata.parquet",
    name="my-cogs",
    enrich_cog=True,
)

filtered = collection.subset(cloud_cover_lt=20)
data = filtered.get_numpy(
    geometries=(-122.5, 37.7, -122.4, 37.8),
    bands=["B04", "B08"],
)
```
Rasteret handles the repeated setup:
- catalog/record normalization
- COG header parsing during build
- raster CRS sidecars
- tile and byte-range planning
- async cloud reads
- output assembly
Build Once, Read Many Times¶
The build step creates a reusable collection. It stores metadata such as:
- record IDs and timestamps
- footprint geometry
- bounding boxes
- asset hrefs
- native raster CRS sidecars
- per-band COG header metadata
- any extra columns you keep, such as labels or splits
It does not move pixels into Parquet. Pixel data remains in the original COGs and is read on demand.
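Because the table holds only metadata, filtering is cheap and touches no pixels. A toy illustration of the metadata-first pattern using plain records (the field names here are illustrative, not Rasteret's actual schema):

```python
records = [
    {"id": "S2A_20240105", "cloud_cover": 81.2, "href": "s3://bucket/a.tif"},
    {"id": "S2B_20240110", "cloud_cover": 4.7,  "href": "s3://bucket/b.tif"},
    {"id": "S2A_20240115", "cloud_cover": 12.3, "href": "s3://bucket/c.tif"},
]

# Metadata-first: decide what to read before opening any COG.
keep = [r for r in records if r["cloud_cover"] < 20]
print([r["id"] for r in keep])  # ['S2B_20240110', 'S2A_20240115']
```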
Add Your Own Metadata¶
If your workflow has labels, split assignments, AOI IDs, or quality flags, add them as columns in the collection table. Rasteret keeps pixels in the COGs, so metadata joins can happen in Arrow-native tools without rewriting raster data.
For the full GeoPandas -> DuckDB -> Rasteret pattern, see Bring Your Own AOIs, Points, And Metadata.
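Since the collection is just a table, attaching labels is an ordinary key join on record IDs. A stdlib-only sketch of the idea (in practice you would do this in Arrow, DuckDB, or pandas; the column names are made up):

```python
records = [
    {"id": "S2B_20240110", "href": "s3://bucket/b.tif"},
    {"id": "S2A_20240115", "href": "s3://bucket/c.tif"},
]
labels = {"S2B_20240110": "train", "S2A_20240115": "val"}

# Attach a split column; the pixels in the COGs are untouched.
enriched = [{**r, "split": labels.get(r["id"])} for r in records]
print([(r["id"], r["split"]) for r in enriched])
# [('S2B_20240110', 'train'), ('S2A_20240115', 'val')]
```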
When To Use Which Tool¶
| Task | Rasterio is a good fit | Rasteret is a good fit |
|---|---|---|
| Inspect one local TIFF | yes | not necessary |
| Debug TIFF tags or profiles | yes | not the focus |
| Read 1 or 2 files just once | yes | maybe not worth building |
| Repeated TIFF reads from cloud storage | possible, but you write the plumbing | yes |
| ML training over many scenes | possible, but setup grows quickly | yes |
| Metadata joins, splits, and labels | external glue code | yes, these can be extra columns in the collection |
| TorchGeo/xarray/NumPy outputs from one source | custom wrappers | yes, via built-in output surfaces |
Next: How-To Guides