Skip to content

Migrating From Rasterio

Rasteret changes the shape of remote COG workflows: build a reusable collection, filter metadata first, then read only the pixels you need. For the measured TorchGeo/rasterio comparison, see Benchmarks.

The Mental Shift

In a rasterio + STAC API workflow, you manage files:

  • find STAC items
  • pick asset URLs
  • open each raster
  • compute windows
  • handle CRS and resolution differences
  • schedule concurrent reads
  • stack the results

In Rasteret, you manage a Collection.

The collection is a table of raster records. It stores metadata and COG header information; the pixels stay in the original COGs. You filter the table first, then choose the output surface you need.

A Common Task

Suppose you want to read geospatial images for an AOI and stack the result for analysis or model input.

Manual rasterio shape

This code is intentionally incomplete, because real code also needs retries, provider auth, nodata handling, and edge cases. The shape is the point.

from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pystac_client
import rasterio
from rasterio.windows import from_bounds

catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=(-122.5, 37.7, -122.4, 37.8),
    datetime="2024-01-01/2024-06-30",
)
items = list(search.items())


def read_band(item, bounds):
    href = item.assets["red"].href
    with rasterio.open(href) as src:
        # Real code also needs CRS alignment and bounds checks here.
        window = from_bounds(*bounds, transform=src.transform)
        return src.read(1, window=window)


with ThreadPoolExecutor(max_workers=10) as pool:
    arrays = list(pool.map(lambda item: read_band(item, my_bounds), items))

data = np.stack(arrays)

Rasteret shape

import rasteret

collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="s2-training",
    bbox=(-122.5, 37.7, -122.4, 37.8),
    date_range=("2024-01-01", "2024-06-30"),
)

# OR if you have your own COGs
collection = rasteret.build_from_table(
    "path/to/table_with_cogs_metadata.parquet",
    name="my-cogs",
    enrich_cog=True,
)

filtered = collection.subset(cloud_cover_lt=20)

data = filtered.get_numpy(
    geometries=(-122.5, 37.7, -122.4, 37.8),
    bands=["B04", "B08"],
)

Rasteret handles the repeated setup:

  • catalog/record normalization
  • COG header parsing during build
  • raster CRS sidecars
  • tile and byte-range planning
  • async cloud reads
  • output assembly

Build Once, Read Many Times

The build step creates a reusable collection. It stores metadata such as:

  • record IDs and timestamps
  • footprint geometry
  • bounding boxes
  • asset hrefs
  • native raster CRS sidecars
  • per-band COG header metadata
  • any extra columns you keep, such as labels or splits

It does not move pixels into Parquet. Pixel data remains in the original COGs and is read on demand.

Add Your Own Metadata

If your workflow has labels, split assignments, AOI IDs, or quality flags, add them as columns in the collection table. Rasteret keeps pixels in the COGs, so metadata joins can happen in Arrow-native tools without rewriting raster data.

For the full GeoPandas -> DuckDB -> Rasteret pattern, see Bring Your Own AOIs, Points, And Metadata.

When To Use Which Tool

Task Rasterio is a good fit Rasteret is a good fit
Inspect one local TIFF yes not necessary
Debug TIFF tags or profiles yes not the focus
Read 1 or 2 files just once yes maybe not worth building
Repeated TIFF reads from cloud storage possible, but you write the plumbing yes
ML training over many scenes possible, but setup grows quickly yes
Metadata joins, splits, and labels external glue code yes, these can be extra columns in the collection
TorchGeo/xarray/NumPy outputs from one source custom wrappers yes, via built-in output surfaces

Next: How-To Guides