rasteret.integrations.torchgeo¶
TorchGeo GeoDataset adapter for Rasteret collections.
RasteretGeoDataset is a standard TorchGeo GeoDataset subclass. It
replaces the I/O backend (async obstore instead of rasterio/GDAL) while
honoring the full GeoDataset contract: index, crs, res,
__getitem__(GeoSlice) -> Sample. Compatible with all TorchGeo samplers,
collation helpers (stack_samples, concat_samples), transforms, and
dataset composition (IntersectionDataset, UnionDataset).
Typical usage¶
dataset = collection.to_torchgeo_dataset(
bands=["B04", "B03", "B02"],
chip_size=256,
)
sampler = RandomGeoSampler(dataset, size=256, length=100)
loader = DataLoader(dataset, sampler=sampler, collate_fn=stack_samples)
GeoDataset contract (what TorchGeo requires)¶
Rasteret honors all of these:
__getitem__(GeoSlice) -> Sample: returns adict[str, Any]index: GeoPandas GeoDataFrame withIntervalIndexnamed"datetime"and Shapely footprint geometrycrs: set from the collection's EPSG coderes: derived from the first record's COG metadata transform- Dataset composition:
IntersectionDataset(rasteret_ds, other_ds)andUnionDatasetwork correctly
Sample dict keys¶
Standard keys (always present):
bounds:Tensorof spatial boundstransform:Tensorof affine transform coefficientsimage:Tensorwith shape[C, H, W](or[T, C, H, W]whentime_series=True), whenis_image=Truemask:Tensorwith shape[H, W](or[T, H, W]), whenis_image=False(channel dim squeezed whenC == 1, matching TorchGeoRasterDatasetconventions)
Rasteret additions (optional, do not break interop):
label: scalar or tensor label from a metadata column, whenlabel_fieldis set. TorchGeo's collate functions handle arbitrary keys, so this passes throughstack_samplesandconcat_sampleswithout issue.
Rasteret's low-level read APIs return a valid_mask for ML-safe workflows, but it
is intentionally not included in TorchGeo samples by default to preserve
TorchGeo dataset composition behavior (dataset1 & dataset2, dataset1 | dataset2).
Notes / limitations¶
- When
chip_sizeis set, Rasteret guarantees fixed chip output shape even when floating point bounds would otherwise cause off-by-one rounding. - When
time_series=Falseand the requested slice overlaps multiple records, Rasteret selects the first record and logs a warning (it does not mosaic/merge overlapping scenes in the adapter). - Rasteret requires all requested bands to share the same resolution for TorchGeo sampling. To opt into resampling bands onto a common grid, pass
allow_resample=TruetoCollection.to_torchgeo_dataset(...).
For train/val/test splits, see ML Training with Splits.