Changelog¶
v0.3.13¶
Added¶
core/reader_pool.py—AsyncCOGReaderPool: new internal module that owns the background event loop and thread for all async COG reads. Previously the adapter had its own private async machinery and the other read surfaces recreated a newCOGReaderper call viarun_sync. Now there is one place.core/window_read.py—read_collection_window(): new internal module for chip-level pixel reads. Reads band metadata, asset URLs, and CRS directly from Arrow columns instead of materialising full Python objects per row viaiterate_rasters().BandRegistryalias resolution happens once per call, not once per row × band.Collection.read_window(group_by="datetime"): all byte-range requests across every timestep are planned upfront and dispatched in a singleasyncio.gather. A 15-scene time-series is one round of concurrent fetches, not 15 serial ones.
Changed¶
- All sync read surfaces share one
AsyncCOGReaderPool:get_numpy,get_xarray,get_gdf, andread_windowpreviously used two different concurrency models —run_sync(create/destroy aCOGReaderper call) for the former three, and a persistent pool only forread_window. All four now go through the same shared pool on theCollection.run_syncis removed fromexecution.py.
Removed¶
- Built-in TorchGeo adapter:
Collection.to_torchgeo_dataset()andRasteretGeoDatasetare removed (~1 250 lines of adapter code, ~1 060 lines of tests). The adapter duplicated TorchGeo's ownGeoDatasetcontract and kept a private async read path that is now properly centralised incore/. The TorchGeo integration lives in TorchGeo itself asRasteretDatasetintorchgeo.datasets(TorchGeo 0.10+). Rasteret's side of the boundary isto_table()+read_window(). Migrate with:
Fixed¶
SyntaxWarninginreader_pool.py:returninside afinallyblock is aSyntaxWarningin Python 3.11+; replaced with try/except and return outside the finally.-
reader_poolnow a required parameter inexecution.py: was typedOptionalbut called unconditionally, turning a missing pool into anAttributeErroronNoneinstead of a clearTypeError. -
Tabular AOIs for polygon reads:
get_numpy(...),get_xarray(...), andget_gdf(...)now acceptgeometry_column=...whengeometriesis an Arrow/GeoArrow AOI table or Arrow C stream producer. - Arrow-native point-table metadata:
sample_points(...)now preserves non-geometry point-table columns in the returned Arrow table. - GeoArrow CRS inference for AOIs and points: Rasteret now uses GeoArrow CRS
metadata when available and requires
geometry_crs=...for tabular geometry inputs that do not carry CRS metadata.
Changed¶
- Local COG files work in custom builds:
build_from_stac()andbuild_from_table(..., enrich_cog=True)can now index local tiled GeoTIFF/COG files referenced by normal paths orfile://URLs. - Clearer STAC band-map checks:
build_from_stac()now catches missing or mismatched STAC asset names before reading raster headers. For common Sentinel-2-style catalogs withB01/B02asset names, Rasteret can infer the map when no provider-specific map is registered. - GeoDataFrame pixel outputs preserve AOI metadata:
get_gdf(...)joins non-geometry AOI table columns back to results bygeometry_id. - Runtime geometry docs now focus on user tools: the former enriched collection workflow guide is now Bring Your Own AOIs, Points, And Metadata.
- UTC-strict datetime contract across build/load paths:
build_from_table()and STAC table normalization now accept RFC3339 datetime strings only when they carry explicit offsets, normalize them to UTC instants, and reject naive datetime strings.as_collection()andload()now fail fast whendatetimeis not an Arrow timestamp column or uses a non-UTC timezone. - GeoParquet build date filtering now stays correct for string datetimes:
when a dataset descriptor points at a source table whose datetime column is
stored as RFC3339 strings,
build(..., date_range=...)now normalizes the table first and applies the date filter afterward instead of silently relying on unsafe string pushdown semantics. - Static STAC date filtering now parses real RFC3339 timestamps: local/static catalog builds compare item datetimes as UTC instants instead of lexicographic strings, so offset-bearing timestamps filter correctly.
- TorchGeo index creation keeps UTC-compatible datetime semantics: tz-aware collection datetimes are normalized before TorchGeo interval index creation, avoiding mixed timezone behavior in dataset composition.
v0.3.12¶
Fixed¶
- Planar multi-band point sampling batching:
sample_points()now batches distinct per-band tile ranges for planar COGs such as AEF while preserving per-band offsets and values. - Point sampling concurrency:
match="all"now samples matching rasters in bounded batches with one shared COG reader instead of awaiting each raster serially. - Point-aware index filtering: point sampling now uses per-point GeoArrow geometry filters instead of one broad point-envelope bbox when narrowing indexed collections.
- Indexed collection shard narrowing on remote filesystems:
source_partpruning now preserves the backing PyArrow filesystem and avoids reapplying record-index filters to already-planned scan datasets. - Source-part direct scans: indexed reads now open only the selected
source_partParquet shards when available instead of opening the full/datadataset before pruning.
v0.3.11¶
Added¶
- Native Arrow / GeoArrow Collection export:
Collectionnow implements Arrow PyCapsule protocol surfaces for stream, array, and schema export, allowing Arrow-compatible consumers to read collection metadata directly. - GeoArrow WKB footprint metadata: Rasteret now exports the
geometryfootprint column asgeoarrow.wkbwith CRS metadata set toOGC:CRS84. - Row-level raster CRS sidecar: enriched and normalized collections now carry
an Arrow-friendly
crsstring sidecar such asEPSG:32632, while preserving existingproj:epsgfor Rasteret runtime compatibility. Collection.crs: returns unique row-level raster CRS codes as strings, usingcrswhen present and falling back to legacyproj:epsg.- Arrow inputs for table builders:
build_from_table(...)now accepts Arrow-compatible in-memory objects and PyCapsule producers, including DuckDB/Polars-style Arrow stream exporters.
Changed¶
- Footprint CRS and raster CRS are now separated in Arrow exports:
GeoArrow metadata on
geometrydescribes the footprint geometry CRS (OGC:CRS84), while raster-native CRS remains row-level metadata incrsandproj:epsg. - COG enrichment writes consistent CRS sidecars: when parsed GeoTIFF header
CRS is used to backfill CRS metadata, Rasteret now keeps
crsandproj:epsgaligned. - Hugging Face streaming collections participate in Arrow export: HF-backed collections now expose enriched Arrow schemas and batches instead of appearing empty to Arrow consumers.
- Band/cloud read errors are more actionable: unresolved band requests and cloud read failures now include available asset/metadata keys, data-source mapping context, and provider-specific credential/requester-pays hints.
- Point sampling output schema is leaner:
sample_points()no longer emitscloud_coverby default, keeping the result focused on point, record, band, value, and CRS fields. - Docs now follow a reader-first path: the MkDocs nav now keeps Getting Started, Concepts, and Transitioning from Rasterio as the first linear path, with task-focused How-To Guides before notebook Tutorials.
- How-to and explanation docs were consolidated: table/Arrow workflows, collection management, catalog/cloud access, point sampling, TorchGeo, AEF, schema, interop, and benchmark pages were tightened around the current Collection APIs and CRS model.
- Benchmark docs now include the README evidence set: the benchmark page now includes the Google Earth Engine time-series table and the supporting benchmark/cost/scaling images from the docs asset set.
- AEF similarity notebook clarity: renamed intermediate variables by access path, removed the DuckDB pivot from the reference-vector step, and made each approach produce comparable match polygons.
- Record table terminology cleanup: refreshed Parquet/Arrow table examples
to prefer "record table" wording while keeping
--manifest-urlas a legacy CLI alias.
Fixed¶
- Fixed Arrow single-batch export for empty collections with non-empty schemas.
- Fixed requested-schema PyCapsule handling by importing requested Arrow schema capsules before applying compatible PyArrow casts.
- Fixed mixed-raster-CRS Arrow export so raster CRS is not incorrectly written as geometry column CRS.
- GeoDataFrame raster transforms:
get_gdf(...)rows now include the read-windowtransform, making record-wise array outputs easier to vectorize or map without reconstructing georeferencing in user code.
Tested¶
- Full test suite during Arrow interop validation:
364 passed, 42 skipped. - Verified Arrow interop with PyArrow table/reader/capsule paths.
- Verified
GeoDataFrame.from_arrow(collection)reads footprint geometry asOGC:CRS84. - Verified DuckDB 1.5.1 imports geometry as
GEOMETRY('OGC:CRS84')and keepscrsasVARCHAR. - Verified LanceDB 0.30.2 preserves
geoarrow.wkband thecrssidecar. - Verified docs with
uv run --extra docs mkdocs build --strict -d /tmp/rasteret-mkdocs-check.
v0.3.10¶
Changed¶
- AEF load alias routing:
rasteret.load("aef/v1-annual")now opens the published Source Cooperative/datacollection while keepingindex.parquetattached as the runtime sidecar index for pushdown. The publicload()implementation did not change; this is a catalog descriptor routing fix. - Examples cleanup: removed the older
ml_training_with_splits.pyscript; the split/label pattern remains documented in the ML training how-to and TorchGeo tutorial notebooks. - AEF how-to refresh: rewrote the AEF embeddings guide around the
maintained
rasteret.load("aef/v1-annual")flow and moved build/DuckDB guidance to custom-data use cases.
Fixed¶
- Collection export bbox normalization:
Collection.export()now normalizes legacy list-stylebboxcolumns through a chunk-safe struct conversion, avoidingChunkedArray/duplicatebboxfailures on export-reload workflows.
v0.3.9¶
Changed¶
- Filtered CRS auto-detection for xarray reads: internal batch iteration
now respects active collection filters when scanning
proj:epsgforget_xarray(...)auto-CRS selection. This avoids unintended reprojection on filtered subsets. - Nodata-aware integer xarray merge path: for
xr_combine="combine_first", Rasteret now prefers an integer-preserving merge path when_FillValuemetadata is present and consistent across overlapping datasets. - Per-band nodata propagation into xarray assembly: COG read nodata is
carried through RasterAccessor band results and attached as
_FillValueduring xarray construction.
Fixed¶
- AEF
get_xarraymemory blow-up regression: fixed a path where filtered reads could trigger unnecessary reprojection andfloat32upcast, causing significantly higher peak RAM than expected. - TorchGeo instant-timestamp slice regression:
RasteretGeoDataset.__getitem__now handles zero-length temporal slices (t.start == t.stop) with a closed interval so overlap checks do not drop valid records/chips.
Docs¶
- Updated
notebooks/07_aef_similarity_search.ipynbwith explicit memory notes for Franklin County / AEF 64-band runs: get_xarray: highest peak memory (~20-24 GB)get_gdf: medium peak memory (~3 GB)to_torchgeo_dataset: lower bounded-memory path (~3-5 GB)- Updated the notebook summary table to include these concrete memory ranges.
- Updated AEF notebooks to clarify that the AEF collection is already prebuilt
for 2018–2024 (Source Cooperative + Hugging Face), and that
rasteret.build()can be skipped in these workflows. - Updated parquet how-to/examples to use record table terminology and a first-run-friendly default SourceCoop Maxar record table.
Tested¶
- Verified
_detect_target_crs(...)on filtered subsets now returnsNonewhen all selected records are same-CRS. - Verified
sub.get_xarray(...)on AEF Franklin subset preservesint8dtype after merge. - Reproduced and fixed TorchGeo chip sampling regression:
from
used=0, skipped=16toused=16, skipped=0on the same Franklin sampler workload.
v0.3.8 - Hotfix¶
- Fix catalog.py to use sourcecoop and HF AEF index.parquet instead of reading large collection parquets
v0.3.7¶
Changed¶
- TorchGeo adapter semantic alignment:
Collection.to_torchgeo_dataset(...)now follows TorchGeoRasterDatasetquery behavior more closely. time_series=True: applies temporal overlap from the sampler/query slice, then stacks selected records into[T, C, H, W].time_series=False: mosaics overlapping records on the query grid with first-record precedence and nodata-aware filling.- TorchGeo docs clarity: interop and reference docs now describe the updated time-series and non-time-series slice behavior.
Tested¶
- Expanded
test_torchgeo_error_propagationwith semantic coverage for: - query-temporal filtering in
time_series=True - overlapping-record mosaicking in
time_series=False - Updated
test_torchgeo_networkwording to match the query-slice semantics.
v0.3.6¶
Added¶
- Bounded nodata fallback for
sample_points(): point sampling can now search outward from the base pixel under the input point usingmax_distance_pixels, measured in Chebyshev distance (square rings). - Neighbourhood window output for
sample_points():return_neighbourhoodnow supports: "off": no neighbourhood column"always": always return the full searched window"if_center_nodata": return the window only when the base pixel is nodata/NaN
Changed¶
- Point sampling semantics are now explicit and bounded:
sample_points()still samples the pixel containing the point first, but when that base pixel is nodata/NaN andmax_distance_pixels > 0, Rasteret searches outward ring by ring and chooses the closest valid candidate by exact point-to-pixel-rectangle distance. - Neighbourhood output schema: when
return_neighbourhood != "off",sample_points()returns a nullableneighbourhood_valueslist column in row-major order. With"if_center_nodata", rows whose base pixel is valid keepneighbourhood_values = NULL. - Neighbourhood mode validation: neighbourhood output now requires
max_distance_pixels > 0so the requested window size is always explicit.
Tested¶
- Expanded
test_executionfor bounded nodata fallback, exact-distance winner selection, cross-tile fallback, neighbourhood window output, NULL-window behavior for"if_center_nodata", and empty-result schema stability. - Expanded
test_public_api_surfacefor the newsample_points()arguments.
v0.3.5¶
Added¶
- Descriptor-backed dual-surface planning: catalog entries can now
describe separate record-table, index, and collection surfaces via
record_table_uri,index_uri, andcollection_uri. This lets Rasteret keep build-time metadata sources separate from published runtime collections. Collection.head(): first-class metadata preview API that uses the narrow record index when available instead of forcing a wide collection scan.- Internal Parquet read planner:
ParquetReadPlannerkeeps index-side and wide-scan filter state coherent acrosssubset(),where(),len(),head(), pixel reads, and TorchGeo entry points. - Hugging Face streaming runtime:
HFStreamingSourceand batch iterators now provide a stable metadata path forhf://datasets/...collections without routing through the olderdatasetsstreaming shutdown path.
Changed¶
- Index-first runtime filtering: when a collection has both a narrow index and a wide read-ready surface, Rasteret now plans metadata filters against the index first, then carries compatible predicates into the wide scan.
- GeoParquet bbox contract: bbox filtering now targets the canonical
bboxstruct (xmin,ymin,xmax,ymax) instead of older scalar bbox columns. Catalog tests, CLI fixtures, and filter tests were updated to match the GeoParquet 1.1 shape. - TorchGeo filter propagation:
to_torchgeo_dataset()now respects collection filters and geometry / bbox narrowing before chip sampling starts, reducing unnecessary raster candidates for gridded reads. - Local Parquet metadata reuse: local datasets prefer a
_metadatasidecar when present, and Rasteret now keeps an in-process Parquet dataset cache to reduce repeated footer/schema setup during interactive sessions. - Catalog surface reporting: CLI output and descriptor helpers now surface
record-table, index, and collection paths explicitly instead of collapsing
everything into
geoparquet_uri.
Fixed¶
- Hugging Face filter execution: Arrow expression handling on HF-backed metadata batches now fails more clearly and avoids the unstable runtime path that could crash during shutdown or long scans.
- AEF runtime aliasing: the built-in AEF descriptor now points at the published Terrafloww collection/index surfaces with explicit field-role and filter-capability hints.
- Read-path correctness on filtered collections:
len(),head(),sample_points(), TorchGeo reads, and other collection-backed reads now stay aligned when filters are staged across both record-index and wide-data surfaces.
Tested¶
- Expanded
test_collection_filtersfor bbox-struct filtering and TorchGeo prefilter behavior. - Expanded
test_catalog,test_cli, andtest_public_api_surfacefor the new descriptor surface roles and runtime load/build paths. - Expanded
test_huggingface,test_execution, andtest_torchgeo_adapterfor the new HF runtime path, filter propagation, and collection-read behavior.
v0.3.4¶
Added¶
- HuggingFace integration:
rasteret.load("hf://datasets/<org>/<repo>")resolves remote Parquet shards viahuggingface_huband loads them as a Collection. No local clone or download step needed. - AEF v1 Annual Rasteret Collection: prebuilt read-ready collection published on HuggingFace and Source Cooperative. 235K tiles, 64-band int8 embeddings, global coverage 2017–2024.
- AEF similarity search tutorial (
notebooks/07_aef_similarity_search): end-to-end embedding similarity search usingsample_points,get_xarray,get_gdf, and TorchGeoGridGeoSampler. Demonstrates DuckDB Arrow-native pivot for reference vectors and lonboard GPU-accelerated visualization.
Changed¶
- Unified tile engine:
RasterAccessorconsolidated into a single_read_tile()code path shared byget_xarray,get_numpy,get_gdf, and TorchGeo__getitem__. Four divergent tile-read implementations replaced with one. - COGReader simplified: removed duplicate decode paths, tightened the fetch → decompress → crop pipeline, reduced code surface by ~200 lines.
- TorchGeo edge-chip hardening: empty-read validation returns nodata-filled tensors for chips outside coverage; positive-overlap filtering skips false bbox-only candidates; fallback loop fills chips with nodata when all candidate tiles fail instead of crashing the DataLoader.
- Error surfacing:
get_gdfandget_numpywarn on partial band/geometry/record failures instead of returning silent empty results. Point geometry AOIs raiseUnsupportedGeometryErrorpointing tosample_points(). Exception chaining (raise ... from e) throughout the read pipeline.
Fixed¶
sample_pointsCOG read path: tile metadata validation was skipping valid tiles when the source raster had multiple matching records.xr_combineparameter now correctly plumbed throughget_xarray().
Tested¶
test_torchgeo_error_propagation: edge-chip, empty-read, and fallback loop coverage.test_huggingface: URI resolution and table loading mocks.test_public_api_surface: validates all public imports fromrasteret.test_public_network_smoke: live AEF reads and point sampling parity againstrasterio.sample().- Extended
test_executionwithget_gdferror path coverage.
v0.3.3¶
Performance¶
- Arrow-batch native point sampling:
sample_points()internals now use vectorized NumPy gathers andTable.from_batchesinstead of per-sample Python append +pa.concat_tables. Eliminates row materialization in the hot loop. - COGReader session reuse:
read_cog()accepts a sharedreader=parameter. Point sampling across multiple rasters now reuses a singleCOGReader(and its HTTP/2 connection pool) instead of creating one per raster, reducing connection overhead for multi-scene workloads.
Refactored¶
- Dedicated
point_samplingmodule: point sampling ownership moved fromexecution.pytorasteret.core.point_sampling.execution.pystays focused on area/chip reads (get_xarray,get_numpy,get_gdf). POINT_SAMPLES_SCHEMAdefined intypes.pyas the single source of truth for point sample output columns. Nullable fields explicitly modeled.- Point input helpers consolidated:
ensure_point_geoarrow,candidate_point_indices_for_raster, and related helpers moved intogeometry.py. - Strict tile/source alignment guard in
RasterAccessor.sample_points: validates that tile metadata matches the source raster before sampling.
Changed¶
- Landing page (
docs/index.md) code example simplified: single PyArrow import, cleanersample_pointscall, HF benchmark collapsed into admonition.
v0.3.2¶
Added¶
Collection.sample_points(): first-class point sampling API returning apyarrow.Table(point_index,record_id,datetime,band,value, CRS columns, and metadata fields). Supportsmatch="all"andmatch="latest".- New guide: Point Sampling and Masking.
Changed¶
- Masking controls surfaced at Collection level:
get_xarray(),get_gdf(), andget_numpy()now acceptall_touched=...directly. - Table-native point sampling inputs:
Collection.sample_points(...)now accepts Arrow tables and common dataframe/relation inputs withx_column/y_columnorgeometry_column(WKB/GeoArrow/shapely point column). - Point sampling aligns to rasterio
sample()semantics (nearest-pixel index math) for deterministic parity on real COGs. - Core read internals are more Arrow-native:
iterate_rasters()now scans columnar batches (nobatch.to_pylist()in the core read iterator),infer_data_source()uses filteredscanner.head(1),- multi-CRS detection in xarray path streams
proj:epsgcounts per batch, sample_points()uses vectorizedgeoarrow.point_coords.- Error messaging for non-OGC binary geometry input now explicitly points
DuckDB users to
ST_AsWKB(geom)when needed. - Network parity coverage is tighter:
- AOI windowing now matches rasterio
geometry_window()edge semantics exactly (fixes the WorldCover 1-pixel mismatch), - transient STAC API timeouts are retried during live builds,
- the AEF south-up TorchGeo oracle path is corrected for manual/explicit
parity runs via
WarpedVRT.
Tested¶
- Public network smoke:
test_public_network_smoke.pypasses with--network, including point sampling parity againstrasterio.sample(). - Network smoke (
test_network_smoke.py) passes for available providers (expected skips remain for missing optional extras/credentials).
Packaging¶
sedonadb>=0.2.0added to theexamplesextra for Arrow-native point workflow examples.
v0.3.1¶
Added¶
as_collection(): lightweight re-entry from an in-memory Arrow table or dataset back into a Collection. Validates contract columns and COG band metadata structs without re-running ingest, enrichment, or persistence. Use this after enriching a Collection's table with DuckDB, Polars, PyArrow, or any other tool. See Enriched Collection Workflows.
Changed¶
- Lifecycle docs: all entry points (
build*,load,as_collection) now cross-reference each other in docstrings and how-to guides. Contributing guide updated with four-layer architecture (Build → Query → Read → Re-entry). - Major TOM on-the-fly example uses
as_collection()+ explicitexport()instead ofbuild_from_table(enrich_cog=False).enrich_major_tom_columns()now preservesyear/monthpartition columns from the base Collection.
v0.3.0¶
Highlights¶
- License changed from AGPL-3.0-only to Apache-2.0.
- Dataset catalog:
build()with 12 pre-registered datasets across Earth Search, Planetary Computer, and AlphaEarth Foundation. Catalog entries can point to a STAC API or a GeoParquet file.register_local()for adding your own. build_from_stac()andbuild_from_table(): build a Collection from any STAC API or any Parquet/GeoParquet file with COG URLs (Source Cooperative exports, STAC GeoParquet, custom catalogs). No STAC API required for the table path. Optionalenrich_cog=Trueparses COG headers for accelerated reads.- Multi-cloud support: S3, Azure Blob, and GCS routing via URL auto-detection, with automatic fallback to anonymous access. obstore replaces direct aiohttp as the HTTP transport, adding unified multi-cloud routing.
create_backend()for authenticated reads with credential providers (e.g., Planetary Computer SAS tokens).- TorchGeo adapter:
collection.to_torchgeo_dataset()returns aGeoDatasetbacked by Rasteret's async COG reader. Supportstime_series=True([T, C, H, W]output),label_fieldfor per-sample labels,target_crsfor cross-CRS reprojection,allow_resample=Truefor mixed-resolution bands, andis_image=Falsefor mask-style datasets. Works with all TorchGeo samplers, collation helpers, and dataset composition (IntersectionDataset,UnionDataset). - Native dtype preservation: COG tiles return in their source dtype (uint16, int8, float32, etc.) instead of forcing float32.
- Rasterio-aligned masking: AOI reads default to
all_touched=Falseand fill outside-coverage pixels withnodatawhen present, otherwise0.read_cogreturns avalid_mask. - rioxarray removed: CRS encoding uses pyproj CF conventions directly.
The
xarrayextra no longer pulls rioxarray. - Extended TIFF header parsing: nodata, SamplesPerPixel, PlanarConfiguration, PhotometricInterpretation, ExtraSamples, GeoDoubleParams CRS support.
- Multi-CRS auto-reprojection: queries spanning multiple UTM zones
reproject to the most common CRS using GDAL's
calculate_default_transform. get_numpy(): lightweight NumPy output path returning[N, H, W](single band) or[N, C, H, W](multi-band) arrays. No extra dependencies beyond NumPy. Accepts bbox tuples, Arrow arrays, Shapely objects, or raw WKB.get_gdf(): GeoDataFrame output path for analysis workflows.- Enriched Parquet workflows: append arbitrary columns (splits, labels, AOI polygons, model scores) to a Collection's Parquet, query with DuckDB/PyArrow, and fetch pixels for matching rows on demand. See Enriched Collection Workflows.
- Major TOM on-the-fly: example workflow rebuilding Major TOM-style
patch-grid semantics from source Sentinel-2 COGs instead of
payload-in-Parquet. Benchmarked 3.9-6.5x faster than HF
datasetsParquet-filter reads. earthdataoptional extra:pip install rasteret[earthdata]for NASA Earthdata auto-credential detection.
Collection API¶
- Output paths:
get_xarray(),get_numpy(),get_gdf(),to_torchgeo_dataset(). All share the same async tile I/O underneath. - Inspection:
.bands,.bounds,.epsg,len(),__repr__(),.describe(),.compare_to_catalog(). - Filtering:
collection.subset(cloud_cover_lt=..., date_range=..., bbox=..., geometries=..., split=...)andcollection.where(expr)for raw Arrow expressions. - Sharing:
collection.export("path/")writes a portable copy;rasteret.load("path/")reloads it.list_collections()discovers cached collections in the workspace. - Three-tier schema: required columns (
id,datetime,geometry,assets), COG acceleration columns (per-band tile offsets and metadata), and user-extensible columns (split,label,cloud_cover, custom metadata). See Schema Contract. - Public exports:
Collection,CloudConfig,BandRegistry,DatasetDescriptor,DatasetRegistryare all importable fromrasteret.
Other changes¶
- Arrow-native geometry internals (GeoArrow replaces Shapely in hot paths).
- obstore as base dependency (multi-cloud support).
- CLI:
rasteret collections build|list|info|delete|import,rasteret datasets list|info|build|register-local|export-local|unregister-local. - TorchGeo
time_series=Trueuses spatial-only intersection, matching TorchGeo's ownRasterDatasetbehaviour where all spatially overlapping records are stacked regardless of the sampler's time slice. - Cloud workspace URIs (e.g.
s3://bucket/path) are preserved correctly inCollectionBuilderbase class.
Tested¶
- All four output paths (xarray, GeoDataFrame, NumPy, TorchGeo) tested against direct rasterio reads across 12 datasets (Sentinel-2, Landsat, NAIP, Copernicus DEM, ESA WorldCover, AEF, and more).
- TorchGeo adapter verified against the full GeoDataset contract:
IntervalIndex, samplers, collation, dataset composition, cross-CRS reprojection, and export/reload roundtrips.
Requirements¶
- Python 3.12+ required.
rasterio>=1.4.3,<1.5.0is a core dependency (used for geometry masking, CRS reprojection, and TorchGeo query-grid placement; not in the tile-read path).
Breaking changes¶
get_xarray()returns data in native COG dtype instead of always float32. Code that assumed float32 output may need adjustment.- The
xarrayextra no longer installs rioxarray. If you depend onds.rio.*methods, install rioxarray separately.