rasteret.core.execution

Data loading pipeline: iterate collection records, fetch tiles, and merge results into xarray or GeoDataFrame outputs.

bands is required because each raster in the collection may contain dozens of assets (bands). Specifying which bands to load avoids fetching unnecessary data and keeps memory usage predictable.

execution

Data loading pipeline for Collection reads.

This module orchestrates the read path:

  1. Iterate records in the Collection
  2. Load bands concurrently via COGReader
  3. Merge results into xarray.Dataset or geopandas.GeoDataFrame

Users access this via Collection.get_xarray() and Collection.get_gdf().
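The three steps above can be sketched with plain `asyncio`. This is a minimal illustration of the pattern, not rasteret's implementation: `fetch_band` and `load_records` are hypothetical names, the "tile" is a fake byte string, and the semaphore stands in for whatever mechanism the library uses to honour `max_concurrent`.

```python
import asyncio


async def fetch_band(record_id: str, band: str, limit: asyncio.Semaphore):
    """Hypothetical stand-in for a COG tile read; returns fake bytes."""
    async with limit:  # cap the number of in-flight requests
        await asyncio.sleep(0)  # placeholder for network I/O
        return record_id, band, f"{record_id}:{band}".encode()


async def load_records(records, bands, max_concurrent=50):
    limit = asyncio.Semaphore(max_concurrent)
    # Steps 1 and 2: one task per (record, band) pair, run concurrently.
    tasks = [fetch_band(r, b, limit) for r in records for b in bands]
    results = await asyncio.gather(*tasks)
    # Step 3: merge the per-band results into one mapping per record.
    merged = {}
    for record_id, band, payload in results:
        merged.setdefault(record_id, {})[band] = payload
    return merged


result = asyncio.run(
    load_records(["scene_a", "scene_b"], ["B04", "B08"], max_concurrent=2)
)
```

Bounding concurrency with a semaphore keeps the number of open requests predictable even when the record count is large, which is the role `max_concurrent` plays in the functions documented below.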

Functions

get_collection_xarray

get_collection_xarray(
    *,
    collection: "Collection",
    geometries: Any,
    bands: list[str],
    data_source: str | None = None,
    max_concurrent: int = 50,
    backend: object | None = None,
    target_crs: int | None = None,
    **filters: Any,
) -> Dataset

Load selected bands as an xarray.Dataset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| collection | Collection | Source collection. | required |
| geometries | bbox tuple, pa.Array, Shapely, WKB bytes, or GeoJSON dict | Area(s) of interest. | required |
| bands | list of str | Band codes to load (e.g. ["B04", "B08"]). | required |
| data_source | str | Override the inferred data source for band mapping and URL signing. | None |
| max_concurrent | int | Maximum concurrent HTTP requests (default 50). | 50 |
| backend | StorageBackend | Pluggable I/O backend (e.g. ObstoreBackend). | None |
| target_crs | int | Reproject all records to this EPSG code before merging. When None and the collection spans multiple CRS zones, auto-reprojection to the most common CRS is triggered. | None |
| filters | kwargs | Additional keyword arguments forwarded to Collection.subset(). | {} |
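The "most common CRS" rule described for target_crs can be illustrated with a small counting sketch. It assumes the EPSG codes of the matching records are already available as a list of integers; `pick_target_crs` is a hypothetical helper and is not the library's `_detect_target_crs`, whose internals are not shown here. Tie-breaking by first occurrence is likewise an assumption of this sketch.

```python
from collections import Counter
from typing import Optional


def pick_target_crs(epsg_codes: list) -> Optional[int]:
    """Return the most frequent EPSG code, or None if no reprojection is needed."""
    counts = Counter(epsg_codes)
    if len(counts) <= 1:
        # Zero or one distinct CRS: nothing to reconcile.
        return None
    # most_common(1) yields [(code, count)]; ties resolve to the first code seen.
    return counts.most_common(1)[0][0]


# Two records in UTM zone 43N, one in 44N: reproject everything to 43N.
chosen = pick_target_crs([32643, 32643, 32644])
```

Choosing the majority CRS means the fewest records have to be resampled, which limits both the compute cost and the interpolation error introduced by reprojection.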

Returns:

| Type | Description |
| --- | --- |
| Dataset | Band arrays in native COG dtype (e.g. uint16 for Sentinel-2). CRS is encoded via CF conventions (spatial_ref coordinate with WKT2, PROJJSON, and GeoTransform). Multi-CRS queries are auto-reprojected. |
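Because bands come back in their native dtype, arithmetic on unsigned integers deserves some care. The sketch below uses small NumPy arrays as stand-ins for `ds.B04` and `ds.B08` values (the numbers are invented for illustration) and shows why casting to float before computing an index such as NDVI is usually advisable.

```python
import numpy as np

# Invented sample values standing in for red (B04) and near-infrared (B08).
red = np.array([[1200, 3000]], dtype=np.uint16)
nir = np.array([[3600, 1000]], dtype=np.uint16)

# Subtracting in uint16 wraps around wherever nir < red.
wrapped = nir - red

# Cast to a float type first so differences can be negative.
red_f = red.astype(np.float32)
nir_f = nir.astype(np.float32)
ndvi = (nir_f - red_f) / (nir_f + red_f)
```

The same `astype` call is available on xarray objects, so the cast can be applied to the returned Dataset before any band math.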

Examples:

>>> ds = get_collection_xarray(
...     collection=col,
...     geometries=(77.55, 13.01, 77.58, 13.08),
...     bands=["B04", "B08"],
... )
>>> ds.B04.dtype
dtype('uint16')

Source code in src/rasteret/core/execution.py
def get_collection_xarray(
    *,
    collection: "Collection",
    geometries: Any,
    bands: list[str],
    data_source: str | None = None,
    max_concurrent: int = 50,
    backend: object | None = None,
    target_crs: int | None = None,
    **filters: Any,
) -> xr.Dataset:
    """Load selected bands as an ``xarray.Dataset``.

    Parameters
    ----------
    collection : Collection
        Source collection.
    geometries : bbox tuple, pa.Array, Shapely, WKB bytes, or GeoJSON dict
        Area(s) of interest.
    bands : list of str
        Band codes to load (e.g. ``["B04", "B08"]``).
    data_source : str, optional
        Override the inferred data source for band mapping and URL signing.
    max_concurrent : int
        Maximum concurrent HTTP requests (default 50).
    backend : StorageBackend, optional
        Pluggable I/O backend (e.g. ``ObstoreBackend``).
    target_crs : int, optional
        Reproject all records to this EPSG code before merging. When
        ``None`` and the collection spans multiple CRS zones,
        auto-reprojection to the most common CRS is triggered.
    filters : kwargs
        Additional keyword arguments forwarded to ``Collection.subset()``.

    Returns
    -------
    xarray.Dataset
        Band arrays in native COG dtype (e.g. ``uint16`` for Sentinel-2).
        CRS is encoded via CF conventions (``spatial_ref`` coordinate with
        WKT2, PROJJSON, and GeoTransform). Multi-CRS queries are
        auto-reprojected.

    Examples
    --------
    >>> ds = get_collection_xarray(
    ...     collection=col,
    ...     geometries=(77.55, 13.01, 77.58, 13.08),
    ...     bands=["B04", "B08"],
    ... )
    >>> ds.B04.dtype
    dtype('uint16')
    """
    import xarray as xr

    # Auto-detect multi-CRS to prevent silent spatial data corruption
    # from merging tiles with incompatible coordinate systems.
    if target_crs is None:
        target_crs = _detect_target_crs(collection, filters)

    def _merge(datasets):
        logger.info("Merging %s datasets", len(datasets))
        merged = xr.merge(datasets, join="outer", compat="override")
        if "time" in merged.coords:
            return merged.sortby("time")
        return merged

    return _load_and_merge(
        collection=collection,
        geometries=geometries,
        bands=bands,
        for_xarray=True,
        merge_fn=_merge,
        data_source=data_source,
        max_concurrent=max_concurrent,
        backend=backend,
        target_crs=target_crs,
        **filters,
    )

get_collection_gdf

get_collection_gdf(
    *,
    collection: "Collection",
    geometries: Any,
    bands: list[str],
    data_source: str | None = None,
    max_concurrent: int = 50,
    backend: object | None = None,
    target_crs: int | None = None,
    **filters: Any,
) -> GeoDataFrame

Load selected bands as a geopandas.GeoDataFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| collection | Collection | Source collection. | required |
| geometries | bbox tuple, pa.Array, Shapely, WKB bytes, or GeoJSON dict | Area(s) of interest. | required |
| bands | list of str | Band codes to load. | required |
| data_source | str | Override the inferred data source. | None |
| max_concurrent | int | Maximum concurrent HTTP requests (default 50). | 50 |
| backend | StorageBackend | Pluggable I/O backend. | None |
| target_crs | int | Reproject all records to this EPSG code before building the GeoDataFrame. | None |
| filters | kwargs | Additional keyword arguments forwarded to Collection.subset(). | {} |

Returns:

| Type | Description |
| --- | --- |
| GeoDataFrame | Band arrays in native COG dtype. Each row is a geometry-record pair with pixel data as columns. |
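The row layout can be pictured with a small pandas sketch. The source listing for this function concatenates per-record frames with `pd.concat(dfs, ignore_index=True)`; the example below mirrors only that step, using plain DataFrames. The column names `geometry_id` and `record_id` and all values are hypothetical, and the real output is a GeoDataFrame whose exact columns are not documented here.

```python
import pandas as pd

# Hypothetical per-record frames: one row per (geometry, record) pair,
# with a band column holding that pair's pixel values.
frame_a = pd.DataFrame(
    {
        "geometry_id": [0, 1],
        "record_id": ["scene_a", "scene_a"],
        "B04": [[1200, 1210], [1350, 1340]],
    }
)
frame_b = pd.DataFrame(
    {"geometry_id": [0], "record_id": ["scene_b"], "B04": [[1180, 1175]]}
)

# ignore_index=True discards each frame's own 0..n index and renumbers rows,
# so the merged result has unique row labels.
merged = pd.concat([frame_a, frame_b], ignore_index=True)
```

Without `ignore_index=True`, both input frames would contribute a row labelled 0, and label-based lookups on the merged frame would return more than one row.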

Source code in src/rasteret/core/execution.py
def get_collection_gdf(
    *,
    collection: "Collection",
    geometries: Any,
    bands: list[str],
    data_source: str | None = None,
    max_concurrent: int = 50,
    backend: object | None = None,
    target_crs: int | None = None,
    **filters: Any,
) -> gpd.GeoDataFrame:
    """Load selected bands as a ``geopandas.GeoDataFrame``.

    Parameters
    ----------
    collection : Collection
        Source collection.
    geometries : bbox tuple, pa.Array, Shapely, WKB bytes, or GeoJSON dict
        Area(s) of interest.
    bands : list of str
        Band codes to load.
    data_source : str, optional
        Override the inferred data source.
    max_concurrent : int
        Maximum concurrent HTTP requests (default 50).
    backend : StorageBackend, optional
        Pluggable I/O backend.
    target_crs : int, optional
        Reproject all records to this EPSG code before building the
        GeoDataFrame.
    filters : kwargs
        Additional keyword arguments forwarded to ``Collection.subset()``.

    Returns
    -------
    geopandas.GeoDataFrame
        Band arrays in native COG dtype. Each row is a geometry-record
        pair with pixel data as columns.
    """
    return _load_and_merge(
        collection=collection,
        geometries=geometries,
        bands=bands,
        for_xarray=False,
        merge_fn=lambda dfs: gpd.GeoDataFrame(pd.concat(dfs, ignore_index=True)),
        data_source=data_source,
        max_concurrent=max_concurrent,
        backend=backend,
        target_crs=target_crs,
        **filters,
    )