rasteret

Top-level entry points for building, loading, and extending collections.

Most users need only a few of these:

  • build() - build a Collection from the catalog by ID.
  • build_from_stac() / build_from_table() - full-control builders for STAC APIs or existing Parquet.
  • load() - reload a previously built Collection from Parquet.
  • as_collection() - wrap a read-ready Arrow table/dataset as a Collection (no enrich/persist).
  • register() - add a custom catalog entry to the in-memory registry.
  • register_local() - register a local Collection as a catalog entry (persists to ~/.rasteret/datasets.local.json).
  • create_backend() - create an authenticated I/O backend for multi-cloud reads.

Entrypoint semantics:

  • build() / build_from_stac() / build_from_table() - ingest external sources into the Rasteret schema (rebuilds/enriches: yes)
  • load() - reopen an existing persisted Collection (rebuilds/enriches: no)
  • as_collection() - re-wrap a read-ready Arrow table/dataset in memory (rebuilds/enriches: no)

See Getting Started for usage examples.
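For orientation, a minimal sketch of the main entry points; the bbox and dates below are illustrative:

```python
import rasteret

# One-time ingest: queries the catalog/STAC, enriches COG headers,
# and persists a Parquet index under the workspace.
col = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="s2-demo",
    bbox=(77.55, 12.90, 77.65, 13.00),        # (minx, miny, maxx, maxy)
    date_range=("2024-01-01", "2024-03-31"),  # (start, end) ISO dates
)

# The AlphaEarth Foundation dataset is published read-ready, so build()
# delegates it straight to load() -- or call load() yourself:
aef = rasteret.load("aef/v1-annual", name="aef")
```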


Functions

build

build(
    dataset: str,
    *,
    name: str,
    bbox: tuple[float, float, float, float] | None = None,
    date_range: tuple[str, str] | None = None,
    workspace_dir: str | Path | None = None,
    force: bool = False,
    max_concurrent: int = 50,
    query: dict[str, Any] | None = None,
    prefer_geoparquet: bool = False,
    stac_api: str | None = None,
    cloud_config: Any = None,
    backend: "StorageBackend | None" = None,
) -> "Collection"

Build a Collection from a registered dataset.

Looks up dataset in the rasteret.catalog.DatasetRegistry and routes to build_from_stac() or build_from_table() based on the descriptor's access fields.

The public AlphaEarth Foundation dataset ("aef/v1-annual") is already published as a read-ready Rasteret Collection, so this function delegates that ID to load() instead of rebuilding it.

For descriptors backed only by record_table_uri (for example, local collections registered with register_local()), bbox and date_range are optional and ignored.

For auth-required datasets, Rasteret can auto-create a backend from a descriptor's s3_credentials_url when no explicit backend is passed. This requires valid credentials in the environment (or ~/.netrc) for the relevant provider.

Parameters:

  dataset : str, required
      Registry ID (e.g. "earthsearch/sentinel-2-l2a").
  name : str, required
      Logical name for the collection.
  bbox : tuple of float, default None
      (minx, miny, maxx, maxy) bounding box. Required for STAC-backed descriptors.
  date_range : tuple of str, default None
      (start, end) ISO date strings. Required for STAC-backed descriptors.
  workspace_dir : str or Path, default None
      Cache directory. Defaults to ~/rasteret_workspace.
  force : bool, default False
      Rebuild even if a cache already exists.
  max_concurrent : int, default 50
      Maximum concurrent COG header fetches.
  query : dict, default None
      Additional STAC search parameters.
  prefer_geoparquet : bool, default False
      Use the GeoParquet path when available.
  stac_api : str, default None
      Override the descriptor's default STAC API endpoint.
  cloud_config : CloudConfig, default None
      Cloud configuration for URL rewriting.
  backend : StorageBackend, default None
      I/O backend for authenticated range reads. When omitted, Rasteret auto-creates one for known auth-required datasets. See create_backend().

Returns:

  Collection

Raises:

  KeyError
      If dataset is not in the registry.
  ValueError
      If the descriptor has no configured access method, or if auth is required but no backend could be created.

Source code in src/rasteret/__init__.py
def build(
    dataset: str,
    *,
    name: str,
    bbox: tuple[float, float, float, float] | None = None,
    date_range: tuple[str, str] | None = None,
    workspace_dir: str | Path | None = None,
    force: bool = False,
    max_concurrent: int = 50,
    query: dict[str, Any] | None = None,
    prefer_geoparquet: bool = False,
    stac_api: str | None = None,
    cloud_config: Any = None,
    backend: "StorageBackend | None" = None,
) -> "Collection":
    """Build a Collection from a registered dataset.

    Looks up *dataset* in the :class:`~rasteret.catalog.DatasetRegistry`
    and routes to :func:`build_from_stac` or :func:`build_from_table`
    based on the descriptor's access fields.

    The public AlphaEarth Foundation dataset (``"aef/v1-annual"``) is already
    published as a read-ready Rasteret Collection, so this function delegates
    that ID to :func:`load` instead of rebuilding it.

    For descriptors backed only by ``record_table_uri`` (for example local
    collections registered with :func:`register_local`), ``bbox`` and
    ``date_range`` are optional and ignored.

    For auth-required datasets, Rasteret can auto-create a backend from a
    descriptor's ``s3_credentials_url`` when no explicit *backend* is passed.
    This requires valid credentials in the environment (or ``~/.netrc``)
    for the relevant provider.

    Parameters
    ----------
    dataset : str
        Registry ID (e.g. ``"earthsearch/sentinel-2-l2a"``).
    name : str
        Logical name for the collection.
    bbox : tuple of float, optional
        ``(minx, miny, maxx, maxy)`` bounding box.
        Required for STAC-backed descriptors.
    date_range : tuple of str, optional
        ``(start, end)`` ISO date strings.
        Required for STAC-backed descriptors.
    workspace_dir : str or Path, optional
        Cache directory. Defaults to ``~/rasteret_workspace``.
    force : bool
        Rebuild even if a cache already exists.
    max_concurrent : int
        Maximum concurrent COG header fetches.
    query : dict, optional
        Additional STAC search parameters.
    prefer_geoparquet : bool
        Use the GeoParquet path when available.
    stac_api : str, optional
        Override the descriptor's default STAC API endpoint.
    cloud_config : CloudConfig, optional
        Cloud configuration for URL rewriting.
    backend : StorageBackend, optional
        I/O backend for authenticated range reads.  When omitted,
        Rasteret auto-creates one for known auth-required datasets.
        See :func:`create_backend`.

    Returns
    -------
    Collection

    Raises
    ------
    KeyError
        If *dataset* is not in the registry.
    ValueError
        If the descriptor has no configured access method, or if
        auth is required but no backend could be created.
    """
    _validate_bbox(bbox)
    _validate_date_range(date_range)

    from rasteret.catalog import DatasetRegistry

    descriptor = DatasetRegistry.get(dataset)
    if descriptor is None:
        available = [d.id for d in DatasetRegistry.list()]
        raise KeyError(
            f"Dataset '{dataset}' not found in registry. " f"Available: {available}"
        )

    if descriptor.id == "aef/v1-annual" and descriptor.collection_uri:
        collection = load(descriptor.id, name=name)
        if bbox is not None or date_range is not None:
            collection = collection.subset(bbox=bbox, date_range=date_range)
        return collection

    # Auto-create backend for datasets that provide an explicit backend hint.
    #
    # Important: `requires_auth` alone is not enough to decide whether a backend
    # is mandatory. Some datasets work via URL signing or requester-pays config
    # (cloud_config) without needing an obstore credential
    # provider. We only fail fast when the descriptor provides a concrete hint
    # that build-time enrichment needs a backend (currently `s3_credentials_url`).
    resolved_backend = backend
    if resolved_backend is None and descriptor.s3_credentials_url:
        resolved_backend = _auto_backend_for_descriptor(descriptor)
        if resolved_backend is None and descriptor.requires_auth:
            extra_hint = ""
            url = (descriptor.s3_credentials_url or "").lower()
            if "earthdata" in url or "lpdaac" in url:
                extra_hint = (
                    " If you're using an Earthdata-backed dataset, install "
                    '"rasteret[earthdata]" to enable the Earthdata credential provider.'
                )
            raise ValueError(
                f"Dataset '{descriptor.id}' requires authentication for build-time "
                f"COG header enrichment but no backend could be created. Either pass "
                f"backend= explicitly (see rasteret.create_backend()), or configure "
                f"credentials via environment variables / ~/.netrc.{extra_hint}"
            )

    from rasteret.core.collection import _is_cloud_uri

    resolved_workspace: str | Path | None = workspace_dir
    if workspace_dir is not None:
        ws_str = str(workspace_dir)
        if _is_cloud_uri(ws_str):
            stem = ws_str.rstrip("/").rsplit("/", 1)[-1]
            if not stem.endswith(("_stac", "_records")):
                resolved_workspace = f"{ws_str.rstrip('/')}/{name}_records"
        else:
            workspace = Path(workspace_dir)
            if workspace.name.endswith(("_stac", "_records")):
                resolved_workspace = workspace
            else:
                resolved_workspace = workspace / f"{name}_records"

    # Resolve band_codes from descriptor for GeoParquet enrichment.
    descriptor_band_codes = (
        list(descriptor.band_map.keys()) if descriptor.band_map else None
    )

    # Local descriptors are already-built Collections. Prefer loading them as-is
    # rather than re-running enrichment/build logic.
    if descriptor.spatial_coverage == "local" and descriptor.collection_uri:
        local_path = Path(descriptor.collection_uri).expanduser()
        if local_path.exists():
            return load(local_path, name=name)

    def _build_from_geoparquet() -> "Collection":
        """Route a GeoParquet-backed descriptor through build_from_table.

        Constructs filter_expr from bbox_columns + bbox/date_range,
        filesystem for anonymous S3, and passes descriptor normalisation
        fields through.
        """
        import pandas as pd
        import pyarrow as pa
        import pyarrow.dataset as pads

        record_table_uri = descriptor.build_source_uri or ""

        # --- Construct filter_expr from bbox_columns + date_range ---
        filter_parts: list[pads.Expression] = []

        if bbox:
            if descriptor.bbox_columns:
                bc = descriptor.bbox_columns
                if "minx" in bc and "maxx" in bc and "miny" in bc and "maxy" in bc:
                    filter_parts.append(pads.field(bc["minx"]) <= bbox[2])
                    filter_parts.append(pads.field(bc["maxx"]) >= bbox[0])
                    filter_parts.append(pads.field(bc["miny"]) <= bbox[3])
                    filter_parts.append(pads.field(bc["maxy"]) >= bbox[1])
            else:
                bbox_source = descriptor.source_field("bbox")
                filter_parts.append(pads.field(bbox_source, "xmax") >= bbox[0])
                filter_parts.append(pads.field(bbox_source, "xmin") <= bbox[2])
                filter_parts.append(pads.field(bbox_source, "ymax") >= bbox[1])
                filter_parts.append(pads.field(bbox_source, "ymin") <= bbox[3])

        # --- Construct filesystem for anonymous S3 URIs ---
        fs = None
        if record_table_uri.startswith("s3://") and not descriptor.requires_auth:
            try:
                import pyarrow.fs as pafs

                cloud_region = "us-west-2"
                if descriptor.cloud_config:
                    cloud_region = descriptor.cloud_config.get("region", cloud_region)
                fs = pafs.S3FileSystem(anonymous=True, region=cloud_region)
            except Exception as exc:
                logger.warning(
                    "Could not create anonymous S3 filesystem for %s: %s",
                    descriptor.id,
                    exc,
                )

        # --- Strip scheme when filesystem is provided ---
        # PyArrow expects bare "bucket/key" paths with an explicit filesystem,
        # not full "s3://bucket/key" URIs.
        read_path = record_table_uri
        if fs is not None and record_table_uri.startswith("s3://"):
            read_path = record_table_uri[len("s3://") :]

        # Date range filter: inspect the actual source field type so Rasteret
        # can fail clearly instead of silently assuming integer years.
        if date_range:
            datetime_source = descriptor.source_field("datetime")
            if datetime_source:
                try:
                    schema = pads.dataset(
                        read_path,
                        format="parquet",
                        filesystem=fs,
                    ).schema
                except Exception as exc:
                    raise ValueError(
                        f"Could not inspect build source schema for dataset "
                        f"'{descriptor.id}' at {record_table_uri!r}: {exc}"
                    ) from exc
                if datetime_source not in schema.names:
                    raise ValueError(
                        f"Dataset '{descriptor.id}' declares canonical datetime -> "
                        f"{datetime_source!r}, but that column is missing from the "
                        f"build source at {record_table_uri!r}."
                    )
                dt_type = schema.field(datetime_source).type
                start = pd.Timestamp(date_range[0])
                end = pd.Timestamp(date_range[1])
                if pa.types.is_integer(dt_type):
                    filter_parts.append(pads.field(datetime_source) >= int(start.year))
                    filter_parts.append(pads.field(datetime_source) <= int(end.year))
                elif pa.types.is_timestamp(dt_type):
                    start_scalar = pa.scalar(start.to_pydatetime(), type=dt_type)
                    end_scalar = pa.scalar(end.to_pydatetime(), type=dt_type)
                    filter_parts.append(pads.field(datetime_source) >= start_scalar)
                    filter_parts.append(pads.field(datetime_source) <= end_scalar)
                elif pa.types.is_date32(dt_type) or pa.types.is_date64(dt_type):
                    start_scalar = pa.scalar(start.to_pydatetime().date(), type=dt_type)
                    end_scalar = pa.scalar(end.to_pydatetime().date(), type=dt_type)
                    filter_parts.append(pads.field(datetime_source) >= start_scalar)
                    filter_parts.append(pads.field(datetime_source) <= end_scalar)
                else:
                    raise ValueError(
                        f"Dataset '{descriptor.id}' uses source datetime column "
                        f"{datetime_source!r} with unsupported type {dt_type}. "
                        "Rasteret build() currently supports integer year, date32/date64, "
                        "or timestamp source columns for date_range pushdown."
                    )

        filter_expr = None
        if filter_parts:
            filter_expr = filter_parts[0]
            for part in filter_parts[1:]:
                filter_expr = filter_expr & part

        # --- URL rewrite patterns from cloud_config ---
        url_rewrite_patterns = None
        if descriptor.cloud_config:
            url_rewrite_patterns = descriptor.cloud_config.get("url_patterns")

        return build_from_table(
            read_path,
            name=name,
            data_source=descriptor.stac_collection or descriptor.id,
            workspace_dir=resolved_workspace,
            column_map=descriptor.column_map,
            href_column=descriptor.source_field("href")
            if descriptor.source_field("href") != "href"
            else descriptor.href_column,
            band_index_map=descriptor.band_index_map,
            url_rewrite_patterns=url_rewrite_patterns,
            filesystem=fs,
            filter_expr=filter_expr,
            enrich_cog=True,
            band_codes=descriptor_band_codes,
            cloud_config=cloud_config,
            max_concurrent=max_concurrent,
            force=force,
            backend=resolved_backend,
        )

    # GeoParquet-first path: descriptors that have no STAC API, or user
    # explicitly prefers GeoParquet.
    if descriptor.build_source_uri and (
        prefer_geoparquet or not (descriptor.stac_api and descriptor.stac_collection)
    ):
        return _build_from_geoparquet()

    # STAC path (default for STAC-backed descriptors)
    api = stac_api or descriptor.stac_api
    if api and (descriptor.stac_collection or descriptor.static_catalog):
        if not descriptor.static_catalog and (bbox is None or date_range is None):
            raise ValueError(
                f"Dataset '{dataset}' requires bbox and date_range for STAC queries."
            )
        return build_from_stac(
            name=name,
            stac_api=api,
            collection=descriptor.stac_collection or descriptor.id,
            data_source=descriptor.id,
            band_map=descriptor.band_map,
            band_index_map=descriptor.band_index_map,
            bbox=bbox,
            date_range=date_range,
            workspace_dir=workspace_dir,
            force=force,
            max_concurrent=max_concurrent,
            cloud_config=cloud_config,
            query=query,
            backend=resolved_backend,
            static_catalog=descriptor.static_catalog,
        )

    # GeoParquet fallback
    if descriptor.build_source_uri:
        return _build_from_geoparquet()

    raise ValueError(
        f"Dataset '{dataset}' has no STAC API or record-table URI configured."
    )

build_from_stac

build_from_stac(
    *,
    name: str,
    stac_api: str,
    collection: str,
    data_source: str | None = None,
    band_map: dict[str, str] | None = None,
    band_index_map: dict[str, int] | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    date_range: tuple[str, str] | None = None,
    workspace_dir: str | Path | None = None,
    force: bool = False,
    max_concurrent: int = 50,
    cloud_config: Any = None,
    query: dict[str, Any] | None = None,
    backend: "StorageBackend | None" = None,
    static_catalog: bool = False,
) -> "Collection"

Build or load a local Parquet-backed collection from a STAC API.

Searches STAC, parses COG headers for tile metadata, and stores everything in a local Parquet index. On subsequent calls with the same parameters, Rasteret reuses the cached index (no STAC query / header parsing), but pixel reads still fetch remote tiles unless assets are local.

Parameters:

  name : str, required
      Logical name for the collection.
  stac_api : str, required
      STAC API endpoint URL.
  collection : str, required
      STAC collection ID (e.g. "sentinel-2-l2a").
  data_source : str, default None
      Optional Rasteret data-source key used for band mapping and cloud-config lookup. Defaults to collection. Use this to namespace provider-specific conventions (e.g. "earthsearch/sentinel-2-l2a" vs "pc/sentinel-2-l2a") and avoid collisions.
  band_map : dict, default None
      Optional mapping of band code to STAC asset key. When omitted, Rasteret falls back to built-in mappings for known collections.
  band_index_map : dict, default None
      Optional mapping of band code to the 0-based band/sample index within a multi-sample GeoTIFF asset. Required when multiple requested bands map to the same STAC asset key (e.g. a single multi-band "image" COG).
  bbox : tuple of float, default None
      (minx, miny, maxx, maxy) bounding box for the search.
  date_range : tuple of str, default None
      (start, end) ISO date strings.
  workspace_dir : str or Path, default None
      Directory for the local cache. Defaults to ~/rasteret_workspace.
  force : bool, default False
      Rebuild even if a cache already exists.
  max_concurrent : int, default 50
      Maximum concurrent COG header fetch operations.
  cloud_config : CloudConfig, default None
      Cloud configuration for URL rewriting.
  query : dict, default None
      Additional STAC search query parameters.
  backend : StorageBackend, default None
      I/O backend for authenticated range reads during COG header parsing. See create_backend().
  static_catalog : bool, default False
      Treat the source as a static STAC catalog rather than a searchable API; bbox and date_range are then optional.

Returns:

  Collection
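For example, pointing at the public Earth Search Sentinel-2 collection (the bbox, dates, and cloud-cover query below are illustrative):

```python
import rasteret

col = rasteret.build_from_stac(
    name="s2-blr",
    stac_api="https://earth-search.aws.element84.com/v1",
    collection="sentinel-2-l2a",
    bbox=(77.55, 12.90, 77.65, 13.00),
    date_range=("2024-06-01", "2024-06-30"),
    query={"eo:cloud_cover": {"lt": 20}},  # extra STAC search parameter
)
```

Calling it again with the same parameters reuses the cached Parquet index instead of re-querying STAC.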
Source code in src/rasteret/__init__.py
def build_from_stac(
    *,
    name: str,
    stac_api: str,
    collection: str,
    data_source: str | None = None,
    band_map: dict[str, str] | None = None,
    band_index_map: dict[str, int] | None = None,
    bbox: tuple[float, float, float, float] | None = None,
    date_range: tuple[str, str] | None = None,
    workspace_dir: str | Path | None = None,
    force: bool = False,
    max_concurrent: int = 50,
    cloud_config: Any = None,
    query: dict[str, Any] | None = None,
    backend: "StorageBackend | None" = None,
    static_catalog: bool = False,
) -> "Collection":
    """Build or load a local Parquet-backed collection from a STAC API.

    Searches STAC, parses COG headers for tile metadata, and stores
    everything in a local Parquet index. On subsequent calls with the same
    parameters, Rasteret reuses the cached index (no STAC query / header
    parsing), but pixel reads still fetch remote tiles unless assets are local.

    Parameters
    ----------
    name : str
        Logical name for the collection.
    stac_api : str
        STAC API endpoint URL.
    collection : str
        STAC collection ID (e.g. ``"sentinel-2-l2a"``).
    data_source : str, optional
        Optional Rasteret data-source key used for band mapping and cloud config
        lookup. Defaults to *collection*. Use this to namespace provider-specific
        conventions (e.g. ``"earthsearch/sentinel-2-l2a"`` vs
        ``"pc/sentinel-2-l2a"``) and avoid collisions.
    band_map : dict, optional
        Optional mapping of band code to STAC asset key. When omitted,
        Rasteret falls back to built-in mappings for known collections.
    band_index_map : dict, optional
        Optional mapping of band code to the 0-based band/sample index within a
        multi-sample GeoTIFF asset. Required when multiple requested bands map
        to the same STAC asset key (e.g. a single multi-band ``"image"`` COG).
    bbox : tuple of float
        ``(minx, miny, maxx, maxy)`` bounding box for the search.
    date_range : tuple of str
        ``(start, end)`` ISO date strings.
    workspace_dir : str or Path, optional
        Directory for the local cache. Defaults to ``~/rasteret_workspace``.
    force : bool
        Rebuild even if a cache already exists.
    max_concurrent : int
        Maximum concurrent COG header fetch operations.
    cloud_config : CloudConfig, optional
        Cloud configuration for URL rewriting.
    query : dict, optional
        Additional STAC search query parameters.
    backend : StorageBackend, optional
        I/O backend for authenticated range reads during COG header
        parsing.  See :func:`create_backend`.

    Returns
    -------
    Collection
    """
    _validate_bbox(bbox)
    _validate_date_range(date_range)

    from rasteret.core.collection import Collection

    workspace_dir_path = Path(workspace_dir or Path.home() / "rasteret_workspace")
    resolved_source = data_source or str(collection)

    if date_range is not None:
        collection_name = Collection.create_name(name, date_range, resolved_source)
    else:
        # Static catalogs may not have a date range; use name + collection id.
        safe = name.lower().replace(" ", "-").replace("/", "-")
        collection_name = f"{safe}_{resolved_source.replace('/', '-')}"
    collection_path = workspace_dir_path / f"{collection_name}_stac"

    if collection_path.exists() and not force:
        return Collection._load_cached(collection_path)

    from rasteret.cloud import CloudConfig, backend_config_from_cloud_config
    from rasteret.ingest.stac_indexer import StacCollectionBuilder

    resolved_cloud_config = cloud_config or CloudConfig.get_config(resolved_source)

    if backend is None and resolved_cloud_config:
        from rasteret.fetch.cog import _create_obstore_backend

        cfg = backend_config_from_cloud_config(resolved_cloud_config)
        if cfg:
            backend = _create_obstore_backend(**cfg)

    builder = StacCollectionBuilder(
        data_source=resolved_source,
        stac_collection=str(collection),
        stac_api=stac_api,
        workspace_dir=collection_path,
        name=collection_name,
        band_map=band_map,
        band_index_map=band_index_map,
        cloud_config=resolved_cloud_config,
        max_concurrent=max_concurrent,
        backend=backend,
        static_catalog=static_catalog,
    )

    async def _build() -> Collection:
        return await builder.build_index(
            bbox=list(bbox) if bbox else None,
            date_range=date_range,
            query=query,
        )

    from rasteret.core.utils import run_sync

    return run_sync(_build())

build_from_table

build_from_table(
    path: "str | Path | pa.Table | pads.Dataset",
    *,
    name: str = "",
    data_source: str = "",
    workspace_dir: str | Path | None = None,
    column_map: dict[str, str] | None = None,
    href_column: str | None = None,
    band_index_map: dict[str, int] | None = None,
    url_rewrite_patterns: dict[str, str] | None = None,
    filesystem: Any | None = None,
    columns: list[str] | None = None,
    filter_expr: Any | None = None,
    enrich_cog: bool = False,
    band_codes: list[str] | None = None,
    cloud_config: Any = None,
    max_concurrent: int = 300,
    force: bool = False,
    backend: "StorageBackend | None" = None,
) -> "Collection"

Build a Collection from an external Parquet/GeoParquet record table.

A record table is a Parquet dataset where each row is a raster item (satellite scene, drone image, derived product, etc.) with at minimum id, datetime, geometry, assets, or columns that can be normalised into them via column_map and href_column.

This is the heavy ingest path: it can normalize schema and optionally enrich COG headers. For in-memory tables that are already read-ready, use as_collection().

When enrich_cog=True, COG headers are parsed from the asset URLs and cached as {band}_metadata struct columns in the Parquet index, enabling fast tiled reads and TorchGeo integration.

When name is provided and workspace_dir is omitted, the collection is persisted to ~/rasteret_workspace/{name}_records/ so that it is discoverable via Collection.list_collections() and the CLI.

Parameters:

  path : str, Path, or pyarrow object, required
      Path/URI to a Parquet/GeoParquet file or dataset directory, or an in-memory Arrow object (pyarrow.Table, pyarrow.dataset.Dataset, pyarrow.RecordBatch, pyarrow.RecordBatchReader, or an object implementing the Arrow PyCapsule protocol). Hugging Face dataset URIs are supported via hf://datasets/<org>/<name>[/subpath].
  name : str, default ""
      Optional collection name. When given without workspace_dir, the collection is cached in the default workspace.
  data_source : str, default ""
      Data source identifier for band mapping and URL policy.
  workspace_dir : str or Path, default None
      Persist the collection as partitioned Parquet at this path. Defaults to ~/rasteret_workspace/{name}_records/ when name is provided.
  column_map : dict, default None
      {source_name: contract_name} alias map. Source columns are preserved; contract-name columns are added as zero-copy aliases.
  href_column : str, default None
      Column containing COG URLs. When set and assets is absent after aliasing, the normalisation layer constructs the assets struct from this column and band_index_map.
  band_index_map : dict, default None
      {band_code: sample_index} for multi-band COGs.
  url_rewrite_patterns : dict, default None
      {source_prefix: target_prefix} for URL rewriting during assets construction.
  filesystem : pyarrow.fs.FileSystem, default None
      PyArrow filesystem for reading remote URIs (e.g. S3FileSystem(anonymous=True)).
  columns : list of str, default None
      Scan-time column projection.
  filter_expr : pyarrow.dataset.Expression, default None
      Scan-time predicate pushdown.
  enrich_cog : bool, default False
      Parse COG headers and add per-band metadata columns.
  band_codes : list of str, default None
      Bands to enrich. Defaults to all bands in assets.
  cloud_config : CloudConfig, default None
      Cloud configuration for URL rewriting.
  max_concurrent : int, default 300
      Maximum concurrent HTTP connections for COG header parsing.
  force : bool, default False
      Rebuild even if a cached collection already exists at the resolved workspace path.
  backend : StorageBackend, default None
      I/O backend for authenticated range reads during COG header parsing. See create_backend().

Returns:

  Collection
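For example, normalising a hypothetical drone-capture table whose columns do not yet match the Rasteret contract (the URI and every column name below are illustrative):

```python
import rasteret

col = rasteret.build_from_table(
    "s3://my-bucket/drone-captures/",   # hypothetical Parquet dataset URI
    name="drone-2024",
    data_source="drone/ortho",
    # Alias source columns onto the contract names (id, datetime, ...).
    column_map={"capture_id": "id", "captured_at": "datetime"},
    # No assets struct in the source: build it from the URL column.
    href_column="cog_url",
    band_index_map={"red": 0, "green": 1, "blue": 2},
    enrich_cog=True,  # parse COG headers into {band}_metadata columns
)
```

Because name is given without workspace_dir, the result is persisted under ~/rasteret_workspace/drone-2024_records/.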
Source code in src/rasteret/__init__.py
def build_from_table(
    path: "str | Path | pa.Table | pads.Dataset",
    *,
    name: str = "",
    data_source: str = "",
    workspace_dir: str | Path | None = None,
    column_map: dict[str, str] | None = None,
    href_column: str | None = None,
    band_index_map: dict[str, int] | None = None,
    url_rewrite_patterns: dict[str, str] | None = None,
    filesystem: Any | None = None,
    columns: list[str] | None = None,
    filter_expr: Any | None = None,
    enrich_cog: bool = False,
    band_codes: list[str] | None = None,
    cloud_config: Any = None,
    max_concurrent: int = 300,
    force: bool = False,
    backend: "StorageBackend | None" = None,
) -> "Collection":
    """Build a Collection from an external Parquet/GeoParquet record table.

    A record table is a Parquet dataset where each row is a raster item
    (satellite scene, drone image, derived product, etc.) with at minimum
    ``id``, ``datetime``, ``geometry``, ``assets``, or columns that can
    be normalised into them via ``column_map`` and ``href_column``.

    This is the heavy ingest path: it can normalize schema and optionally
    enrich COG headers. For in-memory tables that are already read-ready,
    use :func:`as_collection`.

    When ``enrich_cog=True``, COG headers are parsed from the asset URLs
    and cached as ``{band}_metadata`` struct columns in the Parquet index,
    enabling fast tiled reads and TorchGeo integration.

    When *name* is provided and *workspace_dir* is omitted, the collection
    is persisted to ``~/rasteret_workspace/{name}_records/`` so that it is
    discoverable via :meth:`Collection.list_collections` and the CLI.

    Parameters
    ----------
    path : str, Path, or pyarrow object
        Path/URI to a Parquet/GeoParquet file or dataset directory, **or**
        an in-memory Arrow object (``pyarrow.Table``, ``pyarrow.dataset.Dataset``,
        ``pyarrow.RecordBatch``, ``pyarrow.RecordBatchReader``, or an object
        implementing the Arrow PyCapsule protocol).
        Hugging Face dataset URIs are supported via
        ``hf://datasets/<org>/<name>[/subpath]``.
    name : str
        Optional collection name.  When given without *workspace_dir*,
        the collection is cached in the default workspace.
    data_source : str
        Data source identifier for band mapping and URL policy.
    workspace_dir : str or Path, optional
        Persist the collection as partitioned Parquet at this path.
        Defaults to ``~/rasteret_workspace/{name}_records/`` when
        *name* is provided.
    column_map : dict, optional
        ``{source_name: contract_name}`` alias map.  Source columns are
        preserved; contract-name columns are added as zero-copy aliases.
    href_column : str, optional
        Column containing COG URLs.  When set and ``assets`` is absent
        after aliasing, the normalisation layer constructs the ``assets``
        struct from this column and ``band_index_map``.
    band_index_map : dict, optional
        ``{band_code: sample_index}`` for multi-band COGs.
    url_rewrite_patterns : dict, optional
        ``{source_prefix: target_prefix}`` for URL rewriting during
        assets construction.
    filesystem : pyarrow.fs.FileSystem, optional
        PyArrow filesystem for reading remote URIs (e.g.
        ``S3FileSystem(anonymous=True)``).
    columns : list of str, optional
        Scan-time column projection.
    filter_expr : pyarrow.dataset.Expression, optional
        Scan-time predicate pushdown.
    enrich_cog : bool
        Parse COG headers and add per-band metadata columns.
    band_codes : list of str, optional
        Bands to enrich.  Defaults to all bands in ``assets``.
    cloud_config : CloudConfig, optional
        Cloud configuration for URL rewriting.
    max_concurrent : int
        Maximum concurrent HTTP connections for COG header parsing.
    force : bool
        Rebuild even if a cached collection already exists at the
        resolved workspace path.
    backend : StorageBackend, optional
        I/O backend for authenticated range reads during COG header
        parsing.  See :func:`create_backend`.

    Returns
    -------
    Collection
    """
    # Resolve workspace path with _records suffix convention so that
    # list_collections() and the CLI can discover this collection.
    from rasteret.core.collection import Collection, _is_cloud_uri
    from rasteret.ingest.normalize import build_collection_from_table
    from rasteret.ingest.parquet_record_table import (
        RecordTableBuilder,
        _apply_column_map_aliases,
        prepare_record_table,
    )

    resolved_workspace: str | Path | None = workspace_dir
    if workspace_dir is not None:
        ws_str = str(workspace_dir)
        if _is_cloud_uri(ws_str):
            # Cloud URIs: use string manipulation to avoid Path mangling.
            stem = ws_str.rstrip("/").rsplit("/", 1)[-1]
            if not stem.endswith(("_stac", "_records")):
                resolved_workspace = (
                    f"{ws_str.rstrip('/')}/{name}_records" if name else ws_str
                )
        else:
            ws = Path(workspace_dir)
            if not ws.name.endswith(("_stac", "_records")):
                resolved_workspace = ws / f"{name}_records" if name else ws
    elif name:
        resolved_workspace = Path.home() / "rasteret_workspace" / f"{name}_records"

    # Cache hit: reuse existing collection.
    if resolved_workspace is not None:
        rw_str = str(resolved_workspace)
        if not _is_cloud_uri(rw_str) and Path(rw_str).exists() and not force:
            return Collection._load_cached(rw_str)

    # Arrow-native path: accept in-memory Arrow tables, datasets, readers, and
    # PyCapsule protocol objects.
    if not isinstance(path, (str, Path)):
        table = _arrow_object_to_table(
            path,
            columns=columns,
            filter_expr=filter_expr,
        )
        table = _apply_column_map_aliases(table, column_map)
        table = prepare_record_table(
            table,
            href_column=href_column,
            band_index_map=band_index_map,
            url_rewrite_patterns=url_rewrite_patterns,
        )

        if enrich_cog:
            from rasteret.core.utils import run_sync
            from rasteret.ingest.enrich import (
                build_url_index_from_assets,
                enrich_table_with_cog_metadata,
            )

            url_index = build_url_index_from_assets(table, band_codes)
            resolved_band_codes = band_codes or sorted(
                {band for bands in url_index.values() for band in bands}
            )
            if not url_index or not resolved_band_codes:
                logger.warning("No asset URLs found for COG enrichment")
            else:
                table = run_sync(
                    enrich_table_with_cog_metadata(
                        table,
                        url_index,
                        resolved_band_codes,
                        max_concurrent=max_concurrent,
                        backend=backend,
                    )
                )

        return build_collection_from_table(
            table,
            name=name or "record_table",
            data_source=data_source,
            workspace_dir=resolved_workspace,
        )

    builder = RecordTableBuilder(
        path,
        data_source=data_source,
        workspace_dir=resolved_workspace,
        column_map=column_map,
        href_column=href_column,
        band_index_map=band_index_map,
        url_rewrite_patterns=url_rewrite_patterns,
        filesystem=filesystem,
        columns=columns,
        filter_expr=filter_expr,
        enrich_cog=enrich_cog,
        band_codes=band_codes,
        max_concurrent=max_concurrent,
        backend=backend,
    )
    return builder.build(name=name, workspace_dir=resolved_workspace)
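
The workspace-resolution convention above (append `{name}_records` unless the path already ends in `_stac` or `_records`, falling back to `~/rasteret_workspace/` when only a name is given) can be sketched as a standalone helper. `resolve_workspace` and `_is_cloud_uri` below are illustrative stand-ins, not part of the rasteret API:

```python
from pathlib import Path


def _is_cloud_uri(value: str) -> bool:
    # Illustrative stand-in for rasteret's internal cloud-URI check.
    return value.startswith(("s3://", "gs://", "az://", "https://"))


def resolve_workspace(workspace_dir, name):
    """Mirror the `{name}_records` suffix convention shown above."""
    if workspace_dir is None:
        # No workspace: cache under the default home workspace when named.
        return (Path.home() / "rasteret_workspace" / f"{name}_records") if name else None
    ws_str = str(workspace_dir)
    if _is_cloud_uri(ws_str):
        # Cloud URIs: string manipulation only, to avoid Path mangling.
        if ws_str.rstrip("/").rsplit("/", 1)[-1].endswith(("_stac", "_records")):
            return workspace_dir
        return f"{ws_str.rstrip('/')}/{name}_records" if name else workspace_dir
    ws = Path(workspace_dir)
    if ws.name.endswith(("_stac", "_records")):
        return workspace_dir
    return ws / f"{name}_records" if name else ws
```

A path that already carries the suffix is passed through unchanged, so re-running a build against a previously resolved workspace hits the cache instead of nesting directories.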

as_collection

as_collection(
    table: Any,
    *,
    name: str = "",
    data_source: str = "",
    description: str = "",
    start_date: datetime | None = None,
    end_date: datetime | None = None,
    require_band_metadata: bool = True,
) -> "Collection"

Wrap a read-ready Arrow object as a Collection.

This is the lightweight re-entry path for workflows where you already have a table derived from an existing Collection and want to keep using Rasteret reads without re-running ingest/enrichment.

Unlike :func:build_from_table, this function performs no COG enrichment, normalization, or persistence. It validates the read contract and wraps the provided Arrow object as-is.

Use :func:build_from_table for first-time external Parquet ingest.

Parameters:

- `table` (Arrow-compatible object, required) - Read-ready Arrow object to wrap. Supports `pyarrow.dataset.Dataset` (preferred for lazy scans), `pyarrow.Table`, `pyarrow.RecordBatch`, `pyarrow.RecordBatchReader`, and Arrow-compatible Python objects implementing the standard PyCapsule interchange protocol. Only `pyarrow.dataset.Dataset` preserves an already-lazy dataset view; other Arrow-native inputs may be wrapped as in-memory datasets.
- `name` (str, default `''`) - Optional collection name.
- `data_source` (str, default `''`) - Optional data source identifier. If omitted, Rasteret attempts to infer it from schema metadata or the `collection` column.
- `description` (str, default `''`) - Optional collection description.
- `start_date`, `end_date` (datetime, default `None`) - Optional temporal bounds to attach to the Collection object.
- `require_band_metadata` (bool, default `True`) - When `True`, require at least one `*_metadata` column and validate that those columns are struct-typed with the required COG metadata fields.

Returns:

- `Collection` - A wrapped Collection ready for `get_numpy()`, `get_xarray()`, and `to_torchgeo_dataset()` when the necessary band metadata columns are present.

Source code in src/rasteret/__init__.py
def as_collection(
    table: Any,
    *,
    name: str = "",
    data_source: str = "",
    description: str = "",
    start_date: datetime | None = None,
    end_date: datetime | None = None,
    require_band_metadata: bool = True,
) -> "Collection":
    """Wrap a read-ready Arrow object as a Collection.

    This is the lightweight re-entry path for workflows where you already
    have a table derived from an existing Collection and want to keep using
    Rasteret reads without re-running ingest/enrichment.

    Unlike :func:`build_from_table`, this function performs **no** COG
    enrichment, normalization, or persistence. It validates the read contract
    and wraps the provided Arrow object as-is.

    Use :func:`build_from_table` for first-time external Parquet ingest.

    Parameters
    ----------
    table : Arrow-compatible object
        Read-ready Arrow object to wrap. Supports
        ``pyarrow.dataset.Dataset`` (preferred for lazy scans),
        ``pyarrow.Table``, ``pyarrow.RecordBatch``,
        ``pyarrow.RecordBatchReader``, and Arrow-compatible Python objects
        implementing the standard PyCapsule interchange protocol.
        Only ``pyarrow.dataset.Dataset`` preserves an already-lazy dataset
        view. Other Arrow-native inputs may be wrapped as in-memory datasets.
    name : str
        Optional collection name.
    data_source : str
        Optional data source identifier. If omitted, Rasteret attempts to infer
        it from schema metadata or the ``collection`` column.
    description : str
        Optional collection description.
    start_date, end_date : datetime, optional
        Optional temporal bounds to attach to the Collection object.
    require_band_metadata : bool
        When ``True`` (default), require at least one ``*_metadata`` column and
        validate those columns are struct-typed with required COG metadata
        fields.

    Returns
    -------
    Collection
        A wrapped Collection ready for ``get_numpy()``, ``get_xarray()``, and
        ``to_torchgeo_dataset()`` when the necessary band metadata columns are
        present.
    """
    import pyarrow as pa

    from rasteret.core.collection import Collection
    from rasteret.core.utils import infer_data_source_from_dataset

    dataset, materialized_table = _arrow_object_to_dataset(table)

    if materialized_table is not None:
        global _AS_COLLECTION_MEMORY_WARNING_EMITTED
        table_bytes = int(getattr(materialized_table, "nbytes", 0) or 0)
        total_ram = _total_ram_bytes()
        warn_large_absolute = table_bytes >= 2 * 1024**3
        warn_large_ratio = (
            total_ram is not None
            and total_ram > 0
            and (table_bytes / total_ram) >= 0.40
        )
        if (
            warn_large_absolute or warn_large_ratio
        ) and not _AS_COLLECTION_MEMORY_WARNING_EMITTED:
            ram_text = (
                f"{(table_bytes / total_ram):.0%} of system RAM"
                if total_ram
                else "a large fraction of system memory"
            )
            warnings.warn(
                "as_collection() received a large in-memory pyarrow.Table "
                f"({table_bytes / (1024**3):.2f} GiB, ~{ram_text}). For large "
                "workloads, prefer a lazy pyarrow.dataset.Dataset or persist to "
                "Parquet and use rasteret.load(...).",
                UserWarning,
                stacklevel=2,
            )
            _AS_COLLECTION_MEMORY_WARNING_EMITTED = True
    schema_names = set(dataset.schema.names)

    required = {"id", "datetime", "geometry", "assets", "bbox"}
    missing = required - schema_names
    if missing:
        raise ValueError(
            f"Table is missing required columns for as_collection: {sorted(missing)}. "
            "Use build_from_table(...) for external tables that still need "
            "normalization."
        )

    if require_band_metadata:
        metadata_columns = [
            column for column in dataset.schema.names if column.endswith("_metadata")
        ]
        if not metadata_columns:
            raise ValueError(
                "No '*_metadata' columns found. as_collection() expects a "
                "read-ready table. Use build_from_table(..., enrich_cog=True) "
                "for first-time external ingest, or pass "
                "require_band_metadata=False for metadata-only workflows."
            )

        required_meta_fields = {
            "tile_offsets",
            "tile_byte_counts",
            "tile_width",
            "tile_height",
            "dtype",
            "transform",
        }
        for column in metadata_columns:
            field_type = dataset.schema.field(column).type
            if not pa.types.is_struct(field_type):
                raise ValueError(
                    f"Column '{column}' must be a struct for as_collection()."
                )
            missing_fields = required_meta_fields - set(field_type.names)
            if missing_fields:
                raise ValueError(
                    f"Column '{column}' is missing required metadata fields: "
                    f"{sorted(missing_fields)}."
                )

    if start_date is not None and not isinstance(start_date, datetime):
        raise TypeError("start_date must be a datetime when provided.")
    if end_date is not None and not isinstance(end_date, datetime):
        raise TypeError("end_date must be a datetime when provided.")

    resolved_source = data_source or infer_data_source_from_dataset(dataset)

    collection = Collection(
        dataset=dataset,
        name=name,
        description=description,
        data_source=resolved_source,
        start_date=start_date,
        end_date=end_date,
    )

    return collection

create_backend

create_backend(
    credential_provider: Any = None,
    cloud_config: Any = None,
    region: str | None = None,
    default_s3_config: dict[str, str] | None = None,
) -> Any

Create an I/O backend for authenticated cloud reads.

Pass the result as backend= to :meth:~rasteret.core.collection.Collection.get_xarray or :meth:~rasteret.core.collection.Collection.get_gdf.

Parameters:

- `credential_provider` (object, default `None`) - An obstore credential provider, e.g. `PlanetaryComputerCredentialProvider`, `NasaEarthdataCredentialProvider`.
- `cloud_config` (CloudConfig, default `None`) - Cloud configuration for S3 URL rewriting and per-bucket overrides.
- `region` (str, default `None`) - Convenience alias for `default_s3_config={"region": region}`.
- `default_s3_config` (dict, default `None`) - Default S3Store config applied to all buckets that don't have per-bucket overrides (e.g. `{"region": "us-west-2"}`).

Examples:

>>> from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider
>>> pc_asset_url = "https://naipeuwest.blob.core.windows.net/naip/v002/"
>>> backend = rasteret.create_backend(
...     credential_provider=PlanetaryComputerCredentialProvider(pc_asset_url)
... )
>>> ds = collection.get_xarray(geometries=aoi, bands=["B04"], backend=backend)
Source code in src/rasteret/__init__.py
def create_backend(
    credential_provider: Any = None,
    cloud_config: Any = None,
    region: str | None = None,
    default_s3_config: dict[str, str] | None = None,
) -> Any:
    """Create an I/O backend for authenticated cloud reads.

    Pass the result as ``backend=`` to
    :meth:`~rasteret.core.collection.Collection.get_xarray` or
    :meth:`~rasteret.core.collection.Collection.get_gdf`.

    Parameters
    ----------
    credential_provider : object, optional
        An obstore credential provider, e.g.
        ``PlanetaryComputerCredentialProvider``,
        ``NasaEarthdataCredentialProvider``.
    cloud_config : CloudConfig, optional
        Cloud configuration for S3 URL rewriting and per-bucket overrides.
    region : str, optional
        Convenience alias for ``default_s3_config={"region": region}``.
    default_s3_config : dict, optional
        Default S3Store config applied to all buckets that don't have
        per-bucket overrides (e.g. ``{"region": "us-west-2"}``).

    Examples
    --------
    >>> from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider
    >>> pc_asset_url = "https://naipeuwest.blob.core.windows.net/naip/v002/"
    >>> backend = rasteret.create_backend(
    ...     credential_provider=PlanetaryComputerCredentialProvider(pc_asset_url)
    ... )
    >>> ds = collection.get_xarray(geometries=aoi, bands=["B04"], backend=backend)
    """
    from rasteret.cloud import s3_overrides_from_config
    from rasteret.fetch.cog import _create_obstore_backend

    resolved_default_s3_config = default_s3_config
    if resolved_default_s3_config is None and region is not None:
        resolved_default_s3_config = {"region": region}

    s3_overrides = s3_overrides_from_config(cloud_config) if cloud_config else None
    url_patterns = cloud_config.url_patterns if cloud_config else None
    return _create_obstore_backend(
        s3_overrides=s3_overrides,
        credential_provider=credential_provider,
        default_s3_config=resolved_default_s3_config,
        url_patterns=url_patterns,
    )
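
As the code shows, `region` is only sugar: an explicit `default_s3_config` takes precedence, and the alias is applied only when no dict was given. A minimal sketch of that precedence (`resolve_s3_config` is an illustrative name):

```python
def resolve_s3_config(region=None, default_s3_config=None):
    """Mirror create_backend(): an explicit default_s3_config wins over
    the region convenience alias; otherwise region expands to a dict."""
    if default_s3_config is None and region is not None:
        return {"region": region}
    return default_s3_config
```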

load

load(path: str | Path, name: str = '') -> 'Collection'

Load a persisted Rasteret Collection artifact from Parquet.

Use this to reopen a collection previously written by :func:build, :func:build_from_stac, :func:build_from_table, or :meth:rasteret.core.collection.Collection.export. If you already have a read-ready Arrow table/dataset in memory, use :func:as_collection instead.

Parameters:

- `path` (str or Path, required) - Path to the Parquet file or dataset directory. Supports local/cloud Parquet paths and Hugging Face dataset URIs in the form `hf://datasets/<org>/<name>[/subpath]`.
- `name` (str, default `''`) - Optional name override.

Returns:

- `Collection`
Source code in src/rasteret/__init__.py
def load(path: str | Path, name: str = "") -> "Collection":
    """Load a persisted Rasteret Collection artifact from Parquet.

    Use this to reopen a collection previously written by
    :func:`build`, :func:`build_from_stac`, :func:`build_from_table`,
    or :meth:`rasteret.core.collection.Collection.export`.
    If you already have a read-ready Arrow table/dataset in memory,
    use :func:`as_collection` instead.

    Parameters
    ----------
    path : str or Path
        Path to the Parquet file or dataset directory. Supports local/cloud
        Parquet paths and Hugging Face dataset URIs in the form
        ``hf://datasets/<org>/<name>[/subpath]``.
    name : str
        Optional name override.

    Returns
    -------
    Collection
    """
    from rasteret.catalog import DatasetRegistry
    from rasteret.core.collection import Collection
    from rasteret.integrations.huggingface import is_hf_dataset_uri

    path_str = str(path)
    descriptor = DatasetRegistry.get(path_str)
    if descriptor is not None and descriptor.collection_uri:
        record_index_filesystem = None
        record_index_url_rewrite_patterns = None
        runtime_index_uri = descriptor.runtime_index_uri
        if runtime_index_uri:
            if descriptor.cloud_config:
                record_index_url_rewrite_patterns = descriptor.cloud_config.get(
                    "url_patterns"
                )
            if runtime_index_uri.startswith("s3://") and not descriptor.requires_auth:
                try:
                    import pyarrow.fs as pafs

                    cloud_region = "us-west-2"
                    if descriptor.cloud_config:
                        cloud_region = descriptor.cloud_config.get(
                            "region", cloud_region
                        )
                    record_index_filesystem = pafs.S3FileSystem(
                        anonymous=True,
                        region=cloud_region,
                    )
                except Exception:
                    record_index_filesystem = None
        return Collection.from_parquet(
            descriptor.collection_uri,
            name=name or descriptor.name,
            data_source=descriptor.id,
            defer_dataset_open=bool(
                runtime_index_uri
                and runtime_index_uri != descriptor.collection_uri
                and not is_hf_dataset_uri(descriptor.collection_uri)
            ),
            record_index_path=(
                runtime_index_uri
                if runtime_index_uri and runtime_index_uri != descriptor.collection_uri
                else None
            ),
            record_index_field_roles=descriptor.field_roles,
            record_index_column_map=descriptor.column_map,
            record_index_href_column=descriptor.source_field("href")
            if descriptor.source_field("href") != "href"
            else descriptor.href_column,
            record_index_band_index_map=descriptor.band_index_map,
            record_index_url_rewrite_patterns=record_index_url_rewrite_patterns,
            record_index_filesystem=record_index_filesystem,
            surface_fields=descriptor.surface_fields,
            filter_capabilities=descriptor.filter_capabilities,
        )

    return Collection.from_parquet(path, name=name)
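
For reference, the `hf://datasets/<org>/<name>[/subpath]` URI shape accepted by `load()` splits cleanly with standard-library string handling. `parse_hf_uri` below is an illustrative helper for validating such URIs, not rasteret's own parser:

```python
def parse_hf_uri(uri: str):
    """Split hf://datasets/<org>/<name>[/subpath] into (org, name, subpath)."""
    prefix = "hf://datasets/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Hugging Face dataset URI: {uri!r}")
    org, _, rest = uri[len(prefix):].partition("/")
    name, _, subpath = rest.partition("/")
    if not org or not name:
        raise ValueError(f"expected hf://datasets/<org>/<name>: {uri!r}")
    return org, name, subpath or None
```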

register

register(descriptor: 'DatasetDescriptor') -> None

Register a dataset descriptor in the global registry.

Parameters:

- `descriptor` (DatasetDescriptor, required) - The descriptor to register. See :class:rasteret.catalog.DatasetDescriptor.
Source code in src/rasteret/__init__.py
def register(descriptor: "DatasetDescriptor") -> None:
    """Register a dataset descriptor in the global registry.

    Parameters
    ----------
    descriptor : DatasetDescriptor
        The descriptor to register.  See :class:`rasteret.catalog.DatasetDescriptor`.
    """
    from rasteret.catalog import DatasetRegistry

    DatasetRegistry.register(descriptor)

register_local

register_local(
    dataset_id: str,
    path: str | Path,
    *,
    name: str | None = None,
    description: str = "",
    data_source: str = "",
    persist: bool = True,
    registry_path: str | Path | None = None,
) -> "DatasetDescriptor"

Register a local Parquet collection as a dataset descriptor.

This is useful when you want local/shared Collection Parquet artifacts to appear in DatasetRegistry and CLI dataset commands.

Parameters:

- `dataset_id` (str, required) - Descriptor id (e.g. `"local/my-collection"`).
- `path` (str or Path, required) - Path to a local Collection Parquet file or directory.
- `name` (str, default `None`) - Human-readable dataset name. Defaults to the loaded collection name.
- `description` (str, default `''`) - Optional one-line description.
- `data_source` (str, default `''`) - Optional data source id. If omitted, inferred from collection metadata.
- `persist` (bool, default `True`) - When `True`, save the descriptor to the local registry JSON so it is auto-loaded in future sessions.
- `registry_path` (str or Path, default `None`) - Override the local registry JSON path (defaults to `~/.rasteret/datasets.local.json` or `RASTERET_LOCAL_DATASETS_PATH`).

Returns:

- `DatasetDescriptor`
Source code in src/rasteret/__init__.py
def register_local(
    dataset_id: str,
    path: str | Path,
    *,
    name: str | None = None,
    description: str = "",
    data_source: str = "",
    persist: bool = True,
    registry_path: str | Path | None = None,
) -> "DatasetDescriptor":
    """Register a local Parquet collection as a dataset descriptor.

    This is useful when you want local/shared Collection Parquet artifacts
    to appear in ``DatasetRegistry`` and CLI dataset commands.

    Parameters
    ----------
    dataset_id : str
        Descriptor id (e.g. ``"local/my-collection"``).
    path : str or Path
        Path to a local Collection Parquet file or directory.
    name : str, optional
        Human-readable dataset name. Defaults to the loaded collection name.
    description : str
        Optional one-line description.
    data_source : str
        Optional data source id. If omitted, inferred from collection metadata.
    persist : bool
        When ``True`` (default), save descriptor to local registry JSON so it
        is auto-loaded in future sessions.
    registry_path : str or Path, optional
        Override local registry JSON path (defaults to
        ``~/.rasteret/datasets.local.json`` or ``RASTERET_LOCAL_DATASETS_PATH``).

    Returns
    -------
    DatasetDescriptor
    """
    from rasteret.catalog import (
        DatasetDescriptor,
        DatasetRegistry,
        save_local_descriptor,
    )
    from rasteret.constants import BandRegistry
    from rasteret.core.utils import infer_data_source

    local_path = Path(path).expanduser()
    collection = load(local_path)
    resolved_source = data_source or infer_data_source(collection)
    temporal_range: tuple[str, str] | None = None

    if collection.dataset is not None and "datetime" in collection.dataset.schema.names:
        values = collection.dataset.to_table(columns=["datetime"]).column("datetime")
        datetimes = [value for value in values.to_pylist() if value is not None]
        if datetimes:
            start = min(datetimes).date().isoformat()
            end = max(datetimes).date().isoformat()
            temporal_range = (start, end)

    band_map = BandRegistry.get(resolved_source) if resolved_source else {}
    descriptor = DatasetDescriptor(
        id=dataset_id,
        name=name or collection.name or dataset_id,
        description=description or collection.description,
        collection_uri=str(local_path),
        stac_collection=resolved_source or None,
        band_map=band_map or None,
        separate_files=True,
        spatial_coverage="local",
        temporal_range=temporal_range,
        requires_auth=False,
    )
    DatasetRegistry.register(descriptor)
    if persist:
        save_local_descriptor(descriptor, registry_path)
    return descriptor
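
The temporal-range derivation above reduces the `datetime` column to ISO-date bounds, skipping nulls. The same reduction on plain Python datetimes (a sketch of the logic, not the pyarrow code path; `temporal_range` is an illustrative name):

```python
from datetime import datetime


def temporal_range(datetimes):
    """Return (start_iso_date, end_iso_date) ignoring None entries,
    or None when no timestamps remain."""
    values = [value for value in datetimes if value is not None]
    if not values:
        return None
    return min(values).date().isoformat(), max(values).date().isoformat()
```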

version

version() -> str

Return the installed rasteret package version.

Source code in src/rasteret/__init__.py
def version() -> str:
    """Return the installed rasteret package version."""
    try:
        return get_version("rasteret")
    except PackageNotFoundError:
        return "0.0.0+local"