Custom Cloud Provider & Authentication

Do you need this page?

Most datasets work without any authentication or configuration. Rasteret's obstore backend routes URLs to native cloud stores (S3, Azure Blob, GCS) automatically, and public data like Sentinel-2 on Earth Search is read anonymously.

Data source | Auth needed | What to do
Public S3 (Earth Search Sentinel-2, etc.) | None | Just call build(); no configuration needed
Requester-pays S3 (Landsat, NAIP) | Standard AWS credentials | Install rasteret[aws] and set AWS credentials (env vars or ~/.aws/credentials)
Planetary Computer (pc/*) | Built-in: works via build() | Install rasteret[azure] for SAS signing. For long-lived caches, see create_backend() below.
Earthdata / DAAC endpoints (temporary S3 credentials) | Advanced | Install rasteret[earthdata] and create a backend with an Earthdata credential provider (details below)
Private buckets with custom URL patterns | CloudConfig.register() | Your CDN remaps URLs, or you need requester-pays signing; see below

If your data is on Earth Search or any public S3/GCS/HTTP endpoint, stop here and go back to Getting Started.


This guide covers three layers, from simplest to most advanced:

  1. Requester-pays / URL rewriting - CloudConfig (URL patterns + auto-resolved credentials)
  2. Dynamic credentials - create_backend() with obstore credential providers
  3. Custom I/O - implement StorageBackend (or wrap a store) and pass backend=

Register a cloud config

from rasteret import CloudConfig

CloudConfig.register(
    "my-private-collection",
    CloudConfig(
        provider="aws",
        requester_pays=True,
        region="eu-central-1",
        url_patterns={
            "https://my-cdn.example.com/": "s3://my-private-bucket/",
        },
    ),
)

The url_patterns dict maps HTTP URL prefixes to S3 URL prefixes. When Rasteret encounters a COG URL starting with the HTTP pattern, it rewrites it to the S3 pattern for authenticated access.
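
To make the prefix rewrite concrete, here is a minimal sketch of the idea in plain Python. It is not Rasteret's internal implementation: the rewrite_url helper is hypothetical, and preferring the longest matching prefix is an assumption made for the illustration.

```python
def rewrite_url(url: str, url_patterns: dict[str, str]) -> str:
    """Swap a matching HTTP prefix for its mapped S3 prefix (hypothetical helper)."""
    # Try longer prefixes first so a more specific pattern wins (an assumption).
    for http_prefix in sorted(url_patterns, key=len, reverse=True):
        if url.startswith(http_prefix):
            return url_patterns[http_prefix] + url[len(http_prefix):]
    return url  # no pattern matched: leave the URL untouched


patterns = {"https://my-cdn.example.com/": "s3://my-private-bucket/"}
print(rewrite_url("https://my-cdn.example.com/tiles/2024/B04.tif", patterns))
# s3://my-private-bucket/tiles/2024/B04.tif
```

URLs that match no pattern pass through unchanged, which is why public assets in the same collection keep working.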

Use the config

Once registered, the config is picked up automatically when the Collection's data_source matches the registered key:

import rasteret

collection = rasteret.build_from_stac(
    name="private-data",
    stac_api="https://my-stac.example.com/v1",
    collection="my-private-collection",
    # Optional: namespace conventions to avoid collisions across providers.
    # data_source="acme/my-private-collection",
    bbox=(-0.2, 51.4, 0.2, 51.7),
    date_range=("2024-01-01", "2024-06-30"),
)

Or pass the config explicitly via cloud_config when you need requester-pays or URL rewriting:

from rasteret import CloudConfig

config = CloudConfig.get_config("my-private-collection")

ds = collection.get_xarray(
    geometries=(-0.1, 51.45, 0.1, 51.65),  # bbox tuple
    bands=["B04", "B08"],
    cloud_config=config,
)

Rasteret auto-creates an obstore backend from the config at read time. For requester-pays buckets, AWS credentials are resolved automatically from environment variables or ~/.aws/credentials via boto3.
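
If a requester-pays read fails with an authorization error, it can help to confirm that static credentials are present in one of the two locations mentioned above. The sketch below uses only the standard library; the aws_credential_source helper is hypothetical and is not part of Rasteret or boto3. boto3 itself also consults other sources (for example named profiles and instance roles) that this check ignores.

```python
import configparser
import os
from pathlib import Path


def aws_credential_source(env=None, credentials_path=None, profile="default"):
    """Report where static AWS credentials were found, or None (hypothetical helper)."""
    env = os.environ if env is None else env
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"

    path = Path(credentials_path or Path.home() / ".aws" / "credentials")
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is skipped silently
    if parser.has_option(profile, "aws_access_key_id"):
        return f"{path} [{profile}]"
    return None


print(aws_credential_source())
```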

Built-in configs

Rasteret ships with a few pre-registered configs for common sources, and the catalog also registers per-dataset configs when a DatasetDescriptor includes cloud_config.

The key thing: configs are looked up by Collection data_source. When in doubt, print collection.data_source and register/lookup under that value.

Check what's registered:

CloudConfig.get_config("sentinel-2-l2a")
# CloudConfig(provider='aws', requester_pays=False, region='us-west-2', ...)

For catalog datasets, the data_source is the catalog ID (e.g. earthsearch/landsat-c2-l2, earthsearch/naip), so those are the keys you should use with CloudConfig.get_config(...) if you want to inspect or override built-in behavior.

Multi-cloud obstore backend

Rasteret uses obstore for all remote reads and natively routes URLs to the correct cloud store:

URL pattern | Store type
s3://bucket/... | S3Store
*.s3.*.amazonaws.com/... | S3Store
gs://bucket/... | GCSStore
storage.googleapis.com/bucket/... | GCSStore
*.blob.core.windows.net/container/... | AzureStore
Pre-signed / SAS-signed URLs (query params) | HTTPStore
Other HTTPS | HTTPStore

This happens automatically; public data needs no configuration.
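
The routing table can be approximated with a small classifier, shown here as a sketch to help predict which store a given URL will use. The guess_store_type function and the SIGNATURE_KEYS set are hypothetical; how Rasteret actually detects signed URLs is not documented on this page, so the query-key check is an assumption.

```python
from urllib.parse import parse_qs, urlsplit

# Query keys treated as evidence of a signed URL (illustrative, not exhaustive).
SIGNATURE_KEYS = {"sig", "X-Amz-Signature", "Signature"}


def guess_store_type(url: str) -> str:
    """Mirror the routing table above for a single URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme == "s3":
        return "S3Store"
    if parts.scheme == "gs":
        return "GCSStore"
    # Signed URLs carry their auth in the query string, so they are read over plain HTTP.
    if SIGNATURE_KEYS.intersection(parse_qs(parts.query)):
        return "HTTPStore"
    if ".s3." in host and host.endswith(".amazonaws.com"):
        return "S3Store"
    if host == "storage.googleapis.com":
        return "GCSStore"
    if host.endswith(".blob.core.windows.net"):
        return "AzureStore"
    return "HTTPStore"


print(guess_store_type("https://naipeuwest.blob.core.windows.net/naip/v002/a.tif"))
# AzureStore
```

The signature check runs before the host checks because, per the table, a SAS-signed Azure URL is routed to HTTPStore even though its host is *.blob.core.windows.net.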

Authenticated cloud reads

Use create_backend() when your data source provides its own credential mechanism (Planetary Computer SAS tokens, Earthdata-style temporary S3 credentials). It passes the credential provider directly to the native obstore store (S3Store, AzureStore, or GCSStore).

Planetary Computer

Built-in pc/* datasets work via build() when rasteret[azure] is installed. Rasteret signs STAC assets during the build so COG header enrichment can read bytes from Azure.

Rate limits

Planetary Computer has two different network surfaces:

  • SAS signing (calling the Planetary Computer API to obtain short-lived SAS URLs) is rate-limited and can return HTTP 429.
  • COG reads (range requests to Azure Blob URLs that already include SAS tokens) go directly to Azure Blob Storage, and do not depend on the signing API.

If you hit signing rate limits, reduce query size (e.g. query={"max_items": 1}), retry later, or configure a subscription key via PC_SDK_SUBSCRIPTION_KEY (or planetarycomputer configure) for less restrictive rate limits.

Separately, Azure Blob Storage itself can throttle very high request rates (e.g. 429/503). If you see those while reading tiles/headers, lower max_concurrent and retry.
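
The "retry later" advice can be automated with exponential backoff. The helper below is a generic sketch, not a Rasteret feature: retry_with_backoff and its parameters are hypothetical, and because this page does not say which exception type Rasteret raises on HTTP 429/503, the caller supplies an is_throttled predicate.

```python
import random
import time


def retry_with_backoff(fn, is_throttled, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff while is_throttled(exc) is true."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not throttling.
            if attempt == attempts - 1 or not is_throttled(exc):
                raise
            # Wait 1s, 2s, 4s, ... plus up to 25% jitter so retries do not synchronize.
            delay = base_delay * 2**attempt
            sleep(delay + random.uniform(0, delay * 0.25))
```

A build or read call can then be wrapped in a lambda and passed as fn, with a predicate such as lambda exc: "429" in str(exc) as a rough starting point.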

If you want long-lived cached Collections (without embedding SAS tokens in the Parquet index), create a backend and pass it to both build() and reads:

import rasteret
from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider

# URL of the storage account/container that the provider requests SAS tokens for.
pc_asset_url = "https://naipeuwest.blob.core.windows.net/naip/v002/"
backend = rasteret.create_backend(
    credential_provider=PlanetaryComputerCredentialProvider(pc_asset_url)
)
collection = rasteret.build(
    "pc/sentinel-2-l2a",
    name="pc-s2",
    bbox=(-122.45, 37.74, -122.35, 37.84),
    date_range=("2024-06-01", "2024-07-15"),
    backend=backend,
)
ds = collection.get_xarray(
    geometries=(-122.45, 37.74, -122.35, 37.84),
    bands=["B04"],
    backend=backend,
)

Earthdata (temporary S3 credentials)

import rasteret
from obstore.auth.earthdata import NasaEarthdataCredentialProvider

credentials_url = "https://data.lpdaac.earthdata.nasa.gov/s3credentials"
backend = rasteret.create_backend(
    credential_provider=NasaEarthdataCredentialProvider(credentials_url),
    region="us-west-2",
)
ds = collection.get_xarray(
    geometries=(-105.5, 40.0, -105.0, 40.5),
    bands=["B04"],
    backend=backend,
)

Use the DAAC (Distributed Active Archive Center) specific s3credentials endpoint for the assets you query.

Rasteret does not prompt for credentials. Configure Earthdata auth via one of:

  • ~/.netrc (recommended), or
  • EARTHDATA_TOKEN, or
  • EARTHDATA_USERNAME / EARTHDATA_PASSWORD.
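
Because nothing prompts for credentials, a quick pre-flight check can show which of the three options is configured. This is a standard-library sketch: the earthdata_auth_sources helper is hypothetical, it assumes the ~/.netrc entry is stored under the Earthdata Login host urs.earthdata.nasa.gov, and it does not reflect the order in which the credential provider actually tries these sources.

```python
import netrc
import os
from pathlib import Path

EARTHDATA_HOST = "urs.earthdata.nasa.gov"  # assumed machine name in ~/.netrc


def earthdata_auth_sources(env=None, netrc_path=None):
    """List which Earthdata auth options are configured (hypothetical helper)."""
    env = os.environ if env is None else env
    found = []
    try:
        path = str(netrc_path or Path.home() / ".netrc")
        entry = netrc.netrc(path).authenticators(EARTHDATA_HOST)
    except (OSError, netrc.NetrcParseError):
        entry = None  # missing, unreadable, or malformed file
    if entry:
        found.append("netrc")
    if env.get("EARTHDATA_TOKEN"):
        found.append("EARTHDATA_TOKEN")
    if env.get("EARTHDATA_USERNAME") and env.get("EARTHDATA_PASSWORD"):
        found.append("EARTHDATA_USERNAME/EARTHDATA_PASSWORD")
    return found


print(earthdata_auth_sources() or "no Earthdata credentials configured")
```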

For custom Earthdata-backed datasets, set s3_credentials_url on your dataset descriptor (or pass a pre-built backend) so Rasteret can fetch temporary credentials as needed.

Custom ObstoreBackend

For advanced use cases (e.g. wrapping a pre-configured store with custom client options), use ObstoreBackend directly:

from rasteret.cloud import ObstoreBackend
from obstore.store import S3Store

store = S3Store(
    bucket="my-private-bucket",
    config={"region": "eu-central-1"},
)
backend = ObstoreBackend(store, url_prefix="s3://my-private-bucket/")

ds = collection.get_xarray(
    geometries=(10.0, 48.0, 10.5, 48.5),
    bands=["B04"],
    backend=backend,
)

This is only needed for store configurations that create_backend() does not cover. For most use cases, create_backend() with a credential provider is sufficient.