Quickstart: xarray¶

This notebook shows the minimal Rasteret workflow:

1. Build a Collection from the dataset catalog (one-time, cached).
2. Fetch pixels for a small AOI and time window as an xarray.Dataset.
3. Compute NDVI.

from pathlib import Path

import xarray as xr
from shapely.geometry import Polygon

import rasteret

Define area of interest¶

We use a small polygon over Bengaluru, India. The STAC query uses the polygon's bounding box to find matching Sentinel-2 L2A scenes.

aoi = Polygon(
    [
        (77.55, 13.01),
        (77.58, 13.01),
        (77.58, 13.08),
        (77.55, 13.08),
        (77.55, 13.01),
    ]
)
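
To make the bounding-box step concrete, here is a minimal sketch in plain Python that derives a (minx, miny, maxx, maxy) tuple from the ring's vertices. It should match what `aoi.bounds` returns for this polygon. The helper name `ring_bounds` is hypothetical and exists only for illustration.

```python
# Hypothetical helper (not part of Rasteret or shapely): compute the
# axis-aligned bounding box of a coordinate ring as (minx, miny, maxx, maxy).
def ring_bounds(ring):
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


# Same vertices as the AOI polygon: (longitude, latitude) pairs.
ring = [
    (77.55, 13.01),
    (77.58, 13.01),
    (77.58, 13.08),
    (77.55, 13.08),
    (77.55, 13.01),
]

bbox = ring_bounds(ring)
print(bbox)  # (77.55, 13.01, 77.58, 13.08)
```

Because the AOI here is already a rectangle, the bounding box and the polygon cover the same area. For an irregular polygon the bounding box would be larger than the AOI itself.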

Build the Collection¶

build() picks a dataset from the catalog (here Sentinel-2 on Earth Search), queries the STAC API, parses COG headers for every matching scene, and writes the result to a local Parquet index. On the next run, the cache is loaded in milliseconds.

For more on the catalog and local descriptors, see Dataset Catalog & Descriptors.

For full-control STAC indexing, use build_from_stac() with explicit API and collection parameters; see the Collection Management how-to guide.

workspace = Path.home() / "rasteret_workspace"

collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="bangalore",
    bbox=aoi.bounds,
    date_range=("2024-01-01", "2024-01-31"),
    workspace_dir=workspace,
)

print(f"Collection: {collection.name}")
print(f"Scenes: {collection.dataset.count_rows()}")
print(f"Columns: {collection.dataset.schema.names[:8]}...")
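
The build-once, load-afterwards behaviour described above can be illustrated with a generic sketch. This is not Rasteret's implementation: a JSON file stands in for the Parquet index, and `load_or_build` and `expensive_build` are hypothetical names used only to show the pattern.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(index_path, build_fn):
    """Return (records, from_cache): reuse the file on disk if it exists."""
    if index_path.exists():
        return json.loads(index_path.read_text()), True
    records = build_fn()
    index_path.parent.mkdir(parents=True, exist_ok=True)
    index_path.write_text(json.dumps(records))
    return records, False


calls = []


def expensive_build():
    # Stand-in for the slow part (catalog query plus header parsing).
    calls.append(1)
    return [{"id": "scene-a", "cloud_cover": 12.5}]


with tempfile.TemporaryDirectory() as tmp:
    index = Path(tmp) / "bangalore" / "index.json"
    first, cached_first = load_or_build(index, expensive_build)
    second, cached_second = load_or_build(index, expensive_build)

print(cached_first, cached_second, len(calls))  # False True 1
```

The second call never invokes the build function, which is the property that makes repeated runs cheap.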

Fetch pixels as xarray¶

get_xarray() reads only the tiles that intersect the AOI. Filters (cloud_cover_lt, date_range) are applied as Arrow pushdown predicates before any HTTP requests are made.

ds = collection.get_xarray(
    geometries=[aoi],
    bands=["B04", "B08"],
    cloud_cover_lt=20,
    date_range=("2024-01-10", "2024-01-30"),
)

ds
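
To see why reading only the intersecting tiles matters, here is a rough sketch of the tile arithmetic for an internally tiled raster. The 512-pixel tile size, the pixel window, and the helper name `tiles_for_window` are assumptions chosen for illustration; Rasteret's actual logic and the real tile layout of a given COG may differ.

```python
import math


def tiles_for_window(col_off, row_off, width, height, tile_size):
    """Return (tile_row, tile_col) indices of the tiles a pixel window overlaps."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (r, c)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


# Assumed values: a 300 x 400 pixel AOI window inside a 10980 x 10980 image
# (the size of a Sentinel-2 10 m band) split into 512 x 512 tiles.
tiles = tiles_for_window(col_off=900, row_off=300, width=300, height=400, tile_size=512)
total_tiles = math.ceil(10980 / 512) ** 2

print(tiles)        # [(0, 1), (0, 2), (1, 1), (1, 2)]
print(total_tiles)  # 484
```

Under these assumptions the window touches 4 of 484 tiles, so only a small fraction of each file would need to be requested.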

Compute NDVI¶

From here on, everything is ordinary xarray: Rasteret hands off a standard xr.Dataset, so NDVI is plain band arithmetic on B08 (near-infrared) and B04 (red).

ndvi = (ds["B08"] - ds["B04"]) / (ds["B08"] + ds["B04"])
out = xr.Dataset({"ndvi": ndvi}, coords=ds.coords, attrs=ds.attrs)
out
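
As a worked check of the formula, here is a per-pixel sketch in plain Python. The sample values are made up, and the helper name `ndvi` is hypothetical. If the bands arrive as unsigned integers (an assumption worth checking with `ds["B04"].dtype`), casting to float before subtracting, for example with `ds.astype("float32")`, avoids integer wraparound where red exceeds near-infrared.

```python
import math


def ndvi(nir, red):
    """NDVI for a single pixel; NaN where the denominator is zero."""
    nir, red = float(nir), float(red)  # work in floating point
    denom = nir + red
    if denom == 0.0:
        return math.nan
    return (nir - red) / denom


# Made-up sample values for illustration only.
vegetated = ndvi(nir=3600, red=1200)
bare = ndvi(nir=1500, red=1500)
nodata = ndvi(nir=0, red=0)

print(vegetated)  # 0.5
print(bare)       # 0.0
print(nodata)     # nan
```

NDVI ranges from -1 to 1, with higher values indicating denser green vegetation.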

What happened¶

- build() looked up earthsearch/sentinel-2-l2a in the catalog, queried STAC once, parsed every COG header, and cached the result as partitioned Parquet.
- get_xarray() read the Parquet index, computed which tiles intersect the AOI, fetched those tiles in parallel, and assembled an xr.Dataset.
- Subsequent runs skip the STAC query and header parsing entirely.

Next: TorchGeo Integration