Getting Started¶
Installation¶
Rasteret requires Python 3.12 or later.
Add extras as needed:
uv pip install "rasteret[xarray]" # + xarray output
uv pip install "rasteret[torchgeo]" # + TorchGeo for ML pipelines
uv pip install "rasteret[aws]" # + requester-pays buckets (Landsat, NAIP)
Combine extras: uv pip install "rasteret[xarray,aws]"
All extras
| Extra | What it adds | When you need it |
|---|---|---|
| xarray | xarray | get_xarray() output |
| torchgeo | TorchGeo >=0.9.0 | ML training pipelines |
| aws | boto3 | Requester-pays buckets (Landsat, NAIP) - credential resolution via boto3 |
| azure | planetary-computer | Planetary Computer signed URLs |
| earthdata | requests | Earthdata / DAAC auth (temporary S3 credentials) |
| dev | pytest, ruff, pre-commit | Running tests and linting |
| docs | mkdocs + plugins | Building documentation locally |
Verify the installation:
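One generic way to check, sketched here with only the standard library, is to ask the packaging metadata which version of the distribution is installed. The helper name `installed_version` is hypothetical and is not part of Rasteret; the only assumption is that the distribution is named `rasteret`, as in the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution name taken from the install commands above.
rasteret_version = installed_version("rasteret")
print(f"rasteret: {rasteret_version or 'not installed in this environment'}")
```

If this prints "not installed", the interpreter running the check is probably not the one from the environment where Rasteret was installed.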
Running notebooks¶
Jupyter runs each notebook in a kernel, a separate Python process. To use your Rasteret environment as a kernel:
uv pip install ipykernel
python -m ipykernel install --user --name rasteret --display-name "Rasteret"
Then select the Rasteret kernel when you open a notebook.
marimo manages dependencies inline, so no kernel registration is
needed. Install it alongside Rasteret with uv pip install marimo,
then run marimo edit notebook.py.
VS Code auto-detects virtual environments. Open a .ipynb file,
click the kernel picker (top-right), and select the Python interpreter
from your Rasteret venv.
Key concepts¶
- Collection - a local Parquet index of raster scenes + cached COG headers. Build, filter, read, share.
- Catalog - the registry of known data sources (Sentinel-2, Landsat, NAIP, ...). What build() looks up.
- Workspace - the directory where Collections are cached (~/rasteret_workspace/ by default).
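To see what is currently cached, you can inspect the workspace directory directly. The sketch below only lists the entries under the default path named above; it makes no assumption about how Rasteret lays out files inside the workspace, and `list_workspace` is a hypothetical helper, not a Rasteret function.

```python
from pathlib import Path

# Default workspace location, as stated in the list above.
DEFAULT_WORKSPACE = Path.home() / "rasteret_workspace"


def list_workspace(workspace=DEFAULT_WORKSPACE):
    """Return the sorted names of entries in the workspace, or [] if it does not exist."""
    workspace = Path(workspace)
    if not workspace.is_dir():
        return []
    return sorted(entry.name for entry in workspace.iterdir())


print(list_workspace())
```

An empty list simply means nothing has been built yet, or that the workspace lives somewhere other than the default.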
Browse the catalog¶
Before building, see what's available:
$ rasteret datasets list
ID Name Coverage Auth
earthsearch/sentinel-2-l2a Sentinel-2 Level-2A global none
earthsearch/cop-dem-glo-90 Copernicus DEM 90m global none
earthsearch/landsat-c2-l2 Landsat Collection 2 Level-2 global required
earthsearch/naip NAIP north-america required
earthsearch/cop-dem-glo-30 Copernicus DEM 30m global none
pc/io-lulc-annual-v02 ESRI 10m Land Use/Land Cover global required
pc/esa-worldcover ESA WorldCover global required
aef/v1-annual AlphaEarth Foundation Embeddings global none
...
Pick any dataset ID and pass it to build() in the next section. For data not
in the catalog, use build_from_stac() or build_from_table(), or
add it to the catalog
so the whole community benefits.
Quick start¶
1. Build a Collection¶
import rasteret
collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="s2-bangalore",
    bbox=(77.5, 12.9, 77.7, 13.1),
    date_range=("2024-01-01", "2024-06-30"),
)
collection
# Collection('s2-bangalore', source='sentinel-2-l2a', bands=13, records=42, crs=32643, 2024-01-01..2024-06-30)
collection.bands # ['B01', 'B02', ..., 'B12', 'SCL']
collection.bounds # (77.50, 12.90, 77.70, 13.10)
collection.epsg # [32643]
len(collection) # 42
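The crs=32643 in the output above is the EPSG code for WGS 84 / UTM zone 43N, which is the zone you would expect for a bbox around longitude 77.6. The arithmetic below is a sanity check only: it uses the standard 6-degree zone formula and ignores the special zones around Norway and Svalbard. Rasteret reports the CRS recorded for the indexed scenes; `utm_epsg` is a hypothetical helper written for this illustration.

```python
import math


def utm_epsg(lon, lat):
    """EPSG code of the standard WGS 84 / UTM zone containing (lon, lat)."""
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1  # zones are 6 degrees wide, starting at -180
    base = 32600 if lat >= 0 else 32700  # 326xx = northern hemisphere, 327xx = southern
    return base + zone


# bbox from the build() call above, in (min_lon, min_lat, max_lon, max_lat) order.
min_lon, min_lat, max_lon, max_lat = (77.5, 12.9, 77.7, 13.1)
center_lon = (min_lon + max_lon) / 2
center_lat = (min_lat + max_lat) / 2

epsg = utm_epsg(center_lon, center_lat)
print(epsg)
```

A bbox that straddles a zone boundary can match scenes in more than one UTM zone, which is presumably why collection.epsg is a list.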
What just happened?
Rasteret searched the STAC catalog, parsed the COG tile-layout headers for
every matching scene, and cached everything in a local Parquet index
(in ~/rasteret_workspace/ by default). The next call with the same
parameters reuses the cached index instantly; no network requests,
no header parsing. This is the core idea: build once, read many times.
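The build-once pattern itself can be illustrated with a generic sketch. This is not Rasteret's implementation: the key scheme, the JSON file standing in for the Parquet index, and the names `cache_key` and `build_once` are all assumptions made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(dataset_id, bbox, date_range):
    """Derive a stable key from the build parameters (hypothetical scheme)."""
    payload = json.dumps(
        {"dataset": dataset_id, "bbox": list(bbox), "dates": list(date_range)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def build_once(workspace, dataset_id, bbox, date_range, expensive_build):
    """Run expensive_build only when no cached index exists for these parameters."""
    index_path = Path(workspace) / f"{cache_key(dataset_id, bbox, date_range)}.json"
    if index_path.exists():
        return json.loads(index_path.read_text()), True  # cache hit: no rebuild
    index = expensive_build()
    index_path.parent.mkdir(parents=True, exist_ok=True)
    index_path.write_text(json.dumps(index))
    return index, False  # cache miss: built and stored


calls = []


def fake_build():
    """Stand-in for the slow step (catalog search plus header parsing)."""
    calls.append(1)
    return {"records": 42}


with tempfile.TemporaryDirectory() as tmp:
    args = ("earthsearch/sentinel-2-l2a", (77.5, 12.9, 77.7, 13.1), ("2024-01-01", "2024-06-30"))
    first, first_hit = build_once(tmp, *args, fake_build)
    second, second_hit = build_once(tmp, *args, fake_build)

print(first_hit, second_hit, len(calls))
```

The second call returns the stored index without invoking the slow step again, which is the behaviour described above for repeated build() calls with identical parameters.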
Collection vs catalog
A Collection contains only the scenes you indexed (your bbox, date range,
filters). Use compare_to_catalog() to see what you built vs what the
source offers:
collection.compare_to_catalog()
# Collection: s2-bangalore
#
# Property Value
# ──────── ─────────────────────────────────────────────────────────
# Records 42
# Bands B01, B02, B03, B04, B05 (+8 more) (13/13)
# CRS EPSG:32643
# Bounds (77.5000, 12.9000, 77.7000, 13.1000)
# Dates 2024-01-01 .. 2024-06-30 (source: 2015-06-23 .. present)
# Source earthsearch/sentinel-2-l2a (Sentinel-2 Level-2A)
# Coverage global
# Auth none
In Jupyter/marimo notebooks this renders as a styled HTML table.
This example uses Sentinel-2 from Earth Search, which is public; no credentials needed. For requester-pays or authenticated sources, see Custom Cloud Provider.
2. Read pixels¶
Pick whichever output fits your workflow:
from torch.utils.data import DataLoader
from torchgeo.samplers import RandomGeoSampler
from torchgeo.datasets.utils import stack_samples
dataset = collection.to_torchgeo_dataset(
    bands=["B04", "B03", "B02", "B08"],
    chip_size=256,
)
sampler = RandomGeoSampler(dataset, size=256, length=100)
loader = DataLoader(dataset, sampler=sampler, batch_size=4, collate_fn=stack_samples)
Everything after to_torchgeo_dataset() is standard TorchGeo - your
samplers, transforms, and training loop stay the same.
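ds = collection.get_xarray(
    geometries=(77.55, 13.01, 77.58, 13.08),  # bbox tuple
    bands=["B04", "B08"],
)
# Cast to float first: the native dtype is uint16, so B08 - B04 would wrap around.
red = ds.B04.astype("float32")
nir = ds.B08.astype("float32")
ndvi = (nir - red) / (nir + red)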
geometries also accepts Arrow arrays (from GeoParquet), Shapely
objects, or raw WKB. See Build from Parquet
for the Arrow-native path.
That's it for the basics. Two calls: build() to index, then read pixels.
Data types
Rasteret preserves the native COG dtype. Sentinel-2 data returns as
uint16, not float32. If you need float for computation (e.g., NDVI),
cast explicitly: ds.B04.astype("float32").
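The reason the cast matters is that unsigned 16-bit subtraction wraps around instead of going negative. The sketch below models that wrap-around in plain Python with a modulo, using made-up reflectance values; fixed-width unsigned integer arrays behave the same way. `uint16_sub` is a hypothetical helper for illustration only.

```python
UINT16_MODULUS = 2**16  # uint16 holds values 0..65535


def uint16_sub(a, b):
    """Subtract the way a uint16 does: results below zero wrap around."""
    return (a - b) % UINT16_MODULUS


# Illustrative pixel values where red is brighter than near-infrared.
red, nir = 3000, 2000

# Without a cast, the numerator wraps to a large positive number.
wrapped_diff = uint16_sub(nir, red)
wrapped_ndvi = wrapped_diff / (nir + red)

# With a float cast, the numerator is negative and NDVI stays within [-1, 1].
float_ndvi = (float(nir) - float(red)) / (float(nir) + float(red))

print(wrapped_diff, wrapped_ndvi, float_ndvi)
```

Any NDVI value outside [-1, 1] is a strong hint that the arithmetic ran on the integer dtype.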
Which function do I use?
| Situation | Use |
|---|---|
| Dataset in the catalog (Sentinel-2, Landsat, NAIP, ...) | rasteret.build("earthsearch/sentinel-2-l2a", ...) |
| Custom STAC API not in the catalog | rasteret.build_from_stac(stac_api="...", ...) |
| Existing Parquet with COG URLs (Source Cooperative, STAC GeoParquet, custom) | rasteret.build_from_table("s3://...parquet", ...) |
| Someone shared a Collection with you | rasteret.load("path/to/collection/") |
Sharing: collection.export("path/") writes a portable copy. Your teammate runs rasteret.load("path/").
Going further¶
Once the quick start works, explore these as you need them:
| I want to... | Go to |
|---|---|
| Follow a hands-on notebook | Tutorials |
| Build from existing Parquet (Source Cooperative, STAC GeoParquet, custom) | Build from Parquet |
| Enrich a collection with AOIs and labels | Enriched Parquet Workflows |
| Use train/val/test splits and labels | ML Training with Splits |
| Manage cached collections (build/import/list/info/delete) | Collection Management |
| Browse the dataset catalog and reuse local Collections | Dataset Catalog |
| Read authenticated/requester-pays data (Landsat, PC, Earthdata) | Custom Cloud Provider |
| Understand how Rasteret works | Explanation |
| See how Rasteret fits with TorchGeo/xarray/rasterio | Ecosystem Comparison |
| See the full API | API Reference |