AlphaEarth Foundation Embeddings (AEF)
AlphaEarth Foundation (AEF) embeddings are 64-band, 10 m resolution foundation-model features published as tiled GeoTIFFs. Rasteret reads them efficiently by caching COG tile headers once and then doing fast byte-range reads for each query.
Quick start
AEF is a built-in catalog dataset. Three statements get you from zero to pixels:
import rasteret

collection = rasteret.build(
    "aef/v1-annual",
    name="aef-demo",
    bbox=(11.3, -0.002, 11.5, 0.001),
    date_range=("2023-01-01", "2023-12-31"),
)

ds = collection.get_xarray(
    geometries=(11.3, -0.002, 11.5, 0.001),
    bands=["A00", "A01", "A31", "A63"],
)
build() reads the published AEF GeoParquet index from Source Cooperative,
filters by bbox and year, constructs the assets from each tile's COG URL,
parses COG headers, and caches everything as a local Parquet index.
Subsequent calls reuse the cache.
Selecting bands
AEF COGs contain 64 bands (A00 to A63). Enrichment parses the full
tile layout once per COG (a single HTTP request regardless of band count),
so indexing all 64 bands costs nothing extra at build time.
Select which bands to read at query time:
bbox = (11.3, -0.002, 11.5, 0.001)  # same area as the quick start

# Read all 64 bands
ds = collection.get_xarray(geometries=bbox, bands=[f"A{i:02d}" for i in range(64)])

# Read a subset
ds = collection.get_xarray(geometries=bbox, bands=["A00", "A15", "A31"])
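Because band codes follow the fixed A00 to A63 pattern, a small helper can build and validate a band list from integer indices. This is a minimal sketch; `aef_band_codes` is a hypothetical helper for illustration, not part of Rasteret.

```python
def aef_band_codes(indices):
    """Turn integer band indices (0..63) into AEF band codes such as 'A07'."""
    codes = []
    for i in indices:
        if not 0 <= i < 64:
            raise ValueError(f"AEF band index out of range: {i}")
        codes.append(f"A{i:02d}")
    return codes


# Every eighth band, as an example of a programmatic subset.
every_eighth = aef_band_codes(range(0, 64, 8))
print(every_eighth)
```

The resulting list can be passed as the bands argument of get_xarray().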
De-quantization and nodata
AEF embeddings are stored as signed 8-bit integers. Nodata is -128.
To convert to floats in [-1.0, 1.0]:
import numpy as np
raw = ds["A00"].values.squeeze()
embedding = np.where(raw == -128, np.nan, raw.astype(np.float32) / 127.0)
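When several bands need the same treatment, the conversion can be wrapped in a function and applied to a whole array at once. The sketch below uses the same divide-by-127 scaling and -128 nodata value as the snippet above; `dequantize` and `AEF_NODATA` are hypothetical names, and the input is a synthetic array rather than real AEF data.

```python
import numpy as np

AEF_NODATA = -128  # nodata value stated in this section


def dequantize(raw):
    """Map int8 AEF values to float32 in [-1, 1]; nodata becomes NaN."""
    raw = np.asarray(raw, dtype=np.int8)
    scaled = raw.astype(np.float32) / np.float32(127.0)
    return np.where(raw == AEF_NODATA, np.float32(np.nan), scaled)


# Synthetic 2x2 patch: one nodata pixel plus the extremes and zero.
sample = np.array([[-128, -127], [0, 127]], dtype=np.int8)
print(dequantize(sample))
```

Keeping the result in float32 halves memory relative to NumPy's default float64, which matters when all 64 bands are loaded.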
DuckDB + Arrow workflow (advanced)
If you want full control over filtering (e.g. UTM zone, custom SQL predicates), use DuckDB to query the AEF GeoParquet index, then hand the resulting Arrow table straight to Rasteret with no serialization step:
import duckdb
import rasteret

con = duckdb.connect()
filtered = con.execute("""
    SELECT *
    FROM read_parquet('https://data.source.coop/tge-labs/aef/v1/annual/aef_index.parquet')
    WHERE year = 2023
      AND utm_zone = '32N'
      AND wgs84_east >= 11.3 AND wgs84_west <= 11.5
    LIMIT 3
""").fetch_arrow_table()  # zero-copy Arrow table

collection = rasteret.build_from_table(
    filtered,  # Arrow table passed directly, no serialization
    name="aef-duckdb",
    column_map={"fid": "id", "geom": "geometry", "year": "datetime"},
    href_column="path",
    band_index_map={f"A{i:02d}": i for i in range(64)},
    url_rewrite_patterns={
        "s3://us-west-2.opendata.source.coop/": "https://data.source.coop/",
    },
    enrich_cog=True,
    band_codes=["A00", "A01", "A31", "A63"],
    force=True,
)
The key: fetch_arrow_table() returns a PyArrow Table, and
build_from_table() accepts it directly. No CSV export, no file I/O,
no intermediate copies. Rasteret's column_map aliases the DuckDB
column names to the contract schema, href_column + band_index_map
construct the assets struct, and proj:epsg is derived automatically
from the crs column.
See the runnable script at examples/aef_duckdb_query.py.
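The url_rewrite_patterns argument in the call above maps the s3:// paths stored in the index to HTTPS URLs. As a rough illustration of the idea, here is a prefix-rewrite sketch. It assumes prefix matching, which may differ from what Rasteret actually does; `rewrite_url` and the example tile path are hypothetical.

```python
def rewrite_url(url, patterns):
    """Replace the first matching prefix from `patterns`; otherwise return url unchanged."""
    for old_prefix, new_prefix in patterns.items():
        if url.startswith(old_prefix):
            return new_prefix + url[len(old_prefix):]
    return url


patterns = {"s3://us-west-2.opendata.source.coop/": "https://data.source.coop/"}

# Hypothetical tile path, for illustration only.
s3_path = "s3://us-west-2.opendata.source.coop/example/tile.tiff"
print(rewrite_url(s3_path, patterns))
```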
How it works internally
Rasteret reads AEF COGs with the same reader used for all datasets. The AEF-specific piece is the catalog descriptor, which declares how the GeoParquet index maps to Rasteret's schema:
| Descriptor field | Purpose |
|---|---|
| column_map | {"fid": "id", "geom": "geometry", "year": "datetime"} - alias source columns to the schema contract |
| href_column | "path" - column containing COG URLs |
| band_index_map | {"A00": 0, "A01": 1, ..., "A63": 63} - maps band codes to sample indices in the multi-band COG |
| bbox_columns | {"minx": "wgs84_west", ...} - enables predicate pushdown on the 235k-row index |
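To make the bbox_columns row concrete, the sketch below shows the kind of overlap predicate that per-row bounding-box columns make possible. It runs over plain dicts rather than the real index. The column names wgs84_south and wgs84_north are assumptions (only wgs84_west and wgs84_east appear in this page), and `bbox_overlaps` is a hypothetical helper.

```python
def bbox_overlaps(row, query):
    """True if the row's bounding box intersects query = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = query
    return (
        row["wgs84_east"] >= minx and row["wgs84_west"] <= maxx
        and row["wgs84_north"] >= miny and row["wgs84_south"] <= maxy
    )


# Two made-up index rows; only the first overlaps the query box.
rows = [
    {"fid": "a", "wgs84_west": 11.0, "wgs84_south": -0.1, "wgs84_east": 11.4, "wgs84_north": 0.1},
    {"fid": "b", "wgs84_west": 12.0, "wgs84_south": -0.1, "wgs84_east": 12.4, "wgs84_north": 0.1},
]
query = (11.3, -0.002, 11.5, 0.001)
hits = [r["fid"] for r in rows if bbox_overlaps(r, query)]
print(hits)
```

Because the test only compares scalar columns, a Parquet reader can evaluate it against row-group statistics and skip non-matching groups without decoding geometries.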
Three general-purpose COG features combine to make the GeoTIFFs work:
- ModelTransformationTag: AEF COGs store their geotransform as a 4x4 matrix (TIFF tag 34264) instead of the more common PixelScale + Tiepoint tags. Rasteret's header parser extracts axis-aligned transforms from either representation.
- South-up orientation: AEF tiles have a positive Y pixel scale (scale_y > 0), meaning row 0 is the southern edge. Rasteret's tile math is sign-agnostic and works with both north-up and south-up rasters.
- Planar-separate multi-band: The 64 bands are stored with TIFF PlanarConfiguration=2. The band_index field in the asset dict tells the enrichment pipeline to slice the TileOffsets/TileByteCounts arrays per band, so the tile reader handles each band independently.
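The first two points can be illustrated with a short sketch: pull an axis-aligned transform out of a row-major 4x4 matrix, then compute a pixel-row range that works for either sign of the Y scale. This is not Rasteret's parser. The function names and the matrix values are made up for illustration, and the matrix layout follows the GeoTIFF convention for ModelTransformationTag.

```python
import math


def axis_aligned_transform(matrix16):
    """Extract (scale_x, scale_y, origin_x, origin_y) from a row-major 4x4 matrix."""
    if len(matrix16) != 16:
        raise ValueError("ModelTransformationTag holds 16 values")
    a, b, _, d, e, f, _, h = matrix16[:8]
    if b != 0 or e != 0:
        raise ValueError("rotated or sheared transform; not axis-aligned")
    return a, f, d, h


def row_range(y_min, y_max, transform):
    """Pixel rows covering [y_min, y_max], whatever the sign of scale_y."""
    _, scale_y, _, origin_y = transform
    r0 = (y_min - origin_y) / scale_y
    r1 = (y_max - origin_y) / scale_y
    return math.floor(min(r0, r1)), math.ceil(max(r0, r1))


# South-up: Y grows with the row index, so row 0 is the southern edge.
south_up = axis_aligned_transform(
    [10.0, 0.0, 0.0, 500000.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
)
# North-up: Y shrinks with the row index, so row 0 is the northern edge.
north_up = axis_aligned_transform(
    [10.0, 0.0, 0.0, 500000.0, 0.0, -10.0, 0.0, 1000.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
)
print(row_range(100.0, 250.0, south_up))
print(row_range(100.0, 250.0, north_up))
```

Taking the min and max of the two row coordinates is what keeps the window math independent of orientation.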
For the full band_index specification, see
Schema Contract -- asset dict.
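As a rough picture of the per-band slicing used for planar-separate COGs: with PlanarConfiguration=2, the TIFF 6.0 layout stores all tile entries for band 0 first, then band 1, and so on. The sketch below slices a toy TileOffsets list on that basis. `tiles_per_band` and `band_slice` are hypothetical helpers, and the image and tile sizes are made up rather than taken from AEF.

```python
import math


def tiles_per_band(width, height, tile_width, tile_height):
    """Number of tiles covering one band of the image."""
    return math.ceil(width / tile_width) * math.ceil(height / tile_height)


def band_slice(values, band_index, n_tiles):
    """Return the entries of a planar-separate tile array that belong to one band."""
    start = band_index * n_tiles
    stop = start + n_tiles
    if band_index < 0 or stop > len(values):
        raise IndexError(f"band {band_index} is outside the tile array")
    return values[start:stop]


# Toy layout: 3 bands, each split into 2 x 2 tiles, so 12 offsets in total.
n_tiles = tiles_per_band(1024, 1024, 512, 512)
tile_offsets = list(range(1000, 1000 + 12 * 100, 100))
print(band_slice(tile_offsets, 1, n_tiles))
```

The same slice applies to TileByteCounts, which gives a reader the byte ranges for one band without touching the others.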