rasteret.ingest.parquet_record_table
Parquet/Arrow record-table driver for building Collections from tabular raster indexes. For user-facing examples, see Build from Parquet and Arrow Tables.
parquet_record_table
Record-table collection builder.
Reads a Parquet/GeoParquet record table (one row per raster item) and
normalizes it into a rasteret.core.collection.Collection via
rasteret.ingest.normalize.build_collection_from_table.
Terminology
- Record table -- a tabular index that enumerates raster items (satellite scenes, drone images, derived products, grid cells, etc.). It may come from stac-geoparquet, a lab-specific registry, or a custom export.
- Collection Parquet -- Rasteret's normalized, runtime-ready Parquet dataset that follows the Schema Contract (see the explanation/schema-contract docs page).
Classes
RecordTableBuilder
RecordTableBuilder(
path: str | Path,
*,
data_source: str = "",
column_map: dict[str, str] | None = None,
href_column: str | None = None,
band_index_map: dict[str, int] | None = None,
url_rewrite_patterns: dict[str, str] | None = None,
filesystem: Any | None = None,
columns: list[str] | None = None,
filter_expr: Expression | None = None,
name: str = "",
workspace_dir: str | Path | None = None,
enrich_cog: bool = False,
band_codes: list[str] | None = None,
max_concurrent: int = 300,
backend: StorageBackend | None = None,
)
Bases: CollectionBuilder
Build a Collection from an existing Parquet/GeoParquet table.
Reads a Parquet record table where each row is a raster item
with at minimum the four contract columns (id, datetime,
geometry, assets), or columns that can be normalised into
them via column_map, href_column, and band_index_map.
When enrich_cog=True, the builder parses COG headers from the
asset URLs and adds {band}_metadata struct columns, making
the resulting Collection suitable for fast tiled reads and TorchGeo
integration.
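A minimal usage sketch, based only on the constructor signature and the build method documented on this page. The file path, the `cog_url` column name, the band codes, and the reading of band_index_map as "band code to band index" are illustrative assumptions, not documented behaviour. The import is deferred into the function so the keyword arguments can be inspected without Rasteret installed.

```python
from pathlib import Path
from typing import Any

# Hypothetical input: replace with the path/URI of a real record table.
RECORD_TABLE = Path("records/scenes.parquet")

# Keyword arguments taken from the documented signature; all values are
# placeholders chosen for illustration.
builder_kwargs: dict[str, Any] = {
    "data_source": "my-source",              # assumed identifier
    "href_column": "cog_url",                # assumed column holding COG URLs
    "band_index_map": {"B04": 0, "B08": 1},  # assumed: band code -> band index
    "enrich_cog": True,                      # parse COG headers, add {band}_metadata
    "band_codes": ["B04", "B08"],            # bands to enrich
    "max_concurrent": 64,                    # below the documented default of 300
    "name": "scenes",
    "workspace_dir": Path("workspace"),      # persist the Collection as Parquet here
}


def build_collection(path: Path = RECORD_TABLE):
    """Construct the builder and call build(); needs Rasteret and a real table."""
    # Deferred import: only required when the function is actually called.
    from rasteret.ingest.parquet_record_table import RecordTableBuilder

    return RecordTableBuilder(path, **builder_kwargs).build()
```

Calling `build_collection()` would run the documented read -> alias -> prepare -> enrich -> normalize pipeline against the table at `RECORD_TABLE`.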
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str or Path | Path/URI to the Parquet/GeoParquet file or dataset directory. | required |
| data_source | str | Data-source identifier for the resulting Collection. | '' |
| column_map | dict | | None |
| href_column | str | Column containing COG URLs. When set and | None |
| band_index_map | dict | | None |
| url_rewrite_patterns | dict | | None |
| filesystem | FileSystem | PyArrow filesystem for reading remote URIs (e.g. | None |
| columns | list of str | Scan-time column projection. | None |
| filter_expr | Expression | Scan-time predicate pushdown. | None |
| enrich_cog | bool | If | False |
| band_codes | list of str | Bands to enrich. If omitted, all bands found in the | None |
| max_concurrent | int | Maximum concurrent HTTP connections for COG header parsing. | 300 |
| name | str | Collection name. Passed through to the normalisation layer. | '' |
| workspace_dir | str or Path | If provided, persist the resulting Collection as Parquet here. | None |
| backend | StorageBackend | I/O backend for authenticated range reads during COG header parsing. | None |
Source code in src/rasteret/ingest/parquet_record_table.py
Functions
build
Read the record table and return a normalized Collection.
Pipeline: read -> alias -> prepare -> enrich -> normalize.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **kwargs | Any | | {} |
Returns:
| Type | Description |
|---|---|
| Collection | |
Source code in src/rasteret/ingest/parquet_record_table.py
Functions
prepare_record_table
prepare_record_table(
table: Table,
*,
href_column: str | None = None,
band_index_map: dict[str, int] | None = None,
url_rewrite_patterns: dict[str, str] | None = None,
required_columns: Sequence[str] | None = None,
) -> Table
Normalise column types and construct assets when absent.
This is a pure function (no instance state) so it can be used from both
RecordTableBuilder and the in-memory build_from_table() path
without constructing a builder object.
Steps:
- Auto-coerce id: integer -> string.
- Auto-coerce datetime: integer year -> timestamp.
- Construct assets from href_column + band_index_map.
- Derive legacy proj:epsg from a crs column when present.
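The four steps above can be illustrated on a single row with plain Python. This is a sketch of the idea, not Rasteret's implementation: the real function operates on an Arrow Table, and the helper name `prepare_row`, the asset struct layout (`href`, `band_index`), the choice of 1 January UTC for an integer year, and the `"EPSG:<code>"` string form of crs are all assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def prepare_row(
    row: dict[str, Any],
    href_column: Optional[str] = None,
    band_index_map: Optional[dict[str, int]] = None,
) -> dict[str, Any]:
    """Hypothetical row-level analogue of the steps listed above."""
    out = dict(row)  # copy, so the function has no side effects on its input

    # Step 1: integer id -> string.
    if isinstance(out.get("id"), int):
        out["id"] = str(out["id"])

    # Step 2: integer year -> timestamp (assumed: 1 January, UTC).
    if isinstance(out.get("datetime"), int):
        out["datetime"] = datetime(out["datetime"], 1, 1, tzinfo=timezone.utc)

    # Step 3: build assets from href_column + band_index_map when absent.
    if "assets" not in out and href_column and band_index_map:
        href = out[href_column]
        out["assets"] = {
            band: {"href": href, "band_index": index}  # assumed struct layout
            for band, index in band_index_map.items()
        }

    # Step 4: derive proj:epsg from a crs value such as "EPSG:32633".
    crs = out.get("crs")
    if "proj:epsg" not in out and isinstance(crs, str) and crs.upper().startswith("EPSG:"):
        out["proj:epsg"] = int(crs.split(":", 1)[1])

    return out


row = {"id": 42, "datetime": 2021, "cog_url": "https://example.com/a.tif", "crs": "EPSG:32633"}
prepared = prepare_row(row, href_column="cog_url", band_index_map={"red": 0, "nir": 1})
```

Because the helper copies its input and keeps no state, it mirrors the "pure function" property described above: the same row and arguments always give the same result.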
Source code in src/rasteret/ingest/parquet_record_table.py