Problem
After CachingIcebergCatalog refreshes a table's metadata (SQL
REFRESH EXTERNAL TABLE, the periodic background refresh, or the
catalog's own refresh hooks), the data files that the new snapshot
references are not yet present in BE/CN block_cache. The first user
query on the table therefore pays an S3 round-trip (~50–150 ms,
sometimes more depending on region and bucket warmup) per file
just to read the Parquet/ORC footer. Multiplied across the files in
a scan plan, this dominates first-query latency on cold tables and
shows up as a visible "first run is slow" cliff that disappears on
subsequent runs.
Why it bites
The CACHE SELECT path already exists and is the documented way to
warm block_cache against a table, but:
- It is a user-driven SQL statement; there is no built-in hook that
makes the warm-up happen automatically when metadata changes.
- Even when issued manually, the CACHE SELECT scanner today does not
wrap the underlying _file in a populating CacheInputStream, so
the Parquet reader->init footer read goes straight to raw storage
on every CACHE SELECT. Only the column ranges explicitly fed to
CacheSelectInputStream via _write_disk_ranges end up in
block_cache, yet the footer is exactly the part that drives the
per-file S3 round-trip in subsequent real queries.
The net effect is that footers are paid for at user-query time even
on warehouses where operators are happy to spend background IO to
hide that latency from interactive workloads.
Proposed direction
- Add an opt-in (default false) hook in CachingIcebergCatalog that
fires a CACHE SELECT against the freshly-refreshed table on a
per-catalog background executor, with duplicate-trigger coalescing.
- Teach HdfsScanner::create_random_access_file to wrap _file in
a regular CacheInputStream for cache_select mode too, so the
footer read populates block_cache during CACHE SELECT (this is a
generic fix, not feature-gated).
- Add a footer-only mode to CacheSelectScanner so the new hook can
stop scanning after reader->init (the footer is already warmed by
then) and skip column data and Iceberg delete-file fetches. The mode
is exposed only via an internal INVISIBLE session variable, so
user-issued CACHE SELECTs do not silently degrade to footer-only.
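
To make "duplicate-trigger coalescing" in the first item concrete, here is
a minimal sketch of one possible shape, not the actual implementation.
The class name WarmupCoalescer, the trigger method, and the string table
key are all hypothetical; the real hook would live in or next to
CachingIcebergCatalog and submit a CACHE SELECT instead of an arbitrary
Runnable. The sketch assumes that triggers arriving while a warm-up is
still queued can be dropped, while a trigger arriving after the warm-up
has started should schedule a fresh run, since the table may have moved
to a newer snapshot.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

// Hypothetical helper for illustration only; not an existing StarRocks class.
final class WarmupCoalescer {
    private final ExecutorService executor;
    // Tables that have a warm-up queued but not yet started.
    private final Set<String> queued = ConcurrentHashMap.newKeySet();

    WarmupCoalescer(ExecutorService executor) {
        this.executor = executor;
    }

    /**
     * Requests a warm-up for tableKey.
     * Returns true if a task was scheduled, false if the request was
     * coalesced into a warm-up that is already waiting in the queue.
     */
    boolean trigger(String tableKey, Runnable warmup) {
        if (!queued.add(tableKey)) {
            return false; // duplicate trigger: one is already pending
        }
        try {
            executor.execute(() -> {
                // Clear the marker before running, so a refresh that lands
                // during this run can schedule another warm-up.
                queued.remove(tableKey);
                try {
                    warmup.run();
                } catch (RuntimeException e) {
                    // Warm-up is best-effort; a real hook would log here.
                }
            });
        } catch (RuntimeException e) {
            // The executor rejected the task (for example, it is shut down).
            queued.remove(tableKey);
            throw e;
        }
        return true;
    }
}
```

With a single-threaded per-catalog executor, a burst of refreshes for the
same table while the worker is busy results in exactly one queued warm-up.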