[experiment] Layout Reader #8518
Draft
gatesn wants to merge 51 commits into
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Checkpoint of in-progress V2 ScanNode work (segment scheduling driver, scheduled segment source, scan scheduler) so agent fixes can be integrated on a clean base. Reviewed/benchmarked state.

Signed-off-by: Nicholas Gates <nick@nickgates.com>
The scan2 StructScanNode single-field fast paths (single get_item and single-referenced-field expressions) routed straight to the child scan node, bypassing the parent struct's validity mask. Projecting one field out of a nullable struct therefore returned the child's own values and validity with no parent null mask applied, producing wrong nulls (and a non-nullable result where a nullable one was expected).

Mirror the v1 struct reader's `array.mask(validity)` behaviour: add a small MaskScanNode that reads an input value and the struct's non-nullable boolean validity child and produces `mask(input, validity)`. Wrap the single-field fast-path results in MaskScanNode when the struct is nullable. The full push_struct path already threads validity through StructValueScanNode, so it is unchanged.

Add a V1-vs-V2 differential test harness in vortex-file that scans the same ScanRequest through both paths and asserts equality across flat (nullable + non-nullable), chunked, dict-encoded, zoned, and nested nullable-struct fixtures, plus ports of the v1 struct-null regression tests (test_struct_layout_nulls / test_struct_layout_nested) to the V2 path. Before the fix the five nested-nullable-struct cases failed with "expected i32?, actual i32"; after the fix all 18 cases pass.

Signed-off-by: Nicholas Gates <nick@nickgates.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
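The masking rule described in this commit can be illustrated with a minimal, library-free sketch. It does not use the Vortex API: `mask_field` and the `Option<i32>` column representation are hypothetical stand-ins, chosen only to show why a field projected out of a nullable struct must always come back nullable.

```rust
/// Hypothetical stand-in for `mask(input, validity)`.
/// `child[i]` is the field's own value (None = the field itself is null).
/// `parent_valid[i]` is the struct's validity: false = the whole struct is null.
/// The result is null wherever either the child or the parent struct is null.
fn mask_field(child: &[Option<i32>], parent_valid: &[bool]) -> Vec<Option<i32>> {
    assert_eq!(child.len(), parent_valid.len(), "length mismatch");
    child
        .iter()
        .zip(parent_valid)
        // Keep the child's value only where the parent struct is valid.
        .map(|(value, &valid)| if valid { *value } else { None })
        .collect()
}
```

With `child = [Some(1), Some(2), None]` and `parent_valid = [true, false, true]`, the second row becomes null because the struct is null there, even though the child holds a value. Skipping this step is the kind of bug the commit describes: the child's values leak through with the child's own (possibly non-nullable) type.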
…filter-first

Port of the V1 multi-conjunct filter behavior to the V2 PartitionWorkScheduler driver:
(1) sort filter conjuncts cheapest-first in PreparedScanNodeFile::try_new so expensive residuals (e.g. FSST LIKE) run after cheap selective ones;
(2) when the demanded-row density falls below EXPR_EVAL_THRESHOLD (0.2), read the residual predicate with selection=need so the leaf returns the compacted array and the expression evaluates over only the demanded rows, scattering the verdict back via Mask::intersect_by_rank.

Adds V1-vs-V2 differential cases (low- and high-density multi-conjunct) and a predicate_cost unit test.

Improves ClickBench multi-conjunct filters (q22 701->547ms, q23 now < V1). A separate single-LIKE FSST amplification (q21) remains and is tracked separately.

Signed-off-by: Nicholas Gates <nick@nickgates.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
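The two steps in this commit can be sketched without the Vortex types. Everything below is an illustrative assumption: `Conjunct`, its `cost` field, `order_cheapest_first`, `should_compact`, and `scatter_by_rank` are hypothetical names, and `scatter_by_rank` only mirrors my reading of what the message says `Mask::intersect_by_rank` is used for. Only the 0.2 threshold comes from the commit message.

```rust
/// Hypothetical filter conjunct with an assumed relative evaluation cost.
#[derive(Debug, Clone, PartialEq)]
struct Conjunct {
    name: &'static str,
    cost: u32,
}

/// Step (1): order conjuncts so cheap ones run before expensive residuals.
/// `sort_by_key` is stable, so equal-cost conjuncts keep their original order.
fn order_cheapest_first(mut conjuncts: Vec<Conjunct>) -> Vec<Conjunct> {
    conjuncts.sort_by_key(|c| c.cost);
    conjuncts
}

/// Threshold taken from the commit message.
const EXPR_EVAL_THRESHOLD: f64 = 0.2;

/// Step (2), decision: evaluate over a compacted array only when the
/// fraction of still-demanded rows is below the threshold.
fn should_compact(need: &[bool]) -> bool {
    if need.is_empty() {
        return false;
    }
    let demanded = need.iter().filter(|&&b| b).count();
    (demanded as f64) / (need.len() as f64) < EXPR_EVAL_THRESHOLD
}

/// Step (2), scatter: `verdict` has one entry per demanded row, in order.
/// The k-th true position of `need` receives `verdict[k]`; all others are false.
fn scatter_by_rank(need: &[bool], verdict: &[bool]) -> Vec<bool> {
    let demanded = need.iter().filter(|&&b| b).count();
    assert_eq!(demanded, verdict.len(), "one verdict per demanded row");
    let mut rank = 0;
    need.iter()
        .map(|&n| {
            if n {
                let v = verdict[rank];
                rank += 1;
                v
            } else {
                false
            }
        })
        .collect()
}
```

For example, with `need = [false, true, false, true]` and a compacted verdict `[true, false]`, the scattered result is `[false, true, false, false]`: the expensive predicate was evaluated on two rows instead of four, and the rows an earlier conjunct already rejected stay rejected.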
V2 parallelizes the join probe, aggregate, and Arrow decode ACROSS DataFusion partitions (V1 instead fans one partition into many split tasks). When a query projected a heavily-encoded column (e.g. a single RunEnd chunk for lineitem.l_orderkey), the opener fed split_aligned_row_range coarse chunk boundaries, which collapsed every byte-range file_group onto one partition and serialized the probe ~2-wide (TPC-H q4 ran 2.6x slower than V1).

Feed split_aligned_row_range the scan's own morsel ranges instead: the read-column chunk hints, or the 100k-row fallback when a read column is a single chunk (mirroring PreparedScanNodeFile::splits). Each morsel lands wholly in one partition, so the scan spreads across all of DataFusion's byte-range file_groups with no collapse and no chunk straddling a partition boundary. The assignment is contiguous per partition, so it is correct even when the scan output must preserve order.

Also run the Vortex->Arrow conversion on the runtime CPU pool (handle.spawn_cpu + buffered/buffer_unordered) so decode fans out within a partition rather than running serially on the consumer poll thread.

TPC-H SF1 (datafusion-bench, VORTEX_SCAN_IMPL=v2): q4 goes from 2.6x slower than V1 to faster than V1; overall ~parity.

Signed-off-by: Nicholas Gates <nick@nickgates.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
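A rough sketch of the morsel idea, under stated assumptions: it does not reproduce `split_aligned_row_range` (which works against DataFusion's byte-range file_groups), and `fallback_morsels` and `assign_contiguous` are hypothetical helpers. It only shows the two properties the message relies on: the 100k-row fallback splits a single-chunk column into many morsels, and an order-preserving assignment gives each partition one contiguous run of whole morsels.

```rust
use std::ops::Range;

/// Fallback morsel size from the commit message (100k rows).
const FALLBACK_MORSEL_ROWS: u64 = 100_000;

/// Cut `0..row_count` into fixed-size morsels; the last one may be shorter.
fn fallback_morsels(row_count: u64) -> Vec<Range<u64>> {
    let mut out = Vec::new();
    let mut start = 0;
    while start < row_count {
        let end = (start + FALLBACK_MORSEL_ROWS).min(row_count);
        out.push(start..end);
        start = end;
    }
    out
}

/// Assumed policy: spread morsels evenly by count, in order.
/// Morsel `i` goes to partition `i * partitions / morsels.len()`, which is
/// non-decreasing in `i`, so every partition receives a contiguous run and
/// no morsel is ever split across two partitions.
fn assign_contiguous(morsels: &[Range<u64>], partitions: usize) -> Vec<Vec<Range<u64>>> {
    assert!(partitions > 0, "need at least one partition");
    let mut out = vec![Vec::new(); partitions];
    for (i, morsel) in morsels.iter().enumerate() {
        let p = i * partitions / morsels.len();
        out[p].push(morsel.clone());
    }
    out
}
```

With a single coarse chunk as the only boundary, every partition but one would end up empty. With `fallback_morsels(250_000)` and two partitions, the first partition gets rows `0..200_000` as two morsels and the second gets `200_000..250_000`, and concatenating the partitions in order still yields the original row order.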
…H_FULL_PLAN

With --show-metrics and VORTEX_BENCH_FULL_PLAN=1, print the DataFusion EXPLAIN ANALYZE-style annotated plan (elapsed_compute / output_rows per operator) to stderr, to localize where wall time goes across scan, HashJoin build/probe, and aggregate.

Signed-off-by: Nicholas Gates <nick@nickgates.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Rename the runtime scan node API to ScanPlan and move the plan and segment primitives into vortex-scan. Layout v2 now expands directly through layout.new_scan_plan with a plan ScanRequest, and the docs describe the v2 path as the layout scan model.

Signed-off-by: Nicholas Gates <nick@nickgates.com>
After what feels like the 27th time exploring this space, I think I might finally be getting somewhere.
This design essentially pulls out a scan engine. Layouts are just one way to take serialized arrays and construct a ScanPlan; in theory we could also build a ScanPlan by hand or by any other means.
A ScanPlan node can accept push-down of various operations:
This plan can then be used to answer different types of questions:
[more description to come]