Skip to content

fix(layout): don't panic collecting an empty stream#8472

Merged
robert3005 merged 3 commits into
vortex-data:developfrom
miniex:fix/empty-struct-write-panic
Jun 23, 2026
Merged

fix(layout): don't panic collecting an empty stream#8472
robert3005 merged 3 commits into
vortex-data:developfrom
miniex:fix/empty-struct-write-panic

Conversation

@miniex

@miniex miniex commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary

CollectStrategy collects its whole input into a single chunk for a child that requires exactly one chunk. When the input stream is empty, e.g. writing a zero-row table with a nullable struct column whose validity substream is empty, no chunk supplied a sequence id and it panicked with must have visited at least one chunk. CollectStrategy now yields nothing on empty input, and FlatLayoutStrategy returns an empty ChunkedLayout instead of requiring a single chunk, so no segment is written for an empty array.

The issue mentions the fuzzer's assume() guard could be dropped once this is fixed. I left it in place here: reading a nullable struct nested in a struct back is a separate bug (#8348), so the round-trip only works once both are fixed.

Closes: #8347

Testing

Added a regression test in vortex-file/tests/test_write_table.rs that writes a zero-row nullable struct column; it panicked before this change and now writes and reads back as zero rows. The issue's Python repro (vx.io.write of an empty struct column) no longer panics. cargo nextest run -p vortex-layout -p vortex-file passes (177 tests, including the segment-ordering tests). fmt --all + clippy --all-targets --all-features clean.


I'm Korean, so sorry if any wording reads a little awkward.

@miniex miniex requested a review from a team June 17, 2026 14:37
@joseph-isaacs joseph-isaacs requested a review from onursatici June 17, 2026 14:53
@codspeed-hq

codspeed-hq Bot commented Jun 17, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 3 improved benchmarks
❌ 7 regressed benchmarks
✅ 1571 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation decompress_rd[f64, (10000, 0.01)] 108.9 µs 139.3 µs -21.82%
Simulation decompress_rd[f64, (10000, 0.1)] 109.2 µs 139.6 µs -21.79%
Simulation decompress_rd[f64, (10000, 0.0)] 108.9 µs 139.2 µs -21.77%
Simulation decompress_rd[f32, (100000, 0.0)] 496.1 µs 583.9 µs -15.04%
Simulation decompress_rd[f32, (10000, 0.1)] 78.2 µs 91.4 µs -14.4%
Simulation decompress_rd[f32, (10000, 0.01)] 78.2 µs 91.1 µs -14.16%
Simulation decompress_rd[f32, (10000, 0.0)] 78.7 µs 91.3 µs -13.88%
Simulation bitwise_not_vortex_buffer_mut[128] 244.4 ns 215.3 ns +13.55%
Simulation bitwise_not_vortex_buffer_mut[1024] 304.7 ns 275.6 ns +10.58%
Simulation eq_i64_constant 317.9 µs 287.9 µs +10.42%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing miniex:fix/empty-struct-write-panic (13b42fc) with develop (9814173)

Open in CodSpeed

@miniex miniex marked this pull request as draft June 17, 2026 22:40
@miniex miniex marked this pull request as ready for review June 17, 2026 22:44

@onursatici onursatici left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!
This seems correct, but I think we should fix this on a lower level.
We generally shouldn't use eof if we do not intend the chunk to go towards the end of file, so although it makes it possible to write an empty chunk, I think the right behaviour of collect strategy is to not yield anything on empty input.
If you just make that change the empty stream would propagate to the children and the flat layout would then fail. I think in the flat layout if there is no chunk we should return an empty ChunkedLayout which allows for no segments to exist at all, unlike flat layout which enforces one segment. If we do this we can save from writing a segment just to store an empty array

So the shape would be:

  • CollectStrategy: empty input yields no chunks
  • FlatLayoutStrategy: empty input returns a zero-row, zero-child ChunkedLayout
  • non-empty flat input keeps the existing exactly-one-chunk behavior

@miniex miniex force-pushed the fix/empty-struct-write-panic branch from 174774e to 1d1792b Compare June 19, 2026 00:15
@miniex

miniex commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the steer - done.

CollectStrategy now yields nothing on empty input, and FlatLayoutStrategy returns an empty ChunkedLayout instead of requiring a single chunk, so no segment is written for an empty array. Dropped the eof minting as you suggested. Non-empty flat input keeps the exactly-one-chunk path.

Verified the empty file still opens and scans back to zero rows; full vortex-layout + vortex-file suites green.

@miniex miniex requested a review from onursatici June 19, 2026 00:16

@onursatici onursatici left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, one final comment and then we can merge

Comment thread vortex-layout/src/layouts/collect.rs Outdated
let collected = ChunkedArray::try_new(chunks, _dtype)?.into_array();
yield (latest_sequence_id.vortex_expect("must have visited at least one chunk"), collected);
// an empty input yields no chunk; the child layout handles it.
if let Some(sequence_id) = latest_sequence_id {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we invert this if and return early, something like:

  let Some(sequence_id) = latest_sequence_id else {
      return;
  };

  let collected = ChunkedArray::try_new(chunks, _dtype)?.into_array();
  yield (sequence_id, collected);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, switched to the let-else guard.

miniex added 3 commits June 20, 2026 00:23
`CollectStrategy` panicked when its input stream was empty (writing a zero-row
nullable struct column) because no chunk supplied a sequence id. mint the
collected chunk's id from `eof` instead.

Closes vortex-data#8347

Signed-off-by: Han Damin <miniex@daminstudio.net>
instead of minting a sequence id from `eof`, `CollectStrategy` now yields nothing
on empty input and `FlatLayoutStrategy` returns an empty `ChunkedLayout`, so no
segment is written for an empty array.

Signed-off-by: Han Damin <miniex@daminstudio.net>
Signed-off-by: Han Damin <miniex@daminstudio.net>
@miniex miniex force-pushed the fix/empty-struct-write-panic branch from 0fd5c4d to 13b42fc Compare June 19, 2026 15:24
@miniex miniex requested a review from onursatici June 19, 2026 15:25
@robert3005 robert3005 added the changelog/fix A bug fix label Jun 22, 2026
@miniex miniex requested a review from onursatici June 22, 2026 11:55
@robert3005 robert3005 merged commit 4668b6e into vortex-data:develop Jun 23, 2026
75 of 79 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/fix A bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Writing an empty table with a top-level struct column panics: "must have visited at least one chunk"

3 participants