fix(layout): don't panic collecting an empty stream#8472
Conversation
Merging this PR will not alter performance
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | decompress_rd[f64, (10000, 0.01)] |
108.9 µs | 139.3 µs | -21.82% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.1)] |
109.2 µs | 139.6 µs | -21.79% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.0)] |
108.9 µs | 139.2 µs | -21.77% |
| ❌ | Simulation | decompress_rd[f32, (100000, 0.0)] |
496.1 µs | 583.9 µs | -15.04% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.1)] |
78.2 µs | 91.4 µs | -14.4% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.01)] |
78.2 µs | 91.1 µs | -14.16% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.0)] |
78.7 µs | 91.3 µs | -13.88% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] |
244.4 ns | 215.3 ns | +13.55% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] |
304.7 ns | 275.6 ns | +10.58% |
| ⚡ | Simulation | eq_i64_constant |
317.9 µs | 287.9 µs | +10.42% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing miniex:fix/empty-struct-write-panic (13b42fc) with develop (9814173)
There was a problem hiding this comment.
Thanks for the contribution!
This seems correct, but I think we should fix this on a lower level.
We generally shouldn't use eof if we do not intend the chunk to go towards the end of file, so although it makes it possible to write an empty chunk, I think the right behaviour of collect strategy is to not yield anything on empty input.
If you just make that change the empty stream would propagate to the children and the flat layout would then fail. I think in the flat layout if there is no chunk we should return an empty ChunkedLayout which allows for no segments to exist at all, unlike flat layout which enforces one segment. If we do this we can save from writing a segment just to store an empty array
So the shape would be:
CollectStrategy: empty input yields no chunksFlatLayoutStrategy: empty input returns a zero-row, zero-childChunkedLayout- non-empty flat input keeps the existing exactly-one-chunk behavior
174774e to
1d1792b
Compare
|
Thanks for the steer - done.
Verified the empty file still opens and scans back to zero rows; full |
onursatici
left a comment
There was a problem hiding this comment.
thanks, one final comment and then we can merge
| let collected = ChunkedArray::try_new(chunks, _dtype)?.into_array(); | ||
| yield (latest_sequence_id.vortex_expect("must have visited at least one chunk"), collected); | ||
| // an empty input yields no chunk; the child layout handles it. | ||
| if let Some(sequence_id) = latest_sequence_id { |
There was a problem hiding this comment.
shall we invert this if and return early, something like:
let Some(sequence_id) = latest_sequence_id else {
return;
};
let collected = ChunkedArray::try_new(chunks, _dtype)?.into_array();
yield (sequence_id, collected);There was a problem hiding this comment.
done, switched to the let-else guard.
`CollectStrategy` panicked when its input stream was empty (writing a zero-row nullable struct column) because no chunk supplied a sequence id. mint the collected chunk's id from `eof` instead. Closes vortex-data#8347 Signed-off-by: Han Damin <miniex@daminstudio.net>
instead of minting a sequence id from `eof`, `CollectStrategy` now yields nothing on empty input and `FlatLayoutStrategy` returns an empty `ChunkedLayout`, so no segment is written for an empty array. Signed-off-by: Han Damin <miniex@daminstudio.net>
Signed-off-by: Han Damin <miniex@daminstudio.net>
0fd5c4d to
13b42fc
Compare
Summary
CollectStrategycollects its whole input into a single chunk for a child that requires exactly one chunk. When the input stream is empty, e.g. writing a zero-row table with a nullable struct column whose validity substream is empty, no chunk supplied a sequence id and it panicked withmust have visited at least one chunk.CollectStrategynow yields nothing on empty input, andFlatLayoutStrategyreturns an emptyChunkedLayoutinstead of requiring a single chunk, so no segment is written for an empty array.The issue mentions the fuzzer's
assume()guard could be dropped once this is fixed. I left it in place here: reading a nullable struct nested in a struct back is a separate bug (#8348), so the round-trip only works once both are fixed.Closes: #8347
Testing
Added a regression test in
vortex-file/tests/test_write_table.rsthat writes a zero-row nullable struct column; it panicked before this change and now writes and reads back as zero rows. The issue's Python repro (vx.io.writeof an emptystructcolumn) no longer panics.cargo nextest run -p vortex-layout -p vortex-filepasses (177 tests, including the segment-ordering tests).fmt --all+clippy --all-targets --all-featuresclean.I'm Korean, so sorry if any wording reads a little awkward.