MINOR: Preserve empty list offset buffer#30

Open
prashanthbdremio wants to merge 9 commits into main from prashanthbdremio/dx-122953

@prashanthbdremio

What's Changed

Preserve the Arrow-required empty list offset buffer entry while avoiding malformed Netty buffer state. ListVector and LargeListVector now materialize a one-entry offset buffer when an empty vector still needs to expose offset[0], then set the writer index from (valueCount + 1) * OFFSET_WIDTH.

This keeps the IPC serialization behavior from apache#967 without producing an ArrowBuf with writerIndex > capacity, which caused Netty unwrap failures in Dremio sender paths.
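
For readers who want to see the invariant in isolation, the following is a minimal sketch and not the code in this PR. It models an offset buffer with a plain `java.nio.ByteBuffer` plus a writer index; the class `EmptyListOffsetSketch`, its nested `OffsetBuffer`, and all method names are hypothetical stand-ins for ArrowBuf and the vector internals. It only illustrates why `(valueCount + 1) * OFFSET_WIDTH` exceeds the capacity of a zero-capacity buffer, and how materializing one entry first avoids that.

```java
import java.nio.ByteBuffer;

// Hypothetical model, not Arrow code: a ByteBuffer plus a writer index stands in for an ArrowBuf.
class EmptyListOffsetSketch {
    static final int OFFSET_WIDTH = 4; // 32-bit offsets, as in ListVector

    static final class OffsetBuffer {
        ByteBuffer data;
        long writerIndex;

        OffsetBuffer(int capacityBytes) {
            this.data = ByteBuffer.allocate(capacityBytes); // zero-filled
        }

        long capacity() {
            return data.capacity();
        }

        // A buffer whose writer index exceeds its capacity is malformed.
        boolean isWellFormed() {
            return writerIndex <= capacity();
        }
    }

    // Bytes needed to expose offset[0] through offset[valueCount].
    static long requiredOffsetBytes(int valueCount) {
        return (long) (valueCount + 1) * OFFSET_WIDTH;
    }

    // Problem case: sets the writer index without checking capacity.
    static void setWriterIndexUnchecked(OffsetBuffer buf, int valueCount) {
        buf.writerIndex = requiredOffsetBytes(valueCount);
    }

    // Sketch of the fix: materialize a one-entry buffer for an empty vector first.
    static void setWriterIndexMaterializing(OffsetBuffer buf, int valueCount) {
        if (valueCount == 0 && buf.capacity() < OFFSET_WIDTH) {
            buf.data = ByteBuffer.allocate(OFFSET_WIDTH); // offset[0] == 0
        }
        buf.writerIndex = requiredOffsetBytes(valueCount);
    }

    public static void main(String[] args) {
        OffsetBuffer unchecked = new OffsetBuffer(0);
        setWriterIndexUnchecked(unchecked, 0);
        System.out.println("unchecked well-formed: " + unchecked.isWellFormed());

        OffsetBuffer fixed = new OffsetBuffer(0);
        setWriterIndexMaterializing(fixed, 0);
        System.out.println("materialized well-formed: " + fixed.isWellFormed());
    }
}
```

In this model the unchecked path leaves a writer index of 4 on a buffer of capacity 0, while the materializing path ends with capacity and writer index both equal to 4.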

Testing

mvn -pl vector -am -Dmaven.gitcommitid.skip=true -Dsurefire.failIfNoSpecifiedTests=false -Dtest=TestListVector,TestLargeListVector test

Result: BUILD SUCCESS. TestListVector and TestLargeListVector passed under both Netty and Unsafe allocator executions.

Closes DX-122953.

@prashanthbdremio prashanthbdremio changed the title DX-122953 Preserve empty list offset buffer MINOR: Preserve empty list offset buffer Jun 22, 2026
@selvaganesang

LGTM

@selvaganesang

selvaganesang commented Jul 2, 2026

While debugging the newly added TestFragmentWritableBatch.emptyListVectorOffsetBufferIsInconsistentAfterUnload test in my dremio PR https://github.com/dremio/dremio/pull/25327, it became clear that the VectorUnloader call sets the offset buffer writerIndex to 4. This call in FragmentWritableBatch is what obtains the buffers used by Netty. VectorUnloader calls

private void appendNodes(
    FieldVector vector,
    List<ArrowFieldNode> nodes,
    List<ArrowBuf> buffers,
    List<Long> variadicBufferCounts) {
  nodes.add(
      new ArrowFieldNode(vector.getValueCount(), includeNullCount ? vector.getNullCount() : -1));
  List<ArrowBuf> fieldBuffers = vector.getFieldBuffers();

The last statement, vector.getFieldBuffers(), calls
public List<ArrowBuf> getFieldBuffers() {
  List<ArrowBuf> result = new ArrayList<>(2);
  setReaderAndWriterIndex();
  result.add(validityBuffer);
  result.add(offsetBuffer);

  return result;
}

A fix in setReaderAndWriterIndex is needed, and it brings ListVector in sync with BaseVariableWidthVector.setReaderAndWriterIndex, which VarCharVector uses.

It is also possible that ListVector.empty() should have created the offsetBuffer with one entry. The empty() method is available in ListVector and StructVector only.
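
A rough sketch of that alternative, under the assumption that the factory could allocate room for offset[0] up front. This is a hypothetical model, not Arrow's ListVector: `EagerOffsetSketch`, its fields, and its methods are invented for illustration, and a `java.nio.ByteBuffer` stands in for the offset ArrowBuf.

```java
import java.nio.ByteBuffer;

// Hypothetical model, not Arrow code: shows eager allocation of offset[0] in an empty() factory.
class EagerOffsetSketch {
    static final int OFFSET_WIDTH = 4; // 32-bit offsets

    final ByteBuffer offsetBuffer;
    int valueCount;
    long offsetWriterIndex;

    private EagerOffsetSketch(ByteBuffer offsetBuffer) {
        this.offsetBuffer = offsetBuffer;
    }

    // The factory reserves one zeroed offset entry instead of a zero-capacity buffer.
    static EagerOffsetSketch empty() {
        return new EagerOffsetSketch(ByteBuffer.allocate(OFFSET_WIDTH));
    }

    // With offset[0] always present, no empty-vector special case is needed here.
    void setReaderAndWriterIndex() {
        offsetWriterIndex = (long) (valueCount + 1) * OFFSET_WIDTH;
    }

    boolean offsetBufferIsWellFormed() {
        return offsetWriterIndex <= offsetBuffer.capacity();
    }

    public static void main(String[] args) {
        EagerOffsetSketch vector = EagerOffsetSketch.empty();
        vector.setReaderAndWriterIndex();
        System.out.println("well-formed: " + vector.offsetBufferIsWellFormed());
    }
}
```

The trade-off in this model is that every empty vector pays for one offset entry at creation, in exchange for a simpler index-setting path.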
