
GCP: Fix GCSInputStream.readTail partial read returning short count #16894

Draft
thswlsqls wants to merge 1 commit into apache:main from thswlsqls:fix/gcs-readtail-partial-read

@thswlsqls

Contributor

Summary

  • GCSInputStream.readTail calls the private read() helper once, and that helper does a single readChannel.read(buffer). A GCS NIO ReadChannel may return fewer bytes than requested even when more data remains, so readTail can return a short count and leave the tail only partially read, which violates the RangeReadable.readTail contract.
  • Fix: loop the channel read until the requested length is filled or EOF. S3InputStream.readTail and ADLSInputStream.readTail already loop via IOUtil.readRemaining; GCS uses a ReadChannel rather than an InputStream, so it needs a channel-based loop instead.
  • Only readTail changes (readFully is a separate concern).
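
For readers unfamiliar with the pattern, here is a minimal sketch of a channel-based read loop of the kind described above. It is not the code in this PR: the class and method names (ChannelReadLoop, readUntilFull) are hypothetical, and the standard java.nio ReadableByteChannel stands in for the GCS ReadChannel so the example runs without the GCS client library. It also assumes a blocking channel; a channel that keeps returning 0 would need extra handling.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not the implementation in this PR.
class ChannelReadLoop {

  /**
   * Reads from the channel until the buffer is full or the channel reports EOF (-1).
   * Returns the total number of bytes placed in the buffer.
   * Assumes a blocking channel (read does not return 0 indefinitely).
   */
  static int readUntilFull(ReadableByteChannel channel, ByteBuffer buffer) throws IOException {
    int total = 0;
    while (buffer.hasRemaining()) {
      int bytesRead = channel.read(buffer);
      if (bytesRead < 0) {
        break; // EOF: stop and report what was read so far
      }
      total += bytesRead; // a single read may be short, so keep looping
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
    ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(data));
    ByteBuffer buffer = ByteBuffer.allocate(4);
    int read = readUntilFull(channel, buffer);
    System.out.println("bytes read: " + read);
  }
}
```

The loop exits on either condition named in the fix: the buffer is full, or the channel signals EOF. In the EOF case the returned count can still be smaller than the requested length, which is the legitimate short-count case.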

Testing done

  • Added TestGCSInputStream#testReadTailPartialRead: a Mockito mock ReadChannel fills the buffer in two partial reads; asserts readTail returns the full length with the expected bytes.
  • The test fails against the pre-fix single-read code and passes after the fix.
  • ./gradlew :iceberg-gcp:check passed (test, spotlessCheck, checkstyle, errorProne). GCP is not a revapi module.
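
The partial-read scenario can also be reproduced without Mockito. The sketch below is an illustration under stated assumptions, not the test added in this PR: ChunkedChannel, singleRead, and loopedRead are hypothetical names, and a hand-written ReadableByteChannel that hands out at most maxChunk bytes per call plays the role of the mocked ReadChannel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical demo for illustration only; not TestGCSInputStream from this PR.
class PartialReadDemo {

  /** Stub channel that returns at most maxChunk bytes per read call. */
  static final class ChunkedChannel implements ReadableByteChannel {
    private final ByteBuffer source;
    private final int maxChunk;
    private boolean open = true;

    ChunkedChannel(byte[] data, int maxChunk) {
      this.source = ByteBuffer.wrap(data);
      this.maxChunk = maxChunk;
    }

    @Override
    public int read(ByteBuffer dst) {
      if (!source.hasRemaining()) {
        return -1; // EOF
      }
      int n = Math.min(maxChunk, Math.min(dst.remaining(), source.remaining()));
      ByteBuffer slice = source.duplicate();
      slice.limit(slice.position() + n);
      dst.put(slice); // copy exactly n bytes
      source.position(source.position() + n);
      return n;
    }

    @Override
    public boolean isOpen() {
      return open;
    }

    @Override
    public void close() {
      open = false;
    }
  }

  /** Pre-fix style: one read call, which may come back short. */
  static int singleRead(ReadableByteChannel channel, ByteBuffer buffer) throws IOException {
    return channel.read(buffer);
  }

  /** Post-fix style: keep reading until the buffer is full or EOF. */
  static int loopedRead(ReadableByteChannel channel, ByteBuffer buffer) throws IOException {
    int total = 0;
    while (buffer.hasRemaining()) {
      int n = channel.read(buffer);
      if (n < 0) {
        break;
      }
      total += n;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {1, 2, 3, 4, 5, 6};
    int single = singleRead(new ChunkedChannel(data, 3), ByteBuffer.allocate(6));
    int looped = loopedRead(new ChunkedChannel(data, 3), ByteBuffer.allocate(6));
    System.out.println("single=" + single + " looped=" + looped);
  }
}
```

With a 6-byte source served in chunks of 3, the single read stops after the first chunk while the loop fills the buffer. That is the same split between failing and passing that the test bullets describe.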

readTail called the private read() helper once, and that helper does a
single readChannel.read(buffer). A GCS NIO channel may return fewer bytes
than requested even when more data remains, so readTail could return a
short count and leave the tail partially read. This breaks the
RangeReadable.readTail contract.

Loop the channel read until the requested length is filled or EOF is
reached, matching S3InputStream and ADLSInputStream which already loop
via IOUtil.readRemaining. GCS uses a ReadChannel rather than an
InputStream, so it needs a channel-based loop instead.

Generated-by: Claude Code
github-actions bot added the GCP label on Jun 21, 2026