
Avoid per-entry array allocation in Request#build_headers#2770

Merged
dblock merged 2 commits into master from perf/build-headers-each-header
Jun 21, 2026

ericproulx commented Jun 21, 2026

Summary

Grape::Request#build_headers builds the request headers hash by walking the Rack env. It did this with each_header.with_object(...), which allocates a throwaway [k, v] Array for every env entry it visits. This PR replaces with_object with a plain block, eliminating that per-entry allocation.

This came out of a memory_profiler run against a grape-on-rack app, where build_headers was the single highest object-count allocation site (~24k objects in the sampled run).

The problem

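```ruby
def build_headers
  # each_header without a block returns an Enumerator; with_object then hands
  # the block each (k, v) pair packed into a new Array, plus the memo.
  each_header.with_object(Grape::Util::Header.new) do |(k, v), headers|
    # Only CGI-style HTTP_* env keys are request headers.
    next unless k.start_with? 'HTTP_'

    # Known keys map directly; otherwise strip HTTP_, dash-ify, and downcase.
    transformed_header = KNOWN_HEADERS.fetch(k) { -k[5..].tr('_', '-').downcase }
    headers[transformed_header] = v
  end
end
```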

Enumerator#with_object always yields the block the same shape: one value plus the memo, i.e. { |value, memo| }. But each_header produces two values per iteration (k and v). To fit both into that single value slot, Ruby boxes them into a fresh [k, v] Array on every iteration, which |(k, v), headers| then destructures back apart. So every env entry goes through the same cycle: allocate an array, pass it, destructure it, discard it, and eventually GC it. That is pure overhead, repeated for each entry every time the headers are built.
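A minimal sketch makes that shape visible. It uses a plain Hash as a stand-in for the Rack env (an assumption; the keys and values are hypothetical) and captures the block arguments with a splat, which shows that Enumerator#with_object passes exactly two things per step and that the first is a single [k, v] Array.

```ruby
# Plain Hash standing in for a Rack env; keys and values are hypothetical.
env = { 'HTTP_HOST' => 'example.com', 'REQUEST_METHOD' => 'GET' }

# Enumerator#with_object yields two block arguments per step: the element
# and the memo. A splat captures both exactly as they arrive.
first_yield = nil
env.each.with_object({}) { |*args| first_yield ||= args }

p first_yield          # => [["HTTP_HOST", "example.com"], {}]
p first_yield[0].class # => Array

# The |(k, v), memo| form used by build_headers just destructures that Array.
filtered = env.each.with_object({}) do |(k, v), memo|
  memo[k] = v if k.start_with?('HTTP_')
end
```

The destructuring in the last block is free syntactically, but the Array it unpacks still had to be built first.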

The Enumerator object itself is not the cost: isolating it (enum.each { |k, v| }, with no packing) comes within ~5% of a direct block. The array packing accounts for essentially the entire gap.
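The isolation can be approximated with stdlib-only allocation counting. This is a sketch rather than the PR's benchmark: it assumes GC.stat(:total_allocated_objects) as the counter and uses a hypothetical 30-entry Hash in place of a Rack env. It compares with_object, a plain Enumerator#each, and a direct block; exact counts depend on the Ruby version, so no specific numbers are claimed.

```ruby
# Objects allocated while running the given block (stdlib only).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical env with 30 HTTP_ entries standing in for a real request.
env = (1..30).to_h { |i| ["HTTP_X_HEADER_#{i}", "value-#{i}"] }

variants = {
  # Enumerator + with_object: the block receives packed [k, v] pairs.
  with_object: -> { env.each.with_object({}) { |(k, v), memo| memo } },
  # Enumerator#each passes the two-parameter block straight through.
  enum_each: -> { memo = {}; env.each.each { |k, v| memo }; memo },
  # Direct block on Hash#each, no Enumerator at all.
  direct: -> { memo = {}; env.each { |k, v| memo }; memo }
}

variants.each_value(&:call) # warm up once before measuring

counts = variants.transform_values { |fn| allocations(&fn) }
counts.each { |name, n| puts "#{name}: #{n} objects" }
```

On the PR's reading, enum_each should land close to direct, while with_object should grow with the number of env entries.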

The fix

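```ruby
def build_headers
  headers = Grape::Util::Header.new
  # A two-parameter block receives k and v as separate arguments, so no
  # [k, v] Array is built per entry; headers is a plain closed-over local.
  each_header do |k, v|
    next unless k.start_with? 'HTTP_'

    transformed_header = KNOWN_HEADERS.fetch(k) { -k[5..].tr('_', '-').downcase }
    headers[transformed_header] = v
  end
  headers
end
```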

each_header { |k, v| } passes k and v as separate block arguments (a normal multi-value yield), so nothing is boxed into an array. The accumulator becomes a plain closed-over local, so with_object's memo-threading is no longer needed.

Correctness

Output is byte-identical. each_header is Rack::Request::Env#each_header (full env iteration), and the HTTP_-prefix filter plus KNOWN_HEADERS/fallback transform are unchanged. build_headers is still lazy — it only runs when an endpoint reads headers (@headers ||= build_headers).
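The equivalence claim can be checked with a small self-contained sketch. SketchRequest, its two-entry KNOWN_HEADERS, and the use of a plain Hash in place of Grape::Util::Header are all hypothetical stand-ins; each_header is assumed to delegate to the env Hash's #each, as Rack::Request::Env#each_header does. Both method bodies otherwise follow the before and after code above.

```ruby
# Hypothetical stand-in for a Grape request; not Grape's real class.
class SketchRequest
  # Tiny hypothetical subset of KNOWN_HEADERS, for illustration only.
  KNOWN_HEADERS = {
    'HTTP_ACCEPT' => 'accept',
    'HTTP_USER_AGENT' => 'user-agent'
  }.freeze

  def initialize(env)
    @env = env
  end

  # Assumed to delegate to the env Hash, like Rack's each_header.
  # Without a block, Hash#each returns an Enumerator.
  def each_header(&block)
    @env.each(&block)
  end

  # Lazy and memoized, mirroring `@headers ||= build_headers`.
  def headers
    @headers ||= build_headers
  end

  # Before: Enumerator#with_object; the block receives packed [k, v] pairs.
  def build_headers_before
    each_header.with_object({}) do |(k, v), headers|
      next unless k.start_with? 'HTTP_'

      transformed_header = KNOWN_HEADERS.fetch(k) { -k[5..].tr('_', '-').downcase }
      headers[transformed_header] = v
    end
  end

  # After: plain block with separate k, v writing into a closed-over local.
  def build_headers
    headers = {}
    each_header do |k, v|
      next unless k.start_with? 'HTTP_'

      transformed_header = KNOWN_HEADERS.fetch(k) { -k[5..].tr('_', '-').downcase }
      headers[transformed_header] = v
    end
    headers
  end
end

env = {
  'REQUEST_METHOD' => 'GET',
  'HTTP_ACCEPT' => 'application/json',
  'HTTP_X_REQUEST_ID' => 'abc123',
  'rack.url_scheme' => 'https'
}
req = SketchRequest.new(env)
p req.build_headers_before.to_a == req.build_headers.to_a # => true
```

Comparing with to_a also checks insertion order, since both variants visit the env in the same order.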

Measurements

On a request with ~30 headers (benchmark/ips + GC.stat object counts, Ruby 4.0.5):

| Variant | Objects/call | Throughput |
|---|---|---|
| `each_header.with_object` (before) | 37 | 1.0x (baseline) |
| `each_header { \|k, v\| }` (after) | 21 | 1.36x faster |

~43% fewer objects and ~1.36x faster, with no behavior change.
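An objects/call figure like the one in the table can be approximated without extra gems by averaging GC.stat(:total_allocated_objects) over many calls. This sketch is an assumption about method, not the PR's harness (which also used benchmark/ips): SAMPLE_ENV is a hypothetical env with 30 HTTP_ entries, and both shapes drop the KNOWN_HEADERS transform to keep only the iteration difference.

```ruby
# Hypothetical env: two non-HTTP keys plus 30 HTTP_ entries.
SAMPLE_ENV = { 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/' }.merge(
  (1..30).to_h { |i| ["HTTP_X_HEADER_#{i}", "v#{i}"] }
).freeze

# Simplified before shape: same filter, KNOWN_HEADERS transform omitted.
def with_object_shape(env)
  env.each.with_object({}) do |(k, v), headers|
    next unless k.start_with?('HTTP_')

    headers[k] = v
  end
end

# Simplified after shape: plain two-parameter block, closed-over local.
def plain_block_shape(env)
  headers = {}
  env.each do |k, v|
    next unless k.start_with?('HTTP_')

    headers[k] = v
  end
  headers
end

# Average objects allocated per call, using GC.stat's running total.
def objects_per_call(iterations = 1_000)
  yield # warm-up call, excluded from the count
  start = GC.stat(:total_allocated_objects)
  iterations.times { yield }
  (GC.stat(:total_allocated_objects) - start).fdiv(iterations)
end

before = objects_per_call { with_object_shape(SAMPLE_ENV) }
after  = objects_per_call { plain_block_shape(SAMPLE_ENV) }
puts format('before: %.1f, after: %.1f objects/call (%.0f%% fewer)',
            before, after, 100 * (1 - after / before))
```

Absolute counts will differ from the table because the sketch leaves out Grape::Util::Header and the header-name transform; only the direction of the change is expected to carry over.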

ericproulx force-pushed the perf/build-headers-each-header branch from 61d32f8 to ad631db on Jun 21, 2026 11:23
github-actions bot commented Jun 21, 2026

Danger Report

No issues found.

ericproulx requested a review from dblock on Jun 21, 2026 11:24

`build_headers` walked the env with `each_header.with_object(...)`. Because
`Enumerator#with_object` hands the block a single value plus the memo, the
two values `each_header` yields (`k`, `v`) get boxed into a throwaway
`[k, v]` Array on every iteration — which the `|(k, v), headers|` destructure
then immediately unpacks. That is one array allocated, packed, destructured,
and discarded per request header.

Drop `with_object` for a plain `each_header do |k, v|` block writing into a
pre-built `Grape::Util::Header`. `k` and `v` arrive as separate block args
(normal multi-value yield), so no array is boxed, and the accumulator is just
a closed-over local.

Output is byte-identical (`each_header` is `Rack::Request::Env#each_header`,
i.e. full env iteration; the `HTTP_` filter is unchanged). `build_headers`
runs lazily, only when an endpoint reads `headers`.

Measured on a request with ~30 headers: ~43% fewer objects (37 -> 21 per
call) and ~1.36x faster, with the array packing — not the Enumerator object
itself — accounting for the entire gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ericproulx force-pushed the perf/build-headers-each-header branch from ad631db to 73eb9a8 on Jun 21, 2026 11:28
dblock merged commit 7ac57ff into master on Jun 21, 2026
69 checks passed
dblock deleted the perf/build-headers-each-header branch on Jun 21, 2026 21:10