perf: introduce endpoint slice indexer #9221

Open
kkk777-7 wants to merge 2 commits into envoyproxy:main from kkk777-7:perf-fetch-ep

kkk777-7 commented Jun 13, 2026

Member

What this PR does / why we need it:

Listing EndpointSlices is currently a bottleneck in large-scale environments.
We observed this with pprof; see #7573.

The current implementation uses a label selector when listing EndpointSlices. In controller-runtime's cache client, a List call with a label selector first fetches every cached object, either in the specified namespace or across all namespaces, and then filters out the non-matching items. When there are many EndpointSlices, this becomes expensive.
https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/cache/internal/cache_reader.go#L112-L188
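
To make the cost concrete, here is a minimal standard-library sketch of a list-then-filter read. It is only a model of the behavior described above, not controller-runtime's code: the `EndpointSlice` struct, `listByLabel`, and `buildCache` are hypothetical names, and the `kubernetes.io/service-name` label key and the 2,000 Services x 2 slices data set are assumptions chosen to mirror the test environment.

```go
package main

import "fmt"

// serviceNameLabel is assumed here to be the label used to find the
// slices that belong to a Service.
const serviceNameLabel = "kubernetes.io/service-name"

// EndpointSlice is a simplified, hypothetical stand-in for the real
// Kubernetes object; only the fields needed for the illustration exist.
type EndpointSlice struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// listByLabel models a cached List with a label selector: every object
// in scope is inspected and non-matching ones are dropped. An empty
// namespace means "all namespaces". It returns the matches and the
// number of in-scope objects it had to inspect.
func listByLabel(cache []EndpointSlice, namespace, key, value string) ([]EndpointSlice, int) {
	var matches []EndpointSlice
	visited := 0
	for _, es := range cache {
		if namespace != "" && es.Namespace != namespace {
			continue
		}
		visited++
		if es.Labels[key] == value {
			matches = append(matches, es)
		}
	}
	return matches, visited
}

// buildCache creates two slices per Service in the "default" namespace
// (hypothetical data).
func buildCache(services int) []EndpointSlice {
	cache := make([]EndpointSlice, 0, services*2)
	for i := 0; i < services; i++ {
		svc := fmt.Sprintf("svc-%d", i)
		for _, suffix := range []string{"a", "b"} {
			cache = append(cache, EndpointSlice{
				Namespace: "default",
				Name:      svc + "-" + suffix,
				Labels:    map[string]string{serviceNameLabel: svc},
			})
		}
	}
	return cache
}

func main() {
	cache := buildCache(2000)
	matches, visited := listByLabel(cache, "default", serviceNameLabel, "svc-42")
	// prints "matched 2 slices after visiting 4000"
	fmt.Printf("matched %d slices after visiting %d\n", len(matches), visited)
}
```

Each lookup inspects all 4,000 cached slices to return two, so the work grows with the cache size rather than with the result size.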

This PR adds field indexers for EndpointSlices and uses indexed field selectors instead of label selectors when looking up EndpointSlices for a backend.
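
A minimal sketch of an indexed lookup, using only the standard library. It is a hedged model of the idea, not this PR's code: `Index`, `NewIndex`, `serviceKey`, and the `namespace/service` key format are hypothetical. In controller-runtime, the corresponding pieces are `FieldIndexer.IndexField` to register an index function and `client.MatchingFields` to list by the indexed value.

```go
package main

import "fmt"

// EndpointSlice is a simplified, hypothetical stand-in for the real object.
type EndpointSlice struct {
	Namespace   string
	Name        string
	ServiceName string
}

// Index groups objects by a key computed once, when the object is added.
type Index struct {
	keyFunc func(EndpointSlice) string
	entries map[string][]EndpointSlice
}

// NewIndex returns an empty index that uses keyFunc to derive keys.
func NewIndex(keyFunc func(EndpointSlice) string) *Index {
	return &Index{keyFunc: keyFunc, entries: make(map[string][]EndpointSlice)}
}

// Add stores the slice under its computed key.
func (ix *Index) Add(es EndpointSlice) {
	key := ix.keyFunc(es)
	ix.entries[key] = append(ix.entries[key], es)
}

// Lookup returns only the slices stored under key; it does not scan
// the rest of the cache.
func (ix *Index) Lookup(key string) []EndpointSlice {
	return ix.entries[key]
}

// serviceKey is an assumed key format: "<namespace>/<service name>".
func serviceKey(es EndpointSlice) string {
	return es.Namespace + "/" + es.ServiceName
}

func main() {
	ix := NewIndex(serviceKey)
	for i := 0; i < 2000; i++ {
		svc := fmt.Sprintf("svc-%d", i)
		ix.Add(EndpointSlice{Namespace: "default", Name: svc + "-a", ServiceName: svc})
		ix.Add(EndpointSlice{Namespace: "default", Name: svc + "-b", ServiceName: svc})
	}
	fmt.Println(len(ix.Lookup("default/svc-42"))) // prints 2
}
```

A lookup touches only the slices stored under one key, at the cost of keeping the extra map in memory.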

In a test environment with 2,000 Services and 4,000 EndpointSlices, this made the EndpointSlice listing path about 12x faster.

Before

[Screenshot, 2026-06-13 10:58:57 PM]
flat  flat%   sum%        cum   cum%
0.01s 0.012% 0.035%     63.40s 73.70%  sigs.k8s.io/controller-runtime/pkg/client.(*client).List
0.01s 0.012% 0.047%     63.33s 73.62%  sigs.k8s.io/controller-runtime/pkg/cache.(*delegatingByGVKCache).List
0     0% 0.047%     63.29s 73.58%  sigs.k8s.io/controller-runtime/pkg/cache.(*informerCache).List
8.27s  9.61%  9.66%     63.15s 73.41%  sigs.k8s.io/controller-runtime/pkg/cache/internal.(*CacheReader).List

After

[Screenshot, 2026-06-13 5:36:43 PM]
flat  flat%   sum%        cum   cum%
0.01s 0.012% 33.86%      5.25s  6.04%  sigs.k8s.io/controller-runtime/pkg/client.(*client).List
0.03s 0.035% 33.89%      5.07s  5.84%  sigs.k8s.io/controller-runtime/pkg/cache.(*delegatingByGVKCache).List
0.01s 0.012% 33.91%      4.87s  5.61%  sigs.k8s.io/controller-runtime/pkg/cache.(*informerCache).List
0.11s  0.13% 35.11%      4.09s  4.71%  sigs.k8s.io/controller-runtime/pkg/cache/internal.(*CacheReader).List

Which issue(s) this PR fixes:

Fixes #7573

Release Notes: Yes

Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>

netlify Bot commented Jun 13, 2026


Deploy Preview for cerulean-figolla-1f9435 ready!

Name Link
🔨 Latest commit e78da29
🔍 Latest deploy log https://app.netlify.com/projects/cerulean-figolla-1f9435/deploys/6a2d660f945d7000086efdd2
😎 Deploy Preview https://deploy-preview-9221--cerulean-figolla-1f9435.netlify.app

codecov Bot commented Jun 13, 2026


Codecov Report

❌ Patch coverage is 75.00000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.91%. Comparing base (6578a6d) to head (e78da29).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/provider/kubernetes/indexers.go | 75.00% | 2 Missing and 2 partials ⚠️ |
| internal/provider/kubernetes/controller.go | 70.00% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9221      +/-   ##
==========================================
+ Coverage   74.89%   74.91%   +0.01%     
==========================================
  Files         252      252              
  Lines       40799    40819      +20     
==========================================
+ Hits        30558    30581      +23     
+ Misses       8158     8153       -5     
- Partials     2083     2085       +2     


Signed-off-by: kkk777-7 <kota.kimura0725@gmail.com>
kkk777-7 marked this pull request as ready for review June 13, 2026 14:15
kkk777-7 requested a review from a team as a code owner June 13, 2026 14:15
kkk777-7 added this to the v1.9.0-rc.1 Release milestone Jun 13, 2026
@kkk777-7

Member Author

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Bravo.

Reviewed commit: e78da29345


zirain commented Jun 14, 2026

Member

This is a tradeoff between memory and CPU. Should we add a feature flag?
cc @arkodg

@kkk777-7

Member Author

> This is a tradeoff between memory and CPU. Should we add a feature flag?

Yes, I observed increased memory usage (in an environment with 2,000 Services and 4,000 EndpointSlices):

| Metric | Before | After | Difference |
|---|---|---|---|
| container_memory_working_set_bytes avg | 109 MiB | 123 MiB | +14 MiB |
| container_memory_working_set_bytes max | 167 MiB | 214 MiB | +47 MiB |

Measured using [avg|max]_over_time(container_memory_working_set_bytes{namespace="envoy-gateway-system", container="envoy-gateway"}[1m]) / 1024 / 1024, which converts bytes to MiB.

Before

[Screenshot, 2026-06-15 2:06:09 AM]

After

[Screenshot, 2026-06-15 1:10:11 AM]

kkk777-7 commented Jun 14, 2026

Member Author

So I agree that it would be better to let users choose the controller's behavior.
I'll update this PR next weekend.

jukie requested review from a team June 15, 2026 21:43

jukie commented Jun 15, 2026

Contributor

Agreed on adding config options for it, but I would propose we make this the default behavior.

zirain commented Jun 16, 2026

Member

> Agreed on adding config options for it, but I would propose we make this the default behavior.

Let's discuss this in the meeting this week.

Development

Successfully merging this pull request may close these issues.

Perf: cpu spikes

3 participants