improve slow query logging with additional context and statistics#7601

Open
SungJin1212 wants to merge 3 commits into cortexproject:master from SungJin1212:enhance-slow-query-log

Conversation

@SungJin1212

This PR enhances the slow query logging in the Query Frontend to provide more detailed information for diagnosing slow queries.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

@dosubot dosubot Bot added the component/query-frontend, go, and type/observability labels Jun 8, 2026
@SungJin1212 SungJin1212 requested a review from friedrichg June 10, 2026 01:42
Comment thread CHANGELOG.md Outdated
* [ENHANCEMENT] Distributor: Added `cortex_distributor_received_histogram_buckets` metric to track number of buckets in received native histogram samples before validation, per user. #7569
* [ENHANCEMENT] Distributor: Add `WrappedHistogram` with configurable size limit (`-validation.max-native-histogram-size-bytes`) to cap native histogram protobuf size before unmarshalling. #7570
* [ENHANCEMENT] Ingester: Add lazy regex evaluation on head postings cache miss. Defers expensive regex matchers on high-cardinality labels to per-series filtering when a selective equality matcher already narrows the result set. Configured via `-blocks-storage.expanded_postings_cache.head.lazy-matcher-max-cardinality` (disabled by default). #7553
* [ENHANCEMENT] Query Frontend: Improve the slow query log with `component`, `source`, `user_agent`, `engine_type`, `block_store_type`, and query stats fields to aid slow query diagnosis. #7601

I don't see `component` in the code. Is it there?

@friedrichg friedrichg left a comment

Thanks for doing this.

No tests updated? We need tests for this!

@SungJin1212 SungJin1212 force-pushed the enhance-slow-query-log branch from bbd61e2 to 7f371a1 Compare June 17, 2026 10:52
@pull-request-size pull-request-size Bot added size/L and removed size/M labels Jun 17, 2026
@SungJin1212

SungJin1212 commented Jun 17, 2026

@friedrichg
I updated the tests and the changelog.
Do we need to add the user ID (the joined user IDs)? What do you think?

With a single tenant, `reportSlowQuery` logs an `org_id`, but with multiple tenants (tenant federation) it does not.

// Single-tenant log
ts=2026-06-11T02:21:10.73096747Z caller=handler.go:438 level=info x-request-logging-headers-key= x-cortex-request-id=c3268102-a59c-4502-8d76-01ab19a81f99 org_id=user-xxxx msg="slow query detected"

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
@SungJin1212 SungJin1212 force-pushed the enhance-slow-query-log branch from 1eadf7c to b34eada Compare June 18, 2026 01:39
Comment thread CHANGELOG.md
Comment on lines -4 to +7
* [FEATURE] Distributor: Add experimental `-distributor.num-query-workers` flag to use a goroutine worker pool for query fan-out calls to ingesters. Reuses pre-grown goroutine stacks to eliminate the `runtime.copystack` overhead (~8% CPU) observed on rulers with wide ingester fan-out. Falls back to spawning a new goroutine when no worker is available. #7623
* [CHANGE] Querier: Make query time range configurations per-tenant: `query_ingesters_within`, `query_store_after`, and `shuffle_sharding_ingesters_lookback_period`. Uses `model.Duration` instead of `time.Duration` to support serialization but has minimum unit of 1ms (nanoseconds/microseconds not supported). #7160
* [CHANGE] Cache: Setting `-blocks-storage.bucket-store.metadata-cache.bucket-index-content-ttl` to 0 will disable the bucket-index cache. #7446
* [CHANGE] HA Tracker: Move `-distributor.ha-tracker.failover-timeout` from a global config to a per-tenant runtime config. The flag name and default value (30s) remain the same. #7481
* [FEATURE] Distributor: Add experimental `-distributor.num-query-workers` flag to use a goroutine worker pool for query fan-out calls to ingesters. Reuses pre-grown goroutine stacks to eliminate the `runtime.copystack` overhead (~8% CPU) observed on rulers with wide ingester fan-out. Falls back to spawning a new goroutine when no worker is available. #7623

fix ordering

