refactor(approx_distinct): centralize grouped HLL type dispatch#22985

Closed
EdsonPetry wants to merge 2 commits into
apache:mainfrom
EdsonPetry:edsonpetry/issue-22819

Conversation

@EdsonPetry EdsonPetry commented Jun 16, 2026


Which issue does this PR close?

Rationale for this change

approx_distinct's grouped HyperLogLog fast path encoded its supported input types in two independent places: the is_hll_groups_type predicate (backing groups_accumulator_supported) and the match in create_groups_accumulator. The two share a contract: if groups_accumulator_supported returns true, create_groups_accumulator must succeed. Nothing enforced that contract, so a future type addition or edit could update one path and not the other. The result would be either a type silently dropping to the slow GroupsAccumulatorAdapter path, or a runtime not_impl_err for a type advertised as supported.

What changes are included in this PR?

  • Introduce create_hll_groups_accumulator(&DataType) -> Option<Box<dyn GroupsAccumulator>> as the single source of truth for grouped HLL dispatch (Some = supported, None = falls back to the per-group Accumulator path).
  • Rewire groups_accumulator_supported to create_hll_groups_accumulator(..).is_some() and create_groups_accumulator to wrap the helper, re-attaching the existing not_impl_err! message for unsupported types.
  • Remove the now-redundant is_hll_groups_type predicate; its rationale moves onto the helper's doc comment.

This PR does not change which types are supported or any runtime behavior; it only consolidates the dispatch.
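For readers unfamiliar with the pattern, here is a minimal sketch of the shape described above. It is not the code from this PR: `DataType`, `GroupsAccumulator`, and `HllGroups` are simplified stand-ins for the real Arrow/DataFusion types, the supported set (`Int64`, `Utf8`) is arbitrary, and the error text is a placeholder rather than the existing not_impl_err! message.

```rust
// Simplified stand-ins for illustration only; the real types live in
// arrow and datafusion and are much richer than this.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Other, // stands in for every type without a grouped HLL fast path
}

trait GroupsAccumulator {
    fn name(&self) -> &'static str;
}

// Hypothetical accumulator; a real one would hold per-group HLL sketches.
struct HllGroups(&'static str);

impl GroupsAccumulator for HllGroups {
    fn name(&self) -> &'static str {
        self.0
    }
}

/// Single source of truth: `Some` means the type has a grouped fast path,
/// `None` means the caller falls back to the per-group accumulator path.
fn create_hll_groups_accumulator(dt: &DataType) -> Option<Box<dyn GroupsAccumulator>> {
    match dt {
        DataType::Int64 => Some(Box::new(HllGroups("int64"))),
        DataType::Utf8 => Some(Box::new(HllGroups("utf8"))),
        DataType::Other => None,
    }
}

/// The predicate is derived from the helper, so it cannot drift from it.
fn groups_accumulator_supported(dt: &DataType) -> bool {
    create_hll_groups_accumulator(dt).is_some()
}

/// The constructor wraps the same helper and attaches an error for `None`.
/// The message here is a placeholder, not the one used in DataFusion.
fn create_groups_accumulator(dt: &DataType) -> Result<Box<dyn GroupsAccumulator>, String> {
    create_hll_groups_accumulator(dt)
        .ok_or_else(|| format!("grouped HLL is not implemented for {dt:?}"))
}
```

Because both public entry points call the same `match`, adding a type means touching exactly one arm. The trade-off in this sketch is that the predicate allocates and immediately drops an accumulator, which is presumably acceptable when construction is cheap.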

Are these changes tested?

Yes. A new unit test, grouped_support_predicate_matches_constructor, drives a table of representative supported and unsupported DataTypes and asserts that both groups_accumulator_supported and create_groups_accumulator(..).is_ok() agree with the expected support for each, pinning both halves of the contract. It includes the unsupported time/unit combinations called out in the issue (Time32(Microsecond), Time32(Nanosecond), Time64(Second), Time64(Millisecond)) as regression guards. All existing approx_distinct grouped tests continue to pass.
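A table-driven check of this kind could look roughly like the sketch below. It is an illustration, not the test from this PR: the enums are cut-down stand-ins for Arrow's `DataType` and `TimeUnit`, `HllGroups` is a placeholder, and the supported rows (`Int32`, `Time32(Second)`, `Time64(Nanosecond)`) are assumptions. Only the four unsupported time/unit combinations are taken from the description above.

```rust
// Cut-down stand-ins for Arrow's TimeUnit and DataType (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Time32(TimeUnit),
    Time64(TimeUnit),
}

// Placeholder for a grouped HLL accumulator.
struct HllGroups;

// Shared helper; the supported arms here are assumed for the sketch.
fn create_hll_groups_accumulator(dt: &DataType) -> Option<HllGroups> {
    use TimeUnit::*;
    match dt {
        DataType::Int32
        | DataType::Time32(Second | Millisecond)
        | DataType::Time64(Microsecond | Nanosecond) => Some(HllGroups),
        _ => None,
    }
}

fn groups_accumulator_supported(dt: &DataType) -> bool {
    create_hll_groups_accumulator(dt).is_some()
}

fn create_groups_accumulator(dt: &DataType) -> Result<HllGroups, String> {
    create_hll_groups_accumulator(dt).ok_or_else(|| format!("unsupported type {dt:?}"))
}

/// Table of (type, expected support). The `false` rows are the
/// combinations named in the PR description.
fn representative_cases() -> Vec<(DataType, bool)> {
    use TimeUnit::*;
    vec![
        (DataType::Int32, true),
        (DataType::Time32(Second), true),
        (DataType::Time64(Nanosecond), true),
        (DataType::Time32(Microsecond), false),
        (DataType::Time32(Nanosecond), false),
        (DataType::Time64(Second), false),
        (DataType::Time64(Millisecond), false),
    ]
}

/// Returns every type where the predicate or the constructor disagrees
/// with the expected support, so an empty result means the contract holds.
fn contract_violations(cases: &[(DataType, bool)]) -> Vec<DataType> {
    cases
        .iter()
        .filter(|(dt, expected)| {
            groups_accumulator_supported(dt) != *expected
                || create_groups_accumulator(dt).is_ok() != *expected
        })
        .map(|(dt, _)| *dt)
        .collect()
}
```

Checking both functions against an explicit expectation, rather than only against each other, also catches the case where a type is dropped from the helper by accident and both halves agree on the wrong answer.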

Are there any user-facing changes?

No.

Make create_hll_groups_accumulator the single source of truth for the
grouped HyperLogLog fast path, replacing the parallel type lists in
groups_accumulator_supported and create_groups_accumulator. Add a unit
test asserting the support predicate and constructor agree for every
type.
@github-actions github-actions Bot added the functions Changes to functions implementation label Jun 16, 2026
@EdsonPetry EdsonPetry marked this pull request as ready for review June 16, 2026 22:10
@2010YOUY01

Contributor

Sorry I missed this issue. This should already be cleaned up by a related refactor: #22921

@EdsonPetry

Author

No problem! Thanks for letting me know.

@EdsonPetry EdsonPetry closed this Jun 17, 2026
@EdsonPetry EdsonPetry deleted the edsonpetry/issue-22819 branch June 17, 2026 01:22


Development

Successfully merging this pull request may close these issues.

Centralize approx_distinct grouped HLL dispatch

2 participants