refactor(approx_distinct): centralize grouped HLL type dispatch#22985
Closed
EdsonPetry wants to merge 2 commits into
Closed
refactor(approx_distinct): centralize grouped HLL type dispatch#22985EdsonPetry wants to merge 2 commits into
EdsonPetry wants to merge 2 commits into
Conversation
Make create_hll_groups_accumulator the single source of truth for the grouped HyperLogLog fast path, replacing the parallel type lists in groups_accumulator_supported and create_groups_accumulator. Add a unit test asserting the support predicate and constructor agree for every type.
Contributor
|
Sorry I missed this issue. This should already be cleaned up by a related refactor: #22921 |
Author
|
No problem! Thanks for letting me know. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
approx_distinctgrouped HLL dispatch #22819.Rationale for this change
approx_distinct's grouped HyperLogLog fast path encoded its supported input types in two independent places: theis_hll_groups_typepredicate (backinggroups_accumulator_supported) and thematchincreate_groups_accumulator. The two share a contract, ifgroups_accumulator_supportedreturnstrue,create_groups_accumulatormust succeed, but nothing enforced it. A future type addition or edit could update one path and not the other, either silently dropping a type to the slowGroupsAccumulatorAdapterpath or producing a runtimenot_impl_errfor an advertised-supported type.What changes are included in this PR?
create_hll_groups_accumulator(&DataType) -> Option<Box<dyn GroupsAccumulator>>as the single source of truth for grouped HLL dispatch (Some= supported,None= falls back to the per-groupAccumulatorpath).groups_accumulator_supportedtocreate_hll_groups_accumulator(..).is_some()andcreate_groups_accumulatorto wrap the helper, re-attaching the existingnot_impl_err!message for unsupported types.is_hll_groups_typepredicate; its rationale moves onto the helper's doc comment.No change to which types are supported or to runtime behavior, only consolidation.
Are these changes tested?
Yes. A new unit test,
grouped_support_predicate_matches_constructor, drives a table of representative supported and unsupportedDataTypes and asserts bothgroups_accumulator_supportedandcreate_groups_accumulator(..).is_ok()agree with the expected support for each, pinning both halves of the contract. It includes the unsupported time/unit combinations called out in the issue (Time32(Microsecond),Time32(Nanosecond),Time64(Second),Time64(Millisecond)) as regression guards. All existingapprox_distinctgrouped tests continue to pass.Are there any user-facing changes?
No.