feat: Support float 32 and float 64 in approx_distinct by mkleen · Pull Request #23084 · apache/datafusion

mkleen · 2026-06-22T13:24:57Z

Which issue does this PR close?

Relates to "Support more types for approx_distinct function" (#22989) but does not close it. More types are coming.

Rationale for this change

Float32 and Float64 were not supported in approx_distinct yet.
Float32 and Float64 can be hashed generically over their bit patterns via create_hashes in HLLAccumulator and HllGroupsAccumulator, exactly like the string/binary types.
Float32 and Float64 don't implement Hash on their native type so they can't use NumericHLLAccumulator.
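To illustrate the point about `Hash`, here is a minimal standalone sketch using only the Rust standard library. It is not the code from this PR and does not use DataFusion's `create_hashes`. The helper names `hash_f64_bits` and `hash_f32_bits` are hypothetical. It shows that a float can still be hashed by first converting it to its integer bit pattern, since `u64` and `u32` implement `Hash` while `f64` and `f32` do not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash an `f64` through its IEEE 754 bit pattern.
/// `f64` has no `Hash` impl, so `value.hash(..)` would not compile,
/// but the `u64` returned by `to_bits` can be hashed.
fn hash_f64_bits(value: f64) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.to_bits().hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: the same idea for `f32`, via its `u32` bit pattern.
fn hash_f32_bits(value: f32) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.to_bits().hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Identical bit patterns always produce identical hashes.
    assert_eq!(hash_f64_bits(1.5), hash_f64_bits(1.5));
    assert_eq!(hash_f32_bits(1.5), hash_f32_bits(1.5));
    println!("f64 1.5 -> {:#018x}", hash_f64_bits(1.5));
    println!("f32 1.5 -> {:#018x}", hash_f32_bits(1.5));
}
```

Hashing the bit pattern means two values are treated as the same only when every bit matches, which is the "bitwise distinct" semantics questioned later in this thread.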

What changes are included in this PR?

Enable HLLAccumulator and HllGroupsAccumulator to support Float32 and Float64.
Tests for the non-grouped and grouped path as part of approx_distinct.rst and aggregate.slt.
Benchmarks for completeness.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes, approx_distinct now supports Float32 and Float64, but there are no breaking changes.

mkleen · 2026-06-23T10:54:25Z

@Jefffrey Thanks for the review!

alamb · 2026-06-23T16:20:34Z

What is the usecase for doing (bitwise) distinct for floats?

mkleen · 2026-06-23T17:13:38Z

> What is the usecase for doing (bitwise) distinct for floats?

Would a bitwise distinct be fundamentally different from a normal (mathematical) distinct? My intention was just to support float types here.

mkleen · 2026-06-23T17:36:19Z

The corner cases for floats, such as -0.0, 0.0 and NaN, are handled correctly because of #22835.
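For readers unfamiliar with why these values are corner cases, the following sketch shows how they behave under bitwise comparison. What #22835 actually does is not shown in this thread, so `canonical_bits` below is a hypothetical normalization written for illustration only, not DataFusion's implementation.

```rust
/// Hypothetical normalization (an assumption, not DataFusion's code):
/// map every NaN to one bit pattern and -0.0 to the pattern of +0.0,
/// so that bitwise equality agrees with the intended grouping.
fn canonical_bits(value: f64) -> u64 {
    if value.is_nan() {
        f64::NAN.to_bits()
    } else if value == 0.0 {
        // `-0.0 == 0.0` is true, so both zeros take this branch.
        0.0_f64.to_bits()
    } else {
        value.to_bits()
    }
}

fn main() {
    // The two zeros compare equal but differ in the sign bit.
    assert!(-0.0_f64 == 0.0_f64);
    assert_ne!((-0.0_f64).to_bits(), 0.0_f64.to_bits());

    // NaN never compares equal to itself, and many bit patterns are NaN.
    let other_nan = f64::from_bits(0x7ff8_0000_0000_0001);
    assert!(other_nan.is_nan());
    assert!(f64::NAN != f64::NAN);

    // After normalization, each corner case collapses to one pattern.
    assert_eq!(canonical_bits(-0.0), canonical_bits(0.0));
    assert_eq!(canonical_bits(other_nan), canonical_bits(f64::NAN));
}
```

Without some normalization of this kind, a purely bitwise hash would count -0.0 and 0.0 as two values and could count differently encoded NaNs separately.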

mkleen · 2026-06-23T17:42:35Z

Ok, from reading up on this subject this whole pr is not a good idea.

alamb · 2026-06-23T18:05:23Z

> Ok, from reading up on this subject this whole pr is not a good idea.

I am sorry to be a wet blanket @mkleen -- and I apologize if I have wasted any of your time. Thank you for all your contributions so far. They have made the project better.

mkleen · 2026-06-23T18:17:54Z

> > Ok, from reading up on this subject this whole pr is not a good idea.
>
> I am sorry to be a wet blanket @mkleen -- and I apologize if I have wasted any of your time. Thank you for all your contributions so far. They have made the project better.

No worries, this is what reviews are for. Thank you for pointing out I was on the wrong track!

2010YOUY01 · 2026-06-24T01:12:26Z

Just to make sure I have the same understanding: is the concern that it is ambiguous whether nearly contiguous float values should be treated as distinct? For example:

-- Should these be treated as distinct or equal?
3.141590, 3.141591, 3.141592

I think the +-0.0 corner cases are not the key issue here, since they only make the result differ by a small constant and can also be special-cased.

I think it is still valuable to add a comment, or improve the error message, to explain why approx_distinct does not support float types.

alamb · 2026-06-24T22:03:12Z

> Just to make sure I have the same understanding: is the concern that it is ambiguous whether nearly contiguous float values should be treated as distinct? For example:

Yeah -- and more specifically, I think the idea is that since floating point values are typically not precise (they are subject to rounding error, etc.), asking "what is the distinct set of these inexact values?" doesn't make much semantic sense.

> I think it is still valuable to add a comment, or improve the error message, to explain why approx_distinct does not support float types.

Agreed
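The rounding-error concern discussed above can be made concrete with a small sketch. It uses only the Rust standard library. `bitwise_distinct_count` is a hypothetical helper that counts exact bit patterns, standing in for what a bitwise distinct would count (an exact count here, not a HyperLogLog estimate).

```rust
use std::collections::HashSet;

/// Hypothetical helper: count values that differ in at least one bit.
fn bitwise_distinct_count(values: &[f64]) -> usize {
    values
        .iter()
        .map(|v| v.to_bits())
        .collect::<HashSet<u64>>()
        .len()
}

fn main() {
    // Mathematically 0.1 + 0.2 equals 0.3, but rounding makes the
    // computed sum differ from the literal 0.3 in f64.
    let sum = 0.1_f64 + 0.2_f64;
    let literal = 0.3_f64;
    assert!(sum != literal);
    assert_eq!(bitwise_distinct_count(&[sum, literal]), 2);

    // The nearly contiguous values from the example above are also
    // three separate bit patterns in f64.
    let nearby = [3.141590_f64, 3.141591, 3.141592];
    assert_eq!(bitwise_distinct_count(&nearby), 3);
}
```

Whether two such values should count as one or two depends on how they were computed, which is the ambiguity raised in the comments above.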

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels Jun 22, 2026

mkleen changed the title ~~Support float32 and float64 in approx_distinct~~ feat: Support float32 and float64 in approx_distinct Jun 22, 2026

mkleen force-pushed the float_approx_distinct branch from 2e71c6c to 58dbfa1 Compare June 22, 2026 14:09

mkleen changed the title ~~feat: Support float32 and float64 in approx_distinct~~ feat: Support float 32 and float 64 in approx_distinct Jun 22, 2026

mkleen force-pushed the float_approx_distinct branch 5 times, most recently from 48921b9 to c7599d2 Compare June 23, 2026 08:52

feat: Support float 32 and float 64 in approx_distinct

d5f203c

mkleen force-pushed the float_approx_distinct branch from c7599d2 to d5f203c Compare June 23, 2026 10:01

mkleen marked this pull request as ready for review June 23, 2026 10:02

Jefffrey approved these changes Jun 23, 2026

fix typo

ec400a7

mkleen closed this Jun 23, 2026

alamb mentioned this pull request Jun 23, 2026

Support more types for approx_distinct function #22989

Open

mkleen wants to merge 2 commits into apache:main from mkleen:float_approx_distinct
