
feat: Support float 32 and float 64 in approx_distinct #23084

Closed
mkleen wants to merge 2 commits into apache:main from mkleen:float_approx_distinct

mkleen commented Jun 22, 2026

Contributor

Which issue does this PR close?

Rationale for this change

  • Float32 and Float64 were not supported in approx_distinct yet.
  • Float32 and Float64 can be hashed generically via create_hashes in HLLAccumulator and HllGroupsAccumulator, exactly like the string/binary types, over their bit patterns.
  • Float32 and Float64 don't implement Hash on their native type so they can't use NumericHLLAccumulator.
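To illustrate the last two points, here is a minimal sketch of hashing a float through its bit pattern. It uses the standard library's `DefaultHasher` as a stand-in; it is not DataFusion's `create_hashes`, and the helper names `hash_f64_bits` and `hash_f32_bits` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash an f64 through its raw IEEE 754 bit pattern.
/// `f64` has no `Hash` impl, so `value.hash(&mut hasher)` would not compile,
/// but the `u64` returned by `to_bits()` can be hashed directly.
fn hash_f64_bits(value: f64) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.to_bits().hash(&mut hasher);
    hasher.finish()
}

/// Same idea for f32, whose bit pattern is a `u32`.
fn hash_f32_bits(value: f32) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.to_bits().hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Identical bit patterns always produce identical hashes.
    assert_eq!(hash_f64_bits(1.5), hash_f64_bits(1.5));
    assert_eq!(hash_f32_bits(2.5), hash_f32_bits(2.5));

    println!("f64 1.5 -> {:#x}", hash_f64_bits(1.5));
    println!("f32 2.5 -> {:#x}", hash_f32_bits(2.5));
}
```

Because the hash is taken over raw bits, two floats count as the same value only when their bit patterns match exactly.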

What changes are included in this PR?

  • Enable HLLAccumulator and HllGroupsAccumulator to support Float32 and Float64.
  • Tests for the non-grouped and grouped paths as part of approx_distinct.rs and aggregate.slt.
  • Benchmarks for completeness.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes, approx_distinct now supports Float32 and Float64; there are no breaking changes.

github-actions (bot) added the labels sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) on Jun 22, 2026
mkleen changed the title from "Support float32 and float64 in approx_distinct" to "feat: Support float32 and float64 in approx_distinct" on Jun 22, 2026
mkleen force-pushed the float_approx_distinct branch from 2e71c6c to 58dbfa1 on June 22, 2026 14:09
mkleen changed the title from "feat: Support float32 and float64 in approx_distinct" to "feat: Support float 32 and float 64 in approx_distinct" on Jun 22, 2026
mkleen force-pushed the float_approx_distinct branch 5 times, most recently from 48921b9 to c7599d2, on June 23, 2026 08:52
mkleen force-pushed the float_approx_distinct branch from c7599d2 to d5f203c on June 23, 2026 10:01
mkleen marked this pull request as ready for review on June 23, 2026 10:02
mkleen commented Jun 23, 2026

Contributor Author

@Jefffrey Thanks for the review!

alamb commented Jun 23, 2026

Contributor

What is the use case for doing (bitwise) distinct for floats?

mkleen commented Jun 23, 2026

Contributor Author

> What is the use case for doing (bitwise) distinct for floats?

Would a bitwise distinct be fundamentally different from a normal (mathematical) distinct? My intention was just to support float types here.
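For reference, the two notions of equality do diverge for a few IEEE 754 values. The sketch below uses only the Rust standard library; the helper names `numerically_equal` and `bitwise_equal` are hypothetical and not part of DataFusion.

```rust
/// Equality as defined by IEEE 754 comparison (`==` on f64).
fn numerically_equal(a: f64, b: f64) -> bool {
    a == b
}

/// Equality of the raw 64-bit patterns.
fn bitwise_equal(a: f64, b: f64) -> bool {
    a.to_bits() == b.to_bits()
}

fn main() {
    // Signed zeros: numerically equal, but the sign bit differs.
    assert!(numerically_equal(0.0, -0.0));
    assert!(!bitwise_equal(0.0, -0.0));

    // NaN: never numerically equal to itself, yet its bits are identical.
    let nan = f64::NAN;
    assert!(!numerically_equal(nan, nan));
    assert!(bitwise_equal(nan, nan));

    // Two NaNs with different payloads have different bit patterns.
    let other_nan = f64::from_bits(nan.to_bits() ^ 1);
    assert!(other_nan.is_nan());
    assert!(!bitwise_equal(nan, other_nan));
}
```

For every other value (ordinary finite numbers and the infinities) the two comparisons agree.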

mkleen commented Jun 23, 2026

Contributor Author

The corner cases for floats, such as -0.0, 0.0, and NaN, are handled correctly because of #22835.
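One way such corner cases can be handled in general is to normalize values before hashing. The sketch below is only an illustration under that assumption: it does not reproduce what #22835 does, `canonical_bits` and `count_distinct` are hypothetical names, and it counts exactly with a `HashSet` rather than with HyperLogLog.

```rust
use std::collections::HashSet;

/// Hypothetical normalization: collapse -0.0 into 0.0 and every NaN
/// payload into one canonical NaN before taking the bit pattern.
fn canonical_bits(value: f64) -> u64 {
    if value.is_nan() {
        f64::NAN.to_bits()
    } else if value == 0.0 {
        // True for both 0.0 and -0.0.
        0.0_f64.to_bits()
    } else {
        value.to_bits()
    }
}

/// Exact distinct count over bit patterns, optionally normalized first.
fn count_distinct(values: &[f64], normalize: bool) -> usize {
    values
        .iter()
        .map(|v| if normalize { canonical_bits(*v) } else { v.to_bits() })
        .collect::<HashSet<u64>>()
        .len()
}

fn main() {
    let other_nan = f64::from_bits(f64::NAN.to_bits() ^ 1);
    let values = [0.0, -0.0, f64::NAN, other_nan, 1.0];

    // Raw bit patterns: all five values are different.
    assert_eq!(count_distinct(&values, false), 5);
    // Normalized: zero, NaN, and 1.0 remain.
    assert_eq!(count_distinct(&values, true), 3);
}
```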

mkleen commented Jun 23, 2026

Contributor Author

OK, from reading up on this subject, this whole PR is not a good idea.

alamb commented Jun 23, 2026

Contributor

> OK, from reading up on this subject, this whole PR is not a good idea.

I am sorry to be a wet blanket @mkleen -- and I apologize if I have wasted any of your time. Thank you for all your contributions so far. They have made the project better.

mkleen commented Jun 23, 2026

Contributor Author

> > OK, from reading up on this subject, this whole PR is not a good idea.

> I am sorry to be a wet blanket @mkleen -- and I apologize if I have wasted any of your time. Thank you for all your contributions so far. They have made the project better.

No worries, this is what reviews are for. Thank you for pointing out that I was on the wrong track!

@2010YOUY01

Contributor

Just to make sure I have the same understanding: is the concern that it is ambiguous whether nearly contiguous float values should be treated as distinct? For example:

-- Should these be treated as distinct or equal?
3.141590, 3.141591, 3.141592

I think the +-0.0 corner cases are not the key issue here, since they only make the result differ by a small constant and can also be special-cased.

I think it is still valuable to add a comment, or improve the error message, to explain why approx_distinct does not support float types.

alamb commented Jun 24, 2026

Contributor

> Just to make sure I have the same understanding: is the concern that it is ambiguous whether nearly contiguous float values should be treated as distinct? For example:

Yeah -- and more specifically, I think the idea is that since floating-point values are typically not precise (they are subject to rounding error, etc.), asking "what is the distinct set of these inexact values?" doesn't make much sense semantically.
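A small sketch of that rounding problem, using only the Rust standard library (`distinct_by_bits` is a hypothetical helper, and the count is exact rather than approximate):

```rust
use std::collections::HashSet;

/// Exact distinct count over raw f64 bit patterns.
fn distinct_by_bits(values: &[f64]) -> usize {
    values
        .iter()
        .map(|v| v.to_bits())
        .collect::<HashSet<u64>>()
        .len()
}

fn main() {
    // 0.1 + 0.2 is "the same number" as 0.3 on paper, but rounding
    // leaves the two results one unit in the last place apart.
    let sum = 0.1_f64 + 0.2_f64;
    assert_ne!(sum, 0.3_f64);
    assert!((sum - 0.3).abs() < 1e-15);

    // A distinct count therefore sees two values where a user
    // might expect one.
    assert_eq!(distinct_by_bits(&[sum, 0.3]), 2);

    // The nearly contiguous values from the example above are
    // likewise all distinct.
    assert_eq!(distinct_by_bits(&[3.141590, 3.141591, 3.141592]), 3);
}
```

Whether the first pair should count as one value or two depends on how the inputs were computed, which is the ambiguity at issue.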

> I think it is still valuable to add a comment, or improve the error message, to explain why approx_distinct does not support float types.

Agreed

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

4 participants