philbe77 | 4 months ago

Good point :) We can re-aggregate HyperLogLog (HLL) sketches to get a pretty accurate NDV (number of distinct values, i.e. count distinct). See Query.farm's DataSketches DuckDB extension here: https://github.com/Query-farm/datasketches
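To illustrate why HLL sketches can be re-aggregated, here is a minimal pure-Python sketch of the idea. It is not the DataSketches extension's API; the names `new_sketch`, `add`, `merge`, and `estimate`, the SHA-1 hash, and the precision `P = 12` are all assumptions chosen for illustration. The point is that merging is a register-wise max, so merging per-partition sketches gives the same registers as sketching the combined data directly.

```python
import hashlib
import math

P = 12        # assumed precision: 2**12 = 4096 registers (~1.6% standard error)
M = 1 << P

def _hash64(value):
    # Illustrative 64-bit hash taken from SHA-1; real libraries use faster hashes.
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def new_sketch():
    return [0] * M

def add(sketch, value):
    h = _hash64(value)
    idx = h >> (64 - P)                        # top P bits select a register
    rest = h & ((1 << (64 - P)) - 1)           # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1    # position of the leftmost 1-bit
    if rank > sketch[idx]:
        sketch[idx] = rank

def merge(a, b):
    # Re-aggregation: keep the larger value of each register pair.
    return [max(x, y) for x, y in zip(a, b)]

def estimate(sketch):
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)         # linear counting for small cardinalities
    return raw

# Two overlapping partitions: 100,000 distinct values in total.
part_a, part_b, direct = new_sketch(), new_sketch(), new_sketch()
for v in range(0, 60_000):
    add(part_a, v)
    add(direct, v)
for v in range(40_000, 100_000):
    add(part_b, v)
    add(direct, v)

merged = merge(part_a, part_b)
print(merged == direct)          # merged registers match the single-pass sketch
print(round(estimate(merged)))   # approximate count of distinct values
```

Because only the small register array needs to be stored per partition, the distinct count for any combination of partitions can be estimated later without rescanning the raw rows.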

We also have bitmap aggregation capabilities for exact count distinct, something I worked with Oracle, Snowflake, Databricks, and DuckDB Labs on implementing. It isn't as fast as HLL, but it is 100% accurate...
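As a rough illustration of the bitmap idea (not any vendor's actual implementation), the sketch below assumes the values are small non-negative integer IDs and uses a Python `int` as the bitmap. The helper names `build_bitmap`, `merge_bitmaps`, and `count_distinct` are hypothetical. Each partition stores one bitmap, bitmaps are combined with bitwise OR, and the distinct count is the number of set bits, which is exact.

```python
def build_bitmap(ids):
    # Set bit i for every ID i seen in this partition.
    bits = 0
    for i in ids:
        bits |= 1 << i
    return bits

def merge_bitmaps(bitmaps):
    # Re-aggregation: bitwise OR of the per-partition bitmaps.
    out = 0
    for b in bitmaps:
        out |= b
    return out

def count_distinct(bitmap):
    # Exact distinct count = number of set bits.
    return bin(bitmap).count("1")

day1 = build_bitmap([1, 2, 3, 5, 5])
day2 = build_bitmap([3, 5, 8])
both_days = merge_bitmaps([day1, day2])
print(count_distinct(both_days))  # distinct IDs across both days: {1, 2, 3, 5, 8}
```

The trade-off against HLL is size and speed: a bitmap grows with the range of IDs, while an HLL sketch stays a fixed size regardless of cardinality.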

fifilura | 4 months ago

I remember BigQuery had DISTINCT with HLL-level accuracy 10 years ago, but they rather quickly replaced it with an exact count.

How would you compare this solution to BigQuery?