top | item 38379402

supercoco9 | 2 years ago

Hi. Sorry if my query offended you.

I basically executed, literally, what ClickHouse recommends in its guides for deduplication: https://clickhouse.com/docs/en/guides/developer/deduplicatio....

Of course you can also materialize with aggregations, just use a GROUP BY, or even force an OPTIMIZE of the table. But my point is that you don't really get exactly-once guarantees. Whoever is querying that table needs to be aware that a `SELECT * FROM tb` might contain duplicates and needs to adapt their queries accordingly.
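To make the point concrete, here is a toy Python model of the behaviour I mean. It is only a sketch, not ClickHouse code: the `Row` type, the `parts` list, and the `plain_select` / `select_final` helpers are hypothetical names made up for illustration, and the assumption is a ReplacingMergeTree-style table with a version column whose parts have not been merged yet.

```python
from typing import NamedTuple


class Row(NamedTuple):
    key: int
    version: int
    value: str


# Two unmerged "parts": key 1 was written twice (for example, a retried insert).
parts = [
    [Row(1, 1, "a"), Row(2, 1, "b")],
    [Row(1, 2, "a-updated"), Row(3, 1, "c")],
]


def plain_select(parts):
    """Model of a plain SELECT * FROM tb before a merge: every stored row."""
    return [row for part in parts for row in part]


def select_final(parts):
    """Model of query-time deduplication (FINAL or a GROUP BY on the key):
    keep only the highest-version row for each key."""
    latest = {}
    for row in plain_select(parts):
        current = latest.get(row.key)
        if current is None or row.version >= current.version:
            latest[row.key] = row
    return sorted(latest.values())


rows = plain_select(parts)      # 4 rows, key 1 appears twice
deduped = select_final(parts)   # 3 rows, one per key
```

In this model the reader has to pick `select_final` explicitly; the plain read keeps returning both copies of key 1 until a merge happens.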

higeorge13 | 2 years ago

I believe there are zero people working with CH and ReplacingMergeTree who don’t know that they have to use FINAL or GROUP BY in order to get deduplicated data. It’s mentioned on the table engine page, in their knowledge base, everywhere.

Also, I have not recently seen anyone recommend against it. That might have been the case a few years ago, but the performance of FINAL has improved and it’s faster than the alternatives. People obviously suggest using MergeTree tables, but if there is no alternative, ReplacingMergeTree is the way to go.