poulpi | 2 years ago

If the issue happens a lot, there is also: https://github.com/datafold/data-diff

It's a nice tool that also works across databases.

I think it's based on a checksum method.

Pxtl|2 years ago

Honestly, if the result sets are small enough, I just dump them to JSON and diff the files. But they have to be fully deterministically sorted for that to work (in a sane world, "ORDER BY *" would be valid ANSI SQL).
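A minimal sketch of the dump-and-diff approach, assuming Python's built-in `sqlite3`, `json`, and `difflib`. Since there is no "ORDER BY *", it sorts the serialized rows themselves, which gives a deterministic order regardless of what the query returned. The `prod`/`dev` tables and the `dump_sorted`/`diff_results` helpers are hypothetical.

```python
import difflib
import json
import sqlite3

def dump_sorted(conn, query):
    """Run a query and return one JSON line per row, deterministically sorted."""
    cur = conn.execute(query)
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    # Sorting the serialized rows stands in for the missing "ORDER BY *".
    return sorted(json.dumps(r, sort_keys=True) for r in rows)

def diff_results(conn, query_a, query_b):
    """Return a unified diff (list of lines) between two query results."""
    a = dump_sorted(conn, query_a)
    b = dump_sorted(conn, query_b)
    return list(difflib.unified_diff(a, b, "a.json", "b.json", lineterm=""))

# Hypothetical example data: the same rows in a different order, one value changed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prod (id INTEGER, name TEXT);
CREATE TABLE dev  (id INTEGER, name TEXT);
INSERT INTO prod VALUES (1, 'a'), (2, 'b');
INSERT INTO dev  VALUES (2, 'b'), (1, 'A');
""")
diff = diff_results(conn, "SELECT * FROM prod", "SELECT * FROM dev")
for line in diff:
    print(line)
```

Only the changed row shows up. The reordering alone produces no diff because both sides are sorted the same way before comparison.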

hichkaker|2 years ago

Thank you for mentioning Data Diff! Founder of Datafold here. We built Data Diff to solve a variety of problems that we encountered as data engineers: (A) testing SQL code changes by diffing the output of the production and dev versions of a SQL query, and (B) validating that data stays consistent when replicating it between databases.
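For case (A), when both outputs live in the same database, one common way to compare them is a JOIN on the primary key. Here is a minimal sketch using SQLite, not Data Diff's own query. The `prod_out`/`dev_out` tables and the `join_diff` helper are hypothetical, and it assumes SQLite's null-safe `IS NOT` operator.

```python
import sqlite3

# Hypothetical outputs of the production and dev versions of a query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prod_out (id INTEGER PRIMARY KEY, total REAL);
CREATE TABLE dev_out  (id INTEGER PRIMARY KEY, total REAL);
INSERT INTO prod_out VALUES (1, 10.0), (2, 20.0), (3, 30.0);
INSERT INTO dev_out  VALUES (1, 10.0), (2, 25.0), (4, 40.0);
""")

DIFF_SQL = """
-- Rows missing from dev, or whose values differ (null-safe comparison).
SELECT p.id, p.total AS prod_total, d.total AS dev_total
FROM prod_out p LEFT JOIN dev_out d ON p.id = d.id
WHERE d.id IS NULL OR p.total IS NOT d.total
UNION ALL
-- Rows that exist only in dev (anti-join).
SELECT d.id, NULL, d.total
FROM dev_out d LEFT JOIN prod_out p ON p.id = d.id
WHERE p.id IS NULL
ORDER BY 1
"""

def join_diff(conn):
    """Return (id, prod_value, dev_value) for every row that differs."""
    return conn.execute(DIFF_SQL).fetchall()

for row in join_diff(conn):
    print(row)
```

All the work happens inside the database, and only the differing rows come back to the client.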

Data Diff implements two algorithms: one for diffing within the same database and one for diffing across databases. The former is based on a JOIN; the latter uses checksumming with binary search, which keeps network IO and database workload overhead minimal.
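To illustrate the general checksum-plus-bisection idea, here is a simplified sketch. It is not Data Diff's actual implementation. Each side computes a checksum over a key range. Matching ranges are skipped, mismatching ones are split in half, and only small ranges are fetched row by row. `FakeTable`, `row_hash`, and `diff_tables` are hypothetical, and summing 64-bit slices of SHA-256 row hashes is an assumed stand-in for whatever aggregate a real database would compute in SQL.

```python
import hashlib

def row_hash(row):
    """Hash one row to a 64-bit integer (assumed stand-in for a SQL-side hash)."""
    digest = hashlib.sha256(repr(row).encode()).digest()
    return int.from_bytes(digest[:8], "big")

class FakeTable:
    """Hypothetical stand-in for one database; keys are integer primary keys."""

    def __init__(self, rows):
        self.rows_by_key = dict(rows)

    def checksum(self, lo, hi):
        # In a real database this would be one aggregate query over lo <= key < hi,
        # so only a single number crosses the network.
        total = 0
        for k, v in self.rows_by_key.items():
            if lo <= k < hi:
                total = (total + row_hash((k, v))) % 2**64
        return total

    def fetch(self, lo, hi):
        return {k: v for k, v in self.rows_by_key.items() if lo <= k < hi}

def diff_tables(a, b, lo, hi, threshold=4):
    """Return sorted keys in [lo, hi) whose rows differ, bisecting on mismatches."""
    if a.checksum(lo, hi) == b.checksum(lo, hi):
        return []  # identical segment (barring a hash collision): skip it
    if hi - lo <= threshold:
        # Small segment: download the rows and compare them directly.
        ra, rb = a.fetch(lo, hi), b.fetch(lo, hi)
        return sorted(
            k for k in ra.keys() | rb.keys()
            if k not in ra or k not in rb or ra[k] != rb[k]
        )
    mid = (lo + hi) // 2
    return diff_tables(a, b, lo, mid, threshold) + diff_tables(a, b, mid, hi, threshold)

# Hypothetical example: one changed row and one missing row out of 1000.
src = FakeTable((k, f"v{k}") for k in range(1000))
dst_rows = {k: f"v{k}" for k in range(1000)}
dst_rows[123] = "changed"
del dst_rows[777]
dst = FakeTable(dst_rows.items())
print(diff_tables(src, dst, 0, 1000))
```

Because identical segments are discarded after a single checksum comparison, the number of rows actually transferred grows with the number of differences rather than with the table size.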