top | item 27676414

(no title)

ahmedelsama | 4 years ago

I am a data person and I never knew about the boundary counting. For example, you can have a `day diff` of 3 while the `week diff` is 1 and the `month diff` is 1. You can also have a `day diff` of 25 while the `month diff` is 0.
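To make those two cases concrete, here is a rough Python sketch of what boundary counting amounts to. `boundary_diff` is a made-up helper, not any warehouse's actual function, and it assumes weeks start on Monday (warehouses differ on that default).

```python
from datetime import date


def boundary_diff(unit: str, start: date, end: date) -> int:
    """Count how many `unit` boundaries lie between start and end.

    Hypothetical helper for illustration only; real warehouses may
    differ, especially on which weekday begins a week.
    """
    if unit == "day":
        return (end - start).days
    if unit == "week":
        # Assumption: weeks start on Monday. Ordinal 1 (0001-01-01) is a
        # Monday, so (ordinal - 1) // 7 is a Monday-aligned week index.
        return (end.toordinal() - 1) // 7 - (start.toordinal() - 1) // 7
    if unit == "month":
        return (end.year * 12 + end.month) - (start.year * 12 + start.month)
    if unit == "year":
        return end.year - start.year
    raise ValueError(f"unsupported unit: {unit}")


# Saturday Jan 30 -> Tuesday Feb 2: three days apart, but one week
# boundary and one month boundary are crossed.
a, b = date(2021, 1, 30), date(2021, 2, 2)
print(boundary_diff("day", a, b), boundary_diff("week", a, b), boundary_diff("month", a, b))

# Mar 3 -> Mar 28: 25 days apart, but no month boundary is crossed.
c, d = date(2021, 3, 3), date(2021, 3, 28)
print(boundary_diff("day", c, d), boundary_diff("month", c, d))
```

The result depends on where the two dates sit relative to a boundary as well as on how far apart they are, which is why 3 days can count as a month and 25 days can count as none.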

I would love to know why warehouses do this, because as a data person it confuses me, and it is even more confusing to explain to stakeholders.

cedricd|4 years ago

If I'm understanding correctly, you think the boundary crossing thing is weird. Like, why would it say that the diff between Dec 31 2020 and Jan 1 2021 is 1 year, when it's more like 1/365 of a year?

I'm not necessarily the best person to defend it, but I think it has a couple nice properties.

1. it's an integer, so it can be used in group_by, comparisons, bucketing, etc.

2. it aligns to commonly understood boundaries, which helps with the above.

Another way to look at the boundary issue is that it only matters when things are close. If you get a 1 and don't like it, drop down a unit (year -> months or days) instead.
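A quick Python sketch of the "drop down a unit" idea, using the Dec 31 / Jan 1 pair from above. `diffs` is a hypothetical helper and the arithmetic is my approximation of boundary counting, not a specific warehouse's implementation.

```python
from datetime import date


def diffs(start: date, end: date) -> dict:
    """Boundary-style diffs at three granularities (illustrative only)."""
    return {
        "year": end.year - start.year,
        "month": (end.year * 12 + end.month) - (start.year * 12 + start.month),
        "day": (end - start).days,
    }


# One day apart, but the pair straddles a year (and month) boundary.
print(diffs(date(2020, 12, 31), date(2021, 1, 1)))

# Almost a full year apart, but no year boundary is crossed.
print(diffs(date(2021, 1, 1), date(2021, 12, 31)))
```

The year diff alone can't tell those two pairs apart in any useful way, but the day diff (1 vs 364) can.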

Comparing years the way I did above is obviously a bit of an edge case.