top | item 29741031

(no title)

Upitor | 4 years ago

In my opinion this proposal seems only to consider simple cases, but there are many not-so-simple relationsship types:

Consider a ‘sales’ table which includes columns [time] and [sold_by_employee_id], and a periodized ‘employee’ table which includes columns [employee_id], [valid_from] and [valid_to] columns. There is a perfectly valid relationsship between the two tables, but you cant join them using only equal-statements (you need a between-statement as well)

discuss

order

JoelJacobson|4 years ago

Nice example! The join you describe would remain as a JOIN ON.

This is per design. Quote from the proposal:

"The idea is to improve the SQL language, specifically the join syntax, for the special but common case when joining on foreign key columns." ... "If the common simple joins (when joining on foreign key columns) would be written in a different syntax, the remaining joins would visually stand out and we could focus on making sure we understand them when reading a large SQL query."

So, the special non-equal based join condition you describe, would become more visible, and stand out, allowing readers to pay more attention to it.

The hypothesis is most joins are made on foreign key columns, so if we can improve such cases, a lot can be won.

DevKoala|4 years ago

But how could you accurately tell if some queries join on the foreign key, but were written by someone without knowledge of the new specification?

Upitor|4 years ago

I see. I should have read the article more carefully :-)

Upitor|4 years ago

Also, consider a ‘sales’ table with multiple references to a ‘calendar’ table: [shipped_date], [order_date], [received_date]

JoelJacobson|4 years ago

Good example too, but this one can with benefit be written using the JOIN FOREIGN syntax, you just need to give the foreign keys suitable names such as e.g. "shipped_date", "order_date", "received_date". Or, to remind you of which is the referenced table, perhaps you want to include it in the names, and the names would be "shipped_date_calendar", "order_date_calendar", "received_date_calendar", but personally I would prefer just "shipped_date" or perhaps even just "shipped".

CRConrad|4 years ago

Are those timestamps, or actual dates, i.e. meaning a whole day encompassing (usually) 24 hours?

In many (most?) data warehousing projects I've seen, you make a "fake date" integer the primary key column of your Dates dimension. This integer consists of 10000 × YEAR_PART + 100 × MONTH_PART + 1 × DAY_PART of the date in question, so yesterday's New Year's Eve woul get 10000 × 2021 + 100 × 12 + 1 × 31 = 20211231. The date dimension itself has many more columns (often booleans, IS_WEEKEND, IS_HOLIDAY, etc; also the date parts themselves in both numeric and character form (12, 'December'), day of week (5 [or 6, depending on convention], 'Friday'), etc) that are used for BI and reporting.

But since this generated ISO-8601-date-as-integer column is the primary key of the Dates table, it is also the value of the Date foreign key in all tables that reference Dates. That makes it incredibly handy in queries -- both during development and for ad-hoc reports -- of those tables without joining to the Dates dimension at all: grouping, sorting, limiting to a more manageable date interval in the WHERE clause... And it tells the reader exactly what the actual date in question is. (Well, at least readers who are used to ISO-8601-format dates.) As jerryp would have said, recommended.