top | item 35562430

SQL:2023 is finished: Here is what's new

278 points | petalmind | 2 years ago | peter.eisentraut.org

153 comments

[+] burakemir|2 years ago|reply
In the end, SQL is what people are familiar with (for better or worse) and what is entrenched in some form in most databases, and new versions of the standard are unlikely to change anything fundamental.

I don't believe that adding property graph support to SQL makes much of a difference for SQL and relational databases...

However, the PG query additions (I believe they are close to Neo4j's Cypher language) being part of the SQL standard will give a boost to the marketing and adoption of PG databases. Vendors can now say "our language follows the SQL standard" instead of referencing Cypher and, indirectly, Neo4j. Even if Neo4j has the best intentions, marketing is very important in databases, and it is unlikely that vendors would have supported Cypher and thus admitted that they were following Neo4j rather than leading.

Standards can thus be valuable regardless of whether they are adopted, and a lot of discussion and collaboration was likely necessary to get to this point re: the PG additions.

In this context, it is also worth mentioning the GQL standard, which the Neo4j folks are part of: https://en.m.wikipedia.org/wiki/Graph_Query_Language

[+] hobofan|2 years ago|reply
Huh, how views can differ...

I think SQL/PGQ has the potential to outright kill the whole field of PG databases (if/once existing databases actually start implementing it).

The big marketing claim of Neo4j & friends has always been that they make graph queries possible, with a big portion of that claim being that those queries would be intractable for RDBMSes performance-wise.

With SQL/PGQ it becomes quite apparent that there is next to no magic sauce in PG databases and that it's all just syntactic sugar on top of node/edge tables. The rest of their query plans look 1:1 like what you'll find in an RDBMS. Now that they are on a level playing field query-syntax-wise, PG databases will have to compete against RDBMSes with far more advanced query optimizers, better administrative tooling, etc.

[+] timenova|2 years ago|reply
In relational databases, property graph queries can be really effective for writing authorization queries.

Checking whether user 'x' has access to a resource 'z' related via 'y' becomes a much simpler-looking query.
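The comment above can be made concrete with SQL/PGQ's GRAPH_TABLE syntax. This is a sketch only: the graph name, labels, and properties (`access_graph`, `member_of`, `grants`, etc.) are hypothetical, and the quantified-path syntax may vary by implementation:

```sql
-- Can user 'x' reach resource 'z' through up to 5 levels of group membership?
SELECT *
FROM GRAPH_TABLE (access_graph
       MATCH (u IS person WHERE u.name = 'x')
             -[IS member_of]->{0,5} (g IS grp)
             -[IS grants]-> (r IS resource WHERE r.name = 'z')
       COLUMNS (u.name AS user_name, r.name AS resource_name)
     );
```

The equivalent formulation in plain SQL would typically need a recursive CTE and is considerably longer.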

[+] noobcoder|2 years ago|reply
Cypher has built-in support for traversing relationships between nodes in a graph, which can be more intuitive and efficient than using SQL to join tables.
[+] mastax|2 years ago|reply
Lots of nice stuff in here.

    SELECT t.j.foo.bar[2], ... FROM tbl t ...
I'm going to miss having this for years, probably.
[+] paulddraper|2 years ago|reply
Or.....you could just use PostgreSQL 14+ and have it today :)
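For reference, the PostgreSQL 14+ feature being alluded to is jsonb subscripting, which uses brackets rather than the standard's dot notation (assuming a jsonb column `j` on a table `tbl`):

```sql
-- PostgreSQL 14+ jsonb subscripting:
SELECT t.j['foo']['bar'][2] FROM tbl t;

-- SQL:2023 simplified accessor, as quoted above:
SELECT t.j.foo.bar[2] FROM tbl t;
```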
[+] mikece|2 years ago|reply
Is there a query strategy, technology, or approach that competes with SQL? Yes, it gets the job done, but it's always felt kludgy.
[+] starik36|2 years ago|reply
Can someone explain why the standard endorses this ambiguity:

CREATE TABLE t1 (a VARCHAR, b VARCHAR)

The length of `a` and `b` is left up to the implementers: "an implementation-defined default limit will be applied."

Why even allow ambiguity in the first place? This is just asking for problems.

[+] paulddraper|2 years ago|reply
IIRC SQL already allows default limits for `NUMERIC`, `DECIMAL`, `FLOAT`, and `TIMESTAMP`.

This is simply allowing it for `VARCHAR` as well. In a sense, making it more consistent :)

[+] paulryanrogers|2 years ago|reply
Likely as a way to ask for arbitrary text up to something smaller than TEXT.

The VARCHAR(n) form is already limited in its maximum length by implementations.

[+] gaganyaan|2 years ago|reply
I don't have any special knowledge, but I would speculate that it's for approximately the same reasons C allows a byte to be something other than 8 bits: backwards compatibility that let everyone agree on a standard
[+] treeman79|2 years ago|reply
People don’t interchange databases that often.
[+] jbverschoor|2 years ago|reply
Why not directly allow each column to be specified as “NULLS DISTINCT”?

  unique (a null distinct, b)


> ORDER BY in grouped table

So that means you’re allowed to use values you haven’t grouped by and haven’t aggregated? Sounds like MySQL to me

[+] ww520|2 years ago|reply
For the ORDER BY in grouped table (F868), it's kind of weird. I guess it's a special case with narrow usage.

In the example, product_id is unique (as the PK) and product_code is unique as well. There's one product_id for each product_code. A GROUP BY on product_id will produce a list of groups where each group has one product_id and one product_code. A subsequent ORDER BY on the groups by product_code then works. I assume that if there were multiple product_codes per product_id, the query wouldn't work.

    CREATE TABLE product (
        product_id int PRIMARY KEY,
        product_name varchar,
        product_code varchar UNIQUE
    );

    SELECT product.product_id, sum(product_part.num) ...
    GROUP BY product.product_id
    ORDER BY product.product_code;
[+] amluto|2 years ago|reply
Agreed. I read:

> it turned out that this was technically not allowed.

And thought “no, it turned out this query made no sense and had ill-defined results”.

But the rest of the description is odd:

> In particular, it was not allowed to have a grouped table ordered by a column that is not exposed by the SELECT list of the grouped table.

And I’m wondering if the example query was wrong and they meant:

    SELECT product.product_id, sum(product_part.num)
    FROM product JOIN product_part ON product.product_id = product_part.product_id
    GROUP BY product.product_id, product.product_code
    ORDER BY product.product_code;
In which case the change makes sense and is inoffensive.
[+] brabel|2 years ago|reply
Who comes up with an argument for considering nulls distinct??

Imagine a programming language where `if (variable == null)` always returned `false` because nulls were considered distinct. What kind of reasoning goes into thinking that's a great idea?

> If nulls are considered distinct, then having more than one of them won’t cause a unique constraint violation.

Gosh, I would've thought the spec was fine as it was; having to spell out that the DB should consider two null values equivalent for UNIQUENESS constraints is like spelling out that "two integers with the same mathematical value should be considered equivalent". I guess they need to add that to the spec if it's missing as well.
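For concreteness, the F292 syntax makes the choice explicit in the constraint itself (PostgreSQL 15 already accepts this syntax):

```sql
CREATE TABLE t (a int UNIQUE NULLS DISTINCT);      -- traditional default: any number of NULLs
CREATE TABLE u (a int UNIQUE NULLS NOT DISTINCT);  -- at most one NULL

INSERT INTO t VALUES (NULL), (NULL);  -- ok
INSERT INTO u VALUES (NULL), (NULL);  -- unique constraint violation
```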

[+] SoftTalker|2 years ago|reply
Imparting any meaning at all to NULL, including non-uniqueness or distinctness, seems fraught with problems. NULL is NULL; that's it. It means nothing else.
[+] filoeleven|2 years ago|reply
SQL only hangs around because it was the best thing available at the time when DBs first exploded into common use. It’s pretty declarative, which is a point in its favor, but it’s not very amenable to query construction since it has AppleScript levels of “being like English.” Have fun writing something that can generate it while handling highly-adaptive and/or customizable requirements.

I’m not going to call Datalog the definitive answer, but it’s a step in the right direction.

[+] orthoxerox|2 years ago|reply
I would love the next revision of the standard to include QoL stuff like:

- SELECT column aliases being visible from WHERE and GROUP BY clauses, or even from the following expressions in the same SELECT clause

- JOIN other_table ON FK

- GROUP BY ALL

- SELECT * EXCEPT column
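None of these are in the standard yet, but some engines already ship close equivalents; for example, DuckDB (DuckDB-specific syntax, shown only as a sketch of the wished-for features):

```sql
-- Everything except one column (DuckDB spells it EXCLUDE; BigQuery uses EXCEPT):
SELECT * EXCLUDE (ssn) FROM people;

-- Group by all non-aggregated columns in the SELECT list:
SELECT city, count(*) FROM people GROUP BY ALL;
```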

[+] roenxi|2 years ago|reply
The SQL Standard appears to me to be some sort of political tool, likely developed and used by Oracle. It is the most significant standard I know of that isn't publicly available.

Seriously, a legal copy costs something like 200 Swiss francs. This isn't how serious comp-sci standards work (the TCP people just put out RFCs). Especially since the SQL standards committee probably doesn't know how to design a language, given what most SQL implementations look like. I can't be sure, though; I'll probably never be in a position to read the official standard.

Reflect on the madness. Postgres? Free. SQL Standard? 200 francs. Postgres makes a better standard than the SQL standard. The ISO standard thing isn't here to support people using or building SQL databases.

[+] smokel|2 years ago|reply
I sometimes fail to understand the disconnect in some of us here: you want six-figure salaries AND want standardization committees to work for free (and all software and AI models to be provided for free as well).

This would all make sense if you were in favor of fat governments, but that does not seem to be the case either.

The entitlement baffles me. Or am I accidentally combining opinions of different people and making a misjudgment?

[+] niknetniko|2 years ago|reply
Off the top of my head, there are also:

- ISO 8601: dates and times

- ISO/IEC 9899: the C standard

- ISO/IEC 14882: the C++ standard

> This isn't how serious comp-sci standards work (the TCP people just throw out RFCs).

This isn't how most _web_ standards work.

[+] jaipilot747|2 years ago|reply
FWIW, the PDF spec also costs some crazy amount like that for something so ubiquitous.
[+] EdwardDiego|2 years ago|reply
It's how most standards work outside of comp-sci though.
[+] mistersys|2 years ago|reply
title is confusing.

SQL announces: 2023 is finished, on April 13th, 2023.

[+] rjbwork|2 years ago|reply
Cool. Let's see if Postgres will decide to implement this one. We're 12 years on from 2011 with no implementation.
[+] paulddraper|2 years ago|reply
It's practically ripped straight from the PostgreSQL documentation.

PostgreSQL already supports:

* ORDER BY in grouped table (F868)

* GREATEST and LEAST (T054)

* String padding functions (T055)

* Multi-character TRIM functions (T056)

* Optional string types maximum length (T081)

* Enhanced cycle mark values (T133)

* JSON data type (T801)

* Enhanced JSON data type (T802)

* String-based JSON (T803)

* SQL/JSON simplified accessor (T860–T864)

The upcoming release of PostgreSQL 16 will have:

* Non-decimal integer literals (T661)

* Underscores in numeric literals (T662)

PostgreSQL currently lacks:

* UNIQUE null treatment (F292)

* ANY_VALUE (T626)

* Hex integer literals in SQL/JSON path language (T840)

* SQL/JSON item methods (T865–T878)

* JSON comparison (T879–T882) - though the nonstandard JSONB has this

* Property Graph Queries

---

As with many standards, the SQL standard likes to include things that DBMSs already have in the wild and have found practically useful.
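To illustrate the two PostgreSQL 16 items, the new literal forms look like this (exact spellings per the article's feature names; treat as a sketch):

```sql
-- Non-decimal integer literals (T661):
SELECT 0b100101, 0o273, 0x42F;

-- Underscores in numeric literals (T662):
SELECT 1_000_000;
```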

[+] yobert|2 years ago|reply
Some of this was already implemented in PostgreSQL 9.6! lpad / ltrim, for example.
[+] ksec|2 years ago|reply
I wonder if MySQL will have a new version for these features or if they will continue to milk version 8.
[+] lionkor|2 years ago|reply
Honest question; why use MySQL over postgresql?
[+] tofflos|2 years ago|reply
Would love to see a more composable UNION in the future.

Instead of: select * from a UNION select * from b UNION select * from c

I'd like to do: select * from UNION (a, b, c)

[+] orthoxerox|2 years ago|reply
If SQL ever standardizes that table_name alone is equivalent to SELECT * FROM table_name, you will be able to write: a UNION b UNION c
[+] hzia|2 years ago|reply
I think we need a transpiler that allows writing SQL:2023 while having specific backends for Postgres and MySQL, like what Babel / TypeScript did for JS.

This will make the standard process a lot easier, as community can try out new features first before actually building a standard.

It really feels like they are shooting in the dark, with major focus on features and light on the actual DX of devs using SQL day to day.

[+] wutania|2 years ago|reply
My fear with yet another transpiler is that we start working on layers of tech. I like Postgres for its directness w/ the DB. You query, you get it.
[+] jakewhelan|2 years ago|reply
Definitely agree! I've never taken a liking to SQL but using transpiler libraries has really improved the DX, and made me not dislike it so much.
[+] ellisv|2 years ago|reply
I can’t wait for Property Graph Queries
[+] go_prodev|2 years ago|reply
ANY_VALUE is a strange addition. I'm struggling to think of a scenario where I need a value from a group of records but don't care which one.

Why not just use min, max or even row_number to specify an order? If the value isn't important, do you really need to present it?

[+] Sonata|2 years ago|reply
Here's a common example:

Given the following tables:

  CREATE TABLE person (
    id INT PRIMARY KEY,
    name VARCHAR NOT NULL
  );

  CREATE TABLE pet (
    id INT PRIMARY KEY,
    name VARCHAR NOT NULL,
    owner_id INT NOT NULL
  );
It is common to want to join them, like so:

  SELECT owner.id, owner.name, COUNT(pet.id) AS numberOfPets 
  FROM person AS owner
  LEFT OUTER JOIN pet ON owner.id = pet.owner_id
  GROUP BY owner.id
This doesn't work in standard SQL, because all columns in the SELECT list have to either be aggregated or included in the GROUP BY. owner.name is neither. That is a bit silly though because we know each result row will only have one unambiguous value for the owner name, since the GROUP BY is on a unique column from the same table as the owner name.

We can solve this with ANY_VALUE:

  SELECT owner.id, ANY_VALUE(owner.name) AS name, COUNT(pet.id) AS numberOfPets 
  FROM person AS owner
  LEFT OUTER JOIN pet ON owner.id = pet.owner_id
  GROUP BY owner.id
[+] tomtom1337|2 years ago|reply
We use it to select any value from an aggregated list of values which we know are identical. It requires fewer operations than e.g. max.
[+] EnergyAmy|2 years ago|reply
It's actually super useful if you've got a table with two columns like "sku" and "product_name", where each value in one column is paired with exactly one value in the other. You can group by sku and then grab any value for product_name, since they're all the same. Using min/max/etc. involves extra computation that you don't need to do.

You could group by the combination of sku+product_name, but it's kind of unwieldy. You could also normalize it to not have this issue, but when you're getting this data from someone else, there's not much you can do.

[+] phartenfeller|2 years ago|reply
I guess SQL is so widely used that there are always some niche cases. It's useful if you just want to show one example value where there is no numerical order, and you want fetching this value to be super fast.

Like a movie database, and you want to show an example title for every genre.

But in a real-world scenario, I would want a popular movie, not literally any. If you have an Academy Award winners table, I guess, then everyone is popular.

[+] wizofaus|2 years ago|reply
Still no explicit support for enums... the best thing about Postgres in the brief period I used it was that. Looking at tables full of integer values and trying to remember what each means does everyone's head in, IME.
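For reference, PostgreSQL's enum support looks like this (a PostgreSQL extension, not part of the SQL standard; the example follows its documentation):

```sql
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

CREATE TABLE person (
    name text,
    current_mood mood
);

INSERT INTO person VALUES ('Moe', 'happy');  -- 'angry' would be rejected
```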
[+] slim|2 years ago|reply

  ORDER BY in grouped table (F868)
Finally! This will let one have control over selected columns whose value would otherwise be randomly selected from the group
[+] gigatexal|2 years ago|reply
this whole nulls-distinct vs. not-distinct business is a side effect of the standard not being more explicit about what nulls really mean, so to me adding this distinction to the unique constraint looks like a code smell, if one can apply that term to standards.
[+] gigatexal|2 years ago|reply
whoa Property Graph Queries being added to the standard is huge imo
[+] hbarka|2 years ago|reply
Will this be part of the ANSI SQL standard?