top | item 24730713

A Short Story About SQL’s Biggest Rival

256 points| huy | 5 years ago |holistics.io

123 comments

order

alquemist|5 years ago

> … The language (SQL) is not very composable. This is a fact that most SQL users are not aware of. The relational algebra that SQL is based on is absolutely composable but SQL is not due to the inherent limitation of the language (as it was designed to be natural language-like). When you write "select x from a where z", you are actually building something along the lines of "from a" => "where z" => "select x" in the algebra and you can actually compose each portion separately. If you are familiar with dplyr, Spark or pandas you would get this instantly.

Hmmm. Not a big lover of SQL, but this is a bit imprecise. While the unit of composability is slightly smaller for from=>where=>select (expression) vs select+from+where (subquery), in practice they both encode the same fundamental compositional principles, based on relational algebra. Any from=>where=>select query can be translated into a select+where+from query almost 1:1, if only via:

    from t ::= select * from t
    q => where p ::= select * from q where p
    q => select xs ::= select xs from q 
Sometimes the select+where+from ends up more verbose, sometimes there is more brain twisting to grok a given select+where+from query, but that's not a composition limiting factor. Granted, the readability of SQL is sometimes lacking, but it is fully capable to compose recursive relational algebra queries.

contravariant|5 years ago

You're arguing about capability, whereas the post is arguing about the grammar itself. What you're describing is how to build a separate domain specific language that compiles to SQL (which there are quite a few of; e.g. C#'s Linq).

Quite a few query languages are equally capable (including, surprisingly, quite a few so-called graph languages, provided the SQL dialect provides a transitive closure), but SQL as a language has some undesirable properties (most of which are trade-offs for the fact that basic SQL is very easy to parse).

didibus|5 years ago

This argues that composability is the most important consideration of a domain specific language. But I think, as proved by SQL taking over, the UX is more important. Any programming language must consider the programmer and its humanity and natural way of thinking and reasoning to win in getting the most adoption and mindshare. Usability does matter as the user of any programming language is a human being.

Was QUEL also more usable and natural to people? As well as being more amenable to composition?

asah|5 years ago

Subqueries, named VIEWs, CTEs, etc all make SQL compostable ?

jhallenworld|5 years ago

I don't like either syntax. Wikipedia has this example:

QUEL:

    range of E is EMPLOYEE
    retrieve into W
    (COMP = E.Salary / (E.Age - 18))
    where E.Name = "Jones"
SQL:

    select (e.salary / (e.age - 18)) as comp
    from employee as e
    where e.name = "Jones"
I would prefer an operator syntax that directly mimics relational algebra. Something like:

    w = employee(name == "Jones")[comp = salary / (age - 18)]
So () is "where", [] is "project" (choose or create columns) and you can use * for join and + for union. The result is a table with a column named comp.

roenxi|5 years ago

They should implement something using straight functions, extremely spartan with no special syntax at all, and let everyone build their own favoured DSL over the top.

One of my major complaints about SQL is the syntax is so finicky that it is really hard to replace it with a [something -> sql] layer, because the something layer can't generate all the silly syntactic forms that SQL uses.

Eg, personal favourite, it is easy to have a dsl that translates

  select(y = fn(x)) -> select fn(x) as y
that then breaks down because it can't construct

  ??? -> select extract(month from x) as y
and that is the only syntax the SQL database decided to understand. There are too many cases like that that need special handling, especially once SQL dialect-specific stuff comes into play.

prostodata|5 years ago

It should be noted that the expression like

    comp = salary / (age - 18)
is (strictly speaking) not part of the relational algebra because it is not using set operations (like join or union). It was added to the relational model because we hardly can process data without such expressions.

A better way to formally describe such calculated columns is to introduce functions and treat them as first class elements of the data model. In particular, function operations are better than set operations in these cases [1]:

- Calculating data using calculated columns. We describe a calculated column as a function without generating new relations (as opposed to select-from)

- Aggregating data. We can add an aggregate column without generating new relations (as opposed to groupby)

- Linking data. We can add derived link columns without generating new relations (as opposed to join)

This function-based approach to data modeling and data processing was implemented in Prosto [2] which demonstrates how many typical relational tasks can be solved using functions.

[1] Why functions and column-orientation? https://prosto.readthedocs.io/en/latest/text/why.html

[2] Functions matter! No join-groupby, No map-reduce. https://github.com/prostodata/prosto

_eigenfoo|5 years ago

One major strength of SQL is its readability - it reads so much like English that a non-technical stakeholder could conceivably understand queries. Do you not find that this is a valuable thing that's lost with relational-algebra-esque syntax?

amw-zero|5 years ago

Relational algebra notation is so much more concise and readable. We’ve mangled so many very simple concepts because someone decided that math is offensive or something, so any math notation is automatically “not fit for software.”

Computing is inherently tied to math. There’s no getting away from that.

sergeykish|5 years ago

Unbiased samples should match typography

QUEL:

    range of e is employee
    retrieve into w (comp = e.salary / (e.age - 18))
    where e.name = "Jones"
SQL:

    select
    (E.Salary / (E.Age - 18)) as COMP
    from EMPLOYEE as E
    where E.Name = "Jones"

Now QUEL looks like modern language while SQL is jarred mess.

boramalper|5 years ago

I would reorder round and square brackets, since I may want to filter on computed/created columns, and the ordering makes it clearer.

brian_herman|5 years ago

That sounds like a cool language.

me2i81|5 years ago

In my first job I used Ingres/Quel. Probably because of that, I still find SQL hideous. Quel was a lot more orthogonal and clean, but by now SQL has so many more features that they're not really comparable. The first version of Ingres ran on a PDP-11/70, and the different components (parser, query optimizer, query executor, etc.) ran in separate processes connected via pipes (this was pre-socket Berkeley Unix) because each process could only be 64K 16-bit words. It was hideously slow--ran a lot faster once it was ported to the VAX and everything could run in one process. INGRES originally stood for something like Interactive Graphics Retrieval System, IIRC because the original funding agency wanted a graphics database. Stonebraker wanted to build a relational DBMS so he just went ahead and did it, but gave it a grant-compliant name and wrote some bullshit about graphics to make the funding agency happy.

jp0d|5 years ago

I agree with some of the comments about composabilty of SQL. I've been doing SQL since more than 14 years. Most of it was spent working on ETL projects for Finance and Utilities industries. Even today 80% of the code I write in pySpark is just plain SQL. It's been my bread and butter. However, I spend a lot of time trying to think about a solution in SQL. It's not an easy language to use when it comes to implementing complex transformations. I could write the same logic in Python in a lot less time. I use SQL mostly because it's easily portable across systems and most analysts and to some extent tech managers understand it. I work primarily on proof of concept data products and it does the job for that. Then a real developer takes over and implements it in .Net.

hn_throwaway_99|5 years ago

Minor point, and others have brought up more details around composability, but I think it's an absolute mistake to reference properties on an object before that object is declared. I.e. if SQL had the FROM clause before the SELECT clause autocomplete support could be much more intelligent about helping with SELECT columns.

Same problem with declaring imports in javascript. Python got it correct, where I write e.g. 'from somemodule import ...', so by the time I get to the import the parser has enough info to help with autocomplete. Javascript's 'import { Something } from "foo"' means the parser can't help me autocomplete Something.

earthboundkid|5 years ago

I find imports in Python completely backwards conceptually compared to JS. I hadn’t thought about the autocomplete issue though. In practice, I know what I want to import, I guess.

TehShrike|5 years ago

I'd been aware of Codd's relational theory, but never heard of QUEL – now I want to find a database engine that implements QUEL!

cameldrv|5 years ago

Postgres originally implemented QUEL, until Postgres95 which implemented SQL (later renamed to Postgresql)

runroader|5 years ago

As far as I'm aware only Ingres supports QUEL

Edit: In case you actually do want to play with it, Ingres is now Actian IngresX. Though I'd recommend thinking about what could have been instead of actually spending any time fighting with and configuring Ingres.

samatman|5 years ago

> In a real coup we hired a superb team from Xerox PARC.

Once again, California's absence of noncompetes plays a critical role in the success of a business.

timpark|5 years ago

Long ago, I worked for a company that had to convert its entire accounting system from QUEL to SQL. Fortunately, I was able to write a parser to find the queries and rewrite them, at least for whatever subset of the language they used. It's been a while, but I think there was an issue with multi-query transactions, so the program warned that you'd have to convert and/or verify some parts yourself, but fortunately there weren't too many of those.

geophile|5 years ago

Dupe: https://news.ycombinator.com/item?id=24709031

As I said the last time this came around: The end of this isn't quite right. The Postgres project started in 1986. I don't recall what language it used, QUEL perhaps, but it wasn't SQL. SQL support was added between 1994 and 1996, and that's when PostgreSQL was born.

wavesplash|5 years ago

Postgres used PostQUEL. SQL was added to Postgres by one of Stonebreaker's graduate students (Andrew Yu) around '95. By '95 Stonebreaker's lab was already working on a new distributed Postgres called Mariposa.

wenc|5 years ago

I don't see the contradiction you mention. Stonebraker returned to Berkeley in 1985. The article doesn't say the Postgres project started in 1985. It says Stonebraker started a post-Ingres project then. The query language at that time might have been either QUEL or POSTQUEL [1]

But the ending as it is written seems correct to me.

[1] https://dsf.berkeley.edu/papers/ERL-M85-95.pdf

cafard|5 years ago

"Mike Stonebraker of Ingres didn’t even bother to show up at the committee meeting to make the (quite strong) case for adopting QUEL because he was ideologically opposed to setting technology standards. It was the behavior of an intellectually arrogant academic rather than a prudent businessman protecting the interests of his company."

Some might call the behavior principled, rather than arrogant.

lisper|5 years ago

Those are not mutually exclusive. (One could add "foolish" and "short-sighted" to the list of potentially compatible adjectives.)

titanomachy|5 years ago

In retrospect, standards turned out to be rather important.

tannhaeuser|5 years ago

Had expected to read sth about Datalog, rivaling SQL at least in academic DB literature.

eternalban|5 years ago

Apparently it is not easy to create general purpose datalog engines that scale.

"In this paper, we started with the observation that Datalog engines do not translate across domains. We experimentally evaluated the advantages and disadvantages of existing techniques, and compared them with our own baseline, a general-purpose, parallel, in-memory Datalog solver (RecStep) built upon a rdbms.

"We presented the necessary optimizations and guidelines to achieve efficiency, and demonstrated that RecStep is scalable, applicable to a range of application domains, and is competitive with highly op- timized and specialized Datalog solvers."

vldb.org/pvldb/vol12/p695-fan.pdf (2019)

dreamcompiler|5 years ago

Same here. Datalog is superior to both, but alas.

ilaksh|5 years ago

This is a fundamental difference that I have with almost all of humanity.

People do not know the difference between popularity and merit. They don't realize that the reason we do things is because that is how we do things. And they put a lot of effort into rationalizing the way we do things, without realizing the subconscious psychological (rather than rational) basis for that.

Most people fundamentally are generally unable to question assumptions about technology or how the world works.

This is one reason why, even though I am rooting for the human species, I am doubtful we will be able to stay relevant for long as autonomous general machine intelligence is built and deployed.

Even with extensive augmentation, it's obvious that there are just severe limitations to the human mind.

The nice thing is that we have the opportunity to design successors that will not be limited in so many ways.

mamcx|5 years ago

I'm working in a relational language that if it work as I wish, could eventually be put on top of a RDBM like sqlite or layer for other DBs:

https://github.com/Tablam/TablaM

It have ideas similar to QUEL...

prostodata|5 years ago

Just for comparison, a functional approach is a major alternative to relational and set-oriented models and query languages. The difference is that functions and operations with functions are first class elements of the model. One version of it is implemented in this project:

https://github.com/prostodata/prosto - Functions matter! No join-groupby, No map-reduce

nl|5 years ago

Composition might be desirable but in the 1980s and early 90s it didn't really matter because the databases of the day weren't powerful enough to do the multiple table joins where composition is useful with enough speed to be usable.

DBAs of the day would spend a long time optimising physical storage of tables and building aggregation tables to make up for this performance deficit.

exabrial|5 years ago

Really the the only issue I have with SQL is NULL != NULL. This creates an impedance mismatch with most languages...

MySQL sort of solves this problem with a <=> operator, which I wish was the default for ORMs to use.

There are a lot of other minor nitpicks but a lot of criticisms come down to the actual RDMS not SQL itself.

prostodata|5 years ago

Here are some major issues [1]:

SQL’s shortcomings can be grouped into these categories:

- lack of proper orthogonality — SQL is hard to compose;

- lack of compactness — SQL is a large language;

- lack of consistency — SQL is inconsistent in syntax and semantics;

- poor system cohesion — SQL does not integrate well enough with application languages and protocols.

[1] We Can Do Better Than SQL: https://www.edgedb.com/blog/we-can-do-better-than-sql/

wenc|5 years ago

I consider myself a fan of SQL, but my nitpick is more around named calculated columns. For instance:

    SELECT AVG(col1) OVER (PARTITION BY col2) AS partcol1
    FROM tbl
    WHERE AVG(col1) OVER (PARTITION BY col2) > 10.0
I wish I could just do:

    SELECT AVG(col1) OVER (PARTITION BY col2) AS partcol1
    FROM tbl
    WHERE partcol1 > 10.0
But I can't, because the WHERE clause is processed before the SELECT clause.

So if I have a bunch of these and want a convenient way to work with the named columns, I have to wrap them into a common table expression:

    WITH cte AS (
        SELECT AVG(col1) OVER (PARTITION BY col2) AS partcol1
        FROM tbl
    )
    SELECT partcol1
    FROM cte
    WHERE partcol1 > 10.0
So wordy.

setr|5 years ago

My bigger concern regarding NULLs is that its a ternary logic shoved into a binary logic system, and it all invisibly becomes nonsense when your dataset has NULLs in it, and you don't explicitly address it

WHERE col1 > col2

is wrong, and it'll break in terrible ways

and in the face of negation + NULLs, everything falls apart[0], giving you both false positive and false negatives in your answerset, and everything will look perfectly fine.

Every column being made NOT NULL is the only sane solution.

[0] https://databasetheory.org/sites/default/files/2018-05/03_pr...

sumtechguy|5 years ago

I just wish they had flipped from and select around.

jholman|5 years ago

Next you're going to complain about NaN != NaN.

castorp|5 years ago

The SQL standard also defined an operator for that: IS DISTINCT FROM

nendroid|5 years ago

What about the main point of the article that SQL is not composable.

jrochkind1|5 years ago

can anyone find examples of what QUEL looked like?

If we had a more composable query language being used instead of SQL, I wonder if that would have effected the course of ORMs, which arguably end up being as much about composable models of queries as they do about actual object mapping.

GordonS|5 years ago

I kept searching the article, thinking I must have somehow missed the examples - nope! How very odd to wrote an article about how much better QUEL was than SQL, and not have a single example of either!

hackerfromthefu|5 years ago

Would anyone familiar with both LINQ and QUEL be able to comment on similarities and differences?

somurzakov|5 years ago

SQL is perfect. if you want composability of SQL, just do SQL code-generation with your bare hands and compose whatever and however you want

brian_herman|5 years ago

I want him to continue telling the Postgres Story.

unnouinceput|5 years ago

Quote: "Of course, there was a silver lining to the whole saga. Stonebraker had forked the Ingres codebase in 1982 to create his company. Defeated by the bruising database wars of the 80s, he returned to Berkeley in 1985, and started a post-Ingres database project. Naturally, he named that database post-gres — as in, after Ingres.

And thus PostgreSQL was born."

And 35 years later PostgreSQL is kicking Oracle's butt at every corner.

jgalt212|5 years ago

> technically superior alternatives like Dvorak and Esperanto would have taken over.

The best thing about English is the lack of accent marks. Makes each glyph unique (other than casing). Sorts are faster and not ambiguous.

samatman|5 years ago

It's also untrue.

It's good, for some purposes, that English can be written conventionally by ignoring the accent in words such as résumé. But there are plenty of contexts, and I would say most of them, where this is going to bite you.

eecc|5 years ago

are there any usable implementations of QUEL in the wild?

kmeisthax|5 years ago

>If the world worked differently, we wouldn’t still be writing on QWERTY keyboards, or speaking English; technically superior alternatives like Dvorak and Esperanto would have taken over.

Bad metaphor: There is no evidence for Dvorak's technical superiority to QWERTY, neither is there for Esperanto over other languages.

cbsmith|5 years ago

I think the "technically superior" bit wasn't really the right choice of words.

Esperanto isn't intended to be superior. It's value is on it being equally foreign yet approachable for all the salient parties and therefore a conceivable acceptable neutral turf for everyone to share.

Ironically, the case for Dvorak keyboards is kind of the opposite: QWERTY was intentionally designed to avoid jams, which if anything biased towards making it more difficult.

It's not so much technical superiority as having a design objective that is more appropriate for the problem space.

One can similarly argue about whether "QUEL" is really technically superior to "SQL", but the design objective is (at least as perceived by the author) better aligned with the solution space.

nine_k|5 years ago

Esperanto is similar to a bunch of Roman languages widespread in Europe. One can see it as an attempt to collect them back into a common "Latin 2.0".

Outside Western Europe, Esperanto makes rather little sense. It's highly regular, which is nice — but, say, Japanese is also highly regular.

nendroid|5 years ago

The article talks about how SQL lacks composability. I would like to know everyones thoughts about this.

This is a huge issue with programming in general not exclusive to SQL. Everyone would like to build programs that are modular and reusable but programming paradigms have been traveling in directions that prevent this from happening. Many people turn to design patterns or microservices to try to deal with this organizational issue but they fail to see that the lower level programming paradigm itself is the precursor to the problem.

In SQL the problem occurs in the statement itself. The WHERE clause or the SELECT clause cannot be reused anywhere else. I can't modularize a where clause and put it in another SQL statement. I have to rewrite the entire clause to reuse it.

In OOP the same issue occurs. In OOP your class tends to contain methods that are not combinators, or in other words methods that modify a free variable. Due to this like the SQL expression, Objects cannot be decomposed either. I cannot reuse a setter in another class or anywhere else outside of the context of the free variable it modifies.

In both cases there comes a time in the future of an application where programmers realize that similar logic could be reused but structural problems are preventing the reuse from happening so they have to implement a hack to get around it.

The issue is that everyone is unaware of this trend at a low level. They are unaware that SQL lacks composability just like how they are unaware that OOP lacks composability as well. But they are aware of this issue at a higher level and they tend to call it "technical debt" or some high level design problem.

Case in point: https://news.ycombinator.com/item?id=24732789

Most commenters above talk about minor syntactical issues and fail to address what is not only IMO the main issue, but the main issue that the article itself is addressing. Likely because they're all unaware of the true nature of the composability issue and just didn't completely understand what the article was saying.

Also note that when I talk about composition in OOP I am not talking about "object composition." These are completely different usages of the word.

bawolff|5 years ago

I don't really feel like composability/modularity is all that important in SQL.

I want modularity in programs because they are large, and without proper abstraction, impossible to manage.

SQL queries are generally really short. Rarely more than a few lines (might be different for people doing ad-hoc analysis instead of making a db backed application). I don't need modularity for a program that is only a few lines long.

The lack of really natural integration into modern day programming languages (and data model mismatches) is a much bigger issue imo

magicalhippo|5 years ago

I've just learned SQL by doing. You're not including extensions here like stored procs and such right?

I have a lot of "where my_func(x, y, z) = 1" type where clauses, so seem that would do what you say, no?

cafard|5 years ago

Is this really a SQL feature or a relational feature? The essence of the relational model is that names are known up front: an attribute of one relation is not an attribute of another.

zem|5 years ago

mixins are a good first step towards composability in OOP (also doesn't javascript have rebindable methods? i'm not very familiar with the language, but i think it makes it pretty easy to bind a function to a new object as long as it contains the same members that the function refers to)