We Can Do Better Than SQL

[+] tqi|5 years ago|reply

It's pretty arrogant to complain about the syntax being inconsistent across versions and databases and then present your own weird offshoot, as if every other version wasn't introduced for the exact same reason with the exact same lofty delusions of grandeur...

SQL is messy because describing the underlying data relationships are messy. The orthogonality example is a great illustration of this. What exactly should the result be if there are multiple dept heads? Should the result rows be duplicated? It's not clear how edgeQl would handle either (their edgQl orthogonality examples were constructed to only have result per sub query), but it seems like they would be kept together as sets. In that case, the result set is no longer a table, it's a dataframe, which is a useful data structure but is also not what relational databases do.

[+] roenxi|5 years ago|reply

> SQL is messy because describing the underlying data relationships are messy.

> SELECT extract(day from timestamp '2001-02-16 20:38:40');

SQL is messy because all the syntax was decided on before there was a community that really understood what good syntax is. The 'from' in that extract does nothing and I can't easily identify if extract is a function or some sort of crazy parsing construct - what are the arguments? Is "day from timestamp '2001-02-16 20:38:40'" the argument? Are the arguments "day", "timestamp" and "'2001-02-16 20:38:40'"?

Are these annoyances crippling? Yes. Yes they are crippling. It should be possible for an amateur to quickly write an SQL validator as a starting project; relational algebra is not complicated. Any fool can write a validator for lisp. Relational algebra isn't that much more complicated - we don't have loops or flow control to contend with here.

Tidyverse's dplyr [0] implements the relational model for real dirty data and, as might be expected for something implemented this century, does a much better job than SQL. Not because the operations are that much different (although gather() & spread() are welcome additions) but through ingenious innovations like, as mentioned, functions having arguments instead of I-don't-even-know-what-that-is.

And the pipe operator which is legitimately ingenious. Great operator for data.

[0] https://dplyr.tidyverse.org/reference/index.html

[+] goto11|5 years ago|reply

> SQL is messy because describing the underlying data relationships are messy.

No, the relational model is beautiful and consistent! SQL is messy because the syntax is not consistent and elegantly composable. It could have those properties and still present the same underlying data relationships.

See Linq in C# as an example for how a more composable query syntax can expose the same data model.

For example in Linq you can chain arbitrary many select/join/where/group by in arbitrary order. In SQL you need nested subqueries to achieve the same which is a much more convoluted syntax.

[+] 013a|5 years ago|reply

A very simple, basic SQL query would be something like "select * from users where foo=bar;"

Already, we're introducing a weird inversion of syntax that, in my experience, trips up people learning it: data in SQL is stored as "rows" with "columns" inside "tables". More formally, we've got a hierarchical relationship where Tables > Rows > Columns, yet we write the query as Columns > Table > Rows.

There are far more consistent and beautiful querying languages than SQL: I would point to MongoDB's query language, which is less of a query language and more of a static javascript-interpretable library, but is still far easier to learn and more consistent than SQL. The same query in MongoDB: "db.users.find({ foo: "bar" });". How is this better? It embeds the operation in the statement ("find"); reading it hierarchically follows how the data is stored (Collection > Rows); the filtering operation is the same shape as the data being stored; and it naturally disallows most injection attacks.

[+] sfvisser|5 years ago|reply

> SQL is messy because describing the underlying data relationships are messy.

Not sure that is true. You can imagine having a smaller query language with clean semantics able to capture messy data relationships. Small functional expression languages come to mind. Where it lies on the spectrum from purely-table-form to can-hold-everything is a design choice.

Your query language doesn’t need to capture the underlying data model exactly. SQL is an example of this itself.

I’ve run into the orthogonality issue myself several times. You don’t need to drop SQL today, but for some data domains more expressive query models can work really well.

[+] Areading314|5 years ago|reply

Right, it's like criticizing python, or English, for being inconsistent, or "large". Turns out that doesn't matter -- what matters is that the language is useful because it has a wide base of users and libraries, just like SQL does.

[+] asdfasgasdgasdg|5 years ago|reply

SQL is messy because describing the underlying data relationships are messy.

The article presents several examples where SQL's messiness cannot be plausibly attributed to underlying data relationship messiness.

[+] wtetzner|5 years ago|reply

I think the real problem is that SQL text is the only interface to most relational databases. There is a SQL standard, of course, but it's difficult to get vendors to implement the standard, because SQL is the user interface to the database.

Instead, perhaps a second format should be standardized, which is machine readable/writable, that more or less represents relational algebra + whatever extra features SQL supports. It can be clunky and verbose, as long as it's straight-forward and easily composable. It would effectively be like a compiler intermediate representation.

Once you have something like that, you can have as many front-ends as you want with whatever syntax you prefer.

Of course, getting all of the RDBMS vendors to agree to it is still a problem, and they'll probably all still include their own vendor-specific extensions and differences, because they want to lock you in to their system.

But at least maybe the open source ones could agree, which would still be quite beneficial.

[+] IshKebab|5 years ago|reply

I don't see how that makes them arrogant. SQL's implementation inconsistencies are a real problem. Their query language doesn't claim to be another version of SQL - it's clearly very different.

Also that is only one of the things they are hoping to fix. It sounds like you're saying "We have 5 variants of SQL already so nobody is allowed to write any other database query languages ever. We must use SQL forever." which is stupid.

[+] gwbas1c|5 years ago|reply

In a world where there's a new programming language every other month, what's wrong with suggesting a new query language?

At least we don't have to change our whole data model or give up consistent data to try it!

[+] joshsyn|5 years ago|reply

Most relationships, I have encountered are just one-many, many-one, many-mane, one-one, indirect, direct or graphical in nature.

You want flat result, aggregated result. Why does this have to be blamed on the data relationship and not SQL language itself?

For any graphical type of relationships, I find SQL utterly hard to express my queries properly. It feels like a assembly language at that point.

[+] xiaodai|5 years ago|reply

yeah the edgeql syntax just looks messy and the doc doesn't even explain how to do a group by

[+] bjo590|5 years ago|reply

Am I the only full stack dev that likes SQL?

SQL is an incredibly expressive and flexible way to read, store, and update data. It's ubiquitous, so the SQL skills I learned six jobs and three industries ago are still relevant and useful to me today. Relational Databases and SQL are heavy lifters that I often relay upon to build projects and get things done.

[+] mathgladiator|5 years ago|reply

No, I'm with you and prefer SQL for many tasks.

SQL got a bad rap in many ways due to security issues, databases in general, and "web-scale".

SQL as a language within other languages is a nightmare from a security standpoint, and if language integrated query was more common across languages earlier on then this wouldn't have been an issue.

Databases generally depend on normalization, but normalization comes with interesting scaling problems and how do you replicate normalized schemas. Thus denormalization became a thing, and then the emergence of NoSQL and document stores started to infect everywhere. The JOIN was a killer too, and then the discipline required to do sharding made it annoying to manage, so easier to manage solutions became a thing.

I'm looking at databases in a different light these days with more appreciation, but now the hot new thing is GraphQL makes things... interesting. I don't view GraphQL as a server-side solution, but a client solution to overcome the limits of HTTP/1.1. However GraphQL clients are exceptionally complicated, and I'm not sure they are worth it. The only problem is that to overcome them requires engineers "to know how to do things", but that is a hostile stance. People want to go fast and make progress, and GraphQL enables that.

[+] smabie|5 years ago|reply

Do you love SQL, or do you love relational algebra? Because actual SQL, the language, is pretty shitty.

Perhaps the best querying language I've ever used is Q-SQL, integrated into kdb+/q. Unlike SQL, it's actually part of the language (q/k) and, most importantly, it's modular and more expressive than SQL.

If you're interested in how we can do a lot better than sending strings to remote databases using an inexpressive and non-turing complete language, check it out: https://code.kx.com/q4m3/9_Queries_q-sql/

[+] piva00|5 years ago|reply

Same for me, even more after venturing into the NoSQL hype of the early 2010s, I implemented pretty successful systems using non-relational databases but over time I stick to a RDBMS as most and as far as I can.

Not only I prefer to work with SQL nowadays I also prefer SQL over any ORM in older codebases, ORMs are pretty useful for getting up to speed without caring about your persistence layer too much but after 17 years in this industry I've had my fair share of issues with ORMs to avoid them whenever I can.

Native SQL queries with placeholders for my parameters in their own files, loaded by my database driver to execute and return data is my go-to solution for data access, it's flexible, maintainable and readable if you treat SQL as your normal code (code reviews, quality standards, etc.).

[+] igeligel_dev|5 years ago|reply

Same for me. I love it. I always avoided the NoSQL things so far since the use-case is mostly unstructured data in comparison to normal SQL databases. I think what most engineers struggle with is just thinking of rows and columns as data. It is a separate thing to create schemas, work with them, inserting data, manipulating data and reading data. These are all different skills. Creating the proper queries is like learning an own language. Most programming languages come up with their own ORMs like the LINQ style of C# (EntityFramework) or all other ORMs like TypeORM for TypeScript, the Django ORM or Hibernate. So overall, to learn how to handle data you have to understand SQL basically + the abstraction layer. Sounds a lot harder than having data in a simple object.

Personally, I was lucky because I had great courses, even in highschool, regarding SQL including: How are rows working, what is normalization, how does it help with data and so on. So naturally I developed some feeling on how to handle database tables.

[+] goto11|5 years ago|reply

I love SQL but I still think we can do better! The strength of SQL is the underlying relational model and relational algebra. Syntax wise SQL is somewhat clunky.

Linq in .net shows IMHO how queries can be expressed in a more consistent and composable syntax while still conforming to the relational model.

[+] steve_adams_86|5 years ago|reply

The work I do lately tends to be missing an sql layer and yes, I miss it a lot. I loved organizing my data at that layer and having such powerful ways to query it. It felt like I could eternally find ways to optimize it, and I really enjoyed learning year after year.

Lately I use nosql for very light data and otherwise our API outputs heavily cached and extremely simple data. Adding a database as a middleman wouldn’t make sense. Still fun, but I miss Postgres!

[+] randomdata|5 years ago|reply

I like relational algebra. I dislike SQL. It harkens from a similar era as COBOL, and I dislike COBOL for procedural applications for the same reason. We've learned a lot about language design over the years and it is a shame that we've put very little effort into adopting new languages to address this particular problem space.

[+] sanderjd|5 years ago|reply

No, you're right, SQL is awesome. I mean, the syntax itself is pretty creaky, but it doesn't matter, the power to declaratively query structured data is the key. Everything that provides that power is awesome; SQL is just at the top of that heap in the breadth of its utility.

[+] mszcz|5 years ago|reply

Seconded. Every language has it's use and quirks. Does SQL get tricky in tricky situations? Sure. But what doesn't.

This piece reads like Joey being unable to open a carton of milk [1], "there's gotta be a better way!".

[1] https://www.youtube.com/watch?v=wwROPN3Fir8

[+] bni|5 years ago|reply

I really like the relational model, and I think its the best way to model data. Referential integrity is an awesome thing that removes the possibility for many bugs to exist.

As for SQL it has its warts, but im pragmatic when it comes to programming languages, like for example C, JavaScript, its also "ugly" but its often the best option anyway.

[+] Akronymus|5 years ago|reply

At work we almost exclusively use pure stored procedures[1] and everything is normalized very well. It is an absolute joy to write SQL, because of how terse it is while still being very readable.

Trying to implement business rules about data relations outside of the DB is a nightmare.

[1] We use dynamic SQL within stored procedures for pivots.

[+] quickthrower2|5 years ago|reply

I love SQL. I'd love someone to make it better for complex queries. Have you seen the enterprise SQL monstrosities. Why do we have ORMs if SQL is perfect?

[+] michaelchisari|5 years ago|reply

Same here. Every experience I've had in the alternatives reminded me why SQL was superior.

[+] agumonkey|5 years ago|reply

how many devs accept it for what it is, limits included ?

it's alright, it's consistent enough, it's good

I'm reading about datalog and prolog more and more but sql is ok

[+] stonemetal12|5 years ago|reply

Do you like relational algebra, SQL the language, or both?

To me relational algebra is the beauty queen, and SQL is the "beauty mark" that prevents it from reaching perfection.

[+] michaelcampbell|5 years ago|reply

I use it a lot, because I know it fairly well. Not sure I "like" it; it's like a lot of food - I know what of the food varieties I prefer over others, even when I didn't immediately "like" any of them. More of a "got used to" type of thing.

[+] aspyct|5 years ago|reply

Totally with you, long live SQL!

[+] etxm|5 years ago|reply

Team SQL. There have been few data models Ive encountered that I couldn’t express with some functionality of Postgres.

[+] throwaway_pdp09|5 years ago|reply

What you (and I) like is not SQL per se but what comes through of the underlying relational model, that SQL hasn't screwed up. SQL as a syntax and semantics is a mess. It could have been better.

[+] ckocagil|5 years ago|reply

I'm not impressed for two reasons:

1. Anyone striving to build a better SQL should make a comprehensive list of common (but difficult!) database tasks for OLTP and OLAP workloads. This will expose the weakness of their language. SQL has had 50 years and myriads of improvements to cover all these common cases. This is not a fair fight, so come prepared.

2. It's not enough to be just "better than SQL" to replace it. SQL has such a huge momentum that a new language needs to be absolutely better _and_ it should have many features that SQL cannot possibly have. My nice-to-have list would contain predictable performance, lock ordering, ownership relations (for easy data cleanup), and a standard low level language which the query optimizer would output.

[+] nealabq|5 years ago|reply

QUEL ( https://en.wikipedia.org/wiki/QUEL_query_languages ), the original query-language for Ingres, was more orthogonal and consistent than SQL. But IBM decided SQL was more business friendly. Who can argue with that.

And before that there was ALPHA. From https://www.labouseur.com/courses/db/s2-Remembering-Codd-2.p... :

"Ted [Codd] also saw the potential of using predicate logic as a foundation for a database language. He discussed this possibility briefly in his 1969 and 1970 papers, and then, using the predicate logic idea as a basis, went on to describe in detail what was probably the very first relational language to be defined, Data Sublanguage ALPHA, in “A Data Base Sublanguage Founded on the Relational Calculus,” Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access and Control, San Diego, Calif. (November 1971). ALPHA as such was never implemented, but it was extremely influential on certain other languages that were, including in particular the Ingres language QUEL and (to a lesser extent) SQL as well."

[+] tristanz|5 years ago|reply

There's a lot of negativity here for understandable reasons given the success of SQL empirically. But I'd encourage everybody to read the home page https://edgedb.com/. This project is not trying to replace SQL as its primary goal, it's trying to build a data modeling and query interface on top of Postgres that meshes well with modern applications that have hierarchical data akin to what you'd model with GraphQL. Major kudos for trying to rethink and improve upon things -- I'm sure it wouldn't be too hard to stick an SQL backdoor in there for the laggards too ;).

[+] gregjor|5 years ago|reply

Would be nice, but bazillions of lines of SQL at the core of almost every business system make this as likely as “We can do better than five fingers.”

The article does nicely illustrate many of the well-known shortcomings of SQL. Chris Date and Hugh Darwen unsuccessfully tried to fix SQL with Tutorial D. Never heard of it? Exactly.

[+] exabrial|5 years ago|reply

Honestly I think SQL is pretty easy. I love it. I can teach the basics to a new person in minutes.

You know what we could do better at? Crappy explains from database engines. Crappy rate limiting capabilities. Poor feedback on keep cache pipelines fed during scans. Poor feedback on column size effects on reading stripes from disk and size alignments between the filesystem and database.

[+] grandinj|5 years ago|reply

SQL was designed by COBOL / NATURAL people and was intended to be used by "normal people". That was part of a larger movement at the time to make programming more accessible. Unfortunately (a) that experiment mostly failed (b) the resulting syntax does not compose well at all (c) the syntax does not represent the underlying semantics and mathematical operations well at all.

The combined effect is a rather tortured language, as it has been extended over the years.

However, replacing it is equally problematic because of the huge installed base.

[+] dman|5 years ago|reply

One word of caution to those trying to improve on SQL - for the many users of SQL, technology is a secondary aspect of their jobs. Hence any replacement runs into the issue that many of its core users do not have the bandwidth to spend significant effort on learning another querying language.

I will cheer everyone who tries to displace SQL, because I do think it needs to be displaced but would also want to caution such people on the magnitude of the task ahead of them.

[+] seer|5 years ago|reply

I for one would welcome a new alternative to sql. It might not be _this_ alternative, but why not try.

SQL is very hard to learn properly, with all of its gotchas and inconsistencies. There are running jokes for noobs truncating their tables due to forgetting a where clause. I’ve seen junior devs crying in tears and throwing their mice just because they needed to debug / optimise a complex query.

The mare existence of all the ORMs is a testament that people would opt to write (or use) insanely complex pieces of software just so they don’t have to deal with the lack of composition and ease of use.

All of those look to me as signs that something wrong with the core itself. We could do better.

If we settled for good enough in all cases we wouldn’t have Go or Rust, React or Postgres. In fact every software that we have is a product of someone thinking “this is hard/wasteful/unexpressive/etc, lets write an alternative”, SQL included.

This alternative looks quite promising. We can wait to see how they can handle the edge cases, but the core looks a lot simpler to deal with than regular SQL.

[+] julochrobak|5 years ago|reply

I am a big fan of a relational model. SQL itself is OK but far from great. So, I wish you a lot of success!

I was part of a similar attempt - building a better "SQL" and relational DB. This was roughly 8 years a go. You can have a look at our GitHub Projects or look at some further links and may be you get inspired :)

* http://bandilab.github.io/ - introduction to the bandicoot project

* https://www.infoq.com/presentations/Bandicoot/ - presentation of the Bandicoot language on

* https://github.com/ostap/comp - another interesting attempt, a query language based on a list comprehension

[+] jayd16|5 years ago|reply

I'm not convinced by a lot of the comments here focusing on SQL as an irreplaceable juggernaut. Currently I think Postgres is just fantastic but recent DBs have shown that even SQL can bleed.

There's some neat stuff here and I hope the project well. I would love to see object/hierarchical result set support grow. SQL ORMs feel so kludgy.

[+] barrkel|5 years ago|reply

SQL has a lot of incidental, accidental complexity, no doubt.

Though when I reason about SQL, I think mostly in terms of functional operators over streams of data: projection, filtering, flat-map, join, fold/reduce. Obviously optimization means looking through streams and seeing tables to find indexes etc., but once you get to the execution plan, you're firmly in a concrete world of data flow and streams of tuples.

I didn't get on well with the example syntax in this write-up. It didn't mesh better with my mental model of relational algebra either at the logical or physical execution level - and the truth is you need a foot in both worlds to write good scalable SQL today.

Aside from the complexities of dynamic construction, my biggest problem with SQL is modal changes in query plans, owing to how declarative it is. It's a two-edged sword: the smart planner is great, up until it's stupid. And it usually turns stupid based on index statistics in production at random times.

[+] Flimm|5 years ago|reply

Yes, SQL has flaws, and the article forgot to mention one of them: you need to build a string to build an SQL query, rather than a more structured object, leading to flaws like SQL injection vulnerabilities, and difficulties adjusting the query.

Let's say you're building a CRUD app with search and filtering capabilities. Unless you are using an ORM (which has problems of its own), you might be tempted to build the SQL query string like this:

  conditions = " AND ".join(filter_key + " = '" + filter_value + "'" for kilter_key, filter_value in filter.items())
  order_by = column_name + " DESC"
  query = "SELECT col1 FROM tablename WHERE " + conditions + " ORDER BY " + order_by

But this has multiple SQL injection vulnerabilities. Doing it correctly is not just a matter of using SQL parameters, because column names need to be escaped differently than string literals. Linters can't distinguish between correctly escaped queries and incorrectly escape queries in non-trivial cases. Also, the query will throw a syntax error if the number of filters is zero, since you can't have an empty WHERE clause.

I don't think a new query language solves this problem.

[+] cmrdporcupine|5 years ago|reply

SQL's fundamental misstep was to try to present the relational model in a form they thought was more comprehensible by a) using an "English-like" language and b) renaming core relational concepts (relation -> table/view, tuple -> column, etc.) It has only muddied the waters and led to all sorts of misunderstandings and complaints ("but my data isn't tabular!")

The "English-like" syntax means that what is actually happening is obscured (so many misunderstandings of what "selection" is, for example), and it means that composing multiple operations gets very awkward and hard to read and in fact many things that the relational algebra itself permits are not really expressable.

And renaming core concepts means people means people get confused. They don't understand what the "relation" in relational is, and think it's about relationships. They think SQL is all about tables, when tables are just one way of representing predicates. Etc. etc.

The relational model is a very elegant method for presenting facts about the world and then the relational algebra is a nice functional programming style system for slicing and dicing those facts into information in basically arbitrary and recomposable ways.

SQL has obscured that. It's awful.

[+] mr_toad|5 years ago|reply

> The NoSQL movement was born, in part, out of the frustration with the perceived stagnation and inadequacy of SQL databases.

I would dispute this. The antecedents of NoSQL were the parallel programming models of HPC. They weren’t specifically excluding SQL, and NoSQL was a term that was invented after the fact.

[+] cjf4|5 years ago|reply

I'm not sold.

I acknowledge that these are real issues, and commend the authors for attempting to address them. However, these issues rarely cause any real friction for me - I generally find SQL among the most ergonomic languages I use (regardless of dialect).

[+] xupybd|5 years ago|reply

Started reading believing the article would be utter nonsense but by the end was convinced they might be onto something. My hat goes off to them if they pull this off. It will take a lot to push SQL out of its stronghold.

[+] pgt|5 years ago|reply

Datalog is superior to SQL in every single way, except that very few people know it: http://www.learndatalogtoday.org/

[+] jugg1es|5 years ago|reply

SQL is definitely not perfect but it's supported everywhere and integrates with everything, so you can augment it with other languages. That fact makes a lot of the critiques sort of moot.

461 comments