It's not you, it's SQL

commandlinefan|3 years ago

The article presents the move from the old, doddering SQL to the newer, better NoSQL - but the hierarchical database model actually _predate_ SQL. The first databases like IBM's IMS were hierarchical a lot like Mongo is: Codd was actually trying to address the problems with that model when he created the relational model.

jamwt|3 years ago

Article author, here. Good point -- but the article actually proposes document relational, not hierarchical. Relational is definitely good!

jd_mongodb|3 years ago

Comparing any modern database (i.e. developed after 1980) with IMS is like comparing a flint knife with a Barrett .50 calibre.

Justsignedup|3 years ago

My biggest issue with sql has and always will be the lack of definitions.

There's no way to express data structure knowledge in sql, only relations and keys. Discoverability can be quite lacking. And every time I want to join table A to B I have to re-define everything, because SQL doesn't store that.

ORMs help. They help because they encode relationships in meaningful ways. A has many Bs, so A.B works, and I don't need to repeat this join logic every damn time. But ORMs have down sides too. Sometimes they generate queries that are really sub-optimal.

I think my favorite ORM usage was with Hibernate. I KNOW!!! THAT DEVIL! But honestly we wrote SQL in hibernate, and then invoked it to populate our data. Yes it was a bit more work than RoR's "order.items.where("price > 40")" but when complex things happened, it was always easier.

I've always looked at nosql as "absolutely, sounds great, how do you represent relationships?"

cfeduke|3 years ago

You can sort of fake the joins between two tables and avoid the ceremony with `natural join` assuming you've named the columns the same. (This doesn't work if your team has followed a normal pattern of "table1.id" with a FK in "table2.table1_id".) It's more like a hack and it falls apart as soon as someone starts mucking with column names [what evil person would ever do a thing like that...] or happens to share unrelated columns with the same name across two tables. So yeah, probably don't use it, except maybe to save yourself some time writing OLAP queries but increase your time spent debugging those same queries. :)

dragonwriter|3 years ago

> There’s no way to express data structure knowledge in sql, only relations and keys.

Relations, keys, and (though you forgot them) constraints express…quite a lot of data structure knowledge.

> And every time I want to join table A to B I have to re-define everything, because SQL doesn’t store that.

SQL stores that if you tell it to; the usual way being view CREATE VIEW.

richiebful1|3 years ago

A database system could implement a feature that automatically uses the foreign key to join to a table. Maybe some RDBMS out there does this.

For example, you have a many to one relationship between posts and users. Instead of this:

  select *
  from user as u
  join post as p
  on p.user_id = u.user_id;

You could do:

  alter table post
  add constraint fk_user_id foreign key (user_id) 
  references user.user_id;

  select *
  from user as u
  left referent join post as p;

Any good sql auto-completer will also look up the foreign key information and auto-generate the on clause for you as well. Redgate SQL Prompt (mssql only) is one of the best tools out there for this reason

civilized|3 years ago

Hi, just curious - could you or someone else be more specific about the way in which an ORM encodes relationships that SQL doesn't?

MaxBarraclough|3 years ago

> Discoverability can be quite lacking.

A richer type system would help there, but to my knowledge this isn't offered by any major relational DBMS.

kneebonian|3 years ago

Holy crap, now I know I've been in this field to long we are right back to the NoSQL vs SQL debate, for the checks notes 3rd time.

Wow.

rjbwork|3 years ago

Not just that, but presenting hierarchical as if it isn't a pre-SQL thing that Codd's model was attempting to fix and displace (and succeeded beyond his wildest dreams at doing so, I'd guess).

schmichael|3 years ago

Wow, this is written by an early mentor of mine who introduced me to CouchDB and MongoDB circa 2008! I went on to take MongoDB to Urban Airship (now Airship), make a complete mess out of things (lots of evidence of that on HN even), and eventually port it all to a Postgres cluster. Meanwhile jamwt went to Dropbox and ended up managing thousands of SQL instances of his own!

And now both of us are back on non-SQL datastores: Convex for him, and Nomad's combination of Raft+MemDB for me. While Convex sounds influenced by transactional memory (define your critical section and let the platform keep trying to apply it until it succeeds), Nomad opted for Raft's serialization of mutations. On the read side we both opted for explicit index use, so it seems fair to assume both of us feel like SQL's decoupling of indexes from queries does more harm than good. Query planners are great until they're not.

I would love a SQL (well SELECT) interface to Nomad's datastore (MemDB). SELECT is like the C ABI for data instead of code: imperfect, but a lowest common denominator that if implemented unlocks a ton of workflows.

I wonder if jamwt feels the same and both projects will eventually grow a SQL adapter (or at least a SQL-like DSL... Nomad is close with bexpr filtering on a number of APIs, but JOINs would be really useful).

jamwt|3 years ago

Hi there Michael! We continue to love nomad.

SQL is the C ABI of querying for sure. BI tools will never adapt to use Convex directly, and nor should they.

So... yes, Convex actually had a prototype SQL adapter for the read side of things back in the early few months when we were figuring things out. Convex's read semantics are very compatible with SQL.

We've kept this adapter on ice in part because of point #3 in the article -- we don't want to overpromise things which are a bad idea.

Meaning, if we exposed SQL on the thing as-is, this would presumably be for more analytical type queries involving patterns normal Convex queries can't express. Right now that would be a Bad Idea because your website would slow down just like every other database system allows you to.

So the current recommended practice is use our Airbyte Egress connector (https://airbyte.com/connectors/convex) and get yourself into an offline Clickhouse/MySQL/Snowflake whatever and jam SQL over there to your heart's content. That's basically what we do.

We may one day abstract this away by embedding some sort of mirrored column store sql thing (maybe DuckDB based? who knows) so you can do your analytical work without impact on your OLTP or integrating 3 more systems. But the team hasn't invested in that yet and probably won't for some time.

jamwt|3 years ago

Also, yes, great pickup re: transactional memory. We talk about this internally all the time, this is the inspiration.

Haskell: IO -> STM -> (pure)

Convex: (The browser / Convex Actions / other effectful environments) -> Mutations -> queries

All the same benefits re: retry and memoization.

Steal steal from Haskell, so many great ideas there.

jrochkind1|3 years ago

> In Convex, the application and database types are automatically equivalent because the entire data pipeline from frontend to database uses the same exact types and definitions... There is no adapter code between languages because everything we write is just TypeScript.

It may actually be fine to just require everything to be typescript, but the idea that you'd require your application be written in the same language as your data store, and thus implement a different data store for each language you might want to write an app in (and not share between languages)... would formerly be thought of as pretty ridiculous?

But maybe it's not anymore?

But that seems to be the requirement for "the application and database types are automatically equivalent because the entire data pipeline from frontend to database uses the same exact types and definitions," yes?

In general, most of the purported benefits of convex seem to effectively require the db in the same language as the app, to make it all so seamless.

I guess we'll see if Convex can get popular enough to spawn lots of "what I wish I knew about convex a year ago" and "Goodbye Convex, hello Postgresql" posts...

jamwt|3 years ago

Article author here.

> It may actually be fine to just require everything to be typescript, but the idea that you'd require your application be written in the same language as your data store, and thus implement a different data store for each language you might want to write an app in (and not share between languages)... would formerly be thought of as pretty ridiculous?

The vision is definitely aspirational, and is reflecting on the fact that JS/TS is becoming overwhelmingly the most popular language for projects. With things like React Native, for many teams even mobile apps are taken care of all on one language.

There will obviously always be apps written in other languages, and in those cases, a bit more type mapping will be necessary, because (for now) Convex functions are always side-effect free, automatically cached JS/TS running in a very particular runtime. But we'll work on making that translation as smooth as possible in our client libraries.

We have a Python client library out now ( https://pypi.org/project/convex/ ) and a Rust one coming soon, just to keep us honest in this respect.

> I guess we'll see if Convex can get popular enough to spawn lots of "what I wish I knew about convex a year ago" and "Goodbye Convex, hello Postgresql" posts...

Well, that's the dream, isn't it?

DerArzt|3 years ago

Man what a well written ad.

treis|3 years ago

My reaction exactly. Specifically when they said what developers want is document DBs. That's not true. We (or a lot of us) want the equivalent of Typescript for SQL. Which people have tried to build but their efforts have ended up like CoffeeScript. Better but ultimately not worth using because they don't have the staying power.

eatonphil|3 years ago

On the other side of things, it's interesting to see SQL engines innovating. Though the only one I can name for sure is DuckDB: https://duckdb.org/2022/05/04/friendlier-sql.html.

Have other implementations done anything for making queries easier (that don't involve requiring an IDE or anything, so just query language innovations)?

Edit: Not query language innovations but ClickHouse and Snowflake's "standard library" of builtin functions just keep getting better and better. Once I saw `parseDateTimeBestEffort` in ClickHouse I wanted that everywhere...

jamwt|3 years ago

Article author here -- DuckDB is indeed cool.

Another interesting bit of work in this space is LINQ on the microsoft side of things. And even list comprehensions end up having a scanning/summing language kind of feel to them that would be interesting to see translated into a new database query language.

znaimon|3 years ago

+1 to ClickHouse builtin functions—they vastly simplify a lot of data analytics workloads

xpil|3 years ago

>> Unlike SELECT, these operations don't feature JOINs or subqueries or any other magic that brings together tables.

This is a false statement. Both INSERT and UPDATE support JOINs and subqueries / CTEs. At least according to the standard - not every engine implementing them is another story.

wvenable|3 years ago

I'm forgiving on the authors point here. If you have JOINs and subqueries, you're just doing a SELECT to get data that can only be UPDATEd/INSERTed on a single table. You can't do an INSERT across 5 tables in one statement.

eatonphil|3 years ago

> not every engine implementing them is another story

Which don't? I'd have assumed anything inside of `SELECT`'s `FROM` would be allowed inside of `INSERT` and `UPDATE`.

Or maybe you're not saying you know there are implementations that have these restrictions just that any random implementation might not be there (yet).

cube2222|3 years ago

If I understand correctly this is basically trying to solve a very similar set of issues as something like entgo[0] but in a very TypeScript-native way?

Also, regarding the transaction functions, are those run locally, or are they serialized and run remotely on the database server? Both have their caveats.

Number 1 (and that approach to retries) already works just fine with Postgres, you just need a proper library (which wraps your function and retries on transaction isolation errors). But you also have to keep in mind that if you interface with 3rd party services during your transactions, those calls will need to be idempotent as well. This is actually the proper way to do transactions in application code with a SQL database, esp. if you're running with serializable transaction isolation.

Number 2 is very limiting, as you can't have all those third party services and libraries used between different operations in a transaction, which is often (I'd even argue - usually) very useful.

Since you're citing stored procedures as a viable alternative to convex's take, while not listing the above number 1, it sounds like you're doing 2.

Anyhow, good luck!

[0]: https://entgo.io

JohnDeHope|3 years ago

"It can be done. But can doesn't mean should." I'm getting worn out on this turn of phrase. Yes, can doesn't mean should. But also, couldn't doesn't mean shouldn't. You have to justify "shouldn't". TFA does go on to justify it, which is appreciated.

richbell|3 years ago

> Tab! Tab! Tab! PostgreSQL demonstrates its field autocomplete feature.

SELECT preceeding FROM is such a thorn in the side. :(

paulfitz|3 years ago

Agree, I started paying attention to duckdb when it started allowing FROM before SELECT https://github.com/duckdb/duckdb/pull/5076

BitwiseFool|3 years ago

I wish SQL did not require a comma between items after the SELECT and before the FROM. We don't need commas in between joins. I feel like someone could write a way to parse queries so that it isn't needed.

Can you imagine how much time and effort that would save people?

Edit: You do need commas in ORDER BYs, that slipped my mind when typing out this pet peeve of mine.

WkndTriathlete|3 years ago

Tell us you don't understand the theory or use cases for SQL databases without telling us you don't understand the theory or use cases for SQL databases.

There are indeed times when a simple key-value store is appropriate, but the encoding of hierarchy and the relational calculus are useful for so many more use cases in business applications. SQL as a language is certainly flawed - it would be nice to be able to algebraically/programmatically join select clauses and conditions, for instance, or to perform multi-inserts - but the semantics of the storage and engine are based on some pretty mathematically sound foundations.

You may find the seminal paper by Codd as illuminating as I did if you read it.

vlunkr|3 years ago

I agree with lots of the points here. However one of the great strengths of the SQL family is the FOSS history. I doubt something proprietary is ever going win everyone over. At least I hope not.

schmichael|3 years ago

SQL took a long winding road to reach the FOSS friendliness it has today. Its history is littered with patent, copyright, and trademark disputes. Not to mention widely varying implementations and the continuing lack of a crossplatform wire protocol or standard interface. Proprietary incumbents still dominate huge sectors of the SQL landscape and pose compatibility and intellectual property issues for FOSS implementations.

I think any SaaS APIs that reach any degree of widespread adoption eventually get OSS implementations (eg S3's API is widely supported by FOSS and proprietary implementations). This evolutionary path is awfully similar to SQL's for better and/or worse.

jamwt|3 years ago

Article author here.

Agree this is a necessary part of the change. Convex is working on our OSS strategy this year. Thanks for the feedback!

cgh|3 years ago

Optimistic concurrency control is mentioned as a feature of Convex, the advertised product. But OCC is a common concern and has been for ages. In Java, the JPA standard specifies the @Version annotation which enables OCC on entity objects and it's implemented by the usual suspects like Hibernate. It's been around for at least a decade so I'm struggling to understand why this is being presented as some big innovative win, unless I'm completely missing something here.

jamwt|3 years ago

Article author here.

It's not new "in the world", but often new to our users, who understand it less intuitively than pessimistic locking.

Unless they're using Haskell or probably other very exotic environments with strict understanding of mutability and so on, they know Mutexes but not STM or other OCC patterns. So we lean heavily on explaining it. Not because it is original, but because it is unfamiliar.

015a|3 years ago

I mean... yeah MongoDB got a lot of hate, but I think the broader point is that it was one of the first technologies to popularize the domain of NoSQL. No one knew how to use it properly; so we adapted SQL-like schema design, and when it became obvious that didn't work well the hate started spilling over to the first technology to arrive at the party.

The elephant in the room is, I suppose, that the modern internet literally would not be possible without NoSQL. It may be possible without SQL; that seems likely to me. Part of that is because NoSQL is a big umbrella, and covers extremely critical databases like Redis or even databases like Cloudflare's proprietary edge cache. But, even document stores are extremely critical to enterprise scaling; during Prime Day 2022, DynamoDB peaked at 150M rps. There's no SQL setup on the planet that could handle volume like that while still maintaining all the things that Make It SQL; you could start throwing read replicas and removing joins and cross-table lookups and sharding data and ope, you just re-invented a document store.

Here's the couple conclusions I have started operating by:

1. Document stores are, today, a strong choice at both low and high scales on the spectrum of systems scaling. Its great at low scales because you can avoid thinking about it too much. Its great at high scales because once you have the space to think about it you can attain substantially higher efficiency (performance+cost).

2. Making a document store operate more like SQL, for the things SQL is good at (joins, validation, etc) is a lot easier than making a SQL database operate like a document store for the things document stores are good at (Planetscale? there's a couple players in this game).

3. SQL-the-language sucks. There I said it; I'll die on that hill. The language was invented at a time +/- 2 years of Scheme, ML, Prolog, and Smalltalk. Our industry has rejected all of those. We haven't rejected SQL (yet). SQL is demonstrably, as a syntax, just as bad as those languages, evidenced by all the ORMs, injection attacks, etc. Databases tend to have a longer shelf life than programming languages, but SQL-the-language will die.

4. But, duh, SQL is fine. Go ahead and use it. In some situations it makes sense. In others it doesn't. Its a tool; one that has had 60 years to form around solving as many problems as possible.

[1] https://aws.amazon.com/blogs/aws/amazon-prime-day-2022-aws-f...

brightball|3 years ago

> 2. Making a document store operate more like SQL, for the things SQL is good at (joins, validation, etc) is a lot easier than making a SQL database operate like a document store for the things document stores are good at (Planetscale? there's a couple players in this game).

For a lot of NoSQL document stores, the simple acting of having multiple indexes was a bridge too far. Couchbase comes to mind here.

I can't think of many use cases for a document store over PostgreSQL. Maybe as a cache layer...but I'd just use a materialized view. Where some of the data structure was going to be unknown or user defined, such as a system monitoring tool? Server Density was one of the first big MongoDB adopters for this reason. Now we can store than in a JSONB column though.

> 3. SQL-the-language sucks.

You're entitled to your opinion. I've been doing this 20 years and basic SQL knowledge has been by far the most valuable and portable skill of my career. I'm by no means an expert, but you asked developers who have been in the field for 5+ years what a HAVING statement does and they have no idea. There's a huge gap in basic SQL knowledge, stuff that can be learned in < 1 week. I taught somebody who came out of a code boot camp basic SQL for a couple of weeks during an internship. Her first job made her acting DBA because she was the only person who knew SQL.

The hill I'll die on, is that ORM's and frameworks are creating an astounding lack of basic SQL knowledge in today's developers which leads to a lot of completely unwarranted griping about SQL. (not directed at you, just in general on this topic)

xupybd|3 years ago

One of the best things I've seen that integrates SQL with your code is F#'s type providers. http://fsprojects.github.io/SQLProvider/

It's amazing how it all works. But only when it works. I've found it so flakey I don't use it for production but if the bugs were ironed out it would be amazing.

legerdemain|3 years ago

  > We paused just long enough to take a sip of our Spicy Maya Mocha from Coupa Cafe.

To be honest, I think almost all drinks at Coupa Cafe are pretty bad.

AnEro|3 years ago

Graph databases solve most of his issues, outside the complexity of queries.

If we are begging DB engineers for things can we get a graph layer for accessing SQL tables please?

bambax|3 years ago

This example is really curious:

    BEGIN;
    SELECT post_count, ... from users where ... FOR UPDATE;
    INSERT INTO posts VALUES(...);
    UPDATE users SET post_count = new_post_count WHERE ...;
    COMMIT;

For one, it's unlikely the number of posts per user is so important and so often requested that it needs to be cached in the database itself.

Secondly, why would that value need to be stored with each new insert? Simply insert posts as they come, and calculate metadata about posts at a later time as a batch process, or when some part of the application actually request them.

83 comments