top | item 29963448

How and why the Relational Model works for databases

178 points| uvdn7 | 4 years ago |blog.the-pans.com | reply

243 comments

order
[+] brianmcc|4 years ago|reply
The best advice I can give is that you can think of your app and its data store as either:

#1 code is of prime importance, data store is simply a "bucket" for its data

#2 data is of prime importance, code is simply the means to read/write/display it

In my personal experience, #2 is a way better way to work. Apps can come and go, but data can last a long time, and the better your database is modelled the better the outcomes you'll have long term.

Corollary - I have seen some abject disasters where #1 has been adopted. Not necessarily just because of #1 alone but it's certainly been a major factor.

[+] gpderetta|4 years ago|reply
"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious."

-- Fred Brooks, The Mythical Man Month (1975)

[+] forinti|4 years ago|reply
One awful thing I've seen a few times is people who put data into a database in a format meant to make it easier for working with a particular platform, such as serialized Java dates (or even dates as longs).

It completely ties you to one implementation and precludes using the data for other things such as reporting and alerts or even ad hoc queries.

[+] naasking|4 years ago|reply
> In my personal experience, #2 is a way better way to work. Apps can come and go, but data can last a long time

Exactly. Data is far more valuable than code. We still work with data from literally centuries ago. Most programs written even 10 years ago are probably no longer in use. I would be surprised if the oldest code still doing something useful was more than 20 or so years old.

[+] oblio|4 years ago|reply
You've just torn apart most NoSQL databases in a very short comment :-))
[+] mmcnl|4 years ago|reply
I totally agree. Data lasts forever, code is volatile.
[+] andy_ppp|4 years ago|reply
Sure, I agree largely with two but writing tests for this logic can get difficult (pgTap looks alright but not amazing) as can making all the database logic scale as you grow to sharding and multiple DBs. I’m not sure you’d want this if you were scaling to hundreds of millions of users but then you have loads of good problems…
[+] Dave3of5|4 years ago|reply
Or maybe it's shades of grey in between ...
[+] caditinpiscinam|4 years ago|reply
> All these database aspects remained virtually unchanged since 1970, which is absolutely remarkable. Thanks to the NoSQL movement, we know it's not because of a lack of trying.

Kind of an aside, but I find it odd that people treat "relational" and "SQL" as synonymous (as well as "non-relational" and "NoSQL"). You could make a relational database that was managed with a language other than SQL, right?

[+] goto11|4 years ago|reply
True, but SQL is (for better or worse) the de-facto standard query language for relational databases. The alternatives are all very niche.

I would like a better alternative, but it would need some very significant benefits compared to SQL to gain any traction. SQL is just so entrenched at this point that even NoSql database engines are adding support for pseudo-SQL query languages (which is really the worst of both worlds - the clunkyness of SQL syntax without the power of the relational model).

[+] roenxi|4 years ago|reply
Which in itself is confusing given the upside potential of having relational model without the baggage. Hadley Wickham has had huge success in the context of R re-implementing the relational model without SQL (see: dplyr). It is more user friendly and much more pleasant to use than the mess that database users have to put up with.
[+] HelloNurse|4 years ago|reply
But you could also use SQL for that new RDBMS, or support multiple query languages. It doesn't matter, all relational databases are very similar because they are relational: they do the same things, potentially in the same way, regardless of the query language front end.

Apart from being subjectively displeased at SQL syntax, most problems with SQL are actually query language independent issues in the RDBMS: maddening proprietary extensions, implementation limits, nonportable details, library and system issues (e.g. character encoding and default configurations).

The best alternative query languages can do is making certain queries (advanced, rare ones) easier to express.

[+] mamcx|4 years ago|reply
One branch that was super-popular in the day was the dbase family (I work in FoxPro, it was THE dream).

Working in Fox dispel many of the weird limitations of sql (and BTW, you can use SQL on Fox alike linq, is first-class).

The #1? You can build ALL using the relational model. After MS kill Fox my career can be summarized as: Trying N workarounds because I don't have Fox anymore.

Other langs (python, delphi, f#, rust) capture a little of the magic or make the workaround bearable, but none is as productive.

[+] chana_masala|4 years ago|reply
Your intuition is correct. The book "Designing Data Intensive Applications" by Martin Kleppmann goes into detail about this, including some of the history.
[+] simonh|4 years ago|reply
It's for similar reasons that people call vacuum cleaners 'hoovers', they 'google' for things on DuckDuckGo, they 'Xerox' documents on a Cannon copier, all on-road reflective lane markers are a 'Catseye' etc. It may not be accurate but I don't find it at all odd.
[+] manbart|4 years ago|reply
Tutorial D is one such example of a non-SQL relational db language
[+] vaughan|4 years ago|reply
SQL is simply converted into a logical relation algebra tree. So anything that can do that could be used. It's a shame that PSQL doesn't provide an API for this, then all the quirks of SQL could be fixed. However, then we would probably have a proliferation of non-standard SQL like what happened with JavaScript, and standard tooling couldn't be used.

SQL is somewhat nice because its fairly well standardized.

[+] bena|4 years ago|reply
As the pirate meme says, "Well yes, but actually no".

This is kind of like a sticking effect. It got in at the right time and it was good enough and there wasn't enough interest in developing something to replace it that it just became standard.

But if you want to get down to it, any entity data model implemented is kind of an attempt at a replacement.

[+] dotancohen|4 years ago|reply

  > You could make a relational database that was managed with a language other than SQL, right?
Other commenters have shown examples of non-SQL relational databases. However, all the alternatives were also examples of Structured Query Languages. Languages designed to query a data store with _Structure_.
[+] samatman|4 years ago|reply
Well if NoSQL had called their movement "non-relational databases" that would have been an obviously bad idea, right?

But the idea of never writing SQL again remains compelling to a certain sort of mindset, one which I have been known to share when throwing myself against the proverbial wall.

SQL isn't a bad dialect with which to hand-roll queries against relational data. I'm quite sure there's room for improvement, especially when it comes to generating queries; I've dipped my toes in this and the compiler has to 'think in SQL' eventually, it's not composable in the way that it should be.

But "SQL is annoying and ORMs are terrible" is, from my recollection, the sentiment which gave us the meteoric (pun intended!) rise of Mongo, and "relational modeling isn't negotiable" is the iron fact of our profession which lead to its ignominious fall.

[+] uvdn7|4 years ago|reply
NoSQL was trying to “break” the relational model. It just happens to have “SQL” in its name. I intentionally avoided associating relational model with SQL in the article. Maybe I failed lol.
[+] datatrashfire|4 years ago|reply
You're absolutely correct, but in practice I'm not aware of any relational databases with widespread adoption that don't use SQL.
[+] bob1029|4 years ago|reply
I have grown to understand that the relational model is the answer for solving all hyper-complex problems. The Out of the Tar Pit paper was a revolution for my understanding of how to approach properly hard things:

http://curtclifton.net/papers/MoseleyMarks06a.pdf

The sacred artifact in this paper is Chapter 9: Functional Relational Programming.

Based upon inspiration in this paper, we have developed a hybrid FRP system where we map our live business state to a SQLite database (in memory) and then use queries defined by the business to determine logical outcomes or projections of state for presentation. Assuming you have all facts contained in appropriate tables, there is always some SQL query you could write to give the business what they want. An example:

> Give me a SQL rule that says the submit button is disabled if the email address or phone number are blank/null on their current order.

  --Disable Order Submit Button Rule
  SELECT 1          -- 1 == true, 0 == false
  FROM Customer c, Order o
  WHERE o.CustomerId = c.Id
  AND c.IsActiveCustomer = 1
  AND o.IsActiveOrder = 1
  AND (IsNullOrEmpty(o.EmailAddress) OR IsNullOrEmpty(o.PhoneNumber))
I hope the advantages of this are becoming clear - You can have non-developers (ideally domain experts with some SQL background) build most of your complex software for you. No code changes are required when SQL changes.

The relational model in this context is powerful because it is something that most professionals can adopt and collaborate with over time. You don't have to be a level 40 code wizard to understand that a Customers table is very likely related to a ShoppingCarts table by way of some customer identity. If anyone starts to glaze over at your schema diagrams, just move everything into excel and hand the stakeholders some spreadsheets with example data.

[+] wotbrew|4 years ago|reply
Ahhh... a fellow traveller on the road.

I have been hacking on a Clojure/Script library (https://github.com/wotbrew/relic) to experiment with exactly this kind of thing. / PLUG

For most problems I encounter I want to focus on the data and its relationships.

I cannot help feeling like programming languages and idioms have me deal with too much 'mechanism', the machine like structures and forms that are mostly incidental to the data domains I work in, and are entirely about performance, layout, access pattern etc - when a machine could do an ok job of making those decisions for me until optimisation becomes important.

[+] freeqaz|4 years ago|reply
Do you have any additional resources about this model of thought? It's like Redux on steroids lol. I wonder if anybody has done a SQLite-as-the-Store pattern library for front end apps before. I'd use the hell out of that!
[+] kingdomcome50|4 years ago|reply
I think your example is a solid illustration as to why you may not want non-developers writing SQL:

    SELECT 
        CASE WHEN o.EmailAddress IS NULL OR o.PhoneNumber IS NULL
            THEN 1
            ELSE 0
        END AS [IsSubmitDisabled]
    FROM Customer c 
    INNER JOIN Order o ON c.Id = o.CustomerId
        AND o.IsActiveOrder = 1
    WHERE c.IsActiveCustomer = 1
These queries aren't identical in their output, though they are logically the same. Is that a good thing? What are the implications of allowing such ad-hoc queries within a system?
[+] Jordanpomeroy|4 years ago|reply
The promise of low code.

I’m sure there is a time and place for it (Power Apps is a billion dollar business), like time keeping frontend logic. Apps with significant logic and complexity will always need developers.

I’m not critical of your app as I have no real understanding of what it does and why the design choices were made. I do not think embedding logic in SQL would be an ideal way for any app I’ve worked on to work.

[+] twofornone|4 years ago|reply
In my limited experience, code built around relational databases is difficult to grok. The structure is inverted with respect to typical OOP. Rather than having a sane class hierarchy, where one can start at the top and navigate down to understand how objects are nested, the structure is inverted, piecing together class relationships requires looking at the tables and following foreign keys which effectively point to parent classes. It feels backwards.
[+] samatman|4 years ago|reply
"Access Path Dependence" continues to plague our computers because they are a built-in assumption of file systems.

This is really showing its age. I want my computer to think in terms of what data is, where it came from, any other metadata I received it with, and any metadata I've added, notably, but not primarily, the various "places" I've put it.

The fact that I can't retrieve the URL I downloaded anything from, years later, no mater how many times I've moved it, is just shameful. It's cheap information our tools could be preserving but aren't.

So if you ask me what I think of the relational model, I'll tell you: it's a good idea and we should try it.

[+] ashvardanian|4 years ago|reply
Have opened this thread a couple of times today, hoping to see comments disagreeing with the post. Still no such comments, so I’ll take the duty… hopefully not causing a

I have been programming for ~15 years, the majority of my life. Web, mobile, HPC, CUDA, assembly for x86 and ARM, kernel modules, LLVM plugins, and databases… Not once in my career I found the concept of relational databases efficient or relevant. They are frustrating to use, exceptionally slow and generally provide SQLs, or other DSLs, which is the most archaic form of query representation I could think of - not binary and not general-purpose.

It reminds me of no-code development platforms. They may work (not very well) for some super simple tasks, but as soon as you want to do something at least remotely non-trivial, they put more barriers, then provide help. From personal experience, again, I have grown to hate frontend development so much, that have tried a dozen of website builders, before reverting to good old HTML/CSS (plus a bit of JS) every time I wanted to refresh my blog or companies website.

Plus, I wouldn’t immediately dismiss the concept of Graph Databases. If we want to be truly canonical, we wouldn’t create hundreds of columns in our tables, with just a few relations. The ideology is that every unique “type” (in any sense you prefer), should be in its own table, linked with the other “types” in other tables… Then theory ends and starts practice. Try implementing a fast JOIN in a relational database. Then increase the depth to 3, tracing the relations of relations of relations. Even in a non distributed case it is a horror. Both the nested SQL queries and the program that will be evaluating them. Graph DBs are designed to solve specifically that issue really well.

Another point: how “relational algebra” suddenly makes smth superior to anything else? It’s not a Grand Unified Theory of Physics, not rocket science and not even Graph Theory for that matter. The latter being the biggest and most studied branch of Theoretical Computer Science with brilliant theorems being published even today.

Not saying that todays popular Graph DBs are good (they are mostly disgusting), but I would still much rather think of my data as a graph, than a table with some foreign keys

[+] SuperCuber|4 years ago|reply
One problem I have with relational databases is I don't know of a good way to represent sum types - I remember seeing some possible solutions but they looked very complex and hard to understand (or dbms-specific)
[+] uvdn7|4 years ago|reply
Maybe the File System shouldn’t be hierarchical but rather relational as well.
[+] mftb|4 years ago|reply
Microsoft Sharepoint uses SQL Server for all user Item storage, so it's an example of an all relational file store. They did this around the time they were also experimenting with the Cairo a sibling comment mentions. Also as another commenter noted you query on the metadata. Folders/directories for instance are represented as metadata. Installations can also describe very detailed ontologies including ones provided by third parties. It's a lot of work.

Edit: I think I was actually thinking of WinFS which came out of Cairo later, around 2000.

[+] taeric|4 years ago|reply
And then we could have fun looking at the execution plan of our file access. :D

Meant in jest; though I think the idea of a "one true way" to access files is a pipe dream. The hierarchical works better on my computer than whatever scheme we've cooked up for our phones.

[+] roenxi|4 years ago|reply
The only thing the relational models really gives you is the ability to join two relations, which lets data decompose for storage but re-compose for many different uses. I'm not immediately seeing how that would be useful in a file context, where generally you want to look up a specific blob of data (aka a file).

A tree-based file-system is optimised for doing a search from the users perspective, finding a file takes log(files) steps and finding related files is trivially cheap. It is likely hard to outdo that with a relational model.

[+] jasfi|4 years ago|reply
What is the equivalent of a directory, and of a file? From the relation point of view, what would your tables be?

It doesn't map that well, the idea sounds intriguing, but in practice a filesystem seems like a better model for files.

[+] dboreham|4 years ago|reply
> Maybe the File System shouldn’t be hierarchical but rather relational as well

How do you know it isn't?

[+] tyingq|4 years ago|reply
Most operating systems do have the ability to add key/value pairs to files via extended attributes, like xattr on Linux. That's been around for quite some time. I suppose it goes unused since it's not centrally indexed, and also gets left behind in most types of file transfers.
[+] tuatoru|4 years ago|reply
> What is a relation in English? Actually the dictionary definition works pretty well in this case. The following definition comes from Merriam Webster.

    an aspect ... that connects two or more things or parts as being or belonging or working together
This is needlessly and wrongly freighting the model with a semantic interpretation. A relation is a subset of a cross-product of sets. Nothing more.
[+] Mikhail_Edoshin|4 years ago|reply
But there is a semantic interpretation: it's a set of true statements out of all possible statements.
[+] ngcc_hk|4 years ago|reply
Other than parts and bill-of-material with self loop, I do not know why you do not use data model.

Show me the code I am confused.

Show me the data, …

[+] vaughan|4 years ago|reply
I've come to realize the relational model is not a great fit for most data.

It constrains your data model for the purpose of representing your queries in relational calculus, which allows a corresponding relational algebra to operate on them to help optimize disk access. This comes at a cost. Although, if this is your primary goal then that's fine, which it has been for many.

Data these days is deeply nested or document-based, and encoding this in the relational model is incredibly unwieldy, with huge ugly join queries, and the planner starts making random guesses at 6 joins or so.

Everyone ends up with a rigid, and poorly normalized physical schema to suit the sql planner. Think about all the times you avoid M-M joins because your queries will explode in complexity.

Also, pretty much every app these days wants streaming updates to queries. The optimizer doesn't optimize for streaming updates and most streaming is done by polling. Incremental view maintenance is also very difficult to achieve as well as streaming SQL.

[+] mshaler|4 years ago|reply
Two thoughts here: - This is why ETL + variants are hard - Graph databases are (arguably because HN) better than the relational model (e.g., flexibility, ease of modeling, accessibility of algorithms/analytics)
[+] emadda|4 years ago|reply
If you want to use SQL with your Stripe data, I recently released a CLI called tdog that downloads your Stripe data to a SQL database.

https://table.dog

[+] _xnmw|4 years ago|reply
Anyone who thinks they don't need a relational database eventually ends up reinventing and rewriting aspects of an RDBMS, except badly. You're just kicking the can down the road.
[+] nchi3|4 years ago|reply
Depends on the use cases, but yeah, for most people/companies adopting NoSQL before even using SQL, sure.
[+] fnord123|4 years ago|reply
Literally technical debt. And managing debt is part of any project.