The best advice I can give is that you can think of your app and its data store as either:
#1 code is of prime importance, data store is simply a "bucket" for its data
#2 data is of prime importance, code is simply the means to read/write/display it
In my personal experience, #2 is a way better way to work. Apps can come and go, but data can last a long time, and the better your database is modelled the better the outcomes you'll have long term.
Corollary - I have seen some abject disasters where #1 has been adopted. Not necessarily just because of #1 alone but it's certainly been a major factor.
"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious."
One awful thing I've seen a few times is people who put data into a database in a format meant to make it easier for working with a particular platform, such as serialized Java dates (or even dates as longs).
It completely ties you to one implementation and precludes using the data for other things such as reporting and alerts or even ad hoc queries.
> In my personal experience, #2 is a way better way to work. Apps can come and go, but data can last a long time
Exactly. Data is far more valuable than code. We still work with data from literally centuries ago. Most programs written even 10 years ago are probably no longer in use. I would be surprised if the oldest code still doing something useful was more than 20 or so years old.
Sure, I agree largely with two but writing tests for this logic can get difficult (pgTap looks alright but not amazing) as can making all the database logic scale as you grow to sharding and multiple DBs. I’m not sure you’d want this if you were scaling to hundreds of millions of users but then you have loads of good problems…
> All these database aspects remained virtually unchanged since 1970, which is absolutely remarkable. Thanks to the NoSQL movement, we know it's not because of a lack of trying.
Kind of an aside, but I find it odd that people treat "relational" and "SQL" as synonymous (as well as "non-relational" and "NoSQL"). You could make a relational database that was managed with a language other than SQL, right?
Yes. PostgreSQL is a reference to the Post Ingres database (POSTGRES) adopting SQL. Ingres used QUEL. The POSTGRES database used POSTQUEL which was similar to Ingres’ QUEL. This is why the Postgres C library is called libpq (library post quel)
True, but SQL is (for better or worse) the de-facto standard query language for relational databases. The alternatives are all very niche.
I would like a better alternative, but it would need some very significant benefits compared to SQL to gain any traction. SQL is just so entrenched at this point that even NoSql database engines are adding support for pseudo-SQL query languages (which is really the worst of both worlds - the clunkyness of SQL syntax without the power of the relational model).
Which in itself is confusing given the upside potential of having relational model without the baggage. Hadley Wickham has had huge success in the context of R re-implementing the relational model without SQL (see: dplyr). It is more user friendly and much more pleasant to use than the mess that database users have to put up with.
But you could also use SQL for that new RDBMS, or support multiple query languages. It doesn't matter, all relational databases are very similar because they are relational: they do the same things, potentially in the same way, regardless of the query language front end.
Apart from being subjectively displeased at SQL syntax, most problems with SQL are actually query language independent issues in the RDBMS: maddening proprietary extensions, implementation limits, nonportable details, library and system issues (e.g. character encoding and default configurations).
The best alternative query languages can do is making certain queries (advanced, rare ones) easier to express.
One branch that was super-popular in the day was the dbase family (I work in FoxPro, it was THE dream).
Working in Fox dispel many of the weird limitations of sql (and BTW, you can use SQL on Fox alike linq, is first-class).
The #1? You can build ALL using the relational model. After MS kill Fox my career can be summarized as: Trying N workarounds because I don't have Fox anymore.
Other langs (python, delphi, f#, rust) capture a little of the magic or make the workaround bearable, but none is as productive.
Your intuition is correct. The book "Designing Data Intensive Applications" by Martin Kleppmann goes into detail about this, including some of the history.
It's for similar reasons that people call vacuum cleaners 'hoovers', they 'google' for things on DuckDuckGo, they 'Xerox' documents on a Cannon copier, all on-road reflective lane markers are a 'Catseye' etc. It may not be accurate but I don't find it at all odd.
SQL is simply converted into a logical relation algebra tree. So anything that can do that could be used. It's a shame that PSQL doesn't provide an API for this, then all the quirks of SQL could be fixed. However, then we would probably have a proliferation of non-standard SQL like what happened with JavaScript, and standard tooling couldn't be used.
SQL is somewhat nice because its fairly well standardized.
As the pirate meme says, "Well yes, but actually no".
This is kind of like a sticking effect. It got in at the right time and it was good enough and there wasn't enough interest in developing something to replace it that it just became standard.
But if you want to get down to it, any entity data model implemented is kind of an attempt at a replacement.
> You could make a relational database that was managed with a language other than SQL, right?
Other commenters have shown examples of non-SQL relational databases. However, all the alternatives were also examples of Structured Query Languages. Languages designed to query a data store with _Structure_.
Well if NoSQL had called their movement "non-relational databases" that would have been an obviously bad idea, right?
But the idea of never writing SQL again remains compelling to a certain sort of mindset, one which I have been known to share when throwing myself against the proverbial wall.
SQL isn't a bad dialect with which to hand-roll queries against relational data. I'm quite sure there's room for improvement, especially when it comes to generating queries; I've dipped my toes in this and the compiler has to 'think in SQL' eventually, it's not composable in the way that it should be.
But "SQL is annoying and ORMs are terrible" is, from my recollection, the sentiment which gave us the meteoric (pun intended!) rise of Mongo, and "relational modeling isn't negotiable" is the iron fact of our profession which lead to its ignominious fall.
NoSQL was trying to “break” the relational model. It just happens to have “SQL” in its name. I intentionally avoided associating relational model with SQL in the article. Maybe I failed lol.
I have grown to understand that the relational model is the answer for solving all hyper-complex problems. The Out of the Tar Pit paper was a revolution for my understanding of how to approach properly hard things:
The sacred artifact in this paper is Chapter 9: Functional Relational Programming.
Based upon inspiration in this paper, we have developed a hybrid FRP system where we map our live business state to a SQLite database (in memory) and then use queries defined by the business to determine logical outcomes or projections of state for presentation. Assuming you have all facts contained in appropriate tables, there is always some SQL query you could write to give the business what they want. An example:
> Give me a SQL rule that says the submit button is disabled if the email address or phone number are blank/null on their current order.
--Disable Order Submit Button Rule
SELECT 1 -- 1 == true, 0 == false
FROM Customer c, Order o
WHERE o.CustomerId = c.Id
AND c.IsActiveCustomer = 1
AND o.IsActiveOrder = 1
AND (IsNullOrEmpty(o.EmailAddress) OR IsNullOrEmpty(o.PhoneNumber))
I hope the advantages of this are becoming clear - You can have non-developers (ideally domain experts with some SQL background) build most of your complex software for you. No code changes are required when SQL changes.
The relational model in this context is powerful because it is something that most professionals can adopt and collaborate with over time. You don't have to be a level 40 code wizard to understand that a Customers table is very likely related to a ShoppingCarts table by way of some customer identity. If anyone starts to glaze over at your schema diagrams, just move everything into excel and hand the stakeholders some spreadsheets with example data.
I have been hacking on a Clojure/Script library (https://github.com/wotbrew/relic) to experiment with exactly this kind of thing. / PLUG
For most problems I encounter I want to focus on the data and its relationships.
I cannot help feeling like programming languages and idioms have me deal with too much 'mechanism', the machine like structures and forms that are mostly incidental to the data domains I work in, and are entirely about performance, layout, access pattern etc - when a machine could do an ok job of making those decisions for me until optimisation becomes important.
Do you have any additional resources about this model of thought? It's like Redux on steroids lol. I wonder if anybody has done a SQLite-as-the-Store pattern library for front end apps before. I'd use the hell out of that!
I think your example is a solid illustration as to why you may not want non-developers writing SQL:
SELECT
CASE WHEN o.EmailAddress IS NULL OR o.PhoneNumber IS NULL
THEN 1
ELSE 0
END AS [IsSubmitDisabled]
FROM Customer c
INNER JOIN Order o ON c.Id = o.CustomerId
AND o.IsActiveOrder = 1
WHERE c.IsActiveCustomer = 1
These queries aren't identical in their output, though they are logically the same. Is that a good thing? What are the implications of allowing such ad-hoc queries within a system?
I’m sure there is a time and place for it (Power Apps is a billion dollar business), like time keeping frontend logic. Apps with significant logic and complexity will always need developers.
I’m not critical of your app as I have no real understanding of what it does and why the design choices were made. I do not think embedding logic in SQL would be an ideal way for any app I’ve worked on to work.
In my limited experience, code built around relational databases is difficult to grok. The structure is inverted with respect to typical OOP. Rather than having a sane class hierarchy, where one can start at the top and navigate down to understand how objects are nested, the structure is inverted, piecing together class relationships requires looking at the tables and following foreign keys which effectively point to parent classes. It feels backwards.
"Access Path Dependence" continues to plague our computers because they are a built-in assumption of file systems.
This is really showing its age. I want my computer to think in terms of what data is, where it came from, any other metadata I received it with, and any metadata I've added, notably, but not primarily, the various "places" I've put it.
The fact that I can't retrieve the URL I downloaded anything from, years later, no mater how many times I've moved it, is just shameful. It's cheap information our tools could be preserving but aren't.
So if you ask me what I think of the relational model, I'll tell you: it's a good idea and we should try it.
Have opened this thread a couple of times today, hoping to see comments disagreeing with the post. Still no such comments, so I’ll take the duty… hopefully not causing a
I have been programming for ~15 years, the majority of my life. Web, mobile, HPC, CUDA, assembly for x86 and ARM, kernel modules, LLVM plugins, and databases… Not once in my career I found the concept of relational databases efficient or relevant. They are frustrating to use, exceptionally slow and generally provide SQLs, or other DSLs, which is the most archaic form of query representation I could think of - not binary and not general-purpose.
It reminds me of no-code development platforms. They may work (not very well) for some super simple tasks, but as soon as you want to do something at least remotely non-trivial, they put more barriers, then provide help. From personal experience, again, I have grown to hate frontend development so much, that have tried a dozen of website builders, before reverting to good old HTML/CSS (plus a bit of JS) every time I wanted to refresh my blog or companies website.
Plus, I wouldn’t immediately dismiss the concept of Graph Databases. If we want to be truly canonical, we wouldn’t create hundreds of columns in our tables, with just a few relations. The ideology is that every unique “type” (in any sense you prefer), should be in its own table, linked with the other “types” in other tables… Then theory ends and starts practice. Try implementing a fast JOIN in a relational database. Then increase the depth to 3, tracing the relations of relations of relations. Even in a non distributed case it is a horror. Both the nested SQL queries and the program that will be evaluating them. Graph DBs are designed to solve specifically that issue really well.
Another point: how “relational algebra” suddenly makes smth superior to anything else? It’s not a Grand Unified Theory of Physics, not rocket science and not even Graph Theory for that matter. The latter being the biggest and most studied branch of Theoretical Computer Science with brilliant theorems being published even today.
Not saying that todays popular Graph DBs are good (they are mostly disgusting), but I would still much rather think of my data as a graph, than a table with some foreign keys
One problem I have with relational databases is I don't know of a good way to represent sum types - I remember seeing some possible solutions but they looked very complex and hard to understand (or dbms-specific)
Microsoft Sharepoint uses SQL Server for all user Item storage, so it's an example of an all relational file store. They did this around the time they were also experimenting with the Cairo a sibling comment mentions. Also as another commenter noted you query on the metadata. Folders/directories for instance are represented as metadata. Installations can also describe very detailed ontologies including ones provided by third parties. It's a lot of work.
Edit: I think I was actually thinking of WinFS which came out of Cairo later, around 2000.
Microsoft wondered that too and for a while developed "CairoOFS" as a possible replacement for NTFS. It was intended as a relational "object filesystem" https://betawiki.net/wiki/Microsoft_Cairo
And then we could have fun looking at the execution plan of our file access. :D
Meant in jest; though I think the idea of a "one true way" to access files is a pipe dream. The hierarchical works better on my computer than whatever scheme we've cooked up for our phones.
The only thing the relational models really gives you is the ability to join two relations, which lets data decompose for storage but re-compose for many different uses. I'm not immediately seeing how that would be useful in a file context, where generally you want to look up a specific blob of data (aka a file).
A tree-based file-system is optimised for doing a search from the users perspective, finding a file takes log(files) steps and finding related files is trivially cheap. It is likely hard to outdo that with a relational model.
Most operating systems do have the ability to add key/value pairs to files via extended attributes, like xattr on Linux. That's been around for quite some time. I suppose it goes unused since it's not centrally indexed, and also gets left behind in most types of file transfers.
> What is a relation in English? Actually the dictionary definition works pretty well in this case. The following definition comes from Merriam Webster.
an aspect ... that connects two or more things or parts as being or belonging or working together
This is needlessly and wrongly freighting the model with a semantic interpretation. A relation is a subset of a cross-product of sets. Nothing more.
I've come to realize the relational model is not a great fit for most data.
It constrains your data model for the purpose of representing your queries in relational calculus, which allows a corresponding relational algebra to operate on them to help optimize disk access. This comes at a cost. Although, if this is your primary goal then that's fine, which it has been for many.
Data these days is deeply nested or document-based, and encoding this in the relational model is incredibly unwieldy, with huge ugly join queries, and the planner starts making random guesses at 6 joins or so.
Everyone ends up with a rigid, and poorly normalized physical schema to suit the sql planner. Think about all the times you avoid M-M joins because your queries will explode in complexity.
Also, pretty much every app these days wants streaming updates to queries. The optimizer doesn't optimize for streaming updates and most streaming is done by polling. Incremental view maintenance is also very difficult to achieve as well as streaming SQL.
Two thoughts here:
- This is why ETL + variants are hard
- Graph databases are (arguably because HN) better than the relational model (e.g., flexibility, ease of modeling, accessibility of algorithms/analytics)
Anyone who thinks they don't need a relational database eventually ends up reinventing and rewriting aspects of an RDBMS, except badly. You're just kicking the can down the road.
[+] [-] brianmcc|4 years ago|reply
#1 code is of prime importance, data store is simply a "bucket" for its data
#2 data is of prime importance, code is simply the means to read/write/display it
In my personal experience, #2 is a way better way to work. Apps can come and go, but data can last a long time, and the better your database is modelled the better the outcomes you'll have long term.
Corollary - I have seen some abject disasters where #1 has been adopted. Not necessarily just because of #1 alone but it's certainly been a major factor.
[+] [-] gpderetta|4 years ago|reply
-- Fred Brooks, The Mythical Man Month (1975)
[+] [-] forinti|4 years ago|reply
It completely ties you to one implementation and precludes using the data for other things such as reporting and alerts or even ad hoc queries.
[+] [-] naasking|4 years ago|reply
Exactly. Data is far more valuable than code. We still work with data from literally centuries ago. Most programs written even 10 years ago are probably no longer in use. I would be surprised if the oldest code still doing something useful was more than 20 or so years old.
[+] [-] oblio|4 years ago|reply
[+] [-] mmcnl|4 years ago|reply
[+] [-] andy_ppp|4 years ago|reply
[+] [-] Dave3of5|4 years ago|reply
[+] [-] caditinpiscinam|4 years ago|reply
Kind of an aside, but I find it odd that people treat "relational" and "SQL" as synonymous (as well as "non-relational" and "NoSQL"). You could make a relational database that was managed with a language other than SQL, right?
[+] [-] mulmen|4 years ago|reply
https://en.m.wikipedia.org/wiki/QUEL_query_languages https://en.m.wikipedia.org/wiki/PostgreSQL#
[+] [-] goto11|4 years ago|reply
I would like a better alternative, but it would need some very significant benefits compared to SQL to gain any traction. SQL is just so entrenched at this point that even NoSql database engines are adding support for pseudo-SQL query languages (which is really the worst of both worlds - the clunkyness of SQL syntax without the power of the relational model).
[+] [-] roenxi|4 years ago|reply
[+] [-] HelloNurse|4 years ago|reply
Apart from being subjectively displeased at SQL syntax, most problems with SQL are actually query language independent issues in the RDBMS: maddening proprietary extensions, implementation limits, nonportable details, library and system issues (e.g. character encoding and default configurations).
The best alternative query languages can do is making certain queries (advanced, rare ones) easier to express.
[+] [-] mamcx|4 years ago|reply
Working in Fox dispel many of the weird limitations of sql (and BTW, you can use SQL on Fox alike linq, is first-class).
The #1? You can build ALL using the relational model. After MS kill Fox my career can be summarized as: Trying N workarounds because I don't have Fox anymore.
Other langs (python, delphi, f#, rust) capture a little of the magic or make the workaround bearable, but none is as productive.
[+] [-] chana_masala|4 years ago|reply
[+] [-] simonh|4 years ago|reply
[+] [-] manbart|4 years ago|reply
[+] [-] vaughan|4 years ago|reply
SQL is somewhat nice because its fairly well standardized.
[+] [-] bena|4 years ago|reply
This is kind of like a sticking effect. It got in at the right time and it was good enough and there wasn't enough interest in developing something to replace it that it just became standard.
But if you want to get down to it, any entity data model implemented is kind of an attempt at a replacement.
[+] [-] dotancohen|4 years ago|reply
[+] [-] samatman|4 years ago|reply
But the idea of never writing SQL again remains compelling to a certain sort of mindset, one which I have been known to share when throwing myself against the proverbial wall.
SQL isn't a bad dialect with which to hand-roll queries against relational data. I'm quite sure there's room for improvement, especially when it comes to generating queries; I've dipped my toes in this and the compiler has to 'think in SQL' eventually, it's not composable in the way that it should be.
But "SQL is annoying and ORMs are terrible" is, from my recollection, the sentiment which gave us the meteoric (pun intended!) rise of Mongo, and "relational modeling isn't negotiable" is the iron fact of our profession which lead to its ignominious fall.
[+] [-] uvdn7|4 years ago|reply
[+] [-] datatrashfire|4 years ago|reply
[+] [-] bob1029|4 years ago|reply
http://curtclifton.net/papers/MoseleyMarks06a.pdf
The sacred artifact in this paper is Chapter 9: Functional Relational Programming.
Based upon inspiration in this paper, we have developed a hybrid FRP system where we map our live business state to a SQLite database (in memory) and then use queries defined by the business to determine logical outcomes or projections of state for presentation. Assuming you have all facts contained in appropriate tables, there is always some SQL query you could write to give the business what they want. An example:
> Give me a SQL rule that says the submit button is disabled if the email address or phone number are blank/null on their current order.
I hope the advantages of this are becoming clear - You can have non-developers (ideally domain experts with some SQL background) build most of your complex software for you. No code changes are required when SQL changes.The relational model in this context is powerful because it is something that most professionals can adopt and collaborate with over time. You don't have to be a level 40 code wizard to understand that a Customers table is very likely related to a ShoppingCarts table by way of some customer identity. If anyone starts to glaze over at your schema diagrams, just move everything into excel and hand the stakeholders some spreadsheets with example data.
[+] [-] wotbrew|4 years ago|reply
I have been hacking on a Clojure/Script library (https://github.com/wotbrew/relic) to experiment with exactly this kind of thing. / PLUG
For most problems I encounter I want to focus on the data and its relationships.
I cannot help feeling like programming languages and idioms have me deal with too much 'mechanism', the machine like structures and forms that are mostly incidental to the data domains I work in, and are entirely about performance, layout, access pattern etc - when a machine could do an ok job of making those decisions for me until optimisation becomes important.
[+] [-] freeqaz|4 years ago|reply
[+] [-] kingdomcome50|4 years ago|reply
[+] [-] Jordanpomeroy|4 years ago|reply
I’m sure there is a time and place for it (Power Apps is a billion dollar business), like time keeping frontend logic. Apps with significant logic and complexity will always need developers.
I’m not critical of your app as I have no real understanding of what it does and why the design choices were made. I do not think embedding logic in SQL would be an ideal way for any app I’ve worked on to work.
[+] [-] twofornone|4 years ago|reply
[+] [-] samatman|4 years ago|reply
This is really showing its age. I want my computer to think in terms of what data is, where it came from, any other metadata I received it with, and any metadata I've added, notably, but not primarily, the various "places" I've put it.
The fact that I can't retrieve the URL I downloaded anything from, years later, no mater how many times I've moved it, is just shameful. It's cheap information our tools could be preserving but aren't.
So if you ask me what I think of the relational model, I'll tell you: it's a good idea and we should try it.
[+] [-] ashvardanian|4 years ago|reply
I have been programming for ~15 years, the majority of my life. Web, mobile, HPC, CUDA, assembly for x86 and ARM, kernel modules, LLVM plugins, and databases… Not once in my career I found the concept of relational databases efficient or relevant. They are frustrating to use, exceptionally slow and generally provide SQLs, or other DSLs, which is the most archaic form of query representation I could think of - not binary and not general-purpose.
It reminds me of no-code development platforms. They may work (not very well) for some super simple tasks, but as soon as you want to do something at least remotely non-trivial, they put more barriers, then provide help. From personal experience, again, I have grown to hate frontend development so much, that have tried a dozen of website builders, before reverting to good old HTML/CSS (plus a bit of JS) every time I wanted to refresh my blog or companies website.
Plus, I wouldn’t immediately dismiss the concept of Graph Databases. If we want to be truly canonical, we wouldn’t create hundreds of columns in our tables, with just a few relations. The ideology is that every unique “type” (in any sense you prefer), should be in its own table, linked with the other “types” in other tables… Then theory ends and starts practice. Try implementing a fast JOIN in a relational database. Then increase the depth to 3, tracing the relations of relations of relations. Even in a non distributed case it is a horror. Both the nested SQL queries and the program that will be evaluating them. Graph DBs are designed to solve specifically that issue really well.
Another point: how “relational algebra” suddenly makes smth superior to anything else? It’s not a Grand Unified Theory of Physics, not rocket science and not even Graph Theory for that matter. The latter being the biggest and most studied branch of Theoretical Computer Science with brilliant theorems being published even today.
Not saying that todays popular Graph DBs are good (they are mostly disgusting), but I would still much rather think of my data as a graph, than a table with some foreign keys
[+] [-] SuperCuber|4 years ago|reply
[+] [-] uvdn7|4 years ago|reply
[+] [-] mftb|4 years ago|reply
Edit: I think I was actually thinking of WinFS which came out of Cairo later, around 2000.
[+] [-] seanhunter|4 years ago|reply
[+] [-] taeric|4 years ago|reply
Meant in jest; though I think the idea of a "one true way" to access files is a pipe dream. The hierarchical works better on my computer than whatever scheme we've cooked up for our phones.
[+] [-] roenxi|4 years ago|reply
A tree-based file-system is optimised for doing a search from the users perspective, finding a file takes log(files) steps and finding related files is trivially cheap. It is likely hard to outdo that with a relational model.
[+] [-] jasfi|4 years ago|reply
It doesn't map that well, the idea sounds intriguing, but in practice a filesystem seems like a better model for files.
[+] [-] dboreham|4 years ago|reply
How do you know it isn't?
[+] [-] tyingq|4 years ago|reply
[+] [-] tuatoru|4 years ago|reply
[+] [-] Mikhail_Edoshin|4 years ago|reply
[+] [-] unknown|4 years ago|reply
[deleted]
[+] [-] ngcc_hk|4 years ago|reply
Show me the code I am confused.
Show me the data, …
[+] [-] vaughan|4 years ago|reply
It constrains your data model for the purpose of representing your queries in relational calculus, which allows a corresponding relational algebra to operate on them to help optimize disk access. This comes at a cost. Although, if this is your primary goal then that's fine, which it has been for many.
Data these days is deeply nested or document-based, and encoding this in the relational model is incredibly unwieldy, with huge ugly join queries, and the planner starts making random guesses at 6 joins or so.
Everyone ends up with a rigid, and poorly normalized physical schema to suit the sql planner. Think about all the times you avoid M-M joins because your queries will explode in complexity.
Also, pretty much every app these days wants streaming updates to queries. The optimizer doesn't optimize for streaming updates and most streaming is done by polling. Incremental view maintenance is also very difficult to achieve as well as streaming SQL.
[+] [-] mshaler|4 years ago|reply
[+] [-] emadda|4 years ago|reply
https://table.dog
[+] [-] _xnmw|4 years ago|reply
[+] [-] nchi3|4 years ago|reply
[+] [-] fnord123|4 years ago|reply