I wouldn't say relational databases are poorly designed, any more than I'd say a hammer is poorly designed because it makes for a bad screwdriver. A hammer is still excellent at driving nails, and it's hard to find a better tool for that. This is just about using the right tool for the job.

Back when data was simpler and not as big, relational databases were perfect, and there have been years of engineering and bug fixes that have gone into them. They are excellent at what they do, and they continue to improve.

But as technology has improved, as our disks and memory have gotten bigger, as the data we collect and want to query over has grown, and as our queries have gotten more complex, we've been running up against the limitations of log(n) joins and relational database technology for some use cases. Not every problem is a nail. Some are screws. Some are more exotic.

That's been the reason for NoSQL databases in the first place: to try to address the shortcomings that arise as data gets bigger and more complex, and as queries and operations over that large data become more complex.

log(n) joins are fine...until data explodes, and you're no longer doing just a handful of joins per query, but a very large number of them, maybe even an unbounded number, and maybe the rules for what data to traverse have soft restrictions or none at all. When your data is graphy, when the questions you want to answer require traversals of this scale and nature, and when you want to make sure your traversal costs are proportional only to the graph you want to traverse (and not proportional to the total data in the database), then graph databases provide a very good tool for modeling and efficiently querying over that data.
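To make the cost argument concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how any particular database is implemented: a sorted edge list with binary search stands in for an indexed join (each hop costs about log(n) in the total number of edges), and a per-node neighbor list stands in for the direct adjacency a graph database would follow. All names (`neighbors_via_index`, `build_adjacency`, `traverse`) are hypothetical.

```python
from bisect import bisect_left
from collections import deque


def build_edge_index(edges):
    """Sorted (src, dst) pairs, a stand-in for a B-tree index on src."""
    return sorted(edges)


def neighbors_via_index(index, node):
    """One 'join' hop: binary search over ALL edges, O(log n) per lookup."""
    i = bisect_left(index, (node,))
    found = []
    while i < len(index) and index[i][0] == node:
        found.append(index[i][1])
        i += 1
    return found


def build_adjacency(edges):
    """Each node keeps its own neighbor list (assumed stand-in for adjacency)."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    return adjacency


def traverse(start, neighbors):
    """Breadth-first traversal; `neighbors` is any function node -> list."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


edges = [("a", "b"), ("b", "c"), ("x", "y"), ("y", "z")]
index = build_edge_index(edges)
adjacency = build_adjacency(edges)

via_index = traverse("a", lambda n: neighbors_via_index(index, n))
via_adjacency = traverse("a", lambda n: adjacency.get(n, []))
```

Both traversals visit the same nodes. The difference is that every hop through the index pays a search over the whole edge table, while the adjacency version only touches the neighborhood actually being traversed.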

Graph databases are relatively young compared to relational databases. Yet their usefulness has been proven, especially as graphy problems and data have grown more common.

Relational databases are still useful, and still improving, and graph databases will also continue to grow and improve side by side with them.

We even have a GQL initiative, on the language side, aimed at becoming an ISO standard that will hold a position equivalent to SQL's, but for graph querying. That should speak to the value and importance of the paradigm.

goto11|4 years ago

The fundamental premise of the relational model is the physical/logical distinction. The relational model deliberately does not make any requirements or assumptions about how data is physically stored or structured.

The difference between relational and graph (and other NoSQL) database systems is not about particular sizes and shapes of data; it is about the level of abstraction. For example, saying joins are "log(n)" makes certain assumptions about how relations and indexes are implemented, and those assumptions only hold for some naive implementations (like Access or MySQL).

Just as an example, materialized views are a physical-level optimization where an arbitrarily complex query result is stored and kept updated, which means data can be retrieved as fast as physically possible. Of course this has a cost at insert time, since the materialized views also have to be updated, but this is a performance trade-off just like the structure of a graph database is a performance trade-off.
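As a rough illustration of that trade-off, here is a small Python sketch. It is not how any real database implements materialized views. It just keeps a precomputed result in step with the base data, so reads become a single lookup while every insert pays extra work. The class and method names are hypothetical.

```python
class OrdersTable:
    """Toy base table plus one eagerly maintained 'materialized view'."""

    def __init__(self):
        self.rows = []                 # base data: (customer, amount)
        self.total_by_customer = {}    # stored query result, kept up to date

    def insert(self, customer, amount):
        # The write does double duty: append the row AND refresh the view.
        self.rows.append((customer, amount))
        self.total_by_customer[customer] = (
            self.total_by_customer.get(customer, 0) + amount
        )

    def total_by_scan(self, customer):
        # Without the view: recompute from the base rows on every read.
        return sum(amount for name, amount in self.rows if name == customer)

    def total_from_view(self, customer):
        # With the view: a single lookup, independent of table size.
        return self.total_by_customer.get(customer, 0)


orders = OrdersTable()
orders.insert("alice", 30)
orders.insert("bob", 5)
orders.insert("alice", 12)
```

The logical question ("what is this customer's total?") is the same either way; only the physical strategy for answering it changes.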

NoSQL databases have a tight coupling between the physical and logical structure, which makes them easier to optimize towards particular usage patterns but harder to adapt to changing requirements over time. The relational model was specifically designed for large databases that are used by multiple applications and change over time.