top | item 18353813

moxious | 7 years ago

> What's stopping you from having one table that contains A and what A points to? So you only have 2 tables?

So you can do that. Suppose you denormalize the graph into a single table. Then either you duplicate A's data in the table for every B it points to, or you accept the constraint that the A -> B link can only have a cardinality of 1.

This would not be a good choice for nodes with many relationships to other things, for example "Person friended Person". It might work if the cardinality were somewhat capped, for example "Customer ordered Product" (a common denormalization).
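To make the duplication concrete, here's a minimal sketch using an in-memory SQLite table. The table and column names (`friended`, `person_name`, etc.) are hypothetical, just to illustrate how the single-table approach repeats A's data once per relationship:

```python
import sqlite3

# Hypothetical denormalized "Person friended Person" table: each row
# carries the source person's attributes, so those attributes are
# duplicated once per relationship.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE friended (
        person_id   INTEGER,
        person_name TEXT,     -- duplicated for every friend this person has
        friend_id   INTEGER
    )
""")
rows = [(1, "alice", 2), (1, "alice", 3), (1, "alice", 4)]
conn.executemany("INSERT INTO friended VALUES (?, ?, ?)", rows)

# Alice's name is now stored three times -- once per edge.
dupes = conn.execute(
    "SELECT COUNT(*) FROM friended WHERE person_name = 'alice'"
).fetchone()[0]
print(dupes)  # 3
```

Every attribute you add to the person multiplies that duplication, which is exactly why this only works when the cardinality is capped.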

For a real application, you're not going to have 1 of these. You're going to end up with 20+.

> I'm yet to be convinced that a problem can't be reshaped and mapped on a traditional RDMS and yet remain performant.

Any problem can be re-shaped to any database formalism. For that matter we can re-shape everything we're discussing for a straight K/V store like Redis. The expressive power of a database isn't at issue, because any database can store any dataset. Period.

More relevant questions are about performance and about conceptual fit for the problem. First on performance, have a look at the performance growth graph here:

https://neo4j.com/blog/oracle-rdbms-neo4j-fully-sync-data/

If you reason from the computer science of how these systems work, this result makes sense.

From the conceptual simplicity standpoint, that's kind of a matter of personal taste and application. Can you do it all with an RDBMS? Sure. But all of these different tech niches exist because sometimes you want more than one kind of tool for the wide variety of jobs you need to accomplish.

I'd argue that it's conceptually simpler to think of your graph as nodes and relationships, rather than to remember each time which node/rel set was denormalized into one table, which node label was split out into its own separate table, what that join table was, how the key naming differed between tables, etc. etc. etc. (Because once you have a non-trivial sized graph, you'll have a lot of these, and maybe you made different decisions at different spots).

The prize if you do remember all of that is that you get to write quite complex SQL to join the data structures back together for non-trivial traversals, because substantial graph use cases that you can answer from a single denormalized table are rare.

In terms of conceptual simplicity, compare a recursive join SQL query to a Cypher snippet like "MATCH (user:User {login:'bob'})-[:KNOWS*..5]->(foaf)". The equivalent SQL is...difficult.
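For a sense of what "difficult" looks like, here's a rough sketch of that traversal as a recursive CTE over a plain edge table, run via SQLite from Python. The `users`/`knows` schema is hypothetical, and this is a simplified stand-in, not what Neo4j does internally:

```python
import sqlite3

# SQL roughly equivalent to:
#   MATCH (user:User {login:'bob'})-[:KNOWS*..5]->(foaf)
# Walk the KNOWS edges from bob, up to 5 hops, collecting everyone reached.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT);
    CREATE TABLE knows (src INTEGER, dst INTEGER);
    INSERT INTO users VALUES (1,'bob'),(2,'carol'),(3,'dave'),(4,'erin');
    INSERT INTO knows VALUES (1,2),(2,3),(3,4);
""")
foaf = conn.execute("""
    WITH RECURSIVE reachable(id, depth) AS (
        SELECT u.id, 0 FROM users u WHERE u.login = 'bob'
        UNION
        SELECT k.dst, r.depth + 1
        FROM reachable r JOIN knows k ON k.src = r.id
        WHERE r.depth < 5
    )
    SELECT DISTINCT login FROM users
    JOIN reachable USING (id)
    WHERE depth > 0
    ORDER BY login
""").fetchall()
print([row[0] for row in foaf])  # ['carol', 'dave', 'erin']
```

And that's the easy version: add relationship properties, multiple edge types, or cycle-length limits per path, and the SQL grows fast while the Cypher pattern barely changes.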

It's just a "use the right tool for the job" situation at its core.
