(no title)
moxious | 7 years ago
It's also, however, a bit of a dead end once you go beyond the basics. The cost of joins gets worse the deeper you go, and "hundreds of milliseconds" is at least an order of magnitude slower than what Neo4j would do for you.
Once you take that major performance penalty, and then layer it into more complex graph algorithms or analytics, it gets really, really painful quickly. Granted, you might not notice this if you never needed to go further than 2-3 hops in a graph. But once you start working with graphs you're not going to want to stick to such basics.
More technical detail on the difference between a graph abstraction on top of another database, and a native graph database, can be found here:
segmondy | 7 years ago
So say A points to B: you have 3 tables, right? Table 1 for A, table 2 for B, and a join table (table 3) showing that A points to B. Why would you do that? What's stopping you from having one table that contains A and what A points to, so you only have 2 tables?
What if you have a node that can point to many items? A column can contain a list in Postgres, so we can still have one table containing your node data and a list of the items it points to.
I'll concede that graph databases are easier to write queries for; most people already struggle with basic SQL, let alone CTEs and recursive CTEs.
I'm yet to be convinced that a problem can't be reshaped and mapped onto a traditional RDBMS and yet remain performant.
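segmondy's two-table version is easy to sketch. Here's a minimal, hedged example using Python's sqlite3 (SQLite rather than Postgres, and all table/column names are made up): one nodes table, one edges adjacency table, and a recursive CTE to walk it.

```python
import sqlite3

# Sketch of the "just two tables" model: nodes plus an adjacency (edges)
# table. (SQLite for portability; in Postgres the edge list could instead
# live in an array column on the node row, as suggested above.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (src INTEGER, dst INTEGER);
    INSERT INTO nodes VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO edges VALUES (1, 2), (2, 3);  -- A -> B -> C
""")

# Traversal is where the recursive CTE comes in: everything reachable from A.
reachable = [row[0] for row in conn.execute("""
    WITH RECURSIVE reach(id) AS (
        SELECT dst FROM edges WHERE src = 1           -- direct neighbors of A
        UNION                                         -- UNION also de-dupes, stopping cycles
        SELECT e.dst FROM edges e JOIN reach r ON e.src = r.id
    )
    SELECT n.name FROM nodes n JOIN reach ON n.id = reach.id ORDER BY n.id
""")]
print(reachable)  # ['B', 'C']
```

Storing the graph this way is straightforward; the contested question in this thread is how that traversal behaves as depth and fan-out grow.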
moxious | 7 years ago
So you can do that. Suppose you denormalize the graph into a single table. Either you duplicate A's data in the table each time you connect A to another B, or you accept the constraint that the A -> B link can only have a cardinality of 1.
This would not be a good choice for nodes with many relationships to other things, say for example "Person friended Person". It might work if the cardinality was somewhat capped, like say "Customer ordered Product" (a common denormalization).
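To make the duplication concrete, here's a tiny sketch (hypothetical customer_orders schema, Python + sqlite3): every extra order for the same customer repeats that customer's columns in the single denormalized table.

```python
import sqlite3

# Hypothetical "Customer ordered Product" denormalization: one flat table,
# so customer columns repeat once per order row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_orders (customer_id INT, customer_name TEXT, product TEXT);
    INSERT INTO customer_orders VALUES
        (1, 'Bob', 'Widget'),
        (1, 'Bob', 'Gadget'),   -- Bob's data duplicated for his 2nd order
        (2, 'Eve', 'Widget');
""")
rows, customers = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM customer_orders"
).fetchone()
print(rows, customers)  # 3 rows, but only 2 distinct customers
```

With a capped cardinality like orders per customer this stays manageable; for a high-fan-out relationship like friendships, the duplication grows with every edge.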
For a real application, you're not going to have 1 of these. You're going to end up with 20+.
> I'm yet to be convinced that a problem can't be reshaped and mapped onto a traditional RDBMS and yet remain performant.
Any problem can be reshaped to fit any database formalism. For that matter, we can reshape everything we're discussing into a straight K/V store like Redis. The expressive power of a database isn't at issue, because any database can store any dataset. Period.
More relevant questions are about performance and about conceptual fit for the problem. First on performance, have a look at the performance growth graph here:
https://neo4j.com/blog/oracle-rdbms-neo4j-fully-sync-data/
If you reason from the computer science of how these systems work, this result makes sense.
From the conceptual simplicity standpoint, that's kind of a matter of personal taste and application. Can you do it all with an RDBMS? Sure. But all of these different tech niches exist because sometimes you want more than one kind of tool for the wide variety of jobs you need to accomplish.
I'd argue that it's conceptually simpler to think of your graph as nodes and relationships, rather than to remember each time which node/rel set was denormalized into one table, which node label was split out into its own separate table, what that join table was, how the key naming differed between tables, etc. etc. etc. (Because once you have a non-trivial sized graph, you'll have a lot of these, and maybe you made different decisions at different spots).
The prize, if you do remember all of that, is that you get to write quite complex SQL to join the data structures together for non-trivial traversals, because substantial graph use cases that you can answer from a single denormalized table are going to be rare.
In terms of conceptual simplicity, compare a recursive SQL join query to a Cypher snippet like "MATCH (user:User {login:'bob'})-[:KNOWS*..5]->(foaf)". The equivalent SQL is... difficult.
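For a sense of what that SQL looks like, here's a hedged sketch in Python + sqlite3 against a made-up users/knows schema. The recursive CTE has to carry the hop count by hand to bound the traversal at 5, where Cypher's `*..5` does it in four characters.

```python
import sqlite3

# Hypothetical schema standing in for the Cypher pattern
#   MATCH (user:User {login:'bob'})-[:KNOWS*..5]->(foaf)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT);
    CREATE TABLE knows (src INTEGER, dst INTEGER);
    INSERT INTO users VALUES (1,'bob'), (2,'carol'), (3,'dan'), (4,'erin');
    INSERT INTO knows VALUES (1,2), (2,3), (3,4), (4,1);  -- note the cycle
""")
foafs = [row[0] for row in conn.execute("""
    WITH RECURSIVE foaf(id, depth) AS (
        SELECT k.dst, 1
        FROM knows k JOIN users u ON u.id = k.src
        WHERE u.login = 'bob'                  -- 1 hop out from bob
        UNION
        SELECT k.dst, f.depth + 1
        FROM knows k JOIN foaf f ON k.src = f.id
        WHERE f.depth < 5                      -- manual cap = Cypher's *..5
    )
    SELECT DISTINCT u.login FROM foaf JOIN users u ON u.id = foaf.id
    ORDER BY u.login
""")]
print(foafs)  # ['bob', 'carol', 'dan', 'erin'] -- bob reaches himself via the cycle
```

And this is still the easy case: one relationship type, one join table, no per-hop filtering or path reconstruction.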
It's just a "use the right tool for the job" situation at its core.