top | item 21045651

(no title)

AndrewBowman | 6 years ago

In Neo4j at least the node ids are offsets into the node and relationship stores, so you are literally pointer hopping through the store files from node structure to relationship structure to node structure. No need for a hash table or b-tree index (excepting finding your starting nodes in the graph before beginning traversal.)

discuss

order

kyllo|6 years ago

Wouldn't that make deletes super expensive, similar to deleting nodes from a doubly-linked list on disk in terms of complexity? How could it delete a node with millions of relationships if it needs to read all those blocks from disk to traverse all the pointers?

And doesn't it cause more cache misses when the on-disk pointers refer to nodes that are spread out across different file blocks?

And why does Neo4j have indexes if it claims to have no need for them? https://neo4j.com/docs/cypher-manual/current/schema/index/

I don't know how Neo4j is implemented, but I'm skeptical that it's purely index-free adjacency, I suspect there is some hybrid data structure backing it.

AndrewBowman|6 years ago

Deletes do have an extra cost as relationships do need to be deleted first, and there are some batching approaches for handling this case. For graph databases there's not much choice on this unless you want to deal with dangling relationships (and resulting inconsistencies).

Note that deleting of nodes does not have to create new relationships between the adjacent nodes, so not quite like deleting nodes from the middle of a doubly-linked list.

A large pagecache is recommended for optimal speed, and SSDs are also recommended. Hardware continues to become cheaper.

Relational databases and Neo4j use indexes differently, which I think is part of your confusion here. We both use indexes for looking up nodes, true, but Neo4j only uses this for finding certain starting (or end) nodes in the graph. The important (and more complicated and costly) part of a query isn't finding your starting nodes...it's expanding and traversing from these nodes through your graph.

Neo4j uses index-free adjacency for traversing the graph. Relational dbs need to use table joins. One of these is only dependent on the relationships present on the nodes traversed (or rather only the relationships you're interested in, if you've specified restrictions on the relationship type and/or direction). Table joins are dependent on the size of the tables joined (then of course you must consider how many joins you must perform...and how to do these joins if there's nothing restricting which tables to join in the course of traversal).

Again, index-free adjacency does not mean that we must adhere to this in the most literal sense. Ideological purity is not the point. Graph traversals are the most complex part of a graph query, and this is where index-free adjacency is used to the advantage of native graph dbs.

And just to note, we certainly can join nodes based on property values, just like a relational database, and yes we can even use an index to speed that up, in the same manner as relational dbs. In fact you may indeed need to do this in order to create the relationships that you'll use later in your queries. Graph dbs are optimized such that if you do need to use joins, you'll perform them early, and once, so that you can take advantage of index-free adjacency during traversal in your read queries. Traversal speed and efficiency is the point of index-free adjacency.