Getting Started with Graph Databases

[+] jmiserez|10 years ago|reply

FYI: There are two "Graph Databases 101" posts on the front page now. This one and the older one here:

https://news.ycombinator.com/item?id=11257280 (4 hours ago, 15 comments)

[+] dang|10 years ago|reply

We've changed the title of this one back from "Graph databases 101".

Submitters: the HN guidelines ask you to "please use the original title unless it is misleading or linkbait". Note how that does not read "please change the title to make it more misleading and linkbait".

[+] zeeZ|10 years ago|reply

There's a third one posted 4 hours ago: https://news.ycombinator.com/item?id=11258416

[+] fibo|10 years ago|reply

It seems he did it to gain votes.

[+] dcw303|10 years ago|reply

As I had zero experience with graph databases, this was generally a good intro, but the article could do with some polishing to save newbies like me from the underlying suspicion that they're missing something obvious. I tried just reading the tutorial without watching the video, and suffered some cognitive dissonance.

At the end of the article, there are two diagrams that show the behaviour of jcvd.out() and jcvd.outE(). The little gremlins are pointing at two vertices and two edges respectively, but from the 15 lines of code earlier, they're the wrong connections, right? jcvd only has edges to kickboxer and bloodsport, but the diagrams show connections to kickboxer and timecop.

So I looked at the code again, and realized the timecop vertex was never created, which seems kinda odd if you're going to use it in the diagram.

I eventually watched the video and saw animations where the little gremlins go to all three vertices/edges, so it's probably just a badly timed screencap for the article. Not that that explains why timecop is not in the code example, but whatever.
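For anyone else reading without the video, here is a rough sketch of the difference between out() and outE() as described above. It is a toy in-memory model in Python, not Gremlin and not the article's code: the Vertex and Edge classes, the add_edge helper, and the "acted_in" label are all hypothetical names chosen for illustration. Only the two edges the code actually created (kickboxer and bloodsport) are included.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    # Outgoing edges; left out of repr so printing a vertex stays short.
    out_edges: list = field(default_factory=list, repr=False)

    def outE(self):
        """Return the outgoing edges themselves."""
        return list(self.out_edges)

    def out(self):
        """Return the vertices at the far end of the outgoing edges."""
        return [edge.target for edge in self.out_edges]


@dataclass
class Edge:
    label: str
    source: Vertex
    target: Vertex


def add_edge(source, label, target):
    """Hypothetical helper: connect two vertices with a labelled edge."""
    edge = Edge(label, source, target)
    source.out_edges.append(edge)
    return edge


jcvd = Vertex("jcvd")
kickboxer = Vertex("kickboxer")
bloodsport = Vertex("bloodsport")
add_edge(jcvd, "acted_in", kickboxer)   # "acted_in" is an assumed label
add_edge(jcvd, "acted_in", bloodsport)

# out() lands on the neighbouring vertices, outE() stops on the edges.
out_names = [v.name for v in jcvd.out()]
edge_labels = [(e.label, e.target.name) for e in jcvd.outE()]
```

In this model out() is just outE() followed by one more step to each edge's target, which matches the diagrams showing the gremlins sitting on vertices in one case and on edges in the other.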

[+] rustyrazorblade|10 years ago|reply

You're right, there are some inconsistencies in the code vs the slides. Must have missed that. I'll fix that going forward.

[+] fibo|10 years ago|reply

Ahah you put the same title as https://news.ycombinator.com/item?id=11257280

[+] mitsoz|10 years ago|reply

I was very interested in the subject, but hated this video.

Very superficial, started off with a complicated relational schema to criticize relational databases, but never ended up explaining how a graph database would simplify the problem. I thought that the graph database concepts + language were way more complex than SQL schema + language.

Very fast talking and moving of slides, is this supposed to sound or look smart? On top of that, 50% of the time the video was a close-up of the presenter's face moving left and right in an awkward fashion.

[+] owen11|10 years ago|reply

Good feedback. I might need similar feedback for my upcoming talk. I am about to give a talk about Cayley (open source graph db written in Go) and I am working on my slides http://oren.github.io/adventure-graphs

Let me know what you think and also join us on IRC (#cayley on freenode) if you find it interesting.

[+] mullsork|10 years ago|reply

Kudos for providing both a video walkthrough AND an identical text version. Made me really happy!

[+] dperfect|10 years ago|reply

Maybe I still just don't "get it", but this explanation didn't really show me how a graph database is any better than an RDBMS, apart from a somewhat simpler interface (which in my opinion is still no better than many ORMs).

For good performance, it sounds like you still need to make good decisions about what to index, as well as putting hard limits on your data - even if not strictly enforced by the data model. And if those kinds of things affect performance, then surely changes to the schema (or whatever you'd call it here) will result in a need for migration/reoptimization. The trouble is, when that needs to happen, I personally would rather have tight control over when and how it happens (with a migration), rather than rely on a black box that supposedly makes everything simple. I'm assuming graph databases have ways to control that process, but that kind of proves my point - you don't get greater performance, simplicity, and flexibility for free, especially when you compare it to something as mature as the current RDBMS's. So what problem is it really solving?

Also, the comparison is a little unfair to RDBMS's - this makes it sound like you'd need separate join tables for every kind of person-media relationship, when you could certainly just use one join table with a column for various relationship types. And the complexity of TV shows with seasons and episodes? I'm pretty sure those distinctions would still need to be modeled in a thoughtful way with a graph database, but I could be wrong.
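To make the single-join-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, sample rows, and the media_for helper are all made up for illustration; they are not taken from the article's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE media  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- One join table for every kind of person-media relationship;
    -- the relationship column plays the role of an edge label.
    CREATE TABLE person_media (
        person_id    INTEGER NOT NULL REFERENCES person(id),
        media_id     INTEGER NOT NULL REFERENCES media(id),
        relationship TEXT    NOT NULL,
        PRIMARY KEY (person_id, media_id, relationship)
    );
""")

# Hypothetical sample data.
conn.execute("INSERT INTO person VALUES (1, 'Person A')")
conn.executemany("INSERT INTO media VALUES (?, ?)",
                 [(1, "Movie X"), (2, "Movie Y")])
conn.executemany("INSERT INTO person_media VALUES (?, ?, ?)",
                 [(1, 1, "acted_in"), (1, 2, "acted_in"), (1, 2, "directed")])


def media_for(person_name, relationship):
    """Titles linked to a person by the given relationship type."""
    rows = conn.execute(
        """
        SELECT m.title
        FROM person p
        JOIN person_media pm ON pm.person_id = p.id
        JOIN media m         ON m.id = pm.media_id
        WHERE p.name = ? AND pm.relationship = ?
        ORDER BY m.title
        """,
        (person_name, relationship),
    ).fetchall()
    return [title for (title,) in rows]


acted = media_for("Person A", "acted_in")
directed = media_for("Person A", "directed")
```

Adding a new relationship type here is just a new value in the relationship column, with no new table needed. It does not settle the seasons-and-episodes question, which would need its own modelling in either kind of database.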

[+] jonpaine|10 years ago|reply

Index-free-adjacency.

There are myriad pros/cons between graph/relational/nosql, but to me, a "real" graph db will have index free adjacency, allowing it to do deep traversals (friend of a friend of a friend of a friend...) in constant time. It finds its value in traversal of deeply connected datasets.

Any article or comparison that doesn't at least try to explain index free adjacency isn't going to make a compelling case for a graphdb, let alone a native graph db. One reason for that may be that many "graph" databases don't have index free adjacency, so have worse than expected deep traversal characteristics.
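As a rough illustration of the idea (a toy Python sketch, not how any particular graph database is implemented), compare looking up neighbours through a global sorted edge list on every hop with following direct references stored on each node. The names and the friendship data below are invented for the example.

```python
from bisect import bisect_left

# Hypothetical friendship edges, kept as sorted (source, target) pairs.
EDGES = sorted([("ann", "bob"), ("bob", "cat"), ("bob", "dan"), ("cat", "eve")])


def neighbours_via_index(name):
    """Index-based lookup: binary-search the global edge list on every hop."""
    i = bisect_left(EDGES, (name, ""))
    result = []
    while i < len(EDGES) and EDGES[i][0] == name:
        result.append(EDGES[i][1])
        i += 1
    return result


class Node:
    """Index-free adjacency: a node holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.friends = []


# Build the pointer-based graph once; the dict is only used to find a
# starting node, not during the traversal itself.
nodes = {}
for source, target in EDGES:
    for name in (source, target):
        nodes.setdefault(name, Node(name))
    nodes[source].friends.append(nodes[target])


def walk_indexed(start, depth):
    """Each hop pays for a search whose cost grows with the total edge count."""
    frontier = [start]
    for _ in range(depth):
        frontier = [t for name in frontier for t in neighbours_via_index(name)]
    return sorted(frontier)


def walk_pointers(start, depth):
    """Each hop just follows references, regardless of total graph size."""
    frontier = [nodes[start]]
    for _ in range(depth):
        frontier = [friend for node in frontier for friend in node.friends]
    return sorted(node.name for node in frontier)


two_hops_indexed = walk_indexed("ann", 2)
two_hops_pointers = walk_pointers("ann", 2)
```

Both walks return the same friends of friends. The difference is that the indexed version repeats a lookup against the whole edge set at every hop, while the pointer version only touches the nodes it actually visits.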

[+] lqdc13|10 years ago|reply

Is Titan going to survive even though datastax bought out the team? Their github repo hasn't been very active recently.

My issue with graph dbs is that as requirements change you usually have to add more granularity to the edges and nodes. Eventually the schema becomes much more complicated than an RDB.

[+] rail2rail|10 years ago|reply

FWIW AWS recently added DynamoDB integration support for it.

https://aws.amazon.com/blogs/aws/new-store-and-process-graph...

[+] sschueller|10 years ago|reply

Very cool. What are some of the issues to look out for switching from an old SQL model?

[+] woodman|10 years ago|reply

I don't have any experience with this particular product, but I've done work with a bunch of semantic web software (which is graph based). The most difficult part of migration is related to ontology, the edges. Feature creep is very easy and if you don't set hard limits you can easily find yourself graphing metadata about graph metadata :) You can do this with relational databases as well - recursive logging tables and the like, but it is easier to catch because of the exploding table count. Authorization isn't as easy either, so you'll want to give that some thought before you jump in.

[+] hidro|10 years ago|reply

[deleted]

[+] TerryADavis|10 years ago|reply

[deleted]

[+] marknadal|10 years ago|reply

This article overly inflates the complexity of graphs and databases in order to sound fancy. I've written a response that is very direct and shows how simple a graph database can be: https://github.com/amark/gun/wiki/Graph-Databases-101 .

[+] rustyrazorblade|10 years ago|reply

Hi. Presenter here. Honestly the goal wasn't to sound fancy. If you're going to work in the GraphDB world, you're going to come across this terminology.

If your concern around my intro is the complexity described of the relational world, well, that's kind of the point. Anyone with at least a few years' experience in the RDBMS world has probably come across a project that's spiraled completely out of control with an outrageous number of many-to-many relationships that are almost impossible to work with. The role of the DBA just to manage your queries and tables is a reflection of that difficulty.

GUN looks like a cool project. Good intro, & thanks for the feedback.

[+] zero_iq|10 years ago|reply

Wow, you have some colossal nerve accusing others of trying to sound fancy.

[+] losvedir|10 years ago|reply

Hmm, maybe I'm missing something, but how do you deal with "edges" in your example? That was the cool part of the original article for me: the concept of specifying relationships between objects via edges between the vertices.

[+] Finster|10 years ago|reply

Personally, I don't need to be shown how simple a graph database can be. I need to know how it will interact with real world data.

32 comments