
Ask HN: What was your experience using a graph database?

161 points| tiuPapa | 7 years ago

I have an idea that I want to work on during the break, and I think it is something that would suit a graph db (a service that would link users together depending on their choice profile). So what was your experience of working with one? Which graph db did you use?

Edit: What sort of problems are well-suited to graph databases? In other words, what are some scenarios where something like Postgres is not suitable anymore?

93 comments

[+] jjguy|7 years ago|reply
Graph databases are the NoSQL of this half decade. Move cautiously. Just because you conceptualize it in your mental model does not mean you need a graph database. Further, recognize most (all?) implementations are not yet as performant or scalable as traditional data storage solutions.

Design your data schema first, then design your queries and finally your data lifecycle pipeline. Run some estimates on the order of magnitude for inserts, query rates, query types and storage sizes - then compare those numbers to the real-world perf of the various graphdb solutions. In general, compared to more typical solutions, you have more expensive inserts, query costs and storage sizes in exchange for more expressive queries. There aren't many applications where those cost tradeoffs make sense.

Source: Twice now (2012 and 2018) I've reviewed available graphdbs for storage of enterprise security data when doing the initial platform technology selection. Both times the team fell back onto more traditional approaches.

[+] dajohnson89|7 years ago|reply
I agree completely with this. move cautiously. I personally found the entire space very immature.

neo4j is the most mature solution I found (in the Java space). if you want to use something else go for it, but you may be surprised at the low quality.

op: I strongly recommend implementing most/all of your pipeline using graph & non-graph approaches. choose the graph approach iff you can demonstrate with hard evidence that it makes sense.

[+] usgroup|7 years ago|reply
+1. We take for granted the maturity of RDB systems, but it makes for a stark comparison to GraphDBs.

Calculating over or walking over graphs sucks because there is usually a better, less brute way for any particular query.

Unless you have a set of use cases that require the ability to query across near enough random and unindexable subsets of a graph (eg Facebook), you’ll probably be better off with a DB and a spot of flattening.

[+] aasasd|7 years ago|reply
Same here, I've gotten used to moving thousands of records a second even with MySQL's InnoDB on spinning metal. Then tried Neo4j and, I think, one other software—and that was the end of my experience with graph dbs.

If I were interested in them again now, I would try new and fancy solutions first, to see if there are nosql-level performance improvements in the graph db space.

[+] allochthon|7 years ago|reply
> In general, compared to more typical solutions, you have more expensive ... query costs

Not to detract from your general point, but curious whether you looked at Dgraph in your analysis. It's quite fast and was built for speed.

https://dgraph.io/

[+] bane|7 years ago|reply
> Just because you conceptualize it in your mental model does not mean you need a graph database.

Yes! When I was younger I worked on a problem once that needed to compute some very basic graph metrics. My seniors tried to do the work in an early graph database and it was a disaster. It turns out literally just reading in the lines from a file and counting things got the job done in a few seconds.

They refused to use the results until they were coming out of the graph database because "just in case we needed other metrics". We never needed the other metrics.
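To make the "read the lines and count things" approach concrete, here is a minimal sketch. It assumes the input is an edge list with one whitespace-separated pair of node names per line; that format and the sample data are my assumptions, not something described above.

```python
from collections import Counter


def degree_counts(lines):
    """Count how many edges touch each node in a whitespace-separated edge list."""
    degrees = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        src, dst = parts
        degrees[src] += 1
        degrees[dst] += 1
    return degrees


# Hypothetical sample data; in practice this could be an open file object.
sample = ["a b", "a c", "b c", "c d", ""]
degrees = degree_counts(sample)
edge_count = sum(degrees.values()) // 2  # every edge was counted at both ends
busiest = degrees.most_common(1)[0]      # node with the highest degree
```

Node count, edge count and a degree distribution all fall out of one pass over the file, with no database involved.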

[+] Scarbutt|7 years ago|reply
Storage is very cheap these days.
[+] maxdemarzi|7 years ago|reply
Disclaimer: I’m an 8 year Neo4j user and 6 year employee.

Neo4j is a great database if you learn how to use it and are willing to get your hands dirty every once in a while (write Java). I keep a blog at maxdemarzi.com on the things you can do with it. See the dating site blog series; it may be relevant to what you are doing.

We have thousands of videos, slideshares, blog posts trying to teach graphs. If you take the time to learn you will be successful. If you connect with us on Slack and ask you will be successful.

If you treat it like an rdbms you will fail. See https://m.youtube.com/watch?v=oALqiXDAYhc for a primer and see https://m.youtube.com/watch?v=cup2OyTfrBM for the crazy stuff you can do that most DBs can’t.

[+] xtracto|7 years ago|reply
Every time that Neo4J is mentioned here, the pricing issues are raised.

No exception today: I used Neo4J at a previous startup, but after using the "free" non-scaled version of it, we hit a hard bottleneck due to a lack of scalability. When we looked into scaling Neo4J, we almost had a heart attack when we saw the price. As a 50-person "developing country" startup, we could not afford to pay the very steep prices.

[+] espeed|7 years ago|reply
New graph DBs implemented with the GraphBLAS linear algebra model will be orders of magnitude more performant than previous gen DB models. RedisGraph 1.0 is the first public GraphBLAS database implementation. And things are about to get even faster with the GraphBLAS GPU implementations in the works.

See previous discussions on GraphBLAS https://hn.algolia.com/?query=GraphBLAS&sort=byPopularity&pr...

[+] michelpp|7 years ago|reply
Thanks for this link, very interesting talk from the graphblas author:

https://www.youtube.com/watch?v=xnez6tloNSQ

I've never seen any advantage to graphdbs over relational models until I saw this talk. Raising graph analysis to the level of linear algebra is brilliant.

[+] ww520|7 years ago|reply
The insight there is turning the adjacency list of a graph into a matrix, and suddenly the goodness of linear algebra can be utilized. That's ingenious!

Thanks for the link.
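A rough illustration of the matrix idea, in plain Python rather than any real GraphBLAS binding: breadth-first search becomes repeated vector-matrix products, where "multiply" is AND and "add" is OR. The function names and the tiny graph are made up for this sketch.

```python
def adjacency_matrix(n, edges):
    """Build an n x n 0/1 matrix for a directed graph with nodes 0..n-1."""
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[src][dst] = 1
    return matrix


def step(matrix, frontier):
    """One vector-matrix product with AND as multiply and OR as add:
    returns the nodes exactly one hop away from the frontier."""
    n = len(matrix)
    return [int(any(frontier[i] and matrix[i][j] for i in range(n))) for j in range(n)]


def bfs_levels(matrix, start):
    """Hop distance from start to every node (-1 if unreachable)."""
    n = len(matrix)
    levels = [-1] * n
    levels[start] = 0
    frontier = [int(i == start) for i in range(n)]
    depth = 0
    while any(frontier):
        depth += 1
        reached = step(matrix, frontier)
        # Mask out nodes that already have a level assigned.
        frontier = [int(reached[j] and levels[j] == -1) for j in range(n)]
        for j in range(n):
            if frontier[j]:
                levels[j] = depth
    return levels


edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
levels = bfs_levels(adjacency_matrix(5, edges), 0)
```

A real implementation would use sparse matrices and tuned kernels; the point here is only that the traversal is expressed as linear algebra rather than pointer chasing.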

[+] remingtonc|7 years ago|reply
Actively using ArangoDB. It has good performance and features. The only thing lacking is something akin to “views” but you can always denormalize into another collection albeit managed by the application. Graphs are a very natural way of thinking about entities and relationships and in its simplicity my development sped up. I personally like to try and schema out in MySQL Workbench etc. but if you want to get started doing something you can basically just make a mind-map and that’s effectively your schema, very intuitive and quick. Great for proof of concept.

Oh and the ability to query for shortest path and similar graph computations, and the DB does all the heavy lifting, is super nice.

[+] robertkluin|7 years ago|reply
I have worked on a few projects where graph databases were used. I have not personally seen a case where I feel they add much value relative to their complexity and tradeoffs.

One of the projects was a business workflow application centered around validating business processes by collecting and reporting on process data (think manufacturing quality control). A graph database was used in an attempt to allow application users more control in defining their workflows and give them more expressive semantic reporting abilities. We tried several graph databases. In reality what happened was that the schema became implicit and performance was truly awful. The choice of a graph db was a strategic decision; we wanted to enable a different user experience. We probably could have done this project in 20% of the time with a standard database and wound up with a better result.

I have also worked on a problem related to storing and retrieving graph data for image processing. The graph db was obscenely slow and inefficient despite the data models being actual graphs.

Both of the projects I worked on involved people who are experts in graph databases. The level of nuance and complexity was astounding. Even simple tasks like trying to visualize the data became monumentally complex.

My takeaway from both of these experiences was that unless you intend to ask questions about the relationships, a graph might not be a very good fit. Even in that case, other databases will likely perform just as well.

(edit: add last sentence)

[+] asark|7 years ago|reply
Holy crap, have we worked on some of the same projects? GraphDB inappropriately applied to a process definition/exploration application. In my case I'm pretty sure the correct solution taking into account all desired functionality and existing support was a desktop (sigh, maybe Electron I guess) app and good ol' SQLite, but nooooo, we did a web app with server-side storage in Neo4j. I tried to sell PostgreSQL as it was 100% for sure a better fit for the kind of queries we'd be running, but that didn't fly. They had a Neo4j "expert" to whom I sometimes had to explain how Neo4j worked. The highest-tier tech manager at the client with whom I interacted was learning about Neo4j from what was mostly a marketing book from the Neo4j folks, turns out.

They burned a shitload of money on those bad decisions, on that and other products they'd previously stuck on Neo4j for no good reason, which were also seeing poor and unpredictable performance and having a rough time with immature supporting tools for the DB. Whole thing's closely related to the "we have big data, it's in the single digit GB range, so big! We need big data tools!" error, I think.

> My takeaway from both of these experiences was that unless you intend to ask questions about the relationships, a graph might not be a very good fit. Even in that case, other databases will likely perform just as well.

Precisely the same conclusion I reached, at least in the case of Neo4j. If the main thing you need to do is answer questions about graphs, it might be an OK DB to use. If the main thing you need to do is extract data from graphs, then you sure as hell don't want it as your primary datastore. Maybe—maybe—some kind of supplement to a SQL DB or whatever depending on your exact needs, but it shouldn't be what you're actually storing most of the data in.

[+] mitchtbaum|7 years ago|reply
I've played around with RDF/N3 databases a bit, but mostly then from a document-oriented storage angle. I believe this has much of what you're looking for, ie getting away from table-based databases, and moving closer to optimal data architectures.

Also, if you're curious about this sort of database design, NASA shared some interesting work on XDF: The Extensible Data Format Based on XML Concepts[0][1], which was part of their long-range Constellation Project[2] toolset for building, launching, and operating the Ares spacecraft. They detailed it in this NExIOM slideshow[3], which reading again after quite a while brings back some very good memories. Enjoy!

0: https://nssdc.gsfc.nasa.gov/nssdc_news/june01/xdf.html

1: https://github.com/sccn/xdf/wiki/Specifications

2: https://en.wikipedia.org/wiki/Constellation_program

3: https://step.nasa.gov/pde2009/slides/20090506145822/PDE2009-...

[+] dustingetz|7 years ago|reply
Datomic (which I already shilled in this thread) is basically immutable RDF with an explicit time dimension, plus an intuitive sql-ish relational/graph query library
[+] good-idea|7 years ago|reply
I have used Dgraph on a couple of projects and enjoyed it. It seems a little more natural to me to think about all of the relationships between my data. It also can be queried using a GraphQL(ish) request.

But, it hasn't been around for long and doesn't have any options for hosting, and the JS client library is pretty basic - so you need to do a lot to have something a little more abstracted like Mongoose.

I enjoyed learning something new and will use it again - but unless I need queries that traverse many relationships, I'll probably use Postgres.

[+] tylertreat|7 years ago|reply
My experience with graph DBs: schemas usually become implicit and ad hoc, tooling support is minimal and what exists is immature, it is difficult to explore and understand your data, performance is often poor, memory usage can be problematic because they usually pull the graph into RAM, and there are few people with real experience (compared to traditional DBs), which can create a negative feedback loop.

I have yet to run into a use case where a graph provided more value than a relational model. I'm sure they exist, but I haven't found them yet.

[+] adamfeldman|7 years ago|reply
What sort of problems are well-suited to graph databases? In other words, what are some scenarios where I will run into trouble using Postgres?

edit: I've previously looked into ArangoDB.com, Dgraph.io, JanusGraph.org, and Cayley.io (to run on top of CockroachDB). I understand all of these are scalable distributed systems, and Postgres is not (CitusData.com aside). Do the benefits of these other systems mainly come when you outgrow single-node Postgres (which has JSONB for "document" storage, PostGIS.net, Timescale.com, etc)?

edit 2: where can I find more technical, concrete examples than https://neo4j.com/use-cases?

[+] chrisseaton|7 years ago|reply
> What sort of problems are well-suited to graph databases?

If you have a graph, then a graph database, with built-in graph algorithms, will be able to run operations on your graph without pulling all the data out to a client. I'm not an expert in PostgreSQL but I don't think it has any graph algorithms?

[+] dogweather|7 years ago|reply
How about hierarchical data? E.g., a country's laws / statutes? Thousands of text files organized in a hierarchy. I've resorted to relational denormalizing and hacks to get decent performance. So I'm wondering if a graph database would be a better fit.

E.g., I frequently need to query, "What is the list of ancestors from the object to the top of the tree?"

In a relational system, this needs to be stored in some kind of data structure, which is redundant. But theoretically in a graph database, it'd be a fast O(log n) query if I'm not mistaken.
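For comparison, the ancestor query can also be written against a plain parent-pointer table with a recursive common table expression, which both SQLite and PostgreSQL support via WITH RECURSIVE. The sketch below uses Python's built-in sqlite3 module; the `node` table, its columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (1, None, "Statutes"),
        (2, 1, "Volume 1"),
        (3, 2, "Chapter 10"),
        (4, 3, "Section 10.5"),
        (5, 1, "Volume 2"),
    ],
)

# Start at the given node, then repeatedly join to the parent row until
# a row with a NULL parent_id (the root) has been added.
ANCESTORS_SQL = """
WITH RECURSIVE ancestors(id, parent_id, title, depth) AS (
    SELECT id, parent_id, title, 0 FROM node WHERE id = ?
    UNION ALL
    SELECT n.id, n.parent_id, n.title, a.depth + 1
    FROM node AS n JOIN ancestors AS a ON n.id = a.parent_id
)
SELECT title FROM ancestors WHERE depth > 0 ORDER BY depth
"""


def ancestors(conn, node_id):
    """Return ancestor titles from the immediate parent up to the root."""
    return [row[0] for row in conn.execute(ANCESTORS_SQL, (node_id,))]


path = ancestors(conn, 4)
```

Each recursion step is one primary-key lookup, so the work grows with the depth of the tree rather than with the number of rows, and nothing redundant has to be stored. Whether that is fast enough for a given corpus would need measuring.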

[+] jkern|7 years ago|reply
My understanding is that graph databases are better suited to handling many-to-many relationships
[+] tiuPapa|7 years ago|reply
Good question, I should probably add this to my original question. Personally, I am trying to build a recommendation engine of sorts and I think Graph DB is suited for this but I am no expert.
[+] dustingetz|7 years ago|reply
I gave a talk about this at ClojureNYC in 2017, the second half of the talk is about modeling graphs in various popular databases (SQL, Neo4j, Mongo, Datomic) and the problems you encounter https://s3.amazonaws.com/www.dustingetz.com/Getz+2017+Datomi...

Datomic (an immutable database for doing functional programming in the database) is central to my startup http://www.hyperfiddle.net/ , I don't think Hyperfiddle is possible to build on other databases that exist today. The future lies in immutability, full stop.

Datomic : databases :: git : version control

[+] overdrivetg|7 years ago|reply
GrapheneDB (Neo4j) on Heroku here - relatively small scale project so far (1000's of users) but very easy to use, no problems, great support. If your problem space is a graph, using a graph will make your life easy once you get over the Cypher learning curve.

We're on Rails and so use the Neo4j.rb gem which has been around for quite a while and also has a ton of work and support around it. The Ruby DSL for it makes it as easy as you would expect in Rails for most basic relationships and queries, and you can access more advanced features or drop into Cypher as needed.

For our use case, a graph DB was definitely easier than trying to manage relationships and categories in a relational DB but it will definitely depend on your use case. Good luck!

[+] pklee|7 years ago|reply
A couple of quick points:

a. If you are primarily dealing with categorical data (strings as opposed to numbers, e.g. genes, diseases) and require a lot of graph algorithms (eigenvalue, shortest path, etc.), graphs are pretty good for storage, retrieval and visualization. The biggest difference in querying: in SQL you say "what" you want; in SPARQL / Gremlin you say "how" you want it, i.e. which relationships to take.

b. Graph as a representation format shines, but as a storage mechanism I have not found it to be optimal. Many go for graph as a layer on top of an RDBMS.

c. RDF is better in terms of standardization than a proprietary graph database. This is because you can arbitrarily decide what should be a vertex vs. what should be a property. In things like Neo4J it gets fixed once you decide. Virtuoso comes pretty close since it implements RDF on an RDBMS (my limited understanding).

d. It is good for representing knowledge / metadata (at least RDF is), but again I would stay away from representing data.

e. Your choice of graph algorithms typically ends up being what comes prepackaged (say Gremlin etc.), or you pull the data into an intermediary and use algorithms there (Networkx / igraph (igraph is awesome)), or you write your own (this is typically not trivial).

f. Many pointed to the schema; I actually think this is the advantage of RDF. My typical workflow is to start with RDF, do my basic stuff on RDF until I have a good understanding of what the queries are and therefore the optimal schema, and then migrate to an RDBMS as needed. Trying to do large scale on RDF on laptop infra is not optimal.

I would try to use some combination of RDBMS with runtime graph like igraph. YMMV.
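A minimal sketch of that combination, with SQLite standing in for the RDBMS and a plain breadth-first search standing in for a library such as igraph or Networkx. The `edge` table and the sample rows are hypothetical.

```python
import sqlite3
from collections import defaultdict, deque

# Edges live in an ordinary relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("e", "d")],
)


def load_graph(conn):
    """Pull the edge table into an in-memory undirected adjacency map."""
    graph = defaultdict(set)
    for src, dst in conn.execute("SELECT src, dst FROM edge"):
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


graph = load_graph(conn)
path = shortest_path(graph, "a", "d")
```

The database stays the system of record, and the graph is rebuilt at runtime only for the subset that an algorithm actually needs.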

[+] mindcrash|7 years ago|reply
Postgres not suitable? Postgres is probably the most powerful multimodel data store with the lowest TCO on the market today.

Postgres can be used for columnar data, as a graph database (using https://github.com/bitnine-oss/agensgraph), as a timeseries database (using https://github.com/timescale/timescaledb), and as a KV store (which is astoundingly simple to do using its builtin jsonb column type)

In fact at this point the only thing other than Postgres I would look at is FoundationDB due to the fact that (although it takes some time) you can model and run ANY kind of data store on top of it.

[+] starptech|7 years ago|reply
Stay away from OrientDB. It is a super-hyped multi-model database, but it's unreliable and hard to maintain. I worked with it for 2 years, incl. paid professional support.
[+] mazeminder|7 years ago|reply
Can you elaborate a bit on the problem(s) you encountered?
[+] donatj|7 years ago|reply
We used Neo4j in a prototype rebuild of our primary application for about a month before switching back to MySQL. In that time it crashed and lost all of our data several times. We found it untenable.

My faith in it was soured at that point. That said, this was probably 5ish years ago now, so I cannot speak to how much it's stabilized since then.

[+] jonahss|7 years ago|reply
RedisGraph! It’s a module added to Redis. Very simple and performant; they supply a Docker image already running it, and clients exist in most languages.
[+] emerged|7 years ago|reply
I wanted to use it, but it was not compatible with cluster redis on AWS. That seems ridiculous to me
[+] souenzzo|7 years ago|reply
I've been using Datomic for 2 years and it's awesome. When I look at other graph DBs, I see that they don't have a powerful query engine.
[+] dragonne|7 years ago|reply
I've endured Datomic for 4 years and I really wonder what the folks who enjoy it are doing, because for me it is utter misery.

- It's miserably slow (if you wish to contradict this statement, please provide numbers)
- Consumes gobs of memory (export our data to JSON and it's orders of magnitude smaller)
- Full text search will consume all your CPU cores if you give it a short query (seriously, don't touch this feature, it is a basket full of footguns)
- Resource leaks (the Cassandra backend used to leak full databases!)

I've been up to 3:00 a.m. dealing with bugs in Datomic. What use cases does it actually work for?

[+] snorremd|7 years ago|reply
The datalog like query language combined with immutability is quite nice. I'm no expert in Datomic yet, but it is pretty nice that the queries are simple Clojure data structures. I can use regular old Clojure code to build queries dynamically (if needed).
[+] slifin|7 years ago|reply
"It's awesome" seems like an understatement :D
[+] agentofoblivion|7 years ago|reply
Understand the difference between graph querying and graph processing. If you have a question like, “give me all the friends of friends of person X that share a college”, then that’s a query that a graphdb would be helpful for. Saying, “find clusters of nodes based on relationships of types x, y, and z” is a processing job. You might need to query the graphdb to get the graph data that’s then loaded into a graph processing engine.

So be clear on exactly what you’re trying to do.
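To illustrate the distinction with a toy example: the first function below is a query (a bounded traversal from one starting node), and the second is a processing job (it has to touch the whole graph). The data, names and college labels are invented for this sketch, and connected components is used as the simplest stand-in for "finding clusters".

```python
# Hypothetical friendship graph (undirected) and a college attribute per person.
friends = {
    "x": {"ann", "bob"},
    "ann": {"x", "cat", "dan"},
    "bob": {"x", "dan"},
    "cat": {"ann"},
    "dan": {"ann", "bob"},
    "eve": {"fay"},
    "fay": {"eve"},
}
college = {
    "x": "north", "ann": "north", "bob": "south", "cat": "north",
    "dan": "south", "eve": "north", "fay": "south",
}


def friends_of_friends_same_college(person):
    """Query: a two-hop traversal from one node, filtered on an attribute."""
    direct = friends[person]
    two_hops = set().union(*(friends[f] for f in direct)) - direct - {person}
    return {p for p in two_hops if college[p] == college[person]}


def clusters():
    """Processing: connected components, which requires visiting every node."""
    seen, result = set(), []
    for start in friends:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(friends[node] - component)
        seen |= component
        result.append(component)
    return result


fof = friends_of_friends_same_college("x")
groups = clusters()
```

The query only reads the neighborhood of one person, which is the access pattern a graph db is built around. The clustering pass reads everything, which is why it often ends up in a separate processing engine.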

[+] inersha|7 years ago|reply
I've been using Neo4j to store the bitcoin blockchain. Bitcoin transactions have a graph structure, and so by storing the entire blockchain in a graph database you can easily query for connections between different bitcoin addresses.

If you're interested, I've done some explanation of it here: http://learnmeabitcoin.com/neo4j/

My experience with Neo4j has been a good one. The database is currently around 1TB and runs continuously without a problem. It's fast enough to use it as a public blockchain explorer, whilst simultaneously keeping up with importing all the latest transactions and blocks on the network.

It took some time to get the hang of the Cypher query language to get it to do what I want, but the browser it comes with is handy for learning via trial and error. I found the people on the Neo4j slack channel to be incredibly helpful with my questions.

[+] thomaseng|7 years ago|reply
Consider using ArangoDB, a multi-model db: document store and graph db in one. It has a query language (AQL) that is easy to understand. Joins and relations are easy to accomplish. Version 3.4 has many new features, like full text search and GeoJSON.