Ask HN: What was your experience using a graph database?
161 points| tiuPapa | 7 years ago
Edit: What sort of problems are well-suited to graph databases? In other words, what are some scenarios where something like Postgres is not suitable anymore?
[+] [-] jjguy|7 years ago|reply
Design your data schema first, then your queries, and finally your data lifecycle pipeline. Run some order-of-magnitude estimates on inserts, query rates, query types and storage sizes - then compare those numbers to the real-world performance of the various graph DB solutions. In general, compared to more typical solutions, you pay more for inserts, queries and storage in exchange for more expressive queries. There aren't many applications where those cost tradeoffs make sense.
Source: Twice now (2012 and 2018) I've reviewed available graphdbs for storage of enterprise security data when doing the initial platform technology selection. Both times the team fell back onto more traditional approaches.
[+] [-] dajohnson89|7 years ago|reply
Neo4j is the most mature solution I found (in the Java space). If you want to use something else go for it, but you may be surprised at the low quality.
op: I strongly recommend implementing most/all of your pipeline using graph & non-graph approaches. choose the graph approach iff you can demonstrate with hard evidence that it makes sense.
[+] [-] usgroup|7 years ago|reply
Calculating over or walking graphs sucks because there is usually a better, less brute-force way to answer any particular query.
Unless you have a set of use cases that require the ability to query across near enough random and unindexable subsets of a graph (eg Facebook), you’ll probably be better off with a DB and a spot of flattening.
[+] [-] aasasd|7 years ago|reply
If I were interested in them again now, I would try new and fancy solutions first, to see if there are nosql-level performance improvements in the graph db space.
[+] [-] allochthon|7 years ago|reply
Not to detract from your general point, but curious whether you looked at Dgraph in your analysis. It's quite fast and was built for speed.
https://dgraph.io/
[+] [-] bane|7 years ago|reply
Yes! When I was younger I worked on a problem once that needed to compute some very basic graph metrics. My seniors tried to do the work in an early graph database and it was a disaster. It turns out literally just reading in the lines from a file and counting things got the job done in a few seconds.
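For scale, the "read the lines and count" approach is roughly this much code (a sketch - the edge-list file format here is an assumption, one "src dst" pair per line):

```python
from collections import Counter

def degree_counts(path):
    """Tally node degrees from an edge-list file with one 'src dst' pair per line."""
    degrees = Counter()
    with open(path) as f:
        for line in f:
            src, dst = line.split()
            degrees[src] += 1
            degrees[dst] += 1
    return degrees
```

For basic metrics like degree distributions, a single pass over the file beats standing up any database at all.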
They refused to use the results until they were coming out of the graph database because "just in case we needed other metrics". We never needed the other metrics.
[+] [-] Scarbutt|7 years ago|reply
[+] [-] maxdemarzi|7 years ago|reply
Neo4j is a great database if you learn how to use it and are willing to get your hands dirty every once in a while (write Java). I keep a blog at maxdemarzi.com on the things you can do with it. See the dating site blog series; it may be relevant to what you are doing.
We have thousands of videos, slideshares, blog posts trying to teach graphs. If you take the time to learn you will be successful. If you connect with us on Slack and ask you will be successful.
If you treat it like an RDBMS you will fail. See https://m.youtube.com/watch?v=oALqiXDAYhc for a primer and see https://m.youtube.com/watch?v=cup2OyTfrBM for the crazy stuff you can do that most DBs can't.
[+] [-] xtracto|7 years ago|reply
No exception today: I used Neo4j at a previous startup. After using the "free" non-clustered version, we hit a hard bottleneck due to its lack of scalability, and when we looked at scaling Neo4j out we almost had a heart attack on seeing the price. Being a 50-person "developing country" startup, we could not afford the very steep prices.
[+] [-] espeed|7 years ago|reply
See previous discussions on GraphBLAS https://hn.algolia.com/?query=GraphBLAS&sort=byPopularity&pr...
[+] [-] michelpp|7 years ago|reply
https://www.youtube.com/watch?v=xnez6tloNSQ
I've never seen any advantage to graphdbs over relational models until I saw this talk. Raising graph analysis to the level of linear algebra is brilliant.
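The core idea, roughly: store the graph as an adjacency matrix, and one step of BFS becomes a matrix-vector product. A toy numpy sketch of that (not actual GraphBLAS, which uses sparse matrices and custom semirings):

```python
import numpy as np

# Adjacency matrix of a small directed graph: 0->1, 0->2, 1->3, 2->3
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]])

def bfs_levels(A, start):
    """Return each node's BFS level from `start` (-1 if unreachable),
    advancing the frontier one hop per matrix-vector product."""
    n = A.shape[0]
    levels = -np.ones(n, dtype=int)
    frontier = np.zeros(n, dtype=int)
    frontier[start] = 1
    level = 0
    while frontier.any():
        levels[frontier > 0] = level
        # One hop: nodes reachable from the frontier, minus already-visited ones
        frontier = (frontier.astype(int) @ A > 0) & (levels < 0)
        level += 1
    return levels
```

The win is that decades of sparse linear algebra optimization (blocking, parallelism, SIMD) then apply directly to graph traversal.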
[+] [-] ww520|7 years ago|reply
Thanks for the link.
[+] [-] ignoramous|7 years ago|reply
[+] [-] remingtonc|7 years ago|reply
Oh and the ability to query for shortest path and similar graph computations, and the DB does all the heavy lifting, is super nice.
[+] [-] Yliaho|7 years ago|reply
https://docs.arangodb.com/3.4/AQL/Views/
[+] [-] robertkluin|7 years ago|reply
One of the projects was a business workflow application centered around validating business processes by collecting and reporting on process data—think manufacturing quality control. A graph database was used in an attempt to give application users more control in defining their workflows and more expressive semantic reporting abilities. We tried several graph databases. In reality, what happened was that the schema became implicit and performance was truly awful. The choice of a graph DB was a strategic decision; we wanted to enable a different user experience. We probably could have done this project in 20% of the time with a standard database and wound up with a better result.
I have also worked on a problem related to storing and retrieving graph data for image processing. The graph db was obscenely slow and inefficient despite the data models being actual graphs.
Both of the projects I worked on involved people who were experts in graph databases. The level of nuance and complexity was astounding. Even simple tasks like trying to visualize the data became monumentally complex.
My takeaway from both of these experiences was that unless you intend to ask questions about the relationships, a graph might not be a very good fit. Even in that case, other databases will likely perform just as well.
(edit: add last sentence)
[+] [-] asark|7 years ago|reply
They burned a shitload of money on those bad decisions, on that and other products they'd previously stuck on Neo4j for no good reason, which were also seeing poor and unpredictable performance and having a rough time with immature supporting tools for the DB. Whole thing's closely related to the "we have big data, it's in the single digit GB range, so big! We need big data tools!" error, I think.
> My takeaway from both of these experiences was that unless you intend to ask questions about the relationships, a graph might not be a very good fit. Even in that case, other databases will likely perform just as well.
Precisely the same conclusion I reached, at least in the case of Neo4j. If the main thing you need to do is answer questions about graphs, it might be an OK DB to use. If the main thing you need to do is extract data from graphs, then you sure as hell don't want it as your primary datastore. Maybe—maybe—some kind of supplement to a SQL DB or whatever depending on your exact needs, but it shouldn't be what you're actually storing most of the data in.
[+] [-] mitchtbaum|7 years ago|reply
Also, if you're curious about this sort of database design, NASA shared some interesting work on XDF: The Extensible Data Format, based on XML concepts[0][1], which was part of their long-range Constellation Project[2] toolset for building, launching, and operating the Ares spacecraft. They detailed it in this NExIOM slideshow[3], which reading again after quite a while brings back some very good memories. Enjoy!
0: https://nssdc.gsfc.nasa.gov/nssdc_news/june01/xdf.html
1: https://github.com/sccn/xdf/wiki/Specifications
2: https://en.wikipedia.org/wiki/Constellation_program
3: https://step.nasa.gov/pde2009/slides/20090506145822/PDE2009-...
[+] [-] dustingetz|7 years ago|reply
[+] [-] good-idea|7 years ago|reply
But it hasn't been around for long, doesn't have any options for hosting, and the JS client library is pretty basic - so you need to do a lot of work yourself to get something a little more abstracted, like Mongoose.
I enjoyed learning something new and will use it again - but unless I need queries that traverse many relationships, I'll probably use Postgres.
[+] [-] tylertreat|7 years ago|reply
I have yet to run into a use case where a graph provided more value than a relational model. I'm sure they exist, but I haven't found them yet.
[+] [-] adamfeldman|7 years ago|reply
edit: I've previously looked into ArangoDB.com, Dgraph.io, JanusGraph.org, and Cayley.io (to run on top of CockroachDB). I understand all of these are scalable distributed systems, and Postgres is not (CitusData.com aside). Do the benefits of these other systems mainly come when you outgrow single-node Postgres (which has JSONB for "document" storage, PostGIS.net, Timescale.com, etc)?
edit 2: where can I find more technical, concrete examples than https://neo4j.com/use-cases?
[+] [-] chrisseaton|7 years ago|reply
If you have a graph, then a graph database, with built-in graph algorithms, will be able to run operations on your graph without pulling all the data out to a client. I'm not an expert in PostgreSQL but I don't think it has any graph algorithms?
[+] [-] dogweather|7 years ago|reply
E.g., I frequently need to query, "What is the list of ancestors from the object to the top of the tree?"
In a relational system, this needs to be stored in some kind of redundant data structure. But theoretically, in a graph database it'd be a fast O(log n) query, if I'm not mistaken.
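For what it's worth, stock Postgres and SQLite can answer the ancestor question from a plain parent_id column with a recursive CTE, no redundant structure needed. A sketch using Python's sqlite3, with a made-up `nodes(id, parent_id)` table:

```python
import sqlite3

def ancestors(conn, node_id):
    """Walk parent_id links from node_id up to the root via a recursive CTE."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, parent_id) AS (
            SELECT id, parent_id FROM nodes WHERE id = ?
            UNION ALL
            SELECT n.id, n.parent_id
            FROM nodes n JOIN chain c ON n.id = c.parent_id
        )
        SELECT id FROM chain
    """, (node_id,)).fetchall()
    return [r[0] for r in rows[1:]]  # drop the node itself, keep its ancestors
```

The cost is proportional to the depth of the tree (one indexed lookup per level), which is the same O(depth) work a graph DB does for this traversal.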
[+] [-] jkern|7 years ago|reply
[+] [-] tiuPapa|7 years ago|reply
[+] [-] dustingetz|7 years ago|reply
Datomic (an immutable database for doing functional programming in the database) is central to my startup http://www.hyperfiddle.net/ , I don't think Hyperfiddle is possible to build on other databases that exist today. The future lies in immutability, full stop.
Datomic : databases :: git : version control
[+] [-] overdrivetg|7 years ago|reply
We're on Rails and so use the Neo4j.rb gem which has been around for quite a while and also has a ton of work and support around it. The Ruby DSL for it makes it as easy as you would expect in Rails for most basic relationships and queries, and you can access more advanced features or drop into GraphQL as needed.
For our use case, a graph DB was definitely easier than trying to manage relationships and categories in a relational DB but it will definitely depend on your use case. Good luck!
[+] [-] pklee|7 years ago|reply
I would try to use some combination of RDBMS with runtime graph like igraph. YMMV.
[+] [-] mindcrash|7 years ago|reply
Postgres can be used for columnar data, as a graph database (using https://github.com/bitnine-oss/agensgraph), as a timeseries database (using https://github.com/timescale/timescaledb), and as a KV store (which is astoundingly simple to do using its builtin jsonb column type)
In fact at this point the only thing other than Postgres I would look at is FoundationDB due to the fact that (although it takes some time) you can model and run ANY kind of data store on top of it.
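The KV pattern the parent describes really is just a two-column table plus JSON operators. Here's the shape of it sketched with Python's sqlite3, using JSON1's `json_extract` to stand in for Postgres's jsonb `->>` operator (the `kv` table and helper names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")  # value holds a JSON document

def kv_set(key, doc_json):
    """Upsert a JSON document under a key."""
    conn.execute(
        "INSERT INTO kv VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (key, doc_json),
    )

def kv_get_field(key, field):
    """Pull one field out of a stored document; json_extract mirrors jsonb's ->>."""
    row = conn.execute(
        "SELECT json_extract(value, '$.' || ?) FROM kv WHERE key = ?",
        (field, key),
    ).fetchone()
    return row[0] if row else None
```

In Postgres you'd get the same effect with a `jsonb` column, GIN indexes over it, and `value->>'field'` in queries.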
[+] [-] starptech|7 years ago|reply
[+] [-] mazeminder|7 years ago|reply
[+] [-] donatj|7 years ago|reply
My faith in it was soured at that point. That said, this was probably 5ish years ago now, so I cannot speak to how much it's stabilized since then.
[+] [-] unknown|7 years ago|reply
[deleted]
[+] [-] jonahss|7 years ago|reply
[+] [-] emerged|7 years ago|reply
[+] [-] souenzzo|7 years ago|reply
[+] [-] dragonne|7 years ago|reply
- It's miserably slow (if you wish to contradict this statement, please provide numbers)
- Consumes gobs of memory (export our data to JSON and it's orders of magnitude smaller)
- Full text search will consume all your CPU cores if you give it a short query (seriously, don't touch this feature, it is a basket full of footguns)
- Resource leaks (the Cassandra backend used to leak full databases!)
I've been up to 3:00 a.m. dealing with bugs in Datomic. What use cases does it actually work for?
[+] [-] snorremd|7 years ago|reply
[+] [-] slifin|7 years ago|reply
[+] [-] agentofoblivion|7 years ago|reply
So be clear on exactly what you’re trying to do.
[+] [-] inersha|7 years ago|reply
If you're interested, I've done some explanation of it here: http://learnmeabitcoin.com/neo4j/
My experience with Neo4j has been a good one. The database is currently around 1TB and runs continuously without a problem. It's fast enough to use as a public blockchain explorer, whilst simultaneously keeping up with importing all the latest transactions and blocks on the network.
It took some time to get the hang of the Cypher query language to get it to do what I want, but the browser it comes with is handy for learning via trial and error. I found the people on the Neo4j slack channel to be incredibly helpful with my questions.
[+] [-] thomaseng|7 years ago|reply