A couple of years ago I spent a lot of time figuring out how to deal with a large graph. My conclusion: there will never be such a thing as a "graph database". There are many efforts in this area - someone here already mentioned SPARQL and RDF, and you can google for "triple stores", etc. There are also large-scale graph processing tools on top of Hadoop, such as Giraph, or GraphX for Spark.
For the particular project we ended up using Redis and storing the graph as an adjacency list in a machine with 128GB of RAM.
The reason I don't think there will ever be a "graph database" is that there are so many different ways you can store a graph, and so many things you might want to do with one. It's trivial to build a "graph database" in a few lines of any programming language - graph traversal is (hopefully) taught in any decent CS course.
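To illustrate the point, here is the "few lines of any programming language" version: an adjacency list plus breadth-first traversal. (This is also essentially the Redis approach mentioned above, with one Redis set per node instead of a dict entry; the graph data here is made up.)

```python
from collections import deque

# A toy "graph database": adjacency lists in a dict.
graph = {
    "alice": ["bob", "carol"],
    "bob":   ["dave"],
    "carol": ["dave"],
    "dave":  [],
}

def bfs(graph, start):
    """Return nodes reachable from `start` in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

print(bfs(graph, "alice"))  # ['alice', 'bob', 'carol', 'dave']
```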
Also - the latest versions of PostgreSQL have all the features to support graph storage. It's ironic how PostgreSQL is becoming a SQL database that is gradually taking over the "NoSQL" problem space.
From my point of view, the fact that you can very easily add a custom index to a graph database written in a modern language (i.e. not C/C++) makes it even easier to customize an existing graph database to suit your exact needs. In turn, storage and runtime can be tuned more easily, making it simple to get the performance you need. But at the end of the day, not having to deal with relational algebra is the best part.
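A minimal sketch of what "adding a custom index" can look like in an in-memory graph store. All names here (`GraphStore`, `by_label`, the example nodes) are made up for illustration; the point is only that the index is a few lines when you control the store.

```python
class GraphStore:
    """Naive in-memory graph store with a hand-rolled secondary index."""

    def __init__(self):
        self.props = {}     # node id -> property dict
        self.edges = {}     # node id -> list of neighbor ids
        self.by_label = {}  # custom index: label -> set of node ids

    def add_node(self, node_id, **props):
        self.props[node_id] = props
        self.edges.setdefault(node_id, [])
        # Maintaining the index is one line here -- the kind of
        # customization that is painful inside a C/C++ black box.
        if "label" in props:
            self.by_label.setdefault(props["label"], set()).add(node_id)

    def add_edge(self, src, dst):
        self.edges.setdefault(src, []).append(dst)

    def find_by_label(self, label):
        # O(1) lookup instead of scanning every node.
        return self.by_label.get(label, set())

g = GraphStore()
g.add_node("n1", label="person", name="Ada")
g.add_node("n2", label="person", name="Alan")
g.add_node("n3", label="city", name="London")
g.add_edge("n1", "n3")
print(g.find_by_label("person"))  # {'n1', 'n2'}
```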
PostgreSQL has supported recursive queries for years now, and in Oracle you have CONNECT BY. I have only used a recursive WITH once, and it was just a quick demo, but my understanding is that updates are extremely expensive.
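For anyone who hasn't seen one, a recursive CTE doing graph traversal looks like this. SQLite is used here only because it ships with Python's standard library (Postgres accepts the same query); the table and the edge data are made up.

```python
import sqlite3

# Edge table for a small directed graph: a -> b -> c -> d, a -> e.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
con.executemany("INSERT INTO edge VALUES (?, ?)",
                [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")])

# All nodes reachable from 'a', via a recursive common table expression.
rows = con.execute("""
    WITH RECURSIVE reachable(node) AS (
        VALUES ('a')
        UNION
        SELECT edge.dst FROM edge
        JOIN reachable ON edge.src = reachable.node
    )
    SELECT node FROM reachable ORDER BY node
""").fetchall()
print([r[0] for r in rows])  # ['a', 'b', 'c', 'd', 'e']
```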
We're using TitanDB. One of the main benefits for us is that AWS has provided backend integration with DynamoDB. This affords you practically infinite and painless scaling on a pay-as-you-go model. Love it.
Depends on what kind of data and graph you are going to store/use. Neo4j is quite popular, Cypher isn't very hard to learn, and it has lots of examples. It might be a good choice for a beginner.
There are multiple systems out there; however, I have my doubts. It is important that your data does not get corrupted and that your transactions do not get lost. Furthermore, speedups are possible with certain indices. That is why I personally would want to see some more safety/speed analysis and comparisons between the different systems.
(Full disclosure: I'm the author, and we are VC-backed.) https://github.com/amark/gun is an open-source graph database with Firebase-like realtime synchronization.
Everybody's focused on graph databases here, but let's talk about Cray! One of the most forward-thinking computer technology companies ever to exist is starting to get out there again. If they got a few hundred million dollars from an outside investor, they could do friggin' incredible things. They already do incredible things, just not as visibly as they so easily could.
Cray is a brand name that has been passed around between half a dozen companies (including Sun and SGI), dotted with various kinds of product reboots and commercial failures. Cool stuff, but supercomputing isn't the most financially sound business, it seems. The current name holder is the company previously called Tera, originally famous for making an aggressively multithreaded HPC computer.
I am a huge fan of graph-y stuff. I've done several iterations of a graph database written in Python, using files, then bsddb, and right now WiredTiger. I also use Gremlin for querying. Have a look at the code: https://github.com/amirouche/ajgudb
I've seen people using graph databases as a general-purpose backing store for webapps/microservices. What are people's opinions about this?
My feeling is that graph databases are not suitable/ready for — for lack of a better term — the kind of document-like entity relationship graphs we typically use in webapps. Typical data models don't represent data as vertices and edges, but as entities with relationships ("foreign keys" in RDBMS nomenclature) embedded in the entities themselves.
This coincidentally applies to the relational model, in its most pure, formal, normal form, but the web development community has long established conventions of ORMing their way around this. The thing is, you shouldn't need an ORM with a graph database.
It introduces a false dichotomy: "graph vs relational".
In fact, most (if not all) graph algorithms can be expressed using linear algebra (with specific addition and multiplication operations). And matrix multiplication is a select from two matrices, related with "where i = j", followed by aggregation over identical result coordinates.
The selection of multiplication and addition operations can account for different "data stored in links and nodes".
So there is no such dichotomy "graph vs relational".
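The correspondence can be sketched in a few lines. Below, boolean-semiring matrix multiplication (two-hop reachability) is computed once as linear algebra and once as the equivalent relational self-join plus aggregation; the 4-node graph is made up for the demo.

```python
# Adjacency matrix of a 4-node path graph: A[i][j] = 1 iff edge i -> j.
A = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
n = len(A)

# Linear-algebra view: (A*A)[i][j] with (+, *) replaced by (or, and)
# answers "is there a path of length 2 from i to j?".
two_hop_matmul = [
    [int(any(A[i][k] and A[k][j] for k in range(n))) for j in range(n)]
    for i in range(n)
]

# Relational view: edges as a table of (src, dst) rows; self-join on
# e1.dst = e2.src (the "where i = j" above), then aggregate duplicate
# (src, dst) results into a set.
edges = [(i, j) for i in range(n) for j in range(n) if A[i][j]]
joined = {(s1, d2) for (s1, d1) in edges for (s2, d2) in edges if d1 == s2}
two_hop_join = [[int((i, j) in joined) for j in range(n)] for i in range(n)]

assert two_hop_matmul == two_hop_join
print(two_hop_matmul[0])  # [0, 0, 1, 0]: the only 2-hop path from 0 is 0 -> 1 -> 2
```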
Strictly speaking, yeah. Practically speaking: not really true.
Just because something can be done doesn't mean it can be done easily or well. I've done a lot of work with relational databases, and I love them for a lot of data sets. But I have also done a lot of work with graph databases - and they make working with graph-shaped data a pleasure. I could do a graph in SQL - it's even moderately straightforward in Postgres these days using WITH RECURSIVE - but it's still not as simple as just loading Orient or Arango for those tasks.
It's the same reason I keep multiple knives in my kitchen. Sure I could do everything with an 8" chef's knife, but the paring knife and the boning knife just make some tasks easier.
Of course you can express anything on top of a relational model. But for graphs, such a representation would be awfully inefficient. For this reason, CADs never even tried to switch to relational data storage once those fancy new relational databases appeared; most of the professional CADs are still using good old graph databases.
Anybody know dgraph.io?
It's a scalable, distributed, low-latency, high-throughput graph database over terabytes of structured data.
Dgraph supports Facebook's GraphQL as its query language, responds in JSON, and its storage engine is Facebook's RocksDB, a very fast key-value store.
See more at https://github.com/dgraph-io/dgraph
One of the biggest challenges in databases is handling concurrency and sharding; I wish this had talked a bit more about how that changes between a graph database and a relational database.
valhalla | 10 years ago:
http://barabasilab.neu.edu/networksciencebook/downlPDF.html
GFK_of_xmaspast | 10 years ago:
(FWIW, I had previously read some Barabasi papers and had come away seriously unimpressed, see also https://news.ycombinator.com/item?id=9555547)
rail2rail | 10 years ago:
https://aws.amazon.com/blogs/aws/new-store-and-process-graph...
kinow | 10 years ago:
https://en.wikipedia.org/wiki/Graph_database#List_of_graph_d...
kawera | 10 years ago:
https://github.com/google/cayley
iod | 10 years ago:
https://www.arangodb.com
¹ https://www.arangodb.com/2015/10/benchmark-postgresql-mongod...
espeed | 10 years ago:
See previous discussion: https://news.ycombinator.com/item?id=11197880
jerven | 10 years ago:
There are more, but these are open source and I know them. And many more commercial ones.
amirouche | 10 years ago:
Also, I made a hypergraphdb in Scheme, atom-centered instead of hyperedge-focused: https://github.com/amirouche/Culturia/blob/master/culturia/c....
Did you know that Gremlin is just SRFI-41, aka the stream API, with a few graph-centric helpers?
edit: it's srfi 41, http://srfi.schemers.org/srfi-41/srfi-41.html
SloopJon | 10 years ago:
http://www.cray.com/blog/how-cray-graph-engine-manages-graph...
TimPrice | 10 years ago:
2. Instead, do graph DB engines try to break through bottlenecks for big data and analytics scenarios?
cbsmith | 10 years ago:
Your argument is effectively: because Haskell can be implemented in C, there is a false dichotomy between the two languages.