Amazon Neptune – Fast, reliable graph database built for the cloud

[+] lmeyerov|8 years ago|reply

Awesome surprise to see the embargo lifted -- sounds like I can now say the Graphistry team will be doing a follow-up talk at Amazon Re:Invent tomorrow (Thursday) on Amazon Neptune + Graphistry. We've been incorporating this into visual investigation workflows for security, fraud, health records, etc. They've been doing cool bits on the managed graph layer, and were early to graph GPU tech (Blazegraph team members), and our side starts bringing that kind of thinking to visual GPU analytics & workflow automation tech.

If you're in town and into this stuff, ping me at leo [at] graphistry, and would love to catch up Th/F for coffee+drinks. Also here + email, of course!

[+] kendallgclark|8 years ago|reply

If you actually read the docs, it's not Janus-based, it's based on BlazeGraph, which Amazon reportedly acquihired last year.

[+] jnwatson|8 years ago|reply

Is that public information? I don't see any press releases about it.

[+] rdslw|8 years ago|reply

Yet another amazon service to lock you in.

And then after two years, when you're no longer a startup with a 100usd bill, but a bigger company, you're completely tied to a jungle of amazon products, and your exit strategy is very very costly.

clever amazon, clever.

[+] derefr|8 years ago|reply

There is some truth to this, but in a larger sense (on an ecosystem level, rather than from the perspective of an individual company), I can only be happy when AWS enters a new space. It makes that component into table-stakes in the IaaS game, which means every other big player is about to step up with their own offering as well, and the third-party SaaS and open-source self-hosted offerings in the same space all are going to heat up as well.

Consider the evolution of container hosting services: first we had PaaSes like Heroku with proprietary container formats; then we got Docker, but Docker Swarm was nascent and there was no serious Docker Swarm IaaS-cloud offering. But then, very quickly, AWS built ECS; Google responded with Kubernetes; and then Kubernetes became the open standard, made everyone forget about Docker Swarm, and took over (and is even replacing ECS now.)

That's what happens when AWS enters a space. And it's great.

[+] chiefalchemist|8 years ago|reply

Yes and no.

If you factor in the cost of not taking "the easy way" you'd likely never get past the 100usd phase.

Your point is valid. I'm just suggesting the lens / context isn't as one-sided as you've presented.

Put another way, plenty of startups and VCs would love to have "getting out from under AWS" at the top of their good problems to have list.

[+] lolive|8 years ago|reply

Just an off-topic comment: I am the maintainer of a visual query builder for SPARQL queries. cf http://datao.net

This tool proposes to design query patterns from a graph data model, via drag n drops. The tool can then compile the patterns as SPARQL, run them on an endpoint and format the results as map/forms/tables/graphs/HTML (via templating)/...

Another service of Datao (http://search.datao.net) proposes a search-engine view of those queries so you can type the textual representation of an object in any public SPARQL endpoint, and the service will list the queries currently available in Datao that can be applied upon this object. You can then run these queries with a click, and get the HTML templating of the query results.

Feel free to have a look at the website, if you find any interest in this tool. Any feedback is welcome.

PS: Sorry for the poor quality of the videos. I manage this project in my spare time :)

[+] randomor|8 years ago|reply

Only had experience with Cypher, really liked it. It will be interesting to see how Neo4j responds to this. Regardless of tech specs, the fully-managed Neptune vs a community version on AWS Marketplace seems to give Neptune unfair advantage.

[+] mcphage|8 years ago|reply

> seems to give Neptune unfair advantage

What do you mean by "unfair" here?

[+] Graphguy|8 years ago|reply

Is this JanusGraph under the covers? Guessing since Neptune is a nod to Janus.

[+] igravious|8 years ago|reply

http://janusgraph.org/

Support for various storage backends:

   - Apache Cassandra®
   - Apache HBase®
   - Google Cloud Bigtable
   - Oracle BerkeleyDB

I don't understand how a database doesn't have its own native store. What exactly does a graph database actually do if it doesn't manage the data fed to it? Same is true for CayleyGraph† https://github.com/cayleygraph/cayley and probably others.

†Plays well with multiple backend stores:

   - KVs: Bolt, LevelDB
   - NoSQL: MongoDB
   - SQL: PostgreSQL, CockroachDB, MySQL
   - In-memory, ephemeral

[+] brianbreslin|8 years ago|reply

Can someone explain to me in layman's terms what a graph database is?

[+] mcphage|8 years ago|reply

It's a database that's designed to store relationships between objects instead of just facts. It has efficient methods of following long chains of associations. So think of how you store tree structures in a relational database—there are a lot of different ways of doing it, and they're all frustrating. Storing trees is something graph databases do naturally.
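To make "following long chains of associations" concrete, here is a minimal sketch in plain Python. It is not how Neptune or any particular graph database is implemented; the `edges` data and the `descendants` helper are hypothetical names made up for illustration. The point is only that when relationships are stored directly as edges, walking a tree is a simple loop rather than a series of self-joins.

```python
from collections import deque

# Hypothetical toy tree: each key points at its direct children.
edges = {
    "root": ["a", "b"],
    "a": ["c", "d"],
    "b": ["e"],
    "c": [],
    "d": ["f"],
    "e": [],
    "f": [],
}


def descendants(graph, start):
    """Return every node reachable from `start`, in breadth-first order."""
    found = []
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                found.append(child)
                queue.append(child)
    return found


if __name__ == "__main__":
    # Everything below "a", however deep the chain goes.
    print(descendants(edges, "a"))
```

In a relational schema the same question usually needs a recursive query or one join per level of depth, which is the frustration mentioned above.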

[+] chatmasta|8 years ago|reply

It seems a lot of Amazon services are managed instances of open source applications. For example, commenters are suggesting this may be based on Janus. Elastic load balancers, at least originally, were likely based on haproxy. Etc etc.

Has anyone ever considered the licensing implications of this? How is amazon able to convert an open source product into a proprietary one and then charge for access to it?

Of course you can argue they’re charging for the infrastructure management, not the software itself. But that argument quickly breaks down as Amazon introduces new software, under new names, with a proprietary management interface over an open source core. Try to find the source code; you can’t.

And if you accept the premise that they’re just charging for hosting, then it leads to the question of why an open source project doesn’t reap any benefits from that hosting, or at the very least, from the management interface on top of it.

It seems like a better solution would be something akin to AWS marketplace, where open source projects are available to be hosted, and the maintainers can see some revenue from them.

It seems like unfair rent seeking behavior that amazon is able to slap a management interface on open source software and then charge for it under the guise of “hosting.”

[+] eitland|8 years ago|reply

> How is amazon able to convert an open source product into a proprietary one and then charge for access to it?

Totally no problem with liberal licensed open source software.

This is also the intended behaviour of such licenses.

Also many of those big bad commercial companies contribute back big time to a number of projects. Why? I guess sometimes because devs want to and also because it makes sense business wise so they don’t have to maintain the code themselves.

[+] SEJeff|8 years ago|reply

Here is a great article on just this. It is commonly known as the "gpl loophole", which RMS is entirely fine with. If you want to prevent this, you license the software with the Affero GPL, which explicitly forbids this.

http://radar.oreilly.com/2007/07/the-gpl-and-software-as-a-s...

Amazon / Google / etc are not redistributing the software, as it is running on their servers in their environment; therefore, there is nothing wrong with the existing licenses.

[+] supergreg|8 years ago|reply

Don't free and open source licenses apply only during redistribution of the software? Unless it is licensed with the Affero GPL, just connecting to a service does not require its source code to be available. That is assuming Amazon modifies the software. If they don't, then there's nothing to argue.

Are they making money with software they didn't build? Yes, but so are we.

[+] connorelsea|8 years ago|reply

Time is money. It takes time to manage servers/infra - services like this let people make the choice between spending their time or their money managing infra. The category of managed infra is huge and goes beyond Amazon

[+] abalone|8 years ago|reply

So I get that this offers a simpler paradigm for graph data, but how should we interpret the "fast & scalable" claim? Is it...

a) Slower than RDBMS/NoSQL but still pretty respectable, so it's a good choice for things like offline analysis.

b) About the same as RDBMS/NoSQL, so you could use it to handle production traffic if you want.

c) Faster, so you should definitely prefer it in production, e.g. for fetching upvotes and comments on posts.

[+] Varcht|8 years ago|reply

Why "Neptune"? Having a hard time riddling that name out.

[+] alexbilbie|8 years ago|reply

Two other well known graph databases are "Janus" and "Titan" both of which are named after ancient gods

[+] joak|8 years ago|reply

Are they using X1 ? https://aws.amazon.com/ec2/instance-types/x1/

For efficient graph DBs it's better to have a lot of ram and cores ...

[+] lolive|8 years ago|reply

Or they choose a horizontally-scalable architecture, a la TitanDB.

Btw, does anyone know how such solutions handle cross-machine traversals? Are they schema-based, so the DB knows how to manage data locality and efficient joins/traversals?

[+] nicklasss|8 years ago|reply

Super excited about this!!!!! BUT The preview link (https://pages.aws.com/NeptunePreview.html) is broken, can anyone at AWS team help us with that?

[+] beebs_aws|8 years ago|reply

https://pages.awscloud.com/NeptunePreview.html

[+] unknown|8 years ago|reply

[deleted]

[+] appwiz|8 years ago|reply

It’s fixed now: https://pages.awscloud.com/NeptunePreview.html

[+] lolive|8 years ago|reply

I really hope Amazon will propose a facility to retrieve the RDFS data model of an endpoint in a uniform way.

[+] hmm_really|8 years ago|reply

What inferencing does it offer to RDF?

How would I bolt on an inference engine to this if none is offered, i.e. to provide OWL:RL?

[+] unknown|8 years ago|reply

[deleted]

[+] arthursilva|8 years ago|reply

It could be a modified JanusGraph frontend backed by DynamoDB.

[+] alexchamberlain|8 years ago|reply

For wider context here, is this leading the pack or do other public clouds have competing products already?

[+] Dryken|8 years ago|reply

Sadly they use Gremlin, which is so often said to have poor performance.

[+] makmanalp|8 years ago|reply

AFAIK gremlin is just a query language - it shouldn't have much to do with performance.

[+] chamakits|8 years ago|reply

I believe Gremlin is just the query language. There is an original backend that implemented it, which might be what you are thinking has performance issues. But I don't think the query language intrinsically has issues.

[+] lolive|8 years ago|reply

Currently I work on a project with Neo4J and Cypher. And I miss some of Gremlin's tricks to optimize some graph traversals (for example, to stop a sub-traversal when a given limit of matches has been reached).
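As an illustration of the kind of early termination described above, here is a minimal sketch in plain Python (not Gremlin or Cypher). The `graph` data and the `walk` and `first_matches` helpers are hypothetical. The traversal is a lazy generator, so once the requested number of matches has been collected, the rest of the graph is never visited.

```python
from itertools import islice

# Hypothetical toy graph: node -> list of neighbours.
graph = {
    "u1": ["u2", "u3"],
    "u2": ["u4"],
    "u3": ["u5"],
    "u4": [],
    "u5": [],
}


def walk(graph, start):
    """Lazily yield nodes reachable from `start`, depth-first."""
    stack = [start]
    visited = {start}
    while stack:
        node = stack.pop()
        yield node
        # Push in reverse so neighbours are visited in listed order.
        for nxt in reversed(graph.get(node, [])):
            if nxt not in visited:
                visited.add(nxt)
                stack.append(nxt)


def first_matches(graph, start, predicate, limit):
    """Collect at most `limit` matching nodes, then stop traversing."""
    matches = (node for node in walk(graph, start) if predicate(node))
    return list(islice(matches, limit))


if __name__ == "__main__":
    # Stop as soon as two nodes other than the start have been found.
    print(first_matches(graph, "u1", lambda n: n != "u1", 2))
```

Because `islice` stops pulling from the generator after `limit` items, the "u3" branch in this toy graph is never expanded.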

[+] danburbridge|8 years ago|reply

[deleted]

[+] bdcravens|8 years ago|reply

Interesting that it doesn’t support GraphQL, but rather Gremlin and SPARQL. Surely that will impact adoption.

[+] exogen|8 years ago|reply

I love GraphQL and use it quite a bit, but the "graph" part of it is a bit of a misnomer given all the existing graph database and query technologies. It doesn't really offer anything in terms of interacting with RDF triples or making complex graph queries. It has no relational algebra semantics or ability to query relationships between arbitrary nodes, which is what folks using graph databases typically want.

(I didn't downvote you though, it's a common misconception.)

[+] chamakits|8 years ago|reply

I believe that GraphQL is more of a protocol for interacting with an API, and not for graph databases, which is what Gremlin and SPARQL are used for.

[+] jahewson|8 years ago|reply

GraphQL is not a graph database query language. It's an alternative to REST.

[+] lolive|8 years ago|reply

The lack of support of OpenCypher is also a bit intriguing.

[+] obi1kenobi|8 years ago|reply

GraphQL can in fact be a graph query language -- my company has built and open-sourced a tool that makes that possible. Here's a blog post that describes how it works: https://blog.kensho.com/compiled-graphql-as-a-database-query...

If you want to try it out:

  pip install graphql-compiler

[+] lolive|8 years ago|reply

I second all the comments about GraphQL not being a graph query language. But we must agree that 90% of the queries we run on a graph database could be modeled with GraphQL (retrieve nodes of a given type plus some of their properties). For the other 10%, Gremlin or SPARQL are then the way to go.

72 comments