New Query Language for Graph Databases to Become International Standard

tannhaeuser | 6 years ago

There is an ISO-standardized graph query language: Prolog and its decidable fragment Datalog, both widely used (relatively) for decades. Will the new language be based on it?
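
To make the Datalog point concrete, here is a minimal Python sketch (not Prolog, and not modeled on any particular engine) of naive bottom-up evaluation for the classic reachability program `path(X, Y) :- edge(X, Y).` plus `path(X, Z) :- edge(X, Y), path(Y, Z).` The edge facts are made up. Because Datalog has no function symbols, only finitely many facts can ever be derived, so the loop is guaranteed to reach a fixpoint even on cyclic graphs; that finiteness is what makes the fragment decidable.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]

def transitive_closure(edges: Set[Fact]) -> Set[Fact]:
    """Naive bottom-up evaluation of:
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- edge(X, Y), path(Y, Z).
    Both rules are re-applied until no new facts appear (a fixpoint)."""
    path: Set[Fact] = set(edges)  # first rule: every edge is a path
    while True:
        # second rule: join edge(X, Y) with path(Y, Z) on the shared variable Y
        derived = {(x, z) for (x, y) in edges for (y2, z) in path if y == y2}
        new_facts = derived - path
        if not new_facts:
            return path  # nothing new is derivable, so evaluation stops
        path |= new_facts

# Hypothetical facts; the cycle a -> b -> c -> a does not prevent termination.
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}
print(sorted(transitive_closure(edges)))
```

Real Datalog engines use semi-naive evaluation (joining only against facts that are new in each round) plus indexing, rather than this full rescan.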

Another question is whether the model that drove ISO standardization in the past (software vendors working together to create a large, visible, and diverse market) is still relevant in post-standard cloud times. I sure hope it is, but we haven't seen public demand for standards (with the exception of web standards) for well over a decade now.

carapace | 6 years ago

Kind of a tangent, but there are some folks working on a "model-driven graph database" written in Prolog, TerminusDB:

https://github.com/terminusdb

https://medium.com/terminusdb

See also: Categorical Query Language (CQL) https://www.categoricaldata.net/

Not Prolog, but it's a mathematical treatment of DBs using category theory.

A paper by these folks was mentioned in a sibling comment (https://news.ycombinator.com/item?id=21005452): "Algebraic Property Graphs" by Joshua Shinavier and Ryan Wisnesky (submitted on 11 Sep 2019). Just last week!

> In this paper, we use algebraic data types to define a formal basis for the property graph data models supported by popular open source and commercial graph databases. Developed as a kind of inter-lingua for enterprise data integration, algebraic property graphs encode the binary edges and key-value pairs typical of property graphs, and also provide a well-defined notion of schema and support straightforward mappings to and from non-graph datasets, including relational, streaming, and microservice data commonly encountered in enterprise environments. We propose algebraic property graphs as a simple but mathematically rigorous bridge between graph and non-graph data models, broadening the scope of graph computing by removing obstacles to the construction of virtual graphs.

jrumbut | 6 years ago

I think the cloud era renews the need for standards. Ten years ago, devs in the Linux/BSD world had the vast majority of the code that would make up our stack, from the OS kernel to the front end (with perhaps some proprietary driver or external API call here or there).

The code was the standard, which traded easy readability for complete accuracy and transparency.

dreamcompiler | 6 years ago

Very true. Any graph query language that's not clearly derived from Prolog is unworthy of the title. Prolog is a lousy way to store data, but it's by far the most elegant way to query a database, and the only sensible way to query a graph.

b3tt3rw0rs3 | 6 years ago

Standards happen when the time is right. Property graphs have been developing and maturing for the last 10+ years, driven by Neo4j, other vendors, and the community, and they are here to stay. It's a sign of success that the SQL standards committee recognized this development by starting the GQL project.

alfiedotwtf | 6 years ago

Ha. When I was reading Prolog books this year, I couldn’t help but think how cool it would be to hook Prolog up to a Neo4j backend. Judging by your comment, I guess I wasn’t the only one thinking this.

sickcodebruh | 6 years ago

I spent many years working on a Neo4j Ruby gem and really grew to love Cypher, their query language. I always found it phenomenally expressive, one of those things that seems simple (almost trivial) at first but was extremely flexible and powerful when you needed more from it. It's highly readable, easy to teach, and intuitive in a way that I never found SQL to be.

It's been years since I've worked with the product and while I don't miss Neo4j, I do miss the query language. It's a little unclear to me how GQL will incorporate Cypher but I hope the initiative is successful if for no other reason than a selfish one: I'd love Cypher to be around if I ever wind up using a GraphDB again.

vosper | 6 years ago

> It's been years since I've worked with the product and while I don't miss Neo4j, I do miss the query language

Would you mind expanding on what you don't miss about Neo4j?

simplify | 6 years ago

I've built an app using Neo4j and have the same impression. Cypher is such a good query language. I wish I could use it over SQL in pretty much every situation.

the_duke | 6 years ago

So, is there a draft spec yet? I can't find anything.

Also, the name is of course justified, but it will be a mess to search for due to (Facebook) GraphQL.

sbarzowski | 6 years ago

Seriously, people should not choose names which are already taken in the same general area of technology. Even if it "makes sense". The whole point of naming things is to refer to things more or less unambiguously.

dragonwriter | 6 years ago

> Also, the name is of course justified, but it will be a mess to search for due to (Facebook) GraphQL.

Google's GQL [0], which is older than both, doesn't help searchability either.

[0] https://cloud.google.com/datastore/docs/reference/gql_refere...

b3tt3rw0rs3 | 6 years ago

The GQL project just started. Since it is going to be an ISO standard, the specification is only available to members while under development, but it may be purchased from ISO once final (same as for SQL, though you can find copies on the internet).

GQL will be a declarative language in the spirit of existing property graph query languages like Cypher, so that gives you an idea. I'm sure as the project proceeds, various artefacts (software or otherwise) will become freely available.

If you want to dig deeper, gqlstandards.org links to some documents that are copyright Neo4j and have been submitted to the standards process.

https://drive.google.com/drive/folders/16CUhVI1PQ4hBlhD80_Ys...

np_tedious | 6 years ago

Also google has a query language for some of its Cloud data products with that name https://cloud.google.com/datastore/docs/reference/gql_refere...

baddox | 6 years ago

Or even a single code sample? It sounds like they've decided simply that "the international standard will be called 'GQL'" but they have no idea what the language will be (or they're keeping it a secret, which would be even more troubling).

olefoo | 6 years ago

https://s3.amazonaws.com/artifacts.opencypher.org/M14/railro...

See http://www.opencypher.org/resources for more.

WhitneyLand | 6 years ago

Can we have a search uniqueness tag, like 'GQL19'?

We have search terms, metadata keywords, and URLs, but no short, human-friendly tag optimized for search uniqueness?

Seems like there should be some standard, or at least a convention, given it could be opt-in. It would help any site or content that referenced the tag, and anything that didn't would still work as well as usual.

irrational | 6 years ago

Is GQL already used elsewhere? As long as nobody calls it GraphQL we should be fine, right? Nobody says "structured QL" or even "structured query language"; it's just SQL.

blumomo | 6 years ago

That's quite an unfortunate name clash with the existing GraphQL language in a similar domain.

gizmo385 | 6 years ago

I think GQL from Neo4J has been around for quite a while before GraphQL.

rambojazz | 6 years ago

What's wrong with SPARQL? What advantages does this have over SPARQL?

jerven | 6 years ago

Very few real advantages on a technical level.

Practically, the PG model has many syntactic advantages over RDF+reification, which is equivalent in expressive power (see the excitement over RDF*, which can be pure syntactic sugar). Syntax is important for usability.
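
As a rough illustration of that syntactic gap, the Python sketch below spells out how one property-graph edge carrying a `since` property looks under standard RDF reification: a separate statement node with `rdf:type rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` triples, and only then the property itself. The `ex:` names and the statement identifier are hypothetical, and prefixed names are kept as plain strings rather than full IRIs for readability.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def reify_edge(src: str, label: str, dst: str,
               props: Dict[str, str], stmt_id: str) -> List[Triple]:
    """Encode the property-graph edge (src)-[label {props}]->(dst) as RDF
    triples using standard reification (rdf:Statement + subject/predicate/object)."""
    triples: List[Triple] = [
        (src, label, dst),                       # the edge itself, asserted directly
        (stmt_id, "rdf:type", "rdf:Statement"),  # a node that stands for that triple
        (stmt_id, "rdf:subject", src),
        (stmt_id, "rdf:predicate", label),
        (stmt_id, "rdf:object", dst),
    ]
    # edge properties hang off the statement node, not off the edge
    triples += [(stmt_id, key, value) for key, value in props.items()]
    return triples

# In a property graph this is one edge: (alice)-[:knows {since: 2010}]->(bob)
for t in reify_edge("ex:alice", "ex:knows", "ex:bob", {"ex:since": "2010"}, "ex:stmt1"):
    print(t)
```

A property graph records the same information as one edge with one key-value pair, which is the usability difference being described.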

I don't believe that PG in Cypher or GQL will be significantly more expressive than SPARQL 1.1. And in any case, they are quite different from the TinkerPop model.

I believe it is essential for Neo4j as a growing company that they move beyond their own Cypher to something that is better defined and, critically, allows them to check a "we are a standard" box on big deals. OpenCypher has solid adoption but lacks coherence between implementations, i.e., same data, same query, different results.

Still, a more grounded GQL will allow Neo4j's competitors to gain on them.

moxious | 6 years ago

SPARQL is purpose-built for the RDF world, where you're mixing and matching a zillion different vocabularies, all of which have to be painstakingly declared and namespaced every time.

For most of us who aren't working on the "semantic web", we typically only have 1-2 vocabularies, which is our data model, and SPARQL is super clunky to use.

baq | 6 years ago

I can read Cypher with almost zero experience with it; I have no idea what a slightly-more-than-trivial SPARQL query does.

b3tt3rw0rs3 | 6 years ago

SPARQL requires buy-in into the world of the semantic web even when all you want to do is store and query graph data.

Also, property graphs wouldn't have gained the traction they have if SPARQL had been sufficient. SPARQL simply suffers from being designed in a way that does not sufficiently address the needs of application developers in expressivity and ease of use, let alone allowing easy migration of existing relational data by sharing the same type system with SQL.

amirouche | 6 years ago

SPARQL doesn't allow unbound recursive queries.

smitty1e | 6 years ago

SPARQL suffers from Not Invented Here Syndrome.

waffle_ss | 6 years ago

SPARQL is for RDF data, which not every graph database conforms to.

pimmen | 6 years ago

SPARQL is not as easy to read.

I can show SQL or Cypher to some of my product managers who have experience with Excel, and they actually sort of get it. That’s not the case with SPARQL.

okram | 6 years ago

There is something really special about the graph database space. For as long as the space has been around (15 or so years), every vendor and dedicated practitioner has taken solid jabs at trying to realize "the best way" to think about graph traversals.

This behavior seems particular to the graph space (vs. document, wide-column, relational, key/value, etc.). While this speaks to the complexity of the types of problems you can solve with graphs, thinking back, I believe it was a cultural anomaly: when the space was Neo4j, OrientDB, and TinkerPop, the language trifurcation occurred.

I'm excited that Neo4j is continuing to take the query language seriously. In an age when software development is about making it easy for the 90% of developers out there with REST APIs, GraphQL, and overly SQL'd embeddings, ... graph is still searching for "that best way."

I, personally, have moved on from the language level. However, our new work is going to help my fellow data system colleagues get their languages exposed to as many developers as possible, regardless of data model. It is important to me that people come to respect the numerous ways in which we think about data, and how important the language we use is. It's the difference between living in Plato's Cave or not.

In an effort to support query languages in general, I'll be working on mm-ADT, designing a new cluster-oriented virtual machine architecture for storage, processing, and query language developers. I see a veritable Tower of Babel on the horizon!

Congrats, Neo4j, on reaping the benefits of your hard work. I hope our work will converge into a positive collaboration in 2020.

sandGorgon | 6 years ago

Why did TinkerPop's Gremlin not work out? Does anyone have a summary of the discussion from a language design perspective?

A lot of the Google-able references talk about how Gremlin is more optimizable than Cypher, etc.

maxdemarzi | 6 years ago

Gremlin was written by a genius-level developer to be used by other genius-level developers. There are maybe a handful of Gremlin experts in the entire world and fewer than 100 who are any good at it.

It is extremely powerful, but after a few lines, the mental acrobatics needed to understand what the query does are beyond your average developer.

My first paid Neo4j gig 7 years ago was writing a rules engine in Gremlin. It was about 25 lines of code. If you asked me today what each of those lines did, I would be at a loss. So would anyone who didn't live in those specific queries day in and day out.

Graph adoption was severely limited by its use. Cypher can be learned in a day, and "business people" can look at a cypher query and understand what is going on for the most part.

It takes about a week to "bolt on" Gremlin to any database. I've done it myself, that's why you see it so often. It takes months to be any good at it.

amirouche | 6 years ago

The Gremlin traversal language is one piece of a complete database query as run by TinkerPop's Rexster database. You can see it as a lazy sequence or stream API (think SRFI-41 or R7RS Scheme generators) with syntactic sugar optimized for property graphs.

To take complete advantage of TinkerPop Rexster, you really need to embed the Gremlin DSL inside a Turing-complete language (like Groovy) and execute that.

I think Gremlin failed because of (a) the SQL-like look of Cypher queries, (b) a long-running and massive marketing campaign by the company behind Cypher, and (c) since the TinkerPop developers were hired by the company behind Cassandra, TinkerPop (and JanusGraph) have lost momentum.

All these narrow data expert systems that persist data on disk (!) are doomed to fail! The future is ordered key-value stores and multi-model databases with ACID transactions.
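
The "lazy stream with sugar" description can be sketched in a few lines of Python. This is not the TinkerPop or gremlinpython API; the `Traversal` class, the `V()` helper, and the in-memory graph are all hypothetical and only loosely mirror Gremlin steps such as `V()`, `out()`, and `values()`. Each step wraps the previous one in a generator, so no vertex is visited until the result is consumed.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory property graph: vertex properties and labelled out-edges.
VERTICES: Dict[str, Dict[str, str]] = {
    "v1": {"name": "alice"}, "v2": {"name": "bob"}, "v3": {"name": "carol"},
}
OUT_EDGES: Dict[str, List[Tuple[str, str]]] = {
    "v1": [("knows", "v2"), ("knows", "v3")],
    "v2": [("knows", "v3")],
    "v3": [],
}

class Traversal:
    """Toy Gremlin-style traversal: every step wraps the previous step's
    iterator in a new generator, so evaluation is lazy."""

    def __init__(self, source: Iterator[str]) -> None:
        self._source = source

    def out(self, label: str) -> "Traversal":
        def step() -> Iterator[str]:
            for v in self._source:
                for edge_label, target in OUT_EDGES.get(v, []):
                    if edge_label == label:
                        yield target
        return Traversal(step())

    def values(self, key: str) -> Iterator[str]:
        return (VERTICES[v][key] for v in self._source if key in VERTICES[v])

def V(*ids: str) -> Traversal:
    return Traversal(iter(ids))

# Friends of friends, roughly g.V('v1').out('knows').out('knows').values('name')
print(list(V("v1").out("knows").out("knows").values("name")))
```

Chaining more steps just composes more generators; nothing runs until `list()` pulls results through the pipeline.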
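
The ordered key-value argument can be illustrated with a small Python sketch: encode each edge as a composite key so that a prefix range scan returns a vertex's neighbours, and store a second, reversed key for incoming edges. A sorted list with the standard-library `bisect` module stands in for a real ordered store; the `("out", src, label, dst)` / `("in", dst, label, src)` key layout is a hypothetical choice rather than any product's format, and transactions are left out.

```python
import bisect
from typing import List, Tuple

Key = Tuple[str, str, str, str]

class OrderedKV:
    """Stand-in for an ordered key-value store: keys stay sorted, so all keys
    sharing a prefix are contiguous and can be read with one range scan."""

    def __init__(self) -> None:
        self._keys: List[Key] = []

    def put(self, key: Key) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan_prefix(self, prefix: Tuple[str, ...]) -> List[Key]:
        # A shorter tuple sorts before any longer tuple it prefixes, so
        # bisect_left lands on the first key starting with `prefix`.
        i = bisect.bisect_left(self._keys, prefix)
        out: List[Key] = []
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            out.append(self._keys[i])
            i += 1
        return out

def add_edge(kv: OrderedKV, src: str, label: str, dst: str) -> None:
    kv.put(("out", src, label, dst))  # forward index: what does src point to?
    kv.put(("in", dst, label, src))   # reverse index: what points to dst?

kv = OrderedKV()
for s, l, d in [("alice", "knows", "bob"), ("alice", "knows", "carol"),
                ("bob", "knows", "carol")]:
    add_edge(kv, s, l, d)

print([k[3] for k in kv.scan_prefix(("out", "alice", "knows"))])  # outgoing
print([k[3] for k in kv.scan_prefix(("in", "carol", "knows"))])   # incoming
```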

The_rationalist | 6 years ago

Gremlin is not a failure: it is supported by far more databases (e.g., the latest one from Microsoft https://docs.microsoft.com/en-us/azure/cosmos-db/graph-intro... ) and has far more users than openCypher. It is far faster on average.

It's imperative, while Cypher is declarative. Mostly: if you want the most performant and expressive language, choose Gremlin. If you want the easiest one and what you implement is standard and not very complex, use Cypher.
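
To make the imperative/declarative distinction concrete, here is a hypothetical Python sketch (neither Gremlin nor Cypher, and not how either engine works internally) that answers the same friend-of-a-friend question two ways over a made-up edge set. The imperative version spells out each hop and its order; the declarative version only states a pattern with `?`-prefixed variables, roughly as a Cypher `MATCH` clause does. This toy matcher simply tries patterns left to right; a real declarative engine's planner would be free to choose the join order.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (source, label, target)
EDGES: Set[Edge] = {
    ("alice", "knows", "bob"), ("bob", "knows", "carol"),
    ("alice", "knows", "dave"), ("dave", "knows", "erin"),
}

def fof_imperative(start: str) -> Set[str]:
    """Imperative: the caller writes out each hop and the order of evaluation."""
    result: Set[str] = set()
    for (src, label, friend) in EDGES:
        if src == start and label == "knows":
            for (src2, label2, fof) in EDGES:
                if src2 == friend and label2 == "knows":
                    result.add(fof)
    return result

def match(patterns: List[Edge],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Declarative: yield every variable binding that satisfies all patterns.
    Terms starting with '?' are variables; other terms must match exactly."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for edge in EDGES:
        candidate = dict(binding)
        if all(
            (candidate.setdefault(term, value) == value) if term.startswith("?")
            else term == value
            for term, value in zip(first, edge)
        ):
            yield from match(rest, candidate)

# Same question, two styles: who are alice's friends-of-friends?
print(sorted(fof_imperative("alice")))
print(sorted(b["?c"] for b in match([("alice", "knows", "?b"), ("?b", "knows", "?c")])))
```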

eranation | 6 years ago

I have 101-level experience with both. Cypher is amazingly intuitive and simple; Gremlin, not as much. Just from a dumb user's perspective (like me), Cypher felt more like Python, Gremlin more like C++. Both are great, just with different learning curves and barriers to entry.

samcodes | 6 years ago

I have used Gremlex (Gremlin in Elixir) for querying a DB that supports Gremlin (Neptune) and found it really pleasant.

https://github.com/Revmaker/gremlex

mehrdadn | 6 years ago

I'm rather uninitiated on this... what's the difference between a graph database and a traditional relational database that makes them need different query languages?

lpghatguy | 6 years ago

Should GQL be pronounced with a hard or soft G?

Is "geequel" going to take off akin to "sequel" for SQL?

vkhn | 6 years ago

Oh good. I wonder if they realize developers frequently refer to GraphQL as gql.

RocketSyntax | 6 years ago

So will Neo4j switch to GQL? And will Spark support switch to GQL?

Vaslo | 6 years ago

So if I want to learn this GQL, where do I even start? I'm also confused about the naming, is there more than one language that could be called GQL?

paul009 | 6 years ago

Isn't XQuery basically a graph query language similar to GraphQL? If so, why are we not using XQuery to query objects and sub-objects?

maitredusoi | 6 years ago

Does anyone know how it differs from ArangoDB's AQL? (I haven't used Neo4j, as I use ArangoDB.)

peterwwillis | 6 years ago

Huh.

Now if only we could do this for configuration management, service mapping/scheduling/coordination, resource allocation, monitoring, alerting, logging, access control, artifact packaging, and execution pipelines.

The_rationalist | 6 years ago

What advantages/limitations does it have compared to SHACL? https://en.m.wikipedia.org/wiki/SHACL

westurner | 6 years ago

Graph query languages are nice and all, but what about Linked Data here? Queries of schemaless graphs miss lots of data because, without a schema, this graph calls it "color", that graph calls it "colour", and that graph calls it "色" or "カラー". (Of course, this is also an issue even when there is a defined schema; but it's hardly possible to just happen to have comprehensible inter- or even intra-organizational cohesion without, e.g., RDFS and/or OWL and/or SHACL for describing (and changing) the shape of the data.)
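
A toy Python sketch of the "color"/"colour"/"色" problem: the same query misses records until the local property names are aligned to one shared term. The records, `PROPERTY_MAP`, and `VALUE_MAP` are hypothetical; in RDF the alignment would be declared in the data itself (for example with `owl:equivalentProperty`) rather than hard-coded in application code.

```python
from typing import Dict, List

# Hypothetical records from three sources that never agreed on a schema.
RECORDS: List[Dict[str, str]] = [
    {"id": "a", "color": "red"},
    {"id": "b", "colour": "red"},
    {"id": "c", "色": "赤"},
]

# Hypothetical alignment tables standing in for a shared vocabulary.
PROPERTY_MAP: Dict[str, str] = {"colour": "color", "色": "color", "カラー": "color"}
VALUE_MAP: Dict[str, str] = {"赤": "red"}

def normalize(record: Dict[str, str]) -> Dict[str, str]:
    """Rename known local keys (and translate known values) to the shared terms;
    unknown keys and values pass through unchanged."""
    return {PROPERTY_MAP.get(k, k): VALUE_MAP.get(v, v) for k, v in record.items()}

def ids_with_color(records: List[Dict[str, str]], color: str) -> List[str]:
    return [r["id"] for r in records if r.get("color") == color]

print(ids_with_color(RECORDS, "red"))                          # schemaless: misses b and c
print(ids_with_color([normalize(r) for r in RECORDS], "red"))  # aligned: finds all three
```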

So, the task is then to compile schema-aware SPARQL to GQL or GraphQL or SQL or interminable recursive SQL queries or whatever it is.

For GraphQL, there's GraphQL-LD (which somewhat unfortunately contains a hashtag-indeterminate dash). I cite this in full here because it's very relevant to the GQL task at hand:

"GraphQL-LD: Linked Data Querying with GraphQL" (2018) https://comunica.github.io/Article-ISWC2018-Demo-GraphQlLD/

> GraphQL is a query language that has proven to be a popular among developers. In 2015, the GraphQL framework [3] was introduced by Facebook as an alternative way of querying data through interfaces. Since then, GraphQL has been gaining increasing attention among developers, partly due to its simplicity in usage, and its large collection of supporting tools. One major disadvantage of GraphQL compared to SPARQL is the fact that it has no notion of semantics, i.e., it requires an interface-specific schema. This therefore makes it difficult to combine GraphQL data that originates from different sources. This is then further complicated by the fact that GraphQL has no notion of global identifiers, which is possible in RDF through the use of URIs. Furthermore, GraphQL is however not as expressive as SPARQL, as GraphQL queries represent trees [4], and not full graphs as in SPARQL.

> In this work, we introduce GraphQL-LD, an approach for extending GraphQL queries with a JSON-LD context [5], so that they can be used to evaluate queries over RDF data. This results in a query language that is less expressive than SPARQL, but can still achieve many of the typical data retrieval tasks in applications. Our approach consists of an algorithm that translates GraphQL-LD queries to SPARQL algebra [6]. This allows such queries to be used as an alternative input to SPARQL engines, and thereby opens up the world of RDF data to the large amount of people that already know GraphQL. Furthermore, results can be translated into the GraphQL-prescribed shapes. The only additional requirement is their queries would now also need a JSON-LD context, which could be provided by external domain experts.

> In related work, HyperGraphQL [7] was introduced as a way to expose access to RDF sources through GraphQL queries and emit results as JSON-LD. The difference with our approach is that HyperGraphQL requires a service to be set up that acts as a intermediary between the GraphQL client and the RDF sources. Instead, our approach enables agents to directly query RDF sources by translating GraphQL queries client-side.
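
To show why a context makes GraphQL-style queries translatable to SPARQL, and why such queries are tree-shaped, here is a heavily simplified Python sketch. It is not the GraphQL-LD algorithm (which targets SPARQL algebra and handles far more); it is a hypothetical walk over a nested field selection in which each field becomes one triple pattern whose predicate IRI comes from a JSON-LD-like context, and each nested selection hangs off its parent's variable. Since a nested field can only extend downward from its parent, the generated patterns always form a tree. The FOAF IRIs are real vocabulary terms; the selection and variable naming are made up.

```python
from typing import Dict, List, Optional

def to_sparql(selection: Dict[str, Optional[dict]], context: Dict[str, str],
              root: str = "?s0") -> str:
    """Translate a nested field selection into a SPARQL SELECT query.
    Leaf fields (value None) are projected; nested dicts become sub-trees."""
    patterns: List[str] = []
    projected: List[str] = []
    counter = 0

    def walk(sel: Dict[str, Optional[dict]], subject: str) -> None:
        nonlocal counter
        for field, sub in sel.items():
            counter += 1
            var = f"?{field}_{counter}"
            patterns.append(f"{subject} <{context[field]}> {var} .")
            if sub is None:
                projected.append(var)  # leaf: return this value
            else:
                walk(sub, var)         # nested: its object is the next subject

    walk(selection, root)
    body = "\n  ".join(patterns)
    return f"SELECT {' '.join(projected)} WHERE {{\n  {body}\n}}"

CONTEXT = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": "http://xmlns.com/foaf/0.1/knows",
}
# Roughly the GraphQL query:  { name knows { name } }
print(to_sparql({"name": None, "knows": {"name": None}}, CONTEXT))
```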

All of these RDFS vocabularies and OWL ontologies provide structure that minimizes the costs of merging and/or querying multiple datasets: https://lov.linkeddata.es/dataset/lov/

All of these schema.org/Dataset s in the "Linked Open Data Cloud" are easier to query than a schemaless graph: https://lod-cloud.net/ . Though one can query schemaless graphs with SPARQL, as well.

For reference, RDFLib has a bunch of RDF graph implementations over various key/value and SQL store backends. rdflib-sqlalchemy does query parametrization correctly in order to minimize the risk of query injection. FOR THE RECORD, SQL injection is the CWE Top 25 #1 most prevalent security weakness, which is something that any new spec and implementation should really consider before launching anything other than, e.g., an overly verbose JSON-based query language that people end up bolting a micro-DSL onto. https://github.com/RDFLib/rdflib-sqlalchemy
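
For readers unfamiliar with query parametrization, here is the general technique in a minimal Python sketch using the standard-library `sqlite3` module. It is not rdflib-sqlalchemy's code or schema; the single `triples` table is a hypothetical layout. The point is that `?` placeholders send values separately from the SQL text, so input crafted to look like SQL is only ever compared as a string.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
# Hypothetical minimal triple table; real RDF stores use richer schemas.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [("ex:alice", "ex:name", "Alice"), ("ex:bob", "ex:name", "Bob")],
)

def objects_of(conn: sqlite3.Connection, subject: str, predicate: str) -> List[str]:
    # Parameterized: subject and predicate are bound as values, never spliced
    # into the SQL text. Building the query with f"... WHERE s = '{subject}'"
    # instead would let the hostile input below rewrite the WHERE clause.
    rows = conn.execute(
        "SELECT o FROM triples WHERE s = ? AND p = ? ORDER BY o", (subject, predicate)
    )
    return [o for (o,) in rows]

print(objects_of(conn, "ex:alice", "ex:name"))
print(objects_of(conn, "ex:alice' OR '1'='1", "ex:name"))  # matches nothing
```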

Most practically, I frequently want to read a graph of objects into RAM; update, extend, and interlink it; and then transactionally save the delta back to the store. This requires a few things: (1) an efficient binary serialization protocol like Apache Arrow (SIMD), Parquet, or any of the binary JSON formats such as BSON; (2) a transactional local store that can be manually synchronized with the remote store until it's consistent.
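
A minimal sketch of the read, modify, save-the-delta workflow, assuming a hypothetical `triples` table in the standard-library `sqlite3` module as the store. It only illustrates the diff-and-commit idea: there is no binary serialization and no remote synchronization. Snapshots are Python sets of triples, so the delta is two set differences, and using the connection as a context manager commits both writes together or rolls both back on error.

```python
import sqlite3
from typing import Set, Tuple

Triple = Tuple[str, str, str]

def load(conn: sqlite3.Connection) -> Set[Triple]:
    """Read the whole (small) graph into memory as a set of triples."""
    return set(conn.execute("SELECT s, p, o FROM triples"))

def save_delta(conn: sqlite3.Connection, before: Set[Triple],
               after: Set[Triple]) -> Tuple[int, int]:
    """Write only what changed between two snapshots, in one transaction."""
    removed, added = before - after, after - before
    with conn:  # commits on success, rolls back if either statement fails
        conn.executemany("DELETE FROM triples WHERE s = ? AND p = ? AND o = ?", removed)
        conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", added)
    return len(removed), len(added)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
with conn:
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)",
                     [("ex:alice", "ex:knows", "ex:bob"), ("ex:bob", "ex:name", "Bob")])

before = load(conn)
after = set(before)                            # edit an in-memory copy
after.discard(("ex:bob", "ex:name", "Bob"))    # update a value...
after.add(("ex:bob", "ex:name", "Robert"))
after.add(("ex:bob", "ex:knows", "ex:alice"))  # ...and interlink
print(save_delta(conn, before, after))         # (removed, added) counts
```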

SPARQL Update was somewhat of an out-of-scope afterthought. Here's SPARQL 1.1 Update: https://www.w3.org/TR/sparql11-update/
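
Since a delta like that maps naturally onto SPARQL 1.1 Update's `DELETE DATA` and `INSERT DATA` operations, here is a small Python sketch that renders one as an update request. The rule that strings starting with `http://` or `https://` are IRIs and everything else is a plain literal is a hypothetical simplification (no datatypes, language tags, or blank nodes), as is the `example.org` name.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

def term(value: str) -> str:
    """Hypothetical rule: http(s) strings are IRIs, anything else a plain literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def data_block(triples: Set[Triple]) -> str:
    return " ".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in sorted(triples))

def sparql_update(before: Set[Triple], after: Set[Triple]) -> str:
    """Render the difference between two snapshots as a SPARQL 1.1 Update
    request; multiple operations in one request are separated by ';'."""
    removed, added = before - after, after - before
    ops = []
    if removed:
        ops.append(f"DELETE DATA {{ {data_block(removed)} }}")
    if added:
        ops.append(f"INSERT DATA {{ {data_block(added)} }}")
    return " ;\n".join(ops)

NAME = "http://xmlns.com/foaf/0.1/name"
BOB = "http://example.org/bob"
print(sparql_update({(BOB, NAME, "Bob")}, {(BOB, NAME, "Robert")}))
```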

Here's SOLID, which could be implemented with SPARQL on GQL, too, though all the re-serialization really shouldn't be necessary for EAV triples with a named-graph URI identifier: https://solidproject.org/

5 star data: PDF -> XLS -> CSV -> RDF (GQL, AFAIU (but with no URIs(!?))) -> LOD https://5stardata.info/en/
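
The CSV -> RDF step on that ladder mostly means minting an identifier for each row and turning each cell into a triple. Here is a minimal Python sketch using the standard-library `csv` module; the `http://example.org/` namespace, the `item/` and `prop/` path scheme, and treating every cell as a plain string literal are all hypothetical choices, and it assumes key values and column names are already safe to put in an IRI. Reaching the fifth star would mean linking to IRIs that other datasets already use instead of minting private ones.

```python
import csv
import io
from typing import List

BASE = "http://example.org/"  # hypothetical namespace for the minted IRIs

def csv_to_ntriples(text: str, key_column: str) -> List[str]:
    """Emit one N-Triples line per non-empty, non-key cell:
    <row IRI> <column IRI> "cell value" ."""
    lines: List[str] = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}item/{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue  # the key is already encoded in the subject IRI
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{BASE}prop/{column}> "{escaped}" .')
    return lines

data = "id,name,colour\n1,Widget,red\n2,Gadget,\n"
print("\n".join(csv_to_ntriples(data, "id")))
```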

The_rationalist | 6 years ago

Neo4j would become the standard while being far less used than Gremlin? This is nonsense, isn't it?

pcr910303 | 6 years ago

Hmm... There's this big, great, 'perfect', heavyweight graph query language (GQL) in the process of being standardized, while an alternative language (GraphQL) is more readable (with much more lightweight syntax, IMO), has gained much more traction, etc.

While GQL's and GraphQL's targets are different (one is for interacting with graph DBs, the other for interacting with backends), there is a lot of overlap, and I can't shake the feeling that it resembles the overlap between XML and JSON (where, while XML was more 'perfect', JSON won the war).

Edit: OK, GraphQL is insufficient for graph DB querying. Thanks, everyone, for the clarification.

166 comments