item 12499727

The GitHub GraphQL API

284 points | samber | 9 years ago | githubengineering.com

66 comments

[+] niftich|9 years ago|reply
Neatly coinciding with GraphQL's announcement 'Leaving technical preview' [1].

In hindsight, sending a query, written in a query language from client to server seems obvious. So obvious, that I think I've seen it before...

  select login, bio, location, isBountyHunter
  from viewer
  where user = ?
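For comparison, the same shape written in GraphQL's own syntax against the new API looks like this (field names per GitHub's announced schema; the `viewer` field stands in for the `where user = ?` clause, since it's the authenticated user):

```graphql
{
  viewer {
    login
    bio
    location
    isBountyHunter
  }
}
```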
It's ironic to me that it took Facebook reinventing SQL (or a graph database equivalent thereof [2][3][4]) and Github embracing it to legitimize this practice, since if you were doing this before, you were judged in the eyes of your peers and clients for not being "RESTful" (the fake-REST kind [5]), as if everyone was just itching to PUT and DELETE blobs of JSON of your poorly mapped resources to quasi-hardcoded, templated [6][7] URLs.

What's old is new again, but this time I'll take it.

[1] http://graphql.org/blog/production-ready/ [2] https://neo4j.com/developer/cypher-query-language/#_about_cy... [3] http://tinkerpop.apache.org/ [4] https://www.w3.org/TR/sparql11-query/ [5] https://news.ycombinator.com/item?id=12479370#12480408 [6] http://swagger.io/specification/#pathTemplating [7] http://raml.org/developers/raml-200-tutorial#parameters

[+] gjtorikian|9 years ago|reply
> Neatly coinciding with GraphQL's announcement 'Leaving technical preview'

We coordinated with Facebook to do that. ;)

[+] koko775|9 years ago|reply
No, what's old is old again. FQL was a SQL-like syntax and it was totally busted. GraphQL isn't just a structured query language; it's a query language over a unified hierarchical data structure, and clients can use this hierarchy and unification to issue queries local to their data and batch them to minimize query size. This in turn lets you have both an effective cache and a minimal distance from query output to UI.
[+] electrum|9 years ago|reply
Before GraphQL, the public Facebook APIs had FQL, which provided a SQL-like language that allowed doing server-side projections and joins:

    SELECT uid, name, pic_square
    FROM user
    WHERE uid = me() OR
      uid IN (SELECT uid2 FROM friend WHERE uid1 = me())
[+] esfandia|9 years ago|reply
Also reminds me of CMIP and GDMO in old ITU network management standards in the mid 90s. They were RESTful, object-oriented, and you could make some pretty expressive queries with them. The standards failed probably because they were way ahead of their times: too many new concepts, too complicated compared to SNMP, and the documents were a very boring read.
[+] skybrian|9 years ago|reply
The main thing is making it secure and scalable. The old client-server infrastructure for database apps was not designed to be deployed over the Internet.
[+] rottyguy|9 years ago|reply
except they removed the physical knowledge and just went with logical associations
[+] brblck|9 years ago|reply
Special shout out to the open-source contributors and members of the community who helped us build this:

https://github.com/rmosolgo/graphql-ruby

https://github.com/shopify/graphql-batch

https://github.com/github/graphql-client

https://github.com/graphql

We <3 your work and are thrilled to have built this with you!

Please make sure to give us feedback during this alpha stage! https://platform.github.community/

[+] mcx|9 years ago|reply
Are you guys using rails w/ graphql-ruby? Would love to see a blog post with more details about the backend implementation!
[+] salex89|9 years ago|reply
I'm impressed, but for other reasons. For one, I have no idea how to properly implement this. I mean, it really looks like a lot of trouble mapping this from GraphQL to... SQL? And what if the system uses some kind of NoSQL database that doesn't really have a very verbose query language, if any? Complexity just seems to explode. Somehow I feel there is also a risk of the client making a quite sub-optimal query, so some kind of policy should probably be implemented. All in all, there is a level of manageability that looks lost to me if GraphQL is implemented improperly, and to be honest, it looks easy to get wrong. I'm really looking forward to some book or guide, since the implementation is puzzling to me.
[+] postila|9 years ago|reply
Has anybody considered this problem at all? (Giving too much flexibility to the client and allowing non-optimal queries, like joining several big tables or data collections without proper index support.) It's so weird that all the materials I've seen about GraphQL hush up this question, which is essential for the future of this technology.

And it's so similar to the ORM issues the whole industry has experienced over the past 20 years. But perhaps more dangerous, due to the public nature of many APIs.

[+] lacker|9 years ago|reply
When you implement a GraphQL server, you don't map GraphQL to SQL. Instead, for each object type in your API, you define how to resolve each field, and you can use one of the various GraphQL server libraries to go from those object types to serving a whole API.

I would get more specific but it depends on which programming language you want to use. Check out the code examples & links to libraries in different languages on http://graphql.org/code/
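As a minimal sketch of that idea in plain JavaScript (this is NOT the graphql-js API; every name here is made up for illustration): each type gets a map of per-field resolver functions, and a generic executor walks the requested fields, so no SQL mapping is involved at all.

```javascript
// One resolver per field of each type; the server author supplies these.
const resolvers = {
  Viewer: {
    login: (obj) => obj.login,
    bio: (obj) => obj.bio,
    location: (obj) => obj.location,
  },
};

// Walk a requested-field shape like { login: true } and call the
// matching resolver for each field; unrequested fields never run.
function execute(typeName, obj, queryShape) {
  const result = {};
  for (const field of Object.keys(queryShape)) {
    result[field] = resolvers[typeName][field](obj);
  }
  return result;
}

const viewer = { login: "octocat", bio: "Mascot", location: "SF" };
console.log(execute("Viewer", viewer, { login: true, location: true }));
// → { login: 'octocat', location: 'SF' }
```

A real library adds nesting, arguments, and type checking on top, but the contract is the same: you supply resolvers, the framework does the rest.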

[+] aturek|9 years ago|reply
I'd love to hear how Github is doing ACL here. We came up with a pretty neat solution on my team, which we have not yet open-sourced, for JS. But it was a lot of first-principles design work; there don't seem to be any good examples.

This was pretty much all the documentation we had, and it's more a design analysis of edge-vs-node authorization: https://medium.com/apollo-stack/auth-in-graphql-part-2-c6441...

Edit: Our eventual solution looked a lot like

    class SomeTypeOfResolver {
      @allowIfAny(rule1, rule2, rule3)
      someProperty;

      @allowIfAll(rule4, rule5)
      otherProperty = defineRetrieverFunction();
    }
[+] Kwastie|9 years ago|reply
Since the announcement of GraphQL I've been waiting for some 'real world' APIs. (Sure, the Star Wars GraphQL APIs are fun.)

Does anyone know any best practices for adopting this in an existing application using a relational database (e.g. PostgreSQL)? I don't know how to implement this without causing N+1 queries (or worse).

For example:

    {
      Post {
        title,
        content,
        Author {
          name,
          avatar,
        },
        Comments(first: 10) {
          ..
        }
      }
    }

A naive implementation would cause a lot of queries: one query for each "edge".

[+] brblck|9 years ago|reply
Yup. That's something we can handle internally, under the hood, to batch database requests into a single query for all edges. This problem is actually easier to solve in GraphQL than it is with a traditional REST API.
[+] xentronium|9 years ago|reply
In a Rails/GraphQL combo it's resolved the same way you usually resolve N+1 queries: you make one request per type of edge (i.e. one request for posts, one request for authors, one request for comments).
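A sketch of that batching idea in plain JavaScript (hypothetical names, not the actual graphql-batch or DataLoader API): gather every author ID the current query needs, then resolve them all with a single backend lookup instead of one per post.

```javascript
// Collect IDs, deduplicate, fetch once, then fan the rows back out
// in the order the IDs were requested.
function batchLoadAuthors(ids, fetchManyFn) {
  const unique = [...new Set(ids)];
  const rows = fetchManyFn(unique); // one query: WHERE id IN (...)
  const byId = new Map(rows.map((row) => [row.id, row]));
  return ids.map((id) => byId.get(id));
}

// Pretend database table and a fetcher that answers one batched call.
const authorsTable = [
  { id: 1, name: "ada" },
  { id: 2, name: "grace" },
];
const fetchAuthors = (ids) => authorsTable.filter((a) => ids.includes(a.id));

const posts = [{ authorId: 1 }, { authorId: 2 }, { authorId: 1 }];
const authors = batchLoadAuthors(posts.map((p) => p.authorId), fetchAuthors);
console.log(authors.map((a) => a.name)); // → [ 'ada', 'grace', 'ada' ]
```

Three posts, one authors query; real loaders add per-request caching and async coalescing on top of this same shape.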
[+] marbletiles|9 years ago|reply
Fascinating, this. But it glosses a bit over the cost of generating bespoke responses to every request. I wonder how it works if expensive queries are implied in the request. You also need smarter caching, I imagine.
[+] WorldMaker|9 years ago|reply
The implication in this article and elsewhere is that the most expensive queries happened anyway, just as sequences and patterns of requests before. I imagine that what is lost in caching common requests is gained back by being able to analyze the patterns of bespoke requests as a whole, and to prioritize and even quota them specifically by type of request, much more so than you could by just guessing at the pattern of incoming REST URL hits.

Just as a relational database itself can sometimes generate better query plans from its query analyzer if you feed it what you are really after in one big, slow query that narrows to very specific rows rather than lots of small queries that return lots of rows quickly. Amortized against the database's time (CPU, memory) and bandwidth that slightly slower query is still sometimes a big win for overall performance.

(Given that most GraphQL services are typically backed by relational databases, it should probably not be a surprise the savings sometimes get passed right along.)

[+] brblck|9 years ago|reply
This definitely enables a lot of opportunity to do both smarter querying and smarter caching on the back-end.

While you can indeed perform larger, more complex requests, GraphQL by nature forces queries to explicitly ask for everything you want to get back. As a result, we're not wasting any capacity giving you back a bunch of data for an entire object that you don't need like we would in a normal REST API request.

The thing that I'm most excited about with all this is the fact that we're building new GitHub features internally on GraphQL as well. This means that, unlike with a traditional REST API, there will no longer be any lag time between features in GitHub and the GitHub API.

API is a first-class product now. API consumers get features as soon as everyone else!

Please make sure to give us feedback during this alpha stage! https://platform.github.community/

[+] niftich|9 years ago|reply
The dirty little secret of "RESTful" APIs is that everyone was pretty much generating bespoke responses to every request anyway out of data coming out of a datastore.

Nobody has static files sitting on a server anymore except for static auxiliary assets (CSS, scripts, fonts) or if you're actually running a website with static content, which is exceedingly rare. Everyone has some kind of request router that parses the URL and the body and figures out what to do next, makes a query to a backing database, then assembles and massages the response to make it look like the mediatype the client expects.

[+] brblck|9 years ago|reply
One really great feature of GraphQL (which GitHub doesn't support yet in this alpha, but we plan to) is the ability to store queries for execution later. This lets us optimize and plan for the data and volume of requests being generated from a given query. Other soon-to-be-added features like subscriptions where we only return the data you need when it changes help a lot on this front as well.
[+] avitzurel|9 years ago|reply
I've been looking into implementing something like this @ Gogobot as well. This eliminates the need for all `/v/` type API versioning. The client requests what it needs for this request.

Experimenting with this we often saw 50-70% reduction in the payload being sent to the clients in some requests. If I only need the first, last and avatar from the User object there's no need for my response payload to suffer because other requests need 30 fields from the same object.
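As a sketch, a trimmed-down request like the one described might look like this (field names are hypothetical, since Gogobot's schema isn't public):

```graphql
{
  user(id: 42) {
    firstName
    lastName
    avatarUrl
  }
}
```

The response carries exactly these three fields and nothing else, which is where the payload reduction comes from.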

Implementing this without causing a lot of N+1 queries is the tricky part and that's where we're currently investing most of our time.

Awesome to see Github adopting this and releasing it to the public API.

[+] brblck|9 years ago|reply
In theory a GraphQL API can operate version-less, utilizing things like deprecation notices and field aliasing to smooth over any rough edges. Once we see calls on a certain thing reach zero and stay there, we can actually remove it and never have to bump a version anywhere.

That's the dream. We'll see how reality plays out.

For reference, we actually launched with some deprecated fields (see "databaseId" on the "Issue" type -- database IDs will be phased out for global relay IDs eventually) if you want to see what they look like.
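In schema terms, a deprecated field like that `databaseId` example can be marked with GraphQL's built-in `@deprecated` directive. A sketch (not GitHub's exact schema):

```graphql
type Issue {
  id: ID!            # global Relay ID
  databaseId: Int @deprecated(reason: "Use the global-ID `id` field instead.")
  title: String!
}
```

Clients introspecting the schema see the deprecation notice and can migrate before the field is ever removed.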

[+] vcarl|9 years ago|reply
Awesome to see GraphQL get some mainstream adoption, hopefully this leads to some more community tools for consuming it :) Relay is an awesome concept, but the learning curve is pretty steep.
[+] tiles|9 years ago|reply
Is this then the GitHub v4 API? Should we expect the REST API to be deprecated in the future?
[+] dan_ahmadi|9 years ago|reply
GraphQL increases speed: "Using GraphQL on the frontend and backend eliminates the gap between what we release and what you can consume"
[+] repole|9 years ago|reply
I'm intrigued by GraphQL, but I don't understand what separates it from passing "fields" and "embeds" parameters in a REST API. I don't see what about it would be inherently easier to implement either.

In my free time I've been sparingly working on a project that does exactly this with a REST API [1]. It's in an entirely unfinished state, but the linked documentation is a decent example of the types of queries possible.

[1] http://drowsy.readthedocs.io/en/latest/querying.html

[+] niftich|9 years ago|reply
For me, it's easiest to pretend that GraphQL is a DSL (a domain-specific language) that offers special syntax to make it easier to implement certain things.

- You can pretend that each GraphQL query is a JSON object (which, it actually is)

- You can pretend that each GraphQL schema that you declare is actually a JSON-Schema document, which some people use to specify in a machine-readable way what your API's inputs and outputs will look like

- You can pretend that each GraphQL resolver, which is the piece of code you have to write (on the server) to actually dig up the result of a query, is a function that parses your incoming JSON, validates it against your JSON-Schema, and then reaches out to your datastore to produce a result. You'd then have to construct another JSON document matching your response schema, stuff the data into it, and return that to the user. Except that in GraphQL, you only have to supply the resolver; the rest is handled by the framework.

You can of course do this by hand and many people do (most obviously when you see APIs that include arguments like "operator=eq" or "limit=100" or "page=25"), but GraphQL gives you the tools to do this with less effort, and end up with a cleaner API by passing everything in the query body. And the GraphQL server saves you from having to manually build up the JSON text of every single response.

Reading through your docs, I've seen this style of API in enterprise settings where there was a backing relational database and the designers were basically trying to expose the underlying database through HTTP. It can get the job done, but GraphQL gives you nicer abstractions, a cleaner way of passing parameters, and conveniences like a real type system (known both on the client and server side) and you only have to supply your resolver function implementations.

[+] honzajde|9 years ago|reply
GraphQL API Explorer wants this permission:

Public and private — This application will be able to read and write all public and private repository data. This includes the following:

Code, Issues, Pull requests, Wikis, Settings, Webhooks and services, Deploy keys

Why oh why?

[+] helfer|9 years ago|reply
Because GitHub's GraphQL API will let you do all of those things via the GraphQL API Explorer ;-)
[+] netghost|9 years ago|reply
I think it's because you can use the GraphQL mutations to do things like create comments, etc.
[+] kalleboo|9 years ago|reply
You don't trust GitHub with access to your GitHub repositories?
[+] wehadfun|9 years ago|reply
Slightly off topic but GraphQL vs Odata?
[+] robzhu|9 years ago|reply
They're similar; both technologies allow clients to specify the data they need. I would say that GraphQL is more flexible and has a strong type system. For example, the filter semantics are defined within OData, while in GraphQL, you define how your data can be filtered within your schema.

As a personal opinion, I also feel OData exposes an API that is too tightly coupled to the persistence layer. GraphQL objects and properties are all backed by arbitrary "resolver" functions, which means you can stitch together multiple/legacy backends to generate your response.
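For instance, where OData ships a fixed `$filter` grammar, a GraphQL schema declares exactly which filters it supports. A sketch with made-up argument names:

```graphql
enum IssueState {
  OPEN
  CLOSED
}

type Query {
  issues(states: [IssueState!], labels: [String!], first: Int): [Issue!]!
}
```

Anything not declared here simply isn't queryable, which keeps the API surface decoupled from how the data is actually stored.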