Hacker News item 29430720

Pg_GraphQL: A GraphQL Extension for PostgreSQL

431 points | samwillis | 4 years ago | supabase.com

99 comments

[+] akulkarni|4 years ago|reply
As co-founder of another database company in the PostgreSQL ecosystem, I have to say that I'm really impressed with the quality and velocity of launches from the Supabase team. Nothing else to add, except please keep up the great work!
[+] kiwicopple|4 years ago|reply
thanks Ajay, we're also big fans of yours here at Supabase :)
[+] cmrajan|4 years ago|reply
I've explored multiple GraphQL solutions for PostgreSQL in the past and was never convinced by the resolver-based approach; I always felt that an extension would be the right solution. The closest I've come across and used so far is Graphjin https://github.com/dosco/graphjin . Thanks Supabase for the fantastic extension!
[+] sporkland|4 years ago|reply
I enjoyed Sam Newman's take on this capability [1]

> Great to see AWS providing direct data coupling as a service. /s

> This service allows you to directly map a GraphQL endpoint to a database table. It’s like putting getters and setters on an object and claiming you're encapsulating private variables. The end result is coupling between GraphQL clients and the underlying datasource.

> Information hiding is a key concept in independent change. Can I change the provider (of the GraphQL) endpoint independent of the clients? Directly exposing internal data structure makes this very difficult.

And [2]:

> So a few people have asked why I have this snarky response. What is my problem with this service? Well, to be clear, it’s not an issue with GraphQL, it’s an issue with direct coupling with underlying datasources #thread

> The service as advertised makes it simple to map a GraphQL definition against a database. Now, what’s the problem with this? Well, the devil here is in the detail. But fundamentally it comes down to how important information hiding is to you.

> ... see the thread for more ...

[1] https://twitter.com/samnewman/status/1346541251617828877 [2] https://twitter.com/samnewman/status/1346749556583780352

[+] Twisol|4 years ago|reply
I don't think Sam is wrong, exactly -- and that second thread does add some of the missing nuance -- but I do think there's a key missing piece here.

He's absolutely right that you shouldn't couple directly to the underlying representation. But Postgres lets you transparently define views that can be queried (and, with a little more elbow grease, updated) just like any other table. You can provide decoupling from within the database, and do so on-demand as your domain and your data model evolve.
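For instance, the decoupling layer can live entirely inside Postgres (schema and column names below are hypothetical):

```sql
-- Keep the real table in a private schema and expose a stable
-- view as the public contract instead.
create schema private;
create schema api;

create table private.accounts (
  id             bigint generated always as identity primary key,
  full_name      text not null,
  internal_flags jsonb   -- implementation detail, never exposed
);

-- Refactoring private.accounts later only requires updating
-- this view definition; clients of api.accounts are unaffected.
create view api.accounts as
  select id, full_name as name
  from private.accounts;
```

Simple single-table views like this are automatically updatable in Postgres, so INSERT/UPDATE/DELETE against `api.accounts` write through to the underlying table without extra plumbing.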

I don't enjoy planting a separate bespoke API server on top of the database. Usually you end up lifting many of the same capabilities the database already has (auth, batching, ...) to your custom API, so a lot of the server is just boilerplate. Many API operations are natural consequences of your data model; there's little business or engineering value-add once you've settled on the latter, you're just writing glorified FFI bindings.

Lazy engineering will cause problems no matter what architectural stack you end up using. But a state-first architecture doesn't have to mean a complete loss of loose coupling -- it just means different techniques for achieving it.

[+] lstamour|4 years ago|reply
We could use PG views or functions to make our own abstraction layer inside the database and handle security better, etc. But putting that complicated database rebuild aside, just because we can publish GraphQL APIs from the database doesn’t mean we have to: it would be interesting to use GraphQL as a communications layer from backend to database too. Just as frontend-to-backend could use GraphQL, so too could backend-to-database communication, but with different schemas (data models) between frontend and backend. Having a backend GraphQL server between the frontend and database could allow for connection pooling, schema versioning and abstraction, for proxying multiple databases, handling business logic and perhaps validating inputs, caching or managing compute tasks.
[+] e12e|4 years ago|reply
I think that (on paper at least) it makes perfect sense to allow GraphQL to be your only hammer (or turtle). I don't think federating GraphQL over bespoke Swagger/JSON, SQL, Redis, SOAP is inherently better - expose GraphQL and federated GraphQL:

https://www.apollographql.com/docs/federation/v2/

Smack GraphQL on your Postgres, and on your LDAP, and your graph DB - and federate them as GraphQL?

[+] AtNightWeCode|4 years ago|reply
For front-end development where you're already using GQL as a layer on top of content, I can see the use case for this.

A lot of old systems have this tight coupling between databases and back-end code. I would advise against going down that path.

GraphQL was designed to solve the flaws in the utterly poor service design at Facebook. Do not forget that.

[+] lucasyvas|4 years ago|reply
There is no way every cloud provider won't jump on this as soon as it's stable for their Postgres offerings.
[+] darksaints|4 years ago|reply
Hopefully so. It is awesome that supabase is releasing this as open source, as they could have easily kept this proprietary. I probably won't ever be able to use supabase for my primary job (the company is obnoxiously invested in an Azure/AWS hybrid cloud). But I do have some profitable side projects with the ability to choose my own service providers, and this sort of developer goodwill really goes a long way.
[+] n0w|4 years ago|reply
As someone that has used Hasura very heavily in the past, I can see the operational benefit of not needing to manage one more service.

The talk about memory constraints seems a bit odd though. The extension is still going to require memory and without profiling the alternatives it seems odd to say: "we won't use any more memory with an extension". Especially when you say "these established/stable alternatives fulfill all our feature requirements". In practice I always found Hasura to be reasonably lightweight (I can't speak for PostGraphile).

I think there's an advantage to running this API layer as an additional service in front of the DB though. Then it can act as an API gateway to more than just your database.

Also, the generation of a "Relay style" schema is off-putting for me. I'm really not a fan of the style, and avoiding it was one of the biggest advantages for me in using Hasura.

[+] nikivi|4 years ago|reply
Curious how it compares to https://www.graphile.org/postgraphile/
[+] underbluewaters|4 years ago|reply
I had to make sure someone had plugged postgraphile here. It's a great system. What impresses me most about it is how it grows with the complexity of your application. There are so many thoughtful points of extensibility built in, I always have confidence that I can go into the docs and find a way to do what I need to.

That reminds me, I need to figure out how to get my employer to sponsor the project...

[+] oliverrice|4 years ago|reply
it's very similar in goals to postgraphile

as samwillis mentioned, the memory footprint is tiny, which is a big perk for supabase's platform (or if you're self-hosting). it's also fully language agnostic, which opens up lots of options for extensibility:

for simple use-cases you can expose the graphql functionality over http using PostgREST as described here https://supabase.github.io/pg_graphql/quickstart/

but if you want more configuration, like adding in middleware or wiring it up to an existing backend application, you can do that from any programming language that can connect to postgres, rather than only javascript
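Concretely, per the pg_graphql docs the extension exposes a resolver function callable over any ordinary Postgres connection, so "serving GraphQL" reduces to running a query (the collection name below depends on your schema and is illustrative):

```sql
-- graphql.resolve is pg_graphql's entrypoint: it takes a GraphQL
-- document and returns a jsonb response in the standard
-- {"data": ..., "errors": ...} shape.
select graphql.resolve($$
  {
    accountCollection(first: 2) {
      edges {
        node { id }
      }
    }
  }
$$);
```

Any driver in any language that can issue that `select` can sit behind your HTTP layer of choice.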

[+] samwillis|4 years ago|reply
The article suggests they built it so that by running it within the DB it would have less compute overhead and they could match GraphQL requests to SQL transactions 1:1.
[+] wiradikusuma|4 years ago|reply
I use Dgraph, a "native" GraphQL DB. But I still transform the incoming/outgoing JSON to Java objects, and transform those objects again to/from the DB. Why? Two things: business logic and security/access control. Yes, I can put (some) business logic in the Dgraph DB, but it feels "leaky", like a stored procedure (maybe because I come from Java). I feel it's tiring and stupid, but I don't know a better solution.

I reckon for this extension, the business logic uses stored procedures and access control uses PG's user roles? Many apps I know simply have one user, "myappuser" (or even the default user), to access their DB.

[+] Twisol|4 years ago|reply
> Many apps I know simply have 1 user "myappuser" (or even default user) to access its DB.

Sure; either those apps don't need to differentiate access between their users, in which case one role is sufficient, or they reimplement their own auth system, in which case you'd use Postgres' own rather robust auth system instead. It comes down to the needs of the domain; you'll solve the problems differently depending on what approach you take, but you need to solve the same problems.

Yes -- I've found it very tiring, as you put it, to keep reimplementing the same boilerplate in every API server just to lift the operations my database can already support out to an HTTP frontend. Postgres' auth means I don't have to make or press into service a separate auth system, and there are multiple ways to handle business logic orthogonally.

Stored procedures and triggers work well, but are synchronous within the current transaction, and sometimes simply don't map well to the domain needs. You can also use the LISTEN and NOTIFY statements to set up asynchronous external workers. I find this has a positive effect on the data model, as you're forced to consider what states the system will pass through during an asynchronous flow.
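A minimal sketch of that LISTEN/NOTIFY worker pattern (table and channel names are hypothetical):

```sql
-- A trigger queues work by notifying a channel; an external worker
-- LISTENs on that channel and processes each item asynchronously,
-- outside the inserting transaction.
create function notify_new_order() returns trigger as $$
begin
  perform pg_notify('orders', new.id::text);
  return new;
end;
$$ language plpgsql;

create trigger orders_notify
  after insert on orders
  for each row execute function notify_new_order();

-- In the worker's session (from any client language):
--   LISTEN orders;
-- then react to each notification as it arrives.
```

Note that notifications for a transaction are only delivered after it commits, which is exactly the decoupling you want for external side effects.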

[+] andrewingram|4 years ago|reply
Whilst I don't like the paradigm of generating a GraphQL schema from your database (or vice-versa!), I appreciate this one for having a specific goal of being able to run as part of a small database VM. So congrats on the release!
[+] nick__m|4 years ago|reply
Ideally you don't expose the schema directly; you make views in another schema and expose those instead of your real tables.
[+] oliverrice|4 years ago|reply
author here! happy to answer questions
[+] andrew_|4 years ago|reply
We use Postgraphile heavily; it drives the entirety of our API. A few feature questions: is there any plan to support, or has there been any conversation around:

- custom directives that run custom code

- custom routes on the server that do custom things? Or perhaps proxying to another app to handle alternate/custom routes

- schema injection, custom resolver logic

[+] thelastbender12|4 years ago|reply
> After tallying the resources reserved for PostgreSQL, PostgREST, Kong, GoTrue, and a handful of smaller services, we were left with a total memory budget of ... 0 MB

This was a really interesting bit to me, could you give detail on how the memory usage gets split up across these? Thank you.

[+] nwienert|4 years ago|reply
Congrats on release. We use Hasura but haven’t been happy with their speed of iteration, lack of communication of various large bugs, and lack of M1 support / communication.

But we do use a lot of their more advanced features like being able to use aggregates in sorts, aggregates in results, custom functions, etc. What are your plans there and will you have a public roadmap?

[+] ptrwis|4 years ago|reply
Am I correct in saying that for read queries, you essentially translate the GraphQL query into nested jsonb_agg calls?
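For anyone unfamiliar with the technique being asked about, here is a rough sketch (not pg_graphql's actual generated SQL; table names are illustrative) of how a nested GraphQL selection like `{ authors { name, posts { title } } }` can compile to a single query with jsonb aggregation instead of N+1 resolver round trips:

```sql
-- Each nesting level becomes a correlated subquery whose rows are
-- folded into a jsonb array, so the whole response is built in one
-- statement on the server.
select jsonb_agg(
  jsonb_build_object(
    'name',  a.name,
    'posts', (
      select coalesce(
               jsonb_agg(jsonb_build_object('title', p.title)),
               '[]'::jsonb)
      from posts p
      where p.author_id = a.id
    )
  )
)
from authors a;
```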
[+] httgp|4 years ago|reply
Very excited about this!

Do you have performance comparisons for the same datasets with Hasura / Graphile?

[+] samwillis|4 years ago|reply
Hi Oliver, Supabase team,

Love what you are doing with Supabase!

Quick question, have you considered building any kind of local mirroring system for offline mobile app/PWA, say on top of SQLite? Something like Realm for mongodb or PouchDB/Couchbase Lite for CouchDB/Couchbase.

It would obviously need devs to add some extra columns to tables for tracking, and a way to define the merge/overwrite characteristics for each column, but it would be awesome to have something like that!

It’s something I have thought about building for a while but never found the time. (I want to combine it with Yjs for collaborative offline rich text editing)

[+] kiwicopple|4 years ago|reply
just want to give a big shout out to Oli - his implementation is really ingenious, leveraging multiple parts of the Supabase stack while also being agnostic enough to work natively inside Postgres (or with other tools)

{supabase team}

[+] config_yml|4 years ago|reply
So you're calling GraphQL via PostgREST, but I didn't read about why you're including GraphQL in the first place. Isn't it orthogonal to PostgREST, which is already used in your stack?
[+] rlili|4 years ago|reply
Does this work in YugabyteDB/CockroachDB?
[+] kaspermarstal|4 years ago|reply
I've been looking for something like this for a long time, so cool! Great work!

Does it work with TimescaleDB?

[+] rubyist5eva|4 years ago|reply
how do you implement search/sort against something like this?
[+] darksaints|4 years ago|reply
This is awesome! Really creative approach to the problem. Does this also support custom datatypes that can be marshalled to json? For example, PostGIS geometries?
[+] brap|4 years ago|reply
That’s interesting. How does it deal with authentication and authorization? If it handles those well, it could be a real “serverless” solution for most CRUD apps.
[+] clessg|4 years ago|reply
As the sibling comments point out, it uses Postgres Row Level Security (RLS). For an approachable introduction to how Supabase's auth works:

https://supabase.com/docs/guides/auth

> 1. A user signs up. Supabase creates a new user in the auth.users table.

> 2. Supabase returns a new JWT, which contains the user's UUID.

> 3. Every request to your database also sends the JWT.

> 4. Postgres inspects the JWT to determine the user making the request.

> 5. The user's UID can be used in policies to restrict access to rows.

> Supabase provides a special function in Postgres, auth.uid(), which extracts the user's UID from the JWT. This is especially useful when creating policies.

For instance, say you have a `todos` table and want to make it so users can only read their own todos - you could have an RLS policy `todos.user_id = auth.uid()`. Afterward, `SELECT * FROM todos` will only return the authenticated user's todos. (Equivalent to manually issuing `SELECT * FROM todos WHERE todos.user_id = auth.uid()`.)

There's also `auth.role()` so you can easily restrict access by role: `auth.role() = 'admin'`
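Written out as DDL, the todos policy described above looks something like this (table and policy names are illustrative; `auth.uid()` is the Supabase helper mentioned in the docs):

```sql
-- RLS is off by default; enabling it makes the table deny-by-default
-- until a policy grants access.
alter table todos enable row level security;

-- Each user can only SELECT rows whose user_id matches the UID
-- extracted from their JWT.
create policy "users read own todos"
  on todos
  for select
  using (user_id = auth.uid());
```

With that in place, a plain `SELECT * FROM todos` issued as an authenticated user is automatically filtered to their own rows.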

[+] oliverrice|4 years ago|reply
The extension is compatible with your existing row level security policies. If you connect to the database as a role like `authenticated` then those policies will be applied.

The columns and tables that are visible are also controlled by the role.

One cool thing about that approach is you could run e.g. an admin API and a user-facing API from the same endpoint by executing as different postgres roles!
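A sketch of that role-switching idea (role names are hypothetical; `set local` scopes the role to the current transaction):

```sql
-- User-facing request: run as a restricted role so RLS policies
-- for end users apply.
begin;
set local role authenticated;
-- ... resolve the user-facing GraphQL request here ...
commit;

-- Admin request on the same endpoint: a role with broader grants
-- sees different tables and columns.
begin;
set local role admin_api;
-- ... resolve the admin GraphQL request here ...
commit;
```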

[+] dljsjr|4 years ago|reply
They mention in the article that auth is handled by reusing the existing row level security you'd already use to secure the DB (i.e. `CREATE POLICY`).
[+] IceDane|4 years ago|reply
While this is super neat, this and the amplify thread can't help but make me wonder:

Is everyone really this okay with their API always being 1-to-1 with their DB models?

In my experience, that kind of setup is only viable for the smaller, simpler projects and otherwise you always run into something where you'd really prefer to have a layer between you and your database.

I am currently using GraphQL at work, and this is a very hard requirement for us. Our database schema is not translated literally into GraphQL (or the other way around), and this is very intentional. The whole idea of the API layer is to be able to make changes to your internals without breaking all your consumers.

This was a problem with Amplify (though the least of the problems we had), and it seems to me this is also a problem here, as it is with Hasura and PostgREST.

[+] Apaec|4 years ago|reply
How does it compare with https://github.com/solidsnack/GraphpostgresQL? It's like 7 years older and it takes the same approach, it seems.
[+] solidsnack9000|4 years ago|reply
GraphpostgresQL is no longer maintained and hasn't been updated for many years.

I am the person who wrote it. It was a proof-of-concept, written in PL/pgSQL. This made it fairly easy to set up and test but made maintenance and contributions very difficult.

[+] oliverrice|4 years ago|reply
Very cool! I was not familiar with this project. It does take a similar approach to query building. It is described in the README as an alpha POC.
[+] xrd|4 years ago|reply
I've been using Hasura extensively and love it. I'm curious what the delta is between the two. It's so simple to run Hasura on dokku and instantly get an amazing UI for postgres with the bonus of graphql. It would take a lot for me to switch.
[+] piaste|4 years ago|reply
> It would take a lot for me to switch.

We were pretty happy with Hasura but had to switch to Postgraphile due to poor multi-database support, bummer.

(Postgraphile is not as polished as Hasura in some ways, but since it can be used as a library, it's easy to dynamically create N instances of it with different configurations at runtime. Hasura required our ops team to define a new instance of the service in the docker-compose.yml file for each database.)

[+] ndejaco|4 years ago|reply
I see this as a neat way of exposing a GraphQL interface, but a pattern with a number of limitations that are traded off against the goal of reducing network latency. Definitely useful for users who want a single datasource and only want to expose access through GraphQL (you cannot scale DB resources independently of the API access layer). Wondering if the GraphQL engine could make use of the Postgres stats collector for optimizing GraphQL resolver execution, e.g. resolver dispatch scheduling or pre-execution query optimization?
[+] cdaringe|4 years ago|reply
Heck ya! I can’t wait to try it in whatever my next project is.

I’d like to learn more about the security of the functionality. I’m not talking about how to apply pg roles/privs/RLS, but rather my perceived risk of how hard it is to write stuff safely in C. That is, who is handling inbound http requests, pg or the extension? What confidence do I have that my db mem is safe from imperfect extension logic?

[+] the_arun|4 years ago|reply
What are the use cases we could use a direct plugin to database for GraphQL? I mean where do we write business logic before data becomes GraphQL?
[+] ancharm|4 years ago|reply
How long before someone just goes into the Postgres source code and creates true native support for GraphQL?
[+] foldr|4 years ago|reply
Naive question: If your entire DB schema is automatically reflected into a GraphQL schema, then how is this better than just having the client send SQL queries directly to the backend? Does GraphQL make it easier to do this securely?