jordanthoms|2 years ago
- In my experience the performance of the stateful DB server has been the biggest bottleneck when scaling - it's much easier to scale the stateless application servers which sit between end user requests and your DB in a traditional architecture. So usually I want to move as much work as possible away from the DB in order to squeeze the most out of it before needing to shard the DB or move to a different solution, rather than moving more responsibilities into the DB.
- It's frankly pretty scary to load a C extension into postgres which is opening ports and parsing requests etc - bugs in it could crash the server or open security holes, and if you were able to exploit a vulnerability you'd be able to grab any of the data in the DB and easily exfiltrate it. This would be less of an issue if using this for an internal service which isn't directly exposed to the internet, but it could still make it easier for an attacker to escalate their access. (This isn't really an 'it should be rust' comment - even if this was in rust it would still be pretty worrying.)
- Even if you think you only need simple CRUD actions, over time you tend to need more and more logic around those actions. Authentication, verification, triggering processes in other systems, maybe you make schema changes and need to adapt requests from old clients, etc. It's really nice to have a more heavyweight application server where you can implement that logic - I'm pretty skeptical that row level permissions, triggers etc will be able to cleanly handle all those as you add new requirements over time. This also applies to other tools for more directly exposing your DB (e.g. PostgREST). IMO starting off using a tool like this is really just setting you up for a pretty painful rebuild later on.
Am I missing something here, maybe I have misunderstood the intended use case?
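To make the third bullet concrete, here's a minimal sketch (all names hypothetical, not from any real codebase) of the kind of logic that tends to accrete around a "simple" CRUD update in an application server - authentication, validation, and triggering a downstream process - which is exactly the layer that's awkward to express purely in row-level permissions and triggers:

```python
# Hypothetical application-server handler wrapping a plain row update.
# The db is a dict standing in for a table; events stands in for a
# message queue or call to another system.

events = []  # outbound notifications to other systems

def authenticate(token):
    # placeholder: real code would verify a session cookie or JWT
    return {"user_id": 1, "role": "editor"} if token == "valid" else None

def validate(payload):
    # placeholder business rule: title must be a non-empty string
    return isinstance(payload.get("title"), str) and payload["title"].strip()

def update_document(db, doc_id, payload, token):
    user = authenticate(token)
    if user is None:
        return 401, None          # auth layer
    if not validate(payload):
        return 422, None          # verification layer
    db[doc_id] = {**db.get(doc_id, {}), **payload,
                  "updated_by": user["user_id"]}
    events.append(("document.updated", doc_id))  # trigger external process
    return 200, db[doc_id]
```

Each of these steps is trivial on day one; the argument above is that they multiply over time, and an application server is a comfortable place to keep them.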
lovasoa|2 years ago
- about scaling: you have to get very far before saturating a single postgres server. A lot of applications certainly do get to that point, but most don't. And once you get there, scaling postgres is definitely more work than scaling a stateless service, but it also gives you a lot more in terms of performance, reliability and further scalability.
- about C being scary: as a rust aficionado, I am not going to contradict you. But postgres itself is already C, and @yrashk is not just any C developer. Notably, he contributes to postgres itself.
- about managing complexity: PostgREST, Omnigres, hasura, SQLPage [1] and other tools that simplify building directly on top of the database never require exclusive access to the database. You can always put some of the complexity outside, when you need to.
[1] https://sql.ophir.dev
factormeta|2 years ago
Thank you for saying this. Whenever postgres as a platform is brought up, the horizontal-scaling crowd will often raise the parent thread's objections. I have been developing since Apple computers had floppy disks, and there are rarely situations where I need to saturate a single postgres server.
And even if one did get to the situation where that happens, with the introduction of Hydra or other postgres columnar stores, we can just drop that in. Most users will never get to the point where they need to saturate a single postgres server. Also keep in mind that when processing a large amount of row data, writing the logic in middleware is just not as efficient or fast as doing it in postgres, which has native access to the data and data manipulation.
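As a rough illustration of that last point (using sqlite3 from the stdlib as a stand-in for postgres - the table and numbers are invented): aggregating inside the database ships one number over the wire, while aggregating in middleware ships every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 10) for i in range(1, 101)])

# middleware-side aggregation: all 100 rows cross the wire
rows = conn.execute("SELECT amount FROM orders").fetchall()
total_app = sum(amount for (amount,) in rows)

# in-database aggregation: only the result crosses the wire
(total_db,) = conn.execute("SELECT SUM(amount) FROM orders").fetchone()
```

Both give the same answer; the difference is the volume of data moved and the per-row overhead in the application.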
jordanthoms|2 years ago
On scaling, yeah a single postgres server can handle a lot. For us we were well past the million user mark before running into serious issues. However, a lot of how we were able to keep postgres working for us as we grew was by shifting work from postgres to our stateless services like I alluded to before - e.g. making our SQL queries as simple to execute as possible even if it means more work for the client to piece the parts back together.
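As an illustration of that trade (not the commenter's actual code - table names are invented, and sqlite3 from the stdlib stands in for postgres): replacing a join in the database with two simple indexed queries, with the client piecing the parts back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO posts VALUES (10, 1, 'first'), (11, 2, 'second');
""")

# instead of: SELECT p.title, u.name FROM posts p JOIN users u ON u.id = p.user_id
posts = conn.execute("SELECT id, user_id, title FROM posts").fetchall()
user_ids = sorted({uid for (_, uid, _) in posts})
placeholders = ",".join("?" * len(user_ids))
users = dict(conn.execute(
    f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
).fetchall())

# the client pieces the parts back together
feed = [(title, users[uid]) for (_, uid, title) in posts]
```

Each query stays a cheap primary-key scan the database can execute trivially, at the cost of a little reassembly work in the stateless tier - which is the tier that's easy to scale out.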
If everything had been running inside the database we wouldn't have had that option and we'd probably have hit scaling limits much earlier. I guess we could have split off the highest-traffic endpoints and had those handled by a separate service calling the PG db, but then you run into issues keeping the authentication etc consistent.
Re security - yep, PG is already using C to parse untrusted inputs from the network, which is also scary, but it's (hopefully) well reviewed and mature code - and even so, I wouldn't want to expose PG's usual wire protocol port to the internet, so it's hard to imagine exposing HTTP from postgres to the wild west.
Ultimately it probably is just a question of the sort of project it's being used for - if it's for something that's not going to need to get to larger scales, handle a lot of complexity over time, or pass security reviews, and your main goal is simplicity, then maybe an approach like this is a good option. I've just found that things tend to start off looking small and simple and then turn out to be anything but, so I'd rather run `rails new` and point it at a standard PG server - which would be just as simple and productive when you are starting out, and can keep scaling as your customer base and team size grows up to the size of Shopify, Github, or Kami (shameless plug).
dougmoscrop|2 years ago