top | item 25290339

Stored Procedures as a Back End

209 points | steve-chavez | 5 years ago | gnuhost.medium.com | reply

192 comments

[+] Raidion|5 years ago|reply
If I make the change in code, I can roll it out, test it in prod, and then slowly migrate traffic over in very standard and well defined ways.

If I deploy to the database and something goes wrong, I need to really trust my rollback scripts. If the rollback scripts go haywire, you're in a tough spot. If that happens to code, you can literally just move all the traffic to the same thing that was working before, you don't quite have that luxury with the database.

You can have a bunch of servers, you really only can have one database. This means the database should be very sensitive to changes because it's a single point of failure. I don't like modifying or changing single points of failure without very good and well tested reasons.

Could you version your procs and have the new version of the code call the new procs? Sure, but now you have to manage deployment of both a service and the database, and have to handle rollover and/or A/B for both. If my logic is in the service, I only have to worry about rolling back the service.

Database logic saves you a little bit of pain in development for a ton of resilience and maintenance costs, and it's not worth it in the long run IMO. Maybe it's because the tooling isn't that mature, but until I can A/B test an atomic database, this doesn't work for a lot of applications.

[+] chaps|5 years ago|reply
Not sure I follow in the context of the post. The post is about a database that can be wiped and effectively recreated from scratch. There wouldn't really be a need for rollback scripts when you can just rerun the entire workflow from scratch. Even easier if you have a hot/warm/cold setup.

In this sort of setup, changing a db function isn't unlike making a change to db-interfacing code in that it's not changing the underlying data, just the interface to it.

I've built similar systems that instead of making changes to db functions, changes are made to materialized views. The underlying data doesn't change, but its presentation does. No huge need for rollbacks. It works really well!

[+] jrumbut|5 years ago|reply
For years I've wanted to try the idea in the article, particularly for a data intensive but not high scale project. It's never been worth the hassle though.

Still, thinking about it has led me to see how we are not really leveraging the capabilities of feature-rich RDBMSes like Postgres. A judicious use of user-defined functions can help, particularly for polyglot companies/microservices where the database can ensure certain routines are standardized across all apps.

[+] Roboprog|5 years ago|reply
Presumably you can revert to a previous tag/label in source control and reload the previous procs/funcs to roll back versions.

The apps I have been working on are monoliths which run at about 2% CPU on the server, so there are no A/B comparison runs in production (e.g. on January 1, some law goes into effect and you WILL implement it). I would happily see ORM usage curb-stomped.

[+] danielheath|5 years ago|reply
> If I make the change in code, I can roll it out, test it in prod, and then slowly migrate traffic over in very standard and well defined ways.

Advanced methods (like blue/green deploys) are invented by people with high-traffic requirements and slowly filter down to everyone else until they are normal practice. 15 years ago server configurations were ordinarily painstakingly handcrafted and somewhat unique.

We can reimagine those techniques around a database without too much difficulty.

> You can have a bunch of servers, you really only can have one database. This means the database should be very sensitive to changes because it's a single point of failure. I don't like modifying or changing single points of failure without very good and well tested reasons.

This is a strong argument; I agree that very good reasons are required. I have seen scenarios where the reasons are IMO quite good, but agree this isn't a widely-applicable technique.

> Could you version your procs and have the new version of the code call the new procs? Sure, but now you have to manage deployment of both a service, and the database, and have to handle rollover and/or A/B for both. If my logic is in there service, I only have to worry about rolling back the service.

It has become common to have (and version/rollback/etc) each of: web-frontend code, web-backend code, web-backend engine, database schema, database engine.

Cutting that back to web-frontend, database http plugin, database schema & database engine is IMO a plausible win some of the time.

I've been noodling around for awhile with the concept of a precompiler for stored procedures which alters their names to include a hash of their code.

This would give you a safe way to seat multiple versions alongside one another.

* The frontend build process would pull in the hashes so it knew which version to call

* A cron-job could delete the old ones once they're (manually marked?) no longer in use - this would be the most hairy bit since it's got to be 100% sure it's not in use anymore.

Rolling back then constitutes putting the old version of the frontend live so it'll refer to the previous stored procedures.
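
A minimal sketch of that precompiler idea, assuming Postgres-style `CREATE FUNCTION` statements (the regex, hash length, and naming scheme are illustrative, not a real tool):

```python
import hashlib
import re

def versioned_name(proc_source: str, base_name: str) -> str:
    """Derive a content-addressed name, e.g. get_user_a1b2c3d4."""
    digest = hashlib.sha256(proc_source.encode()).hexdigest()[:8]
    return f"{base_name}_{digest}"

def precompile(proc_source: str, base_name: str) -> tuple[str, str]:
    """Rewrite the CREATE FUNCTION header to use the hashed name.

    Returns (rewritten_sql, hashed_name). The frontend build would record
    hashed_name so the app always calls the exact version it shipped with.
    """
    name = versioned_name(proc_source, base_name)
    sql = re.sub(
        rf"(CREATE\s+(?:OR\s+REPLACE\s+)?FUNCTION\s+){re.escape(base_name)}\b",
        rf"\g<1>{name}",
        proc_source,
        count=1,
        flags=re.IGNORECASE,
    )
    return sql, name
```

Because the hash is derived from the source text, redeploying unchanged procs is a no-op, and two versions can coexist until the cleanup job decides the old one is unreferenced.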

[+] moksly|5 years ago|reply
I’m struggling with a lot of things in this. Why couldn’t you test your stored procedures? Assuming you use an ORM, do you test its logic? Because if you don’t, then what’s the difference?

Why wouldn’t you have more than one database? What if you’re global?

Why would a database ever be a single point of failure?

Why wouldn’t you be capable of rolling it back, or test in anyway that’s different from what you do with your other code? Is it perhaps because you don’t know the tools to do so?

Why would database logic save you pain in development? Typically, utilising stored procedures makes development less straightforward because you split your logic up, but you do it because database servers are a lot better and a lot more efficient at giving you the data you need.

Don’t get me wrong. I only use stored procedures, or any SQL tooling, when they are the only way to get efficient or persistent data, but it’s not because of any of the reasons you list. It’s because 95% of the developers we hire don’t really know how to use databases.

A good example is an issue one of our juniors had recently. We have a lot of data organised in what is essentially a tree structure where the nodes only know their parents. He needed to pick any node and get its children + related rows from a range of tables in three databases. The task itself was simple enough, except the tree has millions and millions of nodes and even more millions of related rows, and he needed the data fast. He had hacked around, trying different stuff like loading it gradually, caching it between uses and so on, but he couldn’t get it fast enough. We made a recursive stored procedure that was capable of giving him the data almost instantly, because that’s what database servers do. In a couple of years, when this needs tweaking, however, it’s very likely that someone like me will be needed to look at it, because recursive SQL isn’t something a lot of our developers do, so in any other case, doing it in C# would’ve been better.
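
For a tree of that shape, the usual tool is a recursive CTE. A minimal Postgres sketch, assuming a hypothetical `nodes(id, parent_id)` table (the real system spanned three databases and many more related tables):

```sql
-- Hypothetical schema: nodes(id, parent_id), parent_id NULL at the root.
-- Returns the whole subtree under start_id, with each node's depth.
CREATE FUNCTION subtree(start_id bigint)
RETURNS TABLE (id bigint, parent_id bigint, depth int)
LANGUAGE sql STABLE AS $$
  WITH RECURSIVE descendants AS (
    SELECT n.id, n.parent_id, 0 AS depth
    FROM nodes n
    WHERE n.id = start_id
    UNION ALL
    SELECT c.id, c.parent_id, d.depth + 1
    FROM nodes c
    JOIN descendants d ON c.parent_id = d.id
  )
  SELECT * FROM descendants
$$;

-- SELECT * FROM subtree(42);  -- node 42 plus all of its descendants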

[+] XorNot|5 years ago|reply
The way I've seen this done is you version the namespace of the stored procedures (and include the git commit). So at any given time you're definitely only calling the exact function you want.
[+] emmanueloga_|5 years ago|reply
There's a trend of versioned databases that could address these issues. I've seen at least Dolt [1] for relational:

"Dolt is a data format and a database. The Dolt database provides a command line interface and a MySQL compatible Server for reading and writing data."

... and TerminusDB [2] for graphs:

"TerminusDB is an open-source knowledge graph database that provides reliable, private & efficient revision control & collaboration."

I don't know if Dolt supports stored procedures, but in any case versioning a database would be an amazingly powerful and convenient thing, if it can be done without sacrificing too much DB performance.

Worth investigating and has the potential of changing the way we write apps.

--

1: https://www.dolthub.com/docs/tutorials/installation/

2: https://terminusdb.com/

[+] johbjo|5 years ago|reply
There can be many databases. They can run in parallel with the same SQL inputs. You can have provisions for this in your middleware, but there is also support for this in db replication: replaying SQL logs on test instances. It is possible to run quite sophisticated testing in this sort of setup.

Relying on ORM doesn't make a difference, since you are still sending DDL commands to a live database.

Using stored procedures, views, and triggers makes it easier to add abstraction between your middleware and the database structure.

But on the whole, I agree it's probably not recommendable to anyone feeling hesitant about it.

[+] ComputerGuru|5 years ago|reply
I fully agree and treat my databases the same way. However, there’s no reason a stored procedure couldn’t be treated the same way with a random weight determining which version of a stored procedure is run.

But I still wouldn’t.
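
For what it's worth, that random-weight dispatch could look something like this in Postgres (the wrapper name, the `_v1`/`_v2` functions, and the 10% split are all assumptions for illustration):

```sql
-- Hypothetical canary wrapper: ~10% of calls hit v2, the rest v1.
-- get_balance_v1 and get_balance_v2 are assumed to already exist
-- with identical signatures.
CREATE OR REPLACE FUNCTION get_balance(p_account bigint)
RETURNS numeric
LANGUAGE plpgsql AS $$
BEGIN
  IF random() < 0.10 THEN
    RETURN get_balance_v2(p_account);
  END IF;
  RETURN get_balance_v1(p_account);
END;
$$;
```

Rolling back is then one `CREATE OR REPLACE` that routes 100% of calls to v1 again.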

[+] qaq|5 years ago|reply
Put your functions in a schema, e.g. v1, v2 etc. You get the same thing you have with code. To test on prod data you take a snapshot and start an instance from that snapshot.
[+] zikzak|5 years ago|reply
I have taken various runs at stored procedure versioning and agree with you. The database is the ultimate sacred resource. Change things at your peril and test like crazy.
[+] mdoms|5 years ago|reply
I have worked on an application that heavily used sprocs for business logic. I would not wish it on anyone. It will seem fine (and actually it will seem better - faster and easier to manage) while your project is small.

In a few years when you need to make some major changes you will see why it was a bad idea, and it will be too late. Have fun modifying hundreds or thousands of sprocs when you need to make a large-scale change to the structure of your data, because SQL won't compose. Have fun modifying dozens of sprocs for each change in business logic, because SQL won't compose. I guarantee you will have mountains of duplicated code because SQL won't compose.

[+] cneu|5 years ago|reply
21 years ago, I joined a young team that wrote an ERP, with all business logic in PL/SQL. Customers are still amazed how fast these guys are going. 10 years ago I joined Oracle to work on APEX (created in 1999), which is a web app development and runtime environment, written in PL/SQL. We estimate that there are about 260K developers world-wide who are developing apps with APEX. All of their business logic is also on the server. Call me biased because I am working for the man, but my data points are completely different.
[+] jandrewrogers|5 years ago|reply
Back in the mid-90s, this was how many of the high-end web apps were built, using Oracle. Despite its limitations, and Oracle license fees (but back then you were paying for Sun servers anyway), it worked surprisingly well and was a reasonable model for many database-driven web applications. Some of these systems were used for mission-critical global operations handling enormous amounts of money and complex business. It had a lot of advantages versus other web app stacks at the time (commonly a mountain of Perl).

Oracle was happy to sell database licenses but made zero effort to develop this way of using the database as a first-class capability. They were astonished that it even worked as well as it did, they did not expect people to take it that far.

For some types of applications, something along these lines is likely still a very ergonomic way of writing web apps, if the tooling was polished for this use case.

[+] FlyingSnake|5 years ago|reply
Using a database as a backend comes up time and again, and developers are sharply divided over the concept. I feel that developers who started their careers before the advent of NoSQL DBs/MEAN stack/etc. had to go through the rigor of fine-tuning their SQL queries and witnessed the powers (and quirks) of a performant SQL DB. These devs are usually fans of using DB-as-a-backend. Personally, I am a big fan of PostgREST and I apply it whenever I find a use case that suits it. I find it quite pragmatic, and it just shines when it plays to its strengths. Developers should take a pragmatic look at such approaches and try to use the powerhouses that are SQL DB engines.

Shameless plug: I've written an article to showcase its philosophy and how easy it is to get started with PostgREST.

[^1]: https://samkhawase.com/blog/postgrest/postgrest_introduction...

[+] pjmlp|5 years ago|reply
I am on the stored-procedure side, and see no value in wasting network bandwidth and client resources on what the RDBMS should do itself.
[+] CoreFailure|5 years ago|reply
This is a wonderfully in depth tutorial, thank you for taking the time to write this up! I’ve debated using PostgREST for projects before and I think this will push me to use it on my next one.
[+] astine|5 years ago|reply
"“Database should store data not logic!” — well, no. This is called data-base, not data-bag. Databases are specialized engines to deal with, you guessed it, data. And they are extremely efficient at it. Not to mention - majority of the job backends do, is dealing with data. This is good for prototyping. Unfortunately more often than not — prototypes lands on production with updated connection string and debug=False."

Or, as I like to frame the issue, there's a difference between a datastore and a database. Datastores/persistence engines store your data in a manner that lets you get it back later. Think of a key-value store, for example. A database, however, assists you with the management of your data, including helping to ensure correctness of the data and tracking changes, among other things. For most systems, a lot of work is already managed in-database whether you like it or not.

[+] jrochkind1|5 years ago|reply
Everything old is new again. This was a common way to write client-server software in the pre-web world, clients talking directly to a central rdbms.

I guess the one true thing in software dev is that the cycle/pendulum keeps rotating/swinging. Often without people realizing it's swinging back instead of being brand new!

[+] sixdimensional|5 years ago|reply
I almost can't believe we are talking about this now. I mean that in the sense of, I have built applications every possible way - when you work in a big company and inherit one of everything, you have to do that often.

In my past, I built one application that was fully database driven, and at the time (2004-2005) it would have been difficult to pull off without stored procedures and table driven logic - fully maximizing the power of SQL - especially in the timeframe I had to do it (<3 months). I mean, I pushed the technology HARD (expert system for fraud detection that worked hand-in-hand with basic machine learning).

I will never forget how I was derided by people for that choice - even though, that system is still running today and working well, in the bowels of an acquirer. I mean, literally, I was derided to the point of getting imposter syndrome for feeling that I made a choice that others regarded as so limiting.

The truth is, I learned everything else about distributed applications and databases because of being derided in that way. Ultimately, I now know how to architect things many different ways and can choose when I feel it is appropriate. I also know not to let the negativity of others prevent success.

I am not sure what the moral of this story is. If you can build a system, and it serves its purpose well and for a long time, and it works and provides the needed value... it really may not matter. But you can choose sometimes, and if there is one truth it must be that there is not always only one way to do something. Try to pick the right tool for the job as best as you are able.

Don't think you're dumb just because other people don't like your idea. Keep your mind open, and be willing to learn, but if you can make it work, and prove it works, you are just as right as anyone else.

Maybe that's the moral of the story.

[+] codeulike|5 years ago|reply
Yes, it's funny to see this idea come back. It just needs a snappy name. MoSQL instead of NoSQL, something like that.
[+] kemiller2002|5 years ago|reply
Really, anything can be a good or a bad idea depending on the problem domain and how it's implemented. He doesn't go into how he made it work very closely, so it's easy to say it was a success. The really interesting thing to understand would be: why was it a success versus other approaches? What really made it a better choice? What was the team makeup? He says that it made the code easy to understand. To whom? It's interesting that the team made it work, but not enough to say, "Sure, let's try it without a lot more forethought."
[+] loldot_|5 years ago|reply
Exactly! If you're writing a C compiler, it would probably be a bad idea to have that logic in a database, though it could be fun to try. On the other hand, if your application is summarizing a lot of data by different dimensions, it would probably be a bad idea to transfer all that data over the network only to summarize it in an application layer. Most applications have a little bit of both, though, and so will need both an application layer and a database layer.
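
The summarize-in-the-database case is just an aggregate query: the server scans the millions of rows and ships back only the summary. A sketch, with hypothetical table and column names:

```sql
-- Ship back a few hundred aggregate rows instead of millions of raw ones.
-- The "sales" table and its columns are assumptions for illustration.
SELECT region,
       product,
       date_trunc('month', sold_at) AS month,
       sum(amount) AS revenue,
       count(*)    AS orders
FROM sales
GROUP BY region, product, month;
```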
[+] tacticaldev|5 years ago|reply
I'm currently part of a team trying to UNDO this very concept. It sounded great in the beginning, but after years of headaches and wasted cash; we're building the backend in something else. I wasn't part of the original team. Nor was I part of the decision to migrate off of the system. I just know that for my employer, it was a bad decision those many years ago...
[+] gnat|5 years ago|reply
Could you be more specific about the problems you're hoping to fix/benefits you're hoping to gain by dragging the backend logic out of the database?
[+] pjmlp|5 years ago|reply
Sure, because wasting network bandwidth on shipping needless data to the client is such a clever idea.
[+] loldot_|5 years ago|reply
What I've found to be the problem with this approach is that SQL does not lend itself very well to composability. Sure, you can make functions, views and stored procedures and compose them. But when you start actually composing queries out of these parts, that can lead to different execution plans with wildly different performance characteristics.

Also tooling around SQL, i.e. refactoring tools and debuggers, is not great - if even available at all.

[+] Brentward|5 years ago|reply
Not that this is an important part of the post, but in general parsing the output of ls like the author does in the for loop is a bad idea[1].

I think the one-liner would be better as

  for i in [0-8]*/*.sql; do psql -U <user> -h localhost -d <dbname> -f "$i"; done
or even better as something like

  find . -name "*.sql" -exec psql -U <user> -h localhost -d <dbname> -f {} \;
[1] http://mywiki.wooledge.org/ParsingLs
[+] RussianCow|5 years ago|reply
For those who looked at list item #9 ("no code versioning") and thought the author's makeshift solution to migrations seems awful, I highly recommend sqitch[0], an extremely simple but well thought out migrations tool that is platform- and database-agnostic. I'm using it on a side project with success, and haven't run into any major hurdles yet.

[0]: https://sqitch.org/

[+] claytonjy|5 years ago|reply
I'm using this in production for our core Postgres database and am quite happy with it. We've used it to "devops the database" in that all migrations are deployed via CI jobs triggered by git commits.

I do sometimes wonder if Liquibase or Flyway would be easier, but I love the encouragement to write _migration tests_, which do not always look like the unit tests you might write with pg_unit.

Something we struggle with is naming our migrations; the name you assign a migration is the only organizational control you have over it, and it becomes essentially impossible to introspect the current state of the database through the migrations alone.

[+] marcosdumay|5 years ago|reply
The one problem this technique has is that data and software normally have very different speed and correctness requirements. That means that data and software should evolve following different procedures, which heavily implies that you want different people working on them, or at least that you want people to look at them at different times.

For that, you really want independent access controls and CI tooling.

Of course, you can't separate much if you are a 1 or 2 people team at early stages of some project. And it may help you move faster.

But:

> "Rebuilding the database is as simple as executing one line bash loop (I do it every few minutes)"

This betrays a very development-centric worldview in which operations and maintenance don't even appear. You can never rebuild a database, and starting with the delusion that you can will harm you in the future.

[+] taffer|5 years ago|reply
For this reason you should always separate data and code and put them into separate schemas. In such a setup, deploying new code essentially means dropping and re-creating the code schema in a single transaction.
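
A sketch of that deploy, assuming tables live in a `data` schema and all functions/views in an `api` schema (both names are assumptions; in a PostgREST setup the `api` schema would be the one it exposes):

```sql
-- Deploying new code = atomically replacing the code schema.
-- Run from version control; clients never see a half-deployed state.
BEGIN;
DROP SCHEMA IF EXISTS api CASCADE;   -- drops only code, never table data
CREATE SCHEMA api;

CREATE FUNCTION api.active_users()
RETURNS SETOF data.users
LANGUAGE sql STABLE AS $$
  SELECT * FROM data.users WHERE deleted_at IS NULL
$$;

-- ...the rest of the functions and views in the code schema...
COMMIT;
```

Rolling back is the same script run from the previous git tag.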
[+] capableweb|5 years ago|reply
> You can never rebuild a database, and starting with the delusion that you can will harm you in the future.

That seems like a silly assumption. Who's to say you can't use databases that you can rebuild from scratch whenever you want? What about append-only logs built with CRDTs, among other techniques?

[+] kevincox|5 years ago|reply
The primary issue I have seen with stored procedures is how you update them. I would be curious how they manage that.

Generally when releasing new code you want to do a gradual release so that a bad release is mitigated. It would be possible by creating multiple functions during the migration and somehow dispatching between them in PostgREST but I would be interested to see what they do.

The other obvious concern is scaling, which was only briefly mentioned. In general the database is the hardest component of a stack to scale, and if you make it do more of the computation you are just adding more load. Not to mention that you may have trouble scaling CPU, RAM, and disk separately with them all being on a single machine.

[+] jandrewrogers|5 years ago|reply
Many scaling problems in databases are because the computation is not done in the database. Outside of exotic cases, good database engines are rarely computation-bound in a competently designed system. By pushing computation outside of the database, it tends to increase the consumption of database resources that actually are bottlenecks, typically effective bandwidth (both memory and I/O).

As a heuristic, moving computation to the data is almost always much more scalable and performant than moving the data to the computation. The root cause of most scalability issues is excessive and unnecessary data motion.

[+] taffer|5 years ago|reply
> The primary issue I have seen with stored procedures is how you update them. I would be curious how they manage that.

1. You put your stored procedures in git.

2. You write tests for your stored procedures and have them run as part of your CI.

3. You put your stored procedures in separate schema(s) and deploy them by dropping and recreating the schema(s). You never log into the server and change things by hand.

[+] capableweb|5 years ago|reply
> Generally when releasing new code you want to do a gradual release so that a bad release is mitigated

Make each stored procedure not depend on any other procedure. If you change one procedure, make sure that every existing procedure can work with both the current and the future version. Once compatible, start upgrading procedures one by one. Add in monitoring and automatic checking of the values during deployment, and you have yourself a system that can safely roll out changes slowly.

I'm not advising anyone to do this, it sounds horribly inefficient to me, but it's probably possible with the right tooling.

[+] marmaduke|5 years ago|reply
I think read replicas scale easily, so if it’s views and such then there is no problem. If you have to compute stuff during writes, then it’s hard, but probably solved by normalizing the data.
[+] dventimi|5 years ago|reply
Many commentators seem to be getting hung up on "stored procedures." That's understandable given the title of the original post, but it's also unfortunate because it mixes independent concerns: business-logic in the database, and using stored procedures. You can do the former without the latter, and in my experience it works better:

* custom data types and domains

* use schema and namespaces

* foreign key constraints

* check constraints

* default values

* views

* functions (not stored procedures)

* triggers

In my experience, you can get a clean, efficient, easily-understood data model and application with those ingredients, without ever having to touch a stored procedure, a looping construct, or a conditional statement.

[+] stmw|5 years ago|reply
Is it just me or is this literally the worst architecture idea that's popped up here? I've seen this done in a few large projects, including some extremely famous products, and it produced endless grief for the engineering team - and ultimately the organization. Among other things, it makes performance scaling much harder to achieve, makes software deployments really scary, and makes it almost impossible to migrate to a different database vendor. It is true that it can be done. But I really don't think you want to.
[+] aszen|5 years ago|reply
I'm very curious about this pattern, but does the tooling exist to make this feasible across a team? One of the big problems with SQL views is their lack of discoverability. The lack of tooling also plagues SQL-based systems; I would miss go-to-definition, find-all-references and sensible autocompletion. I am curious why no one has built suitable tooling around all this, because it is a great idea for lots of scenarios.
[+] sql_server_fan|5 years ago|reply
Redgate Software produces tools for SQL Server that solve some of these problems.
[+] throwaway_pdp09|5 years ago|reply
> Also typical database uses 96% of the time for logging, locking etc. and 4% percent for actual data operations

That's a strange claim, in my work it's always been 100% data ops.

> “Letting people connect directly to the database is a madness” — yes it is.

why?

> Killing database by multiple connections from single network port is simply impossible in real life scenario. You will long run out of free ports

Depends entirely on the workload. And it's very possible. All too entirely so.

[+] _benj|5 years ago|reply
In my experience I actually got A LOT more from this approach using postgraphile/Hasura in front of my DB and later moved to dosco/super-graph in order to add custom functionality that was kind of a pain to do on the DB.

I really liked the mixed approach of having the DB do everything that it can, with a light backend that can handle HTTP requests, API callbacks, image manipulation and whatever else.

[+] laudable-logic|5 years ago|reply
Maybe sorta off-topic, but does anybody have any stories to tell about successes/fails with the .NET CLR in MSSQL?

I worked on a project that made moderate use of this. Worked alright; biggest problem was convincing DBAs/IT to enable it.

https://en.m.wikipedia.org/wiki/SQL_CLR

[+] somurzakov|5 years ago|reply
I can't think of an example where CLR integration would be beneficial.

Probably integrating ML model inference written in ML.NET could be a use case, but we have SQL Server with ML Services and R/Python support now, so.

The problem with CLR is that you need to know and understand SQL engine internals in order to write good C# code for CLR integration; otherwise your CLR code will block the SQL engine.