Most applications as a whole are absolutely stateful. Individual components of them might not be (app servers are stateless with the DB/Redis containing all state), but the whole app from an external client's perspective is stateful.
If we're talking about reliability/outage recovery, we're considering the application as one single unit visible from the external client's perspective - so everything including the DB (or equivalent stateful component) must be redundant.
Sadly this is also where a lot of cloud-native tooling and best practices fall short. There are endless ways to run stateless workloads redundantly, but stateful/CAP-bound workloads seem to be ignored/handwaved away.
I've seen my fair share of stacks that do the right thing for the easy/stateless parts (redundancy, infinite horizontal scalability), but everyone kinda ignores the elephant in the room: the CAP-bound primary datastore that everything else depends on. It isn't horizontally scalable, and its failover/replication behavior is ignored, misunderstood and untested. They only get away with it because modern HW is reliable enough that outages and failovers are rare, so the misunderstood/unexpected/undefined behavior during those windows flies under the radar.
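To make the "untested failover" point concrete, here is a toy sketch of what can go wrong. It assumes a single primary with one asynchronously replicated replica; the `Node` and `AsyncReplicatedStore` classes are made up for illustration and are not modeled on any particular database. The point is only that a write can be acknowledged to the client and still be missing after the replica is promoted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical datastore node holding an ordered log of writes."""
    name: str
    log: list = field(default_factory=list)


class AsyncReplicatedStore:
    """Toy primary/replica pair with asynchronous replication.

    Writes are acknowledged as soon as the primary has them; the replica
    only catches up when replicate() runs.
    """

    def __init__(self):
        self.primary = Node("primary")
        self.replica = Node("replica")

    def write(self, value):
        self.primary.log.append(value)
        return True  # acknowledged before the replica has seen it

    def replicate(self):
        # Ship whatever the replica is still missing.
        missing = self.primary.log[len(self.replica.log):]
        self.replica.log.extend(missing)

    def failover(self):
        """Promote the replica; return acknowledged writes that were lost."""
        lost = self.primary.log[len(self.replica.log):]
        self.primary = self.replica
        self.replica = Node("new-replica")
        return lost


def simulate():
    store = AsyncReplicatedStore()
    store.write("order-1")
    store.write("order-2")
    store.replicate()        # replica now has order-1 and order-2
    store.write("order-3")   # acknowledged, not yet replicated
    lost = store.failover()  # primary dies before the next replicate()
    return lost, list(store.primary.log)


if __name__ == "__main__":
    lost, surviving = simulate()
    print("lost after failover:", lost)
    print("surviving on new primary:", surviving)
```

In this toy model the client was told `order-3` succeeded, yet the promoted replica has never heard of it. Whether a real setup behaves like this depends on how its replication is configured, which is exactly the part that tends to go untested.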
That’s a pretty pedantic interpretation of the word application. In the context of the software most teams own, and may decide to run on a single host vs. multiple hosts, most applications are absolutely stateless. Most applications outsource state to another system, like a relational database, a managed NoSQL store, or an object store.
And so no, most teams don’t need to worry about the hard problems you bring up.
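A rough sketch of the "outsourced state" setup being described, for anyone following along. `ExternalStore` is a hypothetical in-memory stand-in for a database or Redis, and `AppServer` is an invented handler class; neither is a real library API.

```python
class ExternalStore:
    """Stand-in for a database or Redis; hypothetical, in-memory only."""

    def __init__(self):
        self._data = {}

    def incr(self, key):
        # Increment and return the counter stored under key.
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


class AppServer:
    """Keeps nothing between requests; all state lives in the store."""

    def __init__(self, store):
        self.store = store

    def handle_visit(self, user):
        return self.store.incr(f"visits:{user}")


def demo():
    store = ExternalStore()
    a, b = AppServer(store), AppServer(store)
    counts = [a.handle_visit("alice"), b.handle_visit("alice")]
    # Replacing an app server loses nothing, because it held no state.
    replacement = AppServer(store)
    counts.append(replacement.handle_visit("alice"))
    return counts
```

Any `AppServer` instance can be thrown away and replaced without the visit count resetting, which is what makes that tier easy to run on multiple hosts. The count itself still lives in exactly one place, the store.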
Is it really an application if it’s not stateful? Maybe you’re managing the state client-side which makes it easier but I wouldn’t call a plain website an application, or am I missing something?
At the smallest level, even every byte of an in-flight HTTP request is still state. What counts as state, and for that matter as "uptime", really depends on what the application/service ultimately does and what the agreement/SLA with the end customer is.
The correct high-availability solution should take business requirements into account, and there is no silver bullet. Running everything on a $5 VPS is no silver bullet, but neither is your typical "cloud-native" "best practice" stack that everyone keeps cargo-culting, which often leads to unnecessary cost while leaving many hard questions (such as replicating CAP-bound stateful databases) unanswered.
Nextgrid|1 year ago
ardel95|1 year ago
echoangle|1 year ago
Nextgrid|1 year ago