top | item 33187996

_jezell_ | 3 years ago

The implementation makes some weird choices, like rebuilding a bunch of services (DNS, certs) and a weird dependency on SQLite. I wish people would stop reimplementing Kubernetes and just build on top of it.

I think "per-user" is probably the wrong killer feature for something like this. There's much more potential in shared distributed processes that support multiple users (chat, CRDT/coauthoring). It appears the underlying layer can probably do that.

In any case, super cool idea, and I hope something like this lands in the serverless platforms from all the major cloud providers. It's always been mind-blowing to me that Google Cloud Functions supports websockets without allowing you to route multiple incoming connections from different users to the same process. That simple change would unlock so many useful scenarios.

paulgb | 3 years ago

Thanks for taking the time to look through the architecture. There are definitely some choices that would have seemed weird to me when we set out to build this, but that we did not make lightly.

We actually initially built this on Kubernetes, twice. The MVP was Kubernetes + nginx where we created pods through the API and used the built-in DNS resolver. The post-MVP attempt fully embraced k8s, with our own CRD and operator pattern. It still exists in another branch of the repo[1].

Our decision to move off came because we realized we cared about a different set of things than Kubernetes did. For example, cold start time generally doesn’t matter that much to a stateless server architecture (k8s’ typical use), but is vital for us because a user is actively waiting on each cold start. Moving away from k8s let us own the scheduling process, which helped us reduce cold start times significantly. There are other things we gain from it, some of which I’ve talked about in this comment tree[2]. I will say, it seemed like a crazy decision when I proposed it, but I have no regrets about it.

The point of sqlite was to allow the “drone” version to be updated in place without killing running backends. It also allows (but does not require) the components of the drone to run as separate containers. I originally wanted to use LMDB, but landed on sqlite. It’s a pretty lightweight dependency, it provides another point of introspection for a running system (the sqlite cli), and it’s not something people otherwise have to interact with. I wrote up my thought process for it at the time in this design doc[3].
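(Editorial aside: the pattern described above can be sketched in a few lines. This is a hypothetical illustration only; the table name, columns, and file path are made up and are not Plane's actual schema. It just shows how a SQLite file in WAL mode lets an old supervisor process hand running-backend state to its replacement without restarting anything it manages.)

```python
import sqlite3

def open_state(path: str) -> sqlite3.Connection:
    """Open (or create) the shared state database.

    WAL mode allows a new process to read the state while the old
    one is still writing, easing handover during an in-place upgrade.
    """
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS backends ("
        " id TEXT PRIMARY KEY,"
        " container_id TEXT,"
        " status TEXT)"
    )
    return conn

# The old "drone" records a running backend, then exits for an upgrade...
old = open_state("drone.db")
old.execute(
    "INSERT OR REPLACE INTO backends VALUES (?, ?, ?)",
    ("backend-1", "container-abc", "running"),
)
old.commit()
old.close()

# ...and the upgraded drone re-adopts it from the shared database.
new = open_state("drone.db")
rows = new.execute("SELECT id, status FROM backends").fetchall()
print(rows)  # [('backend-1', 'running')]
new.close()
```

The same file is also the introspection point mentioned above: `sqlite3 drone.db "SELECT * FROM backends"` works against a live system with no extra tooling.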

You’re right about shared backends among multiple users being supported by Plane. I use per-user to convey that we treat container creation as so cheap and ephemeral you could give one to every user, but users can certainly share one and we’ve done that for exactly the data sync use case you describe.

[1] https://github.com/drifting-in-space/plane/tree/original-kub...

[2] https://news.ycombinator.com/item?id=32305234

[3] https://docs.google.com/document/d/1CSoF5Fgge_t1vY0rKQX--dWu...

POPOSYS | 3 years ago

Hi Paul, thanks for your explanation. You should add that to the documentation, e.g. in a chapter "Why not K8S?".

Also, you should give some advice about how to deploy when the default for deploying apps in an organization is K8S, which is not too exotic nowadays. Will Plane need its own cluster? Does it run on top of K8S? What is the relation to K8S in general in a deployment scenario?

THANKS!

schainks | 3 years ago

It's funny how SQLite gets so much flak, but every time I've used it in production, it just _worked_.