
Fly Kubernetes

272 points | ferriswil | 2 years ago | fly.io

168 comments


dangoodmanUT|2 years ago

This is really exciting, but there are a few things they will certainly have to work through:

*Services:*

Kubernetes expects DNS records like {pod}.default.svc.cluster.local. To achieve this, they will need custom DNS records on the "pod" (Fly Machine) that resolve via their metadata. Not impossible, but something that has to be taken into account.

*StatefulSets:*

This has 2 major obstacles:

The first is dealing with disks. k8s expects that it can reattach disks to different logical pods when pods move (e.g. mapping an EBS volume to an EC2 node). The problem here is that Fly has a fundamentally different model: a volume is tied to one physical server. It means they either have to refuse to schedule a pod because they can't get the machine that the disk lives on, or not guarantee that the disk is the same. While this does exist as a setting currently, the former is a serious issue.

The second major issue is again with DNS. StatefulSets have ordinal pod names (e.g. {ss-name}-{0..n}.default.svc.cluster.local). While this can be achieved with their machine metadata and custom DNS on the machine, it means they either have to run a local DNS server to "translate" DNS records to the Fly nomenclature, or constantly update local services on machines to tell them about new records. Both will incur some penalty.
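The "translate" approach described above could, for instance, be a CoreDNS rewrite rule running on each machine. A hypothetical sketch (assuming an app named `myapp` on Fly's `.internal` private DNS, served at `fdaa::3`; the names here are illustrative, not anything Fly has announced):

```
# Hypothetical Corefile: map cluster.local queries onto Fly's .internal zone
cluster.local:53 {
    rewrite name regex (.*)\.default\.svc\.cluster\.local {1}.myapp.internal
    forward . fdaa::3
}
```

The rewrite alone doesn't solve the problem: `.internal` would still need per-pod (ordinal) records pointing at the right Machines, which is exactly the metadata plumbing described above.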

benpacker|2 years ago

Am I understanding correctly that because they map a “Pod” to a “Fly Machine”, there’s no intermediate “Node” concept?

If so, this is very attractive. When using GKS, we had to do a lot of work to get our Node utilization (the percentage of resources reserved on a VM actually occupied by pods) higher than 50%.

Curious what happens when you run “kubectl get nodes” - does it lie to you, or call each region one Node?

btown|2 years ago

GKE Autopilot is an attractive option here if you don't want to worry about node utilization and provisioning. Effectively you have an on-demand, infinitely-sized k8s cluster that scales up and down as you need new pods. Some caveats, but it's an incredible onramp if you're coming from Heroku or a similar PaaS and don't want to worry about the infrastructure side of things: GitHub Actions building images and deploying a Helm chart to GKE Autopilot is a remarkably friendly yet customizable stack. Google should absolutely promote it more than it does. https://cloud.google.com/kubernetes-engine/docs/concepts/aut...

kuhsaft|2 years ago

The node would be a virtual-kubelet. You can check out the virtual-kubelet GitHub repo for more info.

Interestingly, there are already multiple providers of virtual-kubelet. For example, Azure AKS has virtual nodes where pods are Azure Container Instances. There’s even a Nomad provider.

> So that’s what we do. When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine.

So probably a cluster per region. You could theoretically spin up multiple virtual-kubelets though and configure each one as a specific region.

> Because of kine, K3s can manage multiple servers, but also gracefully runs on a single server, without distributed state.

This would mean the control plane is on a single server, without high availability? Although I suppose there really isn’t any state stored, since they are just proxying requests to the Fly Machine API. But still, if the machine went down, your kubectl commands wouldn’t work.

chologrande|2 years ago

> Had to do a lot of work to get node utilization ... higher than 50%

How is this the scheduler's fault? Is this not just your resource requests being wildly off? Mapping directly to a "fly machine" just means your "fly machine" utilization will be low.

verdverm|2 years ago

> we had to do a lot of work to get our Node utilization ... over 50%

Same. A while back you had to install cluster-autoscaler and set it to aggressive mode; GKE has this option on setup now. Though I think anyone who's had to do this stuff knows that just using a cluster-autoscaler is never enough. I don't see this being different for any cluster; it's more a consequence of your workloads and how they are partitioned (if you're not partitioning, you'll have real trouble getting high utilization).
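For readers following along: the utilization argument hinges on the pod `resources` stanza, since the scheduler bin-packs on *requests*, not on actual usage. A minimal, generic manifest showing the knobs in question:

```yaml
# If requests are far above what the app actually uses, node utilization
# stays low no matter what the autoscaler does.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: app
      image: nginx   # placeholder image
      resources:
        requests:
          cpu: "250m"      # what the scheduler reserves on a node
          memory: "256Mi"
        limits:
          cpu: "500m"      # hard ceiling enforced at runtime
          memory: "512Mi"
```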

robertlagrant|2 years ago

I wonder how it copes with things like anti-affinity rules, where you don't want two things running on the same physical / virtual server for resilience reasons.
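The rule in question is the standard `podAntiAffinity` stanza; the open question on FKS is what `topologyKey` would even map to when every pod is its own Fly Machine. A generic example of the construct:

```yaml
# Standard k8s anti-affinity: keep replicas of app=web off the same node.
# With pod-per-Machine, it's unclear what "node" (topologyKey) means here.
apiVersion: v1
kind: Pod
metadata:
  name: web-replica
  labels:
    app: web
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: web
          topologyKey: kubernetes.io/hostname
  containers:
    - name: app
      image: nginx   # placeholder image
```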

arccy|2 years ago

if it is pod per vm, that would make it like EKS Fargate

arccy|2 years ago

is GKS some amalgamation of GKE and EKS

corobo|2 years ago

Is this still a limitation for Fly k8s?

> A Fly Volume is a slice of an NVMe drive on the physical server your Fly App runs on. It’s tied to that hardware.

Does the k8s have any kind of storage provisioning that allows pods with persistent storage (e.g. databases) to just do their thing without me worrying about it or do I still need to handle disks potentially vanishing?

I think this is the only hold-up that stops me actually using Fly. I don't know what happens if my machine crashes and is brought back on different hardware. Presumably the data is just not there anymore.

Is everyone else using an off-site DB like Planetscale? Or just hoping it's an issue that never comes up, w/ backups just in case? Or maybe setting up full-scale DB clusters on Fly so it's less of a potential issue? Or 'other'?

tptacek|2 years ago

Not speaking for the FKS case, but in general for the platform: when you associate an app with a volume, your app is anchored to the hardware the volume is on (people used to use tiny volumes as a way to express hard-locked region affinity when we were still using Nomad). So if your Fly Machine crashes, it's going to come back on the same physical the volume lives on.

We back up volumes to off-net block storage, and, under the hood, we can seamlessly migrate a volume to another physical (the way we do it is interesting, and we should write it up, but it's still also an important part of our work sample hiring process, which is why we haven't). So your app could move from one physical to another; the data would come with it.

On the other hand: Fly Volumes are attached storage. They're not a SAN system like EBS, and they're not backed onto a 50-9s storage engine like S3. If a physical server throws a rod, you can lose data. This is why, for instance, if you boot up a Fly Postgres cluster here and ask us to do it with only one instance, we'll print a big red warning. (When you run a multi-node Postgres cluster, or use LiteFS Cloud with SQLite, you're doing at the application layer what a more reliable storage layer would do at the block layer.)
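For reference, the app-to-volume association being described looks like this in `fly.toml` (a minimal sketch; `data` is a hypothetical volume name):

```toml
# fly.toml: mounting a Fly Volume anchors the Machine to the physical
# server that hosts the volume's NVMe slice.
app = "myapp"

[mounts]
  source = "data"        # volume name (e.g. created with `fly volumes create data`)
  destination = "/data"  # mount point inside the VM
```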

asim|2 years ago

And Fly becomes a standard cloud provider like everyone else. I think this transition is only natural. It's hard to be a big business without catering to the needs of larger companies, and that means operating many services, not individual apps.

tptacek|2 years ago

Nothing is changing for anybody who doesn't care about K8s. If you're not a K8s person, or you are and you don't like K8s much, you shouldn't ever touch FKS.

therein|2 years ago

I used Fly for some projects, I really like it.

But once again, for many of my projects, I still need my outbound IPs to resolve to a specific country. I can't have them all resolve to Chicago, US in nondeterministic ways.

I would be willing to pay an additional cost for this but even with reserved IPs, I am given IPs that are labelled as Chicago, US IPs by GeoIP providers even for non US regions.

verdverm|2 years ago

If they are reluctant and only do it because they have to, are they really the right vendor for managed k8s?

What about them makes for a good trade-off when considering the many other vendors?

tptacek|2 years ago

We're not a K8s vendor. We're a lower-level platform than that. If all you care about is K8s, and no part of the rest of our platform is interesting to you --- the global distribution and Anycast, the fly-proxy features, the Machines API --- we're not a natural fit for what you're doing.

We were surprised at how FKS turned out, which is part of why we decided to launch it as a feature and all of why we wrote it up this way. That's all.

grossvogel|2 years ago

I'm excited about this as a way to configure my Fly.io apps in a more declarative way. One of my biggest gripes about Fly.io is that there's a lightly documented bespoke config format to learn (fly.toml), and at the same time there's a ton of stuff you can't even do with that config file.

I love Kubernetes because the .yaml gives you the entire story, but I'd _really_ love to get that experience w/o having to run Kubernetes. (Even in most managed k8s setups, I've found the need to run lots of non-managed things inside the cluster to make it user-friendly.)

frenchman99|2 years ago

Probably good for people already used to Fly, or interested in Fly for other reasons, who could also use k8s?

Sometimes you just want to run k8s without thinking too much about it, without having all the requirements that GCP has answers to.

szundi|2 years ago

If their reluctance was based on valid reasons that they handled in a unique way, it might be good. In theory.

paxys|2 years ago

Maybe a good fit for someone who is reluctant to use Kubernetes but has to for whatever reason.

motoboi|2 years ago

There is a very high price to pay when going with your own scheduling solution: you have to compete with the resources google and others are throwing at the problem.

Also, there is the market for talent, which is non-existent for fly.io technology if it's not open source (I see what you did here, Google): you'll have to teach people how your solution works internally, and congratulations, now you have a global pool of 20 (maybe 100) people who can improve it (if you have really deep pockets, maybe you can have 5 PhDs). Damn, universities right now probably have classes about Kubernetes for undergrad students. Will they teach your internal solution?

So, if a big part of your problem is already solved by a gigantic corporation investing millions to create a pool of talented people, you'd better make use of that!

Nice move, fly.io!

nojvek|2 years ago

What if it’s really not that complicated, and by adding more people you make it more complex. So complex that you need even more people to maintain that complexity?

I love fly.io for rethinking some of the problems.

kuhsaft|2 years ago

How does this handle multiple containers for a Pod? In a container runtime k8s, containers within a pod share the same network namespace (same localhost) and possibly pid namespace.

The press release maps pods to machines, but provides no mapping of pod containers to a Fly.io concept.

Are multiple containers allowed? Do they share the same network namespace? Is sharing PID namespace optional?

Having multiple containers per pod is a core functionality of Kubernetes.
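For concreteness, this is the kind of textbook sidecar pod being asked about (a generic example; the image names are placeholders):

```yaml
# Both containers share one network namespace (same localhost), and
# shareProcessNamespace additionally gives them one PID namespace.
# How this maps onto a single Fly Machine is the open question.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  shareProcessNamespace: true   # optional in k8s, off by default
  containers:
    - name: app
      image: my-app:latest      # placeholder
      ports:
        - containerPort: 8080
    - name: log-shipper
      image: my-shipper:latest  # placeholder; talks to app over localhost
```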

remram|2 years ago

You can use mount namespaces, or even containers in your VM. Maybe that's how?

javaunsafe2019|2 years ago

Why would you do this? Sounds like an antipattern to me.

thowrjasdf32432|2 years ago

Great writeup! Love reading about orchestration, especially distributed.

> When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine.

Why a single machine? Is it because this single fly machine is itself orchestrated by your control plane (Nomad)?

> ...we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system based on eBPF). But the ideas are the same.

very cool, is this similar to how Cilium works?

nathancahill|2 years ago

Man, I just wish they'd work on stability. Fly.io is an amazing offering. But it's so buggy, it's almost more headache than it's worth trying to build PaaS-flavored software on it. Even the Fly docs are "buggy" since they mostly transitioned to v2 Machines but the docs are still a mix of Nomad and Machines.

There's so much power on the platform with Flycast, LiteFS and other clever ways to work with containers. If it was 90% stable I'd consider it a huge win.

rozenmd|2 years ago

I agree - I find if you pick the "mainstream" regions like IAD you get close to 100% uptime, like what you see from my 3rd-party status page here: https://flyio.onlineornot.com/

Once you start deploying in SIN/CDG etc you start to get really weird instability (and this is on v2 machines).

edude03|2 years ago

I'm confused about what this is actually offering (also very tired due to some flight problems; anyway)

To me, I'd imagine Kubernetes on Fly as running kind (Kubernetes in Docker) with Fly converting the Docker images to Firecracker images, OR a "normal" Kubernetes API server running on one machine and then using CAPI or a homegrown thing for spinning up additional nodes as needed.

So, what's the deal here? Why K3s + a virtual kubelet?

tptacek|2 years ago

You can certainly boot up your own K8s cluster, any way you'd like to, just by enlisting a bunch of Fly Machines and configuring them yourself. A Fly Machine is just a VM, and you have root in the VM. You can set up systemd, you can set up Docker, you can run kubelets on all your Machines.

The thought here is: Fly.io already does a lot of the things any K8s distribution would do. If you were to boot up a complete K8s distribution on your own Fly Machines, running oblivious to the fact that they were on Fly.io, you'd be duplicating some of the work we'd already done (that's fine, maybe you like your way better, but still, bear with me).

So, rather than setting up a "vanilla" K8s that works the same way it would if you were on, like, Hetzner or whatever, you can instead boot up a drastically stripped down K8s (based on K3s and Virtual Kubelet) that defers some of what K8s does to our own APIs. Instead of a cluster of scheduling servers synchronized with Raft, you just run a single SQLite database. Instead of bin-packing VMs with Docker and a kubelet, you just run everything as an independent Fly Machine.

We took the time to write about this because it was interesting to us (I think we expected a K8s to be more annoying for us to roll, and when it was easier we got a lot more interested). There are probably a variety of reasons to consider alternative formulations of K8s!

siliconc0w|2 years ago

Always look forward to reading the fly.io blog write-ups. As much as people hate it, K8s has become the defacto operating system for the cloud so it makes sense to support it.

0xbadcafebee|2 years ago

I like the discussion on scheduling. One of the things I've thought recently is that, since there's no one model of how an app or system should work, nor one network architecture, there shouldn't be one scheduler.

Instead, I think the system components should expose themselves as independent entities, and grant other system components the ability to use them under criteria. With this model, any software which can use the system components' interfaces can request resources and use them, in whatever pattern they decide to.

But this requires a universal interface for each kind of component, loosely coupled. Each component then needs to have networking, logging, metrics, credentials, authn+z, configuration. And there needs to be a method by which users can configure all this & start/stop it. Basically it's a distributed OS.

We need to make a standard for distributed OS components using a loosely coupled interface and all the attributes needed. So, not just a standard for logging, auth, creds, etc, but also a standard for networked storage objects that have all those other attributes.

When all that's done, you could make an app on Fly.io, and then from GCP you could attach to your Fly.io app's storage. Or from Fly.io, send logs to Azure Monitor Logs. As long as it's a standard distributed OS component, you just attach to it and use it, and it'll verify you over the standard auth, etc. Not over the "Fly.io integration API for Log Export", but over the "Distributed OS Logging Standard" protocol.

We've got to get away from these one-off REST APIs and get back to real standards. I know corporations hate standards and love to make their own little one-offs, but it's really holding back technological progress.

verdverm|2 years ago

You're basically describing Kubernetes and why it has become so popular

robertlagrant|2 years ago

> I know corporations hate standards and love to make their own little one-offs, but it's really holding back technological progress.

Corporations create standards all the time, either directly or through standards bodies that they also fund. You can already push logs with syslog, or transform them with Beats and then push them; you can already attach storage from elsewhere, etc. It's just often a bad idea for performance and data-movement cost reasons.

I don't see the major technological progress this holds back. And if technological progress is a measure of how much corporations hate standards, then by that logic, based on the last 50 years of utterly insane progress, they must love standards.

rileymichael|2 years ago

Having little experience with k3s, how big of a workload (“nodes” aka virtual kubelets, pods, crds, etc) can you have before saturating the non-HA control plane becomes a concern?

figassis|2 years ago

This looks interesting, but I run a bare-metal k8s cluster over WireGuard for independence. Not willing to rely on a nonstandard API/platform. If my current provider annoys me, I can shut down my nodes the next day. Probably could not do that on FKS.

tootie|2 years ago

This is impressive, but also seems to fly in the face of their raison d'etre. I don't even bother with k8s on AWS because it's too complex for even a mid-size operation. Isn't the point of PaaS to obscure complexity?

tptacek|2 years ago

We're not replacing Fly.io and the Fly Machines API and the Fly Launch stuff in `flyctl` with FKS. FKS is just there for people who want a K8s interface. If you're not interested in K8s at all, you shouldn't touch FKS.

swozey|2 years ago

Most of these PaaS are just abstracting their k8s away from you in the end anyway. But they'd never tell you that, they need to be able to switch back to Mesos or whatever the market heads to in 10 years without scaring customers.

Dowwie|2 years ago

Wouldn't it have cost less to enhance the Nomad scheduler rather than move to, and enhance, Kubernetes?

This aside, Fly is in a position to build its own alternative to K8s and Nomad from scratch, so maybe it will?

tptacek|2 years ago

We absolutely have not moved to K8s. We've just added a feature that lets you run K8s, in a particularly simple configuration, if K8s is what you want. If you weren't already interested in using K8s, you shouldn't touch FKS.

The ordinary way someone would boot up an app on Fly.io is to visit a directory in their filesystem with a Rails or Django or Express app or something, or a Dockerfile, and just type `flyctl launch`. No K8s will be involved in any way. You have to go out of your way to get K8s on Fly.io. :)

imjonse|2 years ago

They have for their infrastructure, as I understood from this and previous blogs. This is for their user-facing offering. It makes sense if people are using other cloud K8S solutions and want to migrate without rethinking too much of their existing architecture.

alpb|2 years ago

I kind of miss the point of this. If I'm reading this right, fly.io practically only exposes the Pods API, but Kubernetes is really much more than that. I don't know of any serious company that directly uses the Pods API to launch containers. So if their reimplementation of the Pods API is just a shim, and they're not going to be able to implement the ever-growing set of features in the Kubernetes Pod lifecycle/configuration (starting from /logs, /exec, /proxy...), why even bother branding it Kubernetes? Instead, they could do what Google does with Cloud Run (https://cloud.run/), which Fly.io is already doing?

I don't know why anyone would be like "here's a container execution platform, let me go ahead and use their fake Pods API instead of their official API".

tptacek|2 years ago

This is a good comment. More like this!

Right now, the immediate things you'd get out of using FKS are:

* The declarative K8s style of defining an app deployment, and some of the K8s mechanics for reconciling that declaration to what's actually running. We did most of this stuff before when we were backed on Nomad, but less of it now with Fly Machines. If you missed having a centralized orchestrator, here's one.

* Some compatibility with K8s tooling (we spin up a cluster, spit out a kubeconfig file, and you can just go to town with kubectl or whatever).

This is absolutely not going to let you do everything you can possibly do with K8s! Maybe we'll beef it up over time. Maybe not many people will use it, because people who want K8s want the entire K8s Cinematic Universe, and we'll keep it simple.

Mostly: we wrote about it because it was interesting, is all that's happening here.

I think you asked a super good question, and "I don't know, you might be right" is our genuine answer. Are there big things this is missing for you? (Especially if they're low-hanging fruit). I can (sort of) predict how likely we are to do them near term.

kuhsaft|2 years ago

I think there’s potential here.

It is Kubernetes, since they are running K3s as the control plane. It's not just an implementation of the Pod API; it's an implementation of the kubelet, which handles the logs/exec/etc. APIs. The rest of the Kubernetes API is part of the control plane on K3s.

The only major issue I see is persistent volume support, but persistent volumes in Kubernetes were always a bit flaky and I’ve always preferred to use an externally managed DB or storage solution.

gigapotential|2 years ago

Nice!

Was there an internal project name for this? Fubernetes? f8s? :D

tptacek|2 years ago

The internal project name was FKS. How could you do better than fks? :)

qdequelen|2 years ago

Do you handle high throughput volumes? I would need this for testing to host a database service at scale.

4ggr0|2 years ago

i definitely want to try this! never really worked with kubernetes, because it always seemed too complicated, for what i needed. after using fly.io for my first real web project in a while, they do seem to provide exactly what i want from a "hoster".

Kostic|2 years ago

Well, that's a surprise. Glad to see that the team is flexible and willing to change. :)

imjonse|2 years ago

Apples to oranges, but it has a similar vibe to when Deno added npm compat eventually.

znpy|2 years ago

> But, come on: you never took us too seriously about K8s, right?

What a strange way to admit they were wrong.

tptacek|2 years ago

Is that what we did here? You get that this is just a `flyctl` feature and some Dockerfiles, right? You could have built FKS yourself by forking `flyctl`.

xgbi|2 years ago

I have so many questions, it is a very good article!

My most important one is this: can I build a distributed k8s cluster with this?

I mean having fly machines in Europe, US and Asia acting as a solid k8s cluster and letting the kube scheduler do its job?

If yes then it is better than what the current cloud offerings are, with their region-based implementation.

My second question is obviously how storage is handled when my workload migrates from the US to Europe: do I still profit from NVMe speeds? Is it replicated synchronously?

Last but not least: does it support RWX (ReadWriteMany) semantics?

If all the answers are yes, kudos, you just solved many folk’s problems.

Stellar article, as usual.

k__|2 years ago

Wen custom OS?

netshade|2 years ago

I am a current Fly customer (personal and work), and have been happy with the service. Will likely be trying this out. That said, the marketing tone of this final part of the blog:

> More to come! We’re itching to see just how many different ways this bet might pay off. Or: we’ll perish in flames! Either way, it’ll be fun to watch.

is like nails on the chalkboard for me.

tick_tock_tick|2 years ago

Why? If you're using something like Fly, you should 100% always have a fallback plan ready. You are gambling by using smaller players to get cheaper services or some other benefit the big players don't offer, in exchange for the very real possibility that one random day they announce 30 days until they permanently shut down, with zero migration path.

I don't think it's in poor taste to acknowledge exactly what everyone should understand and be prepared for.

tptacek|2 years ago

Interesting! We're mostly not kidding about that. We launched in 2020 with a scheduler that looks a lot like how K8s works†. We ran into scaling issues. Instead of scaling a globally coordinated "eye in the sky" scheduler, like Nomad and K8s offer, we relaxed a constraint ("when you ask to run a job, we'll move heaven and earth to put it somewhere") and wound up with a totally different scheduling model (a market-based system that bids on resources, where requests to place jobs are all effectively fill-or-kill limit orders).

This was a bet. We're bullish about this bet! Even without K8s, having core scheduling be "less reliable" but with a simpler, more responsive interface puts us in a position to do some of the "move heaven and earth" work that K8s and Nomad do in simpler components (like: we can write Elixir code to drive the scheduler).

But it might not pay off! That's what makes it a bet.

(See: comments on this thread asking why we overengineered and wrote our own version of stuff; the expectation that you'd run a platform like Fly.io on standard K8s or Nomad is pretty strong!)
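The fill-or-kill idea above is easy to sketch. This is a toy illustration of the concept, not Fly.io's actual scheduler (all names are invented):

```python
# Toy "market-based" placement: workers bid on a resource request, and the
# request either fills immediately on some worker or fails outright --
# there is no central queue and no retry loop inside the scheduler.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Worker:
    name: str
    free_mb: int

    def bid(self, need_mb: int) -> bool:
        # A worker only bids on requests it can satisfy right now.
        return self.free_mb >= need_mb

def place(workers: List[Worker], need_mb: int) -> Optional[str]:
    # Fill-or-kill: take the first worker that bids, else fail immediately.
    for w in workers:
        if w.bid(need_mb):
            w.free_mb -= need_mb  # fill
            return w.name
    return None                   # kill: the caller decides what to do next

workers = [Worker("edge-1", 512), Worker("edge-2", 2048)]
print(place(workers, 1024))  # only edge-2 can fill this request
print(place(workers, 4096))  # nobody bids, so the request is killed, not queued
```

The "move heaven and earth" behavior then lives *outside* the scheduler, in ordinary application code that reacts to failed placements.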

lkjadflkj4|2 years ago

Some people have a cheeky sense of humor. Counterpoint: I'm ok with it.

roozbeh18|2 years ago

I made the same bet with Cloudways, which is now owned by DigitalOcean. They filled a gap for me, and I was OK if they decided to close shop. I'm glad it didn't go that direction, and they are part of a bigger company that also was once a small company but is now publicly traded. You make your bets...

hitpointdrew|2 years ago

> To keep things simple, we used Nomad, and instead of K8s CNIs, we built our own Rust-based TLS-terminating Anycast proxy (and designed a WireGuard/IPv6-based private network system based on eBPF).

That is quite the opposite of “simple”. That is, in fact, overly complex and over-engineered.

xwowsersx|2 years ago

How do you know their own Anycast proxy isn't simpler than K8s CNIs? Building something yourself isn't necessarily overly complex or over-engineered. Sometimes building a simple thing yourself is the way to simplicity, when the only available options are very heavy, overkill, or complex.

tptacek|2 years ago

What part of it is overly complex and engineered? Maybe you're right, but it's hard to respond without a better idea of what you think our problem domain was.

swozey|2 years ago

This is all very common platform/infrastructure stuff for any PaaS. Even more so as multi-tenant k8s (and NICs, and NVMe-oF, etc.) isn't exactly one of the most supported or talked-about things. Lots of secret sauce everywhere, but they have to do it in a lot of scenarios.

joshuamcginnis|2 years ago

Why should one use kubernetes? Or rather, at what point of an apps growth cycle does k8s become appropriate?

oceanplexian|2 years ago

Kubernetes is popular because it solves problems at a certain scale. It's not for super small environments because you need a number of infrastructure engineers to manage it. But if you have a few hundred or thousand employees and don't want to write your own orchestration, it makes sense.

That said, it's a questionable design choice when you get to a hyperscale environment, since all the primitives are extremely opinionated and have design and scalability issues with service discovery, networking, and so on. All the controllers had to be rewritten, we had to roll our own deployment system, our own service discovery system, our own load balancing, and so on. But if you reach this level, you're probably making a lot of money and can figure out how to solve your problems.

erulabs|2 years ago

Kubernetes is not really meant to assist apps themselves. It's a tool for organizations with multiple independent development teams that helps define a single source of truth for what's running where.

Kubernetes is a great fit for even extremely simple applications - assuming you have dozens to keep track of and dozens of developers who want to make changes to them.

jen20|2 years ago

> Or rather, at what point of an apps growth cycle does k8s become appropriate?

The real problem is that the point it becomes attractive to have something like Kubernetes is not too far from the point where Kubernetes becomes an overly-complex mess of disparate parts.

politelemon|2 years ago

I'd say, not in an app's growth cycle, but when an organization wants to manage and scale platforms for itself, on which it runs apps, is when k8s becomes appropriate. In other words, k8s is a platform builder.

thorawy7|2 years ago

I ditched k8s and imported an eBPF library into my project. When certain conditions are met I fork logic, and scale back as needed. I haz a v8-like engine built into my project.

Not needing a bloated black box sysadmin framework (aside from Linux itself, which is plenty bloated and over engineered) is a huge time saver. And the eBPF libs have a lot of eyes on them.

IMO sysadmin and devops are done for. They lasted this long to “create jobs”.

syrusakbary|2 years ago

This is one of the biggest footguns of a tech company I've seen in the last decade.

Time will tell if embracing the complexity of Kubernetes was a good play for them or not. But, in all honesty, I'm pretty sad to see this happening, although I'm sure they had their reasons.

jeromegn|2 years ago

We don't use k8s and you don't have to either. This is for current and future users who absolutely want k8s. We are a compute provider after all and making it easy to host a great variety of apps is good for our users.

whalesalad|2 years ago

Kubernetes is really epic and powerful if you actually take the time to understand it from first principles. Unfortunately people don't do this, and individuals without good networking/devops experience roll something half-baked out with a terrible deployment process, a mess of helm charts, etc... and it ends up being hated by everyone.

At FarmLogs (yc 12) we had a pretty righteous gitops (homegrown) kube platform running dozens of microservices. We would not have been able to move as quickly as we did and roll out so many different features without it. This was back when people had just started to adopt it. Mesos was still a contender (lmao). We were polyglot too - python/clojure mixture. Heck, we even ran an ancient climate model called APSIM that was built in c#/mono, required all kinds of ancient fortran dependencies etc and it worked like a charm on kube thanks to containers. We had dedicated internal load balancers behind our VPN for raw access to services and endpoints, like "microservice.internal.farmlogs.com" (this was before istio, fabric networks, all the incredible progress that exists now)

I recall Brendan Burns asking me to write up a blog post for the Kube blog about our success story, but unfortunately was so saddled with product dev work and managing the team that I never found time for it.

I will absolutely adopt K8s again one day (very soon) but you need to know how to harness its capabilities and deploy it correctly. Build your own Heroku that fits your business. Use the Kube API directly. It's really not hard. It gets hard due to all the crap in the ecosystem (helm, yaml files). Hitting the API directly means no YAML =)

I am stoked to see Fly offering this.

vidarh|2 years ago

I'm guessing this is one of the areas where sticking to a vision loses out over winning the most business in the short term.