
How I run my servers (2022)

430 points | ingve | 2 years ago | blog.wesleyac.com

174 comments

[+] matthews2|2 years ago|reply
> In order to provide isolation, I run each service as its own unix user account.

systemd's DynamicUser feature could save some time here. It can allocate a uid, then create directories for logs/state with the correct permissions.

https://0pointer.net/blog/dynamic-users-with-systemd.html
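
A unit using it might look something like this (a sketch; the directives are real systemd options, but the service name and binary path are made up):

```ini
# myapp.service - hypothetical sketch using DynamicUser
[Service]
ExecStart=/usr/local/bin/myapp
DynamicUser=yes
# systemd creates /var/lib/myapp and /var/log/myapp owned by the dynamic user:
StateDirectory=myapp
LogsDirectory=myapp
```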

[+] chlorion|2 years ago|reply
It's really easy to create new "system" users with systemd-sysusers too, if you need the uid to be persistent!

You just drop a small text file (often a single line) into /etc/sysusers.d/ with the information about the user, like username, home directory and whatever, and then invoke the sysusers command or service!
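
Something like this (a sketch; the user name and home directory are hypothetical):

```text
# /etc/sysusers.d/myapp.conf
u myapp - "myapp service account" /var/lib/myapp
```

Then run `systemd-sysusers` (or reboot) and the user exists, and it won't be removed or renumbered later.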

[+] cardamomo|2 years ago|reply
Thanks for sharing this! My server setup is similar to the one described in this article, minus isolating apps with separate users. I'll give dynamic users a go next time I tweak the setup.
[+] CHSbeachbum420|2 years ago|reply
Pretty standard to use separate service principals for each service/app. Should also use separate servers
[+] bob1029|2 years ago|reply
HTTP triggered cloud functions are my new favorite thing. They can evaporate complexity if you dance around the various vendors carefully enough. This is the only cloud-native abstraction that feels "end game" to me. I still haven't been able to deploy a cloud function and get the "runner" into a state where I'd have to contact support or issue arcane console commands. I've done well over 2000 deploys by now for just one app alone with a 100% success rate.

Performance is fantastic (using isolated app service plans in Azure), and I am completely over the ideological principles against per-request billing. Absolutely, you can do it cheaper if you own the literal real estate the servers are running inside of. Paying for flat colo fees makes per-request costs look ludicrous on the surface. But, achieving all of the other attributes of simple HTTP triggered functions in a DIY context is very challenging without also spinning up a billion dollar corporation and hiring 1k more people. Compliance, audits, etc are where it gets real super fast.

The "what about lock-in?" argument doesn't work for me anymore. HTTP triggers are a pretty natural interface to code against: "I've got a HTTP request for you to review, give me some HTTP response please". The only stuff that is vendor-specific are the actual trigger method signatures and specific contexts beyond HTTP, such as OIDC or SAML claims. You'd really have to go out of your way to design an HTTP trigger web app/API solution that is impossible to refactor for another vendor within a week or so.

If your business is a personal blog, then yeah I get it. It's more fun to buy a VM in Hetzner and get all artisanal about it. Also, if you are operating in a totally unregulated industry, perhaps you can make some stronger arguments against completely outsourcing the servers in favor of splitting hairs on margin vs complexity.

[+] susam|2 years ago|reply
I have a similar setup for my personal and project websites. Some similarities and differences:

* I use Linode VMs ($5/month).

* I too use Debian GNU/Linux.

* I use Common Lisp to write the software.

* In case of a personal website or blog, a static website is generated by a Common Lisp program. In case of an online service or web application, the service is written as a Common Lisp program that uses Hunchentoot to process HTTP requests and return HTTP responses.

* I too use systemd unit files to ensure that the website/service starts automatically when the VM starts or restarts. Most of my unit files are about 10-15 lines long.

* The initial configuration of the VM is coded as a shell script: https://github.com/susam/dotfiles/blob/main/linode.sh

* Project-specific or service-specific configuration is coded as individual Makefiles. Examples: https://github.com/susam/susam.net/blob/main/Makefile and https://github.com/susam/mathb/blob/main/Makefile

* I do not use containers. These websites have been running since several years before containers were popular. I have found that the initialization script and a Makefile have been sufficient for my needs so far.

* I use Nginx too. Nginx serves the static files as well as functions as a reverse proxy when there are backend services involved. Indeed TLS termination is an important benefit it offers. Other benefits include rate limiting requests, configuring an allowlist for HTTP headers to protect the backend service, etc.

* I have a little private playbook with a handful of commands like:

  curl LINK -o linode.sh && sh linode.sh
  git clone LINK && cd PROJECT && sudo make setup https
* The `make` targets do whatever is necessary to set up the website. This includes installing tools like Nginx, certbot, sbcl, etc., setting up Nginx configuration, setting up certificates, etc. Once the `make` command completes, the website is live on the world wide web.
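
For reference, a unit in this style might look like the following (a hypothetical sketch, not susam's actual file - service name, paths, and the start script are made up):

```ini
# mysite.service - hypothetical ~10-line unit for a Hunchentoot service
[Unit]
Description=Example Common Lisp web service
After=network.target

[Service]
User=mysite
WorkingDirectory=/opt/mysite
ExecStart=/usr/bin/sbcl --script /opt/mysite/start.lisp
Restart=on-failure

[Install]
WantedBy=multi-user.target
```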
[+] schemescape|2 years ago|reply
Which Common Lisp implementation do you use? If it’s SBCL, has memory usage been a problem? Edit: I see SBCL in one of your Makefiles.

I use a VPS with 512 MB of RAM, but each SBCL instance uses roughly 100 MB of RAM, so I can only have a couple services at once.

I’ve considered moving the lowest traffic services to CLISP, but it’s missing at least one feature I use (package-local nicknames).

[+] RunSet|2 years ago|reply
But however will you scale to 14 billion users when, one morning, waking up from anxious dreams, you discover that in bed you have been changed into a monstrous verminous 'berg?
[+] michaelsalim|2 years ago|reply
Over the years, I kept tweaking my setup and have now settled on running everything as a Docker container. The orchestrator is docker-compose instead of systemd. The proxy is Caddy instead of nginx. But same as the author, I also write a deploy script for each project I need to run. Overall I think it's quite similar.

One of the many benefits of using docker is that I can use the same setup to run 3rd party software. I've been using this setup for a few years now and it's awesome. It's robust like the author mentioned. But if you need the flexibility, you can also do whatever you want.

The only pain point I have right now is rolling deployments. As my software scales, a few seconds of downtime on every deployment is becoming an issue. I don't have a simple solution yet, but perhaps Docker Swarm is the way to go.

[+] elitan|2 years ago|reply
I do the same as you using Caddy.

To avoid downtime try using:

    health_uri /health
    lb_try_duration 30s
Full example:

    api.xxx.se {
      encode gzip
      reverse_proxy api:8089 {
        health_uri /health
        lb_try_duration 30s
      }
    }
This way, Caddy will buffer the request and give 30 seconds for your new service to get online when you're deploying a new version.

Ideally, during deployment of a new version the new version should go live and healthy before caddy starts using it (and kills the old container). I've looked at https://github.com/Wowu/docker-rollout and https://github.com/lucaslorentz/caddy-docker-proxy but haven't had time to prioritize it yet.

[+] 9dev|2 years ago|reply
I've built up the software stack of the startup I work for from the beginning, and directly went for Docker to package our application. We started with compose in production, and improved by using a CD pipeline that would upgrade the stack automatically. Over time, the company and userbase grew, and we started running into the problems you mention: Restarting or deploying would cause downtime. Additionally, a desire to run additional apps came up; every time, this would necessitate me preparing a new deployment environment. I dreaded the day we'd need to start using Kubernetes, as I've seen the complexity this causes first-hand before, and was really wary of having to spend most of the day caressing the cluster.

So instead, we went for Swarm mode. Oh, what a journey that is. Sometimes Jekyll, sometimes Hyde. There are some bugs that simply nobody cares to fix, some parts of the Docker spec that simply don't get implemented (but nobody tells you), implementation choices so dumb you'll rip your hair out in anger, and the nagging feeling that Docker Inc employees seem incapable of talking to each other, thinking things through, or staying focused on a single bloody task for once.

But! There is also much beauty to it. Your compose stacks simply work, while giving you opportunities to grow in the right places. Zero-downtime deployments, upgrades, load balancing, and rollbacks work really well if you care to configure them properly. Raft is as reliable in keeping the cluster working as everywhere else. And if you put in some work, you’ll get a flexible, secure, and automatically distributed, self-service platform for every workload you want to run - for a fraction of the maintenance budget of K8s.

Prepare, however, to put some work into getting your deployment scripts right. I've spent quite a while building something in Python to convert valid Docker-spec compose files to valid Swarm specs, update and clean up secrets, and expand environment variables.

Also, depending on your VPS provider, make sure you configure network MTU correctly (this has shortened my life considerably, I’m sure of it).

[+] efrecon|2 years ago|reply
I do the same. Swarm is the natural next step since you already have compose files, but I've made the choice that it's not worth it until you hit scaling issues (as in many customers/users).
[+] dirkhe|2 years ago|reply
I built a similar setup, but I don't like pushing the images with docker save and docker import over ssh. Do you run your own registry?
[+] BossingAround|2 years ago|reply
How often do you rebuild your containers?
[+] arun-mani-j|2 years ago|reply
My physical server:

Podman pods (which contain the PostgreSQL database and the app), all running on localhost on ports > 5000, and Caddy running on 443 as a reverse proxy.

I use systemd Timer to dump all the databases at 4:55 PM in a directory. Then there is DejaDup [1] which automatically backs up $HOME (with no cache files of course) at 5 PM daily to external HDD. This backup includes the database dumps.

The OS is Debian with GNOME Core [2] and a firewalld rule to allow only 80, 443 and a customized SSH port. SSH is key-based with no password auth.

The most boring way but it just works :D

1 - https://flathub.org/apps/org.gnome.DejaDup

2 - https://packages.debian.org/bookworm/gnome-core
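
The timer half of a setup like this might look as follows (a sketch with hypothetical names; OnCalendar=16:55 corresponds to the 4:55 PM dump time mentioned above, and the dump command is just one plausible choice):

```ini
# db-dump.timer (hypothetical) - activates db-dump.service daily at 4:55 PM
[Unit]
Description=Daily database dump

[Timer]
OnCalendar=*-*-* 16:55:00
Persistent=true

[Install]
WantedBy=timers.target

# db-dump.service (hypothetical) - the matching oneshot unit
[Unit]
Description=Dump all databases to a directory picked up by the backup

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'pg_dumpall > /home/user/db-dumps/all.sql'
```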

[+] reidrac|2 years ago|reply
Out of curiosity, why do you run GNOME on your server?
[+] AlexITC|2 years ago|reply
Any reason to use systemd timers instead of cron jobs?
[+] hk__2|2 years ago|reply
Nowadays I just use one server with a Dokku setup. It’s easy to manage, easy to deploy for devs (just git push, the Heroku way), and it has a lot of plugins so it takes max 10s to add a database or set HTTPS up.
[+] benkaiser|2 years ago|reply
I use this method also, super convenient to be able to git push. Adding apps and databases, managing environment variables, and managing domains are all very straightforward.
[+] ValtteriL|2 years ago|reply
As a fan of simple setups, this looks enjoyable to work with! It is probably good enough for 99% of services.

I think I would use Ansible to set up the servers, and use it for the deployment script as well.

This would document the servers and perhaps make the deployment script simpler.

I wouldn't shy away from accessing the servers manually when debugging or checking things, though.

[+] oaiey|2 years ago|reply
People sometimes forget that CI/CD and effective server management was common practice before the cloud :)
[+] bryancoxwell|2 years ago|reply
> The server software is written in Rust. It's statically linked, and all of the html, css, config, secrets, etc are compiled into the binary.

I’ve recently taken to doing this in Go and absolutely love how easy it makes writing and deploying software that depends on static files.

[+] mdtusz|2 years ago|reply
Including secrets in the compiled binary still seems questionable - using env variables or a config file is the "standard" way to handle secrets, and although it adds another step before you can run it, it avoids the case of sharing your binary with someone and forgetting that you had compiled in some secret that goes unnoticed. Unpacking a binary to find strings is pretty trivial.

Having the static frontend assets baked in along with a default config is a huge boon though.
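
To the point about `strings`: a toy demonstration of how trivially an embedded secret can be recovered (the fake "binary" and the secret here are made up):

```shell
# Fake a binary blob with an embedded secret, then recover it with strings(1).
printf 'demo\0bin\0API_KEY=hunter2\0' > /tmp/fakebin
strings /tmp/fakebin | grep API_KEY
# prints: API_KEY=hunter2
```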

[+] d3nj4l|2 years ago|reply
I am not a fan of Go, but I find myself using it for this reason. Doing it with Rust - especially cross-compiling from Mac to Linux - is relatively painful, while with Go it is trivial and built into the go tool. It makes it so, so easy to remove any friction from finishing and deploying a side project.
[+] zX41ZdbW|2 years ago|reply
I'm doing the same way for https://play.clickhouse.com/play?user=play

But there is one question. The article says:

> I get my HTTPS certs from Let's Encrypt via certbot — this handles automatic renewal so I don't have to do anything to keep it working.

But I'm using a cross-region setup with two servers and geo-DNS. With this setup, certbot only works for the server located in the US, and I have to manually copy the certificates to the server in Europe. Any idea how to overcome this?

PS. Read about ClickHouse Playground here: https://ghe.clickhouse.tech/
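
One possible approach (a sketch, not something from the article): certbot runs any executable placed in /etc/letsencrypt/renewal-hooks/deploy/ after each successful renewal, so the US server can push fresh certificates automatically. The EU hostname and the reload command are hypothetical:

```sh
#!/bin/sh
# /etc/letsencrypt/renewal-hooks/deploy/sync-certs (hypothetical host/paths)
# Runs on the US server after each successful renewal.
rsync -az --delete /etc/letsencrypt/live/ eu-server:/etc/letsencrypt/live/
rsync -az --delete /etc/letsencrypt/archive/ eu-server:/etc/letsencrypt/archive/
ssh eu-server 'systemctl reload nginx'
```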

[+] robmccoll|2 years ago|reply
Yes! I do the same. I serve my web applications from the same statically compiled service that serves the backend API. In CI, I run an npm build process then embed the output. Makes running a local test or demo instance a snap.
[+] AlexITC|2 years ago|reply
Interesting post. I liked the deploy script that keeps the app versioned on the server, which is helpful for doing rollbacks.

I have been running a similar setup for many years with some differences:

1. Use `EnvironmentFile` on systemd to load environment variables instead of bundling secrets into the binary.

2. Set `LimitNOFILE=65535` on the service to avoid reaching the file open limit on the app.

3. Set `StandardError=journal` and `StandardOutput=journal` so that `journalctl` can display the app logs.

4. Use postgres instead of sqlite; DO takes regular backups for me, and postgres maintenance is almost nil for simple apps.

5. Nginx can have password-protected endpoints, which are useful to expose the app logs without having to ssh into the VM.

6. Nginx can also do pretty good caching for static assets + API responses that barely change, this is very helpful for scaling the app.

Lastly, I use Ansible, but I'm reconsidering whether it's worth it; replacing it seems simple, and I'd be able to keep a single deploy file that runs faster than Ansible.
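
Pulled together, points 1-3 above might look like this in a unit file (a sketch; the paths are hypothetical):

```ini
[Service]
EnvironmentFile=/etc/myapp/env
LimitNOFILE=65535
StandardOutput=journal
StandardError=journal
ExecStart=/opt/myapp/bin/server
```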

[+] flagged24|2 years ago|reply
Once every 2 or 3 years I configure a new VPS with the latest Ubuntu LTS server release and install the latest PostgreSQL, NGINX, Redis, and Node.js. No containers, just standard Linux user accounts for each app. Pretty boring to be honest, but I don't have a problem that requires more complexity. I once tried a more complex distributed approach with load balancers and multiple VPN providers. Turned out the added complexity was the cause of the instability and downtime.
[+] scottmas|2 years ago|reply
No one is talking about redundancy, though. I love setups like this, but prod environments need robust forms of redundancy. Cloud Run, k8s, and their ilk are extremely distasteful, I'll grant you (the added complexity and cost are almost never worth it - and don't get me started on the painfully slow prod debug cycles...), but their redundancy and uptime just can't be beat with a setup like this.

Also, none of the solutions discussed here gracefully handle new connections on the new service while waiting for all connections to terminate before shutting down the old service. Maybe some of the more esoteric Ansible setups do, idk.

I TRULY want the simplicity of the setups discussed here, but I can't help but think it's irresponsible to recommend them in non-hobbyist scenarios.

[+] adamckay|2 years ago|reply
You have to decide whether the complexity and cost of a fully redundant system is worth it and consider it against what your SLA is, especially if your redundancy increases the risk of something going wrong because of that extra complexity.

From personal experience in B2B web apps, a lot of sales/business MBA types will say they need 100% uptime, but what they actually mean is that it needs to be available whenever their customers' users want to access it - and those users are business users who work 9-5, so there's plenty of scope for the system to be down (either due to genuine outage or maintenance/upgrades).

You've possibly also got the bonus that the people who use the app are different from the people who pay for it, so you've got some leeway: your system can blip for a minute and have requests fail (as long as there's no data loss), and that won't get reported up the management chain of the customer, because hitting F5 30 seconds later springs it back into life, and so they carry on with their day without bothering to fire off an email or walk over to their boss's desk to complain that the website was broken for a second.

At a previous company we deployed each customer on their own VM in either AWS or Azure, with the app and database deployed together. It was pretty rare for a VM to fail, and when it did, the cloud provider automatically reprovisioned it on new hardware, so as long as you configure your startup scripts correctly and they work quickly, you might only be down for a few minutes. It was incredibly rare for an engineer to have to manually intervene, and because our setup was very simple we could nuke a VM, spin up another one, deploy the software back onto it, and be up and running again in under 30 minutes, which to us was worth the reduced costs.

[+] AlexITC|2 years ago|reply
> No one is talking about redundancy though. I love setups like this but prod environments need robust forms of redundancy

Not really, there are many kinds of apps that don't need such redundancy.

> Also, none of the solutions discussed here gracefully handle new connections on the new service while waiting for all connections to terminate before shutting down the old service. Maybe some of the more esoteric Ansible do idk.

I have dealt with this in code with shutdown hooks on the server: wait for existing requests to finish processing and reject new requests; clients will just end up retrying. Not all apps can accept this, but many can.

[+] paulkre|2 years ago|reply
Is there any reason not to use Docker instead of systemd? I like managing services with a simple docker-compose.yml on my server. It has worked great so far but I wonder if there are some downsides that I am not aware of. Performance doesn’t seem to be an issue, right?
[+] sgarland|2 years ago|reply
My self-hosted servers are Debian on clustered Proxmox. I bake the images periodically with Ansible and Packer.

I used to have quite a few of them, then I shifted to K8s (or k3os, specifically), so now the only VMs other than the K8s nodes are my NAS, backup target, and a dev server. However, since Rancher has abandoned k3os and it’s forever stuck at upstream 1.21, I’m in the process of switching to Talos Linux for K8s. I have Proxmox running Ceph for me to provide block storage to pods.

My blog was running in a tiny EC2 with a Bitnami multi-WordPress AMI, but since everyone else sharing it with me quit, I shifted that out to GitHub Pages + Hugo. So far I like that quite a bit better, plus it’s free.

[+] kebsup|2 years ago|reply
My default for websites that don't require a database is a Docker image + Google Cloud Run. Costs almost nothing, easy clickops deployment from GitHub, managed HTTPS, reasonably fast cold starts.
[+] kebsup|2 years ago|reply
Just to give some specific numbers:

- 40 visits a day
- costs 0.01 USD per month
- cold start time: 350 ms
- however, request latencies are 99%: 120 ms, 95%: 85 ms, 50%: 5 ms
- there seems to be an "idle" instance like 80% of the time

The website with source: https://github.com/PetrKubes97/ts-neural-network
[+] ngshiheng|2 years ago|reply
Generally speaking, isn't using a VPS a lot less expensive than a managed service like Cloud Run? I'm assuming the "websites" don't need to be available 24/7 (hence the cold start is fine)?
[+] strzibny|2 years ago|reply
My setup is also kept simple and "basic."

Digital Ocean, Rocky Linux or Fedora, systemd services, Bash. I usually run Rails with PostgreSQL. I might use containers more going forwards although I haven't so far.

I wrote Deployment from Scratch exactly to show how to deploy with just Bash.

[+] ashishb|2 years ago|reply
For web services, I would recommend Google Cloud Run, Azure Container Instances, or AWS Fargate for running containers directly. In most cases the price per service would be much lower than $5/month - https://ashishb.net/tech/how-to-deploy-side-projects-as-web-...
[+] stephenr|2 years ago|reply
The Google Cloud pricing calculator says anything but the tiniest configuration (256M memory) has a minimum $10/month charge just to exist.

A single-instance container that runs 24/7 for a month with similar cpu/memory to the $6 droplet is $30/month, before you factor in network costs.

[+] zokier|2 years ago|reply
Idk about other clouds, but AWS Fargate pretty much requires an ELB, which adds an annoying fixed cost, so for tiny services bare EC2 can be cheaper. Maybe you can amortize the ELB cost over many services, but it's still something to take into account.
[+] raybb|2 years ago|reply
Slightly related: is it feasible to run a syncthing node on something like Cloud Run with persistent storage attached? If you have an always-on computer then it doesn't make sense, but if you just have a laptop and a phone that only sync now and then, it seems like it could work - I haven't seen anyone talk about it, though.

One of the motivating factors is I had a cheap VPS as my syncthing node and it just stopped working one day and won't boot. I haven't had time to debug it and find out exactly why.

[+] quickthrower2|2 years ago|reply
I am about to embark on this myself. I was tossing up between DO's app platform (good: no server admin; bad: lock-in) or just renting a VM like this. This pushes me towards the VM.

Setting up a Python server environment seems to be hard work, with lots of steps (gunicorn and all that), but that said, they make the point about using Docker. So maybe Docker Compose could take a lot of the pain out of it.

[+] riskable|2 years ago|reply
Apache Libcloud supports DigitalOcean:

https://libcloud.readthedocs.io/en/stable/compute/drivers/di...

So as long as you don't mind writing your deployment scripts in Python, you can make them reasonably portable (though honestly, every provider has a little bit of quirkiness that needs to be worked around, but it's usually trivial stuff).

Should solve 95% of that "lock in" problem.

[+] AlexITC|2 years ago|reply
Go for it! It isn't as complex as it seems, and then you can decide whether it's worth it.

One advantage from DO is the regular backups.

Like you said, you can go for executing docker compose on the server if you want to do it fast.

[+] cpursley|2 years ago|reply
Check out render.com as well.
[+] trustingtrust|2 years ago|reply
Is there a way to buy droplets for a year at a time? Like $40 for a year would be a sweet deal for the $4 droplet. Especially for things like Pi-hole and WireGuard.
[+] xmodem|2 years ago|reply
If your goal is to minimise costs, some of the cheaper providers that have offers on https://lowendbox.com/ will have reasonable annual discounts.
[+] sgarland|2 years ago|reply
No idea about DO, but for years I bought a t3a.micro for about $30/year. If you committed to 3 years it got even cheaper.