Using Rust in non-Rust servers to improve performance

[+] jchw|1 year ago|reply

Haha, I was flabbergasted to see the results of the subprocess approach, incredible. I'm guessing the memory usage being lower for that approach (versus later ones) is because a lot of the heavy lifting is being done in the subprocess which then gets entirely freed once the request is over. Neat.

I have a couple of things I'm wondering about though:

- Node.js is pretty good at IO-bound workloads, but I wonder if this holds up as well when comparing e.g. Go or PHP. I have run into embarrassing situations where my RiiR adventure ended with less performance against even PHP, which makes some sense: PHP has tons of relatively fast C modules for doing some heavy lifting like image processing, so it's not quite so clear-cut.

- The "caveman" approach is a nice one just to show off that it still works, but it obviously has a lot of overhead just because of all of the forking and whatnot. You can do a lot better by not spawning a new process each time. Even a rudimentary approach like having requests and responses stream synchronously and spawning N workers would probably work pretty well. For computationally expensive stuff, this might be a worthwhile approach because it is so relatively simple compared to approaches that reach for native code binding.

[+] tln|1 year ago|reply

The native code binding was impressively simple!

7 lines of rust, 1 small JS change. It looks like napi-rs supports Buffer so that JS change could be easily eliminated too.

[+] sunshowers|1 year ago|reply

Depends on the situation, but posix_spawn is really fast on Linux (much faster than the traditional fork/exec), and independent processes provide fault isolation boundaries.

[+] VMG|1 year ago|reply

> You can do a lot better by not spawning a new process each time. Even a rudimentary approach like having requests and responses stream synchronously and spawning N workers would probably work pretty well

And with just a tiny bit of extra work you can give the worker an http interface.... Wait a minute.,.

[+] unknown|1 year ago|reply

[deleted]

[+] tialaramex|1 year ago|reply

Caveman approach has several nice features - I think I'd be tempted even if it didn't have better performance.

[+] eandre|1 year ago|reply

Encore.ts is doing something similar for TypeScript backend frameworks, by moving most of the request/response lifecycle into Async Rust: https://encore.dev/blog/event-loops

Disclaimer: I'm one of the maintainers

[+] internetter|1 year ago|reply

What's your response to this? https://github.com/encoredev/ts-benchmarks/issues/2

[+] isodev|1 year ago|reply

This is a really cool comparison, thank you for sharing!

Beyond performance, Rust also brings a high level of portability and these examples show just how versatile a pice of code can be. Even beyond the server, running this on iOS or Android is also straightforward.

Rust is definitely a happy path.

[+] jvanderbot|1 year ago|reply

Rust deployment is a happy path, with few caveats. Writing is sometimes less happy than it might otherwise be, but that's the tradeoff.

My favorite thing about Rust, however, is Rust dependency management. Cargo is a dream, coming from C++ land.

[+] xyst|1 year ago|reply

In my opinion, the significant drop in memory footprint is truly underrated (13 MB vs 1300 MB). If everybody cared about optimizing for efficiency and performance, the cost of computing wouldn’t be so burdensome.

Even self-hosting on an rpi becomes viable.

[+] marcosdumay|1 year ago|reply

It's the result of the data isolation above anything else attitude of Javascript.

Or, in other words, it's the unavoidable result of insisting on using a language created for the frontend to write everything else.

You don't need to rewrite your code in Rust to get that saving. Any other language will do.

(Personally, I'm surprised all the gains are so small. Looks like it's a very well optimized code path.)

[+] echoangle|1 year ago|reply

If every developer cared for optimizing efficiency and performance, development would become slower and more expensive though. People don’t write bad-performing code because it’s fun but because it’s easier. If hardware is cheap enough, it can be advantageous to quickly write slow code and get a big server instead of spending days optimizing it to save $100 on servers. When scaling up, the tradeoff has to be reconsidered of course.

[+] btilly|1 year ago|reply

That's because you're churning temporary memory. JS can't free it until garbage collection runs. Rust is able to do a lifetime analysis, and knows it can free it immediately.

The same will happen on any function where you're calling functions over and over again that create transient data which later gets discarded.

[+] leeoniya|1 year ago|reply

fwiw, Bun/webkit is much better in mem use if your code is written in a way that avoids creating new strings. it won't be a 100x improvement, but 5x is attainable.

[+] palata|1 year ago|reply

> If everybody cared about optimizing for efficiency and performance

The problem is that most developers are not capable of optimizing for efficiency and performance.

Having more powerful hardware has allowed us to make software frameworks/libraries that make programming a lot more accessible. At the same time lowering the quality of said software.

Doesn't mean that all software is bad. Most software is bad, that's all.

[+] jchw|1 year ago|reply

It's a little more nuanced than that of course, a big reason why the memory usage is so high is because Node.JS needs more of it to take advantage of a large multicore machine for compute-intensive tasks.

> Regarding the abnormally high memory usage, it's because I'm running Node.js in "cluster mode", which spawns 12 processes for each of the 12 CPU cores on my test machine, and each process is a standalone Node.js instance which is why it takes up 1300+ MB of memory even though we have a very simple server. JS is single-threaded so this is what we have to do if we want a Node.js server to make full use of a multi-core CPU.

On a Raspberry Pi you would certainly not need so many workers even if you did care about peak throughput, I don't think any of them have >4 CPU threads. In practice I do run Node.JS and JVM-based servers on Raspberry Pi (although not Node.JS software that I personally have written.)

The bigger challenge to a decentralized Internet where everyone self-hosts everything is, well, everything else. Being able to manage servers is awesome. Actually managing servers is less glorious, though:

- Keeping up with the constant race of security patching.

- Managing hardware. Which, sometimes, fails.

- Setting up and testing backup solutions. Which can be expensive.

- Observability and alerting; You probably want some monitoring so that the first time you find out your drives are dying isn't months after SMART would've warned you. Likewise, you probably don't want to find out you have been compromised after your ISP warns you about abuse months into helping carry out criminal operations.

- Availability. If your home internet or power goes out, self-hosting makes it a bigger issue than it normally would be. I love the idea of a world where everyone runs their own systems at home, but this is by far the worst consequence. Imagine if all of your e-mails bounced while the power was out.

Some of these problems are actually somewhat tractable to improve on but the Internet and computers in general marched on in a different more centralized direction. At this point I think being able to write self-hostable servers that are efficient and fast is actually not the major problem with self-hosting.

I still think people should strive to make more efficient servers of course, because some of us are going to self-host anyways, and Raspberry Pis run longer on battery than large rack servers do. If Rust is the language people choose to do that, I'm perfectly content with that. However, it's worth noting that it doesn't have to be the only one. I'd be just as happy with efficient servers in Zig or Go. Or Node.JS/alternative JS-based runtimes, which can certainly do a fine job too, especially when the compute-intensive tasks are not inside of the event loop.

[+] throwitaway1123|1 year ago|reply

There are flags you can set to tune memory usage (notably V8's --max-old-space-size for Node and the --smol flag for Bun). And of course in advanced scenarios you can avoid holding strong references to objects with weak maps, weak sets, and weak refs.

[+] beached_whale|1 year ago|reply

Im ok if it isnt popular. It will keep compute costs lower for those using it as the norm is excessive usage

[+] rwaksmunski|1 year ago|reply

Pretty sure Tier 4 should be faster than that. I wonder if the CPU was fully utilized on this benchmark. I did some performance work with Axum a while back and was bitten by Nagle algorithm. Setting TCP_NODELAY pushed the benchmark from 90,000 req/s to 700,000 req/s in a VM on my laptop.

[+] pjmlp|1 year ago|reply

And so what we were doing with Apache, mod_<pick your lang> and C back in 2000, is new again.

At least with Rust it is safer.

[+] ports543u|1 year ago|reply

While I agree the enhancement is significant, the title of this post makes it seem more like an advertisement for Rust than an optimization article. If you rewrite js code into a native language, be it Rust or C, of course it's gonna be faster and use less resources.

[+] mplanchard|1 year ago|reply

Is there an equivalently easy way to expose a native interface from C to JS as the example in the post? Relatedly, is it as easy to generate a QR code in C as it is in Rust (11 LoC)?

[+] baq|1 year ago|reply

'of course' is not really that obvious except for microbenchmarks like this one.

[+] echelon|1 year ago|reply

Rust is simply amazing to do web backend development in. It's the biggest secret in the world right now. It's why people are writing so many different web frameworks and utilities - it's popular, practical, and growing fast.

Writing Rust for web (Actix, Axum) is no different than writing Go, Jetty, Flask, etc. in terms of developer productivity. It's super easy to write server code in Rust.

Unlike writing Python HTTP backends, the Rust code is so much more defect free.

I've absorbed 10,000+ qps on a couple of cheap tiny VPS instances. My server bill is practically non-existent and I'm serving up crazy volumes without effort.

[+] kstrauser|1 year ago|reply

I’ve written Python APIs since about 2001 or so. A few weeks ago I used Actix to write a small API server. If you squint and don’t see the braces, it looks an awful lot like a Flask app.

I had fun writing it, learned some new stuff along the way, and ended up with an API that could serve 80K RPS (according to the venerable ab command) on my laptop with almost no optimization effort. I will absolutely reach for Rust+Actix again for my next project.

(And I found, fixed, and PR’d a bug in a popular rate limiter, so I got to play in the broader Rust ecosystem along the way. It was a fun project!)

[+] boredumb|1 year ago|reply

I've been experimenting with using Tide, sqlx and askama and after getting comfortable, it's even more ergonomic for me than using golang and it's template/sql librarys. Having compile time checks on SQL and templates in and of itself is a reason to migrate. I think people have a lot of issues with the life time scoping but for most applications it simply isn't something you are explicitly dealing with every day in the way that rust is often displayed/feared (and once you fully wrap your head around what it's doing it's as simple as most other language aspects).

[+] JamesSwift|1 year ago|reply

> Writing Rust for web (Actix, Axum) is no different than writing Go, Jetty, Flask, etc. in terms of developer productivity. It's super easy to write server code in Rust.

I would definitely disagree with this after building a micro service (url shortener) in rust. Rust requires you to rethink your design in unique ways, so that you generally cant do things in the 'dumbest way possible' as your v1. I found myself really having to rework my design-brain to fit rusts model to please the compiler.

Maybe once that relearning has occurred you can move faster, but it definitely took a lot longer to write an extremely simple service than I would have liked. And scaling that to a full api application would likely be even slower.

Caveat that this was years ago right when actix 2 was coming out I believe, so the framework was in a high amount of flux in addition to needing to get my head around rust itself.

[+] adamrezich|1 year ago|reply

Disclaimer: I haven't ever written any serious Rust code, and the last time I even tried to use the language was years ago now.

What is it about Rust that makes it so appealing to people to use for web backend development? From what I can tell, one of the selling points of Rust is its borrow checker/lifetime management system. But if you're making a web backend, then you really only need to care about two lifetimes: the lifetime of the program, and the lifetime of a given request/response. If you want to write a web backend in C, then it's not too difficult to set up a simple system that makes a temporary memory arena for each request/response, and, once the response is sent, marks this memory for reuse (and probably zeroes it, for maximum security), instead of freeing it.

Again, I don't really have any experience with Rust whatsoever, but how does the borrow checker/lifetime system help you with this? It seems to me (as a naïve, outside observer) that these language features would get in the way more than they would help.

[+] Dowwie|1 year ago|reply

Beware the risks of using NIFs with Elixir. They run in the same memory space as the BEAM and can crash not just the process but the entire BEAM. Granted, well-written, safe Rust could lower the chances of this happening, but you need to consider the risk.

[+] mijoharas|1 year ago|reply

I believe that by using rustler[0] to build the bindings that shouldn't be possible. (at the very least that's stated in the readme.)

> Safety : The code you write in a Rust NIF should never be able to crash the BEAM.

I tried to find some documentation stating how it works but couldn't. I think they use a dirty scheduler, and catch panics at the boundaries or something? wasn't able to find a clear reference.

[0] https://github.com/rusterlium/rustler

[+] voiper1|1 year ago|reply

Wow, that's an incredible writeup.

Super surprised that shelling out was nearly as good any any other method.

Why is the average bytes smaller? Shouldn't it be the same size file? And if not, it's a different alorithm so not necessarily better?

[+] djoldman|1 year ago|reply

Not trying to be snarky, but for this example, if we can compile to wasm, why not have the client compute this locally?

This would entail zero network hops, probably 100,000+ QRs per second.

IF it is 100,000+ QRs per second, isn't most of the thing we're measuring here dominated by network calls?

[+] munificent|1 year ago|reply

It's a synthetic example to conjure up something CPU bound on the server.

[+] jeroenhd|1 year ago|reply

WASM blobs for programs like these can easily turn into megabytes of difficult to compress binary blobs once transitive dependencies start getting pulled in. That can mean seconds of extra load time to generate an image that can be represented by maybe a kilobyte in size.

Not a bad idea for an internal office network where every computer is hooked up with a gigabit or better, but not great for cloud hosted web applications.

[+] nemetroid|1 year ago|reply

The fastest code in the article has an average latency of 14 ms, benchmarking against localhost. On my computer, "ping localhost" has an average latency of 20 µs. I don't have a lot of experience writing network services, but those numbers sound CPU bound to me.

[+] bdahz|1 year ago|reply

I'm curious what if we replace Rust with C/C++ in those tiers. Would the results be even better or worse than Rust?

[+] jinnko|1 year ago|reply

I'm curious how many cores the server the tests ran on had, and what the performance would be of handling the requests in native node with worker threads[1]? I suspect there's an aspect of being tied to a single main thread that explains the difference at least between tier 0 and 1.

1: https://nodejs.org/api/worker_threads.html

[+] pretzelhammer|1 year ago|reply

As the article mentions, the test server had 12 cores. The Node.js server ran in "cluster mode" so that all 12 cores were utilized during benchmarking. You can see the implementation here (just ~20 lines of JS): https://github.com/pretzelhammer/using-rust-in-non-rust-serv...

[+] tialaramex|1 year ago|reply

Doesn't "the 12 CPU cores on my test machine" answer your question ?

[+] bhelx|1 year ago|reply

If you have a Java library, take a look at Chicory: https://github.com/dylibso/chicory

It runs on any JVM and has a couple flavors of "ahead-of-time" bytecode compilation.

[+] unknown|1 year ago|reply

[deleted]

[+] bluejekyll|1 year ago|reply

This is great to see. I had my own effort around this that I could never quite get done.

I didn’t notice this on the front page, what JVM versions is this compatible with?

[+] Already__Taken|1 year ago|reply

Shelling out to a CLI is quite an interesting path because often that functionality could be useful handed out as a separate utility to power users or non-automation tasks. Rust makes cross-platform distribution easy.

[+] dyzdyz010|1 year ago|reply

Make Rustler great again!

[+] demarq|1 year ago|reply

I didn’t realize calling to the cli is that fast.

[+] kelnos|1 year ago|reply

I doubt it's actually calling out to the CLI (aka the shell); presumably it's just fork()ing and exec()ing.

On Linux, fork() is actually reasonably fast, and if you're exec()ing a binary that's fairly small and doesn't need to do a lot of shared library loading, relocations, or initialization, that part of the cost is also fairly low (for a Rust program, this will usually be the case, as they are mostly-statically-linked). Won't be as low as crossing a FFI boundary in the same process (or not having a FFI boundary and doing it all in the same process) of course, but it's not as bad as you might think.

[+] lsofzz|1 year ago|reply

<3

[+] bebna|1 year ago|reply

For me a "Non-Rust Server" would be something like a PHP webhoster. If I can run my own node instance, I can possible run everything I want.

[+] bluejekyll|1 year ago|reply

The article links to two PHP and Rust integration strategies, WASM[1] or native[2].

[1] https://github.com/wasmerio/wasmer-php

[2] https://github.com/davidcole1340/ext-php-rs

273 comments