There may be further opportunities for improvement.
Chrome and curl both report it takes about 1100 ms to load the linked page's HTML, split about 50/50 between establishing a connection and fetching content. I'm not sure how the implementation works internally, but that seems like a long time for a site served from memory and aiming to be "high-performance". The images bring the total time up to around 5.7 s.
As a point of comparison, my site (nginx serving static content, on the 0.25 CPU GCP instance) serves the index page in 250ms. Of that, ~140ms is connection setup (DNS, TCP, TLS). The whole page loads in < 1000ms.
One thing to remember is that when a server like nginx serves static content, it's often serving it from the page cache (memory). The author of Varnish has written at some length about the benefits of using the OS page cache, for example <https://varnish-cache.org/docs/trunk/phk/notes.html>. Some of the same principles can be applied even for servers that render dynamically (by caching expensive fragments).
Author here. I wrote that post before I axed the CDN for the blog itself. It was true at the time of writing, but it isn't anymore; I still need to redo the CDN for the blog. All the images are CDNed with XeDN though.
As a contrasting point: I'm consistently getting 150ms from their main domain, and 25-35ms from their cdn subdomain. I suspect most of your latency is from "the internet".
After reading to the end of a long post, I'm disappointed not to find any latency or throughput metrics. The author seems to claim they have a very popular, high-traffic blog that is super fast, faster than all the popular web servers serving static pages. Where's the performance data to prove this?
edit: web.dev's Measure tool gave this blog post's URL a performance score of 30/100, which is quite poor.
Ripping out Cloudflare made the metrics worse. I wrote this post before I ripped out Cloudflare, and it was accurate at the time of writing. It will be better once I can re-engineer things to be anycasted.
It would be good if the post contained some data to justify its points, like a graph of loading times. Otherwise assertions like "So fast that it's faster than a static website." don't seem supportable.
I would have liked to see the actual results from this comparison: "I compared my site to Nginx, openresty, tengine, Apache, Go's standard library, Warp in Rust, Axum in Rust, and finally a Go standard library HTTP server that had the site data compiled into ram."
I'm sorry, but I lost that data after some machines got reinstalled. I can attempt to recreate it, but that will have to wait for a future blog post.
I want to see this taken to the logical extreme. A real OS with actual drivers (no unikernel, no virtio) for a small set of hardware that only serves static pages. No need for virtual memory. Just hardcode the blog posts right into the OS and use the most minimal TCP stack you can make.
I think that should be possible with Cosmopolitan Rust (https://ahgamut.github.io/2022/07/27/ape-rust-example/). It would create a bare-metal-runnable ELF binary with just Cosmopolitan libc statically linked; not sure about driver support though.
If it's amd64, long mode requires a page table. Otherwise, a page table is handy so you can get page faults for null pointer dereferencing. Of course, you could do that only for development, and let production run without a page table.
My hobby OS can almost fill your needs, though the TCP stack isn't really good enough yet (I'm pretty sure I haven't fixed retransmits after I broke them; no selective ACK; ICMP path MTU discovery is probably broken; certainly no path MTU blackhole detection; IPv4 only; etc.), and I only support one Realtek NIC, because that's what I could put in my test machine. Performance probably isn't great, but it's not far enough along to make a fair test.
I'm actually not sure a more minimal TCP stack would be best, especially if you need to handle packet loss caused by congestion, for example. Recent work such as RACK-TLP gives certain workloads better performance, but that is not something you would have in a minimal TCP stack.
One approach is to run some kind of optimizer on a Docker image that throws away everything that does not contribute to the end goal of yeeting text at HTTP clients.
I remember working in 2008 on a project for some geothermal devices that spat out IoT data on an HTML page hardcoded directly in the C code of the program. The device used a Chinese 8051-like CPU, so there was no OS per se.
How can it be faster than a static page that is already in memory? The bytes are there; you just send them over a socket. Is transforming some template into Rust code and back into a string buffer somehow faster?
I don't think the author is claiming it is faster than a static site stored in memory, they're saying it is faster than a traditional static site that loads files from the disk. At least that's how I read it.
It can be a tiny amount more efficient, since an async disk I/O implementation might dispatch the file read() call to a thread pool, wait for the result, and then send the data back to the client. That makes two extra context switches compared to sending data from memory. Now, if the user is confident that the data is hot and in the page cache, a synchronous disk read avoids the problem. Or try a read with RWF_NOWAIT and fall back to the thread pool only if necessary.
On the other hand rendering a template on each request also requires CPU, which might be either more or less expensive than doing a syscall.
All in all, the efficiency differences are likely negligible unless you run a CDN that handles thousands of requests per second.
In terms of throughput to the end user it will make zero measurable difference unless the box ran out of CPU.
On the one hand, sure, you can probably squeeze a cycle or two out of buffering everything in memory. Even though your disk read is in all likelihood a memory read, given how filesystem caching works, it's still an I/O call, which isn't free.
Keeping everything in user space buffers might just be faster.
On the other hand, you're sending that sucker over the network, and what you save doing this is most likely best counted in microseconds per request. It's piss in the ocean compared to the delay introduced even over a local network.
I was thinking the same. They said a precompiled Go version was faster, but it was 200 MB, which I don't understand.
200MB of pages and assets, sure. Code? No. If you compile it into the binary then the storage is no worse than having a small binary and all the resources separate.
Taking a statically generated site and returning the raw bytes is 100% faster. The author said so themselves.
The tech is cool, but some of the language is so cringy. For example, the statement "websites are social constructs" makes zero sense. You could say that websites are material objects of a symbolic network of computer languages, like physical paper money is a material, fetishized object of the social construct of money. Websites themselves are not constructed socially. Maybe the author means how websites are perceived, or conventions of web tech itself, is constructed socially?
I don't have the time to get into a hardcore semiotics discussion at the moment, but basically I'm using words in the ways that normal people use words, which generally treats perception of the conventions of a thing as the thing itself. People do this mostly for convenience.
A website is a social construct because it can only function by the agreement of everyone involved (i.e., we all agree on how to parse HTML).
The individual site may be constructed individually (maybe), but it can only work if the society of people-who-use-the-internet all agree to follow a series of conventions about how websites work. You can't start using <soul> instead of <body> and expect everything to work as normal; the reason the <body> tag is used to define the body of a page is that we needed a way to make sure people can use a webpage without having to define an entire new language for each one.
I don't agree with this at all. What makes one set of frequency changes over a wire a website and another a voice call? A big pile of socially constructed concepts, from written language to Unicode to TCP and HTML. The electrical impulses are physically real; the website is a construct and makes sense only in the context of society.
Speaking of cringe, in my opinion the furry stuff also negatively impacts the impression the article makes. Also, I distinctly remember an extremely similar blog written by someone with a different name, but apparently the author has changed it yet again.
You don't need Rust for this -- you can do the same in Go, Node, etc. In 2012 my cheap VPS had a crappy HDD share but fairly acceptable memory, so I rendered the Markdown files and stored them in a little structure, returning them directly from memory.
Everyone thought it was amazing even though it was just a dumb http server returning pages[req.path] :-) Latency was under 10ms which was pretty amazing for a 2012 KVM VPS.
I don't think OP was implying that Rust was a requirement, just what was actually used in this case. And, indeed, OP gives some reasons that Rust might be preferable:
> And when I say fast, I mean that I have tried so hard to find some static file server that could beat what my site does. I tried really hard. I compared my site to Nginx, openresty, tengine, Apache, Go's standard library, Warp in Rust, Axum in Rust, and finally a Go standard library HTTP server that had the site data compiled into ram. None of them were faster, save the precompiled Go binary (which was like 200 MB and not viable for my needs). It was hilarious. I have accidentally created something so efficient that it's hard to really express how fast it is.
I did that in Go, although it was "only" caching the markdown rendering: the page templates were written in Go (via some lib that provided tools to make that manageable) and compiled with the app, so the whole template building was blazingly fast.
I get the fun, for a developer, of setting up something like this to experiment and learn new things. But I'm left with a question: why? Is there really a point aside from the aforementioned intrinsic dev fun?
There has to be a point of diminishing returns. And again, I'm not dismissing the dev side of things, but it seems like a lot of extra tooling and complexity for not much gain.
I admire the OP's ability to use their blog as a rapid prototyping platform that is constantly growing and changing. Over-engineering on a personal project like this is the whole point! Very cool.
I am too much of an OCD perfectionist and don't have the guts to ship this often.
I had the numbers at one point, but I have lost them. I can try to recreate them, but I'd probably have to use my old Mac Pro again to be sure the results are consistent.
This loaded pretty slowly for me (2 seconds) and also has aggressive page layout changes. It's almost as if, for 99% of software, the most important part is the UX, not the low-level programming language that was chosen.
Can you use more Rust to serve the 7 readers of a blog? You know what: use caching, or something that compiles to plain HTML (Hugo, Jekyll, etc.). No need for hardcore memory optimization.
And then all the gains were entirely eaten by the first hop to a network device. Speaking from experience, as I did a similar thing, although speed was not a concern, just perpetual annoyance with the available tools for blogging.
It's pretty bad faith to post an inflammatory comment (initially with several errors, like referring to the Ref<> struct instead of Rc<>), add edits to complain about downvotes, and then steadily edit it to be more correct and reasonable while keeping the complaints about downvotes, leaving the impression that it was the current iteration that attracted the downvotes because "the truth hurts".
It suggests to me you know as well as we do that the downvotes were about snarkily expressing your views while making mistakes that might suggest you aren't all that familiar with rust in the first place, and not that you're expressing "forbidden thoughts."
There is value in having unsafe parts of a program clearly annotated (not just with comments). It is similar to how in some languages you annotate pure functions and they do not compile unless they are pure.
jmillikin|3 years ago
https://i.imgur.com/X4LDbWj.png
https://i.imgur.com/Ccwzmgz.png
xena|3 years ago
Groxx|3 years ago
vinay_ys|3 years ago
xena|3 years ago
kixiQu|3 years ago
deathanatos|3 years ago
Jabbles|3 years ago
xena|3 years ago
trh0awayman|3 years ago
f_devd|3 years ago
cbm-vic-20|3 years ago
toast0|3 years ago
erk__|3 years ago
filleokus|3 years ago
bhedgeoser|3 years ago
staticassertion|3 years ago
allan_s|3 years ago
im_down_w_otp|3 years ago
shrubble|3 years ago
KptMarchewa|3 years ago
Thaxll|3 years ago
dec0dedab0de|3 years ago
Matthias247|3 years ago
marginalia_nu|3 years ago
kuschku|3 years ago
Philip-J-Fry|3 years ago
lijogdfljk|3 years ago
greenhearth|3 years ago
xena|3 years ago
dawnbreez|3 years ago
NoraCodes|3 years ago
greenhearth|3 years ago
Hitton|3 years ago
19h|3 years ago
NoraCodes|3 years ago
xani_|3 years ago
spullara|3 years ago
https://www.lukew.com
manuelmoreale|3 years ago
whalesalad|3 years ago
xena|3 years ago
I have CDO too but I work around it by sheer trolling with infrastructure, like my hacked up to hell CDN: https://xeiaso.net/blog/xedn
treffer|3 years ago
Seeing the initial comments here I think it would be better to go with the original title.
mkl95|3 years ago
pradn|3 years ago
Great blog by the way :)
xena|3 years ago
AJRF|3 years ago
xena|3 years ago
I'm not a guy, I'd prefer if you used they to refer to me, but she works too.
The PAM one was a really fun talk to write. I need to finish that postmortem on how that talk went wrong.
allan_s|3 years ago
http://cppcms.com/wikipp/en/page/main
https://github.com/Tatoeba/tatowiki the wiki of tatoeba.org ( https://en.wiki.tatoeba.org/articles/show/main# ) is written in it
Existenceblinks|3 years ago
https://dashbit.co/blog/welcome-to-our-blog-how-it-was-made
apstats|3 years ago
robertlagrant|3 years ago
HillRat|3 years ago
hit8run|3 years ago
xani_|3 years ago
maxbond|3 years ago
mejutoco|3 years ago
NoraCodes|3 years ago
planning on it :)