There may be further opportunities for improvement.
Chrome and curl both report it takes about 1100 ms to load the linked page's HTML, split about 50/50 between establishing a connection and fetching content. I'm not sure how the implementation works internally, but that seems like a long time for a site served from memory and aiming to be "high-performance". The images bring the total time up to around 5.7 s.
As a point of comparison, my site (nginx serving static content, on the 0.25 CPU GCP instance) serves the index page in 250ms. Of that, ~140ms is connection setup (DNS, TCP, TLS). The whole page loads in < 1000ms.
One thing to remember is that when a server like nginx serves static content, it's often serving it from the page cache (memory). The author of Varnish has written at some length about the benefits of using the OS page cache, for example <https://varnish-cache.org/docs/trunk/phk/notes.html>. Some of the same principles can be applied even for servers that render dynamically (by caching expensive fragments).
Author here. I wrote that post before I axed the CDN for the blog itself. It was true at the time of writing, but it isn't anymore; I still need to redo the CDN for the blog. All the images are CDNed with XeDN though.
As a contrasting point: I'm consistently getting 150ms from their main domain, and 25-35ms from their cdn subdomain. I suspect most of your latency is from "the internet".
After reading to the end of a long post, I'm disappointed not to find any latency or throughput metrics. The author seems to claim they have a very popular, high-traffic blog that is super fast, faster than all the popular web servers serving static pages. Where's the performance data to prove this?
edit: web.dev's Measure tool gave this blog post's URL a performance score of 30/100, which is quite poor.
Ripping out Cloudflare made the metrics worse. I wrote this post before I ripped out Cloudflare, and it was accurate at the time of writing. It will be better once I can re-engineer things to be anycasted.
It would be good if the post contained some data to justify its points, like a graph of loading times. Otherwise assertions like "So fast that it's faster than a static website." don't seem supportable.
I would have liked to see the actual results from this comparison: "I compared my site to Nginx, openresty, tengine, Apache, Go's standard library, Warp in Rust, Axum in Rust, and finally a Go standard library HTTP server that had the site data compiled into ram."
I'm sorry, but I lost that data after some machines got reinstalled. I can attempt to recreate it, but that will have to wait for a future blog post.
I want to see this taken to the logical extreme. A real OS with actual drivers (no unikernel, no virtio) for a small set of hardware that only serves static pages. No need for virtual memory. Just hardcode the blog posts right into the OS and use the most minimal TCP stack you can make.
I think that should be possible with Cosmopolitan Rust (https://ahgamut.github.io/2022/07/27/ape-rust-example/). It would create a bare-metal-runnable ELF binary with just Cosmopolitan libc statically linked; not sure about driver support though.
If it's amd64, long mode requires a page table. Otherwise, a page table is handy so you can get page faults for null pointer dereferencing. Of course, you could do that only for development, and let production run without a page table.
My hobby OS can almost fill your needs, though the TCP stack isn't really good enough yet (I'm pretty sure I haven't fixed retransmits after I broke them; no selective ACK; ICMP path MTU discovery is probably broken; certainly no path MTU blackhole detection; IPv4 only; etc.), and I only support one Realtek NIC, because that's what I could put in my test machine. Performance probably isn't great, but it's not far enough along to make a fair test.
I'm actually not sure a more minimal TCP stack would be best, especially if you need to handle packet loss caused by congestion, for example. Recent work such as RACK-TLP gives certain workloads better performance, but that is not something you would have in a minimal TCP stack.
One approach is to run some kind of optimizer on a Docker image that throws away everything that does not contribute to the end goal of yeeting text at HTTP clients.
I remember working in 2008 on a project for some geothermal devices that spat out IoT data on an HTML page hardcoded directly in the C code of the program. The device used a Chinese 8051-like CPU, so there was no OS per se.
How can it be faster than a static page that is already in memory? The bytes are there; you just send them over a socket. Is transforming some template into Rust code and back into a string buffer somehow faster?
I don't think the author is claiming it is faster than a static site stored in memory, they're saying it is faster than a traditional static site that loads files from the disk. At least that's how I read it.
It can be a tiny amount more efficient, since an async disk I/O implementation might dispatch the file read() call to a thread pool, wait for the result, and then send the data back to the client. That makes two extra context switches compared to sending data from memory. Now, if the user is confident that the data is hot and in the page cache, a synchronous disk read avoids the problem. Or try a read with RWF_NOWAIT and fall back to the thread pool only if necessary.
On the other hand rendering a template on each request also requires CPU, which might be either more or less expensive than doing a syscall.
All in all, the efficiency differences are likely negligible unless you run a CDN that handles thousands of requests per second.
In terms of throughput to the end user it will make zero measurable difference unless the box ran out of CPU.
On the one hand, sure, you can probably squeeze a cycle or two out of buffering everything in memory. Even though your disk read is in all likelihood a memory read, given how filesystem caching works, it's still an I/O call, which isn't free.
Keeping everything in user space buffers might just be faster.
On the other hand, you're sending that sucker over the network, and what you save doing this is most likely best counted in microseconds per request. It's piss in the ocean compared to the delay introduced even over a local network.
I was thinking the same. They said a precompiled Go version was faster, but it was 200 MB, which I don't understand.
200MB of pages and assets, sure. Code? No. If you compile it into the binary then the storage is no worse than having a small binary and all the resources separate.
Taking a statically generated site and returning the raw bytes is 100% faster. The author said so themselves.
The tech is cool, but some of the language is so cringy. For example, the statement "websites are social constructs" makes zero sense. You could say that websites are material objects of a symbolic network of computer languages, like physical paper money is a material, fetishized object of the social construct of money. Websites themselves are not constructed socially. Maybe the author means how websites are perceived, or conventions of web tech itself, is constructed socially?
I don't have the time to get into a hardcore semiotics discussion at the moment, but basically I'm using words in the ways that normal people use words, which generally treats perception of the conventions of a thing as the thing itself. People do this mostly for convenience.
A website is a social construct because it can only function by the agreement of everyone involved (i.e., we all agree on how to parse HTML).
The individual site may be constructed individually (maybe), but it can only work if the society of people-who-use-the-internet all agree to follow a series of conventions about how websites work. You can't start using <soul> instead of <body> and expect everything to work as normal; the reason the <body> tag is used to define the body of a page is that we needed a way to make sure people can use a webpage without having to define an entire new language for each one.
I don't agree with this at all. What makes one set of frequency changes over a wire a website and another a voice call? A big pile of socially constructed concepts, from written language to Unicode to TCP and HTML. The electrical impulses are physically real; the website is a construct and makes sense only in the context of society.
Speaking of cringe, in my opinion the furry stuff also negatively impacts the impression the article makes. Also, I distinctly remember an extremely similar blog written by someone with a different name, but apparently the author has changed it yet again.
You don't need Rust for this -- you can do the same in Go, Node, etc. In 2012 my cheap VPS had a crappy HDD share but fairly acceptable memory, so I rendered the Markdown files and stored them in a little structure, returning them directly from memory.
Everyone thought it was amazing even though it was just a dumb http server returning pages[req.path] :-) Latency was under 10ms which was pretty amazing for a 2012 KVM VPS.
I don't think OP was implying that Rust was a requirement, just what was actually used in this case. And, indeed, OP gives some reasons that Rust might be preferable:
> And when I say fast, I mean that I have tried so hard to find some static file server that could beat what my site does. I tried really hard. I compared my site to Nginx, openresty, tengine, Apache, Go's standard library, Warp in Rust, Axum in Rust, and finally a Go standard library HTTP server that had the site data compiled into ram. None of them were faster, save the precompiled Go binary (which was like 200 MB and not viable for my needs). It was hilarious. I have accidentally created something so efficient that it's hard to really express how fast it is.
I did that in Go, although it was "only" caching the markdown rendering: the page templates were written in Go (via some lib that provided tools to make that manageable) and compiled with the app, so the whole template building was blazingly fast.
I get the fun, for a developer, of setting up something like this to experiment and learn new things. But I'm left with a question: why? Is there really a point aside from the aforementioned intrinsic dev fun?
There has to be a point of diminishing returns. And again, I'm not dismissing the dev side of things, but it seems like a lot of extra tooling and complexity for not much gain.
I admire the OP's ability to use their blog as a rapid prototyping platform that is constantly growing and changing. Over-engineering on a personal project like this is the whole point! Very cool.
I am too much of an OCD perfectionist and don't have the guts to ship this often.
I had the numbers at one point, but I have lost them. I can try to recreate them, but I'd probably have to use my old Mac Pro again to be sure the results are consistent.
This loaded pretty slowly for me (2 seconds) and also has aggressive page layout changes. It's almost as if, for 99% of software, the most important part is the UX, not the low-level programming language that was chosen.
Can you use more Rust to serve the 7 readers of a blog? You know what: use caching, or something that compiles to plain HTML (Hugo, Jekyll, etc.). No need for hardcore memory optimization.
And then all the gains were entirely eaten by the first hop to a network device. Speaking from experience, as I did a similar thing, although speed was not a concern, just perpetual annoyance with the available tools for blogging.
It's pretty bad faith to post an inflammatory comment (initially with several errors, like referring to the Ref<> struct instead of Rc<>), add edits to complain about downvotes, and then steadily edit it to be more correct and reasonable while keeping the complaints about downvotes, leaving the impression that it was the current iteration that attracted the downvotes because "the truth hurts".
It suggests to me you know as well as we do that the downvotes were about snarkily expressing your views while making mistakes that might suggest you aren't all that familiar with rust in the first place, and not that you're expressing "forbidden thoughts."
There is value in having unsafe parts of a program clearly annotated (not just with comments). It is similar to how in some languages you annotate pure functions and they do not compile unless they are pure.
jmillikin|3 years ago
https://i.imgur.com/X4LDbWj.png
https://i.imgur.com/Ccwzmgz.png
xena|3 years ago
Groxx|3 years ago
vinay_ys|3 years ago
xena|3 years ago
kixiQu|3 years ago
deathanatos|3 years ago
Jabbles|3 years ago
xena|3 years ago
trh0awayman|3 years ago
f_devd|3 years ago
cbm-vic-20|3 years ago
toast0|3 years ago
erk__|3 years ago
filleokus|3 years ago
bhedgeoser|3 years ago
staticassertion|3 years ago
allan_s|3 years ago
im_down_w_otp|3 years ago
shrubble|3 years ago
KptMarchewa|3 years ago
Thaxll|3 years ago
dec0dedab0de|3 years ago
Matthias247|3 years ago
marginalia_nu|3 years ago
kuschku|3 years ago
Philip-J-Fry|3 years ago
lijogdfljk|3 years ago
greenhearth|3 years ago
xena|3 years ago
dawnbreez|3 years ago
NoraCodes|3 years ago
greenhearth|3 years ago
Hitton|3 years ago
19h|3 years ago
NoraCodes|3 years ago
xani_|3 years ago
spullara|3 years ago
https://www.lukew.com
manuelmoreale|3 years ago
whalesalad|3 years ago
xena|3 years ago
I have CDO too but I work around it by sheer trolling with infrastructure, like my hacked up to hell CDN: https://xeiaso.net/blog/xedn
treffer|3 years ago
Seeing the initial comments here I think it would be better to go with the original title.
mkl95|3 years ago
pradn|3 years ago
Great blog by the way :)
xena|3 years ago
AJRF|3 years ago
xena|3 years ago
I'm not a guy, I'd prefer if you used they to refer to me, but she works too.
The PAM one was a really fun talk to write. I need to finish that postmortem on how that talk went wrong.
allan_s|3 years ago
http://cppcms.com/wikipp/en/page/main
https://github.com/Tatoeba/tatowiki the wiki of tatoeba.org ( https://en.wiki.tatoeba.org/articles/show/main# ) is written in it
Existenceblinks|3 years ago
https://dashbit.co/blog/welcome-to-our-blog-how-it-was-made
apstats|3 years ago
robertlagrant|3 years ago
HillRat|3 years ago
hit8run|3 years ago
xani_|3 years ago
maxbond|3 years ago
mejutoco|3 years ago
NoraCodes|3 years ago
planning on it :)