
TCP HTTP Server written in Assembly

154 points | thikonom | 12 years ago | canonical.org

70 comments

[+] derefr|12 years ago|reply
Cool stuff. Really, though, this is still relying on a rather large runtime library: the physical, data-link, and network-layer drivers.

Now what'd be really awesome to see would be one of those Operating System guides that shows you how to write an OS kernel, in assembler, that can speak HTTP. Even just limiting yourself to targeting the synthetic hardware of a VM program, it'd still be quite a feat.

Bonus points if the entire network stack has been flattened using the hand-rolled equivalent of stream-fusion. :)

[+] kragen|12 years ago|reply
Agreed, and also the filesystem. I think you may be looking for lwIP and Contiki.
[+] gwu78|12 years ago|reply
Are you saying you'd like to see string processing moved into the kernel?
[+] neverm0re|12 years ago|reply
Here's another simpler implementation of an HTTP server in Linux x86 assembly from last year, coincidentally by the one who did the Seiken Densetsu 3/Secret of Mana 3 translation hack and the old Starscream 68k emulator:

http://www.neillcorlett.com/etc/mohttpd.asm.txt

And a not so successful thread to go with it: https://news.ycombinator.com/item?id=4714971

[+] kragen|12 years ago|reply
That's very nice! Thanks! I spent a lot of last night figuring out how to use socketcall, and this should be helpful.
[+] kragen|12 years ago|reply
I hacked on httpdito some more, and it has been improved in several ways:

- it now forks so that it can handle multiple concurrent connections (up to a limit of 2048);

- it no longer uses libc at all, so it's down to 2088 bytes (I had it lower, but then I added forking);

- it's less complex now that it only has one way of invoking system calls instead of two;

- there are some performance results in the comments;

- it has a name, "httpdito";

- strlen works correctly.

Probably nobody will read this comment here, but I thought it was worth mentioning.
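
The fork-per-connection model with a cap on children, as described in the list above, might look roughly like this in C. This is a sketch under assumptions, not httpdito's actual code: `serve_forking`, `reap_finished`, and the `handle` callback are made-up names, and only the 2048 figure comes from the comment.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX_CHILDREN = 2048 };   /* the limit quoted in the comment above */

/* Reap every child that has already exited; returns how many. */
int reap_finished(void)
{
    int reaped = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}

/* Accept connections forever, handing each one to a forked child.
 * `handle` is a hypothetical per-connection callback. */
void serve_forking(int listen_fd, void (*handle)(int conn))
{
    int children = 0;

    assert(listen_fd >= 0 && handle != NULL);
    for (;;) {
        children -= reap_finished();
        if (children >= MAX_CHILDREN) {   /* at the cap: block until one exits */
            if (wait(NULL) > 0)
                children--;
            continue;
        }
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            continue;
        pid_t pid = fork();
        if (pid == 0) {                   /* child: serve this client, then exit */
            close(listen_fd);
            handle(conn);
            _exit(0);
        }
        if (pid > 0)
            children++;
        close(conn);                      /* parent keeps only the listener */
    }
}
```

The parent never touches the connection after forking, so a slow client only ties up its own child; the cap keeps a flood of connections from exhausting the process table.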

[+] kragen|12 years ago|reply
Down to 1928 bytes now, and has timeouts for robustness. You can still DoS it but it takes more work.
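
The comment doesn't say how the timeouts are implemented. One common approach in a forking server is to call `alarm()` in each child so the kernel kills it if the client stalls; the C sketch below assumes that approach, and `run_with_deadline` and `stalled_client` are hypothetical names.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run handle(conn) in a child that the kernel terminates with SIGALRM
 * after `seconds`. Returns the child's wait status, or -1 on error.
 * (A real server would not block in waitpid; this is for demonstration.) */
int run_with_deadline(void (*handle)(int), int conn, unsigned seconds)
{
    int status = -1;
    pid_t pid;

    assert(handle != NULL && seconds > 0);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGALRM, SIG_DFL);   /* default action: terminate the process */
        alarm(seconds);
        handle(conn);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}

/* Hypothetical handler standing in for a client that never sends anything. */
void stalled_client(int conn)
{
    (void)conn;
    for (;;)
        pause();
}
```

An attacker can still open many connections and let each run out its timer, which fits the remark that a DoS is still possible but takes more work.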
[+] mappu|12 years ago|reply
Cool!

My comments as an inexperienced assembly developer, assuming this is optimising for binary size:

- The pug/doN macros do an extra reg-reg copy if passed a register, and the recursive definition emits pop/pop/pop instead of a single add $4*N, %esp, so you could shave a few bytes

- AT&T syntax will always look weird to me, but the heavy use of macros and local labels is quite elegant

- A little bit of candid swearing in the comments? Fine by me, but is this officially associated with canonical?

[+] aroman|12 years ago|reply
> - A little bit of candid swearing in the comments? Fine by me, but is this officially associated with canonical?

Assuming you mean Canonical Ltd., the company behind Ubuntu, this has absolutely nothing to do with them: it is hosted on canonical.org, not canonical.com.

[+] pbsd|12 years ago|reply
Agree, AT&T syntax was just not designed for human reading. I doubt this is too optimized for size, since there are obvious tricks that it misses.

Another observation: the strlen code is incorrect, as it also counts the \0. We can fix this, and make the code 1 byte shorter (in glorious Intel syntax):

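    lea edi, source        ; depends on source; scasb reads from [edi], not [esi]
    xor ecx, ecx           ; 2 bytes; ecx = 0, and CF is cleared
    salc                   ; 1 byte; al = 0 because CF is clear (undocumented, 32-bit only)
    cld                    ; 1 byte; scan forward
    _back:
    scasb                  ; 1 byte; compare al with [edi], then advance edi
    loopnz _back           ; 2 bytes; decrement ecx, loop until the \0 matches
    not ecx                ; 2 bytes; ecx = -(len+1) becomes len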
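
For readers who don't speak x86, the counting trick in the snippet above can be modelled in C. This is only a sketch: `strlen_by_countdown` is a made-up name, and the loop mirrors what `scasb`, `loopnz`, and `not` do to `ecx` rather than how anyone would normally write strlen in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the scasb/loopnz/not sequence: the counter starts
 * at 0 and is decremented once per byte examined, including the final NUL,
 * so it ends at -(len + 1). Bitwise NOT then yields len, since ~x == -x - 1. */
uint32_t strlen_by_countdown(const char *s)
{
    uint32_t ecx = 0;                 /* xor ecx, ecx */
    int matched;

    assert(s != NULL);
    do {
        matched = (*s++ == '\0');     /* scasb with al == 0 */
        ecx--;                        /* loopnz decrements ecx first... */
    } while (!matched && ecx != 0);   /* ...then loops if ecx != 0 and no match */
    return ~ecx;                      /* not ecx */
}
```

Because the NUL byte is counted before the NOT, the off-by-one cancels out and the terminator is not included in the result.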
[+] derleth|12 years ago|reply
> you could shave a few bytes

This is practically axiomatic in assembly language programming.

It's just not worth it to turn your code into what you'd need to turn it into in order to make it as small (or as fast) as it can possibly be on that specific version of that specific microarchitecture from that specific manufacturer, since such work is undone by the next version of the hardware.

> AT&T syntax will always look weird to me

AT&T syntax is meant to be a generic assembly language syntax; it's supposed to look equally weird to everyone, regardless of what CPU they're writing code for. GAS will accept Intel syntax, or a somewhat heterodox variant thereof. NASM is the usual assembler of choice on modern x86 Unix-a-likes, I think.

http://www.nasm.us/

> A little bit of candid swearing in the comments?

Hey, if the Linux kernel devs can do it, why not them?

[+] tokenizer|12 years ago|reply
As a web developer who isn't familiar with assembly or any web server more barebones than nginx, what benefits does something like this provide? Speed? Could this be a solution for an extremely simple directory/static file web server?
[+] anonymouscowar1|12 years ago|reply
This is a simple, single-threaded single-process accept-read-respond-loop web server. It's vulnerable to trivial trickle DoS attacks and probably has other issues. There are no advantages, the author just did this for fun.

The TCP part comes from C code in the kernel, so this headline is a little misleading ;-).
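
The accept-read-respond loop described above can be sketched in C as follows. This is an illustration under assumptions, not the server's actual logic: `serve_one`, `serve_forever`, and the canned `RESPONSE` are hypothetical, and a real server would parse the request and send a file.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical canned reply; a real server would look up a file. */
static const char RESPONSE[] =
    "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\n";

/* Read one request (a single read, content ignored) and answer it.
 * Returns 0 on success, -1 on error. */
int serve_one(int conn)
{
    char buf[1024];
    const char *p = RESPONSE;
    size_t left = sizeof RESPONSE - 1;

    if (read(conn, buf, sizeof buf) <= 0)
        return -1;
    while (left > 0) {
        ssize_t n = write(conn, p, left);
        if (n <= 0)
            return -1;
        p += n;
        left -= (size_t)n;
    }
    return 0;
}

/* One client at a time: while serve_one() blocks in read(), nobody else
 * is served, which is why a client that trickles bytes stalls everyone. */
void serve_forever(int listen_fd)
{
    assert(listen_fd >= 0);
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            continue;
        serve_one(conn);
        close(conn);
    }
}
```

Everything below `accept`, `read`, and `write` (the TCP state machine, retransmission, and so on) lives in the kernel, which is the point about the headline.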

[+] knappador|12 years ago|reply
This is normally the kind of question I ask about anything involving HTML/CSS only or JS only =D PoCs based on low-level concepts are the ones that make you curious about everything from top to bottom. Even though assembly is the least abstract and most esoteric of programming spaces (some would argue the opposite), the program actually reveals itself quite quickly once you know just a few tidbits. This is how you get to see that even the most low-level aspects of programming are quite accessible.
[+] kragen|12 years ago|reply
Nginx will almost certainly be faster, and is somewhat robust against DoS attacks. I didn't write this to provide benefits. There are situations where this would work better than nginx (where, say, you don't want to spend any time configuring anything) but there are better existing solutions for those cases.
[+] pmiller2|12 years ago|reply
Neat little piece of performance art (pun intended).
[+] jebblue|12 years ago|reply
Good way to put it, was trying to think of something similar.
[+] radikalus|12 years ago|reply
No full tcp stack in assembly? =p

(Yes there's no point as it's better in hardware blah blah)

[+] Vektorweg|12 years ago|reply
I'm really happy that executable size doesn't matter for server software, because Yesod produces really big execs.
[+] mikkom|12 years ago|reply
> Depends on the C libraries.

^ That tells everything you need to know.

[+] pekk|12 years ago|reply
And I just got finished rewriting all my large webapps in some obscure Java framework for performance, because of some benchmarks I saw on HN. Guess now I have to rewrite it all in assembly, because more performance is always better, right?
[+] yeukhon|12 years ago|reply
Well, the JVM might be doing something smarter than your assembler-from-scratch code.
[+] anonymouscowar1|12 years ago|reply
This is not a very fast webserver. Anything using sendfile() and threads/processes will beat it handily.
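
For context on why sendfile() helps: it lets the kernel copy file data straight to the socket without a read/write round trip through user space. A minimal Linux-only sketch in C, where `send_whole_file` is a hypothetical helper name:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>   /* out_fd is typically a connected socket */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the whole of in_fd to out_fd inside the kernel.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_whole_file(int out_fd, int in_fd)
{
    struct stat st;
    off_t offset = 0;

    assert(out_fd >= 0 && in_fd >= 0);
    if (fstat(in_fd, &st) < 0)
        return -1;
    while (offset < st.st_size) {
        /* sendfile advances `offset` by the number of bytes it sent. */
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n <= 0)
            return -1;
    }
    return (ssize_t)offset;
}
```

A server would call this once per response after writing the headers, with one thread or process per connection so that a slow client blocks only its own worker.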
[+] meshko|12 years ago|reply
OMG all these macros. It looks more like Python than Assembly. Come on, real men do not use macros.
[+] kragen|12 years ago|reply
Thank you very much! This is the nicest comment in this entire thread!
[+] asmman1|12 years ago|reply
You're right. Real men don't use Assembly, either; they use binary instead. :)
[+] derleth|12 years ago|reply
> Come on, real men do not use macros.

The sexism and historical ignorance in this sentence are in a race to see which can be more breathtaking.

Regardless of which wins, meshko will look like a complete fool to anyone who knows what they're talking about.

[+] puppetmaster3|12 years ago|reply
Likely does not have any back door. Rumor is GCC opens back door for you know who.
[+] StavrosK|12 years ago|reply
Voldemort?