Some other tricks that were not touched upon in the article, but which may apply depending on the nature of your traffic:
1) If you have lots of short connections and you want to tune the amount of time that the kernel will keep half-closed connections around then you can play around with changing the values of net.ipv4.tcp_fin_timeout, net.ipv4.tcp_tw_reuse, net.ipv4.tcp_tw_recycle, and net.ipv4.tcp_max_tw_buckets.
2) If you have a modern NIC then you probably need to tweak the txqueuelen in your ifconfig options.
3) If you get hits from a large number of random browsers then sometimes setting net.ipv4.tcp_no_metrics_save and net.ipv4.tcp_moderate_rcvbuf to turn off caching of flow metrics helps.
4) Increase net.core.somaxconn to increase your listen queue size.
5) If you have a local firewall like iptables in place make sure you increase net.ipv4.ip_conntrack_max, direct your high-traffic ports to the NOTRACK target in the raw table, and play around with all of the various net.ipv4.netfilter.ip_conntrack_tcp_timeout_* settings.
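Pulled together, a sketch of what an /etc/sysctl.conf fragment covering these points might look like (values are illustrative only, not recommendations; names match 2.6-era kernels):

```
# 1) short-connection / TIME-WAIT tuning
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
# note: tcp_tw_recycle is known to break clients behind NAT; use with care
net.ipv4.tcp_max_tw_buckets = 400000

# 3) disable caching of per-flow metrics
net.ipv4.tcp_no_metrics_save = 1

# 4) larger listen queue
net.core.somaxconn = 4096

# 5) conntrack table sizing (see also the ip_conntrack_tcp_timeout_* knobs)
net.ipv4.ip_conntrack_max = 262144
```

Apply with `sysctl -p` and watch memory use; txqueuelen (point 2) is set per-interface via ifconfig/ip rather than sysctl.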
Good tips. The only thing I would recommend against is setting net.core.somaxconn too high - a too large backlog at a time when your server is already resource constrained might just push it over the brink.
Just a technical note on the 64K myth section. My understanding is that TCP connections are tracked by the tuple (remote_host, remote_port, local_host, local_port), so a single client can have 64k unique connections to each port on a remote machine.
If that is actually the case, the document gets its myth correction wrong (by a lot) :)
You are right. The part I didn't really make clear is that we only serve on the single external port. Were we to use multiple, then yes, we could have 64k * 64k per IP pair.
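A quick way to see the four-tuple in action (localhost only; the actual port numbers are whatever the OS hands out):

```python
import socket

# A connection is identified by (local_ip, local_port, remote_ip, remote_port),
# so one client IP can open many sockets to the SAME server ip:port as long as
# each one uses a different local (ephemeral) port.
server = socket.socket()
server.bind(("127.0.0.1", 0))   # one listening ip:port
server.listen(8)
addr = server.getsockname()

clients = [socket.create_connection(addr) for _ in range(3)]
local_ports = {c.getsockname()[1] for c in clients}

# Three distinct connections to one server port, distinguished only by local port.
assert len(local_ports) == 3
print(sorted(local_ports))
```

With a single external server port, the binding client IP pair caps out around 64k; add server ports and the limit multiplies, as noted above.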
This isn't relevant on newer kernels, as these settings have been dynamic based on memory size since 2.6.26 or so - the kernel will set them based on usage, no need to tweak. The only real issue is making sure you buy a high-end network card that will offload as much as possible to avoid x context switches per second (I don't know what it is exactly with netpoll).
The C10k solutions are effectively the same as for C500k, those being epoll (Linux), kqueue (BSD), etc. Our Java NIO server utilizes epoll to handle C500k.
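As a sketch of the "many clients per event loop" approach (this uses Python's selectors module, which picks epoll on Linux and kqueue on BSD - it is not the Java NIO code described above):

```python
import selectors
import socket

# Minimal single-threaded echo server multiplexing many connections.
sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS

listener = socket.socket()
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 0))
listener.listen(128)
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ)

def serve_forever():
    while True:
        for key, _events in sel.select():
            sock = key.fileobj
            if sock is listener:
                conn, _addr = listener.accept()   # new client
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = sock.recv(4096)
                if data:
                    sock.sendall(data)            # echo back
                else:
                    sel.unregister(sock)          # client went away
                    sock.close()
```

One loop, one thread, arbitrarily many mostly-idle sockets - the per-connection cost is a file descriptor plus kernel buffers, which is why the buffer-size sysctls above matter at 500k connections.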
About the suggested sysctl.conf settings: I think you'd also need to adjust net.core.rmem_max and net.core.wmem_max in order for the net.ipv4.tcp_rmem and net.ipv4.tcp_wmem settings to be effective.
Furthermore it couldn't hurt to increase net.core.netdev_max_backlog, which is the maximum number of packets queued on the input side, when the interface receives packets faster than kernel can process them.
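That is, something along these lines (numbers are illustrative, not tuned recommendations):

```
# Raise the global ceilings alongside the TCP-specific settings.
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Input-side packet queue, for when the NIC outpaces the kernel.
net.core.netdev_max_backlog = 2500
```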
Regarding the `net.core` parameters. We do modify those, but my assumption (probably wrong) was that the `net.ipv4` changes would override the core configs. I'll take a look and update the post. Good point about `netdev_max_backlog`, I need to read up on that one too.
Linking this with the IPv6 stuff currently on the front page: note that none of this would be necessary if the clients were running IPv6 (or otherwise un-NAT-ed) - the server could simply send them a UDP packet or even open a TCP connection.
This is interesting stuff. I jumped into node.js programming a while ago and would like to run similar tests on node.js. Can anyone tell me how a client-side load of 500K long-lived connections is achieved? Is there a standard set of programs for this, or custom scripts?
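Not the article's setup, but a common DIY approach is a small script like this run from several client machines (the host, port, and counts below are placeholders):

```python
import socket

def open_idle_connections(host, port, count):
    """Open `count` TCP connections and keep them idle (no data sent)."""
    conns = []
    for _ in range(count):
        try:
            conns.append(socket.create_connection((host, port)))
        except OSError:
            break  # out of ephemeral ports or fds; raise ulimit -n first
    return conns

# e.g. conns = open_idle_connections("server.example", 9000, 50000)
# ...then just hold the sockets open while the server is observed.
```

A single box needs `ulimit -n` raised well above the target count, and you need multiple source IPs (or machines) to get past the ~64k ephemeral-port ceiling toward one server ip:port.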
A good question. Shrinking TCP buffer sizes can have a negative performance impact when sending large amounts of data; our use case was keeping track of a large number of mostly silent connections, and so we benefit from the smaller memory footprint.
Can anyone clarify this?
chrisbolt | 15 years ago:
http://www.kegel.com/c10k.html
metachris | 15 years ago:
1. Serve many clients with each thread
2. Serve one client with each server thread
3. Build the server code into the kernel
superjared | 15 years ago:
Thanks for the feedback!
nivertech | 15 years ago:
http://news.ycombinator.com/item?id=1755575
Is the maximum number of connections you can reach on the largest EC2 instance the same as on a physical server?
c00p3r | 15 years ago:
So, moving /var/log (not just /var) onto a separate device connected to a distinct controller port is a big deal.
If you're running, say, a mail server, you should separate /var/spool, /var/log, and /var/db/mysql if present.
Partitioning, a serious network card (think Broadcom), and big CPU caches are good things to begin with.
c00p3r | 15 years ago:
Even Oracle provides far better advice, let alone some individual pros.
Good starting point: http://www.puschitz.com/InstallingOracle10g.shtml
Update: Oh, yes, I understand. Some newcomers don't know what Oracle is. MySQL = RDBMS, I see. ^_^