aivarsk | 2 months ago | on: Solid is not so solid
aivarsk's comments
aivarsk | 3 years ago | on: Python Asyncio
Async Python is slower than "sync" Python under a realistic benchmark. A bigger worry is that async frameworks go a bit wobbly under load. https://calpaterson.com/async-python-is-not-faster.html
aivarsk | 3 years ago | on: Ask HN: What are the reasons behind your success as a self-taught programmer?
aivarsk | 4 years ago | on: Malloy – A Better SQL, from Looker
It begs for examples of the ugly/verbose/complex SQL next to the beautiful/succinct/simple Malloy. Looking at the samples folder did not work for me.
aivarsk | 4 years ago | on: Ask HN: What is your preferred notebook screen size and why?
I picked the wrong CPU for T490s and it fried two keyboards: several keys on the right including enter and backspace stopped working after ~6 months.
aivarsk | 5 years ago | on: Ask HN: Optimal cloud service to run tiny website with back end Python + SQLite?
But Atlantic had a 256 MB RAM, 10 GB disk option years ago for just $0.99. I still have one server there for hosting static HTML, running some scraping tasks, an MS document to PDF conversion service (Python Twisted) for a customer, etc. All that for just $1.20 a month (including VAT).
aivarsk | 6 years ago | on: Turing Pi: Kubernetes Cluster on Your Desk
aivarsk | 7 years ago | on: The Internet Has a Huge C/C++ Problem and Developers Don't Want to Deal with It
aivarsk | 7 years ago | on: Ask HN: Work from home (WFH) setup
My only setup is my ThinkPad T450s. That's all.
At some point in my career I used 4 monitors and specific low-profile keyboards and mice at the office, but I don't miss that. It took some time (years!) to adjust my workflow to a single screen, and maybe I'm not as productive as I could be, but I'm still among the most productive employees at my company.
What I achieved is that I'm equally productive and +/- comfortable anywhere: at work, on the couch or in the office room at home, on a train, at an airport, in the garden, anywhere. That consistency is more important to me than maximum efficiency.
That, and I choose what kind of tasks I work on depending on whether I'm alone or family is at home, because you can't expect to fully isolate yourself.
aivarsk | 7 years ago | on: I am leaving llvm
aivarsk | 8 years ago | on: Show HN: SuperString – A fast and memory-optimized string library for C++
aivarsk | 8 years ago | on: Ask HN: Which blogs/newsletters would you be willing to pay $5/month for?
aivarsk | 8 years ago | on: Ask HN: What website, from your early days on the net, do you miss?
aivarsk | 8 years ago | on: Will Cash Disappear?
aivarsk | 8 years ago | on: Ask HN: What's the worst thing your code has done?
Came to work the next day: the nightly build still had not finished on the slave servers, and I got errors about a non-existent home folder when I tried to log in.
What made it worse was that every server mounted an NFS share that contained fingerprints and binaries of different versions of software modules built on different platforms.
Killed all slaves and restored the NFS share from week-old backups on tapes. Tens of developers could not create new versions of software or send previous versions/patches to customers for a while.
aivarsk | 8 years ago | on: Ask HN: How can one learn to build API gateways?
aivarsk | 8 years ago | on: Using select(2) the right way
aivarsk | 8 years ago | on: Using select(2) the right way
> why not just use poll(2) instead?
Because, as I mentioned later, poll is basically the same as select but requires more memory to be copied to/from the kernel and was slower in some test cases, although I can come up with cases where poll will be faster than select. Networking libraries like libev and others allocate fd_set the same way.
>> To find out which sockets have events some loop through data structures and check each socket they contain. But a much better way is to iterate through fd_set as array of long and check 32-64 bits at a time.
> you're better off switching to a socket API that scales well (epoll or kevent) and does this filtering for you. Or like another commenter suggested, using a library that abstracts this functionality.
That's how the kernel, libev and others work with fd_set.
>> The correct way is to maintain a descriptor set all the time and create a copy of it and pass the copy to select.
> Again -- if you've reached the point where this tradeoff matters, just go directly to epoll/kevent.
Again -- libev and others do this
> Maybe your program deals with non-socket fds, and the set of socket fds is fairly sparse. Using a map is actually pretty reasonable even if fds are dense.
Yes, there might be cases where you have non-sockets and you can't use an array indexed by socket. But it works great in most cases, and the kernel will keep descriptors as dense as possible. It might be that you have 10,001 connections, then 10,000 are closed while the highest socket is still in use, and the array memory is wasted. But it will not require more memory than during peaks.
libev use of select: http://cvs.schmorp.de/libev/ev_select.c?revision=1.56
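A rough model of the "iterate through fd_set as array of long" idea discussed above. Python's select module does not expose fd_set, so this sketch stands in a plain list of integers for the bitmap; WORD_BITS, make_set, mark and ready_fds are hypothetical names and the 64-bit word size is an assumption. The point is only that an all-zero word rules out 64 descriptors with a single comparison.

```python
WORD_BITS = 64  # assumed word size; a C fd_set is an array of long


def make_set(max_fd):
    """Return a zeroed bitmap (list of words) large enough to hold max_fd."""
    return [0] * (max_fd // WORD_BITS + 1)


def mark(words, fd):
    """Set the bit for fd, roughly what FD_SET does in C."""
    words[fd // WORD_BITS] |= 1 << (fd % WORD_BITS)


def ready_fds(words):
    """Return the descriptors whose bits are set, scanning one word at a time."""
    found = []
    for index, word in enumerate(words):
        if word == 0:
            continue  # 64 descriptors skipped with one comparison
        base = index * WORD_BITS
        while word:
            lowest = word & -word  # isolate the lowest set bit
            found.append(base + lowest.bit_length() - 1)
            word ^= lowest  # clear that bit and look for the next one
    return found
```

With mostly idle connections almost every word is zero, so the scan cost tracks the number of words rather than the number of descriptors.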
aivarsk | 8 years ago | on: Using select(2) the right way
I have my doubts about select causing 100% CPU utilization; I suspect you were doing other suboptimal things as well. The sample code I wrote runs well with both 100 and 10,000 connections. I have my own anecdotal evidence: an application was barely handling just 100 mostly inactive connections, and the OPS guys suggested a limit of just 50 connections per application. After I fixed how fd sets were created and how the result of select was processed, the same application ran just fine with 8,000 connections. We had to support Linux, Solaris, AIX and HP-UX at that time, and select/poll were available on all of them. That's why I invested time in optimizing the code instead of switching to epoll. The OPS guys still suggested a limit of 1,000 connections per application, but this time it was for availability and other non-performance reasons.
aivarsk | 9 years ago | on: Select(2) is fundamentally broken
You just have to use select() correctly:
1) You can raise the 1024 limit on fd_set size with "#define FD_SETSIZE 65536" (required on SunOS to use select_large_fdset() instead of select()) and by allocating memory for fd_set yourself.
2) Do not loop over descriptors and use FD_ISSET to check if file descriptor is in set. Instead loop over fd_set one word at a time: if word != 0 then go and analyze each bit of that word (see how Linux kernel does it).
3) The other thing is to limit number of select() calls you make per second and do short sleeps if needed. That allows for events to be processed in batches and the cost of select() calls gets relatively smaller compared to the "real work" done. It also increases latency but you can work out a reasonable number of select() per second. This idea I got from "Efficient Network I/O Polling with Fine-Grained Interval Control by Eiji Kawai, Youki Kadobayashi, Suguru Yamaguchi"
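Point 3 could be sketched roughly like this in Python. This is not the scheme from the cited paper, just a minimal illustration of capping select() calls per second; paced_select_loop, handle_ready, max_calls_per_sec and iterations are hypothetical names, and the default of 100 calls per second is an arbitrary assumption.

```python
import select
import time


def paced_select_loop(readers, handle_ready, max_calls_per_sec=100, iterations=None):
    """Call select() at most max_calls_per_sec times per second.

    Sleeping between calls lets events pile up so each select() returns a batch.
    Runs forever when iterations is None; returns the number of select() calls made.
    """
    min_interval = 1.0 / max_calls_per_sec
    last_call = time.monotonic() - min_interval  # so the first call is not delayed
    calls = 0
    while iterations is None or calls < iterations:
        wait = min_interval - (time.monotonic() - last_call)
        if wait > 0:
            time.sleep(wait)  # short sleep: trades latency for fewer system calls
        last_call = time.monotonic()
        ready, _, _ = select.select(readers, [], [], 0)  # timeout 0: do not block
        calls += 1
        if ready:
            handle_ready(ready)  # process the whole batch at once
    return calls
```

The interval sets the worst-case added latency, so 100 calls per second means up to about 10 ms of extra delay per event.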
I learned how to use accept() correctly from "Scalability of Linux Event-Dispatch Mechanisms by Abhishek Chandra, David Mosberger". The main idea is to call accept() in a loop until EAGAIN or EWOULDBLOCK is returned or you have accepted "enough" connections.
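A minimal sketch of that accept() loop in Python, assuming the listening socket is already non-blocking. accept_batch and max_accepts are hypothetical names, and 64 is an arbitrary stand-in for "enough". Python reports EAGAIN/EWOULDBLOCK as BlockingIOError.

```python
import socket


def accept_batch(listener, max_accepts=64):
    """Drain the accept queue of a non-blocking listening socket.

    Stops when the queue is empty (EAGAIN/EWOULDBLOCK) or after max_accepts
    connections, so one busy listener cannot starve the other descriptors.
    """
    accepted = []
    while len(accepted) < max_accepts:
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:  # nothing left to accept right now
            break
        conn.setblocking(False)
        accepted.append(conn)
    return accepted
```

One select() wakeup then handles every pending connection instead of one connection per wakeup.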
I don't get why the author claims that epoll() fixes the problem with registering and unregistering descriptors. If you use epoll, then adding or removing a descriptor is a system call, but with select() you just modify a local data structure and call select() when you're done adding and removing descriptors. And you shouldn't call accept() from multiple threads; a single thread calling accept() is enough for most of us unless you're web scale ;-D
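The registration point can be illustrated with a small Python sketch. SelectSet is a hypothetical helper, not an API from any library: adding and removing descriptors only touches an in-process set, and the kernel is entered once per wait() call. With select.epoll, each register() or unregister() would be a separate system call.

```python
import select


class SelectSet:
    """Hypothetical helper: the interest set lives entirely in process memory."""

    def __init__(self):
        self._readers = set()

    def add(self, sock):
        self._readers.add(sock)  # local update, no system call

    def discard(self, sock):
        self._readers.discard(sock)  # local update, no system call

    def wait(self, timeout=None):
        """Make one select() call covering every change made since the last wait."""
        # Pass a snapshot so handlers may add or discard sockets while iterating.
        ready, _, _ = select.select(list(self._readers), [], [], timeout)
        return ready
```

The trade-off is that the whole set is handed to the kernel on every call, which is where the per-call cost of select() comes from.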