(no title)
ikatson | 2 years ago
E.g. parsing bencode, and other binary protocols involved in the network (bittorrent peer protocol, tracker requests, DHT protocol) was all easy comparing to managing state.
Managing state (e.g. what parts we have downloaded so far, what are we downloading at the moment, what peers are we expecting to get pieces from, etc) has a lot of edge cases that need to be handled.
For a recent example, I was dealing with a bug that manifested like this: when resuming a torrent from "paused" state, it was stuck at ~99.87% downloaded, and never progressed. Turned out, than when I put the torrent into "paused" state, after disconnecting the peers, I wasn't marking the in-flight pieces for re-download. So when new peers connected, they had nothing to do.
DHT was also tricky, for multiple reasons:
- as it works via UDP, there's no "connection" - you need to match incoming messages to previously sent requests yourself. But then you might never get a response, so you must manage some kind of timeouts, to clean up memory of old outgoing requests that are no longer expected to be completed.
- DHT involves "recursive" requests. You query N peers at first, and each of them might return M other peers. So on next round you need to make N * M requests. And this recursion can continue forever, ever growing exponentially. You need to put some heuristics around not to explode the network and at least your own computer. For example, my MacOS UI was freezing when rqbit was trying to send too many DHT requests at once.
- Handling the above (and other) examples makes the code worse. Managing code complexity, at least for myself to understand what on earth is going on, is much harder than e.g. implementing the binary protocols.
Otherwise, it's all about tricky details, behaviours that can only be observed under certain circumstances, and only if you're curious enough to look, e.g.
- Garbage collection. Ensuring that when the client (e.g. peer, browser etc) disconnect, everything is cleaned up. E.g. when the torrent is paused, this causes a massive "stop" for all spawned tasks. If you don't account for that, they might keep running forever.
- Network issues. Re-connecting to peers, to DHT etc, re-trying everything that can be retried, might be a head-scratcher sometimes.
All that said, it's quite fun to deal with these when that's your goal by itself - to enjoy the process of coding.
anacrolix|2 years ago
ikatson|2 years ago
For block requesting, rqbit has a pretty simple algorithm https://github.com/ikatson/rqbit/blob/main/crates/librqbit/s..., and I didn't notice it in benchmarks, thanks to Rust being fast by default I guess. I admit though, never looked how other clients do it, maybe the rqbit algorithm is too naive.