(no title)
SeanSullivan86 | 2 months ago
I was interviewing recently and was asked about implementing a web crawler and then were discussing bottlenecks (network fetching the pages, writing the content to disk, CPU usage for stuff like parsing the responses) and parallelism, and I wanted to just say "well, i'd test it to figure out what I was bottlenecked on and then iterate on my solution".
moab|2 months ago
Your question about what degree of parallelization is unfortunately too vague to really answer. SSDs offer some internal parallelism. Need more parallelism / IOPS? You can stick a lot more SSDs on your machine. Need many machines worth of SSDs? Disaggregate them, but now you need to think about your network bandwidth, NICs, cross-machine latency, and fault-tolerance.
The best engineers I've seen are usually excellent at napkin math.