I think the optimal strategy would be to use the "reduce" step of MapReduce. Have threads that read portions of the file and add the data to a "list", one for each unique name. Then this set of threads can "process" these lists. I don't think we need to sort; that'd be too expensive, and a single linear pass would be good enough. I can't see how we could use SIMD, since max/min require a linear pass anyway.
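A minimal sketch of the map/reduce split described above. It assumes each input line looks like `name;value` and that the file has already been split into whole lines; `map_chunk`, `reduce_name` and `aggregate` are hypothetical names. It shows the structure only. In CPython the GIL keeps CPU-bound threads from running in parallel, so a real attempt would need processes or another language.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step: group one chunk's values by name, one list per unique name."""
    groups = defaultdict(list)
    for line in lines:
        name, value = line.split(";")  # assumed "name;value" format
        groups[name].append(float(value))
    return groups


def reduce_name(name, lists):
    """Reduce step: a single linear pass over every value for one name, no sort."""
    lo, hi, total, count = float("inf"), float("-inf"), 0.0, 0
    for values in lists:
        for v in values:
            lo = min(lo, v)
            hi = max(hi, v)
            total += v
            count += 1
    return name, lo, total / count, hi


def aggregate(lines, workers=4):
    """Return {name: (min, mean, max)} computed with a pool of threads."""
    # Split the input into roughly equal chunks of whole lines.
    size = max(1, -(-len(lines) // workers))
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
        # Shuffle: collect each name's per-chunk lists in one place.
        by_name = defaultdict(list)
        for groups in partials:
            for name, values in groups.items():
                by_name[name].append(values)
        # Reduce: each name's lists are processed by a worker thread.
        results = pool.map(lambda item: reduce_name(*item), by_name.items())
        return {name: (lo, mean, hi) for name, lo, mean, hi in results}
```

For example, `aggregate(["A;10.0", "A;20.0"], workers=2)` gives `{"A": (10.0, 15.0, 20.0)}`. Because min, max, sum and count can all be updated one value at a time, a variant could keep those four numbers per name in the map step instead of full lists.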
qsort|2 years ago
It would have been more interesting with something like the median or k-th percentile, or some other aggregation that isn't as easy.
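One way to see why the median is harder: min and max of per-chunk results combine exactly, but medians of chunks do not. The sketch below uses made-up readings for a single station. The counting approach at the end is one exact option, assuming the number of distinct values is small (for instance, if readings have one decimal place in a bounded range); `kth_smallest` is a hypothetical helper.

```python
from collections import Counter
from statistics import median

# Hypothetical per-chunk readings for one station.
chunk_a = [1.0, 2.0, 3.0]
chunk_b = [4.0, 5.0, 6.0, 7.0, 8.0]

# Min and max merge exactly from per-chunk summaries.
assert min(min(chunk_a), min(chunk_b)) == min(chunk_a + chunk_b)
assert max(max(chunk_a), max(chunk_b)) == max(chunk_a + chunk_b)

# Medians do not: the median of the per-chunk medians (2.0 and 6.0) is 4.0,
# but the median of all eight readings is 4.5.
median_of_medians = median([median(chunk_a), median(chunk_b)])
true_median = median(chunk_a + chunk_b)


def kth_smallest(counts, k):
    """Return the k-th smallest value (0-based) from a value -> count mapping."""
    seen = 0
    for value in sorted(counts):  # sorts distinct values only, not every reading
        seen += counts[value]
        if seen > k:
            return value
    raise IndexError(k)


# Per-chunk counts can be merged exactly, unlike per-chunk medians.
merged = Counter(chunk_a) + Counter(chunk_b)
n = sum(merged.values())
exact_median = (kth_smallest(merged, (n - 1) // 2) + kth_smallest(merged, n // 2)) / 2
```

The same `kth_smallest` call with a different `k` gives any percentile, at the cost of keeping a count per distinct value instead of four running numbers.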
dist-epoch|2 years ago
https://www.felixcloutier.com/x86/phminposuw
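PHMINPOSUW finds the minimum of eight unsigned 16-bit words and reports both the value and its index, so min (and, with a bit trick, max) can use SIMD. Below is a scalar Python model of that behaviour, not real SIMD. It assumes temperatures are stored as signed 16-bit integers (for example, tenths of a degree). Adding 0x8000 (equivalent to flipping the sign bit) maps signed order onto unsigned order, and subtracting from 0xFFFF reverses the order so a min search finds the max. `phminposuw` and `min_max_int16` are illustrative names.

```python
def phminposuw(words):
    """Scalar model of PHMINPOSUW: min of 8 unsigned 16-bit words and its index.

    Returns (value, index). On ties the lowest index wins, matching the
    instruction's reference pseudocode; min() also keeps the first minimum.
    """
    assert len(words) == 8 and all(0 <= w <= 0xFFFF for w in words)
    index = min(range(8), key=lambda i: words[i])
    return words[index], index


def min_max_int16(lanes):
    """Min and max of 8 signed 16-bit values using two PHMINPOSUW-style passes."""
    # Bias by 0x8000 so signed int16 order matches unsigned uint16 order.
    biased = [x + 0x8000 for x in lanes]
    lo, _ = phminposuw(biased)
    # Invert (0xFFFF - u) to reverse the order, so the min search finds the max.
    hi_inv, _ = phminposuw([0xFFFF - u for u in biased])
    return lo - 0x8000, (0xFFFF - hi_inv) - 0x8000
```

In a real kernel one would typically run vertical packed min/max (e.g. PMINSW/PMAXSW) across many vectors first and use a horizontal step like PHMINPOSUW only once at the end.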