emcq | 1 year ago

If we are going to consider it fair to rely on prior runs of the program having left the file in RAM via the kernel's page cache, why stop there?

Let's say I create a "cache" where I store the min/mean/max output for each city, mmap it, and read it at least once to make sure it is in RAM. If the cache is available, I simply write it to standard out. On the first run I compute the results with whatever method, persist them to disk, and then mmap the file. That first run could take 20 hours and gets discarded.
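A minimal sketch of this trick, under stated assumptions (the `CACHE_PATH` name and the `compute_results` callback are hypothetical, not part of the challenge):

```python
import mmap
import os

CACHE_PATH = "results.cache"  # hypothetical location for the persisted answer

def run(compute_results):
    """Return the final output, paying the full computation cost only once."""
    if os.path.exists(CACHE_PATH):
        # Later runs: mmap the cached answer. After a prior read it is
        # already resident in the page cache, so this is nearly free.
        with open(CACHE_PATH, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                return m[:].decode()
    # First (discarded) run: do the real work, however slow, and persist it.
    output = compute_results()
    with open(CACHE_PATH, "w") as f:
        f.write(output)
    return output
```

On every run after the first, the "computation" collapses to dumping a few kilobytes that are already in RAM, which is exactly why keeping state between runs has to be ruled out.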

By technicality it might fit the rules of the original request but it isn't an interesting solution. Feel free to submit it :)


gunnarmorling | 1 year ago

This actually doesn't fit the rules. I've designed the challenge so that disk I/O is not part of the measured runtime (initially by relying on the fact that the first run, which pulls the file into the page cache, will be the slowest and thus discarded; later on by loading the file from a RAM disk). But keeping state between runs is explicitly ruled out in the README, for exactly the reason you describe.

pama | 1 year ago

Having write access to storage or being able to spawn persistent daemons is an extra requirement, and one that is often not available in practice when evaluating contest code :-)

This is a fun project for learning CUDA and I enjoyed reading about it. I just wanted to point out that the performance tuning in this case really comes down to parsing, hashing, memory transfers, and I/O. Taking I/O out of the picture with specialized hardware or the Linux page cache still leaves an interesting problem to solve, and the focus should be on minimizing memory transfers, parsing, and hashing.
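As one illustration of the parsing hot spot: the challenge's input lines have the shape `Station;-12.3` with exactly one fractional digit, so the temperature can be parsed into integer tenths without touching floating point at all. A hedged sketch (the `parse_line` helper is mine, not from the challenge repository):

```python
def parse_line(line: bytes) -> tuple[bytes, int]:
    """Split a measurement line and return (station, temperature in tenths).

    Assumes the 1BRC format: 'Station;T' where T has exactly one
    digit after the decimal point, e.g. b'Hamburg;12.3' -> (b'Hamburg', 123).
    """
    sep = line.index(b";")
    station = line[:sep]
    num = line[sep + 1:]
    sign = 1
    if num[:1] == b"-":
        sign = -1
        num = num[1:]
    # 'ab.c' -> (a*10 + b)*10 + c, all in integer arithmetic
    whole, frac = num.split(b".")
    return station, sign * (int(whole) * 10 + int(frac))
```

Fast entries take the same idea further, folding the digit extraction into branchless word-at-a-time code, but the principle is the same: exploit the fixed format so neither float parsing nor general-purpose tokenizing appears in the hot loop.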