
For what it's worth, I looked into the 'Split' case. Specializing it for a single byte makes a performance difference of about 2%, mostly because Split already has a built-in special case for single-byte separators. That special case amounts to a couple of extra instructions in a function whose running time is dominated by allocation.

I think they made the right choice there. The Go team seems very good about optimizing only where it matters: there's lots of low-hanging fruit, but most of it isn't very useful fruit.


taliesinb|13 years ago

Just for fun, I looked into it too. Which factor dominates depends on the kind of strings: for large strings, extra instructions in the loop matter a great deal, and I'm processing very large strings.

I did 128 runs on a byte array of length 2^24, with delimiters placed at positions 2^i for 0 <= i < 24.

I tested my implementation against both the "bytes" package implementation and a local copy of the relevant portions of the "bytes" package (to account for any odd effects of inlining and separate compilation). I ran the set of timings twice in case GC affected one of the runs.

Here's the wall time in milliseconds for the three implementations, on a 2010 MacBook Air.

mine 3313  copy 4709  bytes 5689
mine 3327  copy 4660  bytes 5660

My single-byte implementation is about 40% faster than the local copy, and about 70% faster than the "bytes" package version. Not quite twice as fast, but I wasn't far off.

But aside from performance, there is also consistency of interface. Once you've established a 'Byte' variant of some functions, you should do it for all the common functions.

supersillyus|13 years ago

It doesn't sound like you're using the benchmarking tools that Go provides; I'd recommend using them if you're not.
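As a rough sketch of what that could look like: the standard testing package can drive a timing loop either from a BenchmarkXxx function run with go test -bench, or directly from an ordinary program via testing.Benchmark. The helper name benchSplit and the sample input below are made up for illustration and are not taken from the measurements in this thread.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// benchSplit times bytes.Split on data using the testing package's
// benchmark driver, which picks the iteration count b.N automatically.
// Assumption: benchSplit is a hypothetical helper name.
func benchSplit(data []byte, sep byte) testing.BenchmarkResult {
	s := []byte{sep}
	return testing.Benchmark(func(b *testing.B) {
		// SetBytes lets the result report throughput in MB/s.
		b.SetBytes(int64(len(data)))
		for i := 0; i < b.N; i++ {
			bytes.Split(data, s)
		}
	})
}

func main() {
	// Sample input: 1024 short fields. This is an arbitrary choice.
	data := bytes.Repeat([]byte("field,"), 1<<10)
	r := benchSplit(data, ',')
	// Timing per operation, then allocation statistics.
	fmt.Println(r.String(), r.MemString())
}
```

The same loop body would go into a func BenchmarkSplit(b *testing.B) in a _test.go file if you'd rather run it with go test -bench=.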

Ah, yeah, I was testing a much smaller byte array with multiple split points. I'm not terribly surprised that in your case you've found the hand-coded byte version to be faster (though the difference is more than I would've guessed; care to post the code?).

However, I'm still not sure it's merited in the standard library. Split() could pretty easily be further specialized to basically call your single-byte implementation or an equivalent, at the cost of a handful of instructions per call. Alternatively, if you know you're dealing with very large byte slices with only a few split points, it is only a couple of lines of code to write a specialized version that is tuned for that.

The same argument could be made for IndexByte, but I'd claim that IndexByte is a much more fundamental operation, one for which a hand-tuned assembly implementation is more often needed. I wouldn't say the same for Split. There's a benefit to having fewer speed-specialized standard library calls, and I don't think splitting on a byte with performance as a primary concern happens often enough to merit another way to split on a byte in the standard library. But I'm certain that reasonable people who are smarter than me would disagree.