JeanPierre|11 years ago
And this is the problem. As you yourself say, most programmers assume that "Big-O notation is intended to describe how the time required for an operation grows as the number of items being handled gets large".
But formally, this is not the case at all. O(g(x)) tells you that the worst-case input to the algorithm will require fewer operations (usually operations, sometimes memory or other resources) than the function g multiplied by some positive constant, where g is usually a function of the size of the input. (This definition is also simplified somewhat, but it illustrates the problem.)
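Written out, the definition being gestured at here is (a standard formulation, with f as the worst-case cost and g as the bounding function):

```latex
f(n) \in O(g(n)) \iff \exists\, c > 0,\ \exists\, n_0:\ \forall n \ge n_0,\ f(n) \le c \cdot g(n)
```

Note it is purely an upper bound; nothing stops g from being wildly larger than f.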
We can say that ArrayList.add has O(n) runtime and that quicksort has O(n²) runtime, but they both also have O(n³) and O(n!) runtime.
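Because Big-O is only an upper bound, membership in several classes at once can even be probed numerically. A small sketch (Python; the `within` helper is my own illustrative invention, and quicksort's worst case is taken as ~n(n-1)/2 comparisons):

```python
def within(f, g, c, n0, upto=1_000):
    """Finite probe of the Big-O bound: check f(n) <= c*g(n) for n in [n0, upto)."""
    return all(f(n) <= c * g(n) for n in range(n0, upto))

quicksort_worst = lambda n: n * (n - 1) // 2  # worst-case comparison count

print(within(quicksort_worst, lambda n: n**2, 1, 1))  # True: the O(n^2) bound holds
print(within(quicksort_worst, lambda n: n**3, 1, 1))  # True: O(n^3) holds just as well
print(within(quicksort_worst, lambda n: n,    1, 1))  # False: O(n) fails
```

A finite check like this can only falsify a bound, not prove one, but it makes the "also O(n³)" point concrete.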
In addition, Big-O does not describe how the algorithm runs in the average case, or how frequent the worst case is. In and of itself, the Big-O notation tells us almost nothing about an algorithm – you can barely conclude anything from it. I guess that is part of why it has become this very simplified term with a very wibbly-wobbly definition from time to time.
So when you're saying that calling it "practically O(1)" is lying, I agree if this were an academic context. But for the average programmer (regardless of language!), O(g(x)) is usually thought of as the "worst-case runtime for an algorithm". Since the actual runtime is – well – effectively constant, I considered it okay.
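For concreteness on "effectively constant": with a 32-way branching trie (the branching factor Clojure's persistent vector uses), the tree never gets deep. A quick sketch (Python) of the levels needed for a given element count:

```python
def depth(n, branching=32):
    """Levels needed for a trie with the given branching factor to hold n elements."""
    d, capacity = 1, branching
    while capacity < n:
        capacity *= branching
        d += 1
    return d

for n in (1_000, 10**6, 10**9):
    print(n, depth(n))  # depths: 2, 4, 6
```

Even a billion elements means walking only six levels, which is what the "practically O(1)" shorthand is pointing at.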
---
That being said, I have formally explained the persistent vector if you're interested in the academic version. The background chapter in http://hypirion.com/thesis.pdf should cover it.
ScottBurson|11 years ago
I would be more sympathetic to your argument that O(log n) is "practically" O(1) if you would put it in context. It's certainly true that how much the difference matters depends on the application. For someone doing real-time work (HFT, for example) it could be critical. For the average web app, it matters extremely little; the programmer productivity gains offered by functional collections far outweigh the small cost in CPU cycles. I totally agree with that argument, and have made it myself many times, but I still wouldn't say that log time is "practically" constant time for all applications.
JeanPierre|11 years ago
And yes, of course, at some point constant factors break into applications. Would it be better if I added a short comment saying "practically O(1) for non-HPC applications", or would you phrase it differently to give it better context?
rawnlq|11 years ago
Instead of telling people that it's O(log n) you should just show benchmarks instead.
Hitting L1 cache takes 1 ns but main memory takes 100 ns. So although you can say accessing an element in an array is always O(1) = O(100), that hides a factor of 100 depending on how the memory is accessed. Having a log in the big O usually means the algorithm is jumping around a lot and is no longer cache friendly, so that worries me more than whether it is log_2(x) or log_2(x)/5.
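A minimal microbenchmark sketch along these lines (Python; the 32-ary trie here is a toy stand-in for a persistent vector, not Clojure's actual implementation, and Python timings only gesture at the pointer-chasing cost, not real cache behavior):

```python
import timeit

N = 32 ** 4  # 1,048,576 elements: exactly four trie levels
data = list(range(N))

def build_trie(items, branching=32):
    """Group a flat list into a nested 32-ary trie, bottom up."""
    nodes = list(items)
    while len(nodes) > 1:
        nodes = [nodes[i:i + branching] for i in range(0, len(nodes), branching)]
    return nodes[0]

def trie_get(root, i, levels=4):
    """Walk one radix-32 digit of i per level, root to leaf."""
    node = root
    for level in range(levels - 1, -1, -1):
        node = node[(i >> (5 * level)) & 31]
    return node

trie = build_trie(data)
assert trie_get(trie, 123_456) == data[123_456]

flat = timeit.timeit(lambda: data[N - 1], number=100_000)
deep = timeit.timeit(lambda: trie_get(trie, N - 1), number=100_000)
print(f"flat: {flat:.4f}s  trie: {deep:.4f}s")
```

The trie lookup pays for several pointer chases per access where the flat array pays for one, which is the cache-friendliness gap being described above.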
millstone|11 years ago
Is it? I'll bet it's actually much slower than familiar O(log n) operations like binary search, in particular due to the memory allocations.
weavejester|11 years ago