top | item 8505443

JeanPierre | 11 years ago

I understand your concern, but I don't really think the problem lies with Clojure people: when I started out explaining the structure to other people, and explained that most operations have O(log n) runtime, people dismissed Clojure as too slow for practical purposes. Part of this is because programmers – not Clojure devs in particular – associate O(log n) with "much slower than O(1)". However, this data structure is incredibly much faster than the "O(log n)" people are generally used to.
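(As a rough illustration – my sketch in Python, not part of the original comment – of why a branching factor of 32 keeps the tree so shallow; the sizes are arbitrary examples:)

```python
import math

# Depth of a 32-way trie (the shape of Clojure's persistent vector)
# for a few example sizes: even a billion elements need only 6 levels.
for n in (1_000, 1_000_000, 1_000_000_000):
    depth = math.ceil(math.log(n, 32))
    print(f"n = {n:>13,}: depth = {depth}")
```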

And this is the problem. As you yourself say, most programmers assume that "Big-O notation is intended to describe how the time required for an operation grows as the number of items being handled gets large".

But formally, this is not the case at all. O(g(x)) tells you that the worst-case input to the algorithm will require fewer operations (usually operations, sometimes memory or other resources) than the function g multiplied by some positive constant, for all sufficiently large inputs. g is usually a function of the size of the input. (This definition is also somewhat simplified, but it illustrates the problem.)

We can say that ArrayList.add has O(n) runtime and that quicksort has O(n²) runtime, but both also have O(n³) and O(n!) runtime.
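(A quick numeric check of that upper-bound reading – my sketch, not from the thread: f is O(g) whenever f(n) ≤ c·g(n) from some point on, so a linear cost is also O(n²), while the converse fails. `is_bounded_by` is a name I made up.)

```python
# Big-O is only an upper bound: f is O(g) if f(n) <= c*g(n) for all n >= n0.
# So a linear cost is also (uselessly) bounded by n^2, n^3, n!, ...
def is_bounded_by(f, g, c=1, n0=1, upto=10_000):
    return all(f(n) <= c * g(n) for n in range(n0, upto))

print(is_bounded_by(lambda n: n, lambda n: n))       # n is O(n)       -> True
print(is_bounded_by(lambda n: n, lambda n: n ** 2))  # n is O(n^2)     -> True
print(is_bounded_by(lambda n: n ** 2, lambda n: n))  # n^2 is not O(n) -> False
```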

In addition, Big-O does not describe how the algorithm behaves in the average case, or how frequent the worst case is. In and of itself, Big-O notation tells us almost nothing about an algorithm – you can barely conclude anything from it. I guess that is part of why it has become this very simplified term with a very wibbly-wobbly definition from time to time.

So when you say that calling it "practically O(1)" is lying, I would agree if this were in an academic context. But for the average programmer (regardless of language!), O(g(x)) is usually thought of as the "worst-case runtime for an algorithm". Since the actual runtime is – well – effectively constant, I considered it okay.

---

That being said, I have formally described the persistent vector if you're interested in the academic version. The background chapter in http://hypirion.com/thesis.pdf should cover it.

ScottBurson | 11 years ago

I understand that log-time algorithms have a marketing problem. I think the correct way to address that is the one I already suggested: give benchmark numbers for lookup, relative to ArrayList, for a typical collection size (or maybe for a few sample sizes -- 32, 1024, and 32768, perhaps). These will clearly show both that lookup is fast on small seqs and that it slows down logarithmically.

I would be more sympathetic to your argument that O(log n) is "practically" O(1) if you would put it in context. It's certainly true that how much the difference matters depends on the application. For someone doing real-time work (HFT, for example) it could be critical. For the average web app, it matters extremely little; the programmer productivity gains offered by functional collections far outweigh the small cost in CPU cycles. I totally agree with that argument, and have made it myself many times, but I still wouldn't say that log time is "practically" constant time for all applications.
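(The benchmark suggested above could be sketched like this – a toy 32-way trie in Python, purely illustrative, not Clojure's actual implementation; `build_trie` and `lookup` are names I made up:)

```python
import timeit

BITS, WIDTH, MASK = 5, 32, 31  # 32-way branching, 5 index bits per level

def build_trie(items):
    # Pack items into leaves of 32, then group nodes 32 at a time.
    nodes = [list(items[i:i + WIDTH]) for i in range(0, len(items), WIDTH)]
    depth = 0
    while len(nodes) > 1:
        nodes = [nodes[i:i + WIDTH] for i in range(0, len(nodes), WIDTH)]
        depth += 1
    return nodes[0], depth

def lookup(root, depth, i):
    # One pointer chase per level, consuming 5 bits of the index per step.
    node = root
    for level in range(depth, 0, -1):
        node = node[(i >> (level * BITS)) & MASK]
    return node[i & MASK]

for n in (32, 1024, 32768):  # the sample sizes suggested above
    data = list(range(n))
    root, depth = build_trie(data)
    t_flat = timeit.timeit(lambda: data[n - 1], number=100_000)
    t_trie = timeit.timeit(lambda: lookup(root, depth, n - 1), number=100_000)
    print(f"n={n:>6}: flat array {t_flat:.4f}s, trie {t_trie:.4f}s (depth {depth})")
```

The trie column should grow only gently as the depth goes 0, 1, 2 – which is exactly the shape of the slowdown such a benchmark would make visible.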

JeanPierre | 11 years ago

Agreed on the benchmarks – that's the next (and last) part on the list I'm going to do with this series.

And yes, of course, at some point constant factors break into applications. Would it be better if I added a short comment saying "practically O(1) for non-HPC applications", or would you phrase it differently to give it better context?

rawnlq | 11 years ago

Cool – you seem to be the author.

Instead of telling people that it's O(log n), you should just show benchmarks.

Hitting L1 cache takes about 1 ns, but main memory takes about 100 ns. So although you can say accessing an element in an array is always O(1) = O(100), that hides a factor of 100 depending on how the memory is accessed. Having a log in the big-O usually means the algorithm is jumping around a lot and is no longer cache-friendly, so that worries me more than whether it is log_2(x) or log_2(x)/5.
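(For the last point, the log_2(x) versus log_2(x)/5 arithmetic – my sketch, not the commenter's: each 32-way level consumes 5 index bits, so the trie does a fifth of the pointer chases of a binary tree, though as noted this says nothing about cache behaviour.)

```python
import math

# Pointer chases per lookup: a binary tree needs ~log2(n) of them,
# a 32-way trie needs ~log2(n)/5 (since 32 = 2**5).
for n in (1024, 10**6, 10**9):
    binary = math.ceil(math.log2(n))
    wide = math.ceil(math.log2(n) / 5)
    print(f"n={n:>13,}: binary tree {binary:>2} levels, 32-way trie {wide} levels")
```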

millstone | 11 years ago

> However, this data structure is incredibly much faster than the "O(log n)" people are generally used to.

Is it? I'll bet it's actually much slower than familiar O(log n) operations like binary search, in particular due to the memory allocations.

weavejester | 11 years ago

If you're comparing a binary search to a lookup on a Clojure hashmap, there should be no memory allocations necessary.