top | item 28976597

MillenialMan | 4 years ago

Yes, this is a general problem with functional data structures. They have to be fragmented in order to share data. There's also the more nebulous issue that they encourage nesting to leverage the architectural benefits of immutability, which is a complete disaster for cache friendliness.
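A minimal Clojure sketch of that sharing/fragmentation trade-off (vector contents here are illustrative):

```clojure
;; Persistent vectors share structure: "updating" returns a new
;; vector that reuses most of the old vector's internal tree nodes.
(def v1 (vec (range 1000000)))
(def v2 (assoc v1 0 :changed))  ; cheap: only the path to index 0 is copied

;; The original is untouched...
(nth v1 0)  ;; => 0
(nth v2 0)  ;; => :changed

;; ...but the shared tree nodes are scattered across the heap,
;; which is exactly what hurts cache locality during traversal.
```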

Replacing the critical path is an option, but that only works for numpy-style situations where you can cleanly isolate the computational work and detach it from the plumbing. If your app is slow because the inefficiencies have added up across the entire codebase (more common, I would argue), that's not an easy option.

capableweb|4 years ago

You seem to be missing the point lukashrb made about using Java data structures. Your claim was "It's inherent. Clojure will always be slow", which is demonstrably false: you can use interop with your host language (Java, JavaScript, Erlang) to get data structures that don't suffer from this. Wrap them in Clojure functions and the calling code stays idiomatic while also using cache-friendly data structures.
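A sketch of what that wrapping can look like, assuming a hot path over a flat primitive array (the function names here are made up for illustration):

```clojure
;; Keep a flat, cache-friendly primitive array inside; expose plain
;; Clojure functions so callers never touch the interop directly.
(defn make-samples
  "Returns a zero-filled primitive double array of size n."
  ^doubles [n]
  (double-array n))

(defn sum-samples
  "Sums the samples with an unboxed array loop."
  [^doubles samples]
  (areduce samples i acc 0.0 (+ acc (aget samples i))))

;; Calling code looks like ordinary Clojure:
(def samples (make-samples 1000))
(sum-samples samples)  ;; => 0.0 for a fresh zero-filled array
```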

MillenialMan|4 years ago

I understand the point, I'm not arguing that you can't do that or speed up your code by doing that.

Python is also slow, but you can still go fast (in some cases) by calling into C libraries like numpy - but the performance is coming from C, not Python. Python is still slow, it's just not an issue because you're only using it for plumbing.

But Clojure is immutable-by-default; that's the point of the language. It gives you guarantees so you don't have to worry about mutability bubbling up from lower levels. In order to wrap your hot path you have to step outside Clojure's structures and sacrifice that guarantee. You do lose some structural integrity when you do that, even if you try to insulate the mutable structure; the system becomes less provable.
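A small sketch of how that guarantee leaks once a mutable structure is in play (illustrative code, not from the thread):

```clojure
(import 'java.util.ArrayList)

;; A mutable Java list hidden behind a Clojure function.
(defn wrapped-list []
  (doto (ArrayList.) (.add 1) (.add 2)))

(def xs (wrapped-list))
(def snapshot xs)     ; looks like holding a value; is actually an alias
(.add xs 3)
(.size snapshot)      ;; => 3, the "snapshot" changed underneath us

;; With a persistent vector, a reference captured earlier can never
;; change, so reasoning about it stays local.
```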

didibus|4 years ago

> If your app is slow because the inefficiencies have added up across the entire codebase (more common, I would argue), that's not an easy option.

This is where I have to disagree: in my experience, that is less common. Generally there are specific hot spots, and you can just optimize those.

It could depend on what application you're writing. I tend to write backend services and web apps, and for those I haven't really seen inefficiencies "add up"; generally, if you profile, you'll find a few places that are your offenders.

"Slow" is also very relative.

    (cr/quick-bench (reduce (constantly nil) times-vector))
    Execution time mean : 3.985765 ms

    (cr/quick-bench (dotimes [i (.size times-arraylist)]
                      (.get times-arraylist i)))
    Execution time mean : 775.562574 µs

    (cr/quick-bench (dotimes [i (alength times-array)]
                      (aget times-array i)))
    Execution time mean : 590.941280 µs

Yes, iterating over a persistent vector of Integers is slower than an ArrayList or an array, about 5 to 7 times slower. But for a vector of size 1000000 it still only takes about 4 ms.
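And when one of those loops does turn out to matter, the persistent vector can be poured into a primitive array just for that stretch. A hedged sketch (the vector is refilled with illustrative data; `sum-fast` is a made-up name):

```clojure
;; Hot-loop escape hatch: copy the persistent vector into a primitive
;; array once, iterate at array speed, keep persistent data everywhere else.
(def times-vector (vec (range 1000)))

(defn sum-fast [v]
  (let [^longs arr (long-array v)]  ; one O(n) copy out of the persistent world
    (loop [i 0, acc 0]
      (if (< i (alength arr))
        (recur (inc i) (+ acc (aget arr i)))
        acc))))

(sum-fast times-vector)  ;; => 499500, the sum of 0..999
```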

In Python:

    > python3 -m timeit "for i in range(1000000): None"
    20 loops, best of 5: 12.1 msec per loop

It takes 12 ms, for example.

So I would say for most uses, persistent vectors serve as a great default mix of speed and correctness.