I'm extremely happy with Julia's performance. I'm using it right now to prototype some new algorithms for my PhD.
How fast is fast? Well, in my case I'm working on collision detection, and I'm getting <30 ns for SAT tests between OBBs (oriented bounding boxes) and <300 ns for minimum-distance computation with the GJK algorithm, also between OBBs.
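To give a flavor of what such a SAT test looks like, here's a minimal sketch (not the commenter's code; the `OBB` struct and function names are made up for the example): the core of the separating-axis test is projecting both boxes onto a candidate axis and checking whether the projected intervals overlap.

```julia
using LinearAlgebra

# Illustrative sketch of one separating-axis check between two
# oriented bounding boxes. All names here are hypothetical.
struct OBB
    center::Vector{Float64}
    half::Vector{Float64}    # half-extents along the box's local axes
    R::Matrix{Float64}       # columns are the box's local axes (orthonormal)
end

# Radius of the box's projection interval onto `axis`.
radius_along(b::OBB, axis) = sum(b.half .* abs.(b.R' * axis))

# True if `axis` separates the two boxes.
function separated_on(a::OBB, b::OBB, axis)
    dist = abs(dot(b.center - a.center, axis))
    return dist > radius_along(a, axis) + radius_along(b, axis)
end
```

For the sub-30 ns figures quoted above, one would use fixed-size, stack-allocated types (e.g. StaticArrays.jl) instead of `Vector`/`Matrix`, so the whole test inlines with zero heap allocations.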
In the context of recommender systems, I ran an experiment comparing Julia against Cython when I started learning the language and wrote about it, although I was a newbie at the time and should update it: https://plopezadeva.com/julia-first-impressions.html
So, the article does a bit of goalpost moving. OP provides an example of a massive speedup for real applications, which I don't doubt, but constantly compares between Python (which is really only fast when you dispatch to optimized C/C++) and C.
"Unless you are a C/C++ wizard normal Python developers cannot fix or optimize big libraries like NumPy or TensorFlow. In Julia its users tend to contribute more to the libraries they use themselves."
I am excited by Julia. It seems to have a good niche and community around it. It seems well suited for its job. That focus will probably make it better than more general alternatives in short order.
Well, further, Python's performance problems tend to come from its C/C++ interop. Python made a few mistakes early on that make JITing it really hard without backwards-incompatible changes.
Easy/Fast C interop is a blessing and a curse. Yes, you can do fast things from C with little overhead, but it also makes things like having a generational garbage collector or a JIT really hard.
The author gives an example of a little function written in a generic way, and shows how JIT compilation specializes it at runtime into highly optimized machine code. It’s a wonderful example of how Julia’s JIT compiler works, and how it can lead to significant speedups over even C or Fortran code in some cases.
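To make that concrete, here's a small sketch of the kind of thing being described (the function here is mine, not the article's): a generic function with no type annotations that Julia specializes per argument type on first call.

```julia
# A generic function: no type annotations anywhere.
poly(x) = 3x^2 + 2x + 1

# Julia compiles a separate, fully typed specialization for each
# concrete argument type the function is called with:
poly(2)       # specialized for Int
poly(2.0)     # specialized for Float64
poly(3//2)    # specialized for Rational{Int}

# You can inspect the generated machine code per specialization:
# julia> @code_native poly(2)
# julia> @code_native poly(2.0)
```

Each specialization is compiled as if the function had been written for that one type, which is how generic high-level code can end up matching hand-specialized C or Fortran.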
Looking at https://github.com/arturofburgos/Assessment-of-Programming-L... https://www.matecdev.com/posts/numpy-julia-fortran.html https://github.com/zyth0s/bench_density_gradient_wfn https://github.com/PIK-ICoNe/NetworkDynamicsBenchmarks
people do find Julia to be faster than Python/NumPy, but it is not uniformly faster than Fortran. And Julia's start-up time should not be ignored. Quoting the last link: "In fact the whole Fortran benchmark (300 integrations) finishes roughly in the time it takes to startup a Julia session and import all required libraries (Julia 1.5.1)."
I don’t think anyone is claiming that Julia is uniformly faster than well-written Fortran. But you can get comparable performance with code using a style that many people find easier to read, write, and reason about; and it’s interesting that sometimes this code does outperform optimized C or Fortran code.
Startup time is much improved in recent Julia versions¹, but is certainly not negligible for short calculations.
[1] https://lwn.net/Articles/856819/
Comparing different metrics is not valid across all use cases. Counting startup time but excluding development time is disingenuous. Having used both Fortran and Julia, I find the difference in development time staggering.
Different tools for different use cases is the best way to put it.
Of course it will not always beat Fortran, but don't you think it is damn impressive that a high-level dynamic language with much higher productivity frequently matches or beats Fortran? That is a crazy achievement if you ask me.
Real world systems today are going to use a lot of computing resources, such as clusters, GPUs, tensor processing units, multiple cores etc. In such a world, anything that makes that easy to deal with is going to have the performance edge in practice.
Doesn't matter how fast a Fortran program would be in theory, if the Julia program is delivered years ahead of it.
If Julia could become a native vector/GPU programming language that is usable by mere mortals, this could be a niche that might eventually grow into the mainstream. But I can't help noticing that e.g. Nvidia's GitHub mentions only Python, C++, C, Go, and CUDA (not sure the order matters).
Julia has excellent CUDA support [1]; I have no idea why Nvidia doesn't promote it more. It's fast, flexible, and very featureful. There was a recent thread about it here: https://news.ycombinator.com/item?id=27496679
The AMD [2] and Intel [3] support is younger, but developing quickly.
There's also KernelAbstractions.jl [4], a unified API that works across different GPU vendors to avoid lock-in.
[1] https://cuda.juliagpu.org/dev/
[2] https://github.com/JuliaGPU/AMDGPU.jl
[3] https://github.com/JuliaGPU/oneAPI.jl
[4] https://github.com/JuliaGPU/KernelAbstractions.jl
Well, Common Lisp or Racket are a lot faster than Python too, and both are general-purpose languages, which gives them many advantages over a language specific to numerical computing.
If only Julia had fluent static compilation, it would be in the top 10 of languages. It's too bad there is so little investment in the really important stuff that would benefit the world. All the investment goes to bubbles and financial pyramids.
One of the gripes that I have with Julia is that if you write linear algebra code naively, you will have tons of unnecessary temporary allocations, while in Eigen (a C++ library) you can avoid most of these without sacrificing too much readability. (It even optimizes how to run matrix kernels on the fly!) Sure, you can rewrite your Julia code in C-style to remove those temporary allocations, but then the code becomes even less readable than what you can achieve in C++.
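A minimal illustration of the allocation issue (array sizes are my choice, not taken from the benchmark): every `A * B` allocates a fresh result matrix, while `mul!` from the LinearAlgebra stdlib writes into a preallocated buffer.

```julia
using LinearAlgebra

A = rand(18, 18); B = rand(18, 18)
C = similar(A)                 # preallocated output buffer

# Warm up both paths so @allocated doesn't count compilation:
_ = A * B
mul!(C, A, B)

alloc_naive   = @allocated (A * B)        # allocates a new 18x18 matrix
alloc_inplace = @allocated mul!(C, A, B)  # reuses C: no result allocation
```

In a loop with tens of thousands of iterations, those per-iteration allocations and the GC pressure they create add up, which is what the benchmark below measures.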
Here's an example: https://ronanarraes.com/tutorials/julia/my-julia-workflow-re...
The naive Julia version has unnecessary allocations and is therefore 23% slower than the optimized version:
@inbounds for k = 2:60000
Pp .= Fk_1 * Pu * Fk_1' .+ Q
K .= Pp * Hk' * pinv(R .+ Hk * Pp * Hk')
aux1 .= I18 .- K * Hk
Pu .= aux1 * Pp * aux1' .+ K * R * K'
result[k] = tr(Pu)
end
In order for this loop to match the C++ version you need to use C-style functions:
for k = 2:60000
# Pp = Fk_1 * Pu * Fk_1' + Q
mul!(aux2, mul!(aux1, Fk_1, Pu), Fk_1')
@. Pp = aux2 + Q
# K = Pp * Hk' * pinv(R + Hk * Pp * Hk')
mul!(aux4, Hk, mul!(aux3, Pp, Hk'))
mul!(K, aux3, pinv(R + aux4))
# Pu = (I - K * Hk) * Pp * (I - K * Hk)' + K * R * K'
mul!(aux1, K, Hk)
@. aux2 = I18 - aux1
mul!(aux6, mul!(aux5, aux2, Pp), aux2')
mul!(aux5, mul!(aux3, K, R), K')
@. Pu = aux6 + aux5
result[k] = tr(Pu)
end
... which is quite dirty. But you can write the same thing in C++ like this (and even be a bit faster than Julia!):
for(int k = 2; k <= 60000; k++) {
Pp = Fk_1*Pu*Fk_1.transpose() + Q;
aux1 = R + Hk*Pp*Hk.transpose();
pinv = aux1.completeOrthogonalDecomposition().pseudoInverse();
K = Pp*Hk.transpose()*pinv;
aux2 = I18 - K*Hk;
Pu = aux2*Pp*aux2.transpose() + K*R*K.transpose();
result[k-1] = Pu.trace();
}
which is much more readable than Julia's optimized version.
If Julia had a linear-algebra-aware optimizing compiler (without the sheer madness of the C++ template metaprogramming that Eigen uses), then Julia's standing in HPC would be much, much better. I admit that it's a hard goal to achieve, since I haven't seen any language try this; the closest I've seen is LLVM's matrix intrinsics (https://clang.llvm.org/docs/MatrixTypes.html), but they're only a proposal.
In this code, the main problem, I think, is that there are intermediate results being allocated, e.g. Fk_1 * Pu * Fk_1'. I would speculate that you could improve on the baseline code by preallocating these, in the same way that Pp, K, aux1, and Pu are initialized outside of the loop.
Are you sure the difference is due to the allocations? I would expect this to be dominated by the matrix multiplies or SVDs. Are you comparing with the same BLAS/LAPACK?
Edit: OK, I see those are small matrices. Then StaticArrays.jl should be a nice contender here, both for speed and readability.
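A sketch of that suggestion (sizes and values are made up, and it assumes the StaticArrays.jl package is installed): with `SMatrix`, small fixed-size matrices are stack-allocated, so the readable `F * P * F' + Q` style allocates nothing per iteration.

```julia
using StaticArrays, LinearAlgebra

# Covariance propagation in the naive, readable style, but with
# stack-allocated static matrices: no heap allocation in the loop.
function propagate(P::SMatrix, F::SMatrix, Q::SMatrix, n::Integer)
    for _ in 1:n
        P = F * P * F' + Q   # rebinds P; each result lives on the stack
    end
    return P
end

F  = @SMatrix [1.0 0.1; 0.0 1.0]
Q  = SMatrix{2,2}(0.01I)
P0 = SMatrix{2,2}(1.0I)
P  = propagate(P0, F, Q, 1000)
```

For the 18x18 matrices in the benchmark above, static arrays are near the upper end of where they pay off, so it would need measuring, but for genuinely small kernels this recovers both the readability and the speed.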