
Julia 1.6 Highlights

410 points | mbauman | 5 years ago | julialang.org

214 comments


Buttons840|5 years ago

I recently ported a reinforcement learning algorithm from PyTorch to Julia. I did my best to keep the implementations the same, with the same hyperparameters, network sizes, etc. I think I did a pretty good job, because the performance was similar, solving the CartPole environment in a similar number of steps, etc.

The Julia implementation ended up being about 2 to 3 times faster. I timed the core learning loops, the network evaluations and the gradient calculations and updates, and PyTorch and Julia performed similarly there. So it wasn't that Julia was faster at learning. Instead, it was all the in-between work: all the "bookkeeping" in Python ended up being much faster in Julia, enough that it was 2 to 3 times faster overall.

(I was training on a CPU though. Things may be different if you're using a GPU, I don't know.)

gdpr|4 years ago

Similar experience over here. (G)ARCH models are severely underserved in Python, and I couldn't be bothered to learn a probabilistic programming abstraction like Pyro or Stan just to build a quick prototype myself.

Chose Julia instead. It took 4 hours to get everything sorted out (including getting IT to allow Julia's package manager to actually download stuff) and have the first model running, just translating a paper into code. Since the code is just the math written out, this is a vast communication improvement.

After fiddling around with it at home for a week, this was my first professional experience with Julia, and I'm blown away.

wiz21c|5 years ago

Could you tell us more? It sounds like a very in-depth, interesting benchmark.

stellalo|5 years ago

That’s interesting: did you use Flux?

beeforpork|5 years ago

Julia is such a wonderful language. There are many design decisions that I like, but most importantly to me, its ingenious idea of combining multiple dispatch with JIT compilation still leaves me in awe. It is such an elegant solution to achieving efficient multiple dispatch.

Thanks to everyone who is working on this language!
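(For readers who haven't seen it, here is a minimal sketch of what multiple dispatch looks like: the method is chosen by the runtime types of all arguments, and the JIT compiles a fast specialization for each concrete type combination. The `Shape`/`Circle`/`Square` names are just illustrative.)

```julia
# One generic function, many methods; the method is picked by the
# runtime types of *all* arguments, not just the first.
area(r::Real) = pi * r^2          # circle from a radius
area(w::Real, h::Real) = w * h    # rectangle

abstract type Shape end
struct Circle <: Shape; r::Float64; end
struct Square <: Shape; s::Float64; end

# Pairwise dispatch, e.g. for intersection rules or collision tests:
combine(a::Circle, b::Circle) = "circle-circle"
combine(a::Circle, b::Square) = "circle-square"
combine(a::Shape,  b::Shape)  = "generic fallback"

combine(Circle(1.0), Square(2.0))  # "circle-square", resolved at runtime
                                   # but compiled to a direct call
```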

chalst|5 years ago

Julia is the first language to really show that multiple dispatch can be efficient in performance-critical code, though I'm not really sure why it took this long: JIT concepts were certainly familiar to the implementors of Common Lisp and Dylan.

skohan|5 years ago

What does that mean exactly? What is novel here?

pjmlp|5 years ago

I advise you to check Common Lisp CLOS and Dylan.

MisterBiggs|5 years ago

I've been running the 1.6 release candidates, and the compilation speed improvements have been massive. There have been plenty of instances in the past where I've tried to 'quickly' show off some Julia code, and I end up waiting ~45 seconds for a plot to show or a minute for a Pluto notebook to run, and that's not to mention waiting for my imports to finish. It's still slower than Matlab for the first run, but it's at least in the same ballpark now.

peatmoss|5 years ago

In terms of “don’t make me think about why Julia is fast but feels slow for casual use” this release is going to be a game changer.

I just did a “using Plots” in 1.6.0, and it was fast enough to not care about the delta between Plots and, say, R loading ggplot.

Huge kudos to the Julia team.

Sukera|5 years ago

What kind of speed do you see now?

snicker7|5 years ago

On the package ecosystem side, 1.6 is required for JET.jl [0]. Despite Julia being a dynamic language, its compiler does a lot of static analysis (or "abstract interpretation" in Julia lingo). JET.jl exposes some of this to the user, opening a path for additional static analysis tools (or maybe even custom compilers).

[0]: https://github.com/aviatesk/JET.jl
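As a sketch of what that looks like in practice (assuming JET.jl is installed; `@report_call` is the entry point its README documents):

```julia
using JET  # static analysis driven by the compiler's abstract interpretation

# A latent bug: the `else` branch references a name that doesn't exist.
# Ordinary execution only fails if that branch is actually taken.
function buggy(x)
    if x > 0
        x + 1
    else
        undefinedvar + 1   # typo -- this global is never defined
    end
end

# JET analyzes the call without running it and can flag the undefined
# variable even on the branch this particular call would never take:
@report_call buggy(1)
```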

akdor1154|5 years ago

Good gracious, thanks for this. If JET goes anywhere, then that+other goodies in 1.6 mean I will likely switch back from Python+mypy.

celrod|5 years ago

> or maybe even custom compilers

Like for autodiff or GPUs.

wiz21c|5 years ago

Whatever improves loading times is more than welcome. It's not really acceptable to wait just because you import some libraries. I understand Julia does a lot of work under the hood and that there's a price to pay for that, but coming from Python, it's a bit inconvenient.

But I'll surely give it a try, because Julia hits a sweet spot between expressiveness and speed (at least for the kind of stuff I do: matrix, graph, and algorithmic computations).

odipar|5 years ago

I like Julia (mostly because of multiple dispatch). The only thing that's lacking is an industrial-strength garbage collector, like the ones found in the JVM.

I know that you shouldn't produce garbage, but I happen to like immutable data structures and those work better with optimised GCs.

eigenspace|5 years ago

Julia's garbage collector is quite good.

> I know that you shouldn't produce garbage, but I happen to like immutable data structures and those work better with optimised GCs.

If you use immutable data structures in Julia, you're rather unlikely to end up with any heap allocations at all. Unlike Java, Julia is very capable of stack-allocating user-defined types.
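A quick way to see this for yourself (an illustrative sketch; `Point` and `walk` are made-up names):

```julia
struct Point            # immutable by default, made only of bits types
    x::Float64
    y::Float64
end

add(a::Point, b::Point) = Point(a.x + b.x, a.y + b.y)

function walk(n)
    p = Point(0.0, 0.0)
    for _ in 1:n
        p = add(p, Point(1.0, 1.0))   # fresh Points, but no heap traffic
    end
    return p
end

walk(10)                   # warm up the JIT first
@allocated walk(10_000)    # typically reports 0 bytes allocated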

superdimwit|5 years ago

A low-latency GC would also be great. But again, the JVM only has that due to many millions of dollars spent over decades.

newswasboring|5 years ago

I didn't even know Julia's GC had issues. Care to elaborate?

noisy_boy|5 years ago

How easy is it to produce a compiled executable in 1.6? I took a cursory look at the docs but couldn't spot the steps for doing so.

dklend122|5 years ago

That's coming. The pieces are there but still need polish and integration. A compiled fib binary was around 44 kB with no runtime required.

Check out StaticCompiler.jl
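For context, the heavier route that already works today is PackageCompiler.jl, which bundles the Julia runtime rather than eliminating it. A hedged sketch of its documented entry points (`MyPackage` and `MyProject` are placeholders, not real projects):

```julia
# PackageCompiler.jl: ahead-of-time compilation with the runtime included.
# (StaticCompiler.jl, mentioned above, targets small runtime-free binaries.)
using PackageCompiler

# Bake a package into a custom system image to eliminate its load/compile
# latency on startup:
create_sysimage([:MyPackage]; sysimage_path="mysys.so")

# Or produce a self-contained, relocatable app directory from a project
# that defines a `julia_main()` entry point:
# create_app("path/to/MyProject", "MyAppCompiled")
```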

ced|5 years ago

We did it for production code installed at client sites, and it has been very easy for us. YMMV

triztian|5 years ago

I’ve also looked for this. Does it mean that I have to install Julia on the target machine and it’ll recompile when running?

Or are there steps to produce a binary (much like Go or C or Rust)?

3JPLW|5 years ago

The feature I'm most excited about is the parallel — and automatic — precompilation. Combined with the iterative latency improvements, Julia 1.6 has far fewer coffee breaks.

pjmlp|5 years ago

Love the improvements, all those little details that improve the overall usability.

xiphias2|5 years ago

Cool, I was thinking of downloading the RC; the demo was so impressive.

Will there be an M1 Mac version for 1.7?

fermienrico|5 years ago

Are the performance claims of Julia greatly exaggerated?

Julia loses almost consistently to Go, Crystal, Nim, Rust, Kotlin, Python (PyPy, Numpy): https://github.com/kostya/benchmarks

Is this because of bad typing, or because they didn't use Julia in an idiomatic manner?

stabbles|5 years ago

I think it's more interesting to see what people do with the language instead of focusing on microbenchmarks. There's for instance this great package https://github.com/JuliaSIMD/LoopVectorization.jl which exports a simple macro `@avx` that you can stick on loops to vectorize them in ways better than the compiler (LLVM) manages. It's quite remarkable that you can implement this in the language as a package, as opposed to waiting for LLVM to improve or for the Julia compiler team to figure it out.

See the docs which kinda read like blog posts: https://juliasimd.github.io/LoopVectorization.jl/stable/

And then replacing the matmul.jl with the following:

    @avx for i = 1:m, j = 1:p
        z = 0.0
        for k = 1:n
            z += a[i, k] * b[k, j]
        end
        out[i, j] = z
    end
I get a 4x speedup, from 2.72s to 0.63s. And with @avxt (threaded) using 8 threads it goes down to 0.082s on my AMD Ryzen CPU. (So this is not dispatching to MKL/OpenBLAS/etc.) Doing the same in native Python takes 403.781s on this system -- I haven't tried the others.
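For anyone wanting to try this, the snippet relies on definitions from the benchmark's matmul.jl; here is a self-contained sketch of the same idea (assuming LoopVectorization.jl is installed; timings will of course vary by machine):

```julia
using LoopVectorization   # provides the @avx loop-vectorizing macro

function matmul!(out, a, b)
    m, n = size(a)
    p = size(b, 2)
    @avx for i in 1:m, j in 1:p
        z = 0.0
        for k in 1:n
            z += a[i, k] * b[k, j]
        end
        out[i, j] = z
    end
    return out
end

a, b = rand(100, 100), rand(100, 100)
out = similar(a)
matmul!(out, a, b)
out ≈ a * b   # sanity check against the built-in (BLAS) multiply
```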

SatvikBeri|5 years ago

I've rewritten two major pipelines from numpy-heavy, fairly optimized Python to Julia and gotten a 30x performance improvement in one, and 10x in the other. It's pretty fast!

paul_milovanov|5 years ago

Looks like they're just multiplying two 100x100 matrices, once? (Maybe I'm reading it wrong?) In Julia, the runtime would be dominated by compilation + startup time.

A fair comparison with C++ would be to at least include the compilation/linking time into the time reported.

Ditto for Java or any JVM language (you'd have the JVM startup cost, though that doesn't include the bytecode compilation time).

Generally, for stuff like this (scientific computing benchmarks) you want to run a lot of computation precisely to avoid this problem (i.e. you want to let the cost of compilation and startup amortize fairly).

StefanKarpinski|5 years ago

This appears to be a set of benchmarks of how fast a brainfuck interpreter implemented in different programming languages is on a small set of brainfuck programs? What a bizarre thing to care about benchmarks for. Are you planning on using Julia by writing brainfuck code and then running it through an interpreter written in Julia?

tgv|5 years ago

Idk, but just a few weeks ago I started looking at Julia, partly because of the performance claims. I wanted to write a program a bit heavier than your average starter program, so I wrote a backtracker (automatic layout for stripboards, to be precise). It was

* interesting (not fun) to find out how Julia works

* annoying AF to discover that much of the teaching material was hidden behind some 3rd party website, presumably in videos (I didn't bother to register, but started browsing the manual instead). What's wrong with text?

* unnecessarily complex because the documentation for the basic functions is nearly inaccessible to beginners.

But, I managed to get a simple layout system up and running, and it wasn't fast. I rewrote it in Go (the language I currently work in most), and it was literally >100x faster. And that shouldn't be due to startup costs, because a backtracker shouldn't have that much JIT overhead.

I think I can now say that I can't see the use case for Julia. "Faster than Python" is simply not good enough, and for the rest there are no redeeming features. Perhaps the fabled partial differential equation module is worth it, but that can get ported to other languages, I guess.

otde|5 years ago

I think this particular Julia code is pretty misleading, and I'm (probably) one of the most qualified people in this particular neck of the woods. I wrote a transpiler for Julia that converts a Brainfuck program to a native Julia function at parse time, which you can then call like any other Julia function.

Here's code I ran, with results:

  julia> using GalaxyBrain, BenchmarkTools

  julia> bench = bf"""
      >++[<+++++++++++++>-]<[[>+>+<<-]>[<+>-]++++++++       
      [>++++++++<-]>.[-]<<>++++++++++[>++++++++++[>++
      ++++++++[>++++++++++[>++++++++++[>++++++++++[>+       
      +++++++++[-]<-]<-]<-]<-]<-]<-]<-]++++++++++."""

  julia> @benchmark $(bench)(; output=devnull, memory_size=100)
  BenchmarkTools.Trial: 
    memory estimate:  352 bytes
    allocs estimate:  3
    --------------
    minimum time:     96.706 ms (0.00% GC)
    median time:      97.633 ms (0.00% GC)
    mean time:        98.347 ms (0.00% GC)
    maximum time:     102.814 ms (0.00% GC)
    --------------
    samples:          51
    evals/sample:     1

  julia> mandel = bf"(not printing for brevity's sake)"

  julia> @benchmark $(mandel)(; output=devnull, memory_size=500)
  BenchmarkTools.Trial: 
    memory estimate:  784 bytes
    allocs estimate:  3
    --------------  
    minimum time:     1.006 s (0.00% GC)
    median time:      1.009 s (0.00% GC)
    mean time:        1.011 s (0.00% GC)
    maximum time:     1.022 s (0.00% GC)  
    --------------
    samples:          5
    evals/sample:     1
Note that, conservatively, GalaxyBrain is about 8 times faster than C++ on "bench.b" and 13 times faster than C on "mandel.b," where those are the fastest languages for the respective benchmarks. In addition, it allocates almost no memory relative to the other programs, which measure memory usage in MiB.

You could argue that I might see similar speedup for other languages on my machine, assuming I have a spectacularly fast setup, but this person ran their benchmarks on a tenth generation Intel CPU, whereas mine's an eighth generation Intel CPU:

  julia> versioninfo()
  Julia Version 1.5.1
  Commit 697e782ab8 (2020-08-25 20:08 UTC)
  Platform Info:
    OS: Linux (x86_64-pc-linux-gnu)
    CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
    WORD_SIZE: 64
    LIBM: libopenlibm
    LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
This package is 70 lines of Julia code. You can check it out for yourself here: https://github.com/OTDE/GalaxyBrain.jl

I talk about this package in-depth here: https://medium.com/@otde/six-months-with-julia-parse-time-tr...

adgjlsfhk1|5 years ago

They are measuring compile time, not runtime speed.

machineko|5 years ago

I think I can answer that. First of all, Julia isn't as fast as C/C++/Nim etc. in most cases; Julia is just fast at scientific computing, that's all. (There is only one "scientific" benchmark among kostya's benchmarks.)

Second, to write very fast Julia you need to know a lot of "tricks", and in most cases it won't be as easy as writing normal code.

Also, everyone commenting on this benchmark misses that it measures compilation time (or at least doesn't exclude JIT time); they could have looked at the code/readme for five seconds before commenting.

Julia is fast and can be as fast as C, but not in all cases, and not as easily as it seems.

f6v|5 years ago

Is there a per-project way to manage dependencies yet? I find global package installation to be the biggest weakness of all the R projects out there. Anaconda can help, but it’s not widely used for R projects. And Docker... well, don’t get me started.

krastanov|5 years ago

I might be misunderstanding your question, but this post is about Julia, not R. Julia has a pretty great per-project dependency management.

eigenspace|5 years ago

Yes, absolutely. Julia has very strong per-project dependency tracking and reproducibility.
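Concretely, the workflow looks roughly like this: each project carries a Project.toml (direct dependencies) and a Manifest.toml (exact resolved versions), and activating a project gives you an isolated, reproducible environment:

```julia
# In the Julia REPL, `]` drops into the package manager:
#
#   (@v1.6) pkg> activate .          # use ./Project.toml as this project's env
#   (MyProject) pkg> add DataFrames  # recorded in Project.toml + Manifest.toml
#   (MyProject) pkg> instantiate     # elsewhere: install the exact versions
#
# The same can be scripted with the stdlib Pkg API:
using Pkg
Pkg.activate(mktempdir())   # throwaway environment for this demo
Pkg.status()                # fresh project: no dependencies yet
```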

psychometry|5 years ago

renv is how R projects do per-project dependency management. Before renv there was packrat. This has been a solved problem for years now...

ng55QPSK|5 years ago

Maybe I misread this, but the "1.6 blockers" milestone still has 3 open issues, and its description says "1.6 now considered feature-complete. This milestone tracks release-blocking issues." So how can 1.6 be ready?

kristofferc|5 years ago

It is simple. Those issues shouldn't have had the milestone.