I'd really recommend that anyone doing even mildly numerical / data-heavy work in Python give Julia a patient and fair try.
I think the language is really solidly designed, and it gives you ridiculously more power AND productivity than Python for a whole range of workloads. There are of course issues, but even in the short time I've been following & using the language these are being rapidly addressed. In particular: a generally less rich ecosystem of libraries (though some Julia libraries are state of the art across all languages, mainly thanks to easy metaprogramming and multiple dispatch), and generally slow compile times (though this is improving rapidly with caching etc.). I would also note that you often don't need as many "libraries" as you do in Python or R, since you can typically just write down the code you want to write, rather than being forced to find a library that wraps a C/C++ implementation like in Python/R.
> you can typically just write down the code you want to write, rather than being forced to find a library that wraps a C/C++ implementation like in Python/R.
I don't think this is really a feature. It's nice that you can write more performant code in Julia directly and don't need to wrap lower-level languages, without question, but the lack of libraries or library features is not a good thing. It's always better to use a general-purpose library that's been battle-tested than to write your own numerical mathematics code, because bugs in numerical code can take a long time to get noticed.
For specialized scientific computing applications, which would normally be written in C/C++, I would absolutely look into using Julia instead (though I'm not sure what the OpenMP/MPI support is like). But I would also recommend against rolling your own numerical software unless you need to.
This looks like a good reference for the fundamentals of both statistics and Julia, as claimed. I have a small critique, since the authors asked for suggestions.
The format for the code samples goes like (code chunk —> output/plots —> bullet points explaining the code line-by-line). This creates a bit of a readability issue. The reader will likely follow a pattern like: (Skim past the code chunk to the explanation —> Read first bullet, referencing line X —> Go back to code to find line X, keeping the explanation in mental memory —> Read second bullet point —> ...). In other words, too much switching/scrolling between sections that can be pages apart. Look at the example on pages 185-187 to see what I mean.
I’m not sure what the optimal solution is. Adding comments in the code chunks themselves adds clutter and is probably worse (not to mention creates formatting nightmares). I think my favorite format is two columns, with the code on the left side and the explanations on the right.
Here’s what I have in mind (doesn’t work on mobile): https://allennlp.org/tutorials. Does anyone know of a solution for formatting something like this?
Thank you. Indeed, I'm not sure how to optimize it. Perhaps in the next version of the book. Note that the book is to be published by Springer (once finished), which imposes some limitations as well. Happy for more feedback (Yoni Nazarathy).
Note that Julia 1.2 [1] is on the verge [2] of being released. Also, it is interesting to see the list [3] of GSoC and JSoC (Julia's own Summer of Code) projects. A lot of them target ML/AI applications. Personally, I am waiting for proper GNN support [4] in FluxML, but there doesn't seem to be much interest in it.

[1] https://github.com/JuliaLang/julia/milestone/30

[2] https://discourse.julialang.org/t/julia-v1-2-0-rc2-is-now-av...

[3] https://julialang.org/blog/2019/05/jsoc19

[4] https://github.com/FluxML/Flux.jl/issues/625
Julia looked interesting to me, so I tried 1.0 after it came out. I have an oldish laptop (fine for my needs), and every time I tried to do seemingly anything, it spent ~5 minutes recompiling libraries or something. So I've been waiting for newer versions that hopefully stop doing that, or for me to buy a better computer.
Take, for example, a simple program that creates a line plot (https://docs.juliaplots.org/latest/tutorial/). After installing the package, the first run has to precompile(?), and subsequent runs use the package cache. But ~25 s to create a simple plot is incredibly slow and frustrating to work with:
$ julia --version
julia version 1.1.1
$ time julia plot.jl
julia plot.jl 73.71s user 4.45s system 110% cpu 1:11.04 total
$ time julia plot.jl
julia plot.jl 24.41s user 0.39s system 100% cpu 24.633 total
$ time julia plot.jl
julia plot.jl 23.38s user 0.36s system 100% cpu 23.519 total
This is a core part of the design. It's part of why Julia is so useful for scientific computing, where one often has a large job that will require a lot of processing time, such that it is worth it to do an intensive JIT cycle every time. Part of that is the analysis that takes Python-esque code and turns it into C levels of performance.
My bigger problem is how unstable all of the APIs are. Every single time I try to follow a guide/tutorial I get compilation errors because packages have shifted.
This is a very good resource. The one thing I would ask is that I would like to see examples of using DifferentialEquations.jl when you get to the section on dynamical systems, especially when doing discrete event simulation and stochastic differential equations. I opened an issue in the repo and we can continue discussing there (I'll help write the code, I want to use this in my own class :P)!
I agree it's a wonderful resource, which is exactly why I disagree with your suggestion. The book is uncommonly clear in how it explains fundamentals, and bringing in such a powerful library moves it quite a bit away from that. It would no longer be just about the fundamentals of Julia on the one hand, and on the other, the algorithm implementations would no longer be language-invariant. Losing that invariance IMO makes it less of a text on fundamentals.
Can someone explain how this is more powerful than a Python/R-based workflow? E.g., I currently use a combination of .ipynb notebooks, Python scripts, and RStudio, and this feels like it covers everything I need for any data science project.
I think Julia has a cleaner focus on scientific and mathematical computing than either R or Python (both for performance and understanding), i.e. the language is designed in a way that corresponds more directly to mathematical notation and ways of thinking. If you've been in a graduate program that's heavily mathematical, where you spend equal time doing pen-and-paper proofs and hacking together simulations (and frantically trying to learn a language like R/MATLAB/Python while staying afloat in your courses), you'll appreciate the advantage of this. To my eyes, Python is too verbose and "computer science-y" and R is too quirky to fill this niche (I say this as someone who bleeds RStudio blue and enjoys using Python+SciPy). I don't think Julia is aimed at garden-variety / enterprise data science workflows. Caveat: I'm not a Julia user currently, so this is sort of a hot take.
The “Ju” in Jupyter is for Julia, so it’s designed to be used as an interactive notebook language also. The Juno IDE is modeled after RStudio.
Fast for-loops, the ability to micro-optimize numerical code (skipping bounds checks in array accesses, SIMD optimizations), and GPU computing that can use the exact same code as the CPU, since Julia functions are highly polymorphic. Your research code is your production code.
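To make that concrete, here's a minimal sketch of the micro-optimization hooks mentioned above (`@inbounds` and `@simd` are standard Base Julia macros; the function name is just illustrative):

```julia
# A dot-product kernel: bounds checks skipped, SIMD vectorization hinted.
# The function is not tied to a concrete array type, so the same generic
# source can also accept other array types (e.g. GPU arrays).
function dot_fast(a::AbstractVector{<:Real}, b::AbstractVector{<:Real})
    s = zero(eltype(a))
    @inbounds @simd for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    return s
end

a = collect(1.0:4.0)   # [1.0, 2.0, 3.0, 4.0]
b = fill(2.0, 4)       # [2.0, 2.0, 2.0, 2.0]
@assert dot_fast(a, b) == 20.0
```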
Also the macro system allows one to define powerful DSLs (see Gen.jl for AI).
In section "1.2 Setup and Interface" there is a very short description of the REPL and how it can be downloaded from julialang.org, as well as a much longer description of JuliaBox and how Jupyter notebooks can be run from juliabox.com for free.
Although JuliaBox has been provided for free by Julia Computing, there has been discussion that this may not be possible in the future. However, Julia Computing does provide a distribution of Julia, the Juno IDE, and supported packages known as JuliaPro for free.
For new users, would the free JuliaPro distribution be a good alternative to JuliaBox and/or downloading the REPL and kernel from julialang.org?
No, I think you should simply download the ordinary version. Jupyter, Juno, etc. are easy enough to install locally. I forget the precise details, but I think JuliaPro comes with certain versions of packages, and it's less confusing just to get the latest of what you need (using the built-in package manager).
JuliaBox (and https://nextjournal.com/) are cloud services, but if you have a real computer and want to do this for more than a few minutes, just install it. (There's also no need for virtualenv etc.)
For people who have more Julia experience -- is this (thinking mainly of chapter 4) representative of how most Julia users do plotting? It looks like a lot of calling out to matplotlib via PyPlot. I know Julia has a ggplot-inspired library called Gadfly.jl, is PyPlot more commonly used?
There is not yet a universally used package for plotting. One recent tool is Makie.jl [1]. Many use Plots.jl [2] as an interface to PyPlot, GR [3], and other backends, i.e. you can change the backend with a single command.

[1] https://github.com/JuliaPlots/Makie.jl

[2] https://github.com/JuliaPlots/Plots.jl

[3] https://github.com/jheinen/GR.jl
I bounce back and forth, usually using Gadfly for most plotting, but Plots.jl is convenient for some stats plots (see StatsPlots.jl, which extends Plots.jl with nice built-in functions for working with stats).
In R, most of the high performance code isn't written in R, it's written in Fortran or C or C++ (R has really good C++ integration via Rcpp). Python has something similar. The value prop of Julia is supposed to be that you have a language flexible enough to do the high-level stuff you'd normally do in R/Python, plus the ability to write high-performance code without having to drop into another language.
I remain skeptical that this solves a lot of real-world problems (I know a lot of users of R/Python who never need to resort to writing their own C/C++ code), but that's the sales pitch.
I was going to ask whether there is any Kindle version of this, but then I skimmed the book, and I don't think it would be readable on a Kindle. Even if it were, the reading experience would definitely be inferior.
Julia is everything python could have been, and much more. I'm stuck with python right now as a lot of people in the data science/ML community are, but it's becoming increasingly viable to use Julia for "real" work. The Python-Julia interop story is pretty strong as well, which allows you to (somewhat) easily convert pandas/pytorch/sklearn code into Julia using Python wrappers. Julia has some unconventional things in it but they are all growing on me:
1. Indices by default start with 1. This honestly makes a ton of sense and off by one errors are less likely to happen. You have nice symmetry between the length of a collection and the last element, and in general just have to do less "+ 1" or "- 1" things in your code.
2. Native syntax for creation of matrices. Nicer and easier to use than ndarray in Python.
3. Easy one-line mathematical function definitions: f(x) = 2*x. Also being able to omit the multiplication sign (f(x) = 2x) is super nice and makes things more readable.
4. Real and powerful macros, à la Lisp.
5. Optional static typing. Sometimes when doing data science work static typing can get in your way (more so than for other kinds of programs), but it's useful to use most of the time.
6. A simple and easy to understand polymorphism system. Might not be structured enough for big programs, but more than suitable for Julia's niche.
Really the only thing I don't like about the language is the begin/end block syntax, but I've mentioned that before on HN and don't need to get into it again.
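For anyone who hasn't tried Julia, a quick sketch of points 1, 2, 3, and 6 from the list above (all plain Julia, no packages):

```julia
# 2. Native matrix literals: rows separated by ';'
A = [1 2; 3 4]
@assert A[2, 2] == 4

# 3. One-line function definitions, with implicit multiplication
f(x) = 2x
@assert f(21) == 42

# 1. 1-based indexing: length(v) is also the index of the last element
v = [10, 20, 30]
@assert v[length(v)] == v[end] == 30

# 6. Simple polymorphism: methods dispatch on argument types
g(x::Int) = x + 1
g(x::AbstractFloat) = x / 2
@assert g(3) == 4
@assert g(3.0) == 1.5
```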
I can't believe I'm jumping into the inevitable 1-based indexing discussion, but I'm surprised to see you say that one-based indexing results in fewer "+ 1"s and "- 1"s in your code. Most arguments I've seen come down to "it's fine" (certainly) or "it's more comfortable for mathematicians" (which I can't speak to).
Besides Dijkstra's classic paper [1] showing why 0-based indexing is superior, in practice I find myself grateful for 0-based indexing in Python because of how slices and related things just work out without needing +1/-1.

I'd like to understand. Could you give an example of when 1-based indexing works out better than 0-based?

[1] http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF
One reason I really like 1-based indexing is that I can have a UInt index and let 0 act as a sentinel value. Really nice for writing things like vector-embedded linked lists.
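To illustrate (a minimal sketch; the type and function names are made up for the example): a linked list whose links live inside vectors, with index 0 standing in for a null pointer, which only works because valid indices start at 1.

```julia
# A singly linked list stored inside two parallel vectors.
struct IntList
    value::Vector{Int}
    next::Vector{UInt}   # 0 == "no next node" sentinel
end

IntList() = IntList(Int[], UInt[])

# Push a new node at the front, returning the new head index.
function pushfront!(l::IntList, head::UInt, v::Int)
    push!(l.value, v)
    push!(l.next, head)          # old head becomes our successor (0 if empty)
    return UInt(length(l.value)) # the new node's index is the new head
end

# Walk the list, collecting values until we hit the 0 sentinel.
function collectlist(l::IntList, head::UInt)
    out = Int[]
    i = head
    while i != 0
        push!(out, l.value[i])
        i = l.next[i]
    end
    return out
end

l = IntList()
h = UInt(0)              # an empty list is just the sentinel head
h = pushfront!(l, h, 3)
h = pushfront!(l, h, 2)
h = pushfront!(l, h, 1)
@assert collectlist(l, h) == [1, 2, 3]
```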
Why? When I see the '.' I immediately know it's a broadcasted function (for example, * for matrix multiplication vs .* for the Hadamard product), and I get the vectorized version of any function I write for free with no extra boilerplate (the compiler will even automatically fuse chained broadcasts to avoid wasting allocations). You can even customize the broadcasting and the fusion.
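A small example of what that buys you (plain Julia; the fusion happens automatically for chained dotted calls):

```julia
f(x) = 2x + 1
xs = [1.0, 2.0, 3.0]

# The dotted call applies f element-wise; no separate "vectorized" variant.
@assert f.(xs) == [3.0, 5.0, 7.0]

# .* is the element-wise (Hadamard) product; bare * is matrix multiply.
@assert xs .* xs == [1.0, 4.0, 9.0]

# Chained dotted calls fuse into a single loop over one output array.
@assert sqrt.(abs.(xs .- 2.0)) == [1.0, 0.0, 1.0]
```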
You can effectively bookmark submissions by using the "favorite" link or just upvoting. The submission will show up in your profile under "favorite submissions" or "upvoted submissions", respectively.
With PyCall, calling NumPy from Julia looks like this:
using PyCall
np = pyimport("numpy")
np.fft.fft(rand(ComplexF64, 10))
That's it. You call it with a Julia-native array, and the result is a Julia-native array as well.
Same with C++:
https://github.com/JuliaInterop/Cxx.jl
Or MATLAB:
https://github.com/JuliaInterop/MATLAB.jl
It's legit magic.
The goals of Python were quite different from the goals of Julia.