top | item 26984681

Introduction to Pluto.jl

161 points | joshday | 4 years ago | juliafordatascience.com | reply

112 comments

[+] nerdponx|4 years ago|reply
I really wish the Julia ecosystem would stop assuming that you always interact with your computer through the Julia REPL and started supporting proper command line interfaces. This is one of the big annoyances and mistakes of the R ecosystem, and I think it's unwise to carry that mistake over to Julia.

Also, big "ugh" to browser-based tooling. I want to browse webpages in my browser, I don't want to do my data science work there. We don't even have a good native client for Jupyter notebooks yet, let alone for this new Jupyter alternative that doesn't support the existing Jupyter kernel protocol.

Not only that, but Pluto also apparently has some obnoxious UX limitations that remind me of other less-than-usable wannabe-Jupyter-notebooks (e.g. Apache Zeppelin, Databricks): https://towardsdatascience.com/could-pluto-be-a-real-jupyter...

In short: nice idea, but I'd rather see continued unification around Jupyter and a proper IDE that can at least emit and interact with Jupyter-compatible data.

On the other hand, the Jupyter notebook JSON format is bad for a variety of reasons (e.g. you need special tools for readable Git diffs) and I really wish we had all settled on R Markdown instead. But R has its own NIH tooling problem and nobody was ever going to adopt it because the R community itself (driven by RStudio) has little interest in sharing or interoperability with other languages.

</cynical-angry-rant>

[+] extr|4 years ago|reply
Confession: after doing Data Science work for the past 4 years I STILL don't really understand why people like Jupyter.

R was my first programming language and I got really spoiled with RStudio where everything "just works" and the "highlight code -> run in REPL" workflow is super smooth and tightly integrated. All I want is for that to work in other languages, but it seems like if you want it in Python you need to be running PyCharm or a similarly-heavyweight IDE (seriously, despite all the hype of VSCode there are still a ton of issues with just highlighting code and running it in an IPython terminal) and for Julia it just doesn't exist. If you really want a Jupyter-like workflow you can just use R Notebooks, which are literally just better in every way.

[+] clarkevans|4 years ago|reply
Pluto notebooks are Julia scripts, usable at the command line.

Edit: Pluto uses Julia's package manager; moreover, Manifest.toml can be used to pin all of your project's dependencies so the notebook is repeatable, from a code perspective.

[+] cwyers|4 years ago|reply
How does RStudio have little interest in interoperability with other languages? They produce the reticulate package[1] to allow calling Python code from R, they have added support for Python to RMarkdown and RStudio[2], they let you host Python apps on their RStudio Connect product[3], and they sponsor Ursa Labs to work on the Arrow project for easy data interchange[4].

1) https://rstudio.github.io/reticulate/
2) https://solutions.rstudio.com/python/
3) https://blog.rstudio.com/2020/12/16/rstudio-connect-1-8-6-py...
4) https://ursalabs.org/

[+] lacker|4 years ago|reply
To me this seems like an improvement in the direction that you want, in particular that notebooks are reactive. All too often I get a Jupyter notebook from someone else and try to run it on my machine only to find that some intermediate step does not work any more, because the original developer ran something out of order or removed a critical step. A reactive notebook seems more likely to still work after a lot of changes are made while experimenting.
[+] JustFinishedBSG|4 years ago|reply
> I really wish the Julia ecosystem would stop assuming that you always interact with your computer through the Julia REPL and started supporting proper command line interfaces.

What does it even mean? What is a CLI interface for a programming language if not a REPL ?

[+] pdeffebach|4 years ago|reply
Plenty of people use the REPL in terminal and sublime text or vim or whatever. I also dislike browser-based tooling and think Julia has done a good job avoiding Rstudio-style dependencies.

But if your point is the inability to do `julia script.jl`, yeah, that's a pain point. Fortunately there has been some tooling to make running many jobs in a row easier: https://github.com/dmolina/DaemonMode.jl

[+] kkylin|4 years ago|reply
This preference seems to depend a lot on where you come from. Having come from Scheme / Lisp (same as some of the original Julia developers AFAIK), I find I prefer the REPL in Emacs for coding. I do use Jupyter quite a bit for running simulations, doing data analysis, etc. For me, the main reason to use Jupyter has been (i) interacting with sessions on remote machines without needing to bother with X, and (ii) being able to easily incorporate LaTeX and share whole documents (math + working code) to collaborators and students.
[+] thenoblesunfish|4 years ago|reply
Is Julia different from Python in this regard? I use Python mostly by executing scripts, but it’s nice to have the REPL and IPython and Jupyter. With Julia I’m free to just run “julia script.jl”, aren’t I? There’s probably more to your complaint than I naively realize, though. Maybe Python has better IDE support?
[+] alpaca128|4 years ago|reply
I wish there was a plain text format as base that everyone agreed on no matter what UI or backend is used; that would suddenly make it usable in any text editor and people could build tools and plugins that "just work" no matter whether Jupyter or something else is used.

The closest we got was the org-mode file format with human-readable data for everything, but it seems tightly coupled with Emacs unless you only want to use it as Markdown replacement.

[+] dash2|4 years ago|reply
Big ugh to browser based tooling, and yet also continued unification around Jupyter? Are there any plans to have a non-browser Jupyter?
[+] tambarskjelve|4 years ago|reply
> Also, big "ugh" to browser-based tooling.

Hear hear! A simple web-view inside a native application window is a huge improvement imho. If only JupyterLab provided a simple interface to access menu elements as well, you could easily have a nearly complete native experience.

[+] maximilianroos|4 years ago|reply
I used Pluto for last year's Advent of Code. It's extremely good for these sorts of problems — rapid iteration with modest computational requirements.

Think of something you might use a spreadsheet for — Pluto has a similar feeling of instant feedback.

---

Some features that are missing:

- Some things are difficult to do with the keyboard; I used my mouse more than with other tools. The author doesn't like modal editing, but ideally keyboard shortcuts could be implemented with modifier keys (https://github.com/fonsp/Pluto.jl/issues/65)

- It's hard to understand what happens _within_ a cell — logging goes to the terminal rather than the notebook — and there aren't many introspection tools. This is an environment where transparency / introspection would be particularly helpful.

---

Pluto doesn't solve every problem, or completely replace notebooks; to respond to a couple of comments:

> I have many extremely long notebooks that would almost certainly crash if you tried to recompute the whole thing

Right, don't use Pluto for that! It's not one environment to rule them all

> Many of the cells won't work at all because the inputs are long gone

That seems bad! Pluto will help you ensure that doesn't happen.

[+] teruakohatu|4 years ago|reply
I have played around with Pluto.jl, and colleagues of mine use it for research, but I keep going back to Jupyter. I tend to have long running cells that are pulling information from external sources or training models, and triggering one of those cells accidentally will waste a lot of time running something that may not be reliably interrupted.

There is talk about putting in execution barriers that would help with this, at the risk of making Pluto more complicated for users:

https://github.com/fonsp/Pluto.jl/discussions/298

[+] oivey|4 years ago|reply
The fact that Pluto only runs dependent cells on changes mostly solves this for me. For example, a cell can load things into the variable data, and then another cell can apply a function f(data). If I alter f, data is not reloaded and f(data) automatically runs.
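To make the "only dependent cells re-run" behaviour described above concrete, here is a minimal sketch in Python. It is a toy model, not Pluto's implementation: the `Notebook` class, `set_cell`, and the idea of declaring each cell's inputs by hand are all assumptions made for illustration (Pluto works out dependencies from the code itself). Cells must be defined after the cells they depend on.

```python
from collections import deque


class Notebook:
    """Toy reactive notebook: each cell defines one variable from named inputs."""

    def __init__(self):
        self.cells = {}    # variable name -> (function, input names)
        self.values = {}   # variable name -> last computed value
        self.run_log = []  # order in which cells were executed

    def set_cell(self, name, func, inputs=()):
        """Define or redefine a cell, then re-run it and everything downstream."""
        self.cells[name] = (func, tuple(inputs))
        self._rerun_from(name)

    def _affected(self, name):
        """Return `name` plus every cell that depends on it, directly or not."""
        affected = {name}
        queue = deque([name])
        while queue:
            current = queue.popleft()
            for other, (_, inputs) in self.cells.items():
                if current in inputs and other not in affected:
                    affected.add(other)
                    queue.append(other)
        return affected

    def _rerun_from(self, name):
        """Run the affected cells so that inputs are always computed first."""
        pending = self._affected(name)
        while pending:
            ready = [c for c in pending
                     if not any(i in pending for i in self.cells[c][1])]
            if not ready:
                raise ValueError("cyclic dependency between cells")
            for cell in sorted(ready):
                func, inputs = self.cells[cell]
                self.values[cell] = func(*(self.values[i] for i in inputs))
                self.run_log.append(cell)
                pending.discard(cell)


nb = Notebook()
nb.set_cell("data", lambda: [1, 2, 3])          # stands in for an expensive load
nb.set_cell("f", lambda: (lambda xs: sum(xs)))  # the function under development
nb.set_cell("result", lambda f, data: f(data), inputs=("f", "data"))

nb.run_log.clear()
nb.set_cell("f", lambda: (lambda xs: max(xs)))  # edit f only
print(nb.run_log)           # ['f', 'result']
print(nb.values["result"])  # 3
```

Editing `f` re-runs `f` and `result`, while the `data` cell is left alone, which is the scenario in the comment above.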
[+] dandanua|4 years ago|reply
This can be easily solved. You can bind a variable to a checkbox like this:

   @bind allow_run html"Run cell below <input type=checkbox>"

and wrap your long-running cell in an if block:

   if allow_run
      your_code
   end
[+] nerdponx|4 years ago|reply
FWIW I've significantly improved my experience by breaking up my notebooks into smaller pieces such that each notebook only does "one thing", while using DVC to run them and keep track of intermediate results. Or, in a case where the intermediate result was itself somewhat "exploratory", having the notebook itself check for the existence of an intermediate result and load it from disk instead of recomputing it.

Execution barriers are a nice idea though. There is/was a Jupyter notebook extension for "initialization cells", but the whole notebook extension ecosystem seems kind of dead and it's unclear if Jupyter Lab will ever have equivalents.
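The "check for an intermediate result on disk and load it instead of recomputing" pattern from the first paragraph of this comment can be sketched roughly as follows. This is plain Python with `pickle`, not DVC; the helper name `cached`, the file name, and `expensive_step` are hypothetical stand-ins.

```python
import pickle
import tempfile
from pathlib import Path


def cached(path, compute):
    """Load a pickled result from `path` if present; otherwise compute and save it."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []  # records how many times the expensive step actually ran


def expensive_step():
    """Stand-in for a long-running download or model fit."""
    calls.append(1)
    return {"rows": 1000}


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "intermediate.pkl"
    first = cached(target, expensive_step)   # computes and writes the file
    second = cached(target, expensive_step)  # loads from disk, no recompute

print(len(calls))  # 1
```

A cell written this way is cheap to re-trigger by accident, because the second and later runs only read the file. The trade-off is that the cached file has to be deleted by hand when the upstream code changes.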

[+] spinningslate|4 years ago|reply
I'm always impressed by the quality of the Julia ecosystem. It seems to be in that sweet spot with sufficient use & contribution to be viable, but not so popular that quality suffers.
[+] teruakohatu|4 years ago|reply
I love Julia and part of its charm is that everything is relatively new and so quite consistent, also helped by the community ethos and technical features that aid composition.

Python and R (especially R) have plenty of libraries that are high-quality, or even industry standard, but which are decades old and feel it. Python's NLTK is 20 years old for example and it can feel grating switching between NLTK and spaCy. R has three different object systems (four according to some), so you might be using some ancient battle tested library with Hadley Wickham's latest cutting edge libraries.

[+] dandanua|4 years ago|reply
I don't get why people dislike reactivity. This feature alone makes Pluto superior to Jupyter. If you don't want recomputation of some dependent cells there are easy ways to avoid that. But there are no easy ways to add reactivity to Jupyter.

Besides that, Pluto can bind UI elements to your code. You can make simple interactive games that run in Pluto! How is that not awesome?

[+] borodi|4 years ago|reply
For those who are put off by the "weird" cell execution behavior, there is also https://github.com/compleathorseplayer/Neptune.jl, a non-reactive fork of Pluto that has basically all the benefits of Pluto, plus multi-line cell execution without `begin`, but without the reactive behaviour. Running code blocks with inline results in VS Code also has some notebook feel to me.
[+] krastanov|4 years ago|reply
Why would someone use Neptune instead of just using Jupyter? I see how Pluto has a new value proposition that Jupyter lacks (reactivity), but it looks to me like Neptune simply removes that value.
[+] xal|4 years ago|reply
It's funny because this is probably a really non-standard sentiment but I really wish that they would make an electron app out of this. Installing it is reasonably easy but definitely beyond a lot of people who could get value from it.
[+] nerdponx|4 years ago|reply
Normally I dislike Electron apps (with some very well-built exceptions) but in this case it makes perfect sense. It already renders HTML, CSS, and JS!
[+] enriquto|4 years ago|reply
I like the idea of Pluto, because I cannot stand the non-deterministic cells of Jupyter notebooks anymore. Reading this page is like having sex with someone you love. Where has Pluto been all this time? I have finally found all what was missing for a complete life! There's even things that I didn't know I needed because I didn't even have the language to express them! This is my favorite page on the internet and Pluto is my favorite thing ever. I can see no downside to this, no defects, even with a conscious effort to do so.

Yet, trying Pluto, it seems to be outrageously slow and clunky. Is it expected? Sometimes it takes a few seconds to do something. I'm not talking about the initialization (which is still a shame, but that's a different issue). I'm talking about running individual cells with simple code. This is unusable as of today, at least on my 3-year old laptop.

[+] joppy|4 years ago|reply
Pluto is quite fast for me - could you perhaps be hitting the first-run JIT startup time in Julia? Do the cells re-evaluate quickly, after whatever code they depend on has been JITted?
[+] newswasboring|4 years ago|reply
hey, did you use Julia 1.5 or 1.6? There is a massive improvement in latency between those two versions.
[+] mark_l_watson|4 years ago|reply
Having cells react immediately to variable changes in other cells is great. I wish Jupyter did that.

I also wish I had an excuse to get more into Julia. I really like Flux.

[+] dunefox|4 years ago|reply
Sometimes the only excuse you need is interest.
[+] shusson|4 years ago|reply
> When you change a variable, that change gets propagated through all cells which reference that variable.

I've always thought this was the most annoying quirk of notebooks in general, so it's nice to see a different take.

[+] _ZeD_|4 years ago|reply
I don't want to be that guy, but it seems to me those tools are converging to ... excel spreadsheets
[+] dagw|4 years ago|reply
An 'Excel' that is less opaque, easier to test and debug and backed by a more sane and powerful language is what a lot of the world is clamoring for. So yea, that would be great.
[+] RocketSyntax|4 years ago|reply
so every time i change a variable i train my neural network? yikes. non-linear <3

front end programmers coding for data science use case <x3

[+] legerdemain|4 years ago|reply
LOL, how often do you want your entire notebook to recompute just because you change something somewhere? Have you never tried pursuing a little side experiment in an existing notebook, or have ten abandoned false starts leading to one good result? I have many extremely long notebooks that would almost certainly crash if you tried to recompute the whole thing, and many of the cells won't work at all because the inputs are long gone. Some of these notebooks are years old. The datasets they have in memory aren't saved anywhere else. What possible motivation do I have to lose all of this precious state?

If I wanted a software-grade, rock-solid data pipeline, I would just copy-paste some code from an existing notebook and run it on Papermill.

[+] lacker|4 years ago|reply
> Some of these notebooks are years old. The datasets they have in memory aren't saved anywhere else.

That sounds dangerous to me. If your computer crashes or you introduce a bug to your notebook, you could lose all that data. Personally, I prefer my notebooks to be reproducible at any point.

[+] MisterBiggs|4 years ago|reply
The whole notebook doesn't recompute; only the cells that depend on the cell that changed do. This is extremely powerful because you never end up with stale cells that are showing incorrect values.
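One way a notebook could decide which cells "depend on the cell that changed" is to look at which names each cell assigns and which it reads. The sketch below does this for Python source using the standard `ast` module. It is an illustration of the idea only, not how Pluto does it for Julia; `cell_names`, `stale_cells`, and the example cells are hypothetical, and the analysis ignores scoping, imports, and function definitions.

```python
import ast
import builtins


def cell_names(source):
    """Return (defined, referenced) variable names for one cell's source code."""
    defined, referenced = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                referenced.add(node.id)
    # Names the cell defines itself, and builtins such as `sorted`, are not dependencies.
    return defined, referenced - defined - set(dir(builtins))


def stale_cells(cells, changed):
    """Given {cell_id: source}, return the ids that must re-run when `changed` is edited."""
    info = {cid: cell_names(src) for cid, src in cells.items()}
    stale = {changed}
    grew = True
    while grew:
        grew = False
        produced = set().union(*(info[c][0] for c in stale))
        for cid, (_, refs) in info.items():
            if cid not in stale and refs & produced:
                stale.add(cid)
                grew = True
    return stale


cells = {
    "load": "data = [3, 1, 2]",
    "sort": "ordered = sorted(data)",
    "top": "best = ordered[-1]",
    "note": "title = 'unrelated'",
}

print(sorted(stale_cells(cells, "sort")))  # ['sort', 'top']
print(sorted(stale_cells(cells, "load")))  # ['load', 'sort', 'top']
```

The `note` cell is never marked stale because it reads nothing that the other cells define.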
[+] enriquto|4 years ago|reply
> how often do you want your entire notebook to recompute just because you change something somewhere?

This is exactly what I want, always. In Jupyter I'm continuously doing the restart-kernel-and-re-run-all-cells dance. It is annoying, and I'd love another system optimized for that, like Pluto, without those stupid non-deterministic cells.