dmadisetti | 11 months ago

I'm from a computational mechanics background, and my biggest issue with Jupyter notebooks was doing analysis, playing around with my models, getting results I was happy with, then having everything break on a fresh kernel session whenever I wanted to revisit those results.

You could argue this is a user issue, and that being sloppy with cell state is a me problem, not a notebook problem, but marimo prevents it in the first place. Even one such failure is a wasted day's work. And even an especially diligent Jupyter user does not have the guarantees marimo provides: unless you press "restart and run all" every single time, you are accumulating hidden state.
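A contrived sketch of how that bites you (the names and numbers are made up, but the shape is real):

```python
# Cell 1: define a scale factor, then later delete this cell
scale = 2.0

# Cell 2: still "works" in the live kernel, because `scale`
# lingers in memory even after Cell 1 is deleted
results = [x * scale for x in range(5)]

# On "restart and run all" (a fresh kernel), Cell 2 now raises
# NameError: name 'scale' is not defined
```

In marimo, deleting the cell that defines `scale` immediately invalidates every cell that reads it, so you find out right away instead of weeks later.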

I got started with marimo and helped build out the caching because I wanted to open a notebook from scratch and automatically pick up at the point I left off, without having to worry about whether my cell order was correct. I wanted my code to automatically rerun and update outputs if I changed something like a filtering or transform step on my data. I also wanted to go back to previous states easily without having to rerun long computations or deal with data dump versioning.

marimo's caching makes this all automatic.
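To make that concrete, here's roughly what the two layers look like (a sketch; the filter function and the sleep are stand-ins for real work, and the exact semantics are in the docs):

```python
import time

import marimo as mo


# In-memory memoization: the cache key covers the function's code and
# its inputs, so editing either invalidates the cached result.
@mo.cache
def filter_rows(rows, cutoff):
    return [r for r in rows if r > cutoff]


# Disk-backed caching: on a fresh kernel, the variables defined in this
# block are restored from disk instead of being recomputed.
with mo.persistent_cache(name="expensive_step"):
    time.sleep(5)  # stand-in for a long simulation
    result = filter_rows(list(range(100)), cutoff=42)
```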

The plan is to make caching opt-in on a per-cell basis so this happens behind the scenes, with some ideas for handling side effects like randomness and other "non-pure" cases.

But to answer your points, my use case has been specifically scientific computing. Yes to 1) prototyping, but also knowing my prototype does not hold some hidden state or bug. I have shown my advisor results, gone to deploy, and been unable to replicate them at scale because of some hidden bug in my notebook execution. To me this is unacceptable.

But also 2) documentation, notes, and report sharing. Not saving the outputs directly with the code isn't a blocker here, because caching automatically restores my state, and I can export to a more suitable format like a PDF or HTML file if I need to share it. Saving generated binaries in source would be considered poor form in any other setup; I don't understand why Jupyter gets a pass here. "Write once" is a Google Doc; notebooks should be "run many times", i.e. reproducible and sharable.

Hopefully, we get to address some of the other pain points with existing infrastructure: auto-scaling on HPCs (easy for marimo, since it's already a DAG); integrating code more directly into papers (we have mkdocs, and an experimental Quarto extension exists); and built-in environments (no more `pip install -r requirements.txt` on a paper project while hoping the authors remembered to freeze and pin versions).
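On that last point, inline script metadata (PEP 723) already gets most of the way there. A sketch of what the header of a self-contained notebook file can look like (the pins here are made up):

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "marimo",
#     "numpy==2.1.0",   # hypothetical pin
#     "polars==1.9.0",  # hypothetical pin
# ]
# ///
```

marimo can run a file like this in an isolated environment (the `--sandbox` flag, at the time of writing), so the dependencies travel with the notebook itself instead of living in a separate requirements file.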

---

Hope my tone doesn't come off as too contentious- I've just been burned by Jupyter, and I'm excited because I feel like marimo addresses its major issues.

epistasis | 11 months ago

Fantastic, thanks for such a detailed response, it is very much appreciated and I learned a lot! (And you don't come off as contentious at all!)

I think your use case makes a ton of sense for Marimo: iterative modeling and mucking around. Most of my modeling has been less like that, going straight to a single computation and its result. That is a significant workflow difference.

The JupyterLab web interface has many, many UX problems that make it extremely difficult to use, as does the whole kernel-launching infrastructure. Having a desktop app, or another way to launch kernels in a daemon-like mode without keeping a terminal open, could save a ton of headaches.

Marimo is great in that it accommodates pure functional programming fantastically for notebooks. Unfortunately, too much of the Python API ecosystem does not, but that doesn't mean it couldn't in the future.