
amirathi | 2 years ago

We all know the .ipynb JSON format is not a great fit for Git. The Jupyter ecosystem has come a long way in the last few years, and solving this really comes down to a few tools:

- JupyterLab Git Extension[1] for local diffs (pre-commit diffs)

- nbdime[2] / nbdev[3] for resolving .ipynb git merge conflicts

- GitHub PR code reviews with ReviewNB[4]

- Alternatively, if you don't care about cell outputs, use Jupytext[5] to sync .ipynb JSON to markdown

Disclaimer: I built ReviewNB. It's a completely bootstrapped business, 5 years in the making and now used by leading DS teams at Meta, AWS, NASA JPL, AirBnB, Lyft, Affirm, AMD, Microsoft & more[6] for Jupyter Notebook code reviews on GitHub / Bitbucket.

[1] https://github.com/jupyterlab/jupyterlab-git

[2] https://nbdime.readthedocs.io

[3] https://nbdev.fast.ai

[4] https://www.reviewnb.com

[5] https://github.com/mwouts/jupytext

[6] https://www.reviewnb.com/#customers


enriquto | 2 years ago

> Alternatively, if you don't care about cell outputs then Jupytext[5] to sync .ipynb JSON to markdown

Note that markdown is one option for Jupytext, but not the only one. More interestingly, you can also store your notebooks as plain Python files, whose comments are interpreted as the markdown cells of the notebook.
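For illustration, a minimal notebook in Jupytext's "percent" format might look like this: `# %%` starts a code cell, and `# %% [markdown]` starts a markdown cell whose text is kept as ordinary comments. The `running_mean` function is a made-up example; only the cell markers follow the Jupytext convention.

```python
# %% [markdown]
# # Running mean
# This markdown cell is stored as ordinary Python comments,
# so the interpreter simply ignores it.

# %%
def running_mean(values):
    """Return the cumulative mean after each element of `values`."""
    total = 0.0
    means = []
    for i, v in enumerate(values, start=1):
        total += v
        means.append(total / i)
    return means


# %%
# A "notebook" cell whose output would be shown inline in Jupyter.
print(running_mean([2, 4, 6]))
```

Run with `python`, the file just executes top to bottom; opened through Jupytext, the same text appears as one markdown cell and two code cells.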

This is very useful, and not only for version control: if your notebooks are Python files, they can easily be executed in CI or by third parties just by launching the interpreter. You don't even need the JupyterLab dependency.
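As a sketch of the CI side, assuming the notebooks live as `.py` files in a single directory, a check needs only the standard library and the interpreter. `run_notebooks` is a hypothetical helper, not part of Jupytext or Jupyter:

```python
import subprocess
import sys
from pathlib import Path


def run_notebooks(directory):
    """Execute every .py notebook in `directory` with the plain interpreter.

    Returns a dict mapping file name to the process exit code. No Jupyter
    dependency is involved, because the files are ordinary Python scripts.
    """
    results = {}
    for path in sorted(Path(directory).glob("*.py")):
        proc = subprocess.run(
            [sys.executable, str(path)], capture_output=True, text=True
        )
        results[path.name] = proc.returncode
    return results
```

A CI job could, for example, call `run_notebooks("notebooks")` and fail the build if any exit code is non-zero.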

With some care, you can craft a single Python file "foo.py" that can be used at the same time as:

1. an executable command-line program (that happens to be written in python)

2. an importable python module

3. a Jupyter notebook (to open it you need the Jupytext extension for Jupyter)

4. the documentation with auto-generated figures, convertible to HTML or PDF using "jupyter nbconvert --execute"

5. a regular .ipynb if for some reason you want to distribute the outputs in a re-executable format
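A rough sketch of such a file, covering points 1 to 3, might look like the following; the algorithm (Euclid's gcd) and all names are hypothetical. One wrinkle is that inside a Jupyter kernel `__name__` is also `"__main__"`, so this sketch assumes that checking whether `ipykernel` is in `sys.modules` is an acceptable way to tell a shell run from a notebook run.

```python
# %% [markdown]
# # foo.py: Euclid's algorithm
# This markdown cell is a plain comment when the file runs as a script.

# %%
import argparse
import sys


def gcd(a: int, b: int) -> int:
    """Greatest common divisor via Euclid's algorithm."""
    while b:
        a, b = b, a % b
    return abs(a)


def main(argv=None) -> int:
    """Command-line entry point: `python foo.py 48 18` prints 6."""
    parser = argparse.ArgumentParser(description="Greatest common divisor")
    # Defaults let the script run with no arguments as a quick demo.
    parser.add_argument("a", type=int, nargs="?", default=48)
    parser.add_argument("b", type=int, nargs="?", default=18)
    args = parser.parse_args(argv)
    result = gcd(args.a, args.b)
    print(result)
    return result


# %%
# Assumption: a Jupyter kernel has ipykernel imported, so the CLI only
# parses arguments from a shell; the notebook view just shows a demo.
# On `import foo`, __name__ is "foo" and nothing here runs.
if __name__ == "__main__":
    if "ipykernel" in sys.modules:
        print(gcd(48, 18))
    else:
        main()
```

With this layout, `from foo import gcd` uses it as a module, `python foo.py 12 8` prints 4, and Jupytext presents the same text as notebook cells.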

For small, simple projects that showcase, describe and illustrate an independent algorithm, we have found this structure invaluable.

kzrdude | 2 years ago

VS Code also supports the py-percent format (which Jupytext can use) as a notebook.

ctannyc | 2 years ago

This is a post from my LinkedIn page on my hopes for Jupyter notebooks and git. Does anyone know of progress along these lines?

#Jupyter notebook and git

As much as Jupyter Notebooks have been a great tool for data science, the transition to deployment and the general software-engineering friendliness of Jupyter Notebooks could use some work. From time to time, I have explored how others have dealt with turning notebooks into an organized codebase and outputs. To date, I have not found an approach I am comfortable with. The ideal approach for me would be to use something like 'node metadata' in the manner of [Leo Editor](https://leo-editor.github.io/leo-editor/), functioning as 'decorators' on a notebook cell for integration with git.

By this I mean using something like special markers in Python comments (since much of data science is done with Python) to map the content of a cell (or output) to a git repository. Better yet, define a special cell type for git metadata preceding a code cell. Then implement some basic git operations on the contents of a cell. Let's suppose we use @@git as a marker for git metadata in comments:

  --- beginning of cell ---
  # @@git %upstream%=https://github.com/pyro-ppl/pyro
  # @@git %local%=~/repo/pyrodev
  # @@git %branch%=burnburnburn
  # @@git %file%=examples/cvae/util.py

  # Here begins the contents of the util.py file
  ...
  --- end of cell ---

An extension would implement menubar items for various git operations:

- stage: stage the content as the util.py file
- checkout: check out from upstream, replace the local copy, and refresh the content of the cell
- commit: commit the staged file specified by %file%
- status: ...
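The "stage" item could be sketched roughly as below, assuming the metadata has already been parsed into a dict with `local` and `file` keys. `stage_cell` and its `dry_run` flag are hypothetical; the only real commands used are `git -C <dir> add <path>`.

```python
import subprocess
from pathlib import Path


def stage_cell(metadata: dict, body: str, dry_run: bool = False) -> list:
    """Hypothetical 'stage' menu action for an @@git-annotated cell.

    Writes the cell body to %local%/%file% and runs `git add` on it.
    With dry_run=True the file is still written but git is not invoked.
    Returns the git command that was (or would be) run.
    """
    repo = Path(metadata["local"]).expanduser()
    target = repo / metadata["file"]
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(body)
    cmd = ["git", "-C", str(repo), "add", metadata["file"]]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

"checkout" would run the same flow in reverse (fetch the file, then rewrite the cell body), and "commit" would follow with `git -C <dir> commit`.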

The imagined workflow is that, once a working idea scattered throughout a notebook has been sketched out, the user would mark the notebook cells that should be mapped to files in a git repository. This could also be used in a mixed dev/data science environment, where library code under development can be pulled right into a notebook.

Yes, there will be problems with committing code that contains comments specific to one user, which is why a special cell type makes sense. Yes, there will be problems that I can't even imagine right now, but ...

Please message me if you know of a cell-based git extension for Jupyter Notebooks.