Not having the outputs tied into the code is actually preferable if the ultimate goal is reproducible science. Code should be code, documentation should be documentation, and outputs should be outputs. Having multiple copies of important code in non-version-controlled files is not a good practice. Having documentation dispersed with questionable organization in unsearchable files is not a good practice. Having outputs without run information and timestamps is not a good practice. It's easy to fall into those traps with Jupyter notebooks. It might speed up initial setup and experimentation, but I've been working in academic labs long enough to see the downstream effects.
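To illustrate what "outputs with run information and timestamps" could look like outside a notebook, here is a minimal Python sketch. The function name `save_with_run_info`, the file names, and the directory layout are all assumptions made up for this example, not an established convention.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def git_commit():
    """Return the current commit hash, or None when git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def save_with_run_info(result, out_dir, params):
    """Write result.json and a run_info.json sidecar into a timestamped directory."""
    started = datetime.now(timezone.utc)
    # Hypothetical layout: one directory per run, named by its UTC start time.
    run_dir = Path(out_dir) / started.strftime("%Y%m%dT%H%M%S%fZ")
    run_dir.mkdir(parents=True)
    (run_dir / "result.json").write_text(json.dumps(result, indent=2))
    run_info = {
        "timestamp": started.isoformat(),
        "git_commit": git_commit(),  # ties the output to a code version
        "params": params,
    }
    (run_dir / "run_info.json").write_text(json.dumps(run_info, indent=2))
    return run_dir
```

Calling `save_with_run_info({"mean": 1.5}, "outputs", {"seed": 0})` would leave each result next to a record of when it was produced, with which parameters, and from which commit.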
majormajor|1 year ago
But since most of the Jupyter notebooks I've seen in use aren't version controlled much at all, it's often not as useful in practice.
spiralk|1 year ago
yunohn|1 year ago
epistasis|1 year ago
What a strange thing to assert, especially as a general overarching truth.
The best reports I have ever seen have matched code and output in the same file. There's never a question of what code generated a plot or a table with a notebook.
With .py files and separate outputs there's far more chance for unreproducible science, it's far messier, and for someone who doesn't appear to respect the organizational capabilities of academic labs, you are condemning them to far more poorly organized outputs.
> Having multiple copies of code
That doesn't have anything to do with notebooks. It's as silly as saying that a Python package is a poor idea because you saw somebody repeat code across multiple places.
> non-version controlled files
Notebooks are no less version controllable than .py files.
> outputs with timestamps and run information
Jupyter notebooks are perfect for this, far superior to a directory of cryptically named outputs that need to be strung together in some order.
> documentation dispersed with questionable organization
Using separate Python files rather than a notebook means that documentation can never be where it needs to be: next to the output. This is one of the ways that Python files are strictly inferior for generating results.
There are roughly two modes for notebooks: exploration with a REPL, and well-documented reports. The best scientific reports I have ever seen are notebooks (or R Markdown output) that are the full report text plus code plus figures.
spiralk|1 year ago
This is not a great way to make your argument, though you are not the only one here making a personal judgement without even knowing about my background. These are all issues I have seen firsthand. With most academic labs being funding-limited, the "organizational capabilities of academic labs" seems irrelevant to me. In our field, no one is getting grants to manage code of any kind, .py or .ipynb, and I suspect it's the same at most university labs. It's effort wasted that ultimately does take time away from the actual research that's fundable and publishable. As someone who has been responsible for wrangling people's notebooks in the past, it's enough of a problem that I would encourage removing all .ipynb files.
> That doesn't have anything to do with notebooks. It's as silly as saying that a Python package is a poor idea because you saw somebody repeat code across multiple places.
Human factors make Jupyter notebooks lead to the problems I have listed. The issues are most apparent with large groups and over long periods of time. Python and other programming languages already solved most of these problems with git. There isn't a tool that is as elegant and scales from individuals to massive organizations.
> There are roughly two modes for notebooks: exploration with a REPL, and well-documented reports. The best scientific reports I have ever seen are notebooks (or R Markdown output) that are the full report text plus code plus figures.
The REPL functionality is handled by .py cell execution, as I’ve mentioned in other comments. It baffles me how the minimal effort saved by not using separate tools -- one for code, one for documentation -- justifies the issues it introduces.
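For readers unfamiliar with .py cell execution, a minimal sketch follows. It assumes the `# %%` cell-marker convention, which editors such as VS Code and Spyder recognize and which Jupytext uses as its "percent" format; the comment above does not say which tool is meant, and the data and variable names here are made up for illustration.

```python
# %% [markdown]
# # Hypothetical analysis: mean of a small sample

# %%
import statistics

samples = [2.0, 4.0, 6.0]  # made-up data, for illustration only

# %%
# Each "# %%" block can be run on its own in a supporting editor,
# yet the file stays a plain script that git diffs line by line.
mean = statistics.mean(samples)
print(f"mean = {mean}")
```

Because the markers are ordinary comments, the same file also runs top to bottom with `python script.py`.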