I dislike how Jupyter notebooks have become normalized. Yes, the interactive execution and visuals are nice for more academic workflows where the priority is quick results over code organization. However, when it comes to sharing code with others for the sake of doing reproducible science, Jupyter notebooks cause more trouble than they are worth. Cell-based execution in Python is so elegant with '# %%' lines in regular .py files (though it requires using VS Code or fiddling with vim plugins, which not all scientists want to do, I suppose). No .ipynb is necessary: .py files can be version controlled and shared like normal code while still retaining the ability to be used interactively, cell by cell. It's much easier to organize .py files into a proper Python module, and then share and collaborate with others. Instead, groups collect jumbles of slightly different versions of the same Jupyter notebooks that progressively become more complex and less manageable over time. It's not hypothetical, unfortunately; I've seen this happen at major university labs. I'm not blaming anyone, because I understand -- the funding is there to do science, not to rewrite code into convenient software libraries. Yet I can't help but wish Jupyter notebooks could be removed from academic workflows.
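For readers unfamiliar with the workflow being described: a minimal sketch of a `# %%` cell-delimited .py file (the markers are recognized by editors such as VS Code and Spyder; the data and names here are purely illustrative):

```python
# %% [markdown]
# # Trial analysis
# Each "# %%" line starts a cell that can be executed interactively,
# yet the file stays a plain, version-controllable Python script.

# %% Define some example data (inline here instead of a real file)
trials = [0.91, 0.87, 0.95, 0.89]

# %% Compute a summary statistic interactively
mean_accuracy = sum(trials) / len(trials)
print(f"mean accuracy: {mean_accuracy:.3f}")

# %% Functions defined here can later be imported like any module
def summarize(values):
    """Return (mean, min, max) for a list of numbers."""
    return sum(values) / len(values), min(values), max(values)
```

Run cell by cell in an editor, or execute the whole file with `python analysis.py` like any ordinary script.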
epistasis|1 year ago
If the code is the end product, sure, use a python package.
But does your .py with `# %%` in it also store the outputs? If not, why even bring this up? A .py output without the plots tied to the code doesn't meet the basic use case.
If the end product is the plot, I want to see how that plot was generated. And a Jupyter notebook is a much much better artifact than a Python package, unless that Python package hard codes the inputs and execution path like a notebook would.
Over the past 20 years of my career I have run into this divergence of use cases a lot. Software engineers often don't understand the end goals, how the work should be performed, or the hard-won lessons of the practitioners who have been generating results for a long time. It's hard to protect data scientists from inflexible software engineers who think "aha, that's code, I know this!" without bothering to understand the actual use case at hand.
Twirrim|1 year ago
It's great for exploring code and data too, especially situations where I'm really trying to feel my way towards a solution. I get to merrily intermingle rich text narrative and code so I explain how I got to where I got to and can walk people through it (I did that with some experimenting with an SMT solver several months ago, meant that people that had no experience with an SMT solver could understand the model I built).
I'd never use it to share code though. If we get to that stage, it's time to export from jupyter (which it natively supports), and then tidy up the code and productionise it. There's no way jupyter should be the deployed thing.
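The export step mentioned here is built into Jupyter's own tooling; a typical invocation (the notebook name is illustrative) looks like:

```shell
# Convert a notebook to a plain .py script: code cells are kept,
# markdown cells become comments in the output
jupyter nbconvert --to script exploration.ipynb
# produces exploration.py, ready to be tidied up and productionised
```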
ants_everywhere|1 year ago
Like you, I see the appeal, but they're a usability nightmare beyond a few lines. Part of the problem, I think, is that you can't really incrementally improve them. Who wants to refactor a notebook and deal with all the cell-dependency breakage?
So they start off okay and then slowly become terrible until they're either irreplaceable or too terrible to work with and a new one is started.
abdullahkhalids|1 year ago
The tool and the tool maker are supposed to serve the user. The user is not supposed to conform to the whims of the tool maker.
ants_everywhere|1 year ago
Probably the solution is that things like Jupyter notebooks and spreadsheets [0] should be views into some better source of truth rather than being the source of truth themselves.
[0] https://phys.org/news/2024-08-business-spreadsheets-critical.... I remember a similar figure from studies a decade or so ago.
KolenCh|1 year ago
Also, while many of the practices out there are questionable, in an alternative scenario where ipynb didn't exist, people might be using something like MATLAB instead. E.g., in my field (physics), there are often experimentalists doing some coding; ipynb can be very enabling for them.
I think a piece of research should be broken down and worked on by multiple people to improve the state of the project. Some scientists might pass you the initial prototype in the form of a notebook, and others should be refactoring it into something more suitable for deployment and archival purposes. Properly funding these roles is important; it is lacking but improving (e.g. hiring RSEs).
In my field, the most prominent case where ipynb files are shared widely is training. It's a great application, as it becomes literate programming. In this sense notebooks are highly underused, since literate programming still hasn't gone mainstream.
spiralk|1 year ago
I think notebooks are a fine learning tool for introducing people to programming initially, but I'm afraid they don't allow for growth beyond a certain level. You have a good point about funding for those software roles. Perhaps this wouldn't be as big a concern if there were more software talent in these labs to handle the issues that arise.
dxbydt|1 year ago
Everytime I search my Slack, I have to run two searches because DS can't agree on how to spell the damn thing.
spiralk|1 year ago
There isn't a single install tool that "just works" for this at the moment. If editors came with more robust support for it by default, I think the notebook format wouldn't be needed, and people could use regular Python and interactive cell-based Python interchangeably. I've seen important code get buried under collections of Jupyter notebooks across different users, so I have good reason for this. Notebooks simply don't scale beyond a certain complexity.
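One existing tool in this space, offered here as a partial remedy rather than the editor-level support the comment asks for, is jupytext, which pairs a notebook with a `# %%`-style script so the two stay in sync (the filename is illustrative):

```shell
# Pair notebook.ipynb with a percent-format notebook.py;
# after pairing, a sync propagates edits made to either file
jupytext --set-formats ipynb,py:percent notebook.ipynb
jupytext --sync notebook.ipynb
```

The .py half can be version controlled and diffed like normal code, while the .ipynb half keeps the stored outputs.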