item 16840692

Jupyter, Mathematica, and the Future of the Research Paper

321 points | bachmeier | 8 years ago | paulromer.net

173 comments

dahart | 8 years ago
> The tie-breaker is social, not technical.

The tie-breaker is financial. Jupyter is winning because it's free, not because it's social. It becomes social because of widespread adoption, and it gets widespread adoption because it's free.

I love Jupyter, love love love. But there's a lot of hyperbole and opinion here. Mathematica is just a for-profit business, it's that simple. And it wouldn't be fair to deny the example that Mathematica, Maple & Matlab have set for free solutions like Jupyter.

There's nothing dishonest about a for-profit business. And if Mathematica wants to keep PDF export for themselves, so be it, that's their right. What's dishonest is expecting technical software and service for free and calling people names like vandals if you don't get it. Just celebrate Jupyter and enjoy that people are doing great work you get to use without paying. I don't love Mathematica or its founder, but there's no real need to impugn Mathematica in order to make this point.

carreau | 8 years ago
> it's free

Well, for you maybe (and we strongly believe it should be), but it's built on top of thousands of volunteer hours, grant money (thanks Sloan, Helmsley, Moore), and donations from companies (Anaconda, Microsoft...), individuals, and partners. NumFOCUS (https://www.numfocus.org/) manages all of that; it's a 501(c)(3), so donations are tax deductible! If Jupyter is of help to you (or your company or organisation), think about contributing back (code, dev time, design, UX, translation, legal, ...).

Much Love from the Jupyter Team.

rotorblade | 8 years ago
>> The tie-breaker is social, not technical.

> The tie-breaker is financial. Jupyter is winning because it's free [...]

It is a bit more nuanced than that. Personally, I do not pay for Mathematica usage, so why do I like Jupyter more?

The Mathematica notebook interface is horrible. You may go "oh, neat" the first few times you try it, but then (at least in my case) you get more and more frustrated by all the idiotic issues:

* Indentation/text-wrapping. Write a long line that starts wrapping and it gets a little indentation to signify this, and your next line, which you have indented yourself, is only slightly more indented. It is really hard to see, so you have no idea where your line breaks are.

* Brackets. "[" is used both for function calls and for part specification. The number of square brackets in your expression makes it unnecessarily hard to read once it is big enough.

* Jumping text. The notebook interface does not auto-complete brackets (maybe v11 does), so when you add your opening bracket, all the text in the cell gets reformatted and you have to find the fucking place where you wanted to end the bracket. This is akin to working with images in MS Word.

* Exporting. The notation is just ugly (fine, that is personal: "Sin[]" instead of "sin()"... ok). Ah, good, it has a "Copy as LaTeX", nice... "Sin[]" -> "\text{Sin}[]". Really? Who in the world uses "\text{Sin}" for the sine function and square brackets for function calls when typesetting maths in LaTeX?

It is just a complete nightmare to try and incorporate these things into your workflow, at least for me.

Jupyter just behaves as you'd expect. The fact that it is so much smoother to work with is what wins. For me, I do symbolic calculations; SymPy can do some things much more easily than Mathematica, but a lot of things it can't, or you have to work more to get them going. Jupyter doesn't give you an aneurysm every day at work, which makes you actually want to spend the extra time working it out.
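To make the export complaint concrete, here is a toy sketch of the output the commenter expected: known heads such as Sin[x] become \sin\left(x\right), and unknown heads are left alone. The lookup table and the to_latex name are hypothetical, nesting is handled only by repeatedly rewriting the innermost calls, and this is not how Mathematica or any real exporter works.

```python
import re

# Hypothetical lookup table: Mathematica head -> LaTeX operator command.
LATEX_FUNCS = {"Sin": r"\sin", "Cos": r"\cos", "Tan": r"\tan", "Log": r"\log", "Exp": r"\exp"}

# Matches Head[args] where args contains no square brackets, i.e. innermost calls only.
_CALL = re.compile(r"\b(" + "|".join(LATEX_FUNCS) + r")\[([^\[\]]*)\]")


def to_latex(expr: str) -> str:
    """Rewrite known Head[arg] calls as \\head\\left(arg\\right), innermost first."""
    previous = None
    while previous != expr:  # repeat until nothing changes, which unwinds nesting
        previous = expr
        expr = _CALL.sub(
            lambda m: LATEX_FUNCS[m.group(1)] + r"\left(" + m.group(2) + r"\right)",
            expr,
        )
    return expr


print(to_latex("Sin[Cos[x]] + f[y]"))
```

Because the regex refuses arguments that still contain square brackets, Cos[x] is rewritten on the first pass and Sin[...] on the second, while f[y] is never touched.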

gaius | 8 years ago
> The tie-breaker is financial. Jupyter is winning because it's free

This is it really

Back in the 90s I was using a program called MathCAD; it provided a “notebook” interface by running as a plugin to Word 6. In terms of general usability and experience, 20-years-ago blows away modern-day Jupyter and its silly “cells” interface, which Jupyter uses not because it’s better but because it’s trying to force itself into a web browser. I haven’t used MathCAD since, but I bet in 2018 it’s amazing.

I think few people who have used the commercial tools think Jupyter is better. But the commercial tools are soooooooo expensive...

klmr | 8 years ago
> The tie-breaker is financial.

It can be both financial and social: individual institutes are all too happy to pay for Mathematica licenses, and consequently institute members can use it to produce reproducible research. However, the reproducibility of the resulting notebooks is drastically hindered by the fact that a reader essentially also needs to pay for Mathematica to get the full benefit out of these notebooks (even if they are readable without Mathematica). As a consequence, few people bother using it even though they can afford to. Social drivers disincentivise its usage.

By a similar dynamic, Git beat out the competing DVCSs: in this case it was mostly technical rather than financial factors that drove individual actors to prefer Git over alternatives (due in large part to GitHub). But many people don’t actually care about technical considerations (or even prefer other systems over Git in this regard). What people care about most is seamless integration. In the end, a social driver caused Git’s adoption.

blablabla123 | 8 years ago
Also, don't forget that Jupyter is just different from Mathematica. With Mathematica you can do symbolic computations, yes, and also Statistics and Machine Learning, but also Group Theory and what not. Jupyter does a great job as an interface for certain Statistics and Machine Learning tasks, and I'm quite sure it needs fewer resources, but that's all.

That said, I'm still missing a free but powerful tool for symbolic computations like Mathematica or Maple.

> The tie-breaker is financial.

Exactly, it cannot be emphasized enough. Of course, as a student you get these power tools for a small price or even for free. But if you are not at a university, those tools are super expensive. That's for good reason, but there is still a need for far more open source in this area.

EDIT: I'm just realizing there is Sympy, niiceee...

stjohnswarts | 8 years ago
Exactly, you can be a programmer and be a capitalist too. May the best paradigm or hybrid of the various paradigms win.
FractalLP | 8 years ago
Long time user of Python here and recent user of Mathematica.

Some observations I have are that they're both great. Python is a nice open source scripting language, but getting libraries to work can sometimes be a pain. Mathematica is basically: install this and everything is included. The Mathematica documentation is amazing and it is really simple to figure out how to do most things. The whole iPhone "there is an app for that" is equivalent to "there is a function for that".

Graph Theory works flawlessly in Mathematica. In Python, there is a module for Graphviz. Let me know if Python has something new, though. There are a lot of other examples. Mathematica's Import[] function can read over 150 different file types, including CSV, XLS, genetic encoding files, optimization files... whatever. It is usually far easier and more consistent than finding a corresponding Python library and struggling with the install and minimal documentation. Let me be clear that Python is awesome and rocks, and I think Jupyter is moving it in the right direction. I just feel like many dismiss Mathematica as something that does Calculus homework rather than what it is today, which is a massive 20 million LOC conglomeration of C & Java & Wolfram Language that does everything from Statistics, Machine Learning, Visualization, BlockChain, 3D printing, NodeGraphs, data sets and analysis... etc. in a single consistent package. It is expensive and proprietary and certainly has its own faults, but a lot of that cash is funneled back into a great product.

askvictor | 8 years ago
While I totally hear you regarding the pain of python modules (particularly on Windows), the point of python 'distributions' like anaconda and canopy is to bring the kitchen sink along, kind of like mathematica.

The problem with Mathematica from a science point of view is that, being closed source, it doesn't let you independently verify that the calculations are happening correctly. To be replicable, science involving data needs to use open source tools.

cs702 | 8 years ago
This is spot-on:

"Membership in an open source community is like membership in the community of science. There is a straightforward process for finding a true answer to any question. People disagree in public conversations. They must explain clearly and listen to those who respond with equal clarity. Members of the community pay more attention to those who have been right in the past, and to those who enhance their reputation for integrity by admitting in public when they are wrong. They shun those who mislead. There is no court of final appeal. The only recourse is to the facts.

It’s a messy process but it works, the only one in all of human history that ever has. No other has ever achieved consensus at scale without recourse to coercion.

In science, anyone can experiment. In open source, anyone can access the facts of the code. Linus Torvalds may supervise a hierarchy that decides what goes into the Linux kernel, but anyone can see what’s there. Because the communities of science and open source accept facts as the ultimate source of truth and use the same public system for resolving disagreements about the facts, they foster the same norms of trust grounded in individual integrity."

The entire blog post is worth a read.

gaius | 8 years ago
> Membership in an open source community is like membership in the community of science. There is a straightforward process for finding a true answer to any question

Oh please. Dare to ask what is the best of anything and prepare for an epic flame war.

yaroslavvb | 8 years ago
I've been using Mathematica since 1995 and Jupyter/colab for 5+ years. Most recently I've been using them both in parallel. While Jupyter is probably the future in terms of mass adoption, there are still some areas where Jupyter is lagging.

1. Mathematica has an easy way of sharing notebooks. I just run the "deploy" command, which turns a notebook into a publicly accessible webpage hosted by Wolfram; here's an example: https://www.wolframcloud.com/objects/user-eac9ee2d-7714-42da...

2. Mathematica has more active community. Mathematica-specific questions are likely to be answered within an hour by experts on https://mathematica.stackexchange.com/

3. Mathematica has better tools for simple interactivity. I like to throw in "Manipulate" for a simple graph with a draggable constant, or go to http://demonstrations.wolfram.com/index.php for ideas for more complicated demonstrations to use in a presentation.

4. Mathematica has more options for advanced visualization, and interfaces are more uniform since graph drawing, 3D drawing, and other kinds of visualizations are developed within a single system. Some examples https://www.wolfram.com/language/11/new-visualization-domain...

carreau | 8 years ago
Thanks for your feedback. Mathematica indeed has millions of dollars to build more features and advertise them, while Jupyter has only a few full-time devs who probably don't advertise its features enough:

1) Binder makes that a git push away: https://mybinder.org/ Want to check the discovery of gravitational waves? Go ahead! https://github.com/minrk/ligo-binder You know the nice thing? It does not require you to opt in: as long as a repo is public, you can run it. So you don't even need to deploy, or know it exists; we are _already_ doing that for you.

2) Jupyter is "just" the frontend. Stack Overflow has matplotlib, numpy, sympy, ... tags. We don't have the subdomain (yet), and I actually prefer tags for better searching :-)

3) Sure, it's called ipywidgets (https://ipywidgets.readthedocs.io/en/latest/); that's the tech. from ipywidgets import interact, put @interact as a decorator on your function... that's it.

4) For a convenience library that uses ipywidgets for 3D, see https://ipyvolume.readthedocs.io/en/latest/animation.html (hey, it also supports VR!). See https://www.youtube.com/watch?v=nZ3HQpSXn2U, that will blow your mind.

We'll try to be better at advertising our features !
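Point 3 can be sketched as follows, assuming ipywidgets is installed in the notebook's kernel. The damped_wave function and the slider ranges are made-up illustrations; the only library call is ipywidgets' interact used as a decorator with (min, max, step) float tuples, which it turns into sliders. The import is deferred inside show_sliders (a hypothetical helper) so the plain function can still be used outside Jupyter.

```python
import math


def damped_wave(t: float = 1.0, a: float = 1.0, omega: float = 2.0) -> float:
    """Plain computation: y(t) = exp(-a*t) * cos(omega*t)."""
    return math.exp(-a * t) * math.cos(omega * t)


def show_sliders():
    """Call from a notebook cell; requires ipywidgets in the running kernel."""
    from ipywidgets import interact  # lazy import so this file loads outside Jupyter too

    # Each (min, max, step) tuple of floats becomes a FloatSlider;
    # moving a slider re-runs preview and displays its return value.
    @interact(t=(0.0, 10.0, 0.1), a=(0.0, 5.0, 0.1), omega=(0.0, 10.0, 0.5))
    def preview(t=1.0, a=1.0, omega=2.0):
        return damped_wave(t, a, omega)
```

In a notebook cell, calling show_sliders() should display three sliders and the current value; keeping the math in an ordinary function means the widget layer stays a thin wrapper.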

askvictor | 8 years ago
With regards to sharing, both Google and Microsoft have free hosting for shareable jupyter notebooks. Probably not quite as easy to get them from your computer to the cloud as a deploy command, but it probably wouldn't be hard to create a module that does exactly that (if one doesn't already exist)
iguy | 8 years ago
Right, Jupyter is nice to have, but this is really over the top nonsense:

> Jupyter encourages individual integrity; Mathematica lets individuals hide behind corporate evasion

I have no idea what he's talking about re PDF export either. I print to PDF all the time, to email people a static document to look at, etc. It works just fine. (Whether you can make book-quality formatted text easily, I've no idea; I've never been tempted to try.)

rexpress | 8 years ago
I suspect that the OP used the "Print..." command in the File menu, and selected PDF as the printer option. ISTR that this can sometimes result in poor quality results as presumably it is relying on an external PDF engine to render the notebook.

Whenever I've used the "Save As..." command, choosing PDF as the target, I've also only had good quality output.

mistermann | 8 years ago
The writing style is very reminiscent of Ayn Rand.
promer | 8 years ago
iguy, agreed. You have no idea because you haven't tried.
sago | 8 years ago
Jupyter is an amazing and useful piece of software. I agree that its openness is important, that its flexibility in producing content is excellent, and that it deserves to be the current hotness. But I'm afraid

> Now, Jupyter is the unambiguous technical leader.

is pure fantasy, imho. SymPy is still two decades behind Mathematica in large swathes of symbolic computation.

It may be that, for the things that the author wanted to do, the Python libraries were a good fit (it seems he was working on NLP), but overall, I just don't see it.

askvictor | 8 years ago
I'm surprised there's no mention or discussion of the importance of open-source tooling for replicable science. Without seeing and reviewing the source, how can you tell that a particular calculation is right? Also, relying on costly tools such as Mathematica cuts off a sizable amount of the population from being able to replicate or play with your findings on cost grounds alone.
falkod | 8 years ago
Long-term Mathematica user (physicist) here: I don't think the use of open source software would make most science more replicable (maybe that does not apply to CS/data science). Usually replication takes an expert in the field, and these experts are usually employed at universities where Mathematica licenses are not the prime cost factor. That said, I am all for open source software. I would argue, though, that trustworthy scientific results probably do not rely on the inner workings of e.g. Mathematica anyway, but use Mathematica as a vehicle for, say, linear algebra or symbolic manipulation. While the inner workings of Mathematica may not be open source, the relevant algorithms are usually not proprietary but well-known mathematical results, and as such they are, at least in principle, easily reproducible outside of the ecosystem.
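The point that results from a closed tool rest on well-known mathematics that can be checked elsewhere can be illustrated with a small sketch: whatever produced a claimed solution to a linear system, substituting it back with Python's standard-library fractions module verifies it with no rounding. The function names and the example system are illustrative only.

```python
from fractions import Fraction


def residual(A, x, b):
    """Return A*x - b using exact rational arithmetic (no rounding)."""
    return [
        sum(Fraction(a) * Fraction(xi) for a, xi in zip(row, x)) - Fraction(bi)
        for row, bi in zip(A, b)
    ]


def solves_exactly(A, x, b):
    """True if x satisfies every equation of A*x = b exactly."""
    return all(r == 0 for r in residual(A, x, b))


# Suppose some tool reported x = 4/5, y = 7/5 for 2x + y = 3, x + 3y = 5.
A = [[2, 1], [1, 3]]
b = [3, 5]
print(solves_exactly(A, [Fraction(4, 5), Fraction(7, 5)], b))  # True
print(solves_exactly(A, [1, 1], b))  # False: 1 + 3 != 5
```

Checking an answer is usually far cheaper than producing it, which is why this kind of independent verification does not require access to the solver's source.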
hpcjoe | 8 years ago
Responders have been making the argument that Jupyter is beating Mathematica because of financial or social issues. I'd like to posit a different interpretation, which could be construed to encapsulate these reasons as well as other factors.

Jupyter has a lower friction to adoption and usage than Mathematica, for a definition of friction which encompasses ease of acquisition and sharing. I include economic considerations in the ease of acquisition and sharing. Lack of proprietary walled garden lock-in/lock-out factors in as well.

People are also likely considering the longer term scenario, whereby data, model, and information interchange has been hindered by proprietary formats (the "wall" in the walled garden) and lack of complete information on how to get information in and out. Which is what the OP was complaining about, as they were not able to easily construct a publication quality preprint/submission from one, but could do it easily from the other.

Some of these sources of friction are effectively "own-goals": you increase friction in such a way that something people need to do becomes effectively impossible. Or you hide it. Or you prevent certain groups from using that functionality.

Then the question is how to balance the longevity of the format and its proprietary value against the alternatives. Increasingly, people are less willing to accept this friction for a number of critical systems.

I am looking at this from the perspective of someone who has a few tens of MB of data/writings on 25-30-year-old 3.5 inch and 5.25 inch floppies. These are in formats from which I may not be able to extract the data/information without significant effort.

The formats that have survived well for me over the last 30 years have been either open, or readable/writeable with open tools. The closed ones, not so much luck with.
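As one concrete case of an open format that stays readable with open tools: a Jupyter notebook (.ipynb) file is plain JSON. A minimal sketch, assuming the nbformat 4 layout (a top-level "cells" list whose entries carry "cell_type" and "source"), that recovers the code using only Python's standard json module; outputs and metadata are ignored, and the sample notebook is made up.

```python
import json


def code_cells(notebook_json: str) -> list:
    """Return the source of every code cell in an nbformat-4 notebook."""
    nb = json.loads(notebook_json)
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # nbformat allows "source" to be one string or a list of line strings
            sources.append(src if isinstance(src, str) else "".join(src))
    return sources


# A tiny hand-written notebook; a real .ipynb file would be read with open(...).read().
SAMPLE = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1\n", "y = x + 1"]},
    ],
})

print(code_cells(SAMPLE))
```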

limeblack | 8 years ago
FYI, there is a less complete open source implementation of Mathematica called Mathics [1][2]. In fact, it is also Python-based, just like Jupyter (I don't think this is a coincidence).

[1]: http://mathics.org

[2]: http://mathics.net

lopmotr | 8 years ago
Open source is only great when it exists. For finite element analysis, there are only two generally useful open source products and neither of them has a remotely modern or easy UI. For $10,000 or so, you can get a proprietary one that's fast to use and doesn't have you hitting a brick wall when you find there's some key feature it can't do.

UI is a major failure of open source - it can hardly ever achieve it, at least not well. Most of the popular open source programs have no UI at all.

forapurpose | 8 years ago
> Python libraries let me replicate everything I wanted to do with Mathematica: Matplotlib for graphics, SymPy for symbolic math, NumPy and SciPy for numerical calculations

Are the Python libraries precise enough for professional mathematicians? And do they deal with mathematical 'edge cases', a variety of inputs (formats, notations, etc.), etc.?

On one hand, I could say 'the author uses them, therefore they must be sufficient'. On the other, I've seen plenty of cases where professionals were not careful about the tools they use (e.g., spreadsheets running critical, large-scale financial operations).
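Part of the answer to the precision question is that Python's standard library already offers exact rational arithmetic and decimal arithmetic with a user-chosen precision, so a user who needs more than double-precision floats has options. A minimal illustration; it says nothing about how SymPy, NumPy, or SciPy treat edge cases internally, and the 50-digit precision is an arbitrary choice.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

# Binary floating point: 0.1 and 0.2 have no exact representation.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)  # False

# Exact rational arithmetic from the standard library.
exact_sum = Fraction(1, 10) + Fraction(2, 10)
print(exact_sum == Fraction(3, 10))  # True

# Decimal arithmetic with a user-chosen precision, scoped to this block only.
with localcontext() as ctx:
    ctx.prec = 50
    third = Decimal(1) / Decimal(3)
print(third)
```

Using localcontext rather than changing the global context keeps the higher precision from leaking into unrelated code.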

williamstein | 8 years ago
SageMath (which I started in 2004) is in fact a Python library targeted at professional mathematicians (mainly research in pure math), and is much stronger than Mathematica in many areas, including number theory, algebraic combinatorics and algebraic dynamics. It is weaker than Mathematica in symbolic calculus.
bloaf | 8 years ago
People tend to underestimate the extent of Mathematica's libraries. Take for example process control [1]. Not only does Mathematica have a pretty thorough set of functions for solving process control problems, it works with both linear and non-linear systems, and can find symbolic solutions. When I search for python equivalents, I find abandoned or incomplete-looking projects that are much more limited in scope (e.g. linear systems only) and trying to just provide some of Matlab's functionality (i.e. no symbolic analysis)

[1] https://reference.wolfram.com/language/guide/ControlSystems....

mlevental | 8 years ago
In particular, I'm curious whether SymPy is really as good as Mathematica. I haven't used Mathematica since doing physics homework as an undergrad, but its symbolic manipulation was amazing most of the time.
arca_vorago | 8 years ago
I have chosen the emacs org mode system over Jupyter, but I still like Jupyter regardless. The real tragedy is how dependent people have become on proprietary stacks like Mathematica.
mfe5003 | 8 years ago
I learned Mathematica early and became fluent in it, and learned Python later. I still run to Mathematica for symbolic analysis because there is basically no impediment between my ideas and the keyboard when I am solving that type of problem. I've moved all my numerical analysis to the SciPy ecosystem since it is a more natural language for those types of problems.
ChrisRackauckas | 8 years ago
>Which reminds me. If you are a Julia enthusiast, how do you suppose the investors in this new language plan to make their big score?

This is a weird jab at Julia. Open source software is woefully underfunded. Julia Computing was founded in the wake of Heartbleed where people learned that open source needs some kind of funding to keep developers alive (example article: https://arstechnica.com/information-technology/2014/04/tech-... ). Coming from academic backgrounds, the core contributors really had two options if they wanted to devote full time to Julia: either everyone gets an academic job while working on Julia instead of papers (lol), or band together to get R&D funding and use that to fund a life of open source development. They did the latter.

It's quite silly to even imply there's something nefarious that can go on here. Their main product is the language. They can't sabotage that without sabotaging themselves. They may have some priorities swayed, just as any other individual working on open source does it for their own reasons. For example, IBM funded them to add PowerPC support, and what do you know, Julia works on PowerPC. Is that so awful? With this funding model, what ends up happening is you have a large group of people who dedicate their lives to developing open source code for automatic GPU compilation, machine learning libraries, etc., along with compiler support for optimizing scientific computing. Because of this (and other reasons), Julia ends up having much stronger governance, which is one reason why its development ends up being more active. And this activity in turn makes the project more democratic than projects like CPython or Jupyter, which have been larger projects for a longer time but have fewer contributors (Julia's 686 vs CPython's 524 vs Jupyter's 330).

And most of the Julia contributors aren't even part of that company! Many are academics. A lot of the funding is through NumFOCUS, a non-profit which also helps projects like Jupyter, matplotlib, etc. which the author is for! (And they are great projects as well!)

So while I am happy that the author is pro open source, I think it's necessary to point out that this open source outsider view is both wrong and dangerous. Saying that you love the purity and despise anyone who gets to make a living from it is harmful! Open source is a labor of love, but it has also destroyed many careers. I think society has this view that open source (mathematical) projects are "funded" by academic careers, but even creators of popular projects like SageMath have publicly noted that open source is harmful to academic success (https://escapethetower.wordpress.com/2016/06/13/creator-of-s...).

Instead of being against funding open source contributors, I would like to see the author promote funding for open source. Paul Romer is a leading economist. He has the power to proclaim that open source matters for academic careers and push for it to be put on equal footing with papers for grant applications in his field. People like him should be advocating for jobs dedicated to open source development, not scoffing at the supposed impurity of someone being paid to develop a public good. Someone at the top of the academic hierarchy should start a change and make the development of public tools as valued as the development of (non-public) publications.

jhbadger | 8 years ago
Also, the idea of companies making money off of open source isn't new or nefarious, and it often helps people who aren't even their customers. Red Hat is the classic example: many organizations use CentOS, a distribution based on Red Hat's distro, without paying Red Hat anything. And in scientific computing there's RStudio, which makes a great open source IDE for R besides offering products and services for sale.
jonnycomputer | 8 years ago
R Studio Notebooks are pretty good too; I like that, by default, there is an interactive console connected to the same kernel in addition to the notebook. This allows me to use the console to interactively probe my data or try something out, and then record a more finished product in the notebook itself. I think this can be done in Jupyter (http://jupyter-notebook.readthedocs.io/en/latest/examples/No...), but not out of the box.
wodenokoto | 8 years ago
Since the author is talking about using Jupyter for research papers, how do you do basic things, like bibliographies, naming tables and referencing them later?

I have seen tables of contents, but those have been generated by a big block of JavaScript.

EGreg | 8 years ago
It always goes like this.

The initial solutions may be proprietary, and financed by investors. They have a business model so of course they don’t give everything away for free.

With time, enough people get together to build an open source alternative. And then like a snowball it eclipses everything proprietary that went before it.

What would a world look like that didn’t apply Capitalism to ideas?

One where companies couldn’t sue one another for Intellectual Property infringements. Like Waymo suing Uber.

One where self driving cars can incorporate improvements made by any other self driving car instead of putting people at risk reinventing the wheel.

Where the long tail of drug research leads to something.

Why would people release their findings? Because if they don’t, others will. And then they won’t get that small measure of input and control and attach their name to it. Jonas Salk is an exception in the biomedical field. Albert Einstein is the norm in physics.

The alternatives to Capitalism do not have to be Socialism. They can be SCIENCE. OPEN SOURCE. WIKI.

Collaboration instead of Competition.

I would like to see the same in web browser engines. WebKit instead of IE. And so on. When that happens, we all win.

Yes free software is good. Software, like knowledge, does not have to be scarce.

TeMPOraL | 8 years ago
> It always goes like this.

> The initial solutions may be proprietary, and financed by investors. They have a business model so of course they don’t give everything away for free.

> With time, enough people get together to build an open source alternative. And then like a snowball it eclipses everything proprietary that went before it.

I wish it did. The most obvious counterexamples that come to mind are Microsoft Office and Adobe Photoshop. With the possible recent exception of Krita for the latter, there are no known open source alternatives that don't suck hard compared to the proprietary applications I mentioned.

I'm not really sure why that happens, but open source doesn't seem to scale well when building large end-user applications.

zeth___ | 8 years ago
Jupyter isn't a foil for mathematica. It has completely different use cases.

The straight one-to-one fight is between SageMath (or their rebranded CoCalc site) and Mathematica. In terms of ability to do things, SageMath is the glue between all the amazing open source math and science software that has been written over the last 50 years.

https://cocalc.com/

But in terms of presentation by far the best is org-mode with sage, julia and everything else tied in. One text file that can be emailed around, put under version control, and can speak two dozen computer languages with babel, and has latex support for both pdf and html output.

williamstein | 8 years ago
Thanks for mentioning SageMath and CoCalc (I founded both of these projects)! A minor clarification is that CoCalc is not a rebranding of SageMath, but is instead a new web application whose goal is to make it very easy to collaboratively use Sage, Jupyter, LaTeX, Julia, etc. In contrast, Sage is a more traditional open source software package, which people install on their own computers. The goal of Sage is to be a viable open source alternative to the core Mathematica computer algebra system (and also to Magma, etc.), whereas the goal of CoCalc is to make all technical open source software very, very easily accessible, mainly to students.