A native Python IDE built for data science

201 points| coris47 | 10 years ago |yhat.com | reply

44 comments

[+] SwellJoe|10 years ago|reply
I find the dramatic rise of Python (and open source tools in general) for scientific work interesting and cool. When I first started using Python many years ago, I was doing contract work for the SciPy/NumPy folks (Enthought), and Python was still a blip in the scientific world. Java, Fortran, and a bit of C++ ruled the commercial world, with Mathematica and MATLAB handling the academic side of things (with some overlap and some outliers).

It's really cool to see. I like seeing science democratized, and Python is definitely a democratizing influence, and the fact that so much of it is open source is really fantastic. I've also noticed that a lot more domain experts are becoming programmer+domain experts through this evolution. It used to be that there were teams with a scientist to design it and one or more programmers to implement it, and that's becoming less of a requirement, which can accelerate the science-ing to a notable degree.

[+] minimaxir|10 years ago|reply
The UI is obviously inspired by RStudio for R. And I have zero objections to that; this is something that I've wanted for a while, after having difficulty with PyCharm for my Python-related data projects. I'll play around with it a bit.

As a heads up, the setup workflow assumes you are on OS X, which may be a problem if it asks you to open a Terminal on Windows: http://i.imgur.com/nya50e4.png

[+] ekianjo|10 years ago|reply
I realized the download on Linux is massive, though (600+ MB). Why is that? R and RStudio combined weigh way less than that.

Plus, for distributing binaries in Linux, instead of a zip file (tar.gz would be more common, too) it's better to support the main distros with a repository (PPA for Ubuntu, pacman for Arch, etc...) since it's way more user friendly every single time you want them to stay up to date.

[+] glamp|10 years ago|reply
hey minimaxir, the commands should still work if you have python and/or conda installed. if you have any issues you can post here: https://github.com/yhat/rodeo/issues.

thanks for trying it out!

[+] IndianAstronaut|10 years ago|reply
Funnily enough, I traded in RStudio for Jupyter notebooks for R, especially for demos to other people, since it is much easier to see tables, graphs and such.
[+] ced|10 years ago|reply
In the last year, my workflow for data science/AI has completely shifted to Jupyter notebooks. Is there any IDE that offers a similar experience?
[+] jasongrout|10 years ago|reply
Jupyter dev here. FYI, we're currently working on building a new Jupyter web interface that resembles a more classic IDE experience, which we are calling JupyterLab. A first version is progressively coming together, and is planned to have code editor and terminal components. We also plan to have a notebook component, like the current notebook, in a later version. Our in-progress work is spread across many repos currently (see the various jupyter/jupyter-js-* repos on github).
[+] plusepsilon|10 years ago|reply
There is Beaker Notebook, which is similar to Jupyter. I haven't tried it, but you can integrate multiple languages in one notebook.

http://beakernotebook.com/

[+] jgamman|10 years ago|reply
honest question: what if your science isn't maths/physics/data? I'm a chemist and from what i can see there's @#$@# all out there in FOSS land.
[+] analog31|10 years ago|reply
Excellent question. Here's my chemistry cred: I'm married to a chemist, related to a couple more, and have worked in an area related to analytical chemistry, though I got my degree in physics, 2+ decades ago.

So here are some generalizations.

While in school, I noticed that the physics students were far more interested in math and computer stuff than the chemistry students were. Maybe we were computer science wannabes, or maybe we guessed (correctly in my case) that proficiency with computers would make us more employable. This was true in both undergrad and grad school.

And there's a long tradition of physicists stealing ideas from math and computation for solving physics problems. When I was in school, computation was considered to be a specialized branch of chemistry, but was at the forefront of physics.

Another difference is that the physics students were generally more interested in making our own tools. The current "maker" and "hacker" trends are old hat for small-lab experimental physicists.

Chemistry has always been a bigger field than physics, which I suspect has attracted more interest in making commercial equipment and software. I've noticed that in an industrial setting, managers are often looking for closed solutions that can't be modified by the user, either for regulatory reasons or because of adversarial labor-management attitudes. The industry wants your boss to think that letting you make your own tools is either dangerous or a waste of your time.

In contrast, even in industry, physicists still have to make our own tools. And management already knows that we're freaks. ;-)

So the absence of FOSS tools for chemistry doesn't shock me.

[+] entee|10 years ago|reply
There are some tools out there, for example Open Babel:

http://openbabel.org/wiki/Main_Page

which has some Python bindings built in. I set some of this up for myself during my PhD, but it was occasionally kind of a pain to get it to work. Also at the time I was a bit of a noob so there's that :).

It has some nice features for handling chemical structures. I used it mostly for translating one format into another and computing fingerprints, but I think more can be done.

In general I'd agree with @analog31, biology has some good OSS tools, physics has some good OSS tools, but you get to the bridging discipline of chemistry and you find very few. My theory re. organic chemistry and biochemistry applications: it's way more profitable to be closed source. In contrast to the other two fields (gross generalization I know, but somewhat true) there's a very large market for commercial software in Pharma. If someone is willing to pay top dollar, especially an industry that is paranoid about IP and therefore tends to (rightly or wrongly) prefer closed, proprietary solutions, then that's where software will end up.

[+] michaelperalta|10 years ago|reply
I'm curious: what advantages are there with this (or PyCharm) over something like Spyder?
[+] plusepsilon|10 years ago|reply
PyCharm is unparalleled in its understanding of code and it's great for building codebases. It is a programmer's tool first and foremost. I find PyCharm's interactive features clunky, and I have to do extra work to see the data.

RStudio / Rodeo provide an interactive data analysis environment where multiple "views" are presented right in front of the user. A view could be a plot, a data frame or interactions between the code editor and the terminal. As a data analysis person, it really helps to put the mental strain of code as far away as possible and just explore the data.

Jupyter Notebooks are nice, but they can get overwhelming (too much scrolling) when things get complicated. Great teaching tool, however.

I think each of these tools has different use cases, and it's great that Python is getting more user-friendly with the data science workflow.

[+] snydly|10 years ago|reply
I'd like to know this too... Comparison between PyCharm, Canopy, Spyder, Yhat, etc.

After using it for 10 minutes, it feels identical to RStudio. That's a good thing.

[+] vittore|10 years ago|reply
yeah, I wonder what features are not covered by the free version of PyCharm (except for the UI, obviously copied from R)
[+] ihaveajob|10 years ago|reply
Neat tool, but watching the video, the grammar nazi in me couldn't stop looking at that "palendrome".
[+] _RPM|10 years ago|reply
Just curious, what qualifies it as Native?
[+] bthornbury|10 years ago|reply
I am curious about this as well.

Taking a look at the source (https://github.com/yhat/rodeo), it appears to be all in Python.

I was under the (perhaps mistaken) impression that native referred to code which compiled to assembly.

[+] revelation|10 years ago|reply
Let's see, we're running a browser, which runs a JavaScript VM, which runs our Node.js logic, which runs Python, which calls into native numpy. See, native!

I have this visceral reaction when I can tell something is based on Electron or IWebBrowser, 2.0.

[+] drvortex|10 years ago|reply
It doesn't seem to be able to work with Python 3.5. It doesn't find the path and now the interface is stuck.
[+] cgm616|10 years ago|reply
I am desperately trying to get this to work with my pyenv-virtualenv anaconda installation, but I can't get it to work out.

I also tried setting the path to ~/.pyenv/shims/python, but that didn't work out either.
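
For anyone hitting the same wall, here is a minimal sketch of one way to find a concrete interpreter path to try instead of the shim. It assumes (this is not confirmed anywhere in the thread) that the app wants the real interpreter binary and not a pyenv shim, which is a small dispatch script. `resolve_interpreter` is a hypothetical helper name, and the script only uses the standard library. Run it with the Python from the environment you actually want to use:

```python
import os
import sys


def resolve_interpreter(path=None):
    """Return an absolute, symlink-free path for a Python interpreter.

    With no argument, report the interpreter that is running this script.
    A path may also be passed in; "~" is expanded and symlinks are resolved.
    """
    target = path if path is not None else sys.executable
    return os.path.realpath(os.path.expanduser(target))


if __name__ == "__main__":
    # Prints the path of the interpreter actually executing this file.
    print(resolve_interpreter())
```

Whether pasting the printed path into the setup dialog fixes the problem depends on how the app launches its kernel, so treat this as something to try, not a known fix.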

[+] ilyaeck|10 years ago|reply
A pros/cons comparison to Jupyter would be helpful.
[+] yeukhon|10 years ago|reply
Jupyter, formerly known as IPython Notebook, has a huge UX problem for me. The UI is made to be like a notebook (no duh), but for a larger codebase you want an editor-like UI. Jupyter may be okay for demos.
[+] mrlinx|10 years ago|reply
Finally, something very useful for anyone into python+data that doesn't like working inside a browser.
[+] balls187|10 years ago|reply
What makes this specific for Data Scientists?

Also curious about the performance of the data-frame viewer for large data sets.

[+] joelschw|10 years ago|reply
Why should I use this over Spyder?