Launch HN: Deepnote (YC S19) – A better data science notebook
I'm Jakub and I'm the founder of Deepnote (https://deepnote.com/). We're building a better data science notebook.
As an engineer, I spent most of my time working on developer tools, building IDEs, and studying human-computer interaction. I helped build a couple of startups, I built tools for JavaScript development, and worked on Firefox DevTools. But once I started to work with data scientists, all those code editors and IDEs that I knew as a software engineer suddenly stopped being the right tool for the job. Notebooks were.
Notebooks as we know them today have many pain points (versioning, reproducibility, collaboration). They don't work well with other tools. They don't exactly encourage best practices. But none of these are fundamental flaws of the notebook paradigm. They are signs of a new computational medium. Much like spreadsheets in the 1980s.
Two years ago, my co-founders and I started to think about a better data science notebook. Deepnote is built on top of the Jupyter ecosystem. We are using the same format, and we intend to remain fully compatible in both directions. But to solve the above problems, we've introduced significant changes.
First, we made collaboration a first-class citizen. To allow for this, Deepnote runs in the cloud by default. Every Deepnote notebook is easily shareable (like Google Docs) and easy to understand even by non-technical users.
Second, we completely redesigned the interface to encourage best practices: writing clean code, defining dependencies, and creating reproducible notebooks. We also built a really good autocomplete system, and added a variable explorer.
Third, we made Deepnote easy to integrate with other services. We didn't want to build another data science platform where people work with an iframed notebook. We want to build an amazing notebook that plays well with other services, databases, ML platforms, and the Jupyter ecosystem.
Check out a 2-min demo here: https://www.loom.com/share/b7e05ecca78047c2a2f687d77be8ecea
Building a new computational medium is hard. It takes time. Today, we're launching a public beta of Deepnote. Not everything works yet. Some pieces are missing. But we also have a lot in store, including versioning, code reviews, and visualizations. We still have a lot to learn too, so I'd love to hear your thoughts and feedback.
[+] [-] setgree|5 years ago|reply
My main question is how (or whether) Deepnote addresses issues of reproducibility. Is this a priority for your team? You mention it a few times in your post here, but there is not much in the docs. I searched them and found only this:
> Even though the Custom environment cache is implemented using Docker images, it doesn't primarily serve the reproducibility problem. The aim of the feature is to significantly speed up the start time of your projects. In other words, you should consider it to be only a cache at this point.
My experience with Notebooks suggests that the main (computational) reproducibility challenges were
A) 'hidden state' information (e.g. cells executed out of order, variables changed and then reverted but not re-run); and
B) no clear infrastructure for documenting/caching dependencies (I see you have a terminal option, and the web-based access should address some of this, but something like 'conda env create -f environment.yml' doesn't seem possible out of the box.)
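To make point A concrete, here is a minimal sketch of hidden state. It is not Jupyter or Deepnote code: the `run_cells` helper and the three toy cells are made up for illustration, and the only assumption is that all cells share one namespace, as they do in a notebook kernel.

```python
def run_cells(cells, order):
    """Execute notebook-style cells in the given order against one shared namespace."""
    namespace = {}
    for index in order:
        exec(cells[index], namespace)
    return namespace


# Three toy cells, as they would appear top to bottom in a notebook.
cells = [
    "x = 1",           # cell 0
    "x = x * 10",      # cell 1
    "result = x + 1",  # cell 2
]

# A clean "run all" from top to bottom.
top_to_bottom = run_cells(cells, [0, 1, 2])["result"]

# An interactive session in which cell 1 was run twice before cell 2.
interactive = run_cells(cells, [0, 1, 1, 2])["result"]

print(top_to_bottom, interactive)
```

The saved notebook looks identical in both cases, yet `result` differs, which is why the visible cells alone are not enough to reproduce a session.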
I would understand if these issues are not priorities for you; I don't think most data science projects need to be run in the far future, and most teams can informally sync their dependencies.
If reproducibility is a core priority, do you plan to write something about how DN serves that purpose? I'd be glad to take a close look if you do (I have written/worked a fair bit on this in the past).
[+] [-] Equiet|5 years ago|reply
Regarding other issues: We currently record every execution in project history. That means even if you run cells out of order, you can still get a list of commands that shows how you got to the current state.
The next step for us is to start subtly notifying users when they are doing something that could be an issue later down the road (for example, executing cells out of order). We already built this, but decided not to ship it yet because it needed more love. The second thing we are working on is interactive/reactive execution. This is very very very cool and takes the notebook experience to the next level (at least for me), but needs much more testing.
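For readers who want a picture of what an execution history plus an out-of-order nudge could look like, here is a rough sketch. It is an assumption-laden illustration, not Deepnote's implementation: the `ExecutionHistory` class, its method names, and the rule "a cell ran after a cell that sits below it" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionHistory:
    """Hypothetical log of cell runs; not Deepnote's actual data model."""

    entries: list = field(default_factory=list)

    def record(self, cell_index: int, source: str) -> None:
        """Append one run: the cell's position in the notebook and its code."""
        self.entries.append((cell_index, source))

    def replay_script(self) -> str:
        """Return the commands in the order they actually ran."""
        return "\n".join(source for _, source in self.entries)

    def out_of_order_runs(self) -> list:
        """Return log positions where a cell ran after a cell that sits below it."""
        flagged = []
        highest_seen = -1
        for position, (cell_index, _) in enumerate(self.entries):
            if cell_index < highest_seen:
                flagged.append(position)
            highest_seen = max(highest_seen, cell_index)
        return flagged


history = ExecutionHistory()
history.record(0, "x = 1")
history.record(2, "y = x + 1")
history.record(1, "x = 5")  # ran after cell 2, although it sits above it

print(history.replay_script())
print(history.out_of_order_runs())
```

The replay script reconstructs how the current state was reached, while the flagged positions are the points where a gentle warning could be shown.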
Reproducibility vs flexibility (in the sense of letting the user do whatever they want if they know what they're doing) is a difficult problem. In the end, it's going to be a combination of friendly nudges and much better experience if users are following the "reproducible" path. However, we never want to limit users in what they are able to do.
I spent a lot of time thinking about this and would be happy to chat about what you're thinking. Feel free to email me at [email protected].
[+] [-] smacke|5 years ago|reply
https://nbsafety.org
[+] [-] marapuru|5 years ago|reply
One thing that annoyed me a bit is that I could only register with github or google. Why can't I just create an account directly with your service?
[+] [-] Equiet|5 years ago|reply
[+] [-] marapuru|5 years ago|reply
The tutorial is nice, I like how it guides me through the tool. But I struggled to find the publish button, as it was under the Share text. It would be a quick win to make it more of a CTA (make it blue or something like that). Look at Figma for an example.
[+] [-] abalaji|5 years ago|reply
Overall this seems pretty cool! The real-time editing seems to be the killer feature; Google Colab is close but not as good in my initial testing. Some of the Python package integrations could probably be replicated with open-source tools (e.g. table visualization with https://github.com/quantopian/qgrid).
My big question comes down to vendor lock-in. What's the vision here for compatibility with the Jupyter ecosystem in the long haul? (e.g. do we see Deepnote features contributed back to Jupyter?)
[+] [-] Equiet|5 years ago|reply
Regarding the lock-in, it's in our best interest to remain fully compatible. So yes, there'll always be a way to export your project and run it in plain Jupyter. The hope is that the more advanced features (comments, output visualizations, different cell types) will appear in Jupyter over time as well, but it's also up to Jupyter whether they want those features.
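As a small illustration of why a shared file format limits lock-in: an exported notebook is plain JSON, so its code can be recovered with nothing but the standard library. The sketch below assumes the nbformat v4 layout (a top-level `cells` list whose items have `cell_type` and `source`); the `code_cells` helper and the tiny in-memory notebook are invented for the example and say nothing about how Deepnote's export works.

```python
import json


def code_cells(notebook: dict) -> list:
    """Return the source of every code cell in an nbformat v4 notebook dict."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat allows source to be a single string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


# A tiny in-memory notebook standing in for an exported .ipynb file.
raw = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [], "source": ["import math\n", "math.sqrt(16)"]},
    ],
})

# Join the code cells into one script, skipping the markdown cell.
script = "\n\n".join(code_cells(json.loads(raw)))
print(script)
```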
[+] [-] the21st|5 years ago|reply
[+] [-] ZephyrBlu|5 years ago|reply
[+] [-] lhnz|5 years ago|reply
Do you hire remote engineers? I'm London based.
[+] [-] amirathi|5 years ago|reply
Absolutely. We're solving a small part of this by making notebooks play nicely with GitHub (https://reviewnb.com). Code reviews & collaboration for Jupyter Notebooks, essentially.
Happy to see more products taking a stab at this problem. I'd be curious to know how you implement version control (git or something else) and what kind of experience that translates to for the user. Congrats on the launch!
[+] [-] Equiet|5 years ago|reply
But I'd like to improve on this experience. There are many ways to do it (great job btw), but we want to explore what a versioning system native to notebooks would look like. We're still iterating on that.
[+] [-] dfsegoat|5 years ago|reply
I'd be curious to see a detailed feature comparison between this and Google Colab / Colab Pro [1,2]. I think others might find this useful as well.
1 - https://colab.research.google.com/notebooks/intro.ipynb#rece...
2 - https://colab.research.google.com/signup
[+] [-] Equiet|5 years ago|reply
[+] [-] WClayFerguson|5 years ago|reply
https://quanta.wiki
A "collaborative notebook" would be one very good way to describe what Quanta is as well. I'm the developer of it, by the way.
[+] [-] tpetry|5 years ago|reply
[+] [-] Equiet|5 years ago|reply
[+] [-] taigi100|5 years ago|reply
I fully recommend you try it - it's awesome.
All that's left on my wish list is mainly dark mode, and maybe a cheaper option for a more powerful GPU or something along those lines. Though, with the long-running tasks, I don't really mind.
Great job and congrats on launching!
[+] [-] ianbicking|5 years ago|reply
I no longer know what the best implementations of web-based error handling are, but this 15 year old(!) approach still seems to beat the state of the art in notebooks: https://github.com/cdent/paste/blob/master/paste/exceptions/...
You have a rich interface, showing textual tracebacks is unnecessary!
I'd do inspectable values as well (not just relying on __repr__, but making any top-level object interactively inspectable), but that's more involved. But probably worth it!
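A rough sketch of the raw material a richer error view could start from, using only Python's standard `traceback` module. The `structured_traceback` helper and the dict layout are hypothetical, and falling back to a truncated `repr` for locals is just the simplest placeholder, not the interactive inspection described above.

```python
import traceback


def structured_traceback(exc: BaseException) -> list:
    """Turn an exception's traceback into a list of dicts a UI could render."""
    frames = []
    for frame, lineno in traceback.walk_tb(exc.__traceback__):
        frames.append({
            "function": frame.f_code.co_name,
            "filename": frame.f_code.co_filename,
            "lineno": lineno,
            # Truncated repr as a simple stand-in for real inspectable values.
            "locals": {name: repr(value)[:80] for name, value in frame.f_locals.items()},
        })
    return frames


def divide(numerator, denominator):
    return numerator / denominator


try:
    divide(1, 0)
except ZeroDivisionError as error:
    frames = structured_traceback(error)

# The innermost frame is where the error was raised.
print(frames[-1]["function"], frames[-1]["locals"])
```

Each frame carries its own local variables, so an interface can show them collapsed next to the failing line instead of printing a flat textual traceback.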
[+] [-] Reebz|5 years ago|reply
[+] [-] Equiet|5 years ago|reply
We built Deepnote so that the work you do as a data scientist can be shared with both engineers and non-technical folks. We're not really an MLOps platform. We make a really good notebook that integrates with other platforms.
[+] [-] beisner|5 years ago|reply
I could envision hooking up the outputs of multiple executions of the same (or different) notebooks to these visualizations.
You can kinda get something like this with matplotlib or plotly, but it has always felt like something was missing.
[+] [-] Equiet|5 years ago|reply
Speaking on behalf of Deepnote, there's no such API yet, but it's something that I'd definitely like to see and build.
[+] [-] threatofrain|5 years ago|reply
https://twitter.com/dkvasnickajr/status/1321901316411711490?...
[+] [-] rsweeney21|5 years ago|reply
When I see a Launch HN from a YC batch over a year ago, I assume there was a pivot or they had trouble getting traction. It doesn't seem like that happened in your case, and it might not happen in most cases, which is why I'm asking.
Either way, looks like an awesome product. I sent it over to the Data Science team at my company and they were pretty impressed.
[+] [-] Equiet|5 years ago|reply
Interestingly, ever since we started almost 2 years ago, we've been pretty laser-focused and there were minimal changes to the vision overall. But we also knew what we were getting into and that it'd take time.
[+] [-] chrisaycock|5 years ago|reply
[+] [-] Equiet|5 years ago|reply
[+] [-] joebo|5 years ago|reply
[+] [-] epiteton|5 years ago|reply
[+] [-] woko|5 years ago|reply
I use GPU on Colab, so I will stick to Colab for now, but I think I will hop on Deepnote from time to time.
[+] [-] bravura|5 years ago|reply
[+] [-] adlha|5 years ago|reply
With Deepnote, our focus is improving usability of notebooks as a medium for both data scientists and non-technical users. We want to build a really good notebook experience that plays well with the rest of your stack and helps data science teams work better with the rest of the organization.
[+] [-] kndjckt|5 years ago|reply
[+] [-] adlha|5 years ago|reply