Show HN: dstack – an open-source tool to build data applications easily
I am Riwaj, the cofounder of dstack.ai (https://github.com/dstackai).
A few months ago, we built an online service that lets users publish data visualizations from Python or R. The idea was to build a tool that requires no additional programming or front-end development to publish data visualizations. The publishing code can be invoked from a Jupyter notebook, an RMarkdown document, or a Python or R script. Once the data is pushed, it can be accessed via a browser.
Open-sourcing dstack: During our customer discovery phase, we realized that dstack.ai should integrate many more open-source data science frameworks than we could integrate ourselves. For example, as a user, I want to push a matplotlib plot, a TensorFlow model, a plotly chart, or a pandas dataframe, and I expect the presentation layer to fully support it. Supporting every type of artifact and providing all the tools to work with them on our own seemed like a very challenging task, so we open-sourced the framework. Now you can build dstack locally and run it on your own servers, or in a cloud of your choice if needed. More details on the project, how to use it, and the source code of the server can be found in the https://github.com/dstackai/dstack repo. The client packages for Python and R are available at https://github.com/dstackai/dstack-py and https://github.com/dstackai/dstack-r respectively.
What’s next:
- User callbacks, so that an application shows not just pre-calculated visualizations but can also fetch data from a store and process it in real time.
- ML models, so that data scientists can publish a stack that binds together a pre-trained ML model and user parameters.
- Use cases: support specific use cases that help data scientists turn data science models into data applications as fast as possible.
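To make the "user callbacks" item above more concrete, here is a minimal sketch of what a callback-backed stack could look like. All names here (Registry, register, render) are illustrative stand-ins, not the actual dstack API: the point is that the publisher registers a function instead of a pre-rendered artifact, and the server invokes it with the viewer's parameters at view time.

```python
from typing import Any, Callable, Dict


class Registry:
    """Minimal in-memory stand-in for a callback-backed stack store."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        # Publisher side: register a function instead of a static chart.
        self._callbacks[name] = fn

    def render(self, name: str, **params: Any) -> Any:
        # Viewer side: data is computed on demand, not at publish time.
        return self._callbacks[name](**params)


registry = Registry()

# The "callback" could query a database or reprocess a dataframe; here
# it is a trivial lambda just to show the shape of the contract.
registry.register("sales_by_region", lambda region: {"region": region, "total": 42})

# When a viewer picks "EMEA" in the UI, the server invokes the callback.
print(registry.render("sales_by_region", region="EMEA"))
```

The design choice being sketched is the inversion of control: the stack stops being a snapshot and becomes a small function the presentation layer can call.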
We would be happy to get your feedback on the open-source framework, and to hear what kinds of use cases you think could be built on top of it. Thank you.
[+] [-] bicepjai|5 years ago|reply
``` You hereby grant to Company an irreversible, nonexclusive, royalty-free and fully paid, worldwide license to reproduce, distribute, publicly display and perform, prepare derivative works of, incorporate into other works, and otherwise use and exploit your User Content, and to grant sublicenses of the foregoing rights, solely for the purposes of including your User Content in the Site. You hereby irreversibly waive any claims and assertions of moral rights or attribution with respect to your User Content. ```
Is this kind of language common?
[+] [-] snowwrestler|5 years ago|reply
The long list makes it seem very broad, but this phrase constrains it quite a bit: "solely for the purposes of including your User Content in the Site." This would prevent them from using your content in an ad, or selling your content to some other company, for instance.
[1] Under U.S. federal law, all content is copyrighted upon creation. I hold the copyright on this comment, and I have granted Y Combinator a license to display it on the HN site.
EDIT - here is the relevant sentence from the HN terms of use agreement. It's actually broader than the language you quoted.
> By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed.
[+] [-] gitgud|5 years ago|reply
Seems like it could be good for making data-driven dashboard graphs, although the react library [2] looks like it needs a bit more work.
Congrats on shipping something though!
[1] https://dstack.ai/
[2] https://github.com/dstackai/dstack-react
[+] [-] peterschmidt|5 years ago|reply
EDIT: Speaking of the react library: we've just finished a refactoring and plan to improve it further. Please don't hesitate to share your feedback, over email or via GitHub issues. And thank you!
[+] [-] peterschmidt|5 years ago|reply
1. Here's the simplest tutorial on making an interactive dashboard and sharing it (it includes screenshots): https://docs.dstack.ai/tutorials/dashboards-tutorial
2. Here's another tutorial with more realistic data: Output: https://dstack.ai/gallery/d/b56128a3-522e-42d7-8662-9b1a768d... The code for it is available at https://github.com/dstackai/dstack-tutorials-py/blob/master/...
Admittedly, we have very few examples so far. We're going to add more within this week.
[+] [-] peterthehacker|5 years ago|reply
Can you elaborate on “What’s next”?
> User callbacks- so that application shows not just pre-calculated visualizations but also can fetch data from a store and process it in real-time.
How are you envisioning this working? Will dstack be like a database? How will “user callbacks” be triggered?
[+] [-] peterschmidt|5 years ago|reply
Basically, we want to make building data apps as simple as writing a few lines of code, using only the libraries that data scientists already know: pandas, Matplotlib, scikit-learn, TensorFlow, PyTorch, etc. Ideally you wouldn't have to write application code at all; instead you'd deploy your data science models and simply bind them to some simple UI logic. We believe the need to apply ML to enterprise use cases will keep growing, and tools like this will be very useful. Basically, you'll be able to create an application that helps your HR/Sales/Marketing/Product/<you name it> department apply ML in minutes, without having to write, deploy, or maintain the application yourself.
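The "bind a model to simple UI logic" idea could be sketched roughly as follows. Everything here (Control, App, churn_score) is made up for illustration and is not the dstack API: the app is just a declaration of UI controls plus a predict function, with no hand-written front-end code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Control:
    """A declared UI input, e.g. a dropdown with fixed options."""
    name: str
    options: List[Any]


@dataclass
class App:
    """A data app = declared controls + a bound model function."""
    controls: List[Control]
    predict: Callable[..., Any]

    def run(self, **choices: Any) -> Any:
        # Validate the viewer's choices against the declared controls,
        # then delegate to the bound model.
        for c in self.controls:
            if choices.get(c.name) not in c.options:
                raise ValueError(f"invalid choice for {c.name}")
        return self.predict(**choices)


# Stand-in for a trained scikit-learn/TensorFlow model.
def churn_score(segment: str) -> float:
    return {"smb": 0.31, "enterprise": 0.12}[segment]


app = App(controls=[Control("segment", ["smb", "enterprise"])], predict=churn_score)
print(app.run(segment="smb"))  # → 0.31
```

A real framework would render the declared controls as widgets in the browser; the sketch only shows the binding contract between the UI and the model.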
[+] [-] mushufasa|5 years ago|reply
It looks like you are more of a holistic platform, including a workflow scheduler etc.
[+] [-] peterschmidt|5 years ago|reply
Another thing is that we’d like to eliminate the need to do any programming or HTML/CSS as much as possible.
The jobs that are available as part of the hosted solution are not yet part of the open-source library, but this is certainly something for us to consider moving to open source too.
We are currently at quite an early stage and a lot of work is still ahead. We'd appreciate any feedback and suggestions on where to steer the roadmap.
We're going to work on preparing more use-case-specific tutorials in the coming weeks.
[+] [-] PaulHoule|5 years ago|reply
For instance, suppose I have a notebook that takes 2 hours to generate a model. From the viewpoint of explaining it I'd like to make a notebook where I start from the beginning, train the model, then use it.
If I want to show it to people I want to save all the results and re-render them, not rerun the calculation, certainly if I want to show off the results in a 1 hour talk!
From the viewpoint of reproducibility, however, you have to be able to run the notebook from top to bottom and get a 'correct' result. I'm not going to say the 'same' result, because many calculations are stochastic in nature (e.g. random numbers) or because often the data changes. (Say I have somebody make a notebook that does April's sales reports -- shouldn't I just be able to point it at the May data to make May's sales reports?)
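The April/May idea above is essentially a plea for parameterized reports. A minimal sketch (the data and function names are made up): the reporting logic takes its period as an argument instead of hard-coding it, so rerunning for a new month is a parameter change, not a code edit.

```python
import csv
import io


def sales_report(csv_text: str, month: str) -> dict:
    """Filter rows to the requested month and summarize them."""
    rows = [r for r in csv.DictReader(io.StringIO(csv_text)) if r["month"] == month]
    total = sum(float(r["amount"]) for r in rows)
    return {"month": month, "orders": len(rows), "total": total}


# Toy dataset standing in for "the sales data".
DATA = """month,amount
2020-04,100.0
2020-04,250.5
2020-05,80.0
"""

print(sales_report(DATA, "2020-04"))  # {'month': '2020-04', 'orders': 2, 'total': 350.5}
print(sales_report(DATA, "2020-05"))  # {'month': '2020-05', 'orders': 1, 'total': 80.0}
```

The same principle applies to a whole notebook: lift the varying inputs (period, data source, random seed) to the top as parameters, and reruns become reproducible by construction.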
Between the long time delays (longer than people can hold a context in their mind, longer than they want to wait) for the system to settle down and the total complexity I find that many people involved with data science violently resist confronting the above issues. The effects are much like the visual "blind spot" -- you might get a series of projects that were 98% completed but didn't quite deliver business value although everybody feels like they did their part.
Like other vendors in this crowded space, dstack leads with technology as the key problem to solve: "supports Python and R", "matplotlib, TensorFlow, plotly, ..."
It's certainly true that people don't want to face up to reality in that area. Maybe 50% or 90% of the "waste" in the field goes into setting up your dependencies and begging your boss for access to "the cloud of your choice, if that's what's needed". The trouble is that investments in particular technologies are of temporary value (maybe people will still be using R in 2030, maybe they won't be using TensorFlow, and almost certainly plotly gets bought by Google and shut down by then).
Years back I researched the problem of running TensorFlow models that we got off the pavement: building a database that says TF version X depends on CUDA version Y and cuDNN version Z, and being able to have multiple copies of the userspace GPU drivers installed simultaneously (e.g. just put 'em in a directory and set the library path to point at 'em -- you don't even need containers!)
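The "compatibility database plus library path" trick described above can be sketched in a few lines. The version numbers and directory layout below are illustrative only, not an authoritative compatibility table: the point is that each framework release maps to the userspace libraries it needs, which live side by side in sibling directories.

```python
import os

# Toy compatibility database: framework release -> required userspace libs.
# Versions are examples for illustration, not a vetted reference.
COMPAT = {
    "1.14": {"cuda": "10.0", "cudnn": "7.4"},
    "1.15": {"cuda": "10.0", "cudnn": "7.6"},
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
}


def library_path(tf_version: str, root: str = "/opt/gpu-libs") -> str:
    """Build the library search path for one framework version.

    Prepend the result to LD_LIBRARY_PATH before launching the model
    process; several driver copies can then coexist under `root`.
    """
    deps = COMPAT[tf_version]
    dirs = [
        os.path.join(root, f"cuda-{deps['cuda']}", "lib64"),
        os.path.join(root, f"cudnn-{deps['cudnn']}", "lib64"),
    ]
    return ":".join(dirs)


print(library_path("1.15"))
```

Because the path is computed per process, two models built against different CUDA releases can run on the same machine without containers, exactly as the comment suggests.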
I could have sworn Google looked at my source, because they did the one thing that could have broken that strategy. Also, the company I was working for lost interest in that particular shiny thing. That's a basic problem with maintaining a distribution of other people's software -- like treading water, it takes effort just to stay in one place.
The more fundamental problems that turn up in going from data to decision and products are eternal and not tied to a particular technology. If you solve those problems rather than chase the shiny you might break out of the pack.
[+] [-] peterschmidt|5 years ago|reply
IMO dstack is a lot about process. Technologies change; the process often stays. We'd like to find the best way to solve the problems people face every day, regardless of the particular technology.
One more little thing that might be relevant: dstack actually tracks revisions. What we haven't figured out yet is how to link a particular revision of the application with the particular revision of the code / notebook.
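One possible way to make that link, sketched here purely as an assumption (this is not how dstack works today, and push_revision is a made-up name): fingerprint the notebook/script bytes at push time and store the digest alongside the revision metadata, so each application revision points back to exactly one version of the code.

```python
import hashlib
import json


def push_revision(artifact: dict, source_code: bytes) -> dict:
    """Attach a content hash of the source to the revision record."""
    record = dict(artifact)
    record["code_sha256"] = hashlib.sha256(source_code).hexdigest()
    return record


rev = push_revision(
    {"stack": "sales/summary", "revision": 7},
    b"print('april report')\n",
)
print(json.dumps(rev, indent=2))
```

Later, the stored digest can be matched against commits in version control (git already addresses content by hash), identifying the exact notebook that produced a given application revision.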