jbednar's comments

jbednar | 6 years ago | on: Voilà turns Jupyter notebooks to standalone web applications

Panel and Voila attacked the same problem (moving easily between Jupyter and standalone server contexts) from completely opposite directions, and this difference has implications for their design and function.

Voila is based on ipywidgets running in Jupyter notebooks, and to make a standalone dashboard they had to create a standalone server that can securely execute Jupyter cells and display the results without allowing arbitrary code execution. The server is thus a work in progress, while the Jupyter integration was already solid.

Panel is based on Bokeh models, and because Bokeh models already had a full-blown standalone server, the task for Panel was to make Bokeh models (a) work seamlessly in Jupyter (previously they were awkward and limited in that context), (b) support other plotting libraries (by wrapping everything as a Bokeh model), and (c) have an API that's easier to use than native Bokeh for easy prototyping and design. Solid, secure server support came for free.

Once both libraries support each other's models (soon!) and get a bit more polished, then off the top of my head, the main differences will be:

- Panel can use a Jupyter notebook, but it works equally well with a plain Python file; the notebook is just a source of Python code for it. Panel can be used fully even without Jupyter installed. Voila is closely tied to the Jupyter cell-based execution model, which is good or bad depending on your point of view.

- Panel allows you to construct a "server view" of your notebook that can be completely independent of what is shown in the notebook, even though it is specified inside the notebook. I use that capability to have the same notebook go step by step analyzing a given dataset in detail, and then separately designate what should be shown in the server context, which is very handy; the boss sees one view, I work on another, and it all stays in sync. Voila works with notebook cell outputs only, so I don't think it's possible to have fully different views of your data in the two contexts.

- Panel supports building complex GUIs, with hierarchies of nested objects that each define their own editable parameters, without having to tie any of that code to Jupyter, Bokeh, or any other GUI or plotting system. This approach is really important for building large, complex codebases (e.g. simulators or data-analysis systems) that are sometimes used in dashboards, sometimes in notebooks, sometimes in batch runs, and sometimes on e.g. large remote computing systems.
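The third point (objects that declare their own parameters, independent of any GUI toolkit) can be sketched in plain Python. This is only an illustrative sketch of the pattern; the class and method names here are hypothetical, not Panel's actual API (Panel builds on the separate `param` library for this):

```python
# Hedged sketch: objects declare typed parameters that any front end
# (dashboard, notebook, batch script) can inspect and render. Names are
# illustrative, not Panel's or param's real API.

class Parameterized:
    """Base class: expose declared parameters without any GUI dependency."""
    def parameters(self):
        # Collect (name, value) pairs for public attributes
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}

class Simulation(Parameterized):
    def __init__(self, steps=100, noise=0.1):
        self.steps = steps
        self.noise = noise

    def run(self):
        # Batch runs, notebooks, and dashboards all call the same method
        return [i * self.noise for i in range(self.steps)]

sim = Simulation(steps=3, noise=0.5)
print(sim.parameters())  # a GUI layer could build widgets from this dict
print(sim.run())         # [0.0, 0.5, 1.0]
```

A dashboard front end would introspect `parameters()` to generate widgets, while a batch job would just call `run()`; the domain code itself never imports a plotting or GUI library.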

I'm sure there are lots more differences, but that's enough for now!

jbednar | 6 years ago | on: Voilà turns Jupyter notebooks to standalone web applications

Panel can use a Bokeh server but does not require it; it is equally happy communicating over Bokeh Server's or Jupyter's communication channels. Panel doesn't currently support using ipywidgets, nor does Voila currently support Bokeh plots or widgets, but the maintainers of both Panel and Voila have recently worked out mechanisms for using Panel or Bokeh objects in ipywidgets or using ipywidgets in Panels, which should be ready soon. I'm not sure of the details of how one would use Voila with other languages, but Panel can already show anything that has an interface to Python, e.g. an R ggplot visualization.

jbednar | 7 years ago | on: Datashader: turns even the largest data into images, accurately

I think this was already said above, but it still seems to be getting confused, so to repeat: Datashader renders everything out of core, on the server. So it doesn't matter whether a client could successfully accumulate results for a large dataset incrementally; to use WebGL directly, one still has to send all of the data to the client eventually. With Datashader the dataset is never sent to the client in the first place; it stays on the server, which could be a remote HPC system with thousands of cores processing petabytes. Datashader renders the data into an image-shaped array on the server, then sends that (much smaller) array to the client, so the client never sees any data larger than the available screen resolution. This is no claim that doing so is unprecedented or some crazy new idea, just that Datashader lets you render datasets regardless of their size, completely independently of any client (browser) limitations, and without having to serialize the data over an internet connection.
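The core idea (N points reduce to a fixed image-shaped array whose size is independent of N) can be shown in a few lines of plain Python. This is a deliberately naive sketch of the concept, not Datashader's implementation, which is optimized, parallelized, and out of core:

```python
# Hedged sketch: aggregate N points into a fixed width x height grid of
# counts, so only width*height numbers ever need to leave the server,
# no matter how large N is.

def aggregate_counts(points, width, height, x_range, y_range):
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if x0 <= x < x1 and y0 <= y < y1:
            col = int((x - x0) / (x1 - x0) * width)
            row = int((y - y0) / (y1 - y0) * height)
            grid[row][col] += 1
    return grid  # "image-shaped": size depends only on width and height

# A million points still reduce to a 4x4 grid of 16 numbers:
pts = [((i % 10) / 10, (i * 7 % 10) / 10) for i in range(1_000_000)]
img = aggregate_counts(pts, 4, 4, (0.0, 1.0), (0.0, 1.0))
print(sum(sum(row) for row in img))  # 1000000
```

Sending `img` to a browser costs the same whether the input was a thousand points or a billion, which is the whole point of doing the reduction server-side.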

jbednar | 7 years ago | on: Datashader: turns even the largest data into images, accurately

Sigh. Datashader is not a paper; it's an actual, usable piece of software, so it should be compared to other tools and libraries for rendering data. Unlike nearly every other 2D plotting library available for Python, it can operate in core or out of core, so it's entirely appropriate to advertise that fact (why hide it?). Unlike OpenGL's point-drawing functions and nearly every other 2D plotting library available for Python, it avoids the overplotting and z-ordering issues that make visualizations misleading (so why hide that?). Unlike NumPy's histogram2d, it allows you to define what it means to aggregate the contents of each bin (mean, min, std, etc.), to focus on different aspects of your data. It's a mystery to me why you think Datashader should somehow fail to advertise what it's useful for!

jbednar | 7 years ago | on: Datashader: turns even the largest data into images, accurately

I'm not sure if you're objecting to the name "Datashader", but surely every library needs a name, and this one is accurate in that it allows the sort of shading that one does for 3D rendering to be applied to 2D data plotting. Or are there other buzzwords used in the docs you find objectionable?

jbednar | 7 years ago | on: Datashader: turns even the largest data into images, accurately

By default, Datashader accurately conveys the shape of the distribution in a way that the human visual system can process. If you want a linear representation, you can do that easily; see the first plot in http://datashader.org/topics/census.html , but you'll quickly see that the resulting plot completely fails to show that there are any patterns anywhere besides the top few population hotspots, which is highly unrepresentative of the actual patterns in this data. There is no saturation here; what it's doing in the homepage image is basically a rank-order encoding, where the top brightness value is indeed shared by several high-population pixels, the next brightness value is shared by the next batch of populations, etc. Given only 256 possible values, there has to be some grouping, but it's not saturating.
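A rank-order encoding like this can be sketched directly: brightness is assigned by a value's rank among the distinct values present, not by its linear magnitude. This is a simplified illustration of the idea; Datashader's actual histogram-equalization transfer function is more sophisticated:

```python
# Hedged sketch of rank-order shading: map pixel values to brightness by
# rank among the distinct values present, so faint structure stays visible
# even when the distribution is hugely skewed. Simplified illustration
# only, not Datashader's real eq_hist implementation.

def eq_hist_levels(counts, levels=256):
    distinct = sorted(set(c for c in counts if c > 0))
    rank = {v: i for i, v in enumerate(distinct)}
    top = max(len(distinct) - 1, 1)
    # Spread the distinct nonzero values evenly over 1..levels-1;
    # zero (no data) stays at brightness 0.
    return [0 if c == 0 else 1 + rank[c] * (levels - 2) // top for c in counts]

# Hugely skewed counts still span the full brightness range:
print(eq_hist_levels([0, 1, 2, 5, 1000, 1_000_000]))
# -> [0, 1, 64, 128, 191, 255]
```

A linear mapping of the same counts would put everything except the million-count pixel at brightness 0, which is exactly the "no patterns visible except the hotspots" failure described above.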

jbednar | 7 years ago | on: Datashader: turns even the largest data into images, accurately

Datashader is server-side rendering, and thus not in any way comparable to WebGL in its usage. With Datashader, only the final rendered/rasterized image-like object is sent to the client, which lets it handle arbitrarily large datasets (anything your remote servers can process). With WebGL the dataset is sent to the browser for rendering, which has some advantages but is a very different process than what Datashader does.

jbednar | 7 years ago | on: Datashader: turns even the largest data into images, accurately

If you want one word, Datashader is a rasterizer. It takes data of many types (points, lines, grids, meshes) and creates a regular grid where each grid cell's value is a well-defined function of the incoming data. I'm not sure anyone would be any happier with "rasterizer" than "renderer" or "shader" or any other single word...

jbednar | 7 years ago | on: Datashader: turns even the largest data into images, accurately

Datashader's approach is a bit different from an accumulation buffer, though similar in principle. It's not 3D rendering and has no need for a z ordering; instead it's essentially 2D histogramming. For points, it simply takes each point, calculates which pixel it would land in, and aggregates per pixel, without ever storing all the data points per pixel. The key benefit over something like NumPy's histogram2d function is in how it is implemented and used -- highly optimized, and highly integrated with viz tools so that it can let you just interact with your data naturally as if you had infinite resolution. Try it and see!
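The "aggregate per pixel without storing the points" step can be sketched with running aggregates: a count, a sum (for the mean), and a minimum are each updated in place as points stream through. This is an illustrative 1D toy, not Datashader's API; its real reductions are compiled and optimized:

```python
# Hedged sketch: bin streaming (x, value) pairs into pixels, keeping only
# running aggregates per pixel -- the points themselves are never stored.
# 1D for brevity; the 2D case just adds a second coordinate.

def rasterize(points, width, x_range, agg="count"):
    x0, x1 = x_range
    count = [0] * width
    total = [0.0] * width
    low = [None] * width
    for x, v in points:
        if not (x0 <= x < x1):
            continue
        i = int((x - x0) / (x1 - x0) * width)
        count[i] += 1
        total[i] += v                              # running sum -> mean
        low[i] = v if low[i] is None else min(low[i], v)
    if agg == "count":
        return count
    if agg == "mean":
        return [t / c if c else None for t, c in zip(total, count)]
    return low  # "min"

pts = [(0.1, 10.0), (0.2, 2.0), (0.8, 5.0)]
print(rasterize(pts, 2, (0.0, 1.0), agg="count"))  # [2, 1]
print(rasterize(pts, 2, (0.0, 1.0), agg="mean"))   # [6.0, 5.0]
print(rasterize(pts, 2, (0.0, 1.0), agg="min"))    # [2.0, 5.0]
```

Swapping the reduction (count, mean, min, and so on) changes which aspect of the data each pixel summarizes, which is the flexibility the comment above contrasts with a plain 2D histogram.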

jbednar | 7 years ago | on: Python Data Visualization 2018: Why So Many Libraries?

Of course! HoloViews does allow infinite customizability, to pull out more and more subtle features, show things more clearly, and just to make it look nice or to match your favorite style. But unlike ggplot2, HoloViews does that in a way that can apply to the _data_, rather than having to recapitulate the process every single time you build an individual plot. That way you and your colleagues can together build up whatever style you find most effective, then keep working with it across the full multidimensional landscape of data that you work with in a particular field. HoloViews is a completely different approach, if you really let the ideas sink in (e.g. from our paper about it at http://conference.scipy.org/proceedings/scipy2015/pdfs/jean-...), and is in no way a second-class citizen compared to ggplot2 or any other approach in R...

jbednar | 7 years ago | on: Python Data Visualization 2018: Why So Many Libraries?

Personally, I don't _want_ a grammar of graphics; I want a grammar of data, where the data happens to have a graphical representation. I don't want to spend ages piecing together a fancy plot; I want to spend just a little time annotating my data to declare what it means, and then no matter how I slice and dice my data it will show up in a meaningful way. That way I can explore it to really understand it, which is the point of HoloViews (http://holoviews.org). But people approach plotting in lots of different ways, and some people actually _do_ want to spend their time making plots, so they are welcome to their ggplot2!

jbednar | 7 years ago | on: Python Data Visualization 2018: Why So Many Libraries?

There are links to each library included, which will let you look at galleries of examples that are much more helpful than any single image would be. But if you want to collect images from each library, feel free to post that in the comments!