Looks like a great project. Contrary to other comments, rendering != visualization. This project seems to have paid attention to lots of the seemingly little but critical details of this type of visualization that are a pain to handle yourself (anti-aliasing of multi-scale data, terrain shading, large- and out-of-core visualization).
Any one of these topics can bring a visualization project to a screeching halt, or make the results look misleading or bad.
Even better that they built a tool that works with existing libraries, rather than replacing them. Good work!
I wish more people were outraged at this kind of election tampering. Great visualizations though! Zoom in on some of those tight masses of black outlines. The shapes are ridiculous. Maryland 3rd? Come on.
I've used datashader for plotting NGS (Next Generation Sequencing) enrichments. At the time I had to hack together the ability to use the polygon select tool on the data, but it worked and blew my mind.
Very elegant solution to a difficult problem (overplotting).
> Turns even the largest data into images, accurately
The first image, the image of the USA, seems really misrepresentative to me. LA and NYC should be far, far brighter, relative to everything else, than the entire area east of the Mississippi.
At least to my eyes that map makes it look like parts of Denver, Kansas City, Salt Lake City, Atlanta, and the San Joaquin Valley are just as dense as Manhattan.
Atlanta's population density: 630 per square mile.
Manhattan's population density: 70,826 per square mile.
It seems like an accurate data image would have Atlanta's brightness at about 1/100th of Manhattan's. Basically it looks like they saturated out at around 250 people, so anything over 250 people is the same brightness.
By default, Datashader accurately conveys the shape of the distribution in a way that the human visual system can process. If you want a linear representation, you can do that easily; see the first plot in http://datashader.org/topics/census.html, but you'll quickly see that the resulting plot completely fails to show that there are any patterns anywhere besides the top few population hotspots, which is highly unrepresentative of the actual patterns in this data. There is no saturation here; what it's doing in the homepage image is basically a rank-order encoding, where the top brightness value is indeed shared by several high-population pixels, the next brightness value is shared by the next batch of populations, and so on. Given only 256 possible values, there has to be some grouping, but it's not saturating.
Yes, datashader actually gives you the ability to dial in as much gamma compensation as you want, to account for the human visual system's nonlinear response to luminance.
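The difference between the two mappings discussed here (linear scaling versus rank-order / histogram-equalized shading) can be sketched in plain NumPy. This is an illustration of the general technique only, not datashader's actual implementation:

```python
import numpy as np

def linear_shade(counts, levels=256):
    """Brightness proportional to count; a dense outlier crushes everything else."""
    return np.floor(levels * counts / (counts.max() + 1)).astype(int)

def eq_hist_shade(counts, levels=256):
    """Rank-order shading: brightness reflects each pixel's rank, not its raw value."""
    flat = counts.ravel()
    ranks = flat.argsort().argsort()              # rank of every pixel, 0..n-1
    norm = ranks / max(len(flat) - 1, 1)          # normalize ranks to [0, 1]
    return np.floor(norm * (levels - 1)).reshape(counts.shape).astype(int)

# A skewed "population" grid: one Manhattan-like hotspot among modest towns.
counts = np.array([[70826, 630], [250, 10]])

print(linear_shade(counts))   # hotspot is bright; every town is nearly black
print(eq_hist_shade(counts))  # all four distinct levels remain visible
```

With a linear mapping, Atlanta-scale pixels land at brightness 2 out of 255 and effectively vanish; the rank-order mapping keeps every distinct population level distinguishable, at the cost of not encoding absolute magnitudes.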
Looks like a really cool project. One thing that I would be interested in seeing would be using Datashader as a dynamic visualisation library - for example, for generative art projects. Probably not the main interest of data visualisation practitioners, but hey, if you've got a sweet pipeline to render all those points, why not?
This actually gave me an interesting idea regarding bitcoin passphrase mnemonics.
Instead of text we could use the same algorithm to generate images.
So you could have an index of images and generate them. I'm actually wondering if you could use nouns and verbs to maybe make stories if you could mutate the nouns reliably.
Like 'bird flying' vs 'bird sleeping' ...
This could help to remember long passphrases visually which people seem to be better at.
Yep -- at https://github.com/graphistry/pygraphistry, we started by making millions of nodes/edges interactive. If you use notebooks, you can sign up on our site and get going. The trick is we connect GPUs in the browser to GPUs in the cloud, and encapsulate it enough that you can stick to writing standard SQL/pandas/etc.
We've been curious about server-side static tile rendering for larger graphs, but it has been on the back burner. (We already connect to GPUs on the server, so it's not rocket science.) Currently, we're actively increasing how much can be ingested + computed on, such as for finding influencers, communities & rings, etc. However, visualizing that hasn't been an operational priority for our users. It's more useful to generate the communities, and then either inspect individual ones, or see how communities stitch together: you quickly run out of pixels otherwise due to too many edges. Likewise, we're building connectors to gigascale-petascale graph DBs: Titan, Janus, AWS Neptune, TigerGraph, Spark GraphX, etc.
We still are interested, but more for when we start supporting geographic maps: you can see that that is the primary use for datashader. Also, because data art is fun :)
My team has had luck rendering an SVG from the graph and sending it to a browser. It works well for about 10k vertices and edges. Above that scale we use datashader, and we're investigating a potential move to QGIS. We tried Gephi a few years ago and it had trouble at these scales.
You can render networks in datashader, there's a line primitive.
I added edge bundling (probably the slowest thing in datashader!), and I know there are examples of flight path rendering in the video I linked in another comment.
How about instead of starting with an insult (I can't believe you didn't already know this), you instead congratulate them on putting together a full working library, with pretty, easy-to-grasp examples, and then offer up some research links that they could use to further refine and improve their system. It's our job to teach people; you can't expect everyone to suddenly know everything.
To the Datashader team: I apologize for the above comment. Good job in building and launching a tool for others to use, and great choices for examples!
You're missing the point of this project. It's not about the feasibility of throwing a billion points at a pile of software, to get an image. I can do that with a simple Python script. It's about doing so to create a meaningful and accurate data visualization, and not just a picture of, say, shiny spheres or a scene from Avatar.
I actually have a background in 3D computer graphics, and it's precisely because of my detailed knowledge of raytracing, rasterization, OpenGL, BMRT, photon maps, computational radiometry, BRDFs, computational geometry, statistical sampling, etc., that when I came to the field of data science, and specifically the problem of visualizing large datasets, I realized the total lack of tooling in this space.
The field of information visualization lags behind general "computer-generated imagery" by decades. When I first presented my ideas around Abstract Rendering (which became Datashader) to my DARPA collaborators, even to famous visualization people like Bill Cleveland or Jeff Heer, it was clear that I was thinking about the problem in an entirely different way. I recall our DARPA PM asking Hanspeter Pfister how he would visualize a million points, and he said, "I wouldn't. I'd subsample, or aggregate the data."
Datashader eats a million points for breakfast.
Since you're clearly a computer graphics guy, the way to think about this problem is not one of naive rendering, but rather one of dynamically generating correct primitives & aesthetics at every image scale, so that the viewer has the most accurate understanding of what's actually in the dataset. So it's not just a particle cloud, nor is it NURBS with a normal & texture map; rather, it's a bunch of abstract values from which a data scientist may want to synthesize any combination of geometry and textures.
I chose the name "datashader" for a very specific and intentional reason: we are dynamically invoking a shader - usually a bunch of Python code for mathematical transformation - at every point, within a sampling volume (typically a square, but it doesn't have to be). One can imagine drawing a map of the rivers of the US, with the shading based on some function of all industrial plants in its watershed. Both the domain of integration and the function to evaluate are dynamic for each point in the view frustum.
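That pipeline (aggregate raw records into per-pixel bins, then run an arbitrary "shader" function over the aggregates) can be sketched in a few lines of NumPy. The data and the log transform here are arbitrary examples of the idea, not a prescribed recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x, y = rng.normal(size=n), rng.normal(size=n)
value = rng.exponential(size=n)        # e.g. a per-record quantity to integrate

# Step 1: aggregate -- sum the value of every record falling in each pixel's bin.
agg, _, _ = np.histogram2d(x, y, bins=256, weights=value)

# Step 2: shade -- any function of the aggregate, evaluated per pixel.
#         Here, a log transform followed by normalization to [0, 255].
shaded = np.log1p(agg)
img = (255 * shaded / shaded.max()).astype(np.uint8)
```

The key property is that the aggregation and the shading function are both chosen at view time, so zooming or changing the question re-runs the whole pipeline against the raw data rather than against a pre-baked image.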
> thinking up fancy names for reinventing the wheel
They're not claiming to have reinvented the wheel, they're just explaining what it is.
> 'Turning data into images' isn't exactly a new concept.
No, but doing so on large data accurately (the last word, which you cut off, is important) is not something I know how to achieve easily or faster in a different Python library. I'd like to know if I could.
Ah, the good old "MapReduce is basically functional programming 101" trope, usually resulting from a fundamental misunderstanding of the problem the framework / tool in question solves.
It doesn't matter if something has been done before if the new way hooks into new systems. Tools do not exist in a vacuum. All tools are a part of a system of tools that should always be considered when evaluating any component piece.
I also was expecting something new, but in their defense they've made a very appealing version of something old. I'm sure there are a lot of people out there who haven't thought about saturation with "large" data sets before.
BubRoss|7 years ago:
WebGL will basically do all of that for you, including the out-of-core part, if you can stream the data in.
IanCal|7 years ago:
Here's a 2016 talk on it: https://www.youtube.com/watch?v=fB3cUrwxMVY
There have likely been a lot of improvements since then, but it should help show some of the core parts and explain why it's a useful tool.