Ask HN: What is the best software to visualize a graph with a billion nodes?
134 points | throwaway425933 | 1 year ago
I want to be able to zoom in and zoom out. The graph has up to 100B nodes and is a directed cyclic graph.
bane|1 year ago
It used to be thought that visualizing super large graphs would reveal some kind of macro-scale structural insight, but it turns out that the visual structure ends up becoming dominated by the graph layout algorithm and the need to squash often inherently high-dimensional structures into 2 or 3 dimensions. You end up basically seeing patterns in the artifacts of the algorithm instead of any real structure.
There's a similar, but unrelated desire to overlay sequenced transaction data (like transportation logs) on a geographical map as a kind of visualization, which also almost never reveals any interesting insights. The better technique is almost always a different abstraction like a sequence diagram with the lanes being aggregated locations.
There's a bunch of these kinds of pitfalls in visualization that people who work in the space inevitably end up grinding against for a while before realizing it's pointless or there's a better abstraction.
(source: I used to run an infoviz startup for a few years that dealt with this exact topic)
godelski|1 year ago
I want to stress this point and go a bit further. It can be worse, because people have pareidolia[0], a tendency to see order in disorder, like how you see familiar shapes in the clouds. The danger with large visualizations like these is that instead of conveying useful information, you counterproductively convince someone that something untrue is true! Here's a relevant 3B1B video where this is discussed[1]. There is real meaning in the data, but the point is that it is also easy to be convinced of things that aren't true. In fact, Grant's teaching style is so good you might even convince yourself of the thing he is disproving as he reveals how we are tricked by the visualization. Remember what the original poster latched onto.
I think it's important to recognize that visualization is a nontrivial exercise. Grant makes an important point at the end, restating how the visualization was an artifact and how if you dig deep enough into an arbitrary question, you _can_ find value. Because at the end of the day, there are rules to these things. The same is true about graphs. There will always be value in the graph, but the point of graphing is to highlight the concepts that we want to convey. In a way, many people interpret what graphs are doing and why we use them backwards. You don't create visualizations to then draw value from them, but rather your plots are a mathematical analysis that is in a more natural language for humans. This is subtle and might be confusing because people often are able to intuit what kind of graph should be used to convey data but are not thinking about what the process is doing. So what I'm saying is that you don't want to use arbitrary graphs, but there's the right graph for the job. You can find a lot of blogs on graph sins[2] and this point will become clearer.
At heart, this is not so different from "lies, damned lies, and statistics." People often lie with data without stating anything untrue. With graphs, you can lie without stating a word, despite a picture being worth a thousand. So the most important part of being a data scientist is not lying to yourself (which is harder than it sounds).
[0] https://en.wikipedia.org/wiki/Pareidolia
[1] https://www.youtube.com/watch?v=EK32jo7i5LQ
[2] Except this might be hard because if you Google this you'll have a hard time convincing Google that you don't mean "sine". Instead search "graph deadly sins", "data visualization sins", "data is ugly", and so on. I'll pass you specifically the blog of "Dr. Moron" (Kenneth Moreland) and one discussion of bad plots: https://www.drmoron.org/posts/better-plots/ (Ken is a data visualization expert and both his blogs have a lot on vis). There's also VisLies: https://www.vislies.org/2021/
(Source: started my PhD in viz and still have close friends in infoviz and sciviz who I get to hear their rants about their research, and occasionally I contribute)
throwaway425933|1 year ago
tobbe2064|1 year ago
InGoldAndGreen|1 year ago
elijahwright|1 year ago
Hairballs are not interesting, but the shapes that show up in a graph once you make a few cuts can be fascinating.
PaulHoule|1 year ago
I think there is a need for a tool that can extract and tell an interesting story based on a subgraph of a huge graph, but that takes thinking, unlike hairball plotting, AI image generation, and other seductive scourges.
I went to a posthumous art show based on this guy:
https://www.amazon.com/Interlock-Conspiracy-Shadow-Worlds-Lo...
where they showed how he drew 40 pencil drafts of one of his graphs, going from a senseless hairball to something that seems immediately meaningful. Funny, that might have something to do with his mysterious death... Maybe a tool that would help you do that is too dangerous for "them" to let you have!
unknown|1 year ago
[deleted]
viraptor|1 year ago
But if you were my coworker I'd really press on why you want the visualisation and whether you can get your answers some other way. And whether you can create aggregates of your data that reduce it to thousands of groups instead. Your data is a minimum of ~800GB even if the graph is a single line (position + 64-bit value encoding each edge, no labels), so you're not doing anything real-time with it anyway.
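A quick back-of-envelope check of that estimate (a sketch; assumes the best case described above of one 64-bit value per edge, positions implicit, no labels):

```python
# Rough lower bound on storage for a 100B-node graph stored as a
# single path: roughly one edge per node, 8 bytes per edge.
nodes = 100_000_000_000
bytes_per_edge = 8                      # one 64-bit value per edge
total_bytes = nodes * bytes_per_edge
print(total_bytes / 1e9, "GB")          # 800.0 GB
```

Any labels, weights, or an index for random access only push the number up from there.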
arrowleaf|1 year ago
quietbritishjim|1 year ago
That being the case, I think you're suggesting that this high level summarisation happens as a separate preprocessing step (which I agree with FWIW) whereas I think they're imagining it happening dynamically as part of rendering.
spiderxxxx|1 year ago
slightwinder|1 year ago
Even 8K screens don't have enough pixels to show that many nodes at the same time. So some visual optimization has to happen anyway.
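To put numbers on that (simple arithmetic, using the standard 8K UHD resolution):

```python
# An 8K UHD display is 7680 x 4320 pixels, about 33 million total.
# Even at one node per pixel, 100B nodes oversubscribe it ~3000x.
pixels = 7680 * 4320
nodes = 100_000_000_000
print(pixels)           # 33177600
print(nodes // pixels)  # 3014 nodes competing for each pixel
```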
nick238|1 year ago
Comparing it to a rendering engine is, I think, a bit of a cheat unless the points have intrinsic 2-D spatial coordinates (and no edges beyond immediate adjacency). You're ultimately viewing a 2-D surface; your brain can kinda infer some 3-D structure from it, but if the whole volume is filled with something more complex than fog, it gets tricky. 4-D? Forget about it. 100-D, as many datasets are? lol.
Having worked in a lab where we often wanted to visualize large graphs without them devolving into a hairball: you need to apply some clustering, but the choice of clustering algorithm has an enormous impact on how the whole graph ends up looking, and in some cases it feels like straight deception.
neomantra|1 year ago
throwaway425933|1 year ago
david_p|1 year ago
Context: I’m the CTO of a GraphViz company, I’ve been doing this for 10+ years.
Here are my recommendations:
- if you can generate a projection of your graph into millions of nodes, you might be able to get somewhere with Three.js, which is a JS library to generate WebGL graphics. The library is close enough to the metal to allow you to build something large and fast.
- if you can get the data below 1M nodes, your best shot is Ogma (spoiler: my company made it). It scales well thanks to WebGL and allows for complex interactions. It can run a graph layout on the GPU in your browser. See https://doc.linkurious.com/ogma/latest/examples/layout-force...
- If you want to keep your billions of nodes but are OK with not seeing the whole graph at once, my company builds Linkurious. It is an advanced exploration interface for a graph stored in Neo4j (or Amazon Neptune). We believe that local exploration up to 10k nodes on screen is enough, as long as you can run graph queries and full-text search queries against the whole graph with little friction. See https://doc.linkurious.com/user-manual/latest/running-querie...
OutThisLife|1 year ago
CuriouslyC|1 year ago
infecto|1 year ago
shoo|1 year ago
is there a way you can subsample or simplify or approximate the graph that'd be good enough?
in some domains, certain problems that are defined on graphs can be simplified by pre-processing the graph, to reduce the problem to a simpler problem. e.g. maybe trees can be contracted to points, or chains can be replaced with a single edge, or so on. these tricks are sometimes necessary to get scalable solution approaches in industrial applications of optimisation / OR methods to solve problems defined on graphs. a solution recovered on the simplified graph can be "trivially" extended back to the full original graph, given enough post-processing logic. if such graph simplifications make sense for your domain, can you preprocess and simplify your input graph until you hit a fixed point, then visualise the simplified result? (maybe it contracts to 1 node!)
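The chain-replacement trick mentioned above can be sketched in a few lines (a minimal illustration, assuming an undirected, unlabeled graph; `contract_chains` is a hypothetical name, not from any library):

```python
from collections import defaultdict

def contract_chains(edges):
    """Repeatedly replace each degree-2 node with a single edge
    joining its two neighbours, until no such node remains."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for n in list(adj):
            if len(adj[n]) == 2:
                u, v = adj[n]
                if u == n or v == n:    # skip self-loops
                    continue
                adj[u].discard(n)
                adj[v].discard(n)
                adj[u].add(v)
                adj[v].add(u)
                del adj[n]
                changed = True
    return {(a, b) for a in adj for b in adj[a] if a < b}

# A path of any length collapses to a single edge:
print(contract_chains([(1, 2), (2, 3), (3, 4), (4, 5)]))  # {(1, 5)}
```

Whether a contraction like this preserves what matters depends entirely on the domain, as the comment above says.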
heresie-dabord|1 year ago
Just to be clear, the OP already has a graph. There are nodes and relationships. The graph can be queried for understanding.
Rendering the graph is tractable for a small graph or a portion of the graph.
Trying to render all the nodes in an enormous graph is almost always an expensive quixotic adventure.
surrTurr|1 year ago
If you want other resources, I also have a GitHub list of graph-related libraries (visualizations etc.)[3].
[1]: https://js.cytoscape.org/ [2]: https://github.com/anvaka/VivaGraphJS [3]: https://github.com/stars/AlexW00/lists/graph-stuff
acomjean|1 year ago
It does tend to "hairball" (technical term) at about 500+ nodes. That's not the tool's fault; large graphs just tend to be difficult.
It’s just hard to imagine visualizing a million plus nodes without doing some clustering first.
simpaticoder|1 year ago
Note that visualizations are limited by human perception to ~10000 elements, more usefully 1000 elements. You might try a force directed graph, perhaps a hierarchical variant wherein nodes can contain sub-graphs. Unless you have obvious root nodes, this variant would be interesting in that the user could start from an arbitrary set of nodes, giving different insights depending on their starting point.
1 - An excerpt from "Harder Drive", a rather silly implementation of a unix block device using ping latency with any host that will let him. He visualizes the full ipv4 address space in a hilbert curve at this offset: https://youtu.be/JcJSW7Rprio?si=0AlyMgaZjH7dmh5y&t=363
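For reference, the mapping behind pictures like that is the classic Hilbert-curve index-to-coordinate algorithm; at order 16 every one of the 2^32 IPv4 addresses gets a unique cell in a 65536 x 65536 grid. A sketch of the standard iterative form:

```python
def d2xy(order, d):
    """Map a 1-D index d to (x, y) on a Hilbert curve covering a
    2^order x 2^order grid (classic iterative algorithm)."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                 # rotate the quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# First quadrant at order 1 traces the U shape of the curve:
print([d2xy(1, d) for d in range(4)])  # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

The useful property for these maps is locality: consecutive indices land in adjacent cells, so nearby addresses cluster visually.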
zX41ZdbW|1 year ago
It visualizes the IPv4 space based on reverse DNS responses.
Xcelerate|1 year ago
There's almost never a use case where a customer wants to see a gigantic graph. Or researchers. Or family members for that matter. People's brains just don't seem to mesh with giant graphs. Tiny graphs, sure. Sub-graphs that display relevant information, sure. The whole thing? Nah. Unless it's for an art project, in which case giant graphs can be pretty cool looking.
bunderbunder|1 year ago
In moments like these your job is to not be the monkey's paw. Don't just blithely give them what they asked for. Ask more questions to find out what they're actually trying to accomplish, and help them compose a more specific request that's closer to what they actually want.
IanCal|1 year ago
https://datashader.org/
oersted|1 year ago
seinecle|1 year ago
https://gephi.wordpress.com/2024/06/13/gephi-week-2024-peek-...
_flux|1 year ago
- Deal with 100k node graphs, preferably larger
- Interactive filtering tools, e.g. filtering by node or edge data, transitive closures, highlighting paths matching a condition. Preferably filtering would result in minimally re-layouting the graph.
- Does not need very sophisticated layout algorithms if hiding or unranking nodes interactively is easy. E.g. centering on a node could lay out other nodes using the selected node as the root.
- Ability to feed live data externally, add/remove nodes and edges programmatically
- Clusters (nodes would tell which clusters they belong in)
I'm actually thinking of writing that tool some day, but it would of course be nicer if it already existed ;). I'm thinking applications like studying TLA+ state traces, visualizing messaging graphs or debug data in real time, visualizing the dynamic state of a network.
Also if you have tips on applicable Rust crates to help creating that, those are appreciated!
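The transitive-closure filtering item on that wishlist is easy to sketch (shown in Python purely to illustrate the idea; `reachable_from` is a hypothetical name, and the adjacency is a plain dict of lists):

```python
from collections import deque

def reachable_from(adj, root):
    """Transitive closure from one node: BFS over a dict-of-lists
    adjacency, returning the set of nodes reachable from root."""
    seen = {root}
    queue = deque([root])
    while queue:
        n = queue.popleft()
        for m in adj.get(n, ()):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return seen

adj = {1: [2, 3], 2: [4], 5: [6]}
print(reachable_from(adj, 1))  # {1, 2, 3, 4}
```

An interactive tool would re-run this on each selection and hide everything outside the returned set, ideally without re-laying-out the surviving nodes.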
mhx1138|1 year ago
seinecle|1 year ago
zdimension|1 year ago
Tested it up to 5M nodes, renders above 60fps on my laptop's iGPU and on my Pixel 7 Pro. Turns out, drawing lots of points using shaders is fast.
Though, like everybody else here said, you probably don't want to draw that many nodes. Create a lower-LoD version of the graph and render that instead.
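A lower-LoD version can be as simple as collapsing nodes by cluster and summing parallel edges; a minimal sketch, assuming you already have a cluster assignment for each node (`coarsen` is a hypothetical name):

```python
from collections import Counter

def coarsen(edges, cluster_of):
    """Collapse a directed graph to one node per cluster; parallel
    edges between clusters become one weighted edge, and edges
    inside a cluster drop out entirely."""
    weights = Counter()
    for a, b in edges:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb:
            weights[(ca, cb)] += 1
    return weights

edges = [(1, 2), (2, 3), (3, 4), (1, 3)]
cluster_of = {1: "A", 2: "A", 3: "B", 4: "B"}
print(coarsen(edges, cluster_of))  # Counter({('A', 'B'): 2})
```

Zooming in then means re-expanding one cluster at a time instead of drawing everything at once.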
simonsarris|1 year ago
Visualizations are great at helping humans parse data, but usually they work best at human scales. A billion nodes is at best looking at clouds, rather than nodes, which can be represented otherwise.
trueismywork|1 year ago
michaelt|1 year ago
You could copy their design if you know how you want to project your nodes into 2D: essentially divide the visualisation into a very large number of tiles, generated at 18 different zoom levels, and the 'slippy map' viewer loads the tiles corresponding to the chosen field of view.
Then run a PostGIS database alongside, letting you query for all the nodes in a given rectangle, such as when you want to find the ID number of a given node.
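The tile addressing is the same XYZ scheme web maps use, minus the Mercator projection, since a graph layout is already planar. A sketch (assumes layout coordinates normalized to [0, 1); `tile_for` is a hypothetical name):

```python
def tile_for(x, y, zoom):
    """Return the (tx, ty) tile containing point (x, y), where
    coordinates are normalized to [0, 1) and zoom level z splits
    the plane into 2^z x 2^z tiles."""
    n = 2 ** zoom
    return int(x * n), int(y * n)

print(tile_for(0.6, 0.2, 2))  # (2, 0)
```

Pre-render each tile once offline; the viewer then fetches only the tiles intersecting the current viewport, so the client never has to touch the full graph.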
vbrandl|1 year ago
sebstefan|1 year ago
InGoldAndGreen|1 year ago
I created an HTML page that used vis-network to create a force-directed node graph. I'd then just open it up and wait for it to settle.
The initial code is here, you should be able to dump it into an LLM to explain: https://github.com/HebeHH/skyrim-alchemy/blob/master/HTMLGra...
I later used d3 to do pretty much the same thing, but with a much larger graph (still only 100,000 nodes). That was pretty fragile though, so I added an `export to svg` button so you could load the graph, wait for it to settle, and then download the full thing. This kept good quality for zooming in and out.
However, my node graphs were both incredibly messy, with many, many connections going everywhere. That meant I couldn't find a library that could work out how to lay them out properly the first time, and I needed the force-directed nature to spread them out. For your case of 1 billion nodes, force-directed may not be the way to go.
jarmitage|1 year ago
https://github.com/uwdata/mosaic
https://idl.uw.edu/mosaic/
williamdclt|1 year ago
If you're fully "zoomed out", is seeing 1B individual nodes the most useful representation? Wouldn't some form of clustering be more useful? Same at intermediate levels.
D3 has all sorts of graphing tooling and is very powerful. It likely wouldn't handle 1B nodes (even if it did, your browser can't) but it has primitives to build graphs
snickerd00dle|1 year ago
throwaway425933|1 year ago
rcarmo|1 year ago
ARothfusz|1 year ago
marcpicaud|1 year ago
viraptor|1 year ago
mro_name|1 year ago
macinjosh|1 year ago
https://github.com/latentcat/graphpu
varjag|1 year ago
https://tulip.labri.fr/site/
egberts1|1 year ago
A blog that covers only failures of large SVG viewers on graphs with 10,000+ nodes.
https://egbert.net/blog/articles/comparison-svg-viewers-larg...
More on https://egbert.net/blog/tags/graphviz.html
throwaway425933|1 year ago
withinboredom|1 year ago
From there I could write better visualizations. I got laid off before the project was completed, though.
unknown|1 year ago
[deleted]
zamalek|1 year ago
FL33TW00D|1 year ago
viraptor|1 year ago
unknown|1 year ago
[deleted]
insomniacity|1 year ago
Thinking specifically about a graph of knowledge, so will be an iterative process.
Just looking for anything more than a text editor really!
oersted|1 year ago
learn_more|1 year ago
throwaway425933|1 year ago
IshKebab|1 year ago
atemerev|1 year ago
Neo4j, cytoscape, etc will not work.
FrustratedMonky|1 year ago
I'm finding even tens of thousands can be difficult.
Just generally, is there a list of visualization products that is broken down by how many nodes they can handle?
ygra|1 year ago
While you can envision ways of laying out and rendering such large graphs (force-directed layout is frequent, as are hardware-accelerated rendering methods that typically only show nodes with size and color, but little more complex than that), you don't just want to stare at a pretty hairball. Graphs have structure, which the correct layout will emphasize or even make visible. And you want to be able to explore or interact with the data. And that's where this often breaks down.
If you're just interested in part of the data, reduce the graph to that part. Makes layout, rendering, and interaction way easier.
If you have ways of grouping or clustering the data beforehand, reduce the graph to the clusters and then drill down into them.
You might get lucky and your data already has a structure that's well suited for fast layout algorithms and the same structure makes it easy to figure out which part you want to look at more closely. But in my experience that's rare. Most requests for large graphs from customers come from requirements of the software (e.g. “should be able to handle 100k nodes and as many edges at 60 fps with a load time of no more than 2 seconds”) written by someone who pulled more or less reasonable maximum numbers from thin air, or from just looking at the amount of data without really having an idea of how to work with it and just wondering whether all that can somehow be turned into pixels. Dedicating less than a pixel on the screen to each node is very frequently not helpful, even though a visualization product may very well advertise that they can handle it. It may make for pretty pictures, but often not very useful ones.
There are a number of posts on the topic, e.g.
• https://cambridge-intelligence.com/how-to-fix-hairballs/
• https://www.yworks.com/pages/smooth-visualization-of-big-dat...
bee_rider|1 year ago
technologia|1 year ago
bjourne|1 year ago
randometc|1 year ago
unknown|1 year ago
[deleted]
wslh|1 year ago
gkorland|1 year ago
throwaway425933|1 year ago
hhh|1 year ago
Flam|1 year ago
tinsane|1 year ago
bee_rider|1 year ago
TZubiri|1 year ago
potatoicecoffee|1 year ago
bpanon|1 year ago
rockysharma|1 year ago
aaron695|1 year ago
[deleted]
thomassmith65|1 year ago
OhNoNotAgain_99|1 year ago
[deleted]
lori_Mann|1 year ago
[deleted]