
Show HN: Squey, an open-source GPU-accelerated data visualization software

66 points | jbleonesio | 1 year ago | squey.org

While we hope you'll find it quite useful already, there is plenty of room for improvement so we greatly appreciate your feedback!

13 comments

jmakov|1 year ago

10GB, 1TB, 100TB? Memory mapping or does it need to fit into memory (RAM, VRAM?)? Is streaming supported - can I point to a 100TB dataset and cruise through it? 1 parquet file or parquet dataset? What about Delta lake? Are outliers drawn or are you doing some sort of sampling/smoothing? Also would be great to have some comparison to similar tools in this space e.g. https://github.com/finos/perspective and HvPlot+Datashader.

jbleonesio|1 year ago

Data needs to fit in RAM and graphics in VRAM. Let's say 100GB, or more if you filter some rows during import. Data is ingested into an in-house database designed to refresh the ever-changing selected rows as quickly as possible to conduct a true investigation. You can load as many parquet files as you want in one go provided they have the same structure. Any outlier in any visual representation will be drawn, as this is a requirement to detect weak signals and anomalies.

Comparisons with the tools you mentioned would indeed be interesting; writing a blog post would be a good idea I guess! I wrote a comparison with ELK here: https://squey.org/domains/cybersecurity/pentesteracademy-mac...

macros|1 year ago

Neat tool.

Couldn't find anything in the docs on mapping file sources to resource needs on the host. How much is too much data to dump into the tool on a single workstation?

jbleonesio|1 year ago

Thanks!

It depends on the number of rows/columns and the types of the values, but the application displays a dialog asking you if you want to stop the import before completion when it feels like resources are being exhausted.

The software was specifically developed to be able to handle as much data as possible while remaining responsive so the workstation resources will likely be the bottleneck here.

On my 32GB development machine, I can easily load tens of millions of rows with tens of columns.
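To make the "rows/columns and the types of the values" point concrete, here is a back-of-envelope sketch. It is not how Squey actually stores or measures data: the per-type byte widths, the function names and the 50% headroom factor are all assumptions chosen for illustration, and strings or any index structures are ignored.

```python
# Hypothetical rough estimate; NOT Squey's real memory model.
# Assumed fixed widths for a few common column types (bytes per value).
BYTES_PER_VALUE = {"int32": 4, "int64": 8, "float64": 8, "timestamp": 8}


def estimate_raw_bytes(n_rows, column_types):
    """Return n_rows times the summed fixed width of every column."""
    return n_rows * sum(BYTES_PER_VALUE[t] for t in column_types)


def fits_in_ram(n_rows, column_types, ram_bytes, headroom=0.5):
    """True if the raw size stays under an assumed fraction of RAM."""
    return estimate_raw_bytes(n_rows, column_types) <= ram_bytes * headroom


# 50 million rows, 20 numeric columns of 8 bytes each.
columns = ["int64"] * 10 + ["float64"] * 10
raw = estimate_raw_bytes(50_000_000, columns)
print(f"{raw / 1e9:.1f} GB")  # prints "8.0 GB"
print(fits_in_ram(50_000_000, columns, 32 * 2**30))
```

Under these assumptions, tens of millions of rows with tens of numeric columns come to a few GB of raw values, which fits comfortably on a 32GB machine. Wide text columns would change the picture considerably.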

JacobiX|1 year ago

Impressive project. Judging by commits and features, it's clear that significant effort has been poured into this :) Unfortunately, there's no specific macOS installation method provided; unsure if it's buildable from source?

jbleonesio|1 year ago

Thanks for your feedback. Unfortunately there is currently only a Linux build (which also happens to run under Windows thanks to WSL2) because there are a lot of dependencies[1] to build. Any help to implement a macOS build would of course be warmly welcomed :)

In the meantime, you can deploy the software from AWS Marketplace[2] and use it through your web browser, but note that this is a paid, on-demand product.

[1]: https://gitlab.com/squey/squey/-/tree/main/buildstream/eleme...

[2]: https://aws.amazon.com/marketplace/pp/prodview-l363lrih42bhm

bbor|1 year ago

Very cool, and it’s already on version five! I’m impressed. Only one question for now, since I don’t yet have experience with these specific data viz techniques:

Skew-ey? Skoo-ey? Squee?

jbleonesio|1 year ago

Version five indeed, because it already has quite a bit of history as an ex-proprietary product.

We pronounce it "Skwey" (like in "query") but you can really pronounce it as you wish since it's not even an existing word x)

jmakov|1 year ago

Would be interesting to see how this compares to hvPlot+Datashader.

Iwan-Zotow|1 year ago

Is it comparable to ParaView?

jbleonesio|1 year ago

While Squey does not claim to be as versatile as ParaView (it is not designed to visualize 3D mesh data, for example), it is focused on conducting iterative analyses over massive columnar datasets to understand them better and find weak signals and anomalies through the use of parallel coordinates, data series and scatter plots.