10GB, 1TB, 100TB? Memory mapping, or does it need to fit into memory (RAM, VRAM?)? Is streaming supported - can I point to a 100TB dataset and cruise through it? 1 Parquet file or a Parquet dataset? What about Delta Lake? Are outliers drawn or are you doing some sort of sampling/smoothing?
Also would be great to have some comparison to similar tools in this space e.g. https://github.com/finos/perspective and HvPlot+Datashader.
Data needs to fit in RAM and graphics in VRAM. Let's say 100GB or more if you filter some rows during import. Data is ingested into an in-house database designed to refresh the ever-changing selected rows as quickly as possible to conduct a true investigation. You can load as many Parquet files as you want in one go, provided they have the same structure. Any outlier in any visual representation will be drawn, as this is a requirement to detect weak signals and anomalies.
Impressive project, judging by commits and features, it's clear that significant effort has been poured into this :)
Unfortunately, there's no specific macOS installation method provided; unsure if it's buildable from source?
Thanks for your feedback. Unfortunately there is currently only a Linux build (which also happens to run under Windows thanks to WSL2) because there are a lot of dependencies[1] to build. Any help implementing a macOS build would of course be warmly welcomed :)
In the meantime, you can deploy the software from AWS Marketplace[2] and use it through your web browser but note that this is an on-demand paying product.
Very cool, and it’s already on version five! I’m impressed. Only one question for now, since I don’t yet have experience with these specific data viz techniques:
While Squey does not claim to be as versatile as ParaView (it is not designed to visualize 3D mesh data, for example), it is focused on conducting iterative analyses over massive columnar datasets to improve one's understanding of them and find weak signals and anomalies through the use of parallel coordinates, data series and scatter plots.
jmakov|1 year ago
jbleonesio|1 year ago
Comparisons with the tools you mentioned would indeed be interesting; writing a blog post would be a good idea, I guess! I wrote a comparison with ELK here: https://squey.org/domains/cybersecurity/pentesteracademy-mac...
macros|1 year ago
Couldn't find anything in the docs on mapping file sources to resource needs on the host, how much is too much data to dump into the tool on a single workstation?
jbleonesio|1 year ago
It depends on the number of rows/columns and the types of the values, but the application displays a dialog asking whether you want to stop the import before completion when resources appear to be running out.
The software was specifically developed to handle as much data as possible while remaining responsive, so the workstation resources will likely be the bottleneck here.
On my 32GB development machine, I can easily load tens of millions of rows with tens of columns.
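For a rough feel of how rows, columns and value types combine, here is a back-of-envelope sketch. It is not how Squey stores data internally: the byte widths in `BYTES_PER_TYPE`, the function `estimate_ram_gib` and the example dataset are all assumptions made for illustration, and variable-length values such as strings are ignored.

```python
# Assumed fixed byte widths per value type (illustrative only;
# the tool's in-house database may encode values differently).
BYTES_PER_TYPE = {
    "int32": 4,
    "int64": 8,
    "float32": 4,
    "float64": 8,
    "timestamp": 8,
}


def estimate_ram_gib(n_rows, column_types):
    """Return a rough raw-data size in GiB for fixed-width columns."""
    row_bytes = sum(BYTES_PER_TYPE[t] for t in column_types)
    return n_rows * row_bytes / 2**30


# Hypothetical dataset: 50 million rows with 30 float64 columns.
estimate = estimate_ram_gib(50_000_000, ["float64"] * 30)
print(f"{estimate:.1f} GiB")  # prints "11.2 GiB"
```

Under these assumptions the raw values alone take about a third of a 32GB machine, before any indexes or rendering buffers, so treat the number as a floor rather than a prediction.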
JacobiX|1 year ago
jbleonesio|1 year ago
[1]: https://gitlab.com/squey/squey/-/tree/main/buildstream/eleme...
[2]: https://aws.amazon.com/marketplace/pp/prodview-l363lrih42bhm
bbor|1 year ago
Skew-ey? Skoo-ey? Squee?
jbleonesio|1 year ago
We pronounce it "Skwey" (like in "query") but you can really pronounce it as you wish since it's not even an existing word x)
jmakov|1 year ago
Iwan-Zotow|1 year ago
jbleonesio|1 year ago