Epicism | 1 year ago

Super interesting! I’m curious how this differs from InfluxDB’s German strings implementation https://www.influxdata.com/blog/faster-queries-with-stringvi...

aduffy|1 year ago

German strings are cool, and we're also using them in Vortex! They're also commonly referred to as "variable-length view arrays", which is what Arrow calls them [1]. They were first published by folks at TUM as part of the Umbra database (check out Figure 4) [2].

German-style strings/views are not a compression algorithm; they're just a way of storing string data that makes in-memory comparisons quick. You can in fact store the views while keeping the corresponding full-length strings in compressed form with FSST. We don't currently do that, but we're working on it.
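To make the "quick to compare" part concrete, here's a rough Python sketch assuming the Arrow-style 16-byte view layout: a 4-byte length, then either up to 12 inline bytes, or a 4-byte prefix plus a buffer index and offset. The names `make_view`, `read_view`, and `views_equal` are made up for illustration; this is not Vortex or Arrow code.

```python
import struct

INLINE_MAX = 12  # strings up to 12 bytes fit entirely inside the 16-byte view


def make_view(data: bytes, buffer_index: int = 0, offset: int = 0) -> bytes:
    """Pack a 16-byte view for `data` (hypothetical helper)."""
    n = len(data)
    if n <= INLINE_MAX:
        # Short string: length + the bytes themselves, zero-padded to 12.
        return struct.pack("<i12s", n, data)
    # Long string: length + 4-byte prefix + buffer index + offset in that buffer.
    return struct.pack("<i4sii", n, data[:4], buffer_index, offset)


def read_view(view: bytes, buffers: list[bytes]) -> bytes:
    """Recover the full string a view refers to."""
    n = struct.unpack_from("<i", view)[0]
    if n <= INLINE_MAX:
        return view[4:4 + n]
    _, _, buf, off = struct.unpack("<i4sii", view)
    return buffers[buf][off:off + n]


def views_equal(a: bytes, b: bytes, buffers: list[bytes]) -> bool:
    """Equality check that usually never touches the string buffers."""
    # The first 8 bytes hold the length and the first 4 bytes of the string,
    # so most unequal pairs are rejected right here.
    if a[:8] != b[:8]:
        return False
    n = struct.unpack_from("<i", a)[0]
    if n <= INLINE_MAX:
        # Inline strings: the remaining 8 bytes decide the result.
        return a[8:] == b[8:]
    # Same length and prefix: fall back to comparing the full strings.
    return read_view(a, buffers) == read_view(b, buffers)
```

The point is that the views stay fixed-width and cheap to scan, while the bytes in `buffers` could live somewhere else entirely (in this sketch they're plain uncompressed bytes).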

[1] https://arrow.apache.org/docs/format/Columnar.html#variable-...

[2] https://db.in.tum.de/~freitag/papers/p29-neumann-cidr20.pdf

Epicism|1 year ago

Thanks for the reply!