mathieuh | 6 months ago

https://datafusion.apache.org/blog/2024/09/13/string-view-ge...

> The concept of inlined strings with prefixes (called “German Strings” by Andy Pavlo, in homage to TUM, where the Umbra paper that describes them originated) has been used in many recent database systems (Velox, Polars, DuckDB, CedarDB, etc.) and was introduced to Arrow as a new StringViewArray[^3] type. Arrow’s original StringArray is very memory efficient but less effective for certain operations. StringViewArray accelerates string-intensive operations via prefix inlining and a more flexible and compact string representation.

So the name seems to mean nothing more than that they were invented at a German university. I spent quite some time thinking it had something to do with German’s sometimes-SOV word order.

andai|6 months ago

Here is the paper in question:

Umbra: A Disk-Based System with In-Memory Performance

https://db.in.tum.de/~freitag/papers/p29-neumann-cidr20.pdf

Section 3.1 covers string handling.

This article (also linked from tfa) explains German strings in more detail.

https://cedardb.com/blog/german_strings

chombier|6 months ago

My tl;dr after reading the article:

- the representation is two 64-bit words (16 bytes)

- fixed 32-bit length field

- short strings (up to 12 bytes) are stored in-place

- long strings store a 4-byte prefix in-place + a pointer to the rest

- two bits of the pointer are used as flags to further optimize some use cases
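
To make the layout above concrete, here is a rough sketch in Python. It is not how Umbra, CedarDB, or Arrow implement it: the names `encode`, `decode`, `equal`, and `heap` are made up for illustration, the "pointer" is just an offset into a `bytearray`, and the two flag bits are left out entirely.

```python
import struct

INLINE_MAX = 12  # 16-byte struct minus the 4-byte length field


def encode(data: bytes, heap: bytearray) -> bytes:
    """Pack `data` into a 16-byte view; long payloads are appended to `heap`."""
    n = len(data)
    if n <= INLINE_MAX:
        # Short form: 4-byte length + up to 12 content bytes, zero-padded.
        return struct.pack("<I12s", n, data)
    offset = len(heap)
    heap.extend(data)
    # Long form: 4-byte length + 4-byte prefix + 8-byte "pointer".
    # Assumption: the pointer is modelled as an offset into `heap`.
    return struct.pack("<I4sQ", n, data[:4], offset)


def decode(view: bytes, heap: bytearray) -> bytes:
    """Recover the original bytes from a 16-byte view."""
    (n,) = struct.unpack_from("<I", view)
    if n <= INLINE_MAX:
        return view[4:4 + n]
    (offset,) = struct.unpack_from("<Q", view, 8)
    return bytes(heap[offset:offset + n])


def equal(a: bytes, b: bytes, heap: bytearray) -> bool:
    """Equality check that only touches `heap` when length and prefix match."""
    # In both forms the first 8 bytes hold the length and the first
    # 4 content bytes, so most mismatches are caught right here.
    if a[:8] != b[:8]:
        return False
    (n,) = struct.unpack_from("<I", a)
    if n <= INLINE_MAX:
        return a[8:] == b[8:]
    return decode(a, heap) == decode(b, heap)


heap = bytearray()
short = encode(b"hello", heap)                  # stored in-place
long_ = encode(b"hello, german strings", heap)  # prefix + offset into heap
```

Both views are 16 bytes. Comparing `short` with `long_` is decided by the first 8 bytes alone, without reading `heap`, which is the point of keeping the prefix in-place.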

jandrewrogers|6 months ago

This general string format style has been invented many times over the decades. Unfortunately, we seem to need to relearn the tradeoffs each time.

aleph_minus_one|6 months ago

> I spent quite some time thinking it had something to do with German’s sometimes-SOV word order.

If you are referring to subordinate clauses in German: there the rule is rather "the finite verb is at the end of the subordinate clause".

yorwba|6 months ago

It also applies to infinitives and participles and the verb in nominalized noun-verb compounds. So the rule is closer to "the verb is at the end of its grammatical unit, except for the finite verb in a main clause, which appears in second position." https://en.wikipedia.org/wiki/V2_word_order

kaladin-jasnah|6 months ago

I think this is also called V2 word order.