top | item 42541287

(no title)

kardos | 1 year ago

So why are they 3.27x slower to insert? Are they 3.27x longer in string form?

discuss

It's likely a function of the fact that `gen_random_uuid()` is implemented in C [0], and is essentially just reading from `/dev/urandom`, then modifying the variant and version bits. Whereas, assuming they're using something like what was described here [1], that's a lot of function calls within Postgres, which slows it down.

As an example, this small function that makes UUIDv4:

    postgres=# CREATE OR REPLACE FUNCTION custom_uuid_v4() RETURNS uuid AS $$
        SELECT encode(set_byte(set_byte(gen_random_bytes(16), 6, (get_byte(gen_random_bytes(1), 0) & 15) | 64), 8, (get_byte(gen_random_bytes(1), 0) & 63) | 128), 'hex')::uuid;
    $$ LANGUAGE sql;

Took 14.5 seconds to create / insert 1,000,000 rows into a temp table, compared to 7.1 seconds for `gen_random_uuid()`.

[0]: https://doxygen.postgresql.org/uuid_8c.html#a6296fbc32909d10...

[1]: https://blog.daveallie.com/ulid-primary-keys/

atombender|1 year ago

I don't think that's right. They show in the section titled "Generating" that the performance of calling the ULID function from SQL is only very slightly slower. It's the INSERT that performs worse.

Generally, inserting sorted values (like sequential integers or in this case, ULIDs) into a B-tree index is much faster than inserting random values. This is because inserted values go into the same, highly packed B-tree nodes, whereas random inserts will need to create a lot of scattered B-tree nodes, resulting in more pages written. Random values are generally faster to query, but slower to insert.

In this case I think the insert speed differences may come down to the sizes of the keys. Postgres's native UUID type is 128 bits, or 16 bytes, whereas the ULID is stored as the "text" type, encoded as base32, resulting in a string that is 26 bytes, plus a 32-bit string length header, so 240 bits in total, or 1.87x longer. In the benchmark, the ULID insert is about 3x that of the UUID. So the overhead may be not just the extra space but the overhead of string comparisons compared to just comparing 128-bit ints.

Edit: The article doesn't actually say which ULID implementation they use. The one implemented in PL/PGSQL mentioned in one of the article's links [1] is very slow. The other [2] is quite fast, but doesn't use base32. However, this [3] native C extension is fast, about 15% faster than the UUID function on my machine.

On my machine, using pg-ulid, inserting 1M rows was on average 1.2x faster for UUID than ULID (mean: 963ms vs 1131ms). This is probably all I/O, and reflects the fact that the ULIDs are longer. Raw output here: https://gist.github.com/atombender/7adccb17a95056313d0e8ff56....

Edit 2: They don't have an index on the column in the article, so my comment about B-tree performance doesn't apply here.

[1] https://blog.lawrencejones.dev/ulid

[2] https://blog.daveallie.com/ulid-primary-keys/

[3] https://github.com/andrielfn/pg-ulid