> Then, there’s a random byte like 0x25 or 0x07 followed by the column data – the rest of the columns are string types so they’re all stored in UTF-8. If you know what these inter-column bytes mean, leave a comment below! I can’t figure it out.
The next paragraph mentions TOAST, and this byte is related to that. The low-order bits (on little-endian platforms) determine whether the value is stored inline (00, the first 4 bytes are the total length), stored in a TOAST table (11), or shorter than 127 bytes (01 for even length, 10 for odd length; the total length is the first byte >> 1). So for 0x25 you get 01, and the length is 0x25 >> 1 = 18: that header byte followed by the 17 bytes of "Equatorial Guinea".
Edit: the reason endianness matters is that the same representation is also used in memory, where the whole first word is interpreted as one length value. The TOAST tag bits have to be in the first byte, which on big-endian platforms is most easily done by making them the two highest-order bits of that word. That is why they sit in the two highest bits of the byte.
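The ">> 1" rule above can be sketched in a couple of lines of Python. This is only an illustration of the short 1-byte-header case as described in the comment, not Postgres's actual varlena macros:

```python
def decode_varlena_short(first_byte: int) -> int:
    # Total length (header byte included) is the header byte >> 1,
    # so the string data that follows is one byte shorter than this.
    return first_byte >> 1

total = decode_varlena_short(0x25)  # 18: 1 header byte + 17 data bytes
data_len = total - 1                # len("Equatorial Guinea") == 17
```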
I remember us once giving a supplier access to our internal bug tracker for a collaborative project. They were unable to get to the “…/openissue” endpoint.
I once worked for a company that blocked Cuban sites because of .cu (which is the Portuguese word for the end of your digestive system), but did not block porn sites (or so I was told ;-).
If anyone is interested in contrasting this with InnoDB (MySQL’s default engine), Jeremy Cole has an outstanding blog series [0] going into incredible detail.
> This process of retrieving the expected database state from the WAL is called logical decoding and Postgres stores files related to this process in here.
While logical decoding is about the WAL, it is not related to the recovery process. Logical decoding is a mechanism for converting WAL entries back into the high-level operations that caused them, for example for replication or auditing.
"in case of disk loss, we wrote the things we were going to write to disk, to the WAL. which is also on disk"
I never entirely got it. Either your WAL is on more reliable media, or it's duplicated. If it's just "easier" to write the WAL and faster to read off properly indexed state, OK, that's a local optimisation.
If your WAL is on the same filesystem behind a vendor-specific RAID controller, you're still stuffed if that RAID card dies.
Just curious if anyone else encountered this same error from the initial "docker run" command:
docker: Error response from daemon: create ./pg-data: "./pg-data" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path.
It's probably complaining about the relative path; try replacing `-v ./pg-data:/var/lib/postgresql/data` with `-v "$PWD/pg-data:/var/lib/postgresql/data"`.
When I started my dev career, NoSQL was all the rage and I remember reading about BigTable, Cassandra, Dynamo and, most importantly, LSMs. They made a big deal about how the data stored on disk was sorted. I never knew why that was such a big deal, but I kept it in mind; I never bothered to understand how it was done previously.
>Something really important about tables which isn’t obvious at first is that, even though they might have sequential primary keys, tables are not ordered.
I'mma pop up again with this, since it's not mentioned - there's a CLUSTER command that lets you reorder the table data to match an index. It's a one-off so you'll need to run it regularly from a crontab or something, but it's important to be aware of because postgres keeps a "correlation" statistic between the indexes and the order on disk. It affects the query planner, biasing it against random disk access and towards sequential disk access. It's one of the possible reasons postgres might not use an index that otherwise makes sense - the random disk access penalty is too high.
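Roughly speaking, the "correlation" statistic mentioned above is the Pearson correlation between a column's values and their physical row positions. A pure-Python sketch of the idea (not Postgres's actual implementation):

```python
# Rough sketch of the "correlation" statistic: the Pearson correlation
# between a column's values and their physical row positions.
def correlation(values):
    n = len(values)
    positions = range(n)
    mean_v = sum(values) / n
    mean_p = (n - 1) / 2  # mean of 0..n-1
    cov = sum((v - mean_v) * (p - mean_p) for p, v in enumerate(values))
    sd_v = sum((v - mean_v) ** 2 for v in values) ** 0.5
    sd_p = sum((p - mean_p) ** 2 for p in positions) ** 0.5
    return cov / (sd_v * sd_p)

# After CLUSTER, heap order matches the index, so correlation is ~1.0
# and the planner expects an index scan to read mostly sequential pages.
```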
Depends very much on how the SSDs are designed internally. I think these days we have to settle for "can never be sure" of the real page size for atomicity. Pages can also become corrupt in other ways.
It is weird that "--data-checksums" isn't the default for new databases, even if it costs a bit of performance. Integrity should be more important than performance.
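A sketch of what a page checksum buys you: a checksum computed when the 8 KiB page was written no longer matches after a torn or corrupted write. CRC-32 here is only for illustration; Postgres uses its own checksum algorithm:

```python
import zlib

PAGE_SIZE = 8192  # Postgres's default page size

def checksum(page: bytes) -> int:
    # Stand-in for the page checksum; Postgres uses its own algorithm.
    return zlib.crc32(page)

page = bytes(PAGE_SIZE)                # page as originally written (all zeros)
stored = checksum(page)                # what would be stored in the page header
torn = page[:4096] + b"\xff" * 4096    # second half never hit the disk intact
# checksum(torn) != stored, so the corruption is detected on read
```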
8k is a very common page size, but 4k isn't unheard of. Oracle's default is 4k.
The issue is that page size caps row size (for on-row storage). Also, if you have a smart clustering index, larger pages can make more efficient use of index addressing. So it's a trade-off.
Historically there was no atomicity at 4k boundaries, just at 512-byte boundaries (sectors). That would have been too limiting. Lowering the limit now would prove problematic due to the smaller row sizes / lower number of columns.
I guess it's a tradeoff. Ideally you'd want a small enough page size such that pages are unlikely to be split across multiple LBAs, but the performance wouldn't be good. Standard filesystems don't really guarantee LBA alignment anyway.
On mobile, it appears to open regardless of where you tap. This appears to be the culprit:
```
const n = document.getElementById('nav-header');
document.addEventListener(
  'click',
  s => {
    u.hidden ||
      s.target === null ||
      n === null ||
      n.contains(s.target) ||
      r();
  }
);
```
Above, in the same function, there is also the call `e.addEventListener('click', r);`, which is likely closer to what the author intended. As written, the document-level handler fires on any click anywhere on the page, which opens the nav menu when it shouldn't.
IIRC, if the original submitter edits the title once it has been posted, the edited version sticks, i.e. the filter only works the first time and you can override it if you notice it.
> Can’t we just store some data on disk and read / write from it when we need to? (Spoiler: no.)
I disagree. SQLite does a good job of uniting the 2 worlds: complex SQL queries with excellent data consistency, and simple file(s). Although SQLite is for sure not the one size fits all solution.
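A minimal sketch of that point using Python's bundled sqlite3 module. ":memory:" keeps the example self-contained; pass a file path to `connect()` to get the single-file database:

```python
import sqlite3

# Full SQL with transactional consistency over a plain file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT)")
with conn:  # transaction: commits on success, rolls back on exception
    conn.execute("INSERT INTO countries (name) VALUES (?)",
                 ("Equatorial Guinea",))
rows = conn.execute("SELECT name FROM countries").fetchall()
# rows == [("Equatorial Guinea",)]
```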
> SQLite is for sure not the one size fits all solution
Nor is Postgres. PG is surprisingly versatile. E.g. with some extensions it can be used as key-value storage (a hashtable), a document database, a time-series DB, and so on. And it works quite well, beyond "good enough" for many use cases. An added benefit, aside from having to run only one DB server, is that you can mix: part relational, part document, etc.
But the PG versions hardly ever get as good as focused, dedicated solutions. Which makes sense if you think about it: a team developing a dedicated key-value store, that does that and only that, for years, will always produce a better key-value store than one bolted onto a generic RDBMS.
A practical example was where we used the ltree extension to store ever-growing hierarchies. We needed access control over subtrees (so that the X report for John only includes the entities of John's division and lower). While it worked in PG, it turned out that "simply replacing" it with OpenLDAP, which had all this built in, made it faster, easier, and above all easier to maintain.
Thanks for the feedback, I really appreciate it :-) I've added the RSS feed to my home page now, as the other poster noted the URL is https://drew.silcock.dev/rss.xml.
dfox|1 year ago
indoordin0saur|1 year ago
MattJ100|1 year ago
In this case the substring is part of the author's name. Such names are not at all uncommon.
drewsberry|1 year ago
LVB|1 year ago
fullstop|1 year ago
forinti|1 year ago
crngefest|1 year ago
Is „tube“ on a blocklist as well?
twic|1 year ago
Brian_K_White|1 year ago
sgarland|1 year ago
[0]: https://blog.jcole.us/innodb/
westurner|1 year ago
> The Arrow columnar format includes a language-agnostic in-memory data structure specification, metadata serialization, and a protocol for serialization and generic data transport. This document is intended to provide adequate detail to create a new implementation of the columnar format without the aid of an existing implementation. We utilize Google’s Flatbuffers project for metadata serialization, so it will be necessary to refer to the project’s Flatbuffers protocol definition files while reading this document. The columnar format has some key features:
> Data adjacency for sequential access (scans)
> O(1) (constant-time) random access
> SIMD and vectorization-friendly
> Relocatable without “pointer swizzling”, allowing for true zero-copy access in shared memory
Are the major SQL file formats already SIMD optimized and zero-copy across TCP/IP?
Arrow doesn't do full or partial indexes.
Apache Arrow supports Feather and Parquet on-disk file formats. Feather is on-disk Arrow IPC, now with default LZ4 compression or optionally ZSTD.
Some databases support Parquet as the database flat file format (that a DBMS process like PostgreSQL or MySQL provides a logged, permissioned, and cached query interface with query planning to).
IIUC with Parquet it's possible both to use normal tools to offline query data tables as files on disk and also to online query tables with a persistent process with tunable parameters and optionally also centrally enforce schema and referential integrity.
From https://stackoverflow.com/questions/48083405/what-are-the-di... :
> Parquet format is designed for long-term storage, where Arrow is more intended for short term or ephemeral storage
> Parquet is more expensive to write than Feather as it features more layers of encoding and compression. Feather is unmodified raw columnar Arrow memory. We will probably add simple compression to Feather in the future.
> Due to dictionary encoding, RLE encoding, and data page compression, Parquet files will often be much smaller than Feather files
> Parquet is a standard storage format for analytics that's supported by many different systems: Spark, Hive, Impala, various AWS services, in future by BigQuery, etc. So if you are doing analytics, Parquet is a good option as a reference storage format for query by multiple systems
Those systems index Parquet. Can they also index Feather IPC, which an application might already have to journal and/or log, and checkpoint?
Edit: What are some of the DLT solutions for indexing given a consensus-controlled message spec designed for synchronization?
- cosmos/iavl: a Merkleized AVL+ tree (a balanced search tree with Merkle hashes and snapshots to prevent tampering and enable synchronization) https://github.com/cosmos/iavl/blob/master/docs/overview.md
- google/trillian has Merkle-hashed edges between rows in order in the table, but is centralized
- "EVM Query Language: SQL-Like Language for Ethereum" (2024) https://news.ycombinator.com/item?id=41124567 : [...]
dfox|1 year ago
hinkley|1 year ago
ggm|1 year ago
drewsberry|1 year ago
jmholla|1 year ago
jayde2767|1 year ago
drewsberry|1 year ago
nemothekid|1 year ago
This was very surprising to read.
Izkata|1 year ago
nraynaud|1 year ago
silvestrov|1 year ago
da_chicken|1 year ago
anarazel|1 year ago
cedws|1 year ago
thomasben|1 year ago
shreddit|1 year ago
ZoomerCretin|1 year ago
drewsberry|1 year ago
topherjaynes|1 year ago
HPsquared|1 year ago
theblazehen|1 year ago
eitland|1 year ago
unknown|1 year ago
[deleted]
mharig|1 year ago
berkes|1 year ago
ozim|1 year ago
llimllib|1 year ago
(Enjoyed the post)
drewsberry|1 year ago
anotherevan|1 year ago
unknown|1 year ago
[deleted]
younes2701111|1 year ago
[deleted]