top | item 46267287

setr | 2 months ago

I’m not clear on how you’re deviating from a normal columnar/OLAP database?

> I found that these columnar stores could also be used to create regular relational database tables.

Doesn’t every columnar store do this? Redshift, IQ, Snowflake, ClickHouse, DuckDB etc

> but it proves that it is possible to structure relational data such that query speeds can be optimal without needing separate indexing structures that have to be maintained.

Doesn’t every columnar database already prove this?

didgetmaster | 2 months ago

I am not an expert on all the other columnar stores out there, but it is my understanding that they are used almost exclusively for OLAP workloads. By 'regular database tables', I meant those that handle transaction processing (inserts, updates, deletes) along with queries.

My system does analytics well, but it is also very fast with changing data.

I also think that some of those systems (e.g. DuckDB) use indexes.

setr | 2 months ago

They’re used by OLAP workloads because columnar properties fit better — namely, storing data column-wise obviously makes row-wise operations more expensive and column-wise operations cheaper; this usually corresponds to point lookups vs aggregations. Which cascades into things like constraint maintenance being more expensive, row-level triggers becoming a psychotic pattern, etc. Column-wise (de)compression also doubles down on this.
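A toy sketch of that asymmetry (hypothetical Python, not any real engine): the same table stored row-wise and column-wise, showing why point lookups favor one layout and single-column aggregations the other.

```python
# Toy table: (id, name, amount) for 1000 rows.
rows = [(i, f"user{i}", i * 10) for i in range(1000)]

# Row-wise: one tuple per row -- a point lookup touches one object.
row_store = {r[0]: r for r in rows}

# Column-wise: one array per column -- an aggregation reads one array.
col_store = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# Point lookup: direct in the row store...
print(row_store[42])  # (42, 'user42', 420)

# ...but the column store must reassemble the row from every column array.
i = col_store["id"].index(42)
print(tuple(col_store[c][i] for c in col_store))  # (42, 'user42', 420)

# Aggregation: the column store reads one contiguous array; the row
# store has to pull the field out of every tuple.
print(sum(col_store["amount"]))               # 4995000
print(sum(r[2] for r in row_store.values()))  # 4995000
```

The compression point compounds this: a homogeneous, contiguous column array compresses far better than interleaved heterogeneous tuples, which makes column scans even cheaper and row reassembly even more expensive.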

They still do all the regular CRUD operations and maintain transactional semantics; they just naturally prefer bulk operations.

Redshift is the purest take on this I’ve seen, to the point that it simply doesn’t support most constraints or triggers, and data is allocated in 2 MB immutable chunks such that non-bulk operations undergo ridiculous amounts of write amplification and slow to a crawl. Afaik other OLAP databases are not this extreme, and support reasonable throughput on point operations (and triggers, constraints, etc) — in the sense that it’s definitely slower, but not comically slower. (Aside: Aurora is a similarly pure take on transactional workloads, such that bulk aggregations are comically slow.)

> I also think that some of those systems (e.g. DuckDB) also use indexes.

I’m pretty sure they all use indexes, in the same fashion I expect you to (I’m guessing your system doesn’t do table scans for every single query). Columnar databases just get indexes like zone maps for “free”, in the sense that they can simply be applied on top of the actual dataset without having to maintain a separate copy of the data, à la row-wise databases. So it’s an implicit index automatically generated on every column — not user-maintained or specified. I expect your system does exactly the same (because it would be unreasonable not to).

> My system does analytics well, but it is also very fast with changing data.

Talk more, please & thank you. I expect everything above to be an inherent property/outcome of the data layout, so I’m quite curious what you’ve done.