setr|2 months ago
> I found that these columnar stores could also be used to create regular relational database tables.
Doesn’t every columnar store do this? Redshift, IQ, Snowflake, ClickHouse, DuckDB, etc.
> but it proves that it is possible to structure relational data such that query speeds can be optimal without needing separate indexing structures that have to be maintained.
Doesn’t every columnar database already prove this?
didgetmaster|2 months ago
My system does analytics well, but it is also very fast with changing data.
I also think that some of those systems (e.g. Duckdb) also use indexes.
setr|2 months ago
They still do all the regular CRUD operations and maintain transactional semantics; they just naturally prefer bulk operations.
Redshift is the purest take on this I’ve seen, to the point that it simply doesn’t enforce most constraints or support triggers, and data is allocated in 1MB immutable blocks, so non-bulk operations undergo ridiculous amounts of write amplification and slow to a crawl. Afaik other OLAP databases are not this extreme and support reasonable throughput on point operations (and triggers, constraints, etc.), in the sense that it’s definitely slower, but not comically slower. (Aside: Aurora is also a pure take on transactional workloads, such that bulk aggregations are comically slow.)
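To put a rough number on the write amplification, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simplified model of my own: a single-row update touches one immutable block per column, and each touched block is rewritten whole. The function name and the example figures (column count, row size) are made up for illustration, and it ignores compression, metadata and background compaction, so treat it as an order-of-magnitude estimate rather than how any particular engine accounts for writes.

```python
def write_amplification(block_bytes: int, n_columns: int, row_bytes: int) -> float:
    """Bytes physically rewritten per byte logically changed.

    Simplified model (an assumption, not a vendor formula): a one-row
    update rewrites one whole immutable block for each column.
    """
    physical_bytes = block_bytes * n_columns
    return physical_bytes / row_bytes


MIB = 1 << 20

# Hypothetical table: 10 columns, roughly 100 bytes per row.
for block_mib in (1, 2):
    factor = write_amplification(block_mib * MIB, n_columns=10, row_bytes=100)
    print(f"{block_mib} MiB blocks: ~{factor:,.0f}x amplification per single-row update")
```

Under this model the factor shrinks in proportion to the number of rows changed per block, which is why the same layout is fine for bulk loads and painful for point updates.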
> I also think that some of those systems (e.g. Duckdb) also use indexes.
I’m pretty sure they all use indexes, in the same fashion I expect you do (I’m guessing your system doesn’t do table scans for every single query). Columnar databases just get indexes like zone maps for “free”, in the sense that they can simply be applied on top of the actual dataset without having to maintain a separate copy of the data, as row-wise databases do. So it’s an implicit index automatically generated on every column, not user-maintained or specified. I expect your system does exactly the same (because it would be unreasonable not to).
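For anyone who hasn’t met zone maps, a minimal sketch of the idea in Python. This is a toy, not how any of the systems named above implement it: the `Block`, `build_blocks` and `scan_equal` names and the block size of 100 are all made up here. Each block of a column carries its min and max, and a lookup skips any block whose range cannot contain the target.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    values: List[int]  # the column data itself, stored once
    lo: int            # zone map: smallest value in this block
    hi: int            # zone map: largest value in this block


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record min/max per block."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_equal(blocks: List[Block], target: int) -> Tuple[List[int], int]:
    """Return (matching row ids, number of blocks actually read)."""
    hits: List[int] = []
    blocks_read = 0
    row_offset = 0
    for block in blocks:
        # Only read the block if the target can fall inside its min/max range.
        if block.lo <= target <= block.hi:
            blocks_read += 1
            hits.extend(row_offset + i
                        for i, v in enumerate(block.values) if v == target)
        row_offset += len(block.values)
    return hits, blocks_read


column = list(range(1000))          # sorted data, the best case for zone maps
blocks = build_blocks(column, 100)  # 10 blocks of 100 rows
hits, blocks_read = scan_equal(blocks, 742)
print(hits, blocks_read, len(blocks))
```

The min/max pairs are tiny metadata sitting next to the data, with no second copy of the column to maintain. The catch is that they only prune well when values are clustered: on randomly ordered data most block ranges overlap the target and the scan degrades toward reading everything.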
> My system does analytics well, but it is also very fast with changing data.
Talk more, please & thank you. I expect everything above to be inherent properties/outcomes of the data layout, so I’m quite curious what you’ve done.