> A Datum now has a field for each possible type it may contain, rather than having separate interface implementations for each type. There is an additional enum field that serves as a type marker, so that when we do need to, we can inspect the type of a Datum without doing any expensive type assertions. This type uses extra memory due to having a field for each type, even though only one of them will be used at a time.
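A minimal sketch of what such a struct might look like (field and type names here are hypothetical, not CockroachDB's actual definitions):

```go
package main

import "fmt"

// DatumType is a hypothetical type marker. Only one payload field is
// live at a time, so the struct pays for every field regardless of
// which one is in use.
type DatumType int

const (
	TypeInt DatumType = iota
	TypeFloat
	TypeString
)

// Datum carries one field per supported type plus the marker, so
// inspecting the type is a cheap enum comparison rather than a Go
// type assertion.
type Datum struct {
	typ DatumType
	i   int64
	f   float64
	s   string
}

func (d Datum) String() string {
	switch d.typ {
	case TypeInt:
		return fmt.Sprintf("int:%d", d.i)
	case TypeFloat:
		return fmt.Sprintf("float:%g", d.f)
	default:
		return "string:" + d.s
	}
}

func main() {
	d := Datum{typ: TypeInt, i: 42}
	fmt.Println(d) // int:42
}
```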
> The Go templating engine allows us to write a code template that, with a bit of work, we can trick our editor into treating as a regular Go file. We have to use the templating engine because the version of Go we are currently using does not have support for generic types.
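As a rough illustration of the technique (the template text below is invented for this sketch, not CockroachDB's actual template), `text/template` can stamp out one specialized function per concrete type:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// tmplSrc is a toy code template: the {{.GoType}} placeholders are
// filled in once per concrete type, standing in for the generics
// that the Go version in use did not yet support.
const tmplSrc = `func sum{{.Name}}(xs []{{.GoType}}) {{.GoType}} {
	var s {{.GoType}}
	for _, x := range xs {
		s += x
	}
	return s
}
`

// render instantiates the template for each type and returns the
// generated Go source.
func render() string {
	tmpl := template.Must(template.New("sum").Parse(tmplSrc))
	var buf bytes.Buffer
	for _, t := range []struct{ Name, GoType string }{
		{"Int64", "int64"},
		{"Float64", "float64"},
	} {
		if err := tmpl.Execute(&buf, t); err != nil {
			panic(err)
		}
	}
	return buf.String()
}

func main() {
	fmt.Print(render())
}
```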
Go’s lack of an expressive type system continues to disappoint :(
I'm shocked that they chose a garbage-collected language to implement a database with. Golang is great for building servers, but this is a domain with potentially much greater resource constraints.
How do they deal with GC pauses? I'm sure that they have an answer, but they'd have so much more latitude if they had the facility to reason about this from the ground up. They've sort of painted themselves into a corner now.
C, C++, or Rust would have been a better match and given me more faith in their product.
For the record, the first approach was just used for simplicity in the blog post. The production system works differently (and doesn't have to waste memory per-datum) - we have typed containers, each of which has a primitive slice within.
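A toy sketch of that layout, with hypothetical names: one container type per SQL type, each wrapping a primitive slice, so hot loops run over contiguous memory with no per-element type assertions:

```go
package main

import "fmt"

// Vec is a hypothetical column container interface; each concrete
// implementation wraps a primitive slice, so a batch of N values
// costs one slice rather than N boxed datums.
type Vec interface {
	Len() int
}

type Int64Vec struct{ vals []int64 }
type BoolVec struct{ vals []bool }

func (v *Int64Vec) Len() int { return len(v.vals) }
func (v *BoolVec) Len() int  { return len(v.vals) }

// Sum works directly on the primitive slice: the type is resolved
// once, outside the loop, not once per element.
func (v *Int64Vec) Sum() int64 {
	var s int64
	for _, x := range v.vals {
		s += x
	}
	return s
}

func main() {
	col := &Int64Vec{vals: []int64{1, 2, 3, 4}}
	fmt.Println(col.Len(), col.Sum()) // 4 10
}
```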
This is such a great post! Soooo many databases have implemented some kind of "column store extension" where they change the on-disk representation and declare "look, now we're a hybrid database". But the academic literature suggests that the bigger win for column stores may be in the execution layer. I love that Cockroach has taken the opposite of the usual approach and been so rigorous in measuring everything.
After doing all this work, is it your opinion that (hypothetically) a columnar on-disk representation would be a bigger or smaller win than a fully built-out vectorized execution engine?
This is a complicated discussion because the relative performance is greatly impacted by design and implementation details. Broadly speaking, the execution engine and page representation designs are tightly coupled for performance optimization purposes. The kinds of optimizations outlined at the link will have a high return for most kinds of databases.
The term "columnar" covers a diverse set of architectures with very different operational characteristics. For OLTP (like CockroachDB), using a classic DSM-style columnar representation is going to offer poor write performance no matter what you do with the execution engine. On the other hand, if you are using one of the newer vectorized page representations (VSM) that are popular for mixed workloads, which are quasi-columnar but not DSM (nor one of the intra-page DSM hybrids like PAX), the loss of write performance may be minimal versus a classic row store (NSM) but with much faster query processing (faster than DSM for some types of queries). The execution engine design you would attach to any of these models is pretty different.
Note also that a practical limit on this kind of optimization is code complexity. While a VSM-style on-disk representation and matching execution engine sounds like a nearly optimal hybrid of NSM for write performance and DSM for query performance, an implementation that is general purpose and performs well across a diverse set of data models is massively more difficult and complex to build in practice, so most database designers avoid it at all costs due to the engineering overhead. These designs tend to be more common when the set of supported data models is very limited at design time. It is a research area that still offers a lot of opportunity -- the literature mostly ignores parts of the productive design space that intrinsically have extremely high implementation complexity (too difficult to produce code in support of paper publication).
Thanks! I do think that a columnar on-disk representation would likely help with large analytical queries. But I would imagine it would negatively affect the performance of point transactions in an OLTP workload. Note that the vectorized engine we built will only be used for a given query if our SQL planner estimates that a large number of rows will be read.
> "data needs to be read from disk in its original row-oriented format before processing, which will account for the lion’s share of the query’s execution latency."
IO is still a bottleneck without column-stores. The selectivity, compression and encoding alone can generate massive speedups because there's less data to process, and less to move through the execution pipeline.
But batch/vectorized processing on row-stores is gaining adoption. MemSQL and SQL Server also use similar techniques.
It's hard to say whether one would generally be a bigger gain than the other without experimenting. We chose to focus on just execution because it was a relatively smaller project (with fewer side effects) that still promised a large performance improvement. In the future, we might consider keeping data in columnar format on learner replicas to offer even better performance for users that would like to run OLAP-style queries on slightly stale data, but this would be a larger project.
On another note: Kdb has figured out how to optimize the heck out of columnar data stores and vector processing. They would be my benchmark for these types of optimizations.
Does Go's compiler emit SIMD instructions? It'd be cool to see the disassembly for the final, column-order version, since that loop would vectorize well. The same question goes for the speed-of-light benchmark -- there may be room to push even further there if it's doing a single multiply at a time.
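For reference, the kind of loop in question might look like the sketch below (hypothetical code, not the blog post's). Historically the gc compiler does not auto-vectorize such loops, so inspecting it with `go tool objdump` would likely show one scalar multiply per iteration; SIMD in Go has typically required hand-written assembly.

```go
package main

import "fmt"

// mulConst is a constant multiply over a contiguous int64 column,
// the sort of dense inner loop that would map well onto SIMD.
func mulConst(out, a []int64, c int64) {
	_ = out[len(a)-1] // hint to help the compiler eliminate bounds checks
	for i := range a {
		out[i] = a[i] * c
	}
}

func main() {
	a := []int64{1, 2, 3}
	out := make([]int64, len(a))
	mulConst(out, a, 2)
	fmt.Println(out) // [2 4 6]
}
```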
CockroachDB does target OLTP workloads. Note that the vectorized SQL engine covered in this blog post is execution-only (and used only for queries that operate on many rows). The storage layer remains row-oriented so the write performance is not affected. Rows are columnarized before processing by the execution engine.
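A simplified sketch of that columnarization step (types and names are invented for illustration): rows coming off the row-oriented storage layer are transposed into per-column slices before the vectorized operators see them.

```go
package main

import "fmt"

// Row is a row-oriented record; Batch is its columnar counterpart.
type Row struct {
	ID    int64
	Price float64
}

type Batch struct {
	IDs    []int64
	Prices []float64
}

// columnarize transposes a slice of rows into one slice per column,
// so downstream operators can loop over contiguous typed data.
func columnarize(rows []Row) Batch {
	b := Batch{
		IDs:    make([]int64, len(rows)),
		Prices: make([]float64, len(rows)),
	}
	for i, r := range rows {
		b.IDs[i] = r.ID
		b.Prices[i] = r.Price
	}
	return b
}

func main() {
	b := columnarize([]Row{{1, 9.5}, {2, 3.25}})
	fmt.Println(b.IDs, b.Prices) // [1 2] [9.5 3.25]
}
```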
A question I've never found a good answer to for columnar query engines is how to handle indexing.
For queries with very high selectivity it seems like any gains from increased cache-friendliness or SIMD would be erased by still needing to make random forward strides through the data.
The only solution I can really think of is to make the index point to a block of N elements and then take advantage of vector processing on the matching blocks. Have you run into this issue?
I haven't been able to find any examples of column SQL engine discussions that don't require whole-table scans.
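The block-pointing idea above can be sketched like this (all names hypothetical): the index yields candidate block IDs, and only those blocks are scanned, each with a dense inner loop over contiguous memory.

```go
package main

import "fmt"

const blockSize = 4

// scanBlocks scans only the blocks the index flagged as possibly
// containing key, returning the matching element offsets. The inner
// loop per block is dense and contiguous, so it stays amenable to
// SIMD even though the set of blocks is sparse.
func scanBlocks(col []int64, blocks []int, key int64) []int {
	var hits []int
	for _, b := range blocks {
		start := b * blockSize
		end := start + blockSize
		if end > len(col) {
			end = len(col)
		}
		for i := start; i < end; i++ {
			if col[i] == key {
				hits = append(hits, i)
			}
		}
	}
	return hits
}

func main() {
	col := []int64{5, 1, 7, 7, 9, 9, 9, 2, 7, 0, 7, 1}
	// Suppose the index says key 7 may live in blocks 0 and 2.
	fmt.Println(scanBlocks(col, []int{0, 2}, 7)) // [2 3 8 10]
}
```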
One interesting solution is in Lucene — which isn't perhaps classically columnar, but something of a hybrid — which uses skip lists to allow random skipping through compressed index ("posting list") data.
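A rough sketch of the skipping idea (much simplified; Lucene's actual posting lists are compressed and use multi-level skip data): jump ahead in coarse strides while the skip point is still below the target, then scan linearly only within the final stretch.

```go
package main

import "fmt"

// advance returns the index of the first posting >= target (or
// len(postings) if none), skipping ahead skipEvery entries at a
// time before falling back to a short linear scan.
func advance(postings []int, skipEvery, target int) int {
	i := 0
	// Skip phase: coarse jumps while the skip point is below target.
	for i+skipEvery < len(postings) && postings[i+skipEvery] < target {
		i += skipEvery
	}
	// Linear phase within the final stretch.
	for i < len(postings) && postings[i] < target {
		i++
	}
	return i
}

func main() {
	postings := []int{2, 5, 9, 14, 20, 31, 44, 58, 73}
	fmt.Println(advance(postings, 4, 30)) // 5 (postings[5] == 31)
}
```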
Why is this sub-optimal? People may not like Cockroaches but they definitely have a reputation for durability, which isn't a bad mental link for a database company. It also provides an identity & personality which is so often missing from modern companies. Maybe they've found they get more from those who really like it than those who really hate it (or maybe they just like it themselves).