dobe's comments

dobe | 11 years ago | on: Crate: Distributed SQL Database for the Age of Docker
dobe | 11 years ago | on: Powerstrip: prototype Docker extensions today
dobe | 12 years ago | on: Crate.io – Big Data SQL in real time
- We come from the services business, and we found that nearly every database design for applications that needed to scale eventually reached a point where data had to be de-normalized, because joins were simply too expensive in cost and latency once the data no longer fit on a single affordable machine. That is why we do not have join support yet. We do plan to allow joins in the future (they still make sense for smaller datasets, of course), but it is currently not a top priority, since many join use cases can also be covered with the nested objects we support.
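As an illustration of the nested-object approach, here is a hypothetical schema sketch in Crate's SQL dialect (table and column names are made up, and the exact type syntax may differ between versions):

```sql
-- Hypothetical example: embed order lines directly in the order row
-- instead of joining against a separate order_lines table.
CREATE TABLE orders (
    id string PRIMARY KEY,
    customer string,
    -- nested objects take the place of the joined table
    lines ARRAY(OBJECT AS (
        sku string,
        qty integer,
        price double
    ))
);

-- Query into the nested structure, no join needed:
SELECT customer, lines['sku'] FROM orders;
```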
- We chose SQL as the query language because it lets us re-use existing ORMs and tools. But above all, SQL is still a great language for defining queries, so we thought: "why re-invent the wheel and create yet another query syntax?"
- Regarding sharding: we use a hash/modulo-based sharding mechanism - in fact the same as Elasticsearch, since we use Elasticsearch under the hood for cluster state, sharding and replication. We have also added partitioned table support in our current development branch.
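A hypothetical sketch of what a sharded, partitioned table definition could look like (the table and its columns are invented for illustration; consult the docs for the exact syntax in your version):

```sql
-- Hypothetical example: hash-distributed across 6 shards,
-- with one partition created per value of "day".
CREATE TABLE metrics (
    ts timestamp,
    day timestamp,
    value double
) CLUSTERED INTO 6 SHARDS
  PARTITIONED BY (day);
```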
There are still a lot of features on our roadmap, and apparently also a lot of things we need to document and explain. So if you are interested in our progress, keep an eye on our GitHub project page: https://github.com/crate/crate
thx, bernd
There are two options:
1. (Recommended for now) Export the table with COPY TO (https://crate.io/docs/stable/sql/reference/copy_to.html). Drop the table, then import it again using COPY FROM (https://crate.io/docs/stable/sql/reference/copy_from.html).
2. Use insert by query (see https://crate.io/docs/stable/sql/dml.html#inserting-data-by-... ) if it is OK for you to copy all the data into another table (with more shards).
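Both options can be sketched roughly as follows; the table name, shard count, and export path are hypothetical, and the elided column list would need to be filled in with the real schema:

```sql
-- Option 1 (sketch): export, drop, re-create with more shards, re-import.
COPY mytable TO DIRECTORY '/tmp/mytable_export';
DROP TABLE mytable;
CREATE TABLE mytable (...) CLUSTERED INTO 12 SHARDS;  -- real column list goes here
COPY mytable FROM '/tmp/mytable_export/*';

-- Option 2 (sketch): insert by query into a new table with more shards.
CREATE TABLE mytable_new (...) CLUSTERED INTO 12 SHARDS;
INSERT INTO mytable_new (SELECT * FROM mytable);
```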
1) is recommended, since it allows throttling at import time (see https://crate.io/docs/en/latest/best_practice/data_import.ht...) and also does not require renaming a table, which is currently not implemented but is on our backlog. However, I think once ES 2.0 is out we will have table renames and also throttling in insert by query, so option 2) will be the recommended one then.
Our general recommendation for the fixed-number-of-shards limitation is to choose a higher number of shards upfront (the number of expected cores covers most use cases), or, where possible, to use partitioned tables (https://crate.io/docs/en/latest/sql/partitioned_tables.html), since those allow changing the number of shards for future partitions.
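Assuming the shard count of a partitioned table can be altered after creation (the table name and setting here are illustrative; check the reference for the exact statement your version supports), that would look something like:

```sql
-- Hypothetical: raise the shard count for partitions created from now on.
-- Existing partitions would keep their original number of shards.
ALTER TABLE metrics SET (number_of_shards = 12);
```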