dobe's comments

dobe | 11 years ago | on: Crate: Distributed SQL Database for the Age of Docker

I guess you are talking about the plan to keep the data of one shard on a single disk only (see https://github.com/elastic/elasticsearch/issues/9498)? This does not necessarily mean that you will end up having only one shard per data path; that only happens if you have just one shard per node. But you are right, the change might lead to unbalanced disk usage in some scenarios, and increasing the number of shards would solve the problem there.

There are two options:

1. (Recommended for now) Export the table with COPY TO (https://crate.io/docs/stable/sql/reference/copy_to.html). Drop the table and then import it again using COPY FROM (https://crate.io/docs/stable/sql/reference/copy_from.html).

2. Use insert by query (see https://crate.io/docs/stable/sql/dml.html#inserting-data-by-... ) if it is OK for you to copy all the data into another table (with more shards).

Option 1) is recommended for now, since it allows throttling at import time (see https://crate.io/docs/en/latest/best_practice/data_import.ht...) and also does not require renaming a table, which is currently not implemented but is on our backlog. However, I think once ES 2.0 is out we will have table renames and also throttling for insert by query, so option 2) will be the recommended one then.
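A rough sketch of both options in SQL. The table name, schema, export path, and shard counts below are made up for illustration, not taken from the thread:

```sql
-- Option 1: export, drop, re-create with more shards, re-import.
COPY mytable TO DIRECTORY '/tmp/export/';
DROP TABLE mytable;
CREATE TABLE mytable (
    id integer,
    name string
) CLUSTERED INTO 12 SHARDS;
COPY mytable FROM '/tmp/export/*.json';

-- Option 2: insert by query into a new table that has more shards.
CREATE TABLE mytable_new (
    id integer,
    name string
) CLUSTERED INTO 12 SHARDS;
INSERT INTO mytable_new (id, name) (SELECT id, name FROM mytable);
```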

Our general recommendation for the fixed-number-of-shards limitation is to choose a higher number of shards upfront (a shard count matching the expected number of cores covers most use cases) or to use partitioned tables (https://crate.io/docs/en/latest/sql/partitioned_tables.html) where possible, since those allow changing the number of shards for future partitions.
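As a sketch of that recommendation (the table, columns, and shard counts here are hypothetical examples, not from the thread):

```sql
-- A partitioned table: each distinct value of "day" becomes its own
-- partition, and every partition is itself split into shards.
CREATE TABLE measurements (
    day timestamp,
    value double
) CLUSTERED INTO 6 SHARDS PARTITIONED BY (day);

-- Existing partitions keep their shard count, but partitions created
-- from now on pick up the new setting.
ALTER TABLE measurements SET (number_of_shards = 12);
```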

dobe | 12 years ago | on: Crate.io – Big Data SQL in real time

Hi, this is Bernd from Crate ... first of all, we are overwhelmed here at Crate that we are mentioned on Hacker News! Reading through the comments on this page, I thought it would make sense to give you some background, which should at least partly answer some of the questions raised here.

- We come from the services business and discovered that nearly every database design for applications that needed to scale eventually reached a point where data had to be de-normalized, because joins were simply too expensive in cost and latency once the data no longer fit on a single affordable machine. Therefore we do not have join support yet. We do plan to allow joins in the future (they still make sense for smaller datasets, of course), but it is currently not a top priority, since many join use cases can also be implemented with nested objects, which we support.

- We have chosen SQL as the query language, since this allows us to re-use existing ORMs and tools. But most of all, SQL is still a great language for defining queries, so we thought: why re-invent the wheel and create yet another query syntax?

- Regarding sharding: we use a hash/modulo based sharding mechanism, actually the same one as Elasticsearch, since we use Elasticsearch under the hood for cluster state, sharding and replication. We also added partitioned table support in our current development branch.
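To make the hash/modulo scheme concrete, here is a sketch (the table and column names are invented for illustration): each row's routing column is hashed, and the shard is picked as hash modulo the shard count, so the shard count is fixed at table-creation time.

```sql
-- CLUSTERED BY picks the routing column; rows land on shard
-- hash(id) % 8, which is why the number of shards cannot change
-- afterwards without redistributing the data.
CREATE TABLE users (
    id integer PRIMARY KEY,
    name string
) CLUSTERED BY (id) INTO 8 SHARDS;
```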

There are still a lot of features on our roadmap, and apparently also a lot of things we need to document and explain better. So if you are interested in our progress, you might keep an eye on our GitHub project page: https://github.com/crate/crate

thx, bernd
