Ex-Facebookers launch MemSQL (YC W11) to make your database fly

[+] Shish2k|13 years ago|reply

I've seen a lot of these new SQL databases, and they all seem to compare themselves to MySQL. That seems like a particularly easy target, since MySQL's main strength is speed, and it's a fundamentally old-school (ie slow) design which has sacrificed everything else to get half-fast. What I'd like to know is how any of these compare in features and reliability to the featureful, reliable databases like Oracle and Postgres.

(For specific examples that I've run into recently: does it have geo-spatial extensions, and does it crash and burn and corrupt all its data if a malloc() fails)

[+] chris_wot|13 years ago|reply

It still uses a transaction log, but seems to make snapshots of the logs at a period of time to create compressed snapshots of the data itself.

You can turn off durability, but then you risk losing everything.

For features... forget about geo-spatial extensions - they can only do read-committed isolation level... but as they only allow for one SQL statement per transaction I suppose this might not be such a big concern.

If you were using this for something like a session management database, or something where durability is not so important then it's probably fine. Not sure I'd use it for anything that relies heavily on transaction management features.

I see other limitations, like it cannot support CHANGE COLUMN, and it can't do joins on more than 2 tables. Actually, it doesn't look like it can do FULL OUTER or RIGHT OUTER joins, either. :(

[+] tintor|13 years ago|reply

"and does it crash and burn and corrupt all its data if a malloc() fails"

MemSQL is fully durable (with option of sync and async durability, similar to sync and async replication) with its transactional log. If MemSQL process is killed it will recover all commited transactions on the next startup. Durability has very little performance cost on throughput as MemSQL uses only sequential I/O. And of course, durability can be completely turned off for maximum performance.

[+] vessenes|13 years ago|reply

This company was probably an extremely easy investment decision:

1) Facebookers: Check

2) Data Scaling experience: Check

3) In-Memory with SQL semantics: Check

4) New-York based software that can be sold to quant funds: Check

I'm excited. It's downloading now.

[+] moe|13 years ago|reply

Is it just me or does this entire product smell like it was designed for the sole purpose of extracting money from less than tech-savvy investors and clueless institutions?

The bullshit-bingo-lingo on their homepage is mindnumbing.

Meanwhile their actual software seems rather underwhelming, bordering on SnakeOil.

[+] ericfrenkiel|13 years ago|reply

Thanks! We're super-excited to have the opportunity to build this. When we started we asked ourselves - what would a database look like if it were designed today?

We see a future where OLTP databases live in memory, and where you have hundreds of terabytes of memory and hundreds to thousands of CPUs at your disposal.

We built MemSQL to make it easy to go directly into memory and get the speed improvements we all need. What we're releasing aims to fulfills that vision.

[+] vessenes|13 years ago|reply

p.s. Eric, two annoyances about your site: once I've been dumped on to the developers subsite, it's non-trivial to get back to the main company homepage.

Also, pricing. Come on, man! Pricing!!

[+] moe|13 years ago|reply

Is this a joke?

Data must fit in RAM, no joins over >2 tables, no transactions, no builtin support for clustering/sharding/horizontal scaling whatsoever.

What is the advantage over memory-tables, MySQL on a ramdisk or something purpose-built (redis)?

[+] chris_wot|13 years ago|reply

That's a little harsh I think. It still has a transaction log, so it has certain advantages over memory tables. Some interesting enhancements to indexes also. And it is also binary compatible with MySQL, so you can actually add it as a MySQL node.

I'd say it's a database for a niche market, as one of the folks from MemSQL has been saying. It would fit in nicely under a few scenarios.

[+] AaronBBrown|13 years ago|reply

MEMORY tables have table-level locks and from a concurrency perspective, is just as bad as MyISAM, so really offer no advantage unless there is only a small number of consumers. A well tuned InnoDB table will outperform the MEMORY engine )(after the warm-up period) as soon as you have more than a handful of users performing write operations.

[+] tigerBL00D|13 years ago|reply

Advantages are clear. Blazing speed, transactional durability, ease of use and flexibility that come from using a well known protocol and SQL dialect. Did I miss something?

[+] old-gregg|13 years ago|reply

On the other hand, if your data fits in RAM and you don't need to join more than two tables and you don't need clustering/sharding whatsoever and you need the full power of SQL, what would you use?

> ..something purpose-built (redis)?

Errr maybe "features" like GROUP BY with HAVING? I think you're trolling, Mr Moe.

[+] snissn|13 years ago|reply

MemSQL does not yet support JOINs on more than 2 tables. [1]

1. http://developers.memsql.com/docs/1b/sql/join.html

[+] chris_wot|13 years ago|reply

Nor transactions with more than one statement. And only read committed isolation level. [1]

1. http://developers.memsql.com/docs/1b/isolationlevel.html

[+] alexro|13 years ago|reply

How it compares to VoltDB, which used similar approach and already "combines the proven power of relational processing with blazing speed, linear scalability and uncompromising fault tolerance" (according to VoldDB website) ?

[+] nikita|13 years ago|reply

There are few differences with VoltDB. 1. It's much easier to use (MySQL compatibility, ad-hoc sql no java dependency) 2. It doesn't have any issues with data skews. We are using lockfree skiplists vs b-trees partitioned by core. 3. I believe we are faster on a single box, however this claim should obviously be verified by a third party.

[+] spitfire|13 years ago|reply

There's another in memory mysql replacement whose name escapes me right now. How are they doing?

Ycombinator seems to have backed a few of these next-gen databases. It will be interesting to see who wins.

EDIT: Rethinkdb. But it looks liked they've dumped their mysql engine and gone to a pure key-value store.

[+] JonM|13 years ago|reply

Xeround looks really interesting (http://xeround.com/). Seems to do everything in RAM and takes care of any replication / backup & scailing for you. Looks like it would get expensive with large amounts of data though... anyone using this in production?

[+] sargun|13 years ago|reply

What do you think of Riak (basho.com)?

[+] zwischenzug|13 years ago|reply

I'm obviously missing something. Is SQL really the bottleneck for SQL performance? I'm not sure how much compiling down to C++ will help.

Is there more to this than the article suggests?

[+] chris_wot|13 years ago|reply

Most times that the press mention SQL, they actually mean "Relational Database Management System". But in this case, MemSQL seems to be doing two things: reducing locking semantics (the only transaction isolation level is read committed), and doing something or other to speed up query compilation.

Frankly, I can't see how their query compliation is much better than anything Oracle, SQL Server or PG does already. (see here: http://developers.memsql.com/docs/1b/codegen.html) Perhaps there is no plan cache in MySQL, or the plan cache is inadequate?

Incidentally, you don't want to use it if you need to do multi-statement transactions. Can't be done. Check out the documentation here:

http://developers.memsql.com/docs/1b/isolationlevel.html

It has a few extra tricks up its sleeve to enhance indexes in MySQL, but again nothing really particularly spectacular. http://developers.memsql.com/docs/1b/indexes.html

More interesting for me, is that it can be hosted on EC2, which is pretty neat!

[+] MattRogish|13 years ago|reply

Most DBMS products have "Stored Procedures" which, essentially, are compiled chunks of SQL that you can call by name, e.g.

CREATE PROCEDURE foobar AS SELECT * FROM users

On very large queries (many table joins) there can be a noticeable performance hit while the DBMS figures out the optimal query plans. Stored procedure compile and store this plan after first execution (some pre-compile) so the DBMS can just look it up in the cache.

It also removes the need to do runtime parsing (compiled vs. interpreted languages). There are more nuances to Stored Procedures than I can put here, but here's a link to a more in-depth analysis I wrote: http://stackoverflow.com/questions/59880/are-stored-procedur...

In general, MySQL has limited query plan cache (last I read, it was one cache per database connection and was wiped out when the client disconnected), so this is definitely a win. However, since every other DBMS out there already has robust query cache, I don't think the Postgres/Oracle/MS SQL etc. folks are losing any sleep.

[+] al_james|13 years ago|reply

I think its that its an in memory database, optimised for memory as opposed to disk. The SQL to C++ seems a bit of a confusion to me, I guess it means it has a JIT SQL compiler to optimise the query to native opcodes. Meh. But the in memory and mysql wire compatibility are big wins.

[+] imfine|13 years ago|reply

I am not sure this solves anything. From their implementation details I could just mount mysql on a memory filesystem and have 100% of MySQL's feature. If I need near 100% assurance that the data will be saved, there is always block replication, transaction logs and regular replication. their transition module adds nearly nothing but overhead, since stored procedures are better, and if you have the need for ultra performance, there is always the ability to access mysql at the command level and bypass sql entirely.

[+] nikita|13 years ago|reply

There are many bottlenecks and it really depends on the workload. Sometimes it's IO, sometimes it's CPU. When you have a throughput problem, yes, it's the SQL.

[+] continuations|13 years ago|reply

Why should I use this over MySQL Cluster?

MySQL Cluster is lock-free, distributed, HA, and has better performances:

http://mikaelronstrom.blogspot.com/2012/05/mysql-cluster-72-...

http://mikaelronstrom.blogspot.com/2012/05/mysql-cluster-727...

And MySQL Cluster has been used in mission critical apps for years.

What does MemSQL give me that MySQL Cluster doesn't?

[+] chris_wot|13 years ago|reply

Question - how well does the query parser handle stale queries? SQL Server for the longest time had issues with plan stability in that the plan became too stable. When the data distribution changed dramatically, the queries didn't age out of the cache and the queries would do such things as use the wrong index, or not work out the correct cardinality of a table and then use an index where really a full table scan would have been better... and so on.

How does the database handle this sort of thing?

[+] nikita|13 years ago|reply

Right now you can handle this with query hints. The plans are stable, but they are attached to the query text. When you throw a hint in there memsql generates a new plan since the query text changes.

[+] acdha|13 years ago|reply

It looks like this is a fork of MySQL. Did they actually pay Oracle for a proprietary license or is there a source release hidden on the website?

[+] ericfrenkiel|13 years ago|reply

yes, we have commercial licensing agreements in place.

[+] al_james|13 years ago|reply

Looks useful. Drop in mysql compatibility is a huge plus for many projects that are tied to mysql (for whatever reasons).

Does anyone know if there are any plans for having MongoDB style replica sets (e.g. sharded and replicated databases in a cluster)?

Also, it would be great if it supported the native mysql replication, so you could have MemSQL replicas of a master mysql DB.

[+] ankrgyl|13 years ago|reply

Sharded/replicated MemSQL is in the works. And yep, the replication will be compatible with MySQL replication to enable grouping MemSQL and MySQL nodes.

[+] factorialboy|13 years ago|reply

Looks promising. Love how it inter-operates with MySQL driven apps.

I should be able to try it for one of my apps over this weekend!

[+] ozgune|13 years ago|reply

Congrats guys on the launch! Looking forward to giving it a spin.

[+] prezjordan|13 years ago|reply

Looks like they're using jQuery Knobs: http://anthonyterrien.com/knob/

[+] nikita|13 years ago|reply

Yes, for the workload simulator :)

[+] alexrson|13 years ago|reply

Congratulations, Eric!

[+] Produce|13 years ago|reply

Why not just use objects in your favourite shitty OOP language as usual and take regular snapshots of the data in memory and save them to disk as rollback points instead? Bam! Nothing new to implement in the application. Bam! Lightening fast access and write times. Bam! Data persistence.

Bam!

[+] ericmoritz|13 years ago|reply

Wow, HN is harsh. This is probably a really solid product that solves a specific problem and you all nit-pick at it's trade-offs and poo-poo it.

[+] MrMan|13 years ago|reply

HN is not harsh. HN is one of the main marketing channels for YC-funded projects such as this.

104 comments