
Aerospike goes Open Source

123 points| ZenoArrow | 11 years ago |aerospike.com | reply

84 comments

[+] ZenoArrow|11 years ago|reply
Worth pointing out that, unlike some NoSQL engines, Aerospike has its own query language (AQL) that is syntactically similar to SQL... https://docs.aerospike.com/pages/viewpage.action?pageId=3807...

From the AQL query documentation you have... SELECT name, age FROM users.profiles WHERE age BETWEEN 20 AND 29 ...which is pretty easy to understand.
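For anyone who doesn't read SQL, here is a rough sketch in plain Python of what that query asks for. It is only an illustration: the `profiles` records and the `select_name_age` helper are made up, it does not use the Aerospike client, and it assumes BETWEEN includes both endpoints, as it does in SQL.

```python
# Hypothetical in-memory stand-in for the users.profiles set; the field
# names mirror the query above, the records themselves are made up.
profiles = [
    {"name": "alice", "age": 19, "city": "Oslo"},
    {"name": "bob", "age": 20, "city": "Lima"},
    {"name": "carol", "age": 29, "city": "Pune"},
    {"name": "dave", "age": 30, "city": "Kyiv"},
]


def select_name_age(records, low, high):
    """Return name and age for records whose age lies in [low, high]."""
    return [
        {"name": r["name"], "age": r["age"]}
        for r in records
        if low <= r["age"] <= high  # assumed inclusive at both ends, as in SQL
    ]


result = select_name_age(profiles, 20, 29)
print(result)
```

Only the two selected fields come back, and only for rows inside the age range.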

There's also a Python client (Apache licensed)... https://github.com/aerospike/aerospike-client-python

[+] yeukhon|11 years ago|reply
The AGPL license caught my eye. Does anyone take this license into account (deciding between an Apache/MIT/BSD DB and an AGPL DB) when they use it in their service / software stack?

For example, this OpenStack thread always makes me wary of using the AGPL when I am developing a solution. http://lists.openstack.org/pipermail/openstack-dev/2014-Marc...

And here is MongoDB's FAQ explaining the AGPL in plain English: http://blog.mongodb.org/post/103832439/the-agpl

[+] throwaway6829|11 years ago|reply
Where I work[1], AGPL software is strictly and unconditionally forbidden to use for anything, even things that are completely internal and will never see a public user.

The fear that our lawyers have is that, since putting up the software in a service counts as a derived work, our whole software stack (including the stuff we don't open source) will have to be opened along with it. There have to be clear service boundaries between the AGPL software and the stuff we write ourselves, and the lawyers don't trust us to write in appropriate boundaries.

It's really kinda tragic, because we actually do submit source code upstream when we make changes to open source software that we run internally. As in, if it's an OSS product that we just use for some dumb internal automation thing, we'll submit patches if the license is BSD or MIT, but as soon as GPL (especially AGPL) hits anything suddenly the lawyers get paranoid because of what constitutes a "derived work", which can be interpreted as anything that links against the software to make a complete product.

The upshot of this is, if the OSS software is under an unrestrictive license like BSD or Apache, we contribute upstream. If it's GPL or especially AGPL, we simply don't touch it, ever.

[1] A very, very well known technology company.

[+] ZenoArrow|11 years ago|reply
I can't see it being an issue. As mentioned in the useful MongoDB link you shared, the licence will require sharing only when modifying the database code, but not require automatic sharing of the rest of the software stack.

Have there been any court cases involving AGPL violations? I wonder if some of Gil Yehuda's fears are partly due to a lack of clarity on where the reach of the AGPL ends. For example, on the claim that the MongoDB drivers 'violate the AGPL license', I'd prefer to see a response from GNU.

[+] seiji|11 years ago|reply
AGPL is essentially the "corporate coward" license. They want to capture all of your private changes to basically get free (legally mandated, zero "community good will") development resources. Companies think it's "safer" than BSD or straight up GPL because lawyers feel "omgz, source codes, zero cost IP copies, instant competition!"

The next step after AGPL will probably be BrainGPL requiring you to publish all thoughts you have about any code you look at ever.

[+] remon|11 years ago|reply
"A database literally ten times faster than existing NoSQL solutions, and one hundred times faster than existing SQL solution".

The odds of that claim being true and/or supported by benchmarks are somewhere well below the 1% mark. Why do companies keep making those sorts of obviously questionable claims, knowing the negative backlash that will surely follow? Boggles the mind really.

[+] ZenoArrow|11 years ago|reply
It's likely from the marketing team, not the engineering team. That doesn't mean the product itself cannot be solid.
[+] bbulkow|11 years ago|reply
CTO, co-founder, initial coder here.

Aerospike really is a lot faster than Mongo and Cassandra. It's open source, and you can run whatever benchmarks you'd like yourself. It's about as fast as a well-tuned multi-core sharded Redis system, except you don't have to write or configure the sharding, and you can have a combination of RAM and Flash, with different data in each. Of course Flash is cheaper and slower, but that's why we give you both.

You can run a single c3.8xlarge on Amazon and see 1M TPS, or 250K on a c3.2xlarge. We're doing a lot of benchmarks on EC2 and GCE because they're "reference platforms" that you'll all believe. More details in the coming weeks from us, or publish your own.
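As a back-of-the-envelope check on those two figures, the sketch below divides each by the instance's vCPU count. The throughput numbers are the ones quoted above; the vCPU counts (32 for c3.8xlarge, 8 for c3.2xlarge) are an assumption taken from AWS's published C3 specs, not something measured here.

```python
# Throughput figures are the ones quoted above; the vCPU counts (32 and 8)
# are an assumption taken from AWS's published C3 instance specs.
instances = {
    "c3.8xlarge": {"tps": 1_000_000, "vcpus": 32},
    "c3.2xlarge": {"tps": 250_000, "vcpus": 8},
}

# Divide claimed throughput by vCPU count for each instance type.
tps_per_vcpu = {
    name: spec["tps"] / spec["vcpus"] for name, spec in instances.items()
}

for name, value in tps_per_vcpu.items():
    print(f"{name}: {value:,.0f} TPS per vCPU")
```

Under that assumption both instance sizes work out to the same per-vCPU rate, so the two claims are at least consistent with linear scaling across cores.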

Just try it yourself; this isn't marketing.

Everyone I talk to coming from Cassandra is seeing a server reduction of 4x~5x, with higher levels of stability (overhead for peaks). I was at a conference late last week and the company I was with (Adform's founder, Jakob) said they had a major Cassandra outage that week that cost them a lot of money, and Adform is a Cassandra contributor and knows what they're doing.

Same thing with Mongo shops. They do about 5x reduction and see much higher performance.

Technical points of why we're faster:

* Coded in C, multithreaded, with reference counting.

* Avoid malloc, but if you have to malloc, avoid the libc memory allocator. We do a lot of slab allocation (a la memcached) and use jemalloc for variable-sized allocs.

* Use epoll directly and be careful about IO. Don't use mmap, which is 4x slower than read and write.

* Code directly to device, with your own data layout. Databases are a reliability layer, everything else is extra complexity. O_SYNC is better than fsync.
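To make the slab allocation point concrete, here is a minimal sketch of the idea: one buffer allocated up front, carved into fixed-size slots, with a free list so alloc and free are O(1) and never touch the general-purpose allocator. This is not Aerospike's code; the `SlabAllocator` class and its method names are hypothetical, and Python is used only to keep the illustration short (the real thing is C).

```python
class SlabAllocator:
    """Fixed-size slot allocator over one preallocated buffer (illustrative)."""

    def __init__(self, slot_size, slot_count):
        self.slot_size = slot_size
        # One up-front allocation; no further allocation per object.
        self.buffer = bytearray(slot_size * slot_count)
        # Stack of free slot indices; lowest index is handed out first.
        self.free_slots = list(range(slot_count - 1, -1, -1))

    def alloc(self):
        """Return a free slot index in O(1), or None if the slab is full."""
        return self.free_slots.pop() if self.free_slots else None

    def free(self, slot):
        """Return a slot to the free list in O(1)."""
        self.free_slots.append(slot)

    def view(self, slot):
        """Writable view of the slot's bytes, without copying."""
        start = slot * self.slot_size
        return memoryview(self.buffer)[start:start + self.slot_size]


slab = SlabAllocator(slot_size=64, slot_count=4)
a = slab.alloc()
b = slab.alloc()
slab.view(a)[:5] = b"hello"
slab.free(a)
c = slab.alloc()  # reuses the slot that was just freed
```

Because every slot is the same size there is no fragmentation inside the slab, which is why it suits fixed-size records, with a general allocator left for the variable-sized ones.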

There are a lot of smaller tricks in the code, but it all adds up to speed, and I don't expect you to believe me. I've spent 25 years in Silicon Valley writing high performance software, and so has most of the team. We come from a strong background of embedded, set-top box, and cell phone programming.

Let me tell you a short story. I brought my particular bag of tricks to a streaming video server company in the mid 90's. I produced an internal product that was 100x faster - that is, it required 100x lower cost hardware than the company's existing product (a 133 MHz Pentium instead of high-end Sun machines). The product got buried - because the sales guys couldn't make their commission checks.

I'm tired of that mentality.

Aerospike has been running in production at seriously high loads for years. I work with a lot of guys who say - "What else am I going to use?" For the use case where you want KVS, with decent API support (redis-like lists and UDFs), and a little analytics, and scale-out adding nodes under production load, it's the right choice.

If you're thinking of a Mongo KVS, Cassandra, Redis, you really need to look at Aerospike. Do yourself, and your startup, a favor.

( And, yes, the name is based on the Aerospike engine, but we were thinking more of the Trident II D5, which uses an Aerospike at the front, to essentially extend the aerodynamic length of the missile. The problem with sub-based missiles is they have to be short to fit in the sub, and a use of the aerospike was one of many techniques for making the US based deterrent accurate. We used the name Aerospike because there are a lot of small techniques that make an "unbelievable" difference - that's what engineering is, compared to theory. )

PM me directly if you're having trouble running benchmarks or anything.

[+] perlgeek|11 years ago|reply
Now I'm waiting for aphyr to test aerospike under cluster partition :-)
[+] alexnewman|11 years ago|reply
OK I can't find any tests. It also disturbs me when people "open source" projects without any real revision history.

Also is it true that https://github.com/aerospike/aerospike-server hasn't been updated?

[+] cstivers1978|11 years ago|reply
Aerospike maintains a comprehensive set of tests for the server. Every commit goes through functional and regression tests. Each release goes through a gauntlet of performance and clustering tests. The test system is a standalone system from the database, and is integrated with our CI system. Unfortunately, we have not been able to publish our test system, yet.
[+] ryanobjc|11 years ago|reply
A system without comprehensive tests is one that cannot be changed.

The big question is, do they have tests or not?

[+] TheCondor|11 years ago|reply
It's an impressive cache. Last I looked at it they were using Lua, and it looked like they were going squarely after Mongo.
[+] ZenoArrow|11 years ago|reply
The core code is written in C, but they're definitely using Lua in places, I believe they've incorporated some code from AlchemyDB... https://code.google.com/p/alchemydatabase/

From what I've read it could easily surpass Mongo, just look at the cost savings... http://www.datanami.com/2013/09/06/aerospike_says_secret_to_... "The second comparison (a video ad serving platform) had much bigger requirements, including a 5TB database processing 500,000 TPS. The hybrid SSD-DRAM setup running the AeroSpike database was able to handle the load with just 14 servers, at a total cost of $322,000, compared to 186 servers using NoSQL running on clusters of servers that use a lot of DRAM and cost $5.6 million."
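Working the quoted numbers through makes the gap easier to see. The sketch below uses only the figures from the Datanami quote (server counts, total costs, and the 500,000 TPS target); the variable names are just for illustration.

```python
# All inputs are the figures from the Datanami quote; nothing else is assumed.
target_tps = 500_000
aerospike = {"servers": 14, "cost_usd": 322_000}
dram_nosql = {"servers": 186, "cost_usd": 5_600_000}

# How many times more servers and dollars the DRAM-heavy setup needs.
server_ratio = dram_nosql["servers"] / aerospike["servers"]
cost_ratio = dram_nosql["cost_usd"] / aerospike["cost_usd"]

# Load each server would carry if the target were spread evenly.
tps_per_server = {
    "aerospike": target_tps / aerospike["servers"],
    "dram_nosql": target_tps / dram_nosql["servers"],
}

print(f"servers: {server_ratio:.1f}x fewer, cost: {cost_ratio:.1f}x lower")
```

That comes to roughly 13x fewer servers and 17x lower cost, taking the vendor-supplied numbers at face value.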

They're ACID compliant as well, which Mongo is not (AFAIK)... https://www.youtube.com/watch?v=nnxj77NNEeg

[+] meritt|11 years ago|reply
I don't think this press release has nearly enough buzzwords.
[+] jaseemabid|11 years ago|reply
Yep. Everything about Aerospike so far has been filled with buzzwords. There was a flash talk by an employee of theirs recently at a conference in Bangalore and it was just marketing BS. I'm still skeptical.
[+] rasz_pl|11 years ago|reply
I don't know, are they web scale?
[+] alexnewman|11 years ago|reply
Common at least has some tests:

    posix4es-MacBook-Air-3:aerospike-common posix4e$ cloc src/main/ - 6603
    posix4es-MacBook-Air-3:aerospike-common posix4e$ cloc src/test/ - 1247

I worry about test coverage stats like that

Not to mention if you look at the tests

    /* TEST CASES */

    TEST( msgpack_roundtrip_integer1, "roundtrip: 123" ) {
        as_integer i1;
        as_integer_init(&i1, 123);

        as_integer i2;
        as_integer_init(&i2, 456);

        as_val * v2 = roundtrip((as_val *) &i1);

        assert_val_eq(v2, &i1);

        as_integer_destroy(&i1);
        as_val_destroy(v2);
    }

Not exactly terse and readable

[+] shabinesh|11 years ago|reply
Aerospike guys were at a big data workshop in Bangalore last week. Its performance is pretty impressive. They claimed >1M TPS with just 3 nodes and compared it with CouchDB, which is claimed to achieve 1M TPS with 330 nodes. But unclear about their benchmarking method.
[+] remon|11 years ago|reply
"But unclear about their benchmarking method". That can be said for every single performance claim they make. There's a rather distinct lack of objective facts.
[+] rolfvandekrol|11 years ago|reply
Hmm, if the community edition server is AGPL licensed, are they even allowed to have an enterprise edition that is not AGPL licensed? I suppose the enterprise edition is an altered version of the community edition, so can they be 'forced' to publish their changes?
[+] infinite8s|11 years ago|reply
This seems to be a common misconception about the interplay of copyright and GPL-style licenses. Since they own the copyright, they can relicense it however they choose. The GPL and other open source licenses just give non-copyright holders additional rights beyond what copyright law provides (which for most works is nothing beyond a bit of fair use). In that way the GPL is a clever hack of copyright, since it relies on the default of no rights granted by copyright to enforce its terms.
[+] weitzj|11 years ago|reply
If the server uses the AGPL instead of the GPL, why does it matter that the clients are under an Apache License? I thought if you use the AGPL you have to contribute back the client code as well, when you use the server remotely.
[+] DoubleMalt|11 years ago|reply
That is not true.

If you expose a service based on AGPL-licensed software, you have to make the source code available to the services that use it.

For example you could modify WordPress (which is GPL licensed), put it on your server and let it serve pages without providing your modified source code to anyone.

If WordPress was AGPL licensed you would have to provide your modified source code to anyone using the system.

This also affects services that use libraries that are AGPL licensed (like newer versions of iText), but not services that use other services.

The point is AGPL only adds that if you consume it over the network, you have the right to the source code. If you use it as a network service, for your webapp, your webapp is the consumer.

MongoDB has the same licensing model, and nobody sued Foursquare for the source code, so I guess this is legally tested ;)

[+] teddyh|11 years ago|reply
I’m sorry, but people seem to believe all manner of wild and crazy things about the AGPL – this means that you have to back up your claims with references.
[+] blitzprog|11 years ago|reply
How does this perform in comparison to Riak (cluster, v2.0)?
[+] simi_|11 years ago|reply
> Aerospike’s mission is to rain bullshit on the entire field of databases by offering an addictive proposition: a database literally ten times faster than existing NoSQL solutions, and one hundred times faster than existing SQL solutions.

Gotta love Disrupt to Bullshit: https://chrome.google.com/webstore/detail/disrupt-to-bullshi...