ArangoDB

94 points| majidazimi | 12 years ago |arangodb.org | reply

An open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient sql-like query language or JavaScript extensions.

67 comments

[+] saurik|12 years ago|reply
> As a relational database user we are used to treat the database as stupid and mainly use it to save and retrieve data. ArangoDB lets you extend the database using Javascript (production ready) and Mruby (experimental).

?!? A common complaint against relational database people is that they put "too much" logic in the database. (I clearly don't agree, using stored procedures and custom extensions ;P.)

[+] Pxtl|12 years ago|reply
Personally I loathe stored procedures because they often include a lot of logic that shouldn't be in the database, and they also generally involve SQL extensions that are pretty terrible for the general-purpose computing you see them used for.

But if that layer is primarily used for providing, controlling, and optimizing access to the data I can see the appeal. And in that case? Being able to write the procedural parts in a language that's less-terrible than typical SQL extensions would be really nice.

[+] poseid|12 years ago|reply
Yet, if you have to roll out a database server and an application server, it can be quite some overhead for certain kinds of applications. I think this data layer on top of the data store is especially interesting for e.g. backends in a networked world, where your data is distributed, or e.g. when you need to aggregate data from multiple sources.

Or, what use case is your comment about?

[+] shin_lao|12 years ago|reply
I have a feeling, tell me if you share it.

"Wouldn't it be cool to have a multipurpose database which we would be able to query with a language, but not SQL, because SQL sucks, for some reason.".

Put it differently, what does ArangoDB, MongoDB, whateverDB bring that relational databases didn't bring 30 years ago?

[+] sophacles|12 years ago|reply
Some things just aren't relationally shaped. You can model them relationally, but it can be a pain.

For instance, graphs are totally doable with a traditional rdbms, however it is painful. You end up joining a table against itself (or via an edges table) multiple times, or alternately bouncing many queries off the table as you iterate the graph. One common type of NoSQL db is the graph database that is designed with graphs in mind and you don't even have to think about this access. It is nice.
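
A minimal sketch of the two relational approaches described above, using Python's built-in `sqlite3` module. The `edges` table, its column names, and the sample nodes are hypothetical and exist only for illustration: the fixed-depth query joins the table against itself, and `reachable` bounces one query off the table per hop.

```python
import sqlite3

# Hypothetical edge list for illustration: a -> b -> c -> d, plus a -> e.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]


def reachable(conn, start):
    """Return every node reachable from `start`, issuing one query per hop."""
    seen = {start}
    frontier = {start}
    while frontier:
        placeholders = ",".join("?" for _ in frontier)
        rows = conn.execute(
            f"SELECT dst FROM edges WHERE src IN ({placeholders})",
            tuple(frontier),
        ).fetchall()
        # Only nodes we have not visited yet form the next frontier.
        frontier = {dst for (dst,) in rows} - seen
        seen |= frontier
    return seen - {start}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?)", EDGES)

# Exactly two hops from "a": the edges table is joined against itself once.
# Every additional hop would need one more join.
two_hops = conn.execute(
    "SELECT e2.dst FROM edges e1 JOIN edges e2 ON e2.src = e1.dst "
    "WHERE e1.src = ?",
    ("a",),
).fetchall()

all_from_a = reachable(conn, "a")
```

Both versions work, but the traversal logic lives in the application or in ever-growing joins, which is the pain a graph-oriented store is meant to remove.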

Another case that you can do with traditional RDBMS but is annoying is loose user defined fields, such as "tags" where you have to create a tags table and a join table to make it work, with a lot of potential inefficiency there (even with indexes). Or even worse, when you have user defined attributes - lots of custom table creation per user, or big joins against a star topology to do it properly. (Or if not, you end up with something that looks like an sql database to a bunch of frontend code).

Of course other times you'll find yourself doing something in a NoSQL database that is effectively doing a bunch of lookups against a table and then combining the results (at which point switching to an RDBMS is the solution)...

I guess what I'm saying is that lately the data store is being looked at as a component more like a library than a subsystem. I'm not sure if this is good or bad, but it certainly has helped a few cases I've dealt with nicely - ripping out a horribly complex data layer and replacing it with a NoSQL solution where the data model is shaped like my data.

Basically it's a matter of the right tool for the job.

[+] jsteemann|12 years ago|reply
To name just a few:
- being relaxed about schemas: no more long-running ALTER TABLE commands, no more up-front schema definitions that waste time when doing proof-of-concepts etc.
- being friendly to variable and hierarchical data: no more entity-attribute-value patterns and necessity to store JSON etc. as BLOBs
- integration of scripting languages such as JavaScript, so you can have one language for the full stack if you want
- embracing web standards (HTTP, JSON)
- no object-relational mismatch (there are no relations), as you can more easily map a single programming language object to a document

Relational databases partly offer solutions for this, too. But in a relational database, these things are (often clumsy) extensions and not well supported.

[+] programminggeek|12 years ago|reply
Sometimes it is just a lot easier and more query efficient to store a bunch of related data as just a hash or even with embedded hashes. Like, if you want to store a list of key/value pairs alongside a bunch of other data. Yes, you could do that via a bunch of tables and relations and joins and things, but conceptually if all that data can be seen as one self contained record, why spread it across a bunch of separate tables?
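
As a small illustration of the "one self-contained record" idea, here is a sketch in plain Python. The `user` record and all of its field names are hypothetical; JSON serialisation stands in for whatever a document store would do internally.

```python
import json

# Hypothetical user record: everything lives in one self-contained document.
user = {
    "name": "alice",
    "email": "alice@example.com",
    "settings": {"theme": "dark", "language": "en"},  # embedded hash
    "tags": ["admin", "beta"],                        # loose key/value style data
}

# The record is stored and fetched as a single unit, with no joins.
stored = json.dumps(user)
loaded = json.loads(stored)
theme = loaded["settings"]["theme"]
```

In a normalised relational layout the same data would typically be spread over a users table, a settings table, a tags table, and a join table.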

Sometimes a document collection is just a whole lot easier to reason about.

[+] mrinterweb|12 years ago|reply
I'd say that one of the biggest things that many of these NoSQL data stores bring is auto-sharding and being able to query/map-reduce a distributed data source. Relational dbs work pretty well so long as you can vertically scale your data and/or don't need to query across databases. Horizontal scaling is the tricky part for relational dbs, and it is what NoSQL data stores market as their big selling point.
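
A rough sketch of what hash-based sharding plus a scatter/gather query can look like, in plain Python. This is a toy model under stated assumptions, not how any particular database does it: the shards are in-memory dicts, and `shard_for` and `scatter_gather_count` are hypothetical helper names.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard deterministically from a hash of the document key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def scatter_gather_count(shards, predicate) -> int:
    """Run the same filter on every shard (map) and add up the counts (reduce)."""
    partial_counts = [
        sum(1 for doc in shard.values() if predicate(doc)) for shard in shards
    ]
    return sum(partial_counts)


NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

# Distribute 100 hypothetical documents across the shards by key hash.
for i in range(100):
    key = f"user{i}"
    shards[shard_for(key, NUM_SHARDS)][key] = {"id": i, "even": i % 2 == 0}

total_even = scatter_gather_count(shards, lambda doc: doc["even"])
```

The hard parts in practice (rebalancing when shards are added, cross-shard transactions, node failures) are exactly what this sketch leaves out.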
[+] neunhoef|12 years ago|reply
NOSQL databases in general (and multi-model databases like ArangoDB in particular) offer a greater flexibility in the choice of your data structures than traditional relational databases do. Furthermore, you can configure exactly the right compromise for your application between ACID and eventual consistency, and consistency/scalability.
[+] vicaya|12 years ago|reply
"Transactions in ArangoDB are atomic, consistent, isolated, and durable (ACID)." "Collections consist of memory-mapped datafiles...". "by default, ArangoDB uses the eventual way of synchronization...synchronizes data to disk in a background thread."

So it's not ACID by default and practically not usable with immediate sync turned on (huge amount of seeks due to use of mmap), just like mongo.

[+] jsteemann|12 years ago|reply
As in many databases, ArangoDB allows some choices regarding durability. Immediate disk synchronisation is turned off by default in ArangoDB. Synchronisation is then performed by a background thread, which is frequently executing syncs. By the way, several other NoSQL databases have immediate synching turned off by default, e.g. CouchDB, MongoDB.

In ArangoDB you turn on immediate synchronisation on a per collection level, or use it for specific operations only. So it's up to you how you want to use it. This gives the database user a fine-grained choice.

I remember using some relational databases in the past where we turned immediate synchronisation off as well to get more throughput. So it's probably not fully uncommon to do it, but I understand the expectation of relational users that everything is fully durable by default.

Memory-mapped files don't have anything to do with ACID. It's just a detail of the internal organisation of buffers. You can have full durability with memory-mapped files. You just have to use msync instead of fsync/sync.
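
To illustrate the msync point, here is a minimal sketch using Python's standard `mmap` module, whose `flush()` method is the portable wrapper around `msync` on Unix-like systems. The helper name `durable_write` and the scratch file are assumptions made for this example; it says nothing about how ArangoDB itself schedules its syncs.

```python
import mmap
import os
import tempfile


def durable_write(path: str, offset: int, payload: bytes) -> None:
    """Write through a memory mapping, then force the pages to disk."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[offset:offset + len(payload)] = payload
            # flush() blocks until the dirty pages have been written back
            # (msync on Unix), which is the durability step for mapped files.
            mm.flush()


# Create a 4 KiB scratch file to map; a mapping cannot cover an empty file.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * 4096)
os.close(fd)

durable_write(path, 0, b"hello")

with open(path, "rb") as f:
    head = f.read(5)
os.remove(path)
```

Skipping the `flush()` call corresponds to the "sync in a background thread" default discussed above: the write is visible immediately but only reaches disk when the OS or a later sync gets to it.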

[+] nateberkopec|12 years ago|reply
> "In typical applications with "complex" database operations there is often no clean API to the persistence layer when to or more database operations are executed one after each other which belong together from an architectural perspective."

Is that even a real sentence?

[+] sedlich|12 years ago|reply
> Put it differently, what does ArangoDB, MongoDB, whateverDB bring that relational databases didn't bring 30 years ago?

(Let's leave MongoDB out here ;-) What I really love and what the relationals do not have are:

* Graphs as first class citizens! (try to view them in the web gui :-)
* The tight V8 & JavaScript integration (FOXX is more than cool. Hope I will be able to use it from ClojureScript)

What you might find in earlier databases but not completely in others today is (my personal hitlist :-) :
* The incredible amount of indices with even skip and n-gram!
* MultiCore ready
* Durability tuning (already mentioned by Jan)
* AQL covering KV, JSON and Graphs! (Martin Fowler was quite sceptical that this model integration could work...)
* And a MVCC that makes it SSD ready.
* Capped Collections
* Availability on tons of OS versions such as Windows, iOS, all UNIXes and even Travis-CI (how cool is that?!)

Try it. Might be fun in production compared to other famed NoSQL DBs.... (at least to me)

[+] mikro2nd|12 years ago|reply
> There are driver for all major language like Ruby, Python, PHP, JavaScript, and Perl.

I chuckled at the absence of the most widely deployed language on the planet.

I dare say that there is a driver for Java - didn't look, because after browsing through a reasonable portion of their site, I still couldn't get a simple explanation of what this DB allegedly does and doesn't do.

[+] RyanZAG|12 years ago|reply
There is a Java driver and even an object mapper based off jackson - https://www.arangodb.org/drivers

This seems to be a mongodb clone with some extra features added on to make it a bit closer to a relational db, I guess. Looks interesting but likely suffers from the same problems MongoDB suffers from (data safety, scaling difficulties, etc)

EDIT: Have to say, the idea of a mongodb database with graph operations built in is pretty attractive for small network oriented problems...

[+] mk3|12 years ago|reply
I have tried to run ArangoDB under Node.js and had some small successes, but in general their claim that they have drivers for many platforms is far from reality. Also, why not release Node.js binary drivers instead of pushing people to use Foxx? Simple browsable docs would also be nice, instead of chunked documents covering hell knows what. Finding the link to their query language was a big hassle :) Their graph traversal is also still in its infancy, as I understand it.
[+] poseid|12 years ago|reply
I agree somewhat with the documentation part. The getting started guide is nice, but the gap between reading about concepts and the actual references on how to use them is a bit large at times. Best would be reference-based documentation, e.g. as in http://underscorejs.org/
[+] klaustopher|12 years ago|reply
There is a Node.js driver and even an integration into JugglingDB. See https://www.arangodb.org/drivers. And in the end, it's all basically HTTP calls that you are making to query the database.
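
To make the "it's all HTTP calls" point concrete, here is a sketch that only builds such a request with Python's standard library and does not send it. The `/_api/cursor` path, the `query`/`bindVars` body fields, and port 8529 reflect my understanding of ArangoDB's HTTP interface and should be checked against the documentation for your version; the `users` collection and the helper name `build_aql_request` are hypothetical.

```python
import json
import urllib.request


def build_aql_request(base_url: str, query: str, bind_vars: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP request carrying a query as JSON."""
    body = json.dumps({"query": query, "bindVars": bind_vars}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_api/cursor",  # assumed endpoint, verify in the docs
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_aql_request(
    "http://localhost:8529",  # assumed default address of a local server
    "FOR u IN users FILTER u.age >= @min RETURN u.name",
    {"min": 21},
)

# Sending it would be urllib.request.urlopen(req) against a running server.
```

Any language with an HTTP client and a JSON library can do the same, which is why thin drivers are feasible even where no official one exists.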
[+] coolsunglasses|12 years ago|reply
I like that they're sufficiently ignorant of MongoDB's implementation to misattribute the primary cause of excessive space usage.

Gives me confidence to trust them with my data.

[+] neunhoef|12 years ago|reply
The big advantage of ArangoDB with respect to memory/disk usage is that despite the Schema-less-ness, the database automatically recognises common "shapes" of the documents in a collection and thus usually does not have to store all attribute names many times. In addition, the possibility of transactions makes it less necessary to keep many old revisions of documents, in comparison to for example MongoDB.
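
The "shapes" idea can be sketched roughly as follows. This is a toy model in Python, not ArangoDB's actual storage format: the class `ShapedCollection` and its methods are hypothetical, and a shape here is simply the sorted tuple of a document's top-level attribute names.

```python
class ShapedCollection:
    """Toy store that keeps each distinct set of attribute names only once."""

    def __init__(self):
        self._shape_ids = {}  # tuple of attribute names -> shape id
        self._shapes = []     # shape id -> tuple of attribute names
        self._rows = []       # one (shape id, tuple of values) per document

    def insert(self, doc: dict) -> None:
        names = tuple(sorted(doc))
        shape_id = self._shape_ids.get(names)
        if shape_id is None:
            # First document with this shape: register its attribute names.
            shape_id = len(self._shapes)
            self._shape_ids[names] = shape_id
            self._shapes.append(names)
        # Only the values are stored per document, not the names.
        self._rows.append((shape_id, tuple(doc[n] for n in names)))

    def get(self, index: int) -> dict:
        shape_id, values = self._rows[index]
        return dict(zip(self._shapes[shape_id], values))

    def shape_count(self) -> int:
        return len(self._shapes)


coll = ShapedCollection()
coll.insert({"name": "alice", "age": 30})
coll.insert({"name": "bob", "age": 25})               # same shape as above
coll.insert({"name": "carol", "email": "c@example.com"})  # new shape
```

With many documents sharing a few shapes, the attribute names are paid for once per shape instead of once per document.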
[+] neunhoef|12 years ago|reply
What do you think is the primary cause of excessive space usage in MongoDB?
[+] eonil|12 years ago|reply
> In ArangoDB, a transaction is always a server-side operation, and is executed on the server in one go, without any client interaction.

It doesn't seem to support interactive transactions. That means only simple batch reads and writes, no complex transactions. It seems to sit somewhere between CAS and real generic transactions. Doesn't seem very useful.

[+] don71|12 years ago|reply
No - it goes way beyond simple batches of operations. Basically you have to write your transaction as a JavaScript program. So you can do anything you could do on the client side, with the exception of waiting for another source (i.e. user interaction). You could read a document from one collection, choose different actions based on an attribute, and change documents in multiple collections. I think the PHP driver uses some kind of abstraction to hide the JavaScript from the developer.
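
The idea of a transaction being a whole program that runs in one go can be modelled with a small sketch. It is written in Python rather than JavaScript and is not ArangoDB's API: `TinyStore`, `execute_transaction`, and the `accounts`/`audit` collections are all hypothetical names for illustration.

```python
import copy


class TinyStore:
    """Toy in-memory store: collections are plain dicts and lists."""

    def __init__(self, collections):
        self.collections = collections

    def execute_transaction(self, action):
        """Run `action` on a private copy; commit only if it returns normally."""
        working = copy.deepcopy(self.collections)
        result = action(working)    # any exception leaves the store untouched
        self.collections = working  # commit all changes at once
        return result


def transfer(data):
    """Read a document, branch on an attribute, write to two collections."""
    src = data["accounts"]["alice"]
    if src["balance"] < 30:
        raise ValueError("insufficient funds")
    src["balance"] -= 30
    data["accounts"]["bob"]["balance"] += 30
    data["audit"].append({"from": "alice", "to": "bob", "amount": 30})
    return src["balance"]


def broken(data):
    """Modify data, then fail: none of the changes should survive."""
    data["audit"].append({"note": "half-done"})
    raise RuntimeError("abort")


store = TinyStore({
    "accounts": {"alice": {"balance": 100}, "bob": {"balance": 0}},
    "audit": [],
})
remaining = store.execute_transaction(transfer)

try:
    store.execute_transaction(broken)
except RuntimeError:
    pass  # the failed transaction was discarded
```

The action can contain arbitrary logic, but it cannot pause halfway to ask the client a question, which is the limitation the parent comment describes.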
[+] sgarg26|12 years ago|reply
This will be even cooler if they add 'turn-key' scaling. Their scaling approach is still a work in progress.

http://www.arangodb.org/2013/05/22/replication-and-sharding-...

Anyhow, good job so far to ArangoDB team.

[+] neunhoef|12 years ago|reply
Thanks, and: you are right, scaling by sharding is important, and that is why we have made this our top priority for the coming three months.
[+] obruehl|12 years ago|reply
What I particularly like is the functionality to process graphs and explore them interactively in the browser. This has been added in some recent version, and it makes working with graphs a lot easier than before.
[+] poseid|12 years ago|reply
What I like about ArangoDB is the speed of development, as well as its native support for building RESTful interfaces.

Last but not least, it is open-source!

[+] obruehl|12 years ago|reply
Building a REST interface with access to your data is easy thanks to ArangoDB's Foxx framework. You can implement all your backend code in JavaScript and upload it to the server. Thus you can do any sort of preprocessing on the server and make that available to frontends. And it's easy to integrate with a front-end because it's all about passing JSON around via HTTP.
[+] etanazir|12 years ago|reply
And the tree v. table debate continues...
[+] aerolite|12 years ago|reply
Is this named after Juan Arango?
[+] whereismypw|12 years ago|reply
Arango is a special sort of avocado - but Moenchengladbach (where Juan Arango is currently under contract) is next to Cologne, ArangoDB's headquarters :-)
[+] jsteemann|12 years ago|reply
Oh, he's so good...! And I say that coming from Cologne. You need to know the local club really dislikes Arango's current club (based 50 km away from here).
[+] jsteemann|12 years ago|reply
the stuff was initially named AvocadoDB, but we had to rename it last year because someone else claimed the same name (for whatever reason)