An open-source database with a flexible data model for documents, graphs, and key-values. Build high-performance applications using a convenient SQL-like query language or JavaScript extensions.
> As a relational database user we are used to treat the database as stupid and mainly use it to save and retrieve data. ArangoDB lets you extend the database using Javascript (production ready) and Mruby (experimental).
?!? A common complaint against relational database people is that they put "too much" logic in the database. (I clearly don't agree, using stored procedures and custom extensions ;P.)
Personally I loathe stored procedures because often they include a lot of logic that shouldn't be in the database, and they also generally involve SQL extensions that are pretty terrible for the general-purpose computing you see them used for.
But if that layer is primarily used for providing, controlling, and optimizing access to the data I can see the appeal. And in that case? Being able to write the procedural parts in a language that's less-terrible than typical SQL extensions would be really nice.
Yet, if you have to roll out a database server and an application server, it can be quite some overhead for certain kinds of applications. I think this data layer on top of the data store is especially interesting for, e.g., backends in a networked world, where your data is distributed, or when you need to aggregate data from multiple sources.
"Wouldn't it be cool to have a multipurpose database which we would be able to query with a language, but not SQL, because SQL sucks, for some reason.".
Put it differently, what does ArangoDB, MongoDB, whateverDB bring that relational databases didn't bring 30 years ago?
Some things just aren't relationally shaped. You can model them relationally, but it can be a pain.
For instance, graphs are totally doable with a traditional RDBMS, but it is painful. You end up joining a table against itself (or via an edges table) multiple times, or alternatively bouncing many queries off the table as you iterate the graph. One common type of NoSQL db is the graph database, which is designed with graphs in mind so you don't even have to think about this access. It is nice.
Another case that you can do with a traditional RDBMS but is annoying is loose user-defined fields, such as "tags", where you have to create a tags table and a join table to make it work, with a lot of potential inefficiency there (even with indexes). Or even worse, when you have user-defined attributes - lots of custom table creation per user, or big joins against a star topology to do it properly. (Or if not, you end up with something that looks like an SQL database to a bunch of frontend code).
Of course other times you'll find yourself doing something in a NoSQL database that is effectively doing a bunch of lookups against a table then combining the results (at which point switching to an RDBMS is the solution)...
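The two relational approaches described above can be sketched roughly like this. It is a minimal illustration using Python's built-in `sqlite3` module; the `edges` table, its columns, and the helper names `two_hops` and `reachable` are made up for the example and are not taken from any particular system.

```python
import sqlite3

# Hypothetical edges table: one row per directed edge.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")],
)


def two_hops(conn, start):
    """Nodes exactly two hops from start: one extra self-join per extra hop."""
    rows = conn.execute(
        "SELECT DISTINCT e2.dst FROM edges e1 "
        "JOIN edges e2 ON e2.src = e1.dst WHERE e1.src = ?",
        (start,),
    )
    return sorted(r[0] for r in rows)


def reachable(conn, start):
    """All nodes reachable from start, bouncing one query per visited node."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for (nxt,) in conn.execute("SELECT dst FROM edges WHERE src = ?", (node,)):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return sorted(seen)
```

The fixed-depth query needs another join for every additional hop, and the open-ended traversal costs one round trip per node, which is the pain a graph-oriented store is meant to hide.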
I guess what I'm saying is that lately the data store is being looked at as a component more like a library than a subsystem. I'm not sure if this is good or bad, but it certainly has helped a few cases I've dealt with nicely - ripping out a horribly complex data layer and replacing it with a NoSQL solution where the data model is shaped like my data.
Basically it's a matter of the right tool for the job.
To name just a few:
- being relaxed about schemas: no more long-running ALTER TABLE commands, no more up-front schema definitions that waste time when doing proof-of-concepts etc.
- being friendly to variable and hierarchical data: no more entity-attribute-value patterns and necessity to store JSON etc. as BLOBs
- integration of scripting languages such as JavaScript, so you can have one language for the full stack if you want
- embracing web standards (HTTP, JSON)
- no object-relational mismatch (there are no relations), as you can more easily map a single programming language object to a document
Relational databases partly offer solutions for this, too. But in a relational database, these things are (often clumsy) extensions and not well supported.
Sometimes it is just a lot easier and more query efficient to store a bunch of related data as just a hash or even with embedded hashes. Like, if you want to store a list of key/value pairs alongside a bunch of other data. Yes, you could do that via a bunch of tables and relations and joins and things, but conceptually if all that data can be seen as one self contained record, why spread it across a bunch of separate tables?
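As a rough illustration of that trade-off, here is the same record stored both ways. This is only a sketch: the table names, the `attrs` field, and the plain dictionary standing in for a document collection are all assumptions made for the example, not any database's actual API.

```python
import json
import sqlite3

# Relational shape: the record is spread over two tables and reassembled by a join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE user_attrs (user_id INTEGER, key TEXT, value TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO user_attrs VALUES (1, ?, ?)",
    [("theme", "dark"), ("lang", "en")],
)


def load_relational(conn, user_id):
    """Rebuild one logical record from the users and user_attrs tables."""
    rows = conn.execute(
        "SELECT u.name, a.key, a.value FROM users u "
        "LEFT JOIN user_attrs a ON a.user_id = u.id WHERE u.id = ?",
        (user_id,),
    ).fetchall()
    return {"name": rows[0][0], "attrs": {k: v for _, k, v in rows if k is not None}}


# Document shape: the same record kept as one self-contained JSON value.
# The dict below is a stand-in for a document collection keyed by id.
collection = {1: json.dumps({"name": "alice", "attrs": {"theme": "dark", "lang": "en"}})}


def load_document(collection, user_id):
    """Fetch the whole record with a single key lookup."""
    return json.loads(collection[user_id])
```

Both loaders return the same dictionary; the difference is how much schema and join logic sits between the stored form and the record you actually want.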
Sometimes a document collection is just a whole lot easier to reason about.
I'd say that one of the biggest things that many of these NoSQL data stores bring is auto-sharding and being able to query/map-reduce a distributed data source. Relational dbs work pretty well so long as you can vertically scale your data and/or don't need to query across databases. Horizontal scaling is the tricky part for relational dbs, and it is what NoSQL data stores market as their big selling point.
NoSQL databases in general (and multi-model databases like ArangoDB in particular) offer greater flexibility in the choice of your data structures than traditional relational databases do. Furthermore, you can configure exactly the right compromise for your application between ACID and eventual consistency, and consistency/scalability.
"Transactions in ArangoDB are atomic, consistent, isolated, and durable (ACID)." "Collections consist of memory-mapped datafiles...". "by default, ArangoDB uses the eventual way of synchronization...synchronizes data to disk in a background thread."
So it's not ACID by default and practically not usable with immediate sync turned on (huge amount of seeks due to use of mmap), just like mongo.
As in many databases, ArangoDB allows some choices regarding durability.
Immediate disk synchronisation is turned off by default in ArangoDB. Synchronisation is then performed by a background thread, which executes syncs frequently.
By the way, several other NoSQL databases have immediate syncing turned off by default, e.g. CouchDB, MongoDB.
In ArangoDB you turn on immediate synchronisation on a per collection level, or use it for specific operations only. So it's up to you how you want to use it.
This gives the database user a fine-grained choice.
I remember using some relational databases in the past where we turned immediate synchronisation off as well to get more throughput. So it's probably not that uncommon to do it, but I understand the expectation of relational users that everything is fully durable by default.
Memory-mapped files don't have anything to do with ACID. It's just a detail of the internal organisation of buffers. You can have full durability with memory-mapped files. You just have to use msync instead of fsync/sync.
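To make the msync point concrete, here is a minimal sketch using Python's standard `mmap` module, whose `flush()` method is implemented with `msync()` on Unix. The helper name `durable_write` and the 4 KiB scratch file are assumptions for the example; this says nothing about how ArangoDB itself lays out or syncs its datafiles.

```python
import mmap
import os
import tempfile


def durable_write(path, offset, payload):
    """Write payload through a memory mapping and block until it is flushed."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(payload)] = payload
            # On Unix, mmap.flush() calls msync(); without it the kernel
            # writes the dirty pages back whenever it chooses to.
            m.flush()


# Usage: preallocate a small scratch datafile, then write into it durably.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * 4096)
os.close(fd)
durable_write(path, 0, b"hello")
with open(path, "rb") as f:
    first_bytes = f.read(5)
os.remove(path)
```

Skipping the `flush()` call corresponds to the "eventual" mode discussed above: the write is visible immediately through the mapping, but it only reaches the disk when the operating system gets around to it.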
> "In typical applications with "complex" database operations there is often no clean API to the persistence layer when to or more database operations are executed one after each other which belong together from an architectural perspective."
> Put it differently, what does ArangoDB, MongoDB, whateverDB bring that relational
> databases didn't bring 30 years ago?
(Let's leave MongoDB out here ;-)
What I really love and what the relationals do not have are:
* Graphs as first class citizens! (try to view them in the web gui :-)
* The tight V8 & JavaScript integration (FOXX is more than cool. Hope I will be able to use it from ClojureScript)
What you might find in earlier databases but not completely in others today is (my personal hitlist :-) :
* The incredible amount of indices with even skip and n-gram!
* MultiCore ready
* Durability tuning (already mentioned by Jan)
* AQL covering KV, JSON and Graphs! (Martin Fowler was quite sceptical that this model integration could work...)
* And a MVCC that makes it SSD ready.
* Capped Collections
* Availability on tons of OS versions such as Windows, iOS, all UNIXes and even Travis-CI (how cool is that?!)
Try it. Might be fun in production compared to other famed NoSQL DBs.... (at least to me)
> There are driver for all major language like Ruby, Python, PHP, JavaScript, and Perl.
I chuckled at the absence of the most widely deployed language on the planet.
I dare say that there is a driver for Java - didn't look, because after browsing through a reasonable portion of their site, I still couldn't get a simple explanation of what this DB allegedly does and doesn't do.
This seems to be a mongodb clone with some extra features added on to make it a bit closer to a relational db, I guess. Looks interesting but likely suffers from the same problems MongoDB suffers from (data safety, scaling difficulties, etc)
EDIT: Have to say, the idea of a mongodb database with graph operations built in is pretty attractive for small network oriented problems...
I have tried to run ArangoDB under Node.js and had some small successes, but in general their claim that they have drivers for many platforms is far from reality. Also, why not release Node.js binary drivers instead of pushing people to use Foxx? Simple browsable docs would also be nice, instead of chunked documents covering hell knows what. Finding a link to their query language was a big hassle :) Also, their graph traversal is still in its infancy, as I understand it.
I agree somewhat with the documentation part. The getting-started guide is nice, but the gap between reading about concepts and the actual references on how to use them is a bit large at times. Reference-based documentation would be best, e.g. as in http://underscorejs.org/
There is a Node.js driver and even an integration into JugglingDB. See https://www.arangodb.org/drivers. And in the end, it's all basically HTTP calls that you are making to query the database.
The big advantage of ArangoDB with respect to memory/disk usage is that despite the schema-less-ness, the database automatically recognises common "shapes" of the documents in a collection and thus usually does not have to store all attribute names many times. In addition, the possibility of transactions makes it less necessary to keep many old revisions of documents, in comparison to, for example, MongoDB.
> In ArangoDB, a transaction is always a server-side operation, and is executed on the server in one go, without any client interaction.
It doesn't seem to support interactive transactions. That means only simple batch reads & writes, no complex transactions. It seems to sit in between CAS and real generic transactions. Doesn't seem very useful.
No - it goes way beyond simple batches of operations. Basically you have to write your transaction as a JavaScript program. So, you can do anything you could do on the client side - with the exception of waiting for another source (i.e. user interaction). You could read a document from one collection, choose different actions based on an attribute, and change documents in multiple collections. I think the PHP driver uses some kind of abstraction to hide the JavaScript from the developer.
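The idea of a transaction as a program that runs in one go, with branching but no client round trips, can be modelled with a toy in-memory store. To be clear, this is not ArangoDB's API (which takes a JavaScript function on the server); `run_transaction`, `promote_if_active`, and the nested-dict "collections" are all hypothetical names invented for the sketch.

```python
import copy


def run_transaction(store, action):
    """Run action against a private copy of store; commit only if it returns normally."""
    working = copy.deepcopy(store)
    result = action(working)  # any exception propagates and leaves store untouched
    store.clear()
    store.update(working)
    return result


def promote_if_active(db):
    """Read from one collection, branch on an attribute, write to two collections."""
    user = db["users"]["u1"]
    if user["status"] != "active":
        raise ValueError("user is not active")
    user["role"] = "admin"
    db["audit"].append({"user": "u1", "event": "promoted"})
    return user["role"]


store = {"users": {"u1": {"status": "active", "role": "member"}}, "audit": []}
outcome = run_transaction(store, promote_if_active)
```

The action can contain arbitrary logic, which is what separates this from a fixed batch of reads and writes; what it cannot do is pause halfway and wait for the client.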
I'm really excited about the look of this. Being a big fan of Mongo et al. for the structurelessness I also sometimes miss the graph-like structure that you can easily create with SQL. Arango looks cool. I shall try it :)
What I particularly like is the functionality to process graphs and explore them interactively in the browser. This has been added in some recent version, and it makes working with graphs a lot easier than before.
Building a REST interface with access to your data is easy thanks to ArangoDB's Foxx framework. You can implement all your backend code in JavaScript and upload it to the server.
Thus you can do any sort of preprocessing on the server and make that available to frontends. And it's easy to integrate with a front-end because it's all about passing JSON around via HTTP.
Arango is a special sort of avocado - but Moenchengladbach (where Juan Arango currently is under contract) is next to Cologne, ArangoDB's headquarters :-)
Oh, he's so good...!
And I say that coming from Cologne. You need to know the local club really dislikes Arango's current club (based 50 km away from here).
[+] [-] saurik|12 years ago|reply
[+] [-] Pxtl|12 years ago|reply
[+] [-] poseid|12 years ago|reply
Or, what use case is your comment about?
[+] [-] shin_lao|12 years ago|reply
"Wouldn't it be cool to have a multipurpose database which we would be able to query with a language, but not SQL, because SQL sucks, for some reason.".
Put it differently, what does ArangoDB, MongoDB, whateverDB bring that relational databases didn't bring 30 years ago?
[+] [-] sophacles|12 years ago|reply
[+] [-] jsteemann|12 years ago|reply
[+] [-] programminggeek|12 years ago|reply
[+] [-] mrinterweb|12 years ago|reply
[+] [-] neunhoef|12 years ago|reply
[+] [-] vicaya|12 years ago|reply
[+] [-] jsteemann|12 years ago|reply
[+] [-] nateberkopec|12 years ago|reply
Is that even a real sentence?
[+] [-] sedlich|12 years ago|reply
[+] [-] mikro2nd|12 years ago|reply
[+] [-] RyanZAG|12 years ago|reply
[+] [-] mk3|12 years ago|reply
[+] [-] poseid|12 years ago|reply
[+] [-] neunhoef|12 years ago|reply
[+] [-] klaustopher|12 years ago|reply
[+] [-] coolsunglasses|12 years ago|reply
Gives me confidence to trust them with my data.
[+] [-] neunhoef|12 years ago|reply
[+] [-] neunhoef|12 years ago|reply
[+] [-] eonil|12 years ago|reply
[+] [-] don71|12 years ago|reply
[+] [-] sgarg26|12 years ago|reply
http://www.arangodb.org/2013/05/22/replication-and-sharding-...
Anyhow, good job so far to ArangoDB team.
[+] [-] neunhoef|12 years ago|reply
[+] [-] poseid|12 years ago|reply
[+] [-] basicallydan|12 years ago|reply
[+] [-] bjerun|12 years ago|reply
[+] [-] obruehl|12 years ago|reply
[+] [-] tjbiddle|12 years ago|reply
Looking forward to learning more when it's online!
[+] [-] bjerun|12 years ago|reply
[+] [-] poseid|12 years ago|reply
[+] [-] poseid|12 years ago|reply
Last but not least, it is open-source!
[+] [-] obruehl|12 years ago|reply
[+] [-] etanazir|12 years ago|reply
[+] [-] saintfiends|12 years ago|reply
[+] [-] poseid|12 years ago|reply
[+] [-] poseid|12 years ago|reply
[+] [-] aerolite|12 years ago|reply
[+] [-] whereismypw|12 years ago|reply
[+] [-] jsteemann|12 years ago|reply
[+] [-] jsteemann|12 years ago|reply