Datomic is perfect for probably 90% of small-ish backoffice systems that never have to be web scale (i.e. most of what I do at work).
Writing in a single thread removes a whole host of problems in understanding (and implementing) how data changes over time. (And a busy MVCC sql db spends 75% of its time doing coordination, not actual writes, so a single thread applying a queue of transactions in sequence can be faster than your gut feeling might tell you.)
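As a rough illustration of the single-writer idea (not Datomic's actual implementation — just a sketch of the concept), one thread drains a queue of transaction functions and applies them in order, so the data itself needs no locks:

```python
import queue
import threading

class SingleWriter:
    """A toy single-writer store: all writes go through one thread."""

    def __init__(self):
        self.state = {}                 # the "database"
        self.queue = queue.Queue()      # pending transactions
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            txn, done = self.queue.get()
            if txn is None:             # shutdown sentinel
                done.set()
                return
            txn(self.state)             # applied serially: no races
            done.set()

    def transact(self, txn):
        """Enqueue a transaction and block until it has been applied."""
        done = threading.Event()
        self.queue.put((txn, done))
        done.wait()

    def close(self):
        done = threading.Event()
        self.queue.put((None, done))
        done.wait()
```

Callers can hammer `transact` from many threads; because a single consumer applies transactions one at a time, every read-modify-write inside a transaction function is atomic by construction.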
Transactions as first-class entities of the system mean you can easily add metadata to every change in the system that records who made the change and why, so you'll never again have to wonder "hmm, why does that column have that value, and how did it happen". Once you get used to this, doing UPDATE in SQL feels pretty weird, as the default mode of operation of your _business data_ is to delete data, with no trace of who and why!
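The idea can be sketched on any store; here's a hypothetical sqlite3 version (table and column names made up) where every fact carries the id of the transaction that asserted it, and the transaction row records who and why:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tx    (id INTEGER PRIMARY KEY, who TEXT, why TEXT);
    CREATE TABLE facts (entity TEXT, attr TEXT, value TEXT,
                        tx_id INTEGER REFERENCES tx(id));
""")

def transact(who, why, facts):
    """Record a transaction entity, then attach every fact to it."""
    cur = db.execute("INSERT INTO tx (who, why) VALUES (?, ?)", (who, why))
    tx_id = cur.lastrowid
    db.executemany(
        "INSERT INTO facts (entity, attr, value, tx_id) VALUES (?, ?, ?, ?)",
        [(e, a, v, tx_id) for (e, a, v) in facts])

transact("jane", "customer asked to fix typo in address",
         [("order-1", "street", "Elm St 7")])

# "Why does that column have that value?" becomes a simple join:
row = db.execute("""
    SELECT f.value, t.who, t.why
    FROM facts f JOIN tx t ON f.tx_id = t.id
    WHERE f.entity = 'order-1' AND f.attr = 'street'
""").fetchone()
```

In Datomic itself the transaction is an ordinary entity you can assert additional attributes on, so the join above comes for free rather than being hand-rolled.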
Having the value of the entire database at a point in time available to your business logic as a (lazy) immutable value you can run queries on opens up completely new ways of writing code, and lets your database follow "functional core, imperative shell". Someone needs to have the working set of your database in memory, why shouldn't it be your app server and business logic?
Looking forward to seeing what this does for the adoption of Datomic!
> Someone needs to have the working set of your database in memory, why shouldn't it be your app server and business logic?
This one confused me. The obvious reason why you don't want the whole working set of your database in the app server's memory is because you have lots of app servers, whereas you only have one database[1]. This suggests that you put the working set of the database in the database, so that you still only need the one copy, not in the app servers where you'd need N copies of it.
The rest of your post makes sense to me but the thing about keeping the database's working set in your app server's memory does not. That's something we specifically work to avoid.
[1] Still talking about "non-webscale" office usage here, that's the world I live in as well. One big central database server, lots of apps and app servers strewn about.
> Datomic is perfect for probably 90% of small-ish backoffice systems that never have to be web scale (i.e. most of what I do at work).
I don’t think I agree with this as stated. It is too squishy and subjective to say “perfect”.
More broadly, the above is not and should not be a cognitive “anchor point” for reasonable use cases for Datomic. Making that kind of claim requires a lot more analysis and persuasion.
Datomic always seemed like a really cool thing to use. However, I'm not familiar with Clojure or any other JVM based language, nor do I have the time to learn it. And I can't find any supported way to use it with other languages (I'm not even talking about popular frameworks), or am I missing something?
It doesn't feel like the people behind Datomic actually want to have users outside of the Clojure world, which will be rather limiting to adoption.
Something I've been curious about: how well (or badly) would it scale to do something similar on a normal relational DB (say, Postgres)?
You could have one or more append-only tables that store events/transactions/whatever you want to call them, and then materialized-views (or whatever) which gather that history into a "current state" of "entities", as needed
If eventual consistency is acceptable, it seems like you could aggressively cache and/or distribute reads. Maybe you could even do clever stuff like recomputing state only from the last event you had, instead of from scratch every time. How bad of an idea is this?
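A minimal sketch of that pattern on a vanilla SQL store (sqlite3 here so it stays self-contained; the table and view names are made up): an append-only event log, plus a view that folds it into the "current state" of each entity.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE events (
        seq    INTEGER PRIMARY KEY,   -- recording order, never updated
        entity TEXT, attr TEXT, value TEXT
    );
    -- "current state" = the latest value per (entity, attr)
    CREATE VIEW current_state AS
        SELECT entity, attr, value FROM events e
        WHERE seq = (SELECT MAX(seq) FROM events
                     WHERE entity = e.entity AND attr = e.attr);
""")

db.executemany("INSERT INTO events (entity, attr, value) VALUES (?, ?, ?)", [
    ("user-1", "name", "Ada"),
    ("user-1", "plan", "free"),
    ("user-1", "plan", "pro"),    # later event supersedes the earlier one
])

plan = db.execute(
    "SELECT value FROM current_state WHERE entity='user-1' AND attr='plan'"
).fetchone()[0]
```

In Postgres you'd likely make `current_state` a materialized view and refresh it on a schedule or trigger, which is exactly where the eventual-consistency trade-off comes in.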
One thing which is quite hard to do in Datomic is simple pagination on a large sorted dataset, as one can easily do with LIMIT/OFFSET in MySQL for example. There are solutions for some of the cases, but the general case is not solved, as far as I remember (it's been a while since I used it extensively).
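For what it's worth, the usual workaround is keyset ("seek") pagination: remember the last key of the previous page and seek past it. A sketch in Python/sqlite3 showing the generic technique (not Datomic's API — there you'd seek in a sorted index instead):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
               [(i, f"item-{i}") for i in range(1, 11)])

def page(after_id, size):
    """Fetch the next page by seeking past the last-seen key."""
    return db.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, size)).fetchall()

first = page(0, 3)                # ids 1, 2, 3
second = page(first[-1][0], 3)    # seek past id 3 -> ids 4, 5, 6
```

Unlike OFFSET, this stays fast on large tables (it's an index seek, not a scan-and-discard), but it only supports next/previous navigation, not jumping to an arbitrary page number — which matches the "general case is not solved" caveat above.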
> doing UPDATE in SQL feels pretty weird, as the default mode of operation of your _business data_ is to delete data, with no trace of who and why!
It's a good idea to version your schema changes in git using something like Liquibase; that gets rid of at least some of those pains. Liquibase works on a wide variety of databases, even graph databases like Neo4j.
I got the same feeling in Erlang many times: once write operations start happening in parallel you worry about atomic operations, and making an Erlang process centralize writes through its message queue always feels natural and easy to reason about.
I guess NuBank (Cognitect's owners) have concluded that the paid licensing business wasn't worth the hassle compared to having the developer time involved spent on other things.
Releasing only binaries, while I understand people being grumpy about it, seems like an interesting way of keeping their options open going forwards. Since it was always closed source, it now being 'closed source but free' is still a net win.
The Datomic/Cognitect/NuBank relationship is an interesting symbiotic dynamic and while I'm sure we can all think of ways it might go horribly wrong in future I rather hope it doesn't.
Based on experience with Prolog, I always thought using Datalog in a database like Datomic would mean being able to model your data using stored queries as a very expressive way of creating "classes". And that by nesting such queries, you'd alleviate the need for an ORM, and all the boilerplate and duplication of defining classes both in SQL and as objects in OO code ... since you already modelled your data in the database. Does Datomic live up to that vision?
Datomic Cloud is slow, expensive, resource intensive, designed in the baroque style of massively over-complicated CloudFormation astronautics. Hard to diagnose performance issues. Impossible to backup. Ran into one scenario where apparently we weren't quick enough to migrate to the latest version, AWS had dropped support for $runtime in Lambda, and it became impossible to upgrade the CloudFormation template. Had to write application code to export/reimport prod data from one cluster to another—there was no other migration path (and yes, we were talking to their enterprise support).
We migrated to Postgres and are now using a tenth of the compute resources. Our p99 response times went from 1.3-1.5s to under 300ms once all the read traffic was cut over.
Mother Postgres can do no wrong. Still, Datomic seems like a cool idea.
As someone who is using Datomic Pro in production for many years now I must agree with you. One time I began a project with Datomic Cloud and it was a disaster similar to what you described. I learned a lot about AWS, but after about half a year we switched to Datomic Pro.
There were some cool ideas in Datomic Cloud, like IONs and its integrated deployment CLI. But the dev workflow with Datomic Pro in the REPL, potentially connected to your live or staging database is much more interactive and fun than waiting for CodeDeploy.
I guess there is a reason Datomic Pro is the featured product on datomic.com again. It appears that Cognitect took a big bet with Datomic Cloud and it didn't take off. Soon after the NuBank acquisition happened. That being said, Datomic Cloud was not a bad idea, it just turned out that Datomic Pro/onPrem is much easier to use. Also of all their APIs, the "Peer API" of Pro is just the best IME, especially with `d/entity` vs. "pull" etc.
I don't doubt your story of course, and I love Postgres, but comparing apples to oranges no?
Datomic's killer feature is time travel.
Did you simply not use that feature once you moved off Datomic (and if so, why'd you pick Datomic in the first place)? Or are you using Postgres with some extension that adds it in?
Are they _forcing_ you to use CloudFormation? Or is it just the officially supported mechanism?
> Mother Postgres can do no wrong.
I'll say that Postgres is usually the answer for the vast majority of use cases. Even when you think you need something else to do something different, it's probably still a good enough solution. I've seen teams pitching another system just because they wanted to push a bunch of JSON. Guess what, PG can handle that fine and can even run SQL queries against it. PG can also access other database systems with its foreign data wrappers (https://wiki.postgresql.org/wiki/Foreign_data_wrappers).
The main difficulty is that horizontally scaling it is not trivial (although not impossible, and that can be improved with third-party companies).
> Datomic Cloud is slow, expensive, resource intensive, designed in the baroque style of massively over-complicated CloudFormation astronautics. Hard to diagnose performance issues. Impossible to backup.
You should give TerminusDB a go (https://terminusdb.com/), it's really OSS, the cloud version is cheap, fast, there are not tons of baroque settings, and it's easy to backup using clone.
TerminusDB is a graph database with a git-like model with push/pull/clone semantics, as well as a Datalog query language.
They say it's under the Apache 2 licence, so it is open source.
EDIT: I was wrong. They actually released binaries under the Apache licence, not the source code. Which is, to put it mildly, deceptive. I don't even have an idea what that actually means.
Someone (forget who, but he worked there) was giving a presentation of Datomic at some downtown (NYC) bank circa 2014, iirc. Per the presenter -- iirc someone asked a specific technical question -- even people working for the company don't get to see the full source. Only a small team has access to the full source, and he said he wasn't one of them.
Datomic is an event-sourced db, and it makes it hard to introduce retroactive corrections to the data when your program's semantics already rely on Datomic's time-travelling abilities: at some point you'll need to distinguish between event time and recording time, as explained in this excellent blog post: https://vvvvalvalval.github.io/posts/2018-11-12-datomic-even...
This is why I'd rather use XTDB [1], a database similar to Datomic in spirit, but with bitemporality baked in.
[1] https://www.xtdb.com
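The event-time vs recording-time distinction can be sketched with two time columns (hypothetical names): `valid_from` for when the fact was true in the world, and `recorded_at` for when the database learned it. A retroactive correction then appends a new row with an old `valid_from`:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE salary (
    employee TEXT, amount INTEGER,
    valid_from TEXT,      -- event time: when it became true
    recorded_at TEXT      -- recording time: when we learned it
)""")
db.executemany("INSERT INTO salary VALUES (?, ?, ?, ?)", [
    ("ada", 100, "2023-01-01", "2023-01-01"),
    # On Mar 1 we learn the Jan 1 salary was really 110 all along:
    ("ada", 110, "2023-01-01", "2023-03-01"),
])

def salary_as_of(valid, known_by):
    """What did we believe, as of `known_by`, the salary was at `valid`?"""
    r = db.execute("""
        SELECT amount FROM salary
        WHERE employee='ada' AND valid_from <= ? AND recorded_at <= ?
        ORDER BY valid_from DESC, recorded_at DESC LIMIT 1
    """, (valid, known_by)).fetchone()
    return r[0]

before = salary_as_of("2023-02-01", "2023-02-01")   # 100: correction not yet recorded
after  = salary_as_of("2023-02-01", "2023-04-01")   # 110: correction applied retroactively
```

With only one time axis (Datomic's transaction time) you can answer the first question but not express the retroactive correction cleanly, which is the point the blog post makes.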
Datomic is an operational database management system - designed for transactional, domain-specific data. It is not designed to be a data warehouse, nor a high-churn high-throughput system (such as a time-series database or log store).
It is a good fit for systems that store valuable information of record, require developer and operational flexibility, need history and audit capabilities, and require read scalability.
> Datomic binaries are provided under the Apache 2 license which grants all the same rights to a work delivered in object form.
So... no?
(I say that, but "Datomic binaries" presumably refers to compiled JVM class files; and JVM bytecode is notoriously easy to decompile back to legible source code, with almost all identifiers intact. Would Apache-licensing a binary, imply that you have the right to decompile it, publish an Apache-licensed source-code repo of said decompilation, and then run your own FOSS project off of that?)
Every single day I wish the architects at my current job had chosen Datomic instead of Postgresql.
It would have saved us so so much time and trouble.
The time traveling ability alone would have been so useful so many times.
Also the ability to annotate transactions is awesome. So many goodies.
Here's a good summary: https://medium.com/@val.vvalval/what-datomic-brings-to-busin...
This doesn’t quite reflect the history. Datomic had various free/trial options. They evolved a little bit. Someone who watched the pricing and licenses very closely probably could do a better timeline than I could.
Not complaining about the actual announcement itself here: seems pretty sweet all things considered. But the "Is it Open Source?" section should lead with "No." It's not a complicated question, and it's not a complicated answer. I think it's weird to talk about having "all the same rights" without explaining why that matters particularly (it does matter, it's just not explained much!) but it is somewhat tangential to the question being posed, which has a very clear and straightforward answer.
I hope more companies consider this unusual arrangement at least as an alternative to other approaches. Permissively licensed binaries can come in handy, though it certainly comes with its risks. For example, Microsoft released the binaries for its WebView2 SDK under the BSD license; this is nice of course, but the side effect is that we can (and did) reverse engineer the loader binary back to source code. I suspect that's unlikely to happen for any substantially large commercial product, and I am not a lawyer so I can't be sure this isn't still legally dubious, but it's still worth considering: the protections of a EULA are completely gone here, if you just distribute binaries under a vanilla permissive open source license.
> Datomic binaries are provided under the Apache 2 license which grants all the same rights to a work delivered in object form.
That doesn't answer the question at all. I assume the answer is no, because otherwise they would just say yes, and have a link to the source code somewhere. But that is such a weird, and possibly duplicitous way to answer.
I really like Clojure and the ideas behind Datomic but free without source is a trap, every time. They have to make money somehow, but they already sold to a bank. If that bank wants devs willing to work on their systems after the current generation moves on, I think they'd be better off going open source and to continue paying good devs to work on it. Everyone already knows lock-in is bad for businesses. Devs will seek non-proprietary solutions first, if they can't find it, there are already plenty of proven proprietary solutions they'll settle on way before Datomic. Open the source, sell the support.
Since the conversation seems to be focusing on the Apache 2.0 license, what would you do? Clearly there isn't a lot of precedent for "closed-source, free-to-use" licenses.
In this case Datomic maintains development control over their product and "source of truth" is still themselves, and the implicit assumption is that you enthusiastically use their product for free with no strings attached because you respect them as the source of truth.
My personal experience was using Datomic backed by DynamoDB, at the second Clojure company I worked at. In particular I remember feeling like it was hard to anticipate and understand its performance characteristics, and how indices could be leveraged effectively. Maybe if we had chosen Postgres as a backing store that would have been better? I dunno.
Using it was pretty nice at the scale of a small startup with a motivated team, but scaling it up organizationally-speaking was a challenge due to Datalog's relative idiosyncrasy and poor tooling around the database itself. This was compounded by the parallel challenge of keeping a Clojure codebase from going spaghetti-shaped, which happens in that language when teams scale without a lot of "convention and discipline"--it may be easier to manage otherwise. All of that said, this was years ago so maybe things have changed.
At this point I'd choose either PostgreSQL or SQLite for any project I'm getting started with, as they are both rock-solid, full-featured projects with great tooling and widespread adoption. If things need to scale a basic PostgreSQL setup can usually handle a lot until you need to move to e.g. RDS or whatever, and I'm probably biased but I think SQL is not really that much worse than Datalog for common use-cases. Datalog is nice though, don't get me wrong.
EDIT: one point I forgot to make: the killer feature of being an immutable data store that lets you go back in time is in fact super cool, and it's probably exactly what some organizations need, but it is also costly, and I suspect the number of organizations who really need that functionality is pretty small. The place I was at certainly didn't, which is probably part of the reason for the friction I experienced.
https://sayartii.com/ is using Datomic backed by Postgres, which I set up on Linode. That was all done back in 2020 and I haven't needed to touch it. The site now gets ~180M monthly reqs and I store an enormous amount of analytic data in Datomic (it was supposed to be temporary) so users can see impressions/clicks per day for each advertisement. I'm surprised it's still working.
The development experience is extremely nice using Clojure. I've used it for two other projects and it has been very reliable. My latest project didn't really need any of its features compared to a traditional RDBMS, but I opted for it anyway so I don't have to write SQL.
Congratulations to Rich Hickey's children!! I hope your college experience was excellent. Disclaimer: that is how Rich explained why Datomic stayed closed source.
So is any cloud-managed DB offering, and at that scale we're talking very small costs anyway.
Why Datomic instead?
This is Ions in the Cloud version, or, for the on-prem version, the in-process peer library.
How do they scale it for Nubank? (millions of users)
Open sourcing the database would help with that.
I guess they don't claim to be open source; they're claiming to be free, which is, in itself, awesome.
Last time I checked, you couldn't push binaries to Maven Central without also releasing the source. That may have changed.
I watched a lot of that and used Clojure full-time for five years. Wonder what he's up to these days.
I think they went commercial way too fast, and needed a freemium model to actually get market share.
There's a reasonably interesting writeup of the tech details that helps show off Datomic's value: https://www.zsolt.blog/2021/01/Roam-Data-Structure-Query.htm... https://news.ycombinator.com/item?id=29295532
This is cool as well. It's a CloudFormation-template-based product you can deploy from the AWS Marketplace.
Freeware has been a thing for a mere four decades now.
https://en.wikipedia.org/wiki/Freeware
https://www.datomic.com/customers.html