item 39929185

Saving cloud costs by writing our own database

211 points | wolframhempel | 2 years ago | hivekit.io

160 comments

[+] RHSeeger|2 years ago|reply
> we’ve replaced our $10k/month Aurora instances with a $200/month Elastic Block Storage (EBS) volume.

Without any intent to insult what you've done (because the information is interesting and the writeup is well done)... how do the numbers work out when you account for actually implementing and maintaining the database?

- Developer(s) time to initially implement it

- PjM/PM time to organize initial build

- Developer(s) time for maintenance (fix bugs and enhancement requirements)

- PjM/PM time to organize maintenance

The cost of someone to maintain the actual "service" (independent of the development of it) is, I assume, either similar or lower, so there's probably a win there. I'm assuming you have someone on board who was in charge of making sure Aurora was configured and being used correctly, and it would be just as easy, if not easier, to do the same for your custom database.

The cost of $120,000/year for Aurora seems like it would be less than the cost of development/organization time for the custom database.

Note: It's clear you have other reasons for needing your custom database. I get that. I was just curious about the costs.

[+] g9yuayon|2 years ago|reply
> PjM/PM time to organize initial build

This sounds like what a big company, or a disorganized one, would need. For a sufficiently efficient company, a project like this needs just one or two dedicated engineers.

In fact, I can't imagine why this project needs a PM at all. The database is used by engineers and is built by engineers. Engineers should be their own PMs. It's like saying we need a PM for a programming language; no, the compiler writer must be the language designer and must use the language. Those who do not use a product, or lack in-depth knowledge of the domain, should not be the PM of the product.

[+] preommr|2 years ago|reply
I feel like the word "database" is throwing people off because they're comparing it with something like MySQL/Postgres, when this seems only slightly more complex than a k/v store written to a file, with some extra indexing, where data integrity is a low priority. That shouldn't take too much time, and should be fairly isolated on the tech side, so little involvement from product/project managers.
[+] rdtsc|2 years ago|reply
> The cost of 120,000/year for Aurora seems like it would be less than the cost of development/organization time for the custom database

Only if they planned on hiring someone just to develop this new database, and would let that person go immediately if they switched to Aurora. If said developer was already costing them $250k/year to maintain and develop the application anyway, building this instead of working on top of Aurora seems like a good way to save $100k/year.

[+] ReflectedImage|2 years ago|reply
Well presumably they need only 1/3 of a developer to do this and they intend to scale up 10x in the next 5 years.

$60,000 per year in-house vs $1,200,000 per year aurora. No brainer really.

[+] Spivak|2 years ago|reply
I think for this kind of thing their needs are so simple and well-suited to a bespoke implementation that it probably paid for itself in less than 4 months. This doesn't seem like a db implementation that's going to need dedicated maintenance.

They're operationally using a funny spelling of SQLite and I don't imagine anyone arguing that such a thing needs constant attention.

[+] deedasmi|2 years ago|reply
Don’t forget this is a largely one time cost vs Aurora, which scales cost with usage.

Also they said their current volume is around 13k/second. They’ve built the new platform for 30k/sec per node. This should last them a long time with minimal maintenance.

[+] kalleboo|2 years ago|reply
Using Aurora is also not free in terms of cost of development. Developers need to be trained on its implementation, features, constraints, and performance implications, and need to keep up with API changes, etc.
[+] mbrumlow|2 years ago|reply
The problem is all those people you listed would still exist plus the 120k bill to Amazon.

They may or may not be doing other things depending on the company size and state.

You could drop the PM, engineers writing for engineers don’t need a PM.

You will likely hire similarly costed engineers to maintain the database stack anyways.

You basically hit all the talking points big cloud has brainwashed people into believing, but every day we see stories of a handful of engineers doing something we are told can't be done and saving millions in cloud costs.

It's so painful to watch. Software engineering became a thing because you could hire an engineer to solve your problem; then big business stepped in, told you your problem was something else, and went out of its way to stifle innovation by setting industry standards that only guarantee you use cloud services.

Any company that wants to own its destiny knows to stay away from lock-in.

[+] samatman|2 years ago|reply
I would imagine, as someone with no special insight into goings-on at Hivekit, that the answer is intended scale.

They mention 13.5k simultaneous connections. The US has 4.2 million tractors alone, just the US, just tractors. If they get 10% of those tractors on the network that's a 30x to their data storage needs. So multiply that across the entire planet, and all the use cases they hope to serve.

Investing time early on so that they can store 50x data-per-dollar is almost certainly time well spent.

[+] donohoe|2 years ago|reply
I came here to ask the same question.

If this db requires 1 full-time developer, then the cost immediately makes it not worth it (assuming salary + benefits > $120k/yr).

As you say, without details it’s hard to know if this was a good idea.

[+] exe34|2 years ago|reply
> PjM/PM

What do you need them for?

[+] jrockway|2 years ago|reply
Everyone seems fixated on the word database and the engineering cost of writing one. This is a log file. You write data to the end of it. You flush it to disk whenever you've filled up some unit of storage that is efficient to write to disk. Every query is a full table scan. If you have multiple writers, this works out very nicely when you have one API server per disk; each server writes its own files (with a simple mutex gating the write out of a batch of records), and queries involve opening all the files in parallel and aggregating the result. (Map, shuffle, reduce.)
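
The pattern described above can be sketched in a few dozen lines. This is a minimal illustration, not Hivekit's actual code; `AppendLog`, `scan`, and the length-prefixed framing are all assumptions made for the example:

```python
import struct

class AppendLog:
    """Append-only log: length-prefixed records, buffered into batched flushes."""

    def __init__(self, path, batch_bytes=64 * 1024):
        self.f = open(path, "ab")        # append-only: writes always go to the end
        self.buf = bytearray()
        self.batch_bytes = batch_bytes   # flush in disk-friendly chunks

    def append(self, record: bytes):
        # buffer a length-prefixed record until a full batch accumulates
        self.buf += struct.pack("<I", len(record)) + record
        if len(self.buf) >= self.batch_bytes:
            self.flush()

    def flush(self):
        self.f.write(self.buf)
        self.f.flush()
        self.buf.clear()

def scan(path):
    """Every query is a full table scan: walk the records front to back."""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                return
            (n,) = struct.unpack("<I", header)
            payload = f.read(n)
            if len(payload) < n:
                return  # truncated tail, e.g. after a crash
            yield payload
```

With one file per API server (a mutex gating each batched `flush`), a query would open all the files in parallel, run `scan` on each, and aggregate the results: map, shuffle, reduce.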

Atomic: not applicable, as there are no transactions. Consistent: no, as there is no protection against losing the tail end of writes (consider "no space left on device" halfway through a record). Isolated: not applicable, as there are no transactions. Durable: no, the data is buffered in memory before being written to the network (EBS is the network, not a disk).

So with all of this in mind, the engineering cost is not going to be higher than $10,000 a month. It's a print statement.

If it sounds like I'm being negative, I'm not. Log files are one of my favorite types of time series data storage. A for loop that reads every record is one of my favorite query plans. But this is not what things like Postgres or Aurora aim to do; they aim for things like "we need to edit past data several times per second and derive some of those edits from data that is also being edited". Now you have some complexity, and a big old binary log file and some for loops aren't really going to get you there. But if you don't need those things, then you don't need those things, and you don't need to pay for them.

The question you always have to ask, though, is have you reasoned about the business impacts of losing data through unhandled transactional conflicts? "read committed" or "non-durable writes" are often big customer service problems. "You deducted this bill payment twice, and now I can't pay the rent!" Does it matter to your end users? If not, you can save a lot of time and money. If it does, well, then the best-effort log file probably isn't going to be good for business.

[+] mdaniel|2 years ago|reply
Anytime I hear "we need to blast in per-second measurements of ..." my mind jumps to "well, have you looked at the bazillions of timeseries databases out there?" Because the fact those payloads happen to be (time, lat, long, device_id) tuples seems immaterial to the timeseries database and can then be rolled up into whatever level of aggregation one wishes for long-term storage

It also seems that just about every open source "datadog / new relic replacement" is built on top of ClickHouse, and even they themselves allege multi-petabyte capabilities <https://news.ycombinator.com/item?id=39905443>

OT1H, I saw the "we did research" part of the post, and I for sure have no horse in your race of NIH, but "we write to EBS, what's the worst that can happen" strikes me as ... be sure you're comfortable with the tradeoffs you've made in order to get a catchy blog post title

[+] speedgoose|2 years ago|reply
ClickHouse is one of the few databases that can handle most of the time-series use cases.

InfluxDB, the most popular time-series database, is optimised for a very specific kind of workload: many sensors publishing frequently to a single node, and frequent queries that don't go far back in time. It's great for that. But it doesn't support slightly more advanced queries, such as an average over two sensors. It also doesn't scale, and is pretty slow at querying far back in time due to its architecture.

TimescaleDB is a bit more advanced, because it's built on top of PostgreSQL, but it's not very fast. It's better than vanilla PostgreSQL for time-series, though.

The TSM Bench paper has interesting figures, but in short: ClickHouse wins or manages well in almost all benchmarks.

https://dl.acm.org/doi/abs/10.14778/3611479.3611532

https://imgur.com/a/QmWlxz9

Unfortunately, the paper didn't benchmark DuckDB, Apache IoTDB, and VictoriaMetrics. They also didn't benchmark proprietary databases such as Vertica or BigQuery.

If you deal with time-series data, ClickHouse is likely going to perform very well.

[+] Too|2 years ago|reply
Apache Parquet as data format on disk seems to be popular these days for similar DIY log/time series applications. It can be appended locally and flushed to S3 for persistence.
[+] robertlagrant|2 years ago|reply
> but "we write to EBS, what's the worst that can happen" strikes me as ... be sure you're comfortable with the tradeoffs you've made in order to get a catchy blog post title

In what way?

[+] Spivak|2 years ago|reply
I mean, if you spun up Postgres on EC2 you would be directly writing to EBS, so that's not really the part I'm worried about. I'm more worried about the lack of replication, seemingly no way to scale reads or writes beyond a single server, and no way to fail over uninterrupted.

I'm guessing it doesn't matter for their use-case, which is a good thing. When you realize you only need this teeny subset of db features and none of the hard parts, writing your own starts to look feasible.

[+] yau8edq12i|2 years ago|reply
Wasn't this already discussed here yesterday? The main criticism of the article is that they didn't write a database, they wrote an append-only log system with limited query capabilities. Which is fine. But it's not a "database" in the sense that someone would understand when reading the title.
[+] throwaway63467|2 years ago|reply
Why isn’t that a database? In my understanding a DB needs to be able to store structured data and retrieve it, so not sure what’s missing here? Many modern DBs are effectively append only logs with compaction and some indexing on top as well as a query engine, so personally I don’t think it’s weird to call this a DB.
[+] superq|2 years ago|reply
It's difficult to be pedantic about an ambiguous term like database without additional qualification or specificity.

There are more types of databases than those that end in "SQL".

A CSV file alone is a database. The rows are, well, rows. So is a DBM file, which is what MySQL was originally built on (might still be). Or an SQLite file.

The client or server API doesn't have to be part of the database itself.

[+] eatonphil|2 years ago|reply
> they wrote an append-only log system with limited query capabilities.

This sounds like a database to me.

[+] forrestthewoods|2 years ago|reply
Writing custom code that does exactly what you need and nothing else is underrated. More people should do that! This is a great example.
[+] mamcx|2 years ago|reply
> But it's not a "database" in the sense that someone would understand when reading the title.

Sure, because it is common for people to conflate a "database" (aka data in some kind of structure) with a paradigm (relational, SQL, document, k/v) and with a "database system", aka an app that manages the database.

[+] intelVISA|2 years ago|reply
> purpose built, in process storage engine that’s part of the same executable as our core server. It writes a minimal, delta based binary format

Get that engineer a sales gig, that's insane upselling of the reality: git commit -am 'added array to store structs'

[+] Retr0id|2 years ago|reply
If you described those needs to the average engineer, they'd correctly say "use a database".
[+] hmottestad|2 years ago|reply
Yeah. They basically defined a binary format. I wouldn’t call it a database either.
[+] xyst|2 years ago|reply
Sounds like Kafka to me, except they'd have to rewrite components like ksqlDB.
[+] zX41ZdbW|2 years ago|reply
Sounds totally redundant to me. You can write all location updates into ClickHouse, and the problem is solved.

As a demo, I've recently implemented a tool to browse 50 billion airplane locations: https://adsb.exposed/

Disclaimer: I'm the author of ClickHouse.

[+] MuffinFlavored|2 years ago|reply
> We want to be able to handle up to 30k location updates per second per node. They can be buffered before writing, leading to a much lower number of IOPS.

> This storage engine is part of our server binary, so the cost for running it hasn’t changed. What has changed though, is that we’ve replaced our $10k/month Aurora instances with a $200/month Elastic Block Storage (EBS) volume. We are using Provisioned IOPS SSD (io2) with 3000 IOPS and are batching updates to one write per second per node and realm.

I would be curious to hear what that "1 write per second" looks like in terms of throughput/size?
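
The post doesn't state the record size, but a back-of-envelope sketch is easy under assumed field widths. Everything here is a guess for illustration, not Hivekit's actual wire format:

```python
import struct

def encode_delta(dlat: int, dlon: int, dt_ms: int) -> bytes:
    # Hypothetical delta record: int16 lat/lon deltas
    # (units of 1e-5 degrees, ~1.1 m) + uint16 ms elapsed since the last fix.
    return struct.pack("<hhH", dlat, dlon, dt_ms)

UPDATES_PER_SECOND = 30_000            # their stated per-node ceiling
record = encode_delta(42, -17, 1000)   # one update as a delta from the previous fix
batch_bytes = len(record) * UPDATES_PER_SECOND
print(f"{len(record)} B/record -> ~{batch_bytes / 1024:.0f} KiB per one-second batch")
```

Under those assumptions a full 30k-update buffer is under 200 KiB per write per node and realm, which is why a single batched write per second fits comfortably in 3000 provisioned IOPS.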

[+] time0ut|2 years ago|reply
Good article.

> EBS has automated backups and recovery built in and high uptime guarantees, so we don’t feel that we’ve missed out on any of the reliability guarantees that Aurora offered.

It may not matter for their use case, but I don't believe this is accurate in a general sense. EBS volumes are local to an availability zone while Aurora's storage is replicated across a quorum of AZs [0]. If a region loses an AZ, the database instance can be failed over to a healthy one with little downtime. This has only happened to me a couple times over the past three years, but it was pretty seamless and things were back on track pretty fast.

I didn't see anything in the article about addressing availability if there is an AZ outage. It may simply not matter or maybe they have solved for it. Could be a good topic for a follow up article.

[0] https://aws.amazon.com/blogs/database/introducing-the-aurora...

[+] kumarm|2 years ago|reply
I built a similar system in 2002 using JGroups (JavaGroups at the time, before the open source project was acquired by JBoss), persisting asynchronously to a DB (Oracle at the time). Our scale even in 2002 was much higher than 13,000 vehicles.

The project, I believe, still appears as a success story on the JGroups website after 20+ years. I am surprised people are writing their own databases for location storage in 2024 :). There was no need to invent new technology in 2002, and definitely not in 2024.

[+] afro88|2 years ago|reply
These two sentences don't work together:

> [We need to cater for] Delivery companies that want to be able to replay the exact seconds leading up to an accident.

> We are ok with losing some data. We buffer about 1 second worth of updates before we write to disk

Impressive engineering effort on its own though!

[+] xyst|2 years ago|reply
This seems like they rewrote Kafka to me.

Even moderately sized Kafka clusters can handle the throughput requirement. Can even optimize for performance over durability.

Some limited query capability with components such as ksqldb.

Maybe offload historical data to blob storage.

Then again, Kafka is kind of complicated to run at these scales. Very easy to fuck up.

[+] the_duke|2 years ago|reply
I don't know what geospatial features are needed, but otherwise time series databases are great for this use case.

I especially like ClickHouse: it's generic but also a powerhouse that handles most things you throw at it, handles huge write volumes (with sufficient batching), supports horizontal scaling, and can offload long-term storage to S3 for much smaller disk requirements. The geo features in ClickHouse are pretty basic, but it does have some built-in geo datatypes and functions for e.g. calculating distances.

[+] kaladin_1|2 years ago|reply
I love the attitude, we didn't see a good fit so we rolled ours.

Sure it won't cover the bazillion cases the DBs out there do but that's not what you need. The source code is small enough for any team member to jump in and debug while pushing performance in any direction you want.

Kudos!

[+] CapeTheory|2 years ago|reply
It's amazing what can happen when software companies start doing something approximating real engineering, rather than just sitting a UI on top of some managed services.
[+] yunohn|2 years ago|reply
This is more a bespoke file format than a full blown database. It’s optimized for one table schema and a few specific queries.

Not a negative though, not everything needs a general purpose database. Clearly this satisfies their requirements, which is the most important thing.

[+] diziet|2 years ago|reply
As others had mentioned, probably hosting your own clickhouse instance could yield major savings while allowing for much more flexibility in the future for querying data. If your use case can be served by what clickhouse offers, gosh is it an incredibly fast and reliable open source solution that you can host yourself.
[+] bawolff|2 years ago|reply
Kind of misleading to not include the cost of developing it yourself.

I think everything is cheaper than cloud if you do it yourself when you don't count staffing cost.

[+] benrutter|2 years ago|reply
Yeah, and for most companies without a huge supply of developers, there's also the financial risk of having all your stuff blitzed when your home-spun solution fails.
[+] Simon_ORourke|2 years ago|reply
I've no doubt this is true; however, anyone I've ever met who exclaimed "let's create our own database" would be viewed as dangerous, unprofessional or downright uneducated in any business meeting. There's just too much that can go badly wrong, for all the sunk cost in getting anything up and running.
[+] mavili|2 years ago|reply
That is such a problem in today's world. Of course you don't want to re-invent the wheel and all that, but we must be open to the idea of having to do it. Innovation stagnates if people suggesting redoing something are immediately seen as "dangerous, unprofessional or downright uneducated"
[+] akira2501|2 years ago|reply
> would be viewed as dangerous, unprofessional or downright uneducated in any business meeting

Sounds like a great place to work.

> There's just too much can go badly wrong, for all the sunk cost in getting anything up and running.

Engineering is the art of compromise. In many cases the compromises would not be worth it, but that doesn't mean there are zero places where they would be, and eschewing the discussion out of fear of how it would be perceived is the opposite of engineering.

[+] rstuart4133|2 years ago|reply
A lot of people here are making very confident-sounding assertions, yet some are saying it's just an append-only log file and some imply it's sharded. Something everyone does agree on is that they are very vague about what geospatial features they need.

The one thing they do say is "no ACID". That implies no b-trees, because an unexpected stop means a corrupted b-tree. Perhaps they use a hash instead, but it would have to be a damned clever hash tree implementation to avoid the same problem. Or perhaps they just rebuild the index after a crash.

Even an append-only log file has to be handled carefully without ACID. An uncontrolled shutdown on most file systems will leave blocks of NULs in the file, and half-written records if they cross disk-block boundaries.
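
One common way to make that tail-of-file problem recoverable is to checksum each record and stop replay at the first frame that doesn't verify. A hypothetical sketch (nothing suggests this is what Hivekit actually does):

```python
import struct
import zlib

def frame(record: bytes) -> bytes:
    # length + CRC32 header lets a reader detect torn writes and NUL runs
    return struct.pack("<II", len(record), zlib.crc32(record)) + record

def recover(data: bytes):
    """Replay frames until the first one that fails to verify."""
    out, off = [], 0
    while off + 8 <= len(data):
        n, crc = struct.unpack_from("<II", data, off)
        if n == 0:
            break  # a run of NULs decodes as a zero-length frame: treat as end-of-log
        payload = data[off + 8 : off + 8 + n]
        if len(payload) < n or zlib.crc32(payload) != crc:
            break  # half-written record or corrupted block: stop here
        out.append(payload)
        off += 8 + n
    return out
```

On restart, everything up to the first bad frame is kept and the corrupt tail is discarded, which matches an "ok with losing ~1 second of buffered updates" durability target.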

It's a tantalising headline, but after reading the 1,200 words I'm none the wiser on what they built or whether it meets their own specs. A bit of a disappointment.

[+] INTPenis|2 years ago|reply
That is such an insane headline.

You might as well say "we saved 100% of cloud costs by writing our own cloud".

[+] endisneigh|2 years ago|reply
It would be interesting to see a database built from the ground up for being trivial to maintain.

I use managed databases, but is there really that much to do for maintaining a database? The host requires some level of maintenance - changing disks, updating the host operating system, failover during downtime for machine repair, etc. if you use a database built for failover I imagine much of this doesn’t actually affect the operations that much assuming you slightly over provision.

For a database alone I think the work needed to maintain is greatly exaggerated. That being said I still think it’s more than using a managed database, which is why my company still does so.

In this case though, an append log seems pretty simple imo. Better to self host.