tdrhq's comments

tdrhq | 10 months ago | on: Using Git-upload-pack for a simpler CI integration

Yes, but our SaaS tool required our customers to not use sparse or shallow checkouts since we needed the git commit-graph.

Using upload-pack allowed us to remove that constraint, since even in a shallow clone we can still get the commit graph via SSH from the remote.

tdrhq | 10 months ago | on: Using Git-upload-pack for a simpler CI integration

Azure DevOps has an additional requirement that Git clients support a protocol feature called "multi-ack". We don't support it yet, and we didn't think we'd need it.


Rather than blocking our roll-out on implementing multi-ack, we just disabled this for Azure DevOps for now. We do have a fallback as long as the user isn't using shallow clones.

tdrhq | 1 year ago | on: Building a highly-available web service without a database

> The paper that definitionally is Raft doesn't tell you how to interact with durable storage.

That's being a bit pedantic. Yeah, I did mean that any respectable library implementing Raft would handle all of this correctly.

> without having to crawl some Btrees by hand.

This is not how I query an index. First, we don't even use B-trees: most of the time it's just hash tables, and otherwise a simpler form of binary search tree. Either way, it's completely abstracted away in the library I'm using. So if I'm searching for companies with a given name, in my code it looks like '(company-with-name "foobar")'. If I'm looking for the users that belong to a specific company, it looks like '(users-for-company company)'.

So I still think you're overestimating the benefits of a query engine.

tdrhq | 1 year ago | on: Building a highly-available web service without a database

> Upgrades are especially challenging

Yes indeed! But this doesn't apply to a startup in the Explore phase, where you don't need replication; that's how we operated for a long time. This is the phase where this architecture is most useful for product iteration.

But you're right: once you start using replication in the Expand phase there certainly are engineering challenges, but they're all solvable. It might help that in Common Lisp we can hot-reload code, which makes some migrations a lot easier.

tdrhq | 1 year ago | on: Building a highly-available web service without a database

We used an existing library called bknr.datastore to handle this part, so we didn't have to reinvent the wheel :) I mentioned that at the end of the blog post, but I wanted to build up the idea for people who have no prior knowledge about how such things work.

tdrhq | 1 year ago | on: Building a highly-available web service without a database

To clarify, as I think some people have misunderstood: we used an existing library called bknr.datastore to handle the "database" part of the in-memory store, so we didn't have to invent too much. Our only innovation here was during the Expand phase, where we put that datastore behind a Raft replication.

tdrhq | 1 year ago | on: Building a highly-available web service without a database

Good catch on the metrics!

We do use Preset for metrics and dashboards, and obviously Preset isn't going to talk to our in-memory database.

So we do have a separate MySQL database where we just push analytics data (e.g. each time an event of interest happens). We never read from this database, and the data is schemaless JSON.

Preset then queries from this database for our metrics purposes.

tdrhq | 1 year ago | on: Building a highly-available web service without a database

Ah, sure: I did not consider Redis at all. My goal in the Explore phase was to keep the data in the same process as my code, and replacing MySQL with any other database doesn't really help here. This was a developer-productivity goal, not a performance goal.

tdrhq | 1 year ago | on: Building a highly-available web service without a database

Raft does do persistence and crash recovery, at least of the transaction logs.

What you need from your side (and there are libraries that already do this): a) a mechanism to snapshot all the data, and b) an easy in-memory mechanism to create indexes on fields--not strictly needed, but it definitely makes things a lot easier to work with.

Bespoke data structures are just simple classes, so if you're familiar with traversing simple objects in the language of your choice, you're all set. You might be overestimating the benefits of a query engine (I've worked at multiple places that used MySQL extensively, and have built heavily scaled software on MySQL in the past).

tdrhq | 1 year ago | on: Building a highly-available web service without a database

> It's quite odd that an argument grounded on performance claims

I probably did a bad job then, because everything in the blog post was meant to be developer productivity claims, not performance claims. (I come from a developer productivity background. I'm decent at performance stuff, but it's not what excites me, since for most companies my size performance is not critical as long as it scales.)

tdrhq | 1 year ago | on: Building a highly-available web service without a database

Good question!

And this comes to the difference between Explore phase and Expand phase.

In the Explore phase, data migration was just running code on the production server via a REPL. Some migrations such as adding/removing fields are just a matter of hot-reloading code in CL, so there weren't a lot of migrations that we had to manually run.

In the Expand phase, once you add replication, this does become hard and we did roll out our own migration framework. But by this point we already have a lot of code built out, so we weren't going to replace it with a transactional database.

Essentially, we optimized for the Explore phase, and "dealt with the consequences" in the Expand phase (but the consequences aren't as bad as you might think).

tdrhq | 1 year ago | on: Building a highly-available web service without a database

> Wait, so you’re blocking on a Raft round-trip to make forward progress? That’s the correct decision wrt durability, but…

Yeah. I hope it was clear in my post that the goal was developer productivity, not performance.

The round trip is only an issue on writes, and reads are super fast. At least in my app, this works out great. The writes also parallelize nicely with respect to the round trips, since the underlying Raft library just bundles multiple transactions together. Where it is a bottleneck is if you're writing multiple times sequentially on the same thread.

The solution there is you create a single named transaction that does the multiple writes. Then the only thing that needs to be replicated is that one transaction even though you might be writing multiple fields.

> it’s been a few years since I’ve had to debug Paxos

And this is why I wouldn't have recommended this with Paxos. Raft on the other hand is super easy for anyone to understand.

tdrhq | 1 year ago | on: Building a highly-available web service without a database

I haven't seen either of these! But I have to say, my inspiration here came from existing libraries. (My only innovation here is taking an existing library that did the whole transaction log thing, and putting it behind a Raft cluster.)

The batching thing is something we can easily do with the library that I'm using. It allows us to define functions as arbitrary transactions. Within the transaction I can do anything that changes state, including changing multiple fields, so we don't have to keep flushing the log after every field.

tdrhq | 1 year ago | on: Building a highly-available web service without a database

I think it's important to understand that every startup goes through three phases: Explore, Expand, Extract. What's simple in one phase isn't simple in the other.

A transactional database is simple in Expand and Extract, but adds overhead during the Explore phase, because you're focusing on infrastructure issues rather than the product. Data reliability isn't critical in the Explore phase either: you don't have customers yet, so you don't have data.

Having everything in memory with bknr.datastore (without replication) is simple in the Explore phase, but once you get to Expand phase it adds operational overhead to make sure that data is consistent.

But by the time I've reached the Expand phase, I've already proven my product and I've already written a bunch of code. Rewriting it with a transactional database doesn't make sense, and it's easier to just add replication on top of it with Raft.

tdrhq | 1 year ago | on: Building a highly-available web service without a database

> This is actually not an easy thing to do. If your shutdowns are always clean SIGSTOPs, yes, you can reliably flush writes to disk. But if you get a SIGKILL at the wrong time, or don’t handle an io error correctly, you’re probably going to lose data.

Thanks for the comment! This is handled correctly by Raft/braft. With Raft, a transaction isn't considered committed until a majority of nodes have written it to their logs. So if one node's transaction log gets corrupted, that node will restore itself and fetch the latest transaction logs from the other nodes.

> I’m sorry, but I don’t think this was as persuasive as you meant it to be.

I wasn't trying to be persuasive about this. :) I was trying to drive home the point that you don't need a massively distributed system to make a useful startup. I think some founders go the opposite direction and try to build something that scales to a billion users before they even get their first user.
