tdrhq's comments
tdrhq | 10 months ago | on: Using Git-upload-pack for a simpler CI integration
Rather than blocking our roll-out on implementing multi-ack, we just disabled this for Azure DevOps for now. We do have a fallback as long as the user isn't using shallow clones.
tdrhq | 10 months ago | on: Using Git-upload-pack for a simpler CI integration
tdrhq | 1 year ago | on: Building a highly-available web service without a database
That's being a bit pedantic. Yeah, I did mean that any respectable library implementing Raft would handle all of this correctly.
> without having to crawl some Btrees by hand.
This is not how I query an index. First, we don't even use Btrees; most of the time it's just hash tables, and otherwise a simpler form of binary search tree. But in both cases, it's completely abstracted away in the library I'm using. So if I'm searching for companies with a given name, in my code it looks like '(company-with-name "foobar")'. If I'm looking for users that belong to a specific company, it looks like '(users-for-company company)'.
So I still think you're overestimating the benefits of a query engine.
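Concretely, the kind of abstraction I mean looks something like this. This is a hypothetical Python sketch (the real code is Common Lisp, and every name here is invented for illustration):

```python
# Hypothetical Python sketch of in-memory indexes hidden behind small
# lookup functions. The real code is Common Lisp; all names are invented.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Company:
    name: str

@dataclass
class User:
    email: str
    company: Company

# The "indexes": plain hash tables kept next to the objects.
_companies_by_name = defaultdict(list)
_users_by_company = defaultdict(list)   # keyed by id(company)

def add_company(company):
    _companies_by_name[company.name].append(company)
    return company

def add_user(user):
    _users_by_company[id(user.company)].append(user)
    return user

def company_with_name(name):
    """Analogue of (company-with-name "foobar")."""
    matches = _companies_by_name.get(name)
    return matches[0] if matches else None

def users_for_company(company):
    """Analogue of (users-for-company company)."""
    return _users_by_company.get(id(company), [])
```

Callers only ever see the two lookup functions; whether the index underneath is a hash table or a tree never leaks into application code.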
tdrhq | 1 year ago | on: Building a highly-available web service without a database
Yes indeed! But this doesn't apply to a startup in the Explore phase, where you don't need replication, and that's how we did it for a long time. This is the phase where this architecture is the most useful for product iteration.
But you're right, once you start using replication in the Expand phase, there certainly are engineering challenges, but they're all solvable. It might help that in Common Lisp we can hot-reload code, which makes some migrations a lot easier.
tdrhq | 1 year ago | on: Building a highly-available web service without a database
We do use Preset for metrics and dashboards, and obviously Preset isn't going to talk to our in-memory database.
So we do have a separate MySQL database where we just push analytics data (e.g., each time an event of interest happens). We never read from this database, and the data is schemaless JSON.
Preset then queries from this database for our metrics purposes.
tdrhq | 1 year ago | on: Building a highly-available web service without a database
What you need on your side (and there are libraries that already do this): a) a mechanism to snapshot all the data, and b) an easy in-memory mechanism to create indexes on fields (not strictly needed, but it makes things a lot easier to work with).
Bespoke data structures are just simple classes, so if you're familiar with traversing simple objects in the language of your choice, you're all set. You might be overestimating the benefits of a query engine (and I say this having worked at multiple places that used MySQL extensively to build heavily scaled software).
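To make (a) and (b) concrete, here's a toy Python sketch (hypothetical names; the real stack is Common Lisp with bknr.datastore): snapshot only the objects, and treat indexes as derived data rebuilt on load:

```python
# Hypothetical sketch of (a) snapshotting all data and (b) in-memory
# indexes. Illustrative Python only; the real library is Common Lisp.
import pickle

class Store:
    def __init__(self):
        self.objects = []   # the data that actually gets persisted
        self.by_id = {}     # an index: derived data, never persisted

    def add(self, obj_id, obj):
        self.objects.append((obj_id, obj))
        self.by_id[obj_id] = obj

    def snapshot(self, path):
        # (a) dump every object; indexes are deliberately left out
        with open(path, "wb") as f:
            pickle.dump(self.objects, f)

    @classmethod
    def restore(cls, path):
        # (b) indexes come back for free by re-adding each object
        store = cls()
        with open(path, "rb") as f:
            for obj_id, obj in pickle.load(f):
                store.add(obj_id, obj)
        return store
```

The design point is that only the objects are the source of truth; any index can be rebuilt from them, so snapshots stay simple.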
tdrhq | 1 year ago | on: Building a highly-available web service without a database
I probably did a bad job then, because everything in the blog post was meant as a developer-productivity claim, not a performance claim. (I come from a developer productivity background. I'm decent at performance work, but it's not what excites me, since for most companies our size performance isn't critical as long as it scales.)
tdrhq | 1 year ago | on: Building a highly-available web service without a database
And this comes to the difference between Explore phase and Expand phase.
In the Explore phase, data migration was just running code on the production server via a REPL. Some migrations such as adding/removing fields are just a matter of hot-reloading code in CL, so there weren't a lot of migrations that we had to manually run.
In the Expand phase, once you add replication, this does become hard and we did roll out our own migration framework. But by this point we already have a lot of code built out, so we weren't going to replace it with a transactional database.
Essentially, we optimized for the Explore phase, and "dealt with the consequences" in the Expand phase (but the consequences aren't as bad as you might think).
tdrhq | 1 year ago | on: Building a highly-available web service without a database
Yeah. I hope it was clear in my post that the goal was developer productivity, not performance.
The round trip is only an issue on writes; reads are super fast. At least in my app, this works out great. The writes also parallelize nicely with respect to the round trips, since the underlying Raft library just bundles multiple transactions together. It does become a bottleneck if you're writing multiple times sequentially on the same thread.
The solution there is you create a single named transaction that does the multiple writes. Then the only thing that needs to be replicated is that one transaction even though you might be writing multiple fields.
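A toy Python sketch of that idea (hypothetical; the real implementation is a Common Lisp library): the transaction is registered by name, the log records one entry per call, and followers replay by name:

```python
# Hypothetical sketch of a "named transaction": one log entry replays a
# named function that performs several writes, instead of one entry per
# field. Python for illustration only; the real library is Common Lisp.
log = []            # stands in for the replicated transaction log
TRANSACTIONS = {}   # name -> function, so entries can be replayed

def deftransaction(fn):
    TRANSACTIONS[fn.__name__] = fn
    def wrapper(*args):
        log.append((fn.__name__, args))  # one entry, one replication
        return fn(*args)
    return wrapper

@deftransaction
def rename_and_move(user, name, company):
    # Two field writes, but only the single log entry above replicates.
    user["name"] = name
    user["company"] = company

def replay():
    # A follower rebuilds its state by replaying entries in log order.
    # (In reality arguments would be object references resolved locally.)
    for name, args in log:
        TRANSACTIONS[name](*args)
```

However many fields the function touches, the cluster only has to agree on one log entry per call.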
> it’s been a few years since I’ve had to debug Paxos
And this is why I wouldn't have recommended this with Paxos. Raft on the other hand is super easy for anyone to understand.
tdrhq | 1 year ago | on: Building a highly-available web service without a database
The batching thing is something we can easily do with the library that I'm using. It allows us to define functions as arbitrary transactions. Within the transaction I can do anything that changes state, including changing multiple fields, so we don't have to keep flushing the log after every field.
tdrhq | 1 year ago | on: Building a highly-available web service without a database
I don't know if they're production grade. I was drawn to Braft because of Baidu's backing.
tdrhq | 1 year ago | on: Building a highly-available web service without a database
I'm not sure what qualifies as experience if Meta/Google doesn't. ;)
tdrhq | 1 year ago | on: Building a highly-available web service without a database
A transactional database is simple in the Expand and Extract phases, but adds overhead during the Explore phase, because you're focusing on infrastructure issues rather than product. Data reliability isn't critical in the Explore phase either: you don't have customers yet, so you just don't have data.
Having everything in memory with bknr.datastore (without replication) is simple in the Explore phase, but once you get to Expand phase it adds operational overhead to make sure that data is consistent.
But by the time I've reached the Expand phase, I've already proven my product and I've already written a bunch of code. Rewriting it with a transactional database doesn't make sense, and it's easier to just add replication on top of it with Raft.
tdrhq | 1 year ago | on: Building a highly-available web service without a database
Thanks for the comment! This is handled correctly by Raft/Braft. With Raft, before a transaction is considered committed it must be committed by a majority of nodes. So if a node's transaction log gets corrupted, it will restore and fetch the latest transaction log entries from the other nodes.
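Concretely, the majority rule is just this (an illustrative sketch, not braft's actual API):

```python
# The commit rule in one line: an entry counts as committed only once
# more than half the cluster has appended it, so any committed entry
# survives the loss or corruption of a minority of logs.
def is_committed(acks: int, cluster_size: int) -> bool:
    return acks > cluster_size // 2
```

With a 3-node cluster, 2 acks commit an entry, so a single corrupted log can always be repaired from the remaining copies.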
> I’m sorry, but I don’t think this was as persuasive as you meant it to be.
I wasn't trying to be persuasive about this. :) I was trying to drive home the point that you don't need a massively distributed system to make a useful startup. I think some founders go the opposite direction and try to build something that scales to a billion users before they even get their first user.
Using upload-pack allowed us to remove that constraint, since even in a shallow clone we can still get the commit graph via SSH from the remote.