What I find particularly ironic is that the title makes it feel like Rust gives a 5x performance improvement when it actually slows things down.
The problem: they have software written in Rust, and they need to use the libpg_query library, which is written in C. Because they can't use the C library directly, they had to use a Rust-to-C binding library that uses Protobuf for portability reasons. The problem is that it is slow.
So what they did is write their own non-portable but much more optimized Rust-to-C bindings, with the help of an LLM.
But had they written their software in C, they wouldn't have needed to do any conversion at all. It means they could have titled the article "How we lowered the performance penalty of using Rust".
I don't know much about Rust or libpg_query, but they probably could have gone even faster by getting rid of the conversion entirely. It would most likely have involved major adaptations and some unsafe Rust, though. Writing a converter has many advantages: portability, convenience, security, etc., but it has a cost, and ultimately, I think it is a big reason why computers are so fast and apps are so slow. Our machines keep copying, converting, serializing and deserializing things.
Note: I have nothing against what they did, quite the opposite, I always appreciate those who care about performance, and what they did is reasonable and effective, good job!
> What I find particularly ironic is that the title makes it feel like Rust gives a 5x performance improvement when it actually slows things down.
Rust didn't slow them down. The inefficient design of the external library did.
Calling into C libraries from Rust is extremely easy. It takes some work to create a safer wrapper around C libraries, but it's been done for many popular libraries.
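To illustrate (a minimal sketch, assuming a hypothetical C function `add` is linked into the binary; this is not the libpg_query API):

    // Declare the C function; the signature must mirror the C side.
    extern "C" {
        fn add(a: i32, b: i32) -> i32;
    }

    fn main() {
        // The call crosses the FFI boundary directly: no copies,
        // no serialization, just a plain function call.
        let sum = unsafe { add(2, 3) };
        println!("{sum}");
    }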
This is the first and only time I've seen an external library connected via a Rube Goldberg-like contraption with protobufs in the middle. That's the problem.
Sadly they went with the "rewrite to Rust" meme in the headline for more clickability.
>> But had they written their software in C, they wouldn't have needed to do any conversion at all. It means they could have titled the article "How we lowered the performance penalty of using Rust".
That's not really fair. The library was doing serialization/deserialization, which was a poor design choice from a performance perspective. They just made a more sane API that doesn't do all that extra work. It might best be titled "Replacing protobuf with a normal API to go 5 times faster."
BTW what makes you think writing their end in C would yield even higher performance?
I wonder why they didn't immediately FFI it: C is the easiest language to write Rust bindings for. It can get tedious if you're using many parts of a large API, but otherwise it's straightforward.
I write most of my applications and libraries in Rust, and lament that most of the libraries I'd like to FFI into are in C++ or Python, which are more difficult.
Protobuf sounds like the wrong tool. It has applications for wire serialization and similar, but is still kind of a mess there. I would not apply it to something that stays in memory.
Given that they heavily used LLMs for this optimization, it makes you wonder why they didn't use them to just port the C library to Rust entirely. I think the volume of library ports to more languages/the most performant languages is going to explode, especially given that it's a relatively deterministic effort as long as you have good tests, API contracts, etc.
> they had to use a Rust-to-C binding library that uses Protobuf for portability reasons.
That sounds like a performance nightmare, putting Protobuf of all things between the language and Postgres, I'm surprised such a library ever got popular.
I find the title a bit misleading. I think it should be titled "It's Faster to Copy Memory Directly than Send a Protobuf", which then makes it seem rather obvious that removing a serialization and deserialization step reduces runtime.
Protobuf does something important that copying memory cannot do: a protocol that can be changed separately on either end and things can still work. You have to build for "my client doesn't send some new data" (make a good default), or "I got extra data I don't understand" (ignore it). However the ability to upgrade part of the system is critical when the system is large and complex since you can't fix everything to understand your new feature without making the new feature take ages to roll out.
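A minimal sketch of those two rules, using serde_json in Rust as a stand-in for protobuf (the struct and field names are made up; requires the serde crate with its derive feature, plus serde_json):

    use serde::Deserialize;

    #[derive(Deserialize, Debug)]
    struct QueryStats {
        calls: u64,
        #[serde(default)] // newer field: absent in payloads from old senders
        total_time_ms: f64,
    }

    fn main() {
        // An old sender omits the new field, so the default (0.0) kicks in.
        let old: QueryStats = serde_json::from_str(r#"{"calls": 10}"#).unwrap();
        // A newer sender adds a field this reader doesn't know; it is ignored.
        let new: QueryStats = serde_json::from_str(
            r#"{"calls": 10, "total_time_ms": 1.5, "future_field": true}"#,
        )
        .unwrap();
        println!("{old:?} {new:?}");
    }

Protobuf bakes both behaviors into its wire format; the sketch just shows the contract any compatible-evolution scheme has to honor.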
Protobuf also handles a bunch of languages for you. The other team wants to write in a "stupid language" - you don't have to have a political fight to prove your preferred is best for everything. You just let that team do what they want and they can learn the hard way it was a bad language. Either it isn't really that bad and so the fight was pointless, or it was but management can find other metrics to prove it and it becomes their problem to decide if it is bad enough to be worth fixing.
TIL serializing a protobuf is only 5 times slower than copying memory, which is way faster than I thought it’d be. Impressive given all the other nice things protobuf offers to development teams.
Why don't we use standardized zero-copy data formats for this kind of thing? A standardized layout like Arrow means that the data is not tied to the layout/padding of a particular language, potential security problems like bounds checks are automatically handled by the tooling, and it works well over multiple communication channels.
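For a feel of what zero-copy access means, here is a hand-rolled sketch of the idea only (not Arrow's actual layout; the format is a hypothetical little-endian u32 count followed by packed u64 values):

    // Bounds are checked, but nothing is decoded into an intermediate
    // representation: values are read straight out of the buffer.
    fn nth_value(buf: &[u8], n: usize) -> Option<u64> {
        let count = u32::from_le_bytes(buf.get(0..4)?.try_into().ok()?) as usize;
        if n >= count {
            return None;
        }
        let start = 4 + n * 8;
        Some(u64::from_le_bytes(buf.get(start..start + 8)?.try_into().ok()?))
    }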
If they had made the headline something along the lines of "replacing protobuf with a native, optimized implementation", it would not have gotten the same attention as putting Rust in the title to attract the everything-in-rust-is-better crowd.
It's devbait; not many of us can resist bikeshedding about a title which obviously doesn't accurately reflect the article contents. And the article is self-aware enough to admit this, yet the title remains.
Since there seems to be some confusion in the comments about why pg_query chose Protobufs in the first place, let me add some context as the original author of pg_query (but not involved with PgDog, though Lev has shared this work by email beforehand).
The initial motivation for developing pg_query was for pganalyze, where we use it to parse queries extracted from Postgres, to find the referenced tables, and these days also rewrite and format queries. That use case runs in the background, and as such is much less performance critical.
pg_query actually initially used a JSON format for the parse output (AST), but we changed that to Protobuf a few major releases ago, because Protobuf makes it easy to have typed bindings in the different languages we support (Ruby, Go, Rust, Python, etc). Alternatives (e.g. using FFI directly) make sense for Rust, but would require a lot of maintained glue code for other languages.
All that said, I'm supportive of Lev's effort here, and we'll add some additional functions (see [0]) in the libpg_query library to make using it directly (i.e. via FFI) easier. But I don't see Protobuf going away, because in non-performance critical cases, it is more ergonomic across the different bindings.
[0]: https://github.com/pganalyze/libpg_query/pull/321
I mean, Cap'n Proto is written by the same person who created protobuf, so they are legit (and that somewhat jokey claim simply means it requires no parsing).
The title is misleading but the actual work is impressive - they optimized their Protobuf usage, not replaced it entirely.
This is a common pattern: "We switched to X and got 5x faster" often really means "We fixed our terrible implementation and happened to rewrite it in X."
Key lessons from this:
1. Serialization/deserialization is often a hidden bottleneck, especially in microservices where you're doing it constantly
2. The default implementation of any library is rarely optimal for your specific use case
3. Benchmarking before optimization is critical - they identified the actual bottleneck instead of guessing
For anyone dealing with Protobuf performance issues, before rewriting:
- Use arena allocation to reduce memory allocations
- Pool your message objects
- Consider if you actually need all the fields you're serializing
- Profile the actual hot path
Rust FFI has overhead too. The real win here was probably rethinking their data flow and doing the optimization work, not just the language choice.
I also thought I could trust MegaCorp. That's why I put all my code on their platform, code.google.com, and not on this obscure platform without any business model, github.
Well, that sucked. And why should I use protobuf, when I just need to share structs and arrays in memory (aka zero-copy) with a version field? Like everyone else has done for decades?
The performance of Protobuf is a joke. Why not use a zero-copy format so that serialization is free? For example, my format Lite³, which outperforms Google FlatBuffers by 242x: https://github.com/fastserial/lite3
Mmmm, I don't know, maybe because your library DIDN'T EXIST before November 2025? Or for any of a million other reasons why people use Protobuf and don't use Cap'n Proto and other zero-serialization libraries, like requiring a schema, established tooling for the language of their choice, etc.?
Seems like this has nothing to do with Rust or protobufs. The underlying PostgreSQL abstraction engine they'd picked had a wasteful serialization implementation (that happens to have been using protobuf). So pgdog dropped it and open-coded a serialization-free transfer using the C API.
Well, yeah. If there's a feature you don't need, you'll see value by coding around it. Some features turn out not to be needed by anyone, maybe this is one. But some people need serialization, and that's what protobufs are for[1]. Those people are very (!) poorly served by headlines telling them to use Rust (!!) instead of serialization.
[1] Though as always the standard litany applies: you actually want JSON, and not protobuf or ASN.1 or anything else. If you like some other technology better, you're wrong and you actually want JSON. If you think you need something faster, you probably don't, and JSON would suit your needs better. If you really, 100%, know for sure that you need it faster than JSON, then you're probably isomorphic to the folks in the linked article, shouldn't have been serializing at all, and should get to work open coding your own hooks on the raw backend.
Using a transport serialization and deserialization protocol for IPC: it is obvious why there was overhead, because it was an architectural decision to manage the communication that way.
I guess the old adage holds true here: if something gets 20% faster, something was improved; if it gets 10x faster, it was just built wrong.
Can someone explain how protobuf ended up in the middle here? I'm just totally confused; the C ABI exists in almost every language, why did they need protobuf here?
I don't know, but I have a guess. Someone didn't want to deal with the unsafety of memory allocated in C code. Serialize/deserialize makes it easy: no need for unsafe, no need to learn all the quirks of the C library allocating the memory.
I have had experience writing safe bindings to structures created in a C library, and it is a real pain. You spend a lot of time reverse-engineering C code to get an idea of the intent of those who wrote it. You need to know which pointers can address the same memory. You need to know which pointers can be NULL or just plain invalid. You need to know which pointers you get from C code or pass to it along with ownership, and which are just borrowed. It may be (and often is) unclear from the documentation, so you end up reading a lot of C code, trying to guess what the authors were thinking when writing it, generating hypotheses about the library's behavior (like "the library never does THIS with the pointer") and trying to prove them by finding all the code dealing with the pointer.
It can be easy in easy situations, or it can be really tricky and time consuming. So it can make sense to just insert serialization/deserialization to avoid dealing with C code.
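The ownership half of that work usually ends up encoded in a wrapper like this (a sketch with hypothetical C functions, not the real libpg_query API):

    use std::ffi::{c_char, CStr};

    // Hypothetical C API: pg_parse allocates, pg_free_result releases.
    extern "C" {
        fn pg_parse(query: *const c_char) -> *mut c_char;
        fn pg_free_result(result: *mut c_char);
    }

    // Owning wrapper: the ownership rules are decided once, here,
    // instead of being re-derived by every caller.
    struct ParseResult(*mut c_char);

    impl ParseResult {
        fn parse(query: &CStr) -> Option<Self> {
            let ptr = unsafe { pg_parse(query.as_ptr()) };
            // Exactly the kind of question you reverse-engineer:
            // can this ever be NULL?
            if ptr.is_null() { None } else { Some(ParseResult(ptr)) }
        }
    }

    impl Drop for ParseResult {
        fn drop(&mut self) {
            // Safety: the pointer came from pg_parse and is freed only here.
            unsafe { pg_free_result(self.0) }
        }
    }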
Gotta say, I love using PGDog. It has some fantastic built-in features, and I'm looking forward to testing out the improved query parser. Lev and the team are heroes.
At the scale we were using PGDog, enabling the previous form of the query parser was extremely expensive (we would have had to 16x our pgdog fleet size).
Many people are exclaiming that the title is baity, but I disagree. It seems like a perfectly fine title in the context of this blog, which is about a specific product. It's unlikely they wrote the blog with a HN submission in mind. They're not a news publication, either.
I vaguely recall that there's a Rust macro to automatically convert recursive functions to iterative.
But I would just increase the stack size limit if it ever becomes a problem. As far as I know the only reason it is so small is because of address space exhaustion which only affects 32-bit systems.
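For example (a minimal sketch; the 64 MiB figure is arbitrary):

    use std::thread;

    fn deep_recursion(depth: u32) {
        if depth < 100_000 {
            deep_recursion(depth + 1);
        }
    }

    fn main() {
        // Run the recursive work on a thread with an explicitly larger
        // stack instead of relying on the platform default.
        let handle = thread::Builder::new()
            .stack_size(64 * 1024 * 1024) // 64 MiB
            .spawn(|| deep_recursion(0))
            .unwrap();
        handle.join().unwrap();
    }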
I don't understand. I used protobuf for map data, and it is a hardcore simple format; that is the whole purpose of it.
I wrote memory-mapping-oriented protobuf software... in assembly. Then what? Am I allowed to say I am going 1000 times faster than Rust now???
Just for fun, how often do regular-sized companies that deal in regular-sized traffic need Protobuf to accomplish their goals in the first place, compared to JSON or even XML with basic string marshalling?
I dunno, are you sure you can manually write correct de/serialization so that strings, floats and integer formats get parsed correctly between JavaScript, Java, Python, Go, Rust, C++ and any other languages?
Do you want to maintain that and debug that? Do you want to do all of that without help of a compiler enforcing the schema and failing compiles/CI when someone accidentally changes the schema?
Because you get all of that with protobuf if you use them appropriately.
You can of course build all of this yourself... and maybe it'll even be as efficient, performant and supported. Maybe.
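In Rust, for instance, that schema enforcement can be wired into the build itself (a sketch assuming the prost-build crate and a proto/definitions.proto file):

    // build.rs: regenerates the Rust types from the .proto schema on
    // every build, so a schema change that breaks callers fails the
    // compile (and CI) instead of failing at runtime.
    fn main() -> std::io::Result<()> {
        prost_build::compile_protos(&["proto/definitions.proto"], &["proto/"])?;
        Ok(())
    }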
Well, protobuf lets you generate easy-to-use code for parsing defined data, plus service stubs, for many languages, and it is one of the faster and less bandwidth-wasting options.
Besides the other comments already here about codegen & contracts, a bigger reason for me to step away from JSON/XML is binary serialization.
It sounds weird, and it's totally dependent on your use case, but binary serialization can make a giant difference.
For me, I work with 3D data which is primarily (but not only) tightly packed arrays of floats & ints. I have a bunch of options available:
1. JSON/XML, readable, easy to work with, relatively bulky (but not as bad as people think if you compress) but no random access, and slow floating point parsing, great extensibility.
2. JSON/XML + base64, OK to work with, quite bulky, no random access, faster parsing, but no structure, extensible.
3. Manual binary serialization: hard to work with, OK size (esp compressed), random access if you put in the effort, optimal parsing, not extensible unless you put in a lot of effort (see the sketch at the end of this comment).
4. Flatbuffers/protobuf/capn-proto/etc: easy to work with, great size (esp compressed), random access, close-to-optimal parsing, extensible.
Basically if you care about performance, you would really like to just have control of the binary layout of your data, but you generally don't want to design extensibility and random access yourself, so you end up sacrificing explicit layout (and so some performance) by choosing a convenient lib.
We are a very regularly sized company, but our 3D data spans hundreds of terabytes.
(also, no, there is no general purpose 3D format available to do this work, gltf and friends are great but have a small range of usecases)
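As a concrete picture of option 3 (a minimal sketch; a real format would also need a version field and alignment decisions):

    // Layout: little-endian u32 count, then the floats packed back to back.
    fn write_floats(out: &mut Vec<u8>, values: &[f32]) {
        out.extend_from_slice(&(values.len() as u32).to_le_bytes());
        for v in values {
            out.extend_from_slice(&v.to_le_bytes());
        }
    }

    // Random access: index straight into the buffer, no decode pass.
    fn read_float(buf: &[u8], i: usize) -> Option<f32> {
        let n = u32::from_le_bytes(buf.get(0..4)?.try_into().ok()?) as usize;
        if i >= n {
            return None;
        }
        let start = 4 + i * 4;
        Some(f32::from_le_bytes(buf.get(start..start + 4)?.try_into().ok()?))
    }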
In most languages protobuf is easier because it generates the boilerplate. And protobuf is cross-language, so even if you are working in JavaScript, where JSON is native, protobuf is still faster, because the other side can be whatever and it is not spending its time parsing.
Protobuf is fantastic because it separates the definition from the language. When you make changes, you recompile your definitions to native code and you can be sure it will stay compatible with other languages and implementations.
It's not just about traffic. IoT devices (or any other low-powered devices for that matter) also like protobuf because of its comparatively high efficiency.
You should be terrified of the instability you're introducing to achieve this. Memory sharing between processes is very difficult to keep stable, it is half the reason kernels exist.
I was terrified until it worked. The Postgres "ABI" is relatively stable: the parser only really changes between major versions, and we bake the whole code into the same executable. Largely thanks to the work done by the team behind pg_query!
The output is machine-verifiable, which makes this uniquely possible in today's vibe-coded world!
Now and then I find a wild place someone has shoved protobuf into. Sometimes it's like zero consideration was given beyond "multiple languages from the same IDL", as if it were some magical zero-overhead abstraction over bytes on a wire.
miroljub|1 month ago
Just doing memcpy or mmap would be even faster. But the same Rust advocates bragging about Rust speed frown upon such insecure practices in C/C++.
nottorp|1 month ago
They changed the persistence system completely. It looks like they went from a generic solution to something specific to what they're carrying across the wire.
They could have done it in Lua and it would have been 3x faster.
alias_neo|1 month ago
I wonder if it's just poorly worded and they meant to say something like "Replacing Protobuf with some native calls [in Rust]".
locknitpicker|1 month ago
> Protobuf is fast, but not using Protobuf is faster.
The blog post reads like an unserious attempt to repeat a Rust meme.
levkk|1 month ago
Thank you so much for the kind words!
jeroenhd|1 month ago
The `become` keyword has already been reserved and work continues to happen (https://github.com/rust-lang/rust/issues/112788). If you enable #![feature(explicit_tail_calls)] you can already use the feature in the nightly compiler: https://play.rust-lang.org/?version=nightly&mode=debug&editi...
(Note that enabling release mode on that link will have the compiler pre-calculate the result so you need to put it to debug mode if you want to see the assembly this generates)
embedding-shape|1 month ago
Isn't that just TCO or similar? Usually a part of the compiler/core of the language itself, AFAIK.
tuetuopay|1 month ago
Having a way to describe your whole API and generate bindings is a godsend. Yes, it can be done with JSON and OpenAPI, yet it's not mandatory there.
unnouinceput|1 month ago
So it's C actually, not Rust. But hey, we used Rust somewhere, so let's post it on HN and farm internet points.