Show HN: Apache Fory Rust – 10-20x faster serialization than JSON/Protobuf
67 points | chaokunyang | 5 months ago | fory.apache.org | reply
Technical approach: compile-time codegen (no reflection), compact binary protocol with meta-packing, little-endian layout optimized for modern CPUs.
Unique features that other fast serializers don't have:
- Cross-language without IDL files (Rust ↔ Python/Java/Go)
- Trait object serialization (Box<dyn Trait>)
- Automatic circular reference handling
- Schema evolution without coordination
Happy to discuss design trade-offs.
Benchmarks: https://fory.apache.org/docs/benchmarks/rust
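To make the "little-endian layout optimized for modern CPUs" point concrete: a fixed-width little-endian write is essentially a memory copy on x86 and most ARM targets, with no byte shuffling. This is a std-only sketch of that idea, not Fory's actual wire format:

```rust
// Std-only illustration of little-endian field encoding: on little-endian
// CPUs, to_le_bytes() is effectively a memcpy, so encoding fixed-width
// fields this way costs almost nothing.
fn encode_point(x: u32, y: u32, buf: &mut Vec<u8>) {
    buf.extend_from_slice(&x.to_le_bytes());
    buf.extend_from_slice(&y.to_le_bytes());
}

fn decode_point(buf: &[u8]) -> (u32, u32) {
    let x = u32::from_le_bytes(buf[0..4].try_into().unwrap());
    let y = u32::from_le_bytes(buf[4..8].try_into().unwrap());
    (x, y)
}

fn main() {
    let mut buf = Vec::new();
    encode_point(7, 0x0102_0304, &mut buf);
    // Little-endian: least significant byte comes first on the wire.
    assert_eq!(&buf[4..8], &[0x04, 0x03, 0x02, 0x01]);
    assert_eq!(decode_point(&buf), (7, 0x0102_0304));
    println!("round-trip ok");
}
```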
[+] [-] tnorgaard|5 months ago|reply
[+] [-] chaokunyang|5 months ago|reply
Fory’s format was designed from the ground up to handle those cases efficiently, while still enabling cross‑language compatibility and schema evolution.
[+] [-] stmw|5 months ago|reply
[+] [-] no_circuit|5 months ago|reply
https://github.com/apache/fory/blob/fd1d53bd0fbbc5e0ce6d53ef...
It seems that if the serialized object is not a "Fory" struct, it is forced to go through a to/from conversion as part of the measured serialization work:
https://github.com/apache/fory/blob/fd1d53bd0fbbc5e0ce6d53ef...
The to/from type of work includes cloning Strings:
https://github.com/apache/fory/blob/fd1d53bd0fbbc5e0ce6d53ef...
reallocating growing arrays with collect:
https://github.com/apache/fory/blob/fd1d53bd0fbbc5e0ce6d53ef...
I'd think that the to/from conversion of Fory types shouldn't be part of the tests.
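The overhead being objected to looks roughly like this (types hypothetical, for illustration only): converting into a parallel "Fory" struct clones every String and rebuilds every Vec via collect() before a single byte is written:

```rust
// Hypothetical mirror of the bench's to/from conversion step: the clone
// and collect allocations happen inside the measured serialization path.
struct Domain {
    name: String,
    scores: Vec<u32>,
}

struct ForyShape {
    name: String,
    scores: Vec<u64>,
}

fn to_fory(d: &Domain) -> ForyShape {
    ForyShape {
        name: d.name.clone(), // heap allocation + byte copy
        scores: d.scores.iter().map(|&s| s as u64).collect(), // new allocation
    }
}

fn main() {
    let d = Domain { name: "alice".into(), scores: vec![1, 2, 3] };
    let f = to_fory(&d);
    assert_eq!(f.name, "alice");
    assert_eq!(f.scores, vec![1u64, 2, 3]);
    println!("converted with {} extra allocations", 2);
}
```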
Also, when used in an actual system, tonic would be providing an 8 KB buffer to write into, not just a Vec::default() that may need to be resized multiple times:
https://github.com/hyperium/tonic/blob/147c94cd661c0015af2e5...
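The preallocation point is easy to demonstrate with std alone: a Vec::default() starts at capacity 0 and grows repeatedly as the payload is written, while Vec::with_capacity(8 * 1024) takes one allocation up front (payload size here is arbitrary, just to illustrate the point):

```rust
// Count how many times a Vec's backing allocation changes while pushing
// `payload_len` bytes into it.
fn count_allocations(mut buf: Vec<u8>, payload_len: usize) -> usize {
    let mut allocations = 0;
    let mut cap = buf.capacity();
    for _ in 0..payload_len {
        buf.push(0u8);
        if buf.capacity() != cap {
            allocations += 1;
            cap = buf.capacity();
        }
    }
    allocations
}

fn main() {
    // Writing 6000 bytes into an empty Vec triggers several capacity
    // growths; a preallocated 8 KB buffer triggers none.
    let from_default = count_allocations(Vec::default(), 6000);
    let preallocated = count_allocations(Vec::with_capacity(8 * 1024), 6000);
    assert!(from_default > 0);
    assert_eq!(preallocated, 0);
    println!("default: {from_default} allocations, preallocated: {preallocated}");
}
```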
[+] [-] no_circuit|5 months ago|reply
I can see the source of a 10x improvement on an Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz, but it drops to a 3x improvement when I remove the to/from conversions that clone or collect Vecs, and always allocate an 8K Vec instead of a ::default() for the writable buffer.
If anything, the benches should be updated in a tower service / codec generics style where other formats like protobuf do not use any Fory-related code at all.
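A codec-generic harness in the style the comment suggests might look like this (all names hypothetical): each format implements its own encoder over its own native type, so the Protobuf path never touches any Fory-related code, and every codec gets the same preallocated buffer:

```rust
// Hypothetical codec-generic bench harness sketch: the harness only sees
// the trait, so no cross-format conversion is part of the measurement.
trait Encode {
    fn encode(&self, buf: &mut Vec<u8>);
}

struct RawPoint {
    x: u32,
    y: u32,
}

impl Encode for RawPoint {
    fn encode(&self, buf: &mut Vec<u8>) {
        buf.extend_from_slice(&self.x.to_le_bytes());
        buf.extend_from_slice(&self.y.to_le_bytes());
    }
}

// Each codec is handed an identical preallocated buffer and only
// encode() itself is measured.
fn run_bench<T: Encode>(value: &T) -> usize {
    let mut buf = Vec::with_capacity(8 * 1024);
    value.encode(&mut buf);
    buf.len()
}

fn main() {
    let n = run_bench(&RawPoint { x: 1, y: 2 });
    assert_eq!(n, 8);
    println!("encoded {n} bytes");
}
```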
Note also that Fory has some writer pool that is utilized during the tests:
https://github.com/apache/fory/blob/fd1d53bd0fbbc5e0ce6d53ef...
Original bench selection for Fory:
Compared to the original bench for Protobuf/Prost:
However, after allocating 8K instead of ::default() and removing the to/from conversion for an updated protobuf bench:
[+] [-] stmw|5 months ago|reply
In my experience, while starting from a language and deriving the serialization often feels more ergonomic at the start (e.g. RPC style), it hides too much of what's going on from users, and over time it suffers greatly from programming-language/runtime changes, the latter multiplied by the number of languages or frameworks supported.
[+] [-] chaokunyang|5 months ago|reply
The way I think about it is:
• Single-language projects often work best without an IDL; it keeps things simple and avoids extra steps.
• Two languages: both IDL and no-IDL approaches can work, depending on the team's habits.
• Three or more: an IDL can be really useful as a single source of truth and to avoid manually writing struct definitions in every language.
For Apache Fory, my plan is to add optional IDL support, so teams who want that “single truth” can generate definitions automatically, and others can continue with language‑first development. My hope is to give people flexibility to choose what fits their situation best.
[+] [-] mlhamel|5 months ago|reply
[+] [-] kenhwang|5 months ago|reply
Otherwise, the schema seems to be derived from the class being serialized in typed languages, or annotated in code. The serializer and deserializer code must be manually written to be compatible, instead of both sides being codegen'd to match from a schema file. Here's the example I found for Python: https://fory.apache.org/docs/docs/guide/python_serialization...
[+] [-] athorax|5 months ago|reply
[0] https://fory.apache.org/blog/2025/10/29/fory_rust_versatile_...
[+] [-] stmw|5 months ago|reply
[+] [-] fabiensanglard|5 months ago|reply
[+] [-] seg_lol|5 months ago|reply
I do really like that there is broad support out of the box, and it looks easy to use.
For Python I still prefer using dill since it handles code objects.
https://github.com/uqfoundation/dill
[+] [-] chaokunyang|5 months ago|reply
https://github.com/apache/fory/tree/main/python#serialize-lo...
When serializing code objects, pyfory achieves a 3× higher compression ratio compared to cloudpickle.
pyfory also provides extra security-audit capabilities to guard against malicious deserialization attacks.
[+] [-] chaokunyang|5 months ago|reply
https://github.com/chaokunyang/python_benchmarks
[+] [-] shinypenguin|5 months ago|reply
https://fory.apache.org/docs/docs/introduction/benchmark
[+] [-] wiseowise|5 months ago|reply
[+] [-] chaokunyang|5 months ago|reply
[+] [-] nitwit005|5 months ago|reply
It'd be helpful to see a plot of serialization costs vs data size. If you only display serialization TPS, you're always going to lose to the "do nothing" option of just writing your C structs directly to the wire, which is essentially zero cost.
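The "do nothing" baseline the comment describes can be sketched in safe Rust: for a fixed-layout record, serialization is just copying the fields' bytes, with no tags, headers, or varints, so its cost is essentially a memcpy regardless of data size:

```rust
// A fixed-layout record written field-by-field: the closest safe-Rust
// analogue of dumping a C struct straight to the wire.
struct Record {
    id: u64,
    flags: u32,
}

fn write_record(r: &Record, buf: &mut Vec<u8>) {
    buf.extend_from_slice(&r.id.to_le_bytes());
    buf.extend_from_slice(&r.flags.to_le_bytes());
}

fn main() {
    let mut buf = Vec::new();
    write_record(&Record { id: 42, flags: 7 }, &mut buf);
    // 8 + 4 bytes: no framing overhead at all, which is why raw struct
    // dumps win any pure-TPS comparison.
    assert_eq!(buf.len(), 12);
    println!("wrote {} bytes", buf.len());
}
```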
[+] [-] chaokunyang|5 months ago|reply
| data type | data size | fory | protobuf |
| --------------- | --------- | ------- | -------- |
| simple-struct | small | 21 | 19 |
| simple-struct | medium | 70 | 66 |
| simple-struct | large | 220 | 216 |
| simple-list | small | 36 | 16 |
| simple-list | medium | 802 | 543 |
| simple-list | large | 14512 | 12876 |
| simple-map | small | 33 | 36 |
| simple-map | medium | 795 | 1182 |
| simple-map | large | 17893 | 21746 |
| person | small | 122 | 118 |
| person | medium | 873 | 948 |
| person | large | 7531 | 7865 |
| company | small | 191 | 182 |
| company | medium | 9118 | 9950 |
| company | large | 748105 | 782485 |
| e-commerce-data | small | 750 | 737 |
| e-commerce-data | medium | 53275 | 58025 |
| e-commerce-data | large | 1079358 | 1166878 |
| system-data | small | 311 | 315 |
| system-data | medium | 24301 | 26161 |
| system-data | large | 450031 | 479988 |
[+] [-] stmw|5 months ago|reply
[+] [-] no_circuit|5 months ago|reply
https://github.com/apache/fory/blob/fd1d53bd0fbbc5e0ce6d53ef...
[+] [-] chaokunyang|5 months ago|reply
I’m curious though: what’s an example scenario you’ve seen that requires so many distinct types? I haven’t personally come across a case with 4,096+ protocol messages defined.
[+] [-] dietr1ch|5 months ago|reply
You can browse https://fory.apache.org/docs/, but I didn't find any benchmarks directory
[+] [-] Brian_K_White|5 months ago|reply
https://fory.apache.org/docs/docs/introduction/benchmark
https://fory.apache.org/docs/docs/guide/rust_serialization
[+] [-] lsb|5 months ago|reply
[+] [-] chaokunyang|5 months ago|reply
Apache Fory, on the other hand, has its own wire‑stream format designed for sending data across processes or networks. Most of the code is focused on efficiently converting in‑memory objects into that stream format (and back) — with features like cross‑language support, circular reference handling, and schema evolution.
Fory also has a row format, which is a memory format, and can complement or compete with Arrow’s columnar format depending on the use case.
[+] [-] jasonjmcghee|5 months ago|reply
[+] [-] jasonjmcghee|5 months ago|reply
https://fory.apache.org/blog/fury_blazing_fast_multiple_lang...
But flatbuffers is _much_ faster than protobuf/json:
https://flatbuffers.dev/benchmarks/
[+] [-] yencabulator|5 months ago|reply
Have we learned nothing? Endian swap on platforms that need it is faster than conditionals, and simpler.
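On the endian-swap point: Rust's std already exposes branch-free conversions. u32::to_le and from_le compile to a no-op on little-endian targets and a single bswap instruction on big-endian ones, resolved per target at compile time, with no runtime conditional:

```rust
fn main() {
    let x: u32 = 0x1122_3344;
    // swap_bytes compiles to a single bswap, not a conditional.
    assert_eq!(x.swap_bytes(), 0x4433_2211);
    // to_le()/from_le() are identity on little-endian targets and a
    // byte swap on big-endian ones; the choice is made at compile time.
    assert_eq!(u32::from_le(x.to_le()), x);
    println!("ok");
}
```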
[+] [-] dxxvi|5 months ago|reply
[+] [-] chaokunyang|5 months ago|reply
[+] [-] paddy_m|5 months ago|reply
[+] [-] chaokunyang|5 months ago|reply
[+] [-] IshKebab|5 months ago|reply
[+] [-] chaokunyang|5 months ago|reply
The AGENTS.md file was added only recently. Its original purpose wasn’t to hand code generation entirely to an AI, but rather to use AI as a pair‑programming debugger — for example, having it walk through tricky binary parsing issues byte‑by‑byte. Serialization in a compact binary format can be hard to debug, and AI can sometimes save hours by quickly pinpointing structural mismatches.
That said, serialization is a fundamental piece of infrastructure, so we remain conservative: any AI‑assisted changes go through the same rigorous review and test process as everything else. As technology evolves, I think it’s worth exploring new tools — but with care.
[+] [-] chaokunyang|5 months ago|reply
[+] [-] paddy_m|5 months ago|reply
[+] [-] OptionOfT|5 months ago|reply
[+] [-] fritzo|5 months ago|reply
[+] [-] chaokunyang|5 months ago|reply
[+] [-] binary132|5 months ago|reply
[+] [-] chaokunyang|5 months ago|reply