Java FFM zero-copy transport using io_uring

nateb2022|2 months ago

This looks like most of it was vibecoded.

Unnecessary comments like:

  clientChannel.configureBlocking(false); // Non-blocking client

can be found throughout the source, and the project's landing page is a good example of typical SOTA models' outputs when asked for a frontend landing page.

Squarex|2 months ago

They openly talk about it here https://www.mvp.express/philosophy/ in section " AI-Assisted Development".

krisgenre|2 months ago

Okay, but is that a bad thing?

rohanray|2 months ago

Apologies! It's a long read and was the only time I did not want to use AI to summarize for a purpose :) ---- So Yes, a lot of the code has been written using AI. I have also been transparent about it on https://www.mvp.express/philosophy/ in the section "AI-Assisted Development".

However, that does not mean that I as the author do not understand the code/concepts :) I also don't deny the fact that I might not have gone through the entire codebase till now.

For some background: 1. I have been working in Capital Markets-Trading, basically FIX (https://www.fixtrading.org/what-is-fix/) systems since a few years now and have been using QuickFIX/J at my job. 2. At the same time, I have been intrigued with Java FFM especially after seeing huge performance gain over idiomatic Java code for a (~500 MB market data) file processing job which I had written a few months back at my regular work. 3. Fellow FIX developers from JVM world would know that there are other Java FIX systems that achieve "extra/huge performance boost" by using "Java's sun.misc.Unsafe" in several parts of the FIX system and OMS.

Reflecting on above 3 points, I had envisioned writing a modern Java FIX engine with 1. 0.0% usage of sun.misc.Unsafe in the entire codebase, 2. achieve close(-enough) performance to market leading C/C++ FIX engines. This was somewhat in the beginning of this year-2025. However, a month or two into this effort I realized 2 key essential ingredients which will dictate performance, latency, & throughput of the entire system - 1. Serialization & 2. Transport. By then, I had already written quite a few tests and benchmarks and was amazed by the performance boost solely relying on FFM; also no unsafe, zero copy, zero allocations are the benefits as byproduct & ofcourse extremely low GC pauses comparatively.

Since I had already started using FFM MemorySegment et al to build the key infra parts of the system - I was of the opinion that restricting these only within a FIX system alone would be a crime. Hence, MYRA & MVP.Express were incubated as an idea overnight - modern, safe, lightweight, modular, FFM oriented high-performant Java infra libs.

Well, I have been posting only on Reddit's Java sub till now to get some initial feedback. However, I just noticed today a sudden huge inflow of traffic and that's how I realized its coming from a post by mands on HN. Thanks mands! I had no intention of posting to HN (yet). No complaints :) I'm glad it made here and also appreciate all the feedback.

A note on why the extensive (un)checked usage of AI to build this - I would like to go breadth first rapidly i.e. expanding the ecosystem to let others tinker with. Work in pipeline - 1. JIA Cache - build a modern JVM based off-heap & safe distributed caching using the MYRA libs as infra. 2. MVP.Express - a light-weight Java only RPC system focusing on performance, type-safety, schema-driven, high throughput & low latency by leveraging MYRA libs & JIA-Cache as building blocks. Side note: I am currently on a vacation. Once back; I plan to start integrating XDP/eBPF as another backend for myra-transport.

Agree or not - That's a hell lot of work! And that's the reason I am using AI extensively. To quickly build modern FFM based solutions and validate the existence/purpose - through performance and other metrics. Ideally, they should be real good candidates to perhaps displace incumbent similar systems which have a lot of legacy pre Java 8 code; meaning even if such existing systems need to be modernized they would potentially have to be re-written from scratch using modern Java paradigms. Well that's what MYRA & MVP.Express is trying to do now as Stage 0 at a rapid pace - to see a market fit!

Having said that, I am very cautious about the design and guard-rails which is evident from the extensive test suite & benchmark every MYRA lib has and will have. Trying to follow a close TTD loop here.

Next stages: If the MYRA libs and related ecosystem seems to be a good fit for modern Java projects, then I and others (its OSS for a purpose) can contribute also by manually reading (human verification) certain parts of the code in which they are experts at. This way we/us as a Java community can build modern forward-looking libs & solutions to power the enterprises for the next decade or two. It may sound silly but I believe in this philosophy and I hope you will too!

Let's look at it from another (realistic) perspective - I have been working on this since a few months (2 to 3 give or take) along with my current 9-5; have been possible only due to AI. TBH if there was no AI, most probably I would not even have thought of starting this myriad task - since I know practically I would never have been able to finish ever or would have taken an enormous timeline and perhaps might have abandoned it half-way!

Hope, this clears some air and brings some honest clarity about the goals & philosophy of MYRA (& myself seconded). Also, I am not a io_uring/XDP expert and AI has been really helpful to bring my vision into reality. Although, I am in parallel trying to grow my knowledge into the technical nitty-gritties of these tools/technologies. However solely due to AI, I was able to rapidly build something and hence, prove that using io_uring has substantial benefit - evident from benchmarks against Java Netty. That's what I meant earlier by rapidly expanding on the breadth of the ecosystem first and warranting every solution's purpose thru benchmarks and other metrics; not to forget NO unsafe and NO JNI as well are also golden nuggets.

Last but not the least, I am excited by the response here on HN and will stay close here going forward; will be sharing updates here as well. I would also appreciate all kind of concerns/feedback/suggestion.

Thanks -RR

szundi|2 months ago

What really matters though is the quality of the human review.

jeffreygoesto|2 months ago

27us roundtrip is not really state of the art for zero copy IPC, about 1us would be. What is causing this overhead?

jstimpfle|2 months ago

Asking for those who, like me, haven't yet taken the time to find technical information on that webpage:

What exactly does that roundtrip latency number measure (especially your 1us)? Does zero copy imply mapping pages between processes? Is there an async kernel component involved (like I would infer from "io_uring") or just two user space processes mapping pages?

znpy|2 months ago

It may or may not be good, depending on a number of fact.

I did read the original linux zerocopy papers from google for example, and at the time (when using tcp) the juice was worth the squeeze when payload was larger than than 10 kilobytes (or 20? Don’t remember right now and i’m on mobile).

Also a common technique is batching, so you amortise the round-trip time (this used to be the cost of sendmmsg/recvmmsg) over, say, 10 payloads.

So yeah that number alone can mean a lot or it can mean very little.

In my experience people that are doing low latency stuff already built their own thing around msg_zerocopy, io_uring and stuff :)

unknown|2 months ago

[deleted]

unknown|2 months ago

[deleted]

rohanray|2 months ago

It's not a local IPC exactly. The roundtrip benchmark stat is for a TCP server-client ping/pong call using a 2 KB payload; TCP is although on local loopback (127.0.0.1).

Source: https://github.com/mvp-express/myra-transport/blob/main/benc...

blibble|2 months ago

indeed, you can get a packet from one box to another in 1-2us

rohanray|2 months ago

It's not a local IPC exactly. The roundtrip benchmark stat is for a TCP server-client ping/pong call using a 2 KB payload; TCP is although on local loopback (127.0.0.1).

The payload is encoded using myra-codec FFM MemorySegment directly into a pre-registered buffer in io_uring SQE on the server. Similarly, on the client side CQE writes encoded payload directly into a client provided MemorySegment. The whole process saves a few SYSCALLs. Also, the above process is zero copy.

Source: https://github.com/mvp-express/myra-transport/blob/main/benc...

P.S.: I had posted this as a reply to jeffrey but not able to see it. Hence, reposting as a direct reply to the main post for visibility as well.

Disclaimer: I am the author of https://mvp.express. I would love feedback, critical suggestions/advise.

Thanks -RR

refulgentis|2 months ago

Pretty much what NateB said* - but that might leave you at "what's wrong with that? that's how I could get it done"

There's WAY too much content, way too many names and stuff that feels subtly off. I'm 37, been on this site for 16 years. I'm assuming target audience here is enterprise Java developers, which isn't my home, so I'm sure I'm missing some stuff is idiomatic in that culture.

But the vast, vast amount of things that are completely unfamiliar tells me something else is going on and it's not good.

Like I bet this is f'ing cool, otherwise you wouldn't put in the effort to share it. But you're better off having something super brief** in a GitHub README than a pseudo-marketing site that's straining to fit a cool technical thing into the wrong template.

* https://news.ycombinator.com/item?id=46255661

** what you wrote is great! "The payload is encoded using myra-codec FFM MemorySegment directly into a pre-registered buffer in io_uring SQE on the server. Similarly, on the client side CQE writes encoded payload directly into a client provided MemorySegment. The whole process saves a few SYSCALLs. Also, the above process is zero copy." -- then the site looks like it wants to sell N different products and confusing flowcharts, but really, you're just geeked out and did something cool and want to share the technical details. So it's designed for the wrong audience.

owl_might|2 months ago

Do you vibecoded this entire thing ? That's clearly the impression it gives. I haven't seen a single line of text or code in this entire organization that looks human.

Do you have the skills to verify what the AI has generated, and are you confident that everything works as advertised?

exabrial|2 months ago

Impressive. I'm sure the numbers will continue to improve as both the FFM and this project mature.

Java Native databases or KVP stores would be good usage targets IMHO

rohanray|2 months ago

I have been planning on JIA Cache as a distributed caching system built with off heap memory DS & Flyweight for readers to achieve zero copy. I think a KV store will come out as a byproduct while developing JIA Cache

TheGuyWhoCodes|2 months ago

In my opinion adding kryo in the benchmark is somewhat disingenuous as it does not require a message schema definition while MyraCodec/SBE/FlatBuffers do.

The only thing that says is schemeless and is zero copy is Apache Fory which is missing from the benchmark.

rohanray|2 months ago

I had added Kryo since that seems to be the fastest Java serialization library which does not use sun.misc.unsafe.

Thanks for sharing Apache Fory! Will try to add that to the benchmark as well.

DarkmSparks|2 months ago

Most of it seems to be 404ing now

rohanray|2 months ago

Oh! That shouldn't be the case :( Please let me know if you are still facing 404. I just checked and no alerts from my monitoring yet.

Thanks for letting know though!

54 comments