If you have a single TCP connection, all the data flows through that connection, which ultimately serializes at least some of the processing. Given that the workers are just responding with OK, no matter how many CPU cores you give them, you're still bound by the throughput of the IO thread (or rather, by the slower of the client and server IO threads). If you want more than one IO thread to share the load, you need more than one TCP connection.
When I started out using the gRPC SDK in Go, I was really surprised to find that it created just one connection per client. You'd think someone had made a multiplexing client wrapper, but I haven't been able to find one, so inevitably I've just written lightweight pooling myself. (Actually, I did come across one, but the code was very low quality.)
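For illustration, the kind of lightweight pooling described above might look like the sketch below. It is not the commenter's code: `Pool`, `NewPool`, and `Get` are hypothetical names, and the pool is generic so the example runs with only the standard library. In real use the element type would presumably be a gRPC client connection (`*grpc.ClientConn`) and `dial` would open one TCP connection per call.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Pool hands out one of N pre-dialed connections in round-robin order.
// T is a stand-in for whatever connection type the client library uses.
type Pool[T any] struct {
	conns []T
	next  atomic.Uint64
}

// NewPool calls dial n times and keeps every result.
// For brevity it does not close already-dialed connections on failure.
func NewPool[T any](n int, dial func(i int) (T, error)) (*Pool[T], error) {
	if n <= 0 {
		return nil, errors.New("pool size must be positive")
	}
	p := &Pool[T]{conns: make([]T, 0, n)}
	for i := 0; i < n; i++ {
		c, err := dial(i)
		if err != nil {
			return nil, fmt.Errorf("dial %d: %w", i, err)
		}
		p.conns = append(p.conns, c)
	}
	return p, nil
}

// Get returns the next connection in rotation; safe for concurrent use.
func (p *Pool[T]) Get() T {
	i := p.next.Add(1) - 1
	return p.conns[i%uint64(len(p.conns))]
}

func main() {
	// Stand-in dialer: a real one would open a separate TCP connection.
	p, err := NewPool(3, func(i int) (string, error) {
		return fmt.Sprintf("conn-%d", i), nil
	})
	if err != nil {
		panic(err)
	}
	for i := 0; i < 4; i++ {
		fmt.Println(p.Get()) // cycles conn-0, conn-1, conn-2, conn-0
	}
}
```

Round-robin is the simplest policy and suits uniform requests; picking the least-loaded connection would be an alternative if request sizes vary.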
If the request payload exceeds a certain size, the response latency goes from one network RTT to double or triple that.
There is definitely something wrong with either TCP or HTTP/2 windowing, as it doesn't send the full request without first getting an ACK from the server. But none of the gRPC windowing config options or the Linux tcp_wmem/rmem settings help. Sending a one-byte request every few hundred milliseconds fixes it by keeping the gRPC channel / TCP connection active. Nagle / slow start is disabled.
gRPC is a very badly implemented system. I have gotten 25-30%+ improvements in throughput just by monkeypatching the Google Cloud client libraries to force use of the JSON API endpoints.
At least try something other than gRPC when building systems, so you have a baseline understanding of performance. gRPC OFTEN introduces performance breakdowns that go unnoticed.
I don't think this is head-of-line blocking. That is, it's not like a single slow request causes starvation of other requests. The IO thread for the connection is grabbing and dispatching data to workers as fast as it can. All the requests are uniform, so it's not like one request would be bigger/harder to handle for that thread.
yuliyp|7 months ago
atombender|7 months ago
lacop|7 months ago
littlecranky67|7 months ago
eivanov89|7 months ago
ltbarcly3|7 months ago
stock_toaster|7 months ago
xtoilette|7 months ago
yuliyp|7 months ago