Another option could be to "chunk" the messages with client-/server-streaming or bi-directional calls. But if you call your API from a browser, that may not be possible yet.
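For illustration, the chunking idea might look roughly like the sketch below. It only shows the split and reassemble steps in plain Python; `CHUNK_SIZE`, `split_into_chunks` and `reassemble` are made-up names, and in a real client-streaming call each chunk would presumably be wrapped in whatever request message the service defines.

```python
from typing import Iterable, Iterator

# Hypothetical chunk size, chosen to stay well under a typical 4 MiB message limit.
CHUNK_SIZE = 1024 * 1024


def split_into_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of payload, each at most chunk_size bytes long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(payload), chunk_size):
        yield payload[offset:offset + chunk_size]


def reassemble(chunks: Iterable[bytes]) -> bytes:
    """Concatenate the chunks, in the order received, back into one payload."""
    return b"".join(chunks)
```

A generator like `split_into_chunks` is the kind of iterator a Python gRPC client-streaming stub accepts as its request stream; the receiving side would need to do the equivalent of `reassemble`, which only helps if you control that side.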
Ahh, we don't have access to the server. It's closed: an NVIDIA inference engine that talks to their Triton engine under the hood. Unfortunately, while Triton allows configuring the limit, the layer in front of it eats our channel options, which carry the message size configuration.
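For context, the channel options in question are presumably the standard gRPC message size arguments. A minimal sketch of how they are usually built on the Python client side follows; `build_channel_options` is a made-up helper, and the 64 MiB figure in the usage comment is just an example value.

```python
from typing import List, Tuple


def build_channel_options(max_message_bytes: int) -> List[Tuple[str, int]]:
    """Return gRPC channel arguments that set the send and receive size limits."""
    return [
        ("grpc.max_send_message_length", max_message_bytes),
        ("grpc.max_receive_message_length", max_message_bytes),
    ]


# Usage (assumes the grpcio package; "host:port" is a placeholder):
#   import grpc
#   channel = grpc.insecure_channel(
#       "host:port", options=build_channel_options(64 * 1024 * 1024))
```

These only change the limits on the channel they are passed to, so if an intermediate layer drops them or enforces its own limit, the client side alone can't fix it.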
CommonGuy|4 years ago
sjnair96|4 years ago
sjnair96|4 years ago