jgraettinger1 | 1 year ago
I'm not in this space and this is probably too simplistic, but I would think using asyncio for all IO (reading/decoding requests and preparing them for inference), paired with asyncio.to_thread'd calls to do_inference_in_C_with_the_GIL_released(my_prepared_request), would get you nearly all of the performance benefit with current Python.
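A minimal sketch of that pattern, with a pure-Python stand-in for the C call (in a real setup the function would be a C extension that releases the GIL, so the worker threads actually run in parallel; `do_inference_in_c` and the request bytes below are made up for illustration):

```python
import asyncio

def do_inference_in_c(prepared: bytes) -> str:
    # Stand-in for a GIL-releasing C extension call. A real binding
    # (ctypes/cffi/pybind11) would drop the GIL here, letting other
    # worker threads compute concurrently.
    return prepared.decode().upper()

async def handle_request(raw: bytes) -> str:
    prepared = raw.strip()  # cheap decode/prepare work stays on the event loop
    # Off-load the blocking inference call to a worker thread so the
    # event loop keeps servicing other requests meanwhile.
    return await asyncio.to_thread(do_inference_in_c, prepared)

async def main() -> list[str]:
    requests = [b" foo \n", b" bar \n", b" baz \n"]
    # gather preserves input order even though the threads may finish
    # in any order.
    return await asyncio.gather(*(handle_request(r) for r in requests))

results = asyncio.run(main())
print(results)  # ['FOO', 'BAR', 'BAZ']
```

`asyncio.to_thread` (Python 3.9+) runs the call in the loop's default ThreadPoolExecutor; with the GIL held inside the C code you'd get no parallelism, which is exactly why the "GIL released" part of the suggestion matters.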