top | item 27137277

1_person | 4 years ago

I think the pragmatic, current realization of a functional data-flow pipeline described in my earlier reply incidentally achieves exactly what you mention here:

> Sure, but even faster is not to loop over intermediate arrays at all, by virtue of never constructing them in the first place when they aren't necessary.

In examples of this type of pipeline, the vectors you start with are slices of hardware descriptor queues that reference a DMA region; by progressively transforming the receive buffers and composing them with additional buffer regions through intermediate decoded states, you produce reply buffers.
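To make the shape of that concrete, here is a hypothetical sketch in Rust — `RxDesc`, `decode`, and `build_reply` are illustrative names, not any real driver API. The point is that the descriptor slice is lazily filter-mapped through a decode step and mapped into reply buffers, so no intermediate array is ever constructed:

```rust
/// A receive descriptor referencing a slice of a DMA region.
/// (Illustrative only; real descriptor formats are device-specific.)
struct RxDesc<'a> {
    payload: &'a [u8],
}

/// An intermediate decoded state (here just a parsed request id).
struct Request {
    id: u8,
}

/// Decode a receive buffer into an intermediate state, if well-formed.
fn decode(desc: &RxDesc) -> Option<Request> {
    desc.payload.first().map(|&id| Request { id })
}

/// Compose a reply buffer from the decoded state.
fn build_reply(req: Request) -> Vec<u8> {
    vec![req.id, 0xAC] // echo the id, append an ack byte
}

/// Lazily transform a slice of the descriptor queue into reply buffers.
/// No intermediate collection is materialized; each reply is produced
/// on demand as the consumer pulls it from the iterator.
fn process<'a>(queue: &'a [RxDesc<'a>]) -> impl Iterator<Item = Vec<u8>> + 'a {
    queue.iter().filter_map(decode).map(build_reply)
}
```

The composition stays lazy end to end: nothing runs until something downstream consumes the iterator, which is what lets the whole chain operate over the original DMA-backed slices.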

The hardware is programmed to sample only the packets of interest to this descriptor queue, and it copies each packet matching that filter directly to the DMA region referenced by the descriptor it places in the queue.

With sufficiently sophisticated hardware, it is possible to offload the decoding of increasingly large fragments of protocol logic, or even entire applications.

Describing the operations in a language of common, composable operations over vectors of buffers lets the portion of any given application that is mapped to, e.g., general-purpose CPU instructions, GPU kernels, or FPGA/ASIC offload features vary continuously over time or across the API surface at runtime, pretty neatly. It creates breakpoints in the logic flow that more or less always map exactly to functional hardware boundaries, because everything is pretty much just vectors of buffers all the way down, really, when you think about it. Your process's entire runtime is just another vector of buffers to the kernel.
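As a toy illustration of that partitioning (every name here is made up, not a real offload API): if the pipeline is described as plain data, then the boundary between what the device runs and what the CPU runs is just a cut point in the op list, and it can move as hardware capabilities change:

```rust
// Describe the pipeline as data: a list of composable ops over buffers.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Op {
    Filter(u8), // keep only buffers whose first byte matches the tag
    Checksum,   // append a one-byte wrapping sum of the buffer
}

/// What a hypothetical NIC offload advertises it can execute.
fn hw_supports(op: Op) -> bool {
    matches!(op, Op::Filter(_)) // filtering is offloaded; checksum is not
}

/// Split the pipeline at the first op the hardware cannot handle: the
/// prefix would be programmed into the device, the suffix runs on the
/// CPU. The split point is a breakpoint at a functional hardware boundary.
fn partition(ops: &[Op]) -> (&[Op], &[Op]) {
    let split = ops
        .iter()
        .position(|&op| !hw_supports(op))
        .unwrap_or(ops.len());
    ops.split_at(split)
}

/// Software interpreter for whatever ops remain on the CPU side.
fn run_sw(ops: &[Op], mut bufs: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    for &op in ops {
        bufs = match op {
            Op::Filter(tag) => bufs
                .into_iter()
                .filter(|b| b.first() == Some(&tag))
                .collect(),
            Op::Checksum => bufs
                .into_iter()
                .map(|mut b| {
                    let sum = b.iter().fold(0u8, |acc, &x| acc.wrapping_add(x));
                    b.push(sum);
                    b
                })
                .collect(),
        };
    }
    bufs
}
```

Because the same op description drives both the hardware programming and the software interpreter, how much of the application lands on each side can change at runtime without touching the pipeline definition itself.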
