emmericp | 6 years ago
We've done some benchmarks here: https://www.net.in.tum.de/fileadmin/bibtex/publications/pape... (Figure 9 on page 10)
Only a very basic benchmark, working on more...
drewg123 | 6 years ago
emmericp | 6 years ago
I think ~100k to 200k TSO "packets" per second should be doable with the IOMMU. But I guess it depends on where the data is coming from. This could be one of the odd cases where copying data is faster than zero-copy: e.g., copy everything into a small, fixed set of small-ish buffers to keep the number of pages that need to be present in the IOMMU small?
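The copy-instead-of-zero-copy idea above can be sketched roughly like this (all names and sizes are illustrative, not from any real driver): payloads are copied into a small pool of buffers that were IOMMU-mapped once at startup, so the device only ever touches that handful of pages.

```c
/* Hypothetical sketch: rather than mapping arbitrary payload pages
 * through the IOMMU (zero-copy), copy outgoing data into a small,
 * fixed pool of pre-mapped buffers so the IOMMU only ever needs a
 * handful of pages resident. Pool size and buffer size are made up. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_BUFS 64    /* small set of buffers */
#define BUF_SIZE  2048  /* small-ish: fits one MTU-sized frame */

static uint8_t pool[POOL_BUFS][BUF_SIZE]; /* IOMMU-mapped once at init */
static size_t next_buf;

/* Copy a payload into the next pool buffer and return it; the buffer's
 * IOMMU mapping is already established, so no per-packet map/unmap. */
static uint8_t *tx_copy(const void *payload, size_t len)
{
    if (len > BUF_SIZE)
        return NULL; /* caller must fragment or fall back */
    uint8_t *buf = pool[next_buf];
    next_buf = (next_buf + 1) % POOL_BUFS;
    memcpy(buf, payload, len);
    return buf;
}
```

The trade-off is one `memcpy` per packet in exchange for never taking an IOMMU map/unmap on the hot path, which is the scenario where copying can beat zero-copy.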
hedora | 6 years ago
At most, each kernel driver has to do an extra addition to map its physical I/O offset to the one exposed to the bus by the IOMMU. With huge pages, there’s approximately one offset per driver, so it lives in cache, probably next to other driver state.
ajross | 6 years ago
emmericp | 6 years ago
Yeah, I think the dTLB is only 64 entries on Intel CPUs as well, but there's a second larger layer behind that, and an even larger third layer. IIRC it's a total of 4096 entries on recent Intel CPUs.
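Taking the ~4096-entry figure above at face value, a quick back-of-the-envelope calculation shows why huge pages matter so much for TLB reach (the numbers plugged in below are the ones from this thread, not a spec):

```c
/* How much address space a TLB of a given size can cover: simply
 * entries * page size. With 4 KiB pages, 4096 entries reach only
 * 16 MiB; with 2 MiB huge pages, the same 4096 entries reach 8 GiB. */
#include <stdint.h>

static uint64_t tlb_coverage_bytes(uint64_t entries, uint64_t page_bytes)
{
    return entries * page_bytes;
}
```

This 512x difference in reach is why pinning DMA buffers in huge pages keeps the working set of IOMMU/TLB translations small.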
the8472 | 6 years ago