top | item 40373188

Layer-wise inferencing and batching: Small VRAM doesn't limit LLM throughput

2 points | verdagon | 1 year ago | verdagon.dev

No comments yet.
