redlock | 1 year ago

The issue here is that, even with a lot of VRAM, you may be able to run the model, but with a large context it will still be too slow. (For example, running LLaMA 70B with a 30k+ token prompt takes minutes just to process the prompt.)
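A rough back-of-envelope sketch of why that happens: prefill for a dense transformer costs roughly 2 × params FLOPs per prompt token, so a 30k-token prompt to a 70B model is on the order of 4×10^15 FLOPs before any tokens are generated. The throughput figure below is an assumption for illustration, not a benchmark; real numbers vary widely with hardware, quantization, and batching.

    # Back-of-envelope prefill-time estimate. All figures here are
    # assumptions for illustration, not measurements from the comment.

    def prefill_seconds(params: float, prompt_tokens: int, flops_per_sec: float) -> float:
        """Estimate prompt-processing (prefill) time.

        Uses the common approximation of ~2 * params FLOPs per token for
        a dense transformer forward pass. This ignores the attention
        term, which grows quadratically with context length and makes
        very long prompts even slower than this estimate.
        """
        total_flops = 2 * params * prompt_tokens
        return total_flops / flops_per_sec

    # LLaMA 70B, a 30k-token prompt, and an assumed ~25 TFLOP/s of
    # sustained compute (a plausible figure for a single consumer GPU).
    t = prefill_seconds(params=70e9, prompt_tokens=30_000, flops_per_sec=25e12)
    print(f"~{t:.0f} s (~{t / 60:.1f} min) just to process the prompt")

Under those assumptions the estimate comes out to roughly 170 seconds, i.e. a few minutes of prefill, which is consistent with the "takes minutes to process" observation above.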