redlock | 1 year ago

The issue here is that, even with a lot of VRAM, you may be able to run the model, but with a large context it will still be too slow. (For example, running LLaMA 70B with a 30k+ token prompt takes minutes just to process the prompt.)
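A rough back-of-envelope sketch of why that happens: prefill for a dense transformer costs roughly 2 × params FLOPs per prompt token, so a 30k-token prompt to a 70B model is on the order of 4×10^15 FLOPs before any tokens are generated. The throughput figure below is an assumption for illustration, not a benchmark; real numbers vary widely with hardware, quantization, and batching.

    # Back-of-envelope prefill-time estimate. All figures here are
    # assumptions for illustration, not measurements from the comment.

    def prefill_seconds(params: float, prompt_tokens: int, flops_per_sec: float) -> float:
        """Estimate prompt-processing (prefill) time.

        Uses the common approximation of ~2 * params FLOPs per token for
        a dense transformer forward pass. This ignores the attention
        term, which grows quadratically with context length and makes
        very long prompts even slower than this estimate.
        """
        total_flops = 2 * params * prompt_tokens
        return total_flops / flops_per_sec

    # LLaMA 70B, a 30k-token prompt, and an assumed ~25 TFLOP/s of
    # sustained compute (a plausible figure for a single consumer GPU).
    t = prefill_seconds(params=70e9, prompt_tokens=30_000, flops_per_sec=25e12)
    print(f"~{t:.0f} s (~{t / 60:.1f} min) just to process the prompt")

Under those assumptions the estimate comes out to roughly 170 seconds, i.e. a few minutes of prefill, which is consistent with the "takes minutes to process" observation above.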