
Show HN: oLLM – LLM Inference for large-context tasks on consumer GPUs

3 points | anuarsh | 6 months ago | github.com

7 comments


Haeuserschlucht|6 months ago

20 minutes is a huge turnoff, unless you have it run overnight... say, having the AI check a legal paper for flaws, so the hint that you should exercise some self-care is waiting for you in the morning before you present it.

anuarsh|6 months ago

We are talking about 100k context here. 20k would be much faster, but you wouldn't need KV cache offloading for it.
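
(For scale, a rough back-of-the-envelope KV cache estimate in Python shows why 100k tokens forces offloading while 20k doesn't. The model shapes below are illustrative Llama-3-8B-like numbers, an assumption for the sketch, not oLLM's actual configuration:)

    # KV cache size = 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes/elem
    def kv_cache_gib(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
        return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 2**30

    # Assumed Llama-3-8B-like shapes: 32 layers, 8 KV heads (GQA), head_dim 128, fp16
    print(kv_cache_gib(32, 8, 128, 100_000))  # ~12.2 GiB -- exceeds typical consumer VRAM
    print(kv_cache_gib(32, 8, 128, 20_000))   # ~2.4 GiB  -- fits on-card comfortably

At 100k tokens the cache alone outgrows an 8 GB consumer card, which is exactly where spilling it to RAM or disk pays off.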

Haeuserschlucht|6 months ago

It's better to have software erase all private details from the text, have it checked by a cloud AI, and then have the placeholders swapped back in on your hard drive.
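
(A minimal sketch of that placeholder round-trip, assuming simple regex-based detection; a real pipeline would use a proper PII/NER detector, and every name here is hypothetical:)

    import re

    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # toy detector; real PII needs far more

    def redact(text):
        """Swap private details for numbered placeholders; keep the mapping locally."""
        mapping = {}
        def sub(m):
            key = f"[PII_{len(mapping)}]"
            mapping[key] = m.group(0)
            return key
        return EMAIL.sub(sub, text), mapping

    def restore(text, mapping):
        """Put the original details back once the cloud AI's answer is back on disk."""
        for key, original in mapping.items():
            text = text.replace(key, original)
        return text

    redacted, mapping = redact("Contact alice@example.com about the contract.")
    # send `redacted` to the cloud model, then: restored = restore(cloud_reply, mapping)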

attogram|6 months ago

"~20 min for the first token" might turn off some people. But it is totally worth it to get such a large context size on puny systems!

anuarsh|6 months ago

Absolutely. There are tons of cases where an interactive experience is not required, but the ability to process a large context for insights is.

anuarsh|6 months ago

Hi everyone, any comments or questions are appreciated.