It sounds like you don’t need immediate LLM responses and can batch-process your data nightly? Have you considered running a local LLM? You may not need to pay for API calls, and today’s local models are quite good. I started off on CPU, and even that was fine for my pipelines.
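To make the suggestion concrete, here is a minimal sketch of a nightly batch job against a locally hosted model. It assumes an Ollama server on its default port (`localhost:11434`) and a model tag like `llama3.1:8b` — both are assumptions for illustration, not part of the original comment; swap in whatever local runner you use.

```python
import json
import urllib.request

# Assumed local Ollama endpoint; adjust for your own server/runner.
OLLAMA_URL = "http://localhost:11434/api/generate"

def chunk(items, size):
    """Split tonight's workload into fixed-size batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def summarize(text, model="llama3.1:8b"):
    """Send one prompt to the local model; no per-call API fees."""
    payload = json.dumps({
        "model": model,
        "prompt": f"Summarize:\n{text}",
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def run_nightly(docs, batch_size=32):
    """Process all pending documents in batches and collect results."""
    results = []
    for batch in chunk(docs, batch_size):
        results.extend(summarize(doc) for doc in batch)
    return results
```

Since latency doesn’t matter overnight, batch size is mostly about checkpointing and memory, not speed; a cron entry calling `run_nightly` is all the scheduling this needs.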
LTL_FTC|1 month ago
My old Threadripper Pro was seeing about 15 tokens/s, which was quite acceptable for the background tasks I was running.
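For a sense of why ~15 tokens/s is fine for background work, a quick back-of-the-envelope calculation (the 500-token output size and 8-hour window are illustrative assumptions, not figures from the comment):

```python
def nightly_capacity(tokens_per_sec, tokens_per_task, window_hours):
    """How many generation tasks fit in an overnight batch window."""
    return int(window_hours * 3600 * tokens_per_sec // tokens_per_task)

# At 15 tokens/s with ~500-token outputs, an 8-hour window fits
# 8 * 3600 * 15 / 500 = 864 tasks.
```

Even slow CPU inference clears hundreds of medium-sized generations per night, which is often plenty for a personal pipeline.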