top | item 46929655


OtherShrezzing | 22 days ago

A useful feature would be slow-mode which gets low cost compute on spot pricing.

I’ll often kick off a process at the end of my day, or over lunch. I don’t need it to run immediately. I’d be fine if it just ran on their next otherwise-idle GPU at much lower cost than the standard offering.


stavros | 22 days ago

OpenAI offers that, or at least used to. You can batch all your inference and get much lower prices.

airspresso | 21 days ago

Still do. Great for workloads where it's okay to bundle a bunch of requests and wait some hours (up to 24h, usually done faster) for all of them to complete.
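For reference, the OpenAI Batch API takes a JSONL file in which each line is one self-contained request (a `custom_id`, the target endpoint, and the request body), uploaded with a 24-hour completion window. A minimal sketch of building that input file locally; the model name and prompts here are placeholders:

```python
import json

def make_batch_line(custom_id, model, messages):
    # One line of the JSONL input file the Batch API expects:
    # a custom_id to match results back to requests, plus the
    # method, endpoint URL, and request body.
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": messages},
    })

# Bundle a batch of non-urgent tasks (placeholder prompts).
lines = [
    make_batch_line(
        f"task-{i}",
        "gpt-4o-mini",
        [{"role": "user", "content": f"Summarize document {i}"}],
    )
    for i in range(3)
]

# Written to disk, this becomes the file you upload with
# purpose="batch" and then submit via the batches endpoint
# with completion_window="24h".
print("\n".join(lines))
```

Results come back as another JSONL file keyed by the same `custom_id`s, which is what makes the "submit everything, collect answers within 24h" workflow practical.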

mrklol | 21 days ago

Yep, same. I often wonder why this isn't a thing yet. Running some tasks overnight at e.g. 50% of the cost: there's the batch API, but it isn't integrated into e.g. Claude Code.

gardnr | 21 days ago

The discount MAX plans are already on slow-mode.

guerrilla | 22 days ago

> I’ll often kick off a process at the end of my day, or over lunch. I don’t need it to run immediately. I’d be fine if it just ran on their next otherwise-idle GPU at much lower cost than the standard offering.

If it's not time-sensitive, why not just run it on CPU/RAM rather than GPU?

weird-eye-issue | 22 days ago

Yeah, just run an LLM with over 100 billion parameters on a CPU.

gruez | 22 days ago

Does that even work out to be cheaper, once you factor in how much extra power you'd need?