(no title)
kiratp | 5 months ago
Things they could do that would not technically contradict that:
- Quantize KV cache
- Data-aware model quantization, where their own evals show "equivalent perf" but overall model quality suffers.
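For context on the KV-cache point: the cache stores a key and value vector for every past token, so its memory footprint grows with context length, and storing it in int8 instead of fp16 halves that footprint at the cost of some rounding error. A minimal sketch of symmetric per-token int8 quantization (illustrative only; names and shapes are assumptions, not any vendor's actual implementation):

```python
import numpy as np

def quantize_kv(kv: np.ndarray):
    """Symmetric per-token int8 quantization: returns (int8 data, fp32 scales)."""
    scales = np.abs(kv).max(axis=-1, keepdims=True) / 127.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero on all-zero rows
    q = np.clip(np.round(kv / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float32)

def dequantize_kv(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scales

# Toy cache: (num_tokens, head_dim) -- real caches add layer/head dimensions
kv = np.random.randn(8, 64).astype(np.float32)
q, s = quantize_kv(kv)
recovered = dequantize_kv(q, s)
max_err = np.abs(kv - recovered).max()
# int8 storage is 1 byte/element vs 2 for fp16; the rounding error is bounded
# by half a scale step per element, which evals may or may not surface.
```

This is exactly the kind of change that "would not technically contradict" a no-model-change claim: the weights are untouched, but accumulated cache rounding error can still degrade long-context quality.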
The simple fact is that deploying physical compute takes time, yet somehow they are serving more and more inference from a slowly growing pool of hardware. Something has to give...
cj | 5 months ago
Is training compute interchangeable with inference compute or does training vs. inference have significantly different hardware requirements?
If training and inference hardware is pooled together, I could imagine a model where training simply fills in any unused compute at any given time (?)
kiratp | 5 months ago
Also, if you pull too many resources from training your next model to make inference revenue today, you'll fall behind in the larger race.