top | item 42824765

flaque | 1 year ago

This only makes sense if you think scaling laws won't hold.

If someone gets something to work on 1k H100s that should have taken 100k H100s, then the group with the 100k can apply the same trick and is about to have a much, much better model.
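
A rough sketch of that arithmetic, assuming loss follows a simple power law in effective compute. The constants `a` and `alpha` are made up for illustration, not fitted to any real model, and treating GPU count as a stand-in for compute is a simplification.

```python
def loss(effective_compute, a=10.0, alpha=0.05):
    """Hypothetical power-law scaling curve: loss falls as compute grows.

    `a` and `alpha` are illustrative placeholders, not measured values.
    """
    return a * effective_compute ** -alpha


# Assumption: the new method makes each GPU worth 100x as much,
# i.e. 1k H100s now do what 100k H100s used to do.
EFFICIENCY_GAIN = 100

small_cluster = 1_000    # GPUs
large_cluster = 100_000  # GPUs

# Effective compute once both groups adopt the same trick.
small_effective = small_cluster * EFFICIENCY_GAIN
large_effective = large_cluster * EFFICIENCY_GAIN

loss_old_large = loss(large_cluster)    # big group, old method
loss_new_small = loss(small_effective)  # small group, new method
loss_new_large = loss(large_effective)  # big group, new method

print(f"large cluster, old method: {loss_old_large:.4f}")
print(f"small cluster, new method: {loss_new_small:.4f}")
print(f"large cluster, new method: {loss_new_large:.4f}")
```

Under these assumptions the small group only catches up to where the large group already was, while the large group moves further down the same curve. If scaling laws stop holding, the last line stops improving and the argument falls apart.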
