top | item 46714938

(no title)

plutodev | 1 month ago

[flagged]


oefrha|1 month ago

You’re subtly pushing the same product in basically every one of your comments. If these are good-faith comments, please edit out the product name; it’s unnecessary, and pushing it as a green account just makes people consider you a spammer. Establish yourself first.

subscribed|1 month ago

They've submitted "I'm working at io.net" quite openly, but I admit they should at least announce their employment in the bio; otherwise it's a very poorly executed astroturf post (phrased like they're an experimenting user and not a dev).

lelanthran|1 month ago

Or he could disclose it, which he did in a different comment on a different story.

I agree that green accounts could be regarded as suspicious and, if it were me, I'd disclose each time I mention it.

kouteiheika|1 month ago

> On the infra side, training a 1.5B model in ~4 hours on 8×H100 is impressive.

It's hard to compare without more details about the training process and the dataset, but, is it? Genuine question, because I had the opposite impression. Like, for example, recently I did a full finetuning run on a 3B model chewing through a 146k entry dataset (with 116k entries having reasoning traces, so they're not short) in 7 hours on a single RTX 6000.

kevinlu1248|1 month ago

Honestly, I think we can improve our training throughput drastically with a few more optimizations, but we've been spending most of our time on model quality improvements instead.