Yeah, it's been butchering relatively easy-to-moderate tasks for me, even with reasoning set to high. I'm hoping it's just tuning that needs to be done, since they've had to port it to a novel architecture.

If instead the model is performing worse due to how much they had to shrink it just so it will fit on Cerebras hardware, then we might be in for a long wait for the next gen of ginormous chips.

postalcoder|18 days ago

Agree w/ you on the model's tendency to butcher things. Performance-wise, this almost feels like the GPT-OSS model.

I need to incorporate "risk of major failure" into bluey bench. Spark is a dangerous model. It doesn't strongly internalize the consequences of the commands that it runs, even on xhigh. As a result, I'm observing a high tendency to run destructive commands.

For instance, I asked it to assign random numbers to the filenames of the videos in my folder to run the benchmark. It accidentally deleted the files on most of the runs. The funniest part about it is that it comes back to you within a few seconds and says something like "Whoops, I have to keep it real, I just deleted the files in your folder."
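
For what it's worth, a rename like this never needs a delete step. Here is a minimal Python sketch of one way to do it, assuming a flat folder of files. The function name `assign_random_names`, the six-digit prefix, and the `dry_run` flag are illustrative choices of mine, not what the model or the benchmark actually ran:

```python
import random
from pathlib import Path


def assign_random_names(folder, seed=None, dry_run=True):
    """Prefix every file in `folder` with a unique random six-digit number.

    Returns the list of (old_path, new_path) pairs. Hypothetical helper:
    it only renames, and it refuses to run if any target already exists.
    """
    rng = random.Random(seed)
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())

    # Sample without replacement so no two files share a number.
    numbers = rng.sample(range(100_000, 1_000_000), len(files))

    # Build the whole plan first, before touching anything on disk.
    plan = [
        (path, path.with_name(f"{number}_{path.name}"))
        for path, number in zip(files, numbers)
    ]

    # Path.rename can silently overwrite on some platforms, so check up front.
    for _, target in plan:
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")

    if not dry_run:
        for source, target in plan:
            source.rename(target)
    return plan
```

With the default `dry_run=True` it only returns the planned renames, so the plan can be inspected before anything on disk changes. Passing `dry_run=False` applies it.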

HumanOstrich|18 days ago

Ouch, at least it fesses up. I ran into problems with it first refusing to use git "because of system-level rules in the session". Then later it randomly amended a commit and force-pushed it because it made a dumb mistake. I guess it was embarrassed.

jychang|18 days ago

> If instead the model is performing worse due to how much they had to shrink it just so it will fit on Cerebras hardware

They really should have just named it "gpt-5.3-codex-mini" (served by Cerebras). It would have made it clear what this model really is.

HumanOstrich|18 days ago

Not if you're suggesting that "(served by Cerebras)" should be part of the name. They're partnering with Cerebras and providing a layer of value. Also, OpenAI is "serving" you the model.

We don't know how they integrate with Cerebras hardware, but typically you'd pay a few million dollars to get the hardware in your own datacenter. So no, "served by Cerebras" is confusing and misleading.

Also, "mini" is confusing because it's not analogous to gpt-5.1-codex vs gpt-5.1-codex-mini. Gpt-5.3-codex-spark is a unique, _experimental_ offering that doesn't fit the existing naming suffixes.

I don't understand what's wrong with "spark". It's friendly and evokes a sense of something novel, which is perfect.

If you want to know more about the model, read the first paragraph of the article. That information doesn't need to be hardcoded into the model name indefinitely. I don't see any "gpt-5.3-codex-nvidia" models.