top | item 46307575

(no title)

tacitusarc | 2 months ago

It’s actually more complicated than that now. You don’t get that kind of refusal purely from MoE. OpenAI models use a fine-tuned model on a token-based system, where every interaction is wrapped as a “tool call” with some source attached, and a veracity associated with the source. OpenAI tools have high veracity, users have low veracity. To mitigate prompt injection, models are expect a token early in the flow, and then throughout the prompt they expect that token to be associated with the tool calls.

In effect this means user input is easily disbelieved, and the model can accidentally output itself into a state of uncorrectable wrongness. By invoking the image tool, you managed to get your information into the context as “high veracity”.

Note: This info is the result of experimentation, not confirmed by anyone at OpenAI.

discuss

measurablefunc|2 months ago

Seems plausible but the overall architecture is still the same, your request has to be "routed" by some NN & if that gets stuck picking a node/"expert" (regardless of "tools" & "veracity" scoring) that keeps refusing the request incorrectly then getting unstuck is highly non-trivial b/c users are not given a choice in what weights are assigned to the "experts", it's magic that OpenAI is performing behind the scenes that no one has any visibility into.

tacitusarc|2 months ago

I think maybe you mean something else when you say MoE. I interpret that as “Mixture of Experts” which is a model type where there is a routing matrix applied per layer to sort of generate the matmul executed on that layer. The experts are the weight columns that are selected, but calling them experts kinda muddies the waters IMO, it’s really just a sparsification strategy. Using that MoE you almost certainly would get various different routing behaviors as you added to the context.

I might misunderstand you but it seems like you think there are multiple models with one dispatching to others? I’m not sure what that sort of multi-agent architecture is called, but I think those would be modeled as tool calls (and I do believe that the image related stuff is certainly specialized models).

In any case, I am saying that GPT5 (or whichever) is the one actually refusing the request. It is making that decision, and only updating its behavior after getting higher trust data confirming the user’s words in its context.