One thing to consider: we don’t know if these LLMs are wrapped with server-side logic that injects randomness (e.g. using actual code or external RNG). The outputs might not come purely from the model's token probabilities, but from some opaque post-processing layer. That’s a major blind spot in this kind of testing.
avianlyric|10 months ago
This what the “temperature” parameter of an LLM controls. Setting the temperature of an LLM to 0 effectively disables that randomness, but the result is a very boring output that’s likely to end up caught in a never ending loop of useless output.
orbital-decay|10 months ago
remoquete|10 months ago