alexedw | 3 years ago

I suspect the only solid solution is OpenAI themselves storing all text their models generate, and providing an API that will return whether they've output a specific string (or a similar one) in the past.
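
To make that concrete, here's a minimal sketch of what such a lookup could look like: an exact hash check plus a crude character n-gram Jaccard similarity for the "similar string" case. Everything here (the class, the shingle size, the 0.8 threshold) is just illustrative; it is not an actual OpenAI API.

    import hashlib

    def shingles(text, n=5):
        # Character n-grams used for the approximate ("similar string") check.
        text = " ".join(text.lower().split())
        return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

    class ProvenanceStore:
        def __init__(self, threshold=0.8):
            self.exact = set()   # SHA-256 hashes of normalised outputs
            self.fuzzy = []      # shingle sets for near-duplicate checks
            self.threshold = threshold

        def record(self, generated_text):
            # Called once per completion the model produces.
            normalised = " ".join(generated_text.split())
            self.exact.add(hashlib.sha256(normalised.encode()).hexdigest())
            self.fuzzy.append(shingles(normalised))

        def was_generated(self, query):
            # The question the API would answer: exact hit, or high n-gram overlap.
            normalised = " ".join(query.split())
            if hashlib.sha256(normalised.encode()).hexdigest() in self.exact:
                return True
            q = shingles(normalised)
            return any(len(q & s) / len(q | s) >= self.threshold for s in self.fuzzy)

    store = ProvenanceStore()
    store.record("The quick brown fox jumps over the lazy dog.")
    print(store.was_generated("The quick brown fox jumps over the lazy dog"))  # True
    print(store.was_generated("An entirely different sentence."))              # False

At their scale the linear scan over shingle sets obviously wouldn't fly; you'd want something like MinHash/LSH over the whole corpus of generations, but the interface would be the same.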

A lot of suggestions here talk about the consistent stylistic choices that ChatGPT makes, like its lists or other particular mannerisms. I'd argue these are simply artefacts of it being fine-tuned on a large number of 'well-behaved' examples from OpenAI. This phenomenon is called partial mode collapse; this article does a great job discussing it with respect to GPT-3 [0].

Of course, you could train a model to recognise when this mode collapse has occurred and use that to flag ChatGPT output. The un-finetuned model, however, does not have these problems, so it's only a matter of OpenAI improving their fine-tuning dataset to return to an 'undetectable' AI.
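
For what it's worth, that detector idea is easy to prototype. Something along these lines, with a real labelled corpus swapped in for the placeholder examples I've made up here, would pick up on the surface mannerisms:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Placeholder data: 1 = mode-collapsed / ChatGPT-ish style, 0 = other.
    texts = [
        "Sure! Here are three key points to consider: 1. ... 2. ... 3. ...",
        "As an AI language model, I cannot provide personal opinions.",
        "In summary, there are several important factors to keep in mind.",
        "dunno, seemed fine to me when I tried it last week",
        "the patch broke the build on arm64, reverting for now",
        "honestly the second act drags but the ending lands",
    ]
    labels = [1, 1, 1, 0, 0, 0]

    # Character n-grams bias the classifier towards surface mannerisms
    # (stock phrases, numbered lists) rather than topic.
    detector = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),
        LogisticRegression(max_iter=1000),
    )
    detector.fit(texts, labels)

    print(detector.predict(["Here are some key takeaways: 1. Be concise."]))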

[0] https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-...
