If a GPT detector has any false positives at all, it will disadvantage people who are already disadvantaged. They are the least likely to be capable of defending themselves, and the least likely to be heard if they do try.
Not to mention the fact that being called non-human is most definitely going to offend some people.
What exactly makes anyone think that they can detect an LLM that is outputting text? The notion seems absurd yet it keeps coming up.
> What exactly makes anyone think that they can detect an LLM that is outputting text?
When you read LLM output, you can often tell. The source of the notion is that we can do it pretty well ourselves, so if the AIs are so magical then they should be able to do it too (not saying I agree, but it is a pretty clear line of logic).
Probably the fact that if they admit the reality, they have to think about some difficult and profound questions. It's much easier just to posit an imaginary future technology and decide that will solve it.
> What exactly makes anyone think that they can detect an LLM that is outputting text? The notion seems absurd yet it keeps coming up.
My sense of the general idea (non-authoritative): since the sequence emitted by an LLM is probabilistic completion, i.e. predicting the next word, the examiner can do the same by progressively processing the text. Given the assumption that the semantic relations extracted from the training corpus should be fairly universal for a given domain at the output level (even though distinct LLMs will likely have distinct embedding spaces), the examiner LLM should be able to assign probabilities to the predicted words. The idea is that genuine human-produced text will have idiosyncrasies that are -not- probabilistically optimal, so the examiner can establish a sort of 'distance from the probable mean' measure, with the expectation that LLM-produced text should be 'closer' to the examiner's predictions of 'the next word'.
The problem (if the above is correct) is the missing 'prompt' and the meta-instructions embedded therein. Those should ("engineering") affect the output, possibly skewing the distance measure and thus defeating the examiner. But of course, say in the context of academia, the examiner can 'guess' at some aspects of the prompt as well. For example, if you are examining papers for a specific assignment, the examiner can self-prompt too: "An essay on Hume's position on the knowledge of the self".
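A minimal sketch of that 'distance from the probable mean' idea, using GPT-2 as the examiner purely for illustration (the threshold is made up; calibrating it is the actual hard part):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def perplexity(text: str) -> float:
        # Score the text under the examiner LM: low perplexity means
        # each word sits close to the model's own 'next word' guess.
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean NLL per token
        return torch.exp(loss).item()

    THRESHOLD = 50.0  # hypothetical; a real detector must calibrate this
    def looks_generated(text: str) -> bool:
        # The naive rule: LLM output should be 'closer' to the
        # examiner's predictions, i.e. have lower perplexity.
        return perplexity(text) < THRESHOLD

This is roughly how perplexity-based detectors work, and it is also exactly why they flag non-native writers: a constrained, common vocabulary is low-perplexity too.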
> it will disadvantage people who are already disadvantaged. They are the least likely to be capable of defending themselves and the least likely to be heard if they do try to defend themselves.
Can you come up with any type of system that this does not apply to?
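https://www.lesswrong.com/posts/G5eMM3Wp3hbCuKKPE/proving-to...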
It’s unfortunate that the economics involved mean that purchasers of such tools will likely want a false negative rate of 0, at the expense of the tools' victims, who suffer the resulting false positives.
You may want to read up on the history of the polygraph. The notion that a tool giving yes/no answers needs to have its answers relate to ground truth is very academic and generally frowned upon by gatekeepers.
Prior to GPT, word processors would catch when your grammar was off across multiple words and give you suggestions to fix it. As students begin using the new GPT-enhanced word processors, these suggestions will only increase.
Professors have a tough job ahead, and they aren't always smart enough to use these GPT detectors properly [1].
I think one solution would be a word processor that records the process of writing a paper and you need to turn in your paper along with your recording. Of course this is going to create added stress but what else do we do? There's going to be GPT scramblers that remove watermarks.
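[1] https://www.businessinsider.com/professor-fails-students-aft...

Publish or perish!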
> What exactly makes anyone think that they can detect an LLM that is outputting text?
Watermarking.
On each word of the output, you randomly split all possible words into two groups and only generate output using one of them. If you get a text of 1000 words that exactly follows the secret sequence of groups, you can be sure it was generated by this LLM, with a one in 2^1000 chance of error.
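For the detection side, a toy sketch of that scheme (the key and hashing details here are hypothetical; real proposals key the split on preceding tokens rather than absolute position, so that edits don't break it):

    import hashlib

    SECRET_KEY = b"known only to the model provider"  # hypothetical

    def in_allowed_group(position: int, word: str) -> bool:
        # Pseudorandomly split the vocabulary in two at each position,
        # keyed by the secret. The generator only emits allowed words.
        h = hashlib.sha256(
            SECRET_KEY + str(position).encode() + word.lower().encode()
        ).digest()
        return h[0] % 2 == 0

    def watermark_score(text: str) -> float:
        # ~0.5 for human text; ~1.0 for text from the watermarked LLM.
        words = text.split()
        hits = sum(in_allowed_group(i, w) for i, w in enumerate(words))
        return hits / max(len(words), 1)

A thousand words all landing in the allowed groups by chance has probability 2^-1000, which is where the error bound above comes from.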
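https://openai.com/blog/new-ai-classifier-for-indicating-ai-...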
If it's detecting non-native English as GPT-produced, then wouldn't that mean GPT is modeled on non-native English and produces similar output? And if it is NOT detecting native English as GPT-produced, wouldn't that mean it's showing up as different from the training data outputs?
Well, the system we live under doesn't give a crap about how absurd a notion is, and it surely doesn't care about educating people against it. As long as it's profitable, it's gonna be made. In multiple scenarios it makes a ton more sense economically to make the misinformation pool larger, and as a person who values truth, knowledge, and free access to correct information, and who believes that this is the force which drives our species further, supporting this system is a contradiction. Maybe I am a commie now? How would I know? The West has dismantled all the other systems to the point that it is impossible for me to imagine living outside it.
Could you do the stupid thing and have OpenAI et al. offer an API that, given some text (or a hash of it), returns whether that text was generated by the service? Some third-party companies specialize in cheat detection: they pay for the API, then get paid by schools or whoever to detect cheaters.
Difficulties I can think of:
* Getting AI companies to offer this. I don't think it really comes with a downside for them, though. You couldn't use it to actually retrieve results in any way, and you wouldn't have prompts.
* This only detects exact matches; people changing the output would defeat it. Fixable with some kind of fuzzy search that returns a 'distance to nearest response', but this obviously makes it more expensive and difficult to run, use, etc.
* People could still run the model themselves. As models get better and more expensive, maybe this becomes less of a problem. Or maybe models get small enough while still generating good output that it becomes more of a problem. Who knows.
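At least this would avoid AI grading AI issues.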
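A sketch of what the client side could look like, assuming a hypothetical /was-generated endpoint (the provider, URL, and field names are all made up; no provider actually offers this):

    import hashlib
    import requests

    def was_generated(text: str) -> bool:
        # Hash locally so the checker never has to ship the text itself.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        resp = requests.post(
            "https://api.example-provider.com/v1/was-generated",  # hypothetical
            json={"sha256": digest},
        )
        return resp.json()["match"]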
Scott Aaronson (who is temporarily working for OpenAI) proposed a cryptographic scheme where the generated text can be watermarked without any decrease in quality.
It has most of the same problems you list, except it is much more robust against small changes to the text.
As of a few weeks ago, he mentioned that OpenAI had a working implementation and were discussing whether to start using it. I assume they'd tell people before they turn it on in prod; I see no advantage in secrecy.
OpenAI has released a model for that type of detection. It doesn't work 100% of the time. The idea that you can just "hash all the AI-generated answers and compare against them" is flawed because that problem is intractable.
While I don't doubt it, there's probably an uncanny valley too: if a writer's English contains spelling errors, or ungrammatical or unidiomatic constructions, AI won't make the same mistakes. As the English level improves, now the spelling and grammar are correct, but you have a limited repertoire of correct constructions to draw from. This may lead to repetitive structure and wordiness as the meaning is shoehorned into more, simpler phrases. I imagine this is what sets off the AI detectors.
Considering that AI writing standards will probably only increase from here, encroaching on the territory native English speakers operate at, perhaps the best way to mark yourself as not-AI is to start making many errors.
This will work for a week or two until someone tunes a model to also do this, and the arms race grinds ever on!
It also goes for other languages too, so luckily for me, I have a short period where my dreadful other languages will be an advantage.
The texts in the sample set are too short for this to be meaningful. The human-authored TOEFL tests (which compare unfavorably with the eighth-grader essays) are ~100 words. The longest sample is under 1000 words.
GPT detection ought to be optimized for longform texts, so tests of its efficacy should be, too. Perhaps the current detectors on the market are trying to assess writing at the sentence level, but if that's the case then it should be obvious that they will be inaccurate.
GMail can write most of a 100-word email for me if I type "Thanks so much" and hit tab. That's a good thing. If GPT is useful as a "productivity" tool, it is for low-level "writing" tasks like this, which aren't really writing at all, just rote responses. Anyone who can access this tool (if they have confidence in its prowess) should use it.
Writing proper is about developing ideas, and it's this that people need to be concerned about. It's true that if your college admissions essay is riddled with typos, eyebrows may be raised, but its ultimate significance is whether you can and are willing to reason. If someone is using ChatGPT (or spellcheck, or Grammarly) in order to cultivate the appearance of having "proper English," who cares? But if they're using GPT to avoid thinking altogether, that's a problem.
At ~250 words, I guess the only thing you can assess is good formal usage. It's patently obvious that both GPT and non-native English speakers will outperform native speakers on this.
Of course, I hope that anyone in the position to assess short-form writing is made aware of this research, and is cautious about GPT detection. On the other hand, it'd be pretty funny if idiosyncratic grammatical choices became the marker of a human hand.
A close acquaintance of mine just quit her job as a community college writing instructor because of chatGPT. Everybody is using it and it’s difficult to prove so it’s made her job impossible. I think students in general would lose a lot if we “adapted” writing class into prompt engineering class. Writing is as much about organizing your thoughts and assembling an argument as it is about putting words on a page. No amount of prompt engineering training is going to teach that. The only solution I see is making all assessment in class.
If anything, the opposite should be true. GPT-4 at least has near-perfect English. If your sample displays non-native traits, it probably wasn't generated by GPT!
Speculation on my part, but I believe us non-native English speakers write more formally and with less natural flow.
I also wonder which proportion of English writing (in general) is written by non-native speakers, and whether we might be disproportionately represented in training data.
I think people here misunderstand the whole point: non-native speakers do not necessarily make more mistakes in general; it is about the perplexity of word choices and structure. In a subsequent experiment in the paper, they used ChatGPT to increase the perplexity of the original non-native texts and decrease that of the native texts, and the exact opposite pattern was observed.
If I was a non-native speaker, I would use AI tools to clean up my writing. Does this mean I’m cheating? Of course not, the ideas are still my own. But this might make my writing more likely to be flagged.
The fact of the matter is none of these supposed “AI detectors” are reliable. GPTZero claims to be the #1 AI detection system which is a little like claiming to be the world’s best perpetual motion machine.
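Anyways, that’s why I created this “tool”, hopefully it can get to the top of Google: https://isthiswrittenbyai.surge.sh/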
> If I was a non-native speaker, I would use AI tools to clean up my writing. Does this mean I’m cheating? Of course not, the ideas are still my own.
Consider a different moral dilemma:
If I had a job which I could not do and used AI tools to clean up my work, does this mean I'm cheating my employer? Even if the ideas are my own, but might be incorrect, with that fact hidden by the "AI cleanup"?
> If I was a non-native speaker, I would use AI tools to clean up my writing. Does this mean I’m cheating? Of course not, the ideas are still my own.
No, the ideas are not your own. They're a combination of yours and the AI's (which is really a combination of millions of other people's ideas), just as they are for anyone who writes a prompt. The fact that you aren't a native English speaker, or that you're using AI for 'good', is irrelevant. If you use AI to produce content of any sort, you have to accept that you are not really the author. You just wrote the prompt.
> The detectors demonstrated near-perfect accuracy for US 8-th grade essays
I am genuinely confused.
It seems that all the test data provided were real human essays. The ones labeled as native-English-speaker writing are from a prominent machine-learning dataset of common essays, which seems likely to have been included in any training data, whereas the non-native ones were taken from a random forum.
Can someone help me understand if I've understood the flaws correctly here? If so, does this paper add anything beyond confirming that there is, at this point in time, absolutely no value in GPT detectors?
I strongly believe that any false positive rate here means they shouldn't be used at all to detect cheating. It really is morally wrong to potentially fail people who have worked hard to produce something, and to question the integrity of their work.
There is no chance whatsoever that a tool will be able to reliably tell the difference, and I can’t understand how anyone thinks such a thing is possible.
It’s not clear how there could be a mechanism of action for such a thing.
We’re quite literally talking about an infinite number of monkeys scenario.
Consider this thought experiment. We will start with a piece of text that your detector is 100% certain was created by a GPT tool.
Now, actually prove that there is no way whatsoever for at least one human being to independently create this piece of text given a reasonably plausible prompt.
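If you can't prove that, your tool is bullshit.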
I believe that in a couple of years, people will start looking at writing (any writing) in a very different way than they do now. Apart from the fact that spelling and grammatical mistakes will become a thing of the past, people will take anything written much less seriously and much less personally. Although the less personal aspect is gratifying, the less serious tone is a huge negative. Writers will move to a new level of writing skill in order to stay relevant and retain their audience. Puns, teasing, and unorthodox expressions will flood the internet. Writing styles will become trademarks! Language itself, especially English, will become much more standardized, as if it were programming code. The bad will get badder and the good will be gooder.
I don't think that's really what's going to happen... People stuffing puns in just to convince you it's human written just sounds silly. Not to mention you can ask GPT to do that too lol
What's going to happen is that branding will get more and more important. People simply won't trust anything from random sources anymore like they do now when they search on Google and treat the top pages as the truth
They will get most of their info from trusted sources/brands and filter out the rest
There was a time when beautiful handwriting was an incredible personal asset: you'd write letters and invitations, and people would look at your script and assess your personality from it. Companies would pass resumes through a mentalist looking mostly at your handwriting.
And all of that just went down the drain with the digital world, but we still have people enjoying beautiful writing for the sake of it. Pen and paper addicts still abound.
I see stylistic forms analysis going the same route. Nowadays we pay attention to it in many settings, but down the line it should become a niche hobby for people really enjoying it.
I think the opposite will happen. People's tolerance of fluff will go to zero, whether it's AI- or human-generated. People will demand straight facts, without any linguistic embellishments. Just pure information, simple.wikipedia.org style.
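I tried tools like GPTZero on dozens of my hand-written texts and AI texts. Random results all the way.

After an editor had their hands on an AI text, you can't tell the difference anymore.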
Plagiarism detectors are inherently hard to create because they're trying to detect an amorphous concept right from the start: Originality.
Instead, more emphasis should be placed on proper citations & individual short Q&A (< 5 questions) about the writing in question: What they've researched, unexpected hurdles, research methodologies & tools, main references used, etc. Perfect recall of what's been written is not the aim, but rather that the author is able to understand what has been supposedly written by their own efforts, along with the citations used in their works.
In fact, as of writing this comment, it could be fun to see what an LLM would produce as questions to such a paper in question, and have the author answer those questions on the spot. This can be used as a teaching lesson on the limits of what an LLM can accomplish, as well as proof that the author can at the very least withstand surface-level examinations from an automated system. Those with (stage fright / social anxiety / vocal disabilities) could be given extra time to come up with answers to said questions, in an attempt to balance out any advantages that could be given to confident authors via this method.
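Concretely, that experiment is a few lines against the chat API (the model choice and prompt here are arbitrary, just to illustrate the idea):

    import openai

    openai.api_key = "sk-..."  # your key

    def viva_questions(paper: str, n: int = 5) -> str:
        # Ask the model for short oral-exam questions about the process
        # behind the paper: methodology, references, unexpected hurdles.
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": "You are an examiner. Ask concise questions "
                            "about how this paper was researched and written."},
                {"role": "user", "content": f"Write {n} questions:\n\n{paper}"},
            ],
        )
        return resp.choices[0].message.content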
Using AI to detect LLM writing is a fool's errand. You can easily generate text using GPT-4 that is exactly the same as what many humans would write (of course, apart from the boilerplate "As an AI language model"). All it will do is penalize some specific writing styles. Ironically, it will drive those people to use GPT and ask it to rewrite their work in a different style.
> Our findings reveal that these detectors consistently misclassify non-native English writing samples as AI-generated, whereas native writing samples are accurately identified. Furthermore, we demonstrate that simple prompting strategies can not only mitigate this bias but also effectively bypass GPT detectors, suggesting that GPT detectors may unintentionally penalize writers with constrained linguistic expressions.
Essentially, ChatGPT writes like a non-native English speaker. It has to translate from its computer language into English.
I think it's more likely that humans will adapt to AI "style" than the other way round. Look at the elaborate and often exquisitely constructed language that books used 100 years ago. Does anyone think that the art of good writing has not degenerated?
This is a needlessly inflammatory and, judging from the comments, quite distracting phrasing for the title. The really interesting thing here is that the detectors appear to recognize machine output because it is relatively less proficient than that of fluent writers.
I have a strong accent, and when I call the typical government/bank phone line that requires voice responses, it is usually hopeless. This is a problem with all such automated systems, although it is often also a problem with people...
The abstract claims, "whereas native writing samples are accurately identified", which is very different from claims I've seen elsewhere about how well these detectors work. The test they're reporting is running the detectors on "88 US 8-th grade essays sourced from the Hewlett Foundation's Automated Student Assessment Prize (ASAP) dataset", and they got a false positive rate of ~10%.
It seems quite natural that it’s easier to interpret correctly written English than language that contains mistakes. Does anyone find this controversial?
The next step will be students being required to "show their work" on writing assignments: some type of edit history or snapshots of the work in progress. Of course that could also be faked, but it would require quite a bit more work.
Just as code is easier to examine and authenticate when you can see the commit history.
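As a crude sketch of that "recording", assuming the essay lives in its own git repo (the file name and interval are arbitrary; a real tool would capture finer-grained history):

    import subprocess
    import time

    DOC = "essay.txt"  # hypothetical file name

    subprocess.run(["git", "init", "-q"], check=True)  # one-off setup

    while True:
        # Snapshot the work in progress every five minutes; the commit
        # history gets turned in alongside the finished paper.
        subprocess.run(["git", "add", DOC], check=True)
        subprocess.run(
            ["git", "commit", "-q", "--allow-empty", "-m", "snapshot"],
            check=True,
        )
        time.sleep(300)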