item 37989997

wgd | 2 years ago

The approach proposed in this paper is to watermark LLM generated text using character-substitution from various simple characters (normal whitespace, normal letters, etc) to semantically equivalent Unicode code points (such as U+2004 THREE-PER-EM SPACE instead of normal spaces, or replacing specific character sequences with equivalent ligatures).
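To make the idea concrete, here is a minimal sketch of one way such a whitespace watermark could carry data. It assumes a simple scheme where each ordinary space encodes one bit (U+0020 for 0, U+2004 THREE-PER-EM SPACE for 1). The function name `embed_bits` and the bit layout are hypothetical illustrations, not the paper's actual Whitemark encoding.

```python
# Hypothetical bit encoding; not the paper's exact Whitemark scheme.
NORMAL_SPACE = "\u0020"
THREE_PER_EM_SPACE = "\u2004"


def embed_bits(text: str, bits: str) -> str:
    """Replace the i-th ordinary space with U+2004 when bits[i] == '1'.

    Spaces after the payload is exhausted are left untouched.
    """
    out = []
    i = 0
    for ch in text:
        if ch == NORMAL_SPACE and i < len(bits):
            out.append(THREE_PER_EM_SPACE if bits[i] == "1" else NORMAL_SPACE)
            i += 1
        else:
            out.append(ch)
    if i < len(bits):
        raise ValueError("not enough spaces to carry the watermark")
    return "".join(out)


marked = embed_bits("the quick brown fox jumps", "1010")
print(repr(marked))  # 'the\u2004quick brown\u2004fox jumps'
```

Rendered in most fonts, the marked string looks almost identical to the original, which is the whole point of the scheme.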

The authors appear to be entirely aware that this sort of substitution can be trivially stripped out by normalizing down to a simplified character set ("The critical limitation of Whitemark is that it can be bypassed by replacing all whitespaces with the basic whitespace U+0020, then the validator can no longer detect the watermark"), but believe that it still has value because the typical student using an LLM to write their essay won't know anything about Unicode.
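The limitation the authors quote is easy to demonstrate. Assuming the same hypothetical one-bit-per-space layout (U+2004 = 1, U+0020 = 0), a validator reads the bits back, and the bypass they describe (map every whitespace character to U+0020) leaves it reading all zeros. `extract_bits` and `normalize_whitespace` are illustrative names, not anything from the paper.

```python
THREE_PER_EM_SPACE = "\u2004"


def extract_bits(text: str) -> str:
    """Read one bit per space-like character: '1' for U+2004, '0' for U+0020."""
    return "".join(
        "1" if ch == THREE_PER_EM_SPACE else "0"
        for ch in text
        if ch in (" ", THREE_PER_EM_SPACE)
    )


def normalize_whitespace(text: str) -> str:
    """The bypass: every whitespace character except line breaks becomes U+0020."""
    return "".join(" " if ch.isspace() and ch not in "\r\n" else ch for ch in text)


watermarked = "the\u2004quick brown\u2004fox jumps"
print(extract_bits(watermarked))                        # 1010
print(extract_bits(normalize_whitespace(watermarked)))  # 0000
```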

This seems a bit naive to me. Implementing the necessary "watermark remover" normalization as a simple webapp would be an easy afternoon project for most of us here, and if this approach reached any sort of widespread use there would be many such sites. Students who intend to cheat by using an LLM to write their essays are entirely capable of learning "there's some secret data hidden in the text so copy-paste it through this other site to strip that out before turning it in". Even without access to such a tool they could simply...retype the text themselves?
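For a sense of how small that afternoon project is, a sketch of the core of such a "watermark remover" follows. It leans on Unicode NFKC normalization, which folds compatibility characters: U+2004 and the other typographic spaces become U+0020, and ligatures such as U+FB01 become plain "fi". Zero-width characters have no compatibility decomposition, so NFKC leaves them alone; stripping them separately is an extra step assumed here, not something the paper mentions. The function name is hypothetical.

```python
import unicodedata

# Zero-width characters survive NFKC, so drop them explicitly.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def strip_unicode_watermark(text: str) -> str:
    # NFKC maps U+2004 -> U+0020 and U+FB01 (the "fi" ligature) -> "fi".
    folded = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in folded if ch not in ZERO_WIDTH)


print(strip_unicode_watermark("the\u2004quick \ufb01sh\u200b"))  # the quick fish
```

NFKC is a blunt instrument (it also turns "²" into "2" and full-width letters into ASCII), but for an essay that is rarely a problem, which is rather the point.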

Arguably this still has some value. In most contexts there is minimal downside to watermarking the generated text in this way, and a slight possibility of catching some cases in which people lazily present LLM generated text as human written. However this might give people a misplaced belief that the absence of such a watermark means the text is authentically human authored, which might outweigh the benefits of catching the occasional lazy or ignorant user.

jstanley|2 years ago

> Students who intend to cheat by using an LLM to write their essays are entirely capable of learning "there's some secret data hidden in the text so copy-paste it through this other site to strip that out before turning it in"

In fact there is precedent for this. When I was at school a lot of kids would start writing an essay by copy and pasting the most relevant Wikipedia article into Microsoft Word, and then edit it to sound different, but this resulted in a subtle light-blue background being inserted into the resulting printed page, which made it very obvious that they had copied from Wikipedia. They quickly learnt that they had to paste it through Notepad or similar first to get rid of the background colour.

adhesive_wombat|2 years ago

Has anyone ever actually wanted to paste from an HTML source into a word processor and drag all the random formatting along for the ride? I still don't understand why that is the default. I see about one email a day with mismatched styles because people paste from various documents, each with its own slightly different formatting. Presumably no one really cares, but it's ugly. Give me plain text any day (except that, because the company logo is inserted as an image, plain text is now verboten by the server).

It's usually Ctrl-Shift-V to not include formatting (or get a menu of options, of which that's one), by the way.

PaulHoule|2 years ago

Reminds me of the time I was in college and a friend of mine in CS 101 wanted my help in stealing somebody else's programming assignment. We had no trouble stealing one from an account which had open permissions but the program had bugs and we had to fix it.

I could hardly comprehend, at that time, how much this was preparation for a career in software development.

morpheuskafka|2 years ago

As popular as ChatGPT is, I'm sure it would only take a few weeks for TikToks explaining how to un-flag text to be widely circulating if this were adopted. It would be so widely known that even non-tech-savvy students would be searching or asking friends how to get away with using it. Either a web app or saving as ASCII text in Notepad will probably be the preferred approach.

Smoosh|2 years ago

> TikToks are widely circulating

No need to waste all that time watching a TikTok video - just ask ChatGPT to do it for you.

__MatrixMan__|2 years ago

Might be worth doing anyway. Best to start practicing good opsec early. They might need it for something important later.

PaulHoule|2 years ago

They are too lazy to retype.

There are so many ways you could catch leakers of sensitive information this way. Look at how often government agencies "redact" information in PDFs by drawing black blocks over the text, leaving the underlying text intact.

Note it could be used for authentication in the opposite direction, only accepting text with the unusual spaces in it.

As far as catching the indolent and the ignorant goes, making an example of someone here or there works wonders.

blackhaz|2 years ago

I wonder if it is possible to implement a watermark via patterns of Oxford comma occurrences and similar linguistic styles that would fit into a simple character set.

throwanem|2 years ago

It's possible to implement a purely informational watermark lots of ways, but no such watermark long survives wide awareness of its use. To know how it works is to know how to defeat it.

This specific scheme is also not remotely novel; I once saw it implemented, something like six or eight years back, in an effort to quell leaks to an industry rag with a habit of posting paragraph-length excerpts verbatim. The rag also did this with some of the watermarked emails, after first stripping the watermarking whitespace.

thaumasiotes|2 years ago

> I wonder if it is possible to implement a watermark via patterns of Oxford comma occurrences

This would be a glaring stylistic inconsistency in every text produced with a watermark. You could just as well implement a watermark by doing automated thesaurus replacements on certain of the words and using the index of the selected entry as a code.
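A toy version of the thesaurus idea, to show why it reads so oddly: each word from a hypothetical synonym table is swapped for the entry whose index is the next digit of the payload, and the decoder reads those indices back. The table, the base-4 digits, and the function names are all illustrative assumptions; case and punctuation are ignored for brevity.

```python
# Hypothetical synonym groups; a word's position in its group is one base-4 digit.
SYNONYMS = [
    ["big", "large", "huge", "vast"],
    ["fast", "quick", "rapid", "swift"],
    ["begin", "start", "commence", "launch"],
]
# Map every listed word to (group index, position within group).
WORD_TO_CODE = {
    word: (group, pos)
    for group, words in enumerate(SYNONYMS)
    for pos, word in enumerate(words)
}


def embed_digits(words: list[str], digits: list[int]) -> list[str]:
    """Swap each carrier word for the synonym whose index is the next digit."""
    out, i = [], 0
    for word in words:
        if word in WORD_TO_CODE and i < len(digits):
            group, _ = WORD_TO_CODE[word]
            if not 0 <= digits[i] < len(SYNONYMS[group]):
                raise ValueError(f"digit {digits[i]} out of range for group {group}")
            out.append(SYNONYMS[group][digits[i]])
            i += 1
        else:
            out.append(word)
    if i < len(digits):
        raise ValueError("not enough carrier words for the payload")
    return out


def extract_digits(words: list[str]) -> list[int]:
    """Read back the index of every carrier word, in order."""
    return [WORD_TO_CODE[w][1] for w in words if w in WORD_TO_CODE]


marked = embed_digits("we begin with a big model that is fast".split(), [2, 0, 3])
print(" ".join(marked))        # we commence with a big model that is swift
print(extract_digits(marked))  # [2, 0, 3]
```

Even in this tiny example the payload forces "commence" into a casual sentence, which is exactly the kind of stylistic tell described above.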

A watermark that deeply unnerves everyone who reads the text can carry information, but it tends to render the tool itself unfit for purpose.

Izkata|2 years ago

> Even without access to such a tool they could simply...retype the text themselves?

There's a story I remember hearing, I think from an older student in high school, or in college from someone who went to a different high school, about a kid who was cheating by copying another student's hand-written paper. The copy ended up with two names on it: the cheater had put their own name in the corner and then blindly copied all the text on the other paper, including the other student's name.

nighthawk454|2 years ago

Yeah, someone will certainly make a tool to strip that out. This doesn't seem as useful for cheating, but more as a notice for content that was AI generated. That could be useful just as an indicator for general website content. Presumably there are also trivial watermarks for images/video/audio. Of course it can be easily foiled, so it's more of a disclaimer out of politeness.

robomc|2 years ago

And there's already a Chrome plugin that pings you if you copy watermarked text (or rather, text with weird characters in it).
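I don't know how that plugin is implemented, but a plausible sketch of the "weird characters" check is short: flag non-ASCII whitespace, invisible format characters (Unicode category Cf, which covers zero-width spaces), and anything NFKC normalization would rewrite, such as ligatures. The function name and the exact heuristics are assumptions for illustration.

```python
import unicodedata


def suspicious_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for characters often used to hide marks."""
    hits = []
    for i, ch in enumerate(text):
        odd_space = ch.isspace() and ch not in " \t\n\r"
        invisible = unicodedata.category(ch) == "Cf"  # e.g. U+200B ZERO WIDTH SPACE
        compat = unicodedata.normalize("NFKC", ch) != ch  # e.g. ligatures, U+2004
        if odd_space or invisible or compat:
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


print(suspicious_chars("the\u2004quick \ufb01sh"))
# [(3, 'U+2004', 'THREE-PER-EM SPACE'), (10, 'U+FB01', 'LATIN SMALL LIGATURE FI')]
```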

alpaca128|2 years ago

> they could simply...retype the text themselves?

Or copy the text using an OCR app.

quic5|2 years ago

You're right, we should (ab)use another Unicode code point to indicate that this text was scanned using an OCR app /s