preciz | 1 year ago

You are DEEPLY WRONG on all the issues you mentioned.

Open-weight models do not require great investment. In fact, I can run them on my 400 EUR computer.

Also, why would you want to regulate text output from machines in the name of the "public good"? That's insanity.

lewhoo | 1 year ago

Why exactly is it insane? Being able to reliably differentiate (let's assume it's possible for the sake of argument) between "you made this" and "you didn't make this", or at least "a human made this", seems to carry mostly (if not only) benefits.

fragmede | 1 year ago

The problem is your parenthetical: it's not possible, so attempting it doesn't actually get you anywhere. What's worse than a watermark? One that doesn't actually work.

pona-a | 1 year ago

The comment was referring to models close to the recent releases from Meta and Mistral, reaching up to 405B parameters with performance competitive with the large commercial vendors. These models absolutely can't be trained without significant investment, and their inference without a cloud provider isn't cheap either. As I mentioned, nothing short of not releasing the weights could have stopped the abuse, but still, a fraction of it could be deterred, hopefully adding up to a few billion fewer spam pages for search engines to serve back to you.

As for the rationality of watermarking itself, first I'd like to reiterate that no spam wave of this magnitude and undetectability has ever happened in the history of the web. A word processor cannot write a petabyte of propaganda on its own. A Markov chain can't generate anything convincing enough to fool a human. Transformer-based LLMs are the first of their kind and should be treated as such. There is no quick analogy or rule of thumb to point to.

If statistical watermarking is proven to have sufficient recall and an acceptably low error rate, there'll be nothing to lose in implementing it. A demand already exists for detecting AI slop; half-working BERT classifiers and prejudiced human sniff tests already serve it, with little incentive to reduce false positives. With watermarks, there'll be a less painful, more certain way to catch the worst offenders. Do you really think the same operations that produce papers with titles like "Sorry, as an AI model..." or papers with pieces of ChatGPT UI text pasted in will care to round-trip translate or rewrite entire paragraphs?
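
To make "statistical watermarking" concrete: in the published green-list scheme (Kirchenbauer et al., 2023), the generator softly biases sampling toward a pseudorandom "green" subset of the vocabulary keyed on the previous token, and the detector simply counts green tokens and computes a z-score. The sketch below is a toy illustration of the detection side; the word-level tokenization, hash-based split, and threshold are my assumptions, not any vendor's actual implementation.

    import hashlib
    import math

    GAMMA = 0.5  # assumed fraction of the vocabulary marked "green" at each position

    def is_green(prev_token: str, token: str) -> bool:
        # Pseudorandomly assign `token` to the green list, keyed on the previous
        # token, so the detector can recompute the split without the model.
        digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
        return digest[0] < 256 * GAMMA

    def detection_z_score(tokens):
        # Under the null hypothesis (unwatermarked text), each token is green with
        # probability GAMMA; watermarked generation over-samples green tokens,
        # so the z-score grows with the length of the text.
        n = len(tokens) - 1
        greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
        return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

    # Hypothetical usage: flag text whose z-score clears some threshold, e.g. 4.
    print(detection_z_score("the quick brown fox jumps over the lazy dog".split()))

The flip side is exactly the one discussed above: round-trip translation or heavy rewriting scrambles the green/red statistics, so a watermark like this only catches operations too lazy to bother.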

We already had this exact dilemma back when email spammers tried Bayesian poisoning [0]. Turns out, it actually creates an identifiable pattern, if not for the system, then for the user on the other side. People will train themselves to look for oddly phrased sentences or the outright nonsense round-tripping produces, abrupt shifts in writing style, and other heuristics, and once a large enough corpus is there, we can talk about training a new classifier, this time on a much more stable pattern with fewer type-I errors.

[0] https://en.wikipedia.org/wiki/Bayesian_poisoning
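
For anyone unfamiliar with the analogy: classic spam filters score a message by naive-Bayes word statistics, and poisoning means padding the message with "hammy" words to drag the score down. A toy illustration, with made-up word probabilities standing in for real corpus estimates:

    import math

    # Assumed per-word likelihoods, P(word | spam) and P(word | ham); a real
    # filter would estimate these from a labeled mail corpus.
    p_spam = {"viagra": 0.05, "free": 0.03, "meeting": 0.001, "thesis": 0.0005}
    p_ham = {"viagra": 0.0001, "free": 0.005, "meeting": 0.02, "thesis": 0.01}

    def log_odds_spam(words):
        # log P(words | spam) - log P(words | ham) under a uniform prior;
        # positive means "looks like spam". Unknown words get a small floor.
        return sum(math.log(p_spam.get(w, 1e-4)) - math.log(p_ham.get(w, 1e-4))
                   for w in words)

    print(log_odds_spam(["viagra", "free"]))                       # clearly spammy
    print(log_odds_spam(["viagra", "free", "meeting", "thesis"]))  # poisoned: score drops

The padding fools the filter, but it is exactly that padding, random office words bolted onto a pitch for pills, that a human reader or a retrained classifier learns to spot.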