top | item 45533475

(no title)

karpathy | 4 months ago

Sorry I thought it would be clear and could have clarified that the code itself is just a joke illustrating the point, as an exaggeration. This was the thread if anyone is interested

https://chatgpt.com/share/68e82db9-7a28-8007-9a99-bc6f0010d1...

discuss

order

why_at|4 months ago

This part from the first try made me laugh:

      if random.random() < 0.01:

          logging.warning("This feels wrong. Aborting just in case.")

          return None

x187463|4 months ago

I actually laughed when I read that. This one got me, too. The casual validation of its paranoia gives me Marvin the Paranoid Android vibes.

  try:
      result = a / b
      if math.isnan(result):
          raise ArithmeticError("Result is NaN. I knew this would happen.")

bspammer|4 months ago

I think that’s the funniest joke I’ve ever seen an LLM make. Which probably means it’s copied from somewhere.

tomjakubowski|4 months ago

Years and years ago, the MongoDB Java driver had something like this to skip logging sometimes in one of its error handling routines.

   } catch (Exception e) {
                if (!((_ok) ? true : (Math.random() > 0.1))) {
                    return res;
                }

                final StringBuilder logError = (new StringBuilder("Server seen down: ")).append(_addr);

                /* edited for brevity: log the error */
 
https://github.com/mongodb/mongo-java-driver/blob/1d2e6faa80...

chis|4 months ago

I think there’s always a danger of these foundational model companies doing RLHF on non-expert users, and this feels like a case of that.

The AIs in general feel really focused on making the user happy - your example, and another one is how they love adding emojis to the stout and over-commenting simple code.

miki123211|4 months ago

This feels like RLVR, not RLHF.

With RLVR, the LLM is trained to pursue "verified rewards." On coding tasks, the reward is usually something like the percentage of passing tests.

Let's say you have some code that iterates over a set of files and does processing on them. The way a normal dev would write it, an exception in that code would crash the entire program. If you swallow and log the exception, however, you can continue processing the remaining files. This is an easy way to get "number of files successfully processed" up, without actually making your code any better.

cma|4 months ago

And more advanced users are more likely to opt out of training on their data, Google gets around it with a free api period where you can't opt out and I think from did some of that too, through partnerships with tool companies, but not sure if you can ever opt out there.

justatdotin|4 months ago

'over-commenting simple code' is preparing it for future agent work. pay attention to those comments to learn how you can better scaffold for agents.

bjourne|4 months ago

This is stunning English: "Perfect setup for satire. Here’s a Python function that fully commits to the bit — a traumatically over-trained LLM trying to divide numbers while avoiding any conceivable danger:" "Traumatically over-trained", while scoring zero google hits, is an amazingly good description. How can it intuitively know what "traumatic over-training" should mean for LLMs without ever having been taught the concept?

bccdee|4 months ago

I don't know. It's a classic LLM-ism. "Traumatically over-X" is probably a common enough phrase. The prmpt says, "I don't know what labs are doing to these poor LLMs during RL," so the model connects that to some form of trauma. The training is traumatic, so the model is traumatically over-trained.

It sounds fine and flows nicely, but it doesn't quite make sense. Too much training over-fits an LLM; that's not what we're describing. Bad training might traumatize a model, but bad how? A creative response would suggest an answer to that question—perhaps the model has been made paranoid, scarred by repeat exposure to the subtlest and most severe bugs ever discovered—but the LLM isn't being creative. Its response has that spongy, plastic LLM texture that comes from the model rephrasing its prompt to provide a sycophantic preamble for the thing that was actually being asked for. It uses new words for the same old idea, and a bit of the precision is lost during the translation.

gnulinux|4 months ago

Hard to know but if you could express "traumatically" as a number, and "over-trained" as a number, it seems like we'd expect "traumatically" + "over-trained" to be close to "traumatically over-trained" as a number. LLMs work in mysterious ways.

jcelerier|4 months ago

LLMs operate at token level, not word. it doesn't operate in terms of "traumatic", "over-training", "over" or "training", but rather "tr" "aum" "at" "ic, ", etc.

mahogany|4 months ago

“Traumatic overtraining” does have hits though. My guess is that “traumatically” is a rarely used adverb, and “traumatic” is much more common. Possibly it completed traumatic into an adverb and then linked to overtraining which is in the training data. I dunno how these things work though.

chipsrafferty|4 months ago

You need to read more if you think that's stunning English

drekipus|4 months ago

The same way that you and I think up a word and what it might mean without being taught the concept.

Adverb + verb

shawabawa3|4 months ago

> How can it intuitively know what "traumatic over-training" should mean for LLMs without ever having been taught the concept?

Because, and this is a hot take, LLMs have emergent intelligence

cruffle_duffle|4 months ago

Kind of interesting it didn't add type hints though! You'd think for all that paranoia it would at least add type hints.

nought|4 months ago

It was a great joke, that's why I posted it