top | item 46389956

(no title)

BarryMilo | 2 months ago

I recently witnessed one such potential fuckup. The AI had written functioning code, except one of the business rules was misinterpreted. It would have broken in a few months time and caused a massive outage. I imagine many such time bombs are being deployed in many companies as we speak.

discuss

order

tombert|2 months ago

Yeah; I saw a 29,000 line pull request across seventy files recently. I think that realistically 29,000 lines of new code all at once is beyond what a human could understand within the timeframe typically allotted for a code review.

Prior to generative AI I was (correctly) criticized once for making a 2,000 line PR, and I was told to break it up, which I did, but I think thousand-line PRs are going to be the new normal soon enough.

aurareturn|2 months ago

That’s the fault of the human who used the LLM to write the code and didn’t test it properly.

tombert|2 months ago

Exhaustive testing is hard, to be fair, especially if you don’t actually understand the code you’re writing. Tools like TLA+ and static analyzers exist precisely for this reason.

An example I use to talk about hidden edge cases:

Imagine we have this (pseudo)code

  fn doSomething(num : int) {
    if num % 2 == 0 {
      return  Math.sqrt(num)
    } else {
       return Math.pow(num, 2)
    }

  }
Someone might see this function, and unit test it based on the if statement like:

    assert(doSomething(4) == 2)
    assert(doSomething(3) == 9)
These tests pass, it’s merged.

Except there’s a bug in this; what if you pass in a negative even number?

Depending on the language, you will either get an exception or maybe a complex answer (which not usually something you want). The solution in this particular case would be to add a conditional, or more simply just make the type an unsigned integer.

Obviously this is just a dumb example, and most people here could pick this up pretty quick, but my point is that sometimes bugs can hide even when you do (what feels like) thorough testing.