majormajor | 12 days ago
There are a lot of white-collar tasks that have far lower quality and correctness bars. "Researching" by plugging things into google. Writing reports summarizing how a trend that an exec saw a report on can be applied to the company. Generating new values to share at a company all-hands.
Tons of these that never touch the "real world." Your assistant story is like a coding task - maybe someone ran some tests, maybe they didn't, but it was verifiable. No shortage of "the tests passed, but they weren't the right test, this broke some customers and had to be fixed by hand" coding stories out there like it. There are pages and pages of unverifiable bullshit that people are sleepwalking through, too, though.
Nobody knows whether those things helped or hurt in the first place, so nobody will ever even notice a hallucination.
But everyone in all those fields is going to be trying really, really hard to enumerate all the reasons their work is special and AI won't work well for them. The "management says do more, workers figure out ways to be lazier" see-saw is ancient, but this could skew far toward the "management demands more from fewer people" end of the spectrum for a while.
t43562 | 11 days ago
In all areas where there are fewer easy ways to judge output, there is going to be correspondingly more value in getting "good" people. Some AI that can produce readable reports isn't "good" - what matters is the quality of the work and the insight put into it, which can only be ensured by looking at the worker's reputation and past history.
naasking | 11 days ago
That's not obvious at all if the AI writing the tests is different from the AI writing the code being tested. Put into an adversarial and critical mode, the same model outputs very different results.
nthj | 11 days ago
I have been doing this with coding agents across LLM providers for a while now, with very successful results. Grok seems particularly happy to tell Anthropic where it’s cutting corners, but I get great insights from O3 and Gemini too.
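The cross-provider review loop described above can be sketched roughly as follows. This is a minimal illustration, not a real integration: `call_model` is a hypothetical stand-in for an actual LLM API client (stubbed here so the control flow runs), and the `"ISSUE:"` flag convention is an assumption for the example.

```python
# Sketch of an adversarial cross-provider code review loop.
# call_model is a placeholder; in practice it would call a
# different provider's API than the one that wrote the code.

def call_model(reviewer: str, prompt: str) -> str:
    # Stubbed response standing in for a real API call.
    return "ISSUE: error handling skipped in save()"

def adversarial_review(code: str, reviewers: list[str]) -> list[str]:
    """Ask each independent reviewer model to critique the code,
    collecting any lines it flags as issues."""
    findings = []
    for reviewer in reviewers:
        reply = call_model(reviewer, f"List corners cut in:\n{code}")
        findings += [ln for ln in reply.splitlines() if ln.startswith("ISSUE:")]
    return findings

issues = adversarial_review("def save(data): pass", ["reviewer-model"])
print(issues)
```

The key design point is independence: the reviewer model has no stake in the code passing, so it is free to report cut corners the authoring model glossed over.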
pydry | 11 days ago
Except the test suite isn't just something that appears, and the bugs don't necessarily get covered by it.
The bugginess of a lot of the software I use has spiked in a very noticeable way, probably due to this.
>But everyone in all those fields is going to be trying really really hard to enumerate all the reasons it's special and AI won't work well for them.
No, not everyone. Half of them are trying to lean in to the changing social reality.
The gaslighting from the executive side, on the other hand, is nearly constant.