I often see comments on GitHub issues where poor wording makes it difficult to understand what is actually meant. Things like "I reproduced the bug on Linux, then I tried Windows. I can't reproduce it now." Does that mean it's just not reproducible on Windows, or not reproducible at all anymore? Ambiguities like that are especially annoying when someone is posting a solution to a problem. Sometimes the cause is grammatical errors, sometimes not. I think LLMs are actually great at catching things like this, and you generally don't need some higher-level understanding of the goals involved to notice the ambiguity. My point wasn't that bots shouldn't be used like this, just that they need to be given the right instructions.
samrus|7 months ago
This is the part I'm talking about. I also think LLMs are very capable of detecting different types and levels of grammar issues, but they can't decide which ones should be filtered out to meet a certain goal. They need detailed instructions for that, and that's somewhat inefficient and causes issues like this right here.
We have done this song and dance many times with AI. It's the bitter lesson: you need a system that learns these things; you can't just give it a rule-based engine to patch the things it can't learn. That works in the short term but leads to a dead end. We need something with the "common sense" to see when grammar is fine versus when it's hindering communication, and that just isn't there yet. So for now it needs detailed instructions, which may or may not be sustainable.
Asraelite|7 months ago
It sounds like you mean that curating messages purely to conform to a particular style guide requires context-dependent information and can never be accomplished reliably with some unmodified generic prompt across many different projects.
I'm saying that while this is true, if you ignore grammar guidelines and look specifically for cases of ambiguous and confusing wording, then this can actually be accomplished reliably with a generic prompt, if you get that prompt right. Not with 100% accuracy, of course, but well enough that it would be beneficial overall in the long run.
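To make concrete what such a project-agnostic prompt might look like, here is a minimal sketch. The prompt wording and the helper name `build_ambiguity_check` are my own illustration, not anything from the thread or an established tool; the message list follows the common chat-style format most LLM APIs accept.

```python
# Hypothetical sketch: a generic, project-independent prompt that asks an
# LLM to flag only ambiguous wording, not grammar or style.

GENERIC_PROMPT = (
    "You are reviewing a single issue-tracker comment. Do NOT flag grammar "
    "or style. Flag ONLY wording whose technical meaning is ambiguous, "
    "such as an unclear referent ('it', 'now') or unclear scope. For each "
    "flag, quote the phrase and list the possible readings. If nothing is "
    "ambiguous, reply exactly: OK"
)

def build_ambiguity_check(comment: str) -> list[dict]:
    """Build a chat-style message list that could be sent to any LLM API."""
    return [
        {"role": "system", "content": GENERIC_PROMPT},
        {"role": "user", "content": comment},
    ]

messages = build_ambiguity_check(
    "I reproduced the bug on Linux, then I tried Windows. "
    "I can't reproduce it now."
)
```

The point of keeping the instruction this narrow is that it needs no per-project style guide: it asks only whether a reader could take the comment two ways.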