I’d be curious to hear more about Attempt 2. It sounds like the approach was basically to ask an LLM for a score for each comment. Adding specifics to that prompt might go a long way: for example, what is the rationale for this change, is it likely to be a functional bug, is it a security issue, how does it impact maintainability over the long run, etc. Basically, I wonder whether asking about more specific criteria, and actually defining what you mean by "nits," could help the LLM give you more reliable scores.
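To make that concrete, here's a minimal sketch of what a rubric-style prompt might look like, versus asking for a single opaque score. Everything here is hypothetical: the criterion names, the 0-10 scale, and the JSON response shape are all illustrative assumptions, not anyone's actual implementation.

```python
import json

# Illustrative criteria -- spell out what each score should measure,
# including an explicit definition of "nit", so the model has
# something concrete to anchor on.
CRITERIA = {
    "functional_bug": "Does the comment point to a likely functional bug?",
    "security": "Does it flag a plausible security issue?",
    "maintainability": "Does it affect maintainability over the long run?",
    "is_nit": ("Is this a nit? A nit is a purely stylistic or cosmetic "
               "suggestion with no behavioral impact."),
}

def build_scoring_prompt(comment: str) -> str:
    """Build a rubric-style prompt asking for per-criterion JSON scores."""
    rubric = "\n".join(f"- {name}: {question} (score 0-10)"
                       for name, question in CRITERIA.items())
    return (
        "Rate the following code-review comment against each criterion.\n"
        f"{rubric}\n"
        "Also state the rationale for the suggested change in one sentence.\n"
        'Respond with JSON: {"scores": {<criterion>: <0-10>}, '
        '"rationale": "..."}\n\n'
        f"Comment:\n{comment}"
    )

def parse_scores(llm_response: str) -> dict:
    """Parse the model's JSON reply into {criterion: int score}."""
    data = json.loads(llm_response)
    return {name: int(score) for name, score in data["scores"].items()}
```

The idea is that per-criterion scores are easier to sanity-check and threshold than one combined number, e.g. suppress comments where `is_nit` is high and everything else is low.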
dakshgupta|1 year ago