utdiscant | 1 year ago
This metric would go up if you left almost no comments. Wouldn't it be better to find a metric that rewards you for generating many comments that get addressed, not just for having high relevance?
You even mention this challenge yourselves: "Sadly, even with all kinds of prompting tricks, we simply could not get the LLM to produce fewer nits without also producing fewer critical comments."
If that were happening, it doesn't sound like your performance metric would reflect it.
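To make the concern concrete, here is a rough sketch. I'm assuming the metric is something like "addressed comments divided by total comments", which may not be exactly how it is computed, and the two reviewers and all their numbers are made up for illustration.

```python
# Sketch only: assumes the metric is addressed / total comments.
# The reviewer profiles below are hypothetical, not real data.

def address_rate(addressed: int, total: int) -> float:
    """Fraction of posted comments that the author acted on."""
    return addressed / total if total else 0.0


def addressed_count(addressed: int, total: int) -> int:
    """Alternative: reward the absolute number of useful comments."""
    return addressed


# A bot that almost never comments vs. one that comments a lot.
quiet = {"addressed": 2, "total": 2}
thorough = {"addressed": 12, "total": 20}

for name, bot in [("quiet", quiet), ("thorough", thorough)]:
    rate = address_rate(bot["addressed"], bot["total"])
    count = addressed_count(bot["addressed"], bot["total"])
    print(f"{name}: address rate {rate:.2f}, addressed comments {count}")
```

Under these made-up numbers the quiet bot wins on rate even though the thorough one produced six times as many comments that were acted on. Neither number on its own shows whether critical comments were lost along with the nits.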
dakshgupta | 1 year ago
SomewhatLikely | 1 year ago