You're saying this as if the result is unsurprising, but it is significant that the performance jumps so dramatically, and that the gap is not a fundamental issue of capability, just a bias in the model toward being hesitant about providing false information. That's a good insight, since it could allow further fine-tuning to get that balance right, so that careful prompt engineering is no longer necessary to achieve high P/R on this task.
crawfordcomeaux|2 years ago