Good point - I saw the FLAN anomaly and this didn’t occur to me!

A good follow-up question would be: why didn’t the other models do better on the 2nd-order question? Especially BLOOM and davinci-003, which were middling on the 1st-order question.

I agree with your overall criticism of the experimental protocol, though.
