octbash | 6 years ago

Those are question-answering and language-understanding benchmarks, respectively, neither of which has been suitable for evaluating language generation models since GPT-1 was roundly beaten by BERT. GPT-2 didn't evaluate on them either.