Yes, unfortunately we have to rely on the very brittle "exact match" method of evaluating whether an answer is correct. FWIW and perhaps surprisingly, this is the primary way question-answering systems are evaluated in common benchmarks. I totally agree that fine-tuning T5 for answer grading would be super interesting!
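To show why "exact match" is brittle, here is a minimal sketch of a SQuAD-style exact-match check. It assumes the usual SQuAD normalization: lowercase, strip punctuation, drop the English articles "a", "an" and "the", and collapse whitespace. The helper names `normalize_answer` and `exact_match` are illustrative. This is not necessarily the exact scoring code used for T5, and it is not any specific library's API.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    # Remove every ASCII punctuation character (this also drops apostrophes).
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # Replace the articles "a", "an", "the" (as whole words) with a space.
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the normalized prediction equals any normalized reference answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


print(exact_match("The Eiffel Tower!", ["Eiffel Tower"]))  # True
print(exact_match("Eiffel's tower", ["Eiffel Tower"]))     # False
```

The second call illustrates the brittleness. A human would accept "Eiffel's tower", but once normalized it becomes "eiffels tower", which does not equal "eiffel tower". A learned answer grader (e.g. a fine-tuned T5) could in principle score such paraphrases more leniently.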