rdos|19 days ago
Is it possible for such a small model to outperform Gemini 3, or is this a case of benchmarks not reflecting reality? I would love to be hopeful, but so far no open source model has been better than a closed one, even when benchmarks suggested it was.
retrac|19 days ago
I am reminded it's basically impossible to read cursive writing in a language you don't know even if it's the same alphabet.
vintermann|18 days ago
Evaluation methods are bad too, because people don't think critically about what the downstream task is. Word Error Rate and Character Error Rate are terrible metrics for most historical HTR, yet they're what people use, out of habit.
It's a bit like how BLEU was for a long time the standard metric for translation quality. BLEU is based on n-gram similarity to a reference translation, so translation methods built on and optimized for n-gram similarity (e.g. pre-neural-network Google Translate) scored well and looked much better than they actually were.
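To make the complaint about CER concrete, here's a minimal sketch of how Character Error Rate is typically computed (edit distance over reference length; function names here are illustrative, not from any particular toolkit). Note that a plausible misreading and a nonsense character cost exactly the same, which is part of why CER says little about downstream usability:

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance between reference and hypothesis, standard DP."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edits needed, normalized by reference length."""
    return levenshtein(ref, hyp) / len(ref)

# "thom" for "thorn": one substitution + one deletion, CER = 2/5 = 0.4,
# regardless of whether the error changes the word's meaning.
print(cer("thorn", "thom"))
```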