gajomi | 4 years ago
There are more complete versions of this kind of thing publicly available: https://github.com/turtlesoupy/this-word-does-not-exist
> This would be amazing, for example, to run on a large corpus, generate the dictionary, and then run it again to find words that are used but not defined - not just in the original corpus but in the definitions too.
I think this is how you would gauge the success of the model. That is, you would evaluate model accuracy on a set of held-out words whose definitions never appeared in your dictionary training set but which do appear in context in your corpus. You would then have to manually annotate whether or not the generated definition for each held-out word was acceptable.
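A minimal sketch of that held-out split, assuming a plain `word -> definition` dict and a raw corpus string (the function name and leakage check are illustrative, not from any particular library):

```python
import re

def split_heldout(dictionary, corpus_text, n_heldout=100):
    """Split a word->definition dict into (train, held-out) sets.

    A held-out word must (a) appear in context in the corpus, and
    (b) not occur inside any remaining training definition, so the
    model never sees it defined or used while training.
    (Hypothetical helper sketching the evaluation described above.)
    """
    corpus_tokens = set(re.findall(r"[a-z]+", corpus_text.lower()))
    heldout = {}
    for word in sorted(dictionary):
        if len(heldout) >= n_heldout:
            break
        if word.lower() not in corpus_tokens:
            continue  # must appear in context in the corpus
        # Definitions that would remain in the training set
        others = (d for w, d in dictionary.items()
                  if w != word and w not in heldout)
        if any(re.search(rf"\b{re.escape(word)}\b", d, re.IGNORECASE)
               for d in others):
            continue  # word leaks into a training definition; skip it
        heldout[word] = dictionary[word]
    train = {w: d for w, d in dictionary.items() if w not in heldout}
    return train, heldout
```

The acceptability of each generated definition for the held-out words would still need manual annotation, as noted above.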
boffinAudio | 4 years ago
>I think this would be how you would gauge success of the model.
Yes, exactly. I think there would definitely be edge cases, but the general rule is that there should not be any undefined terms/words in the final dictionary. The degree to which this can be achieved is of course related to the cyclomatic complexity of the original materials. But this is why I want this tool - to see how effective it is for creating training materials that prepare students for abstruse subjects.
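The "no undefined terms in the final dictionary" rule can be checked mechanically: scan every definition for words that are not themselves headwords, then define those and re-scan until the set is empty. A rough sketch, assuming the same `word -> definition` dict (the stopword list is a placeholder assumption, not from the source):

```python
import re

# Placeholder: function words you would not expect the dictionary to define
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "in", "is"}

def undefined_terms(dictionary):
    """Return words used in definitions but not themselves defined.

    Iterating this (define the missing words, re-scan, repeat) until
    it returns an empty set approximates the closure rule above.
    """
    defined = {w.lower() for w in dictionary}
    used = set()
    for definition in dictionary.values():
        used.update(re.findall(r"[a-z]+", definition.lower()))
    return used - defined - STOPWORDS
```

Each pass may introduce new undefined words in the fresh definitions, which is presumably where the complexity of the original materials shows up.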