top | item 29190004


gajomi | 4 years ago

There are a couple of starting points you could take. I spent a weekend hacking together a program that generates fake word/definition pairs with a transformer model trained against a dictionary: https://youtu.be/XnJ2TKAn-Vk?t=1547. If you swap real words in for the fake ones and have a sufficiently accurate model, you could quickly generate reasonable and novel definitions.

There are more complete versions of this kind of thing publicly available: https://github.com/turtlesoupy/this-word-does-not-exist
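The transformer in the video is the heavy part; the core idea of generating plausible novel word forms can be illustrated with something much simpler. Below is a toy stand-in (my own sketch, not what this-word-does-not-exist actually does): a character-level Markov model trained on a handful of real words, which samples made-up words that still look like English.

```python
import random
from collections import defaultdict

def train_char_model(words, order=2):
    """Count which character follows each `order`-length context."""
    counts = defaultdict(list)
    for w in words:
        padded = "^" * order + w + "$"  # ^ = start padding, $ = end marker
        for i in range(len(padded) - order):
            counts[padded[i:i + order]].append(padded[i + order])
    return counts

def sample_word(counts, order=2, max_len=12, rng=None):
    """Walk the transition table to emit one novel-looking word."""
    rng = rng or random.Random()
    state = "^" * order
    out = []
    while len(out) < max_len:
        nxt = rng.choice(counts[state])
        if nxt == "$":
            break
        out.append(nxt)
        state = state[1:] + nxt
    return "".join(out)

# tiny illustrative vocabulary (assumption: in practice you'd train on a full word list)
vocab = ["transform", "transfer", "transit", "translate", "transport"]
model = train_char_model(vocab)
fake = sample_word(model, rng=random.Random(0))
```

A real system would then feed each sampled word to the definition model as a prompt; the Markov chain here only covers the "invent a word" half of the pipeline.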

> This would be amazing, for example, to run on a large corpus, generate the dictionary, and then run it again to find words that are used but not defined - not just in the original corpus but in the definitions too.

I think this would be how you would gauge the success of the model. That is, you would evaluate it on a held-out set of words whose definitions never appeared in your dictionary training set but which did appear in context in your corpus. You would then manually annotate whether each generated definition was acceptable.


boffinAudio | 4 years ago

Thanks - that is indeed very interesting, and I will spend my weekend checking it out.

>I think this would be how you would gauge success of the model.

Yes, exactly. I think there would definitely be edge cases, but the general rule is that there should not be any undefined terms/words in the final dictionary. The degree to which this can be achieved is of course related to the complexity of the original materials. But this is why I want this tool - to see how effective it is for creating training materials that prepare students for obtuse subjects.
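The "no undefined terms in the final dictionary" rule is a closure check, and it's easy to express directly. A minimal sketch (names and the tiny glossary are mine, and real definitions would need stemming and a proper stopword list):

```python
import re

def undefined_words(dictionary, stopwords=frozenset()):
    """Return words used in definitions that the dictionary never defines."""
    defined = set(dictionary)
    used = set()
    for definition in dictionary.values():
        used.update(re.findall(r"[a-z]+", definition.lower()))
    return used - defined - stopwords

glossary = {
    "lexeme": "a unit of lexical meaning",
    "lexical": "relating to words",
}
missing = undefined_words(glossary, stopwords={"a", "of", "to"})
# "unit", "meaning", "relating", "words" are used but never defined
```

Running this repeatedly - generate definitions for whatever `undefined_words` reports, add them, and check again - is exactly the "run it again" loop from the quoted comment, terminating once the set comes back empty.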