mchinen | 11 months ago

Claude Shannon was interested in this kind of thing and had a paper on the entropy per letter or word of English. He also has a section in his famous "A Mathematical Theory of Communication" with experiments using the conditional probability of the next word based on the previous n = 1, 2 words from a few books. I wonder whether the conditional entropy approaches zero as n increases, assuming ergodicity. But the number of entries in the conditional probability table blows up exponentially. The trick of combining multiple n = 1 models at different distances sounds interesting, and reminds me a bit of contrastive prediction ML methods.
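For concreteness, here is a minimal sketch (mine, not Shannon's, and assuming a hypothetical plain-text corpus file book.txt) that estimates H(next word | previous n words) from empirical counts; you can watch the estimate fall and the context table grow as n increases:

    from collections import Counter
    import math

    def conditional_entropy(words, n):
        # H(W_i | W_{i-n} .. W_{i-1}) in bits per word, from empirical counts
        context_counts = Counter()   # counts of each n-word context
        joint_counts = Counter()     # counts of (context, next word) pairs
        for i in range(n, len(words)):
            ctx = tuple(words[i - n:i])
            context_counts[ctx] += 1
            joint_counts[(ctx, words[i])] += 1
        total = sum(joint_counts.values())
        h = 0.0
        for (ctx, _), c in joint_counts.items():
            h -= (c / total) * math.log2(c / context_counts[ctx])
        return h, len(context_counts)

    words = open("book.txt").read().lower().split()
    for n in (1, 2, 3):
        h, contexts = conditional_entropy(words, n)
        print(f"n={n}: H = {h:.2f} bits/word, contexts = {contexts}")

Note that with a finite text the estimate also drops spuriously as contexts become rare, which is the practical face of the table blowing up.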

Anyway, the experiments in Shannon's paper sound similar to what you describe, but with less data and shorter distances, so they should give some idea of how it would look. From the text:

* 5. First-order word approximation. Rather than continue with tetragram, ..., n-gram structure it is easier and better to jump at this point to word units. Here words are chosen independently but with their appropriate frequencies.

REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.

6. Second-order word approximation. The word transition probabilities are correct but no further structure is included.

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED *
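For what it's worth, a rough sketch of those two word-level experiments (again assuming a hypothetical book.txt; this is standard unigram/bigram sampling, not Shannon's exact procedure, which he carried out by hand with tables of word frequencies):

    import random
    from collections import Counter, defaultdict

    words = open("book.txt").read().upper().split()

    # 5. First-order: words chosen independently with their corpus frequencies
    # (uniform sampling from the word list already weights by frequency).
    first_order = random.choices(words, k=30)

    # 6. Second-order: a Markov walk using observed word-to-word transitions.
    bigrams = defaultdict(Counter)
    for w1, w2 in zip(words, words[1:]):
        bigrams[w1][w2] += 1

    state = random.choice(words)
    second_order = [state]
    for _ in range(29):
        if not bigrams[state]:
            state = random.choice(words)  # dead end (word only seen at the very end); restart
        else:
            nxt, counts = zip(*bigrams[state].items())
            state = random.choices(nxt, weights=counts, k=1)[0]
        second_order.append(state)

    print(" ".join(first_order))
    print(" ".join(second_order))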
