top | item 35395333


actually_a_dog | 2 years ago

By "diversity," do you mean something like "entropy?" Like maybe

    H_s := -\sum_{x \in X_s} p(x) \log p(x)
where X_s := the set of all s-grams from the training set? That seems like it would eventually become hard or impossible to actually compute. And even if you could, what would it tell you?

Or, wait... are you referring to running such an analysis on the output of the model? Yeah, that might prove interesting....
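To make the H_s above concrete: a minimal sketch of the empirical s-gram entropy, where p(x) is taken to be the observed frequency of each s-gram (the function name `sgram_entropy` and the toy corpus are mine, not from the thread):

```python
import math
from collections import Counter

def sgram_entropy(tokens, s):
    """Shannon entropy (in bits) of the empirical s-gram distribution.

    `tokens` is any sequence (characters, words, model tokens); `s` is
    the gram length. Illustrative only: p(x) is estimated from counts,
    which is exactly what becomes infeasible at training-set scale.
    """
    grams = [tuple(tokens[i:i + s]) for i in range(len(tokens) - s + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

tokens = "the cat sat on the mat the cat".split()
print(sgram_entropy(tokens, 2))  # bigram entropy of a toy corpus
```

Running it on model *output* instead of the training set, as suggested above, is the same call with generated tokens.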


thomastjeffery | 2 years ago

I'm really just speculating here.

Because the text we write is not uniformly distributed random noise, what we encode into it (by writing) is entropy.

Because LLMs model text with inference, they model all of the entropy that is present.

That would mean the resulting model size is a measure of entropy (the sum of all patterns) divided by repetition (the recurring patterns). In that count, I would treat each unique token on its own as an instance of the identity pattern.

So to answer both questions: yes.
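The "entropy divided by repetition" idea above has a crude but concrete analogue in compression: a compressor's output size per input byte is small when recurring patterns dominate and large when they don't. This is only an illustrative stand-in (zlib is my choice here, not something from the thread, and compressed size is an upper bound on entropy, not the entropy itself):

```python
import zlib

def bits_per_byte(text: str) -> float:
    """Crude entropy-rate estimate: compressed bits per input byte.

    Recurring patterns compress away, so repetitive text scores low;
    text with little repetition scores high. Short inputs are inflated
    by fixed zlib header/checksum overhead.
    """
    data = text.encode("utf-8")
    return 8 * len(zlib.compress(data, 9)) / len(data)

print(bits_per_byte("abc" * 1000))  # highly repetitive: low
print(bits_per_byte("The quick brown fox jumps over the lazy dog."))
```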