top | item 42493286

(no title)

mcyc | 1 year ago

NB: Can't edit my original reply.

Sorry actually I misread part of your comment in relation to the paper and confused δ and another parameter, K.

To clarify, δ is the number of tokens in the tokenized corpus and K is the size of the vocabulary.

So, if you are asking about why would they limit _K_, then my answer still applies (after swapping δ for K). But if you still mean "why do they pick some arbitrary δ as the limit of the size of the tokenized corpus", then I think the answer is just "because that makes it a decision problem".

discuss

order

bnjmn|1 year ago

Thanks for these detailed replies! Now I really want to read your paper.