remexre | 1 month ago

Or more like,

    x = tokenize(input)
    i = 0
    do {
      finish, x = layers(x)
    } while(!finish && ++i < t_max);
    output = lm_head(x)


oofbey | 1 month ago

That’s closer still. But even closer would be:

    x = tokenize(input)
    i = 0
    finish = 0
    do {
      p, x = layers(x)
      finish += p
    } while(finish < 0.95 && ++i < t_max);
    output = lm_head(x)

Except the stop probabilities don’t accumulate linearly like that; it’s more like a weighted coin model.
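
One possible reading of the "weighted coin" remark, sketched in Python. This is an assumption, not something the comment spells out: each step's `p` is treated as the chance of stopping at that step given the loop has not stopped yet, so the running total is 1 minus the product of the (1 - p) terms rather than a plain sum. The function names here are hypothetical.

```python
def cumulative_halt_probability(step_probs):
    """Running probability of having stopped after each step.

    Assumption: each entry is the chance of stopping at that step *given*
    the loop has not stopped yet (one biased coin flip per step).
    """
    survive = 1.0  # probability that no coin has come up "stop" so far
    cumulative = []
    for lam in step_probs:
        survive *= 1.0 - lam
        cumulative.append(1.0 - survive)
    return cumulative


def steps_until_threshold(step_probs, threshold=0.95):
    """Count the steps run before the cumulative stop probability reaches threshold."""
    for step, total in enumerate(cumulative_halt_probability(step_probs), start=1):
        if total >= threshold:
            return step
    return len(step_probs)  # threshold never reached; the step budget ran out
```

With five steps of 0.5 each, a plain sum passes 0.95 on the second step, while this version needs all five (0.5, 0.75, 0.875, 0.9375, 0.96875).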