remexre | 1 month ago
Or more like,

  x = tokenize(input)
  i = 0
  do {
    finish, x = layers(x)
  } while(!finish && i++ < t_max);
  output = lm_head(x)

oofbey | 1 month ago
That’s closer still. But even closer would be:

  x = tokenize(input)
  i = 0
  finish = 0
  do {
    p, x = layers(x)
    finish += p
  } while(finish < 0.95 && i++ < t_max);
  output = lm_head(x)

Except the accumulation of the stop probabilities isn’t linear like that - it’s more like a weighted coin model.
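To make the "weighted coin" remark concrete, here is a rough Python sketch contrasting the two accumulation rules. It rests on one reading of the comment, not on anything the thread confirms: each pass flips a coin that says "stop" with probability p, so the chance of having stopped by step n is 1 - (1 - p_1)(1 - p_2)...(1 - p_n). The function names, the list of per-step probabilities, and the use of the list length in place of t_max are all hypothetical.

```python
def steps_linear(stop_probs, threshold=0.95):
    """Linear rule from the pseudocode: finish += p on every pass."""
    finish = 0.0
    for step, p in enumerate(stop_probs, start=1):
        finish += p
        if finish >= threshold:
            return step, finish
    # Ran out of passes (the list length plays the role of t_max).
    return len(stop_probs), finish


def steps_weighted_coin(stop_probs, threshold=0.95):
    """Assumed 'weighted coin' rule: track the chance that no coin has said stop yet."""
    still_running = 1.0  # probability that every coin so far said "continue"
    for step, p in enumerate(stop_probs, start=1):
        still_running *= 1.0 - p
        halted = 1.0 - still_running
        if halted >= threshold:
            return step, halted
    return len(stop_probs), 1.0 - still_running


if __name__ == "__main__":
    probs = [0.5] * 5  # made-up per-pass stop probabilities
    print(steps_linear(probs))
    print(steps_weighted_coin(probs))
```

With a constant p of 0.5, the linear rule crosses 0.95 on the second pass, while the coin rule needs five, since the cumulative stop probability only approaches 1 and never overshoots it.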