(no title)
jorlow | 1 year ago
self.w2(F.silu(self.w1(x)) * self.w3(x))
I.e. the nonlinearity is a gate.https://github.com/meta-llama/llama3/blob/14aab0428d3ec3a959...
jorlow | 1 year ago
self.w2(F.silu(self.w1(x)) * self.w3(x))
I.e. the nonlinearity is a gate.https://github.com/meta-llama/llama3/blob/14aab0428d3ec3a959...
soraki_soladead|1 year ago