item 45322355 | inciampati | 5 months ago
It turns out you can use a fused Triton kernel for a true RNN GRU and run just as fast as the minGRU model in training. Yeah, it doesn't work for very long contexts, but neither does minGRU (activation memory...).
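For readers who haven't compared the two cells, here is a minimal scalar sketch of the difference the comment relies on. It is not the fused Triton kernel and not anyone's actual implementation: the parameter names (`wz`, `uz`, `wr`, `ur`, `wh`, `uh`) and the helper `run` are made up for illustration, and the minGRU step follows the commonly cited formulation where the gate and candidate read only the input. The standard GRU gates read the previous state, so steps have to run in order; the minGRU update is linear in the previous state, which is what lets it train with a parallel scan.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, p):
    # Standard GRU: both gates and the candidate read h_{t-1},
    # so step t cannot start before step t-1 has finished.
    z = sigmoid(p["wz"] * x + p["uz"] * h)
    r = sigmoid(p["wr"] * x + p["ur"] * h)
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h))
    return (1.0 - z) * h + z * h_tilde

def min_gru_step(x, h, p):
    # minGRU (assumed formulation): gate and candidate depend on x only,
    # so h_t = a_t * h_{t-1} + b_t with a_t, b_t known up front.
    z = sigmoid(p["wz"] * x)
    h_tilde = p["wh"] * x
    return (1.0 - z) * h + z * h_tilde

def run(step, xs, p, h0=0.0):
    # Hypothetical helper: unroll a cell over a sequence and keep every
    # state, standing in for the activations saved for backprop.
    hs, h = [], h0
    for x in xs:
        h = step(x, h, p)
        hs.append(h)
    return hs

# Illustrative parameters, not trained values.
params = {"wz": 0.5, "uz": 0.3, "wr": 0.4, "ur": 0.2, "wh": 1.0, "uh": 0.7}
gru_states = run(gru_step, [1.0, -0.5, 2.0], params)
min_gru_states = run(min_gru_step, [1.0, -0.5, 2.0], params)
```

Either way, `run` holds one state per time step, so without something like checkpointing the stored activations grow linearly with sequence length. That is presumably the "activation memory" limit the comment has in mind for both models.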