korbip | 1 year ago
This was phrased a bit unclearly. During training, it is not possible to parallelize over the sequence dimension the way it is for Transformers. Over the batch dimension you can always parallelize.
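To make the distinction concrete, here is a minimal sketch, assuming a generic nonlinear recurrence (the comment does not say which model it refers to). The names `step` and `run_batch` and the weights `w` and `u` are made up for illustration. The loop over time steps has to run in order because each hidden state depends on the previous one, while the work inside a single time step is independent for every sequence in the batch.

```python
import math

def step(h_prev, x_t, w=0.5, u=1.0):
    # Hypothetical nonlinear recurrence: h_t = tanh(w * h_{t-1} + u * x_t).
    return math.tanh(w * h_prev + u * x_t)

def run_batch(batch):
    """batch: list of equal-length sequences of floats.
    Returns hidden states indexed as history[t][b]."""
    seq_len = len(batch[0])
    h = [0.0] * len(batch)  # one hidden state per sequence
    history = []
    for t in range(seq_len):
        # Sequence dimension: step t needs h from step t-1, so this loop is serial.
        # Batch dimension: each entry of the comprehension is independent, so it
        # could be one vectorized operation or be spread over many workers.
        h = [step(h_b, seq[t]) for h_b, seq in zip(h, batch)]
        history.append(h)
    return history

batch = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5]]
together = run_batch(batch)
separate = [run_batch([seq]) for seq in batch]

# Processing the sequences together or one at a time gives the same states.
same = all(
    math.isclose(together[t][b], separate[b][t][0])
    for b in range(len(batch))
    for t in range(len(batch[0]))
)
print(same)
```

In a Transformer, by contrast, the representation at position t during training is not computed from a hidden state carried over from position t-1, so all positions of a sequence can be processed at once.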