top | item 40298173

korbip | 1 year ago

That was worded a bit unclearly. During training it is not possible to parallelize along the sequence dimension the way it is for Transformers. Along the batch dimension you can always parallelize.
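To make the distinction concrete, here is a minimal sketch. It assumes a toy nonlinear recurrence h_t = tanh(w * h_{t-1} + x_t); the function name `recurrent_forward`, the weight `w`, and the update rule are made up for illustration and are not the actual model. The loop over time steps has to run in order because each step needs the previous hidden state, while the update across batch elements inside each step is independent.

```python
import math


def recurrent_forward(batch, w=0.5):
    """Toy recurrence h_t = tanh(w * h_{t-1} + x_t) (hypothetical example).

    batch: list of equal-length sequences of floats.
    Returns one list of hidden states per sequence.
    """
    seq_len = len(batch[0])
    h = [0.0] * len(batch)           # one hidden state per batch element
    outputs = [[] for _ in batch]

    # Sequence dimension: step t needs h from step t-1, so this loop is serial.
    for t in range(seq_len):
        # Batch dimension: each element only reads its own state and input,
        # so these updates are independent and could run in parallel.
        h = [math.tanh(w * h_b + seq[t]) for h_b, seq in zip(h, batch)]
        for out, h_b in zip(outputs, h):
            out.append(h_b)
    return outputs
```

Running two sequences together as a batch gives the same result as running each on its own, which is why the batch dimension is always available for parallelism.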
