darkbatman | 4 months ago

Looking at the paper, the memory needed per layer seems to be higher than in the transformer architecture. Pretty sure that would blow up GPU VRAM at scale.
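
For a point of comparison, here is a minimal back-of-envelope sketch of per-layer weight memory for a plain transformer block (assuming hidden size d with a 4x MLP expansion and fp16 weights; d = 4096 is a placeholder, not a figure from the paper):

    # Rough per-layer parameter memory for a standard transformer block.
    # Assumptions: hidden size d, 4x MLP expansion, fp16 weights.
    d = 4096                       # hidden size (illustrative assumption)
    attn_params = 4 * d * d        # Q, K, V, and output projections
    mlp_params = 2 * d * (4 * d)   # up- and down-projections with 4x expansion
    bytes_per_param = 2            # fp16
    per_layer_gb = (attn_params + mlp_params) * bytes_per_param / 1e9
    print(f"~{per_layer_gb:.2f} GB of weights per layer")  # ~0.40 GB at d=4096

Whatever the new architecture adds per layer would sit on top of a baseline roughly like this, which is why a higher per-layer footprint compounds quickly across dozens of layers.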
