item 45674842 (no title)

darkbatman | 4 months ago
Looking at the paper, the memory needed per layer seems to be higher than in the transformer architecture. Pretty sure that would blow up GPU VRAM at scale.
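The VRAM concern can be made concrete with a back-of-envelope baseline. This is a rough sketch, not taken from the paper: it estimates per-layer parameter memory for a *standard* transformer layer (attention projections plus a 4x FFN, fp16 weights), which is the reference point the comment compares against. Activation and optimizer memory are ignored, so real training footprints are much larger.

```python
def transformer_layer_params(d_model: int, ffn_mult: int = 4) -> int:
    """Approximate parameter count of one standard transformer layer.

    Attention: Q, K, V, O projections -> 4 * d_model^2 weights.
    FFN: up- and down-projection      -> 2 * ffn_mult * d_model^2 weights.
    Biases and norm params are tiny and ignored here.
    """
    attn = 4 * d_model ** 2
    ffn = 2 * ffn_mult * d_model ** 2
    return attn + ffn


def weight_vram_gib(n_layers: int, d_model: int, bytes_per_param: int = 2) -> float:
    """Weight memory in GiB for n_layers, assuming fp16 (2 bytes/param)."""
    total_bytes = n_layers * transformer_layer_params(d_model) * bytes_per_param
    return total_bytes / 1024 ** 3


# Illustrative 7B-class config: d_model=4096, 32 layers.
print(transformer_layer_params(4096))   # params in one layer
print(weight_vram_gib(32, 4096))        # GiB of fp16 weights across layers
```

With these (assumed) numbers, each layer is about 0.375 GiB of fp16 weights, so any architecture that needs a few times more memory per layer does eat into a 24 to 80 GiB GPU budget quickly at scale.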