(no title)
VHRanger | 2 months ago
The issue is that they're generally harder to train (need input/output pairs as a training dataset) and don't naturally generalize as well
VHRanger | 2 months ago
The issue is that they're generally harder to train (need input/output pairs as a training dataset) and don't naturally generalize as well
GaggiX|2 months ago
Decoder-only models also do this, the only difference is that they use a masked attention.