
localhost | 1 year ago

This is a giant dataset: 536 GB of embeddings. I wonder how much compression would be possible by training or fine-tuning a transformer model directly on these embeddings, i.e., with no tokenization/decoding steps. Could a 7B or 14B model "memorize" Wikipedia?

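Not the poster's actual setup, just a minimal sketch of what "training directly on embeddings" could look like: a small PyTorch transformer trained autoregressively to predict the next embedding vector, using an MSE regression loss in place of token cross-entropy. The embedding dimension, model size, sequence length, and the random stand-in batch are placeholders for illustration, not anything derived from the real dataset.

    import torch
    import torch.nn as nn

    EMB_DIM, SEQ_LEN, BATCH = 1024, 64, 8  # hypothetical embedding size / context

    class EmbeddingLM(nn.Module):
        def __init__(self, emb_dim=EMB_DIM, d_model=512, n_layers=4, n_heads=8):
            super().__init__()
            self.in_proj = nn.Linear(emb_dim, d_model)   # project embeddings into model space
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.out_proj = nn.Linear(d_model, emb_dim)  # predict the next embedding vector

        def forward(self, x):
            # causal mask so position t only attends to positions <= t
            mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
            h = self.encoder(self.in_proj(x), mask=mask)
            return self.out_proj(h)

    model = EmbeddingLM()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # stand-in for a batch of consecutive Wikipedia-chunk embeddings
    batch = torch.randn(BATCH, SEQ_LEN, EMB_DIM)

    pred = model(batch[:, :-1])                        # predict embedding t+1 from embeddings 1..t
    loss = nn.functional.mse_loss(pred, batch[:, 1:])  # regression loss instead of cross-entropy
    loss.backward()
    opt.step()
    print(f"regression loss: {loss.item():.4f}")

The "compression" question then becomes whether the model's parameters (e.g., ~14 GB for a 7B model in fp16) can reproduce the 536 GB of vectors to acceptable fidelity, which depends heavily on how much redundancy the embeddings actually contain.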
No comments yet.