TikiTDO | 2 years ago
So while you obviously couldn't conclusively prove that the idea fixes the issue in larger models, if you know what you're looking for you should be able to validate that the method works in general, even on very small models.
That said, consumer-grade cards should be able to train an 8B model with quantization, so you might as well train the whole thing.
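For a rough sense of why quantization matters here, a back-of-the-envelope sketch of the memory taken by the weights alone at a few bit widths. The function name and the chosen bit widths are just for illustration, and this deliberately ignores gradients, optimizer state, activations, and quantization metadata, all of which add to the real footprint.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8e9  # the 8B model mentioned above

# Illustrative bit widths: half precision, 8-bit, and 4-bit quantization.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

So the weights go from about 16 GB at 16-bit to about 4 GB at 4-bit, which is the headroom that makes training on a single consumer card plausible.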