
pstorm | 1 year ago

I'm surprised this isn't getting more love. I love the concept of finetuned, hyper-specific, tiny LLMs. Of course, the data is the most important part.


roborovskis | 1 year ago

Thanks for the kind words! I started with the 780M-param flan-t5-large model and kept trying smaller and smaller base models - I was shocked at how good the output was at 77M. As you go smaller, though, it's much easier to accidentally overfit or collapse the model and produce gibberish. I had to be very careful with the hyperparameters and with sanitizing/filtering the dataset.
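
For anyone curious what this kind of setup looks like, here is a minimal sketch of fine-tuning flan-t5-small (the ~77M-param base) with Hugging Face transformers and conservative settings. The task, example data, and hyperparameter values below are illustrative placeholders, not the configuration actually used for the model discussed above:

    # Sketch only: assumes Hugging Face transformers + datasets are installed;
    # data and hyperparameters are placeholders, not the commenter's real setup.
    from datasets import Dataset
    from transformers import (
        AutoTokenizer,
        AutoModelForSeq2SeqLM,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    model_name = "google/flan-t5-small"   # ~77M params; google/flan-t5-large is the ~780M base
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Placeholder task data; a real run needs a much larger, deduplicated, filtered set.
    raw = [
        {"input": "summarize: The quick brown fox jumps over the lazy dog.",
         "output": "A fox jumps over a dog."},
        {"input": "summarize: It rained all day and the match was cancelled.",
         "output": "Rain cancelled the match."},
    ]
    splits = Dataset.from_list(raw).train_test_split(test_size=0.5)

    def preprocess(batch):
        # Tokenize inputs and targets; labels are the target token ids.
        model_inputs = tokenizer(batch["input"], max_length=256, truncation=True)
        labels = tokenizer(text_target=batch["output"], max_length=64, truncation=True)
        model_inputs["labels"] = labels["input_ids"]
        return model_inputs

    tokenized = splits.map(preprocess, batched=True, remove_columns=["input", "output"])

    args = Seq2SeqTrainingArguments(
        output_dir="flan-t5-small-finetuned",
        learning_rate=1e-4,            # small LR: aggressive values collapse tiny models quickly
        num_train_epochs=3,            # few epochs to limit overfitting on a narrow dataset
        per_device_train_batch_size=8,
        weight_decay=0.01,
        evaluation_strategy="epoch",   # eval_strategy on newer transformers releases
        save_strategy="epoch",
        load_best_model_at_end=True,   # keep the checkpoint with the lowest eval loss
        predict_with_generate=True,
    )

    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["test"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()

Spot-checking generations against the held-out split after each epoch is the usual way to catch the collapse-into-gibberish failure mode early, before the eval loss alone makes it obvious.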