huydotnet | 2 months ago

Hey, I'm the author of the post. Thank you so much for the kind feedback!

As for total time and cost, this experiment cost me just $1.01 for 2h30 on a rental GPU. But the actual successful run took less than 10 minutes for both phases. I spent the rest of the time fixing the code, tuning the params, training, and retraining. It took me about 6 hours to build and clean the two datasets, though.

For the next step, I'm thinking of improving the model's accuracy, maybe with RL, but I would not shrink the model any further. Prior to this, I tried a lot of different model sizes on different kinds of tasks, from 135M to 4B parameters. I'm not sure I like the performance of these small models for code generation :D
