me_vinayakakv | 8 months ago
I've hit this with gemini-2.0-flash: changing the prompt ever so slightly seems to make things work, only for it to break on other inputs.
gdiamos | 8 months ago
Andrej's 2019 blog post laments some of the reasons why it is hard, and I can relate to a lot of it: https://karpathy.github.io/2019/04/25/recipe
The biggest mistake I see people make is the one described in this quote from the blog: "a 'fast and furious' approach to training neural networks does not work and only leads to suffering"
I'll probably write more about it in a few months...