(no title)
warsheep | 6 months ago
This version will not get you far. You will just train a model that solves the last math problem you gave it, and maybe some others, but it will probably forget the first ones.
There are other, similar procedures that train better, but they've been tried and are currently worse than classical SGD with large batches.
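
To make the forgetting concrete, here is a rough toy sketch, not anyone's actual training setup. Everything in it is my own assumption: a linear model on noisy synthetic data stands in for the "math problems", (a) fits one example at a time until that example is solved, and (b) runs plain mini-batch SGD over shuffled data. The sizes, step sizes, and names (`w_seq`, `w_sgd`, `sq_err`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "problems": noisy linear targets, so no single w fits every example.
n, d = 64, 4
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(size=n)

def sq_err(w):
    """Per-example squared error of the linear model w."""
    return (X @ w - y) ** 2

# (a) One example at a time, each trained until it is fit.
w_seq = np.zeros(d)
for xi, yi in zip(X, y):
    lr = 0.5 / (xi @ xi)  # per-example step size: the residual halves each step
    for _ in range(60):
        w_seq -= lr * (w_seq @ xi - yi) * xi

# (b) Mini-batch SGD over reshuffled data (batch size 16 is an arbitrary choice).
w_sgd = np.zeros(d)
for epoch in range(200):
    order = rng.permutation(n)
    for start in range(0, n, 16):
        idx = order[start:start + 16]
        grad = X[idx].T @ (X[idx] @ w_sgd - y[idx]) / len(idx)
        w_sgd -= 0.05 * grad

seq_err = sq_err(w_seq)
seq_mean = seq_err.mean()
sgd_mean = sq_err(w_sgd).mean()

print("sequential: error on first example", seq_err[0])
print("sequential: error on last example ", seq_err[-1])
print("mean error, sequential vs SGD     ", seq_mean, sgd_mean)
```

In (a) the last example ends up fit almost exactly, because every later update is free to overwrite what the earlier examples taught. In (b) each step averages over several examples, so no single one gets to dominate.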
vivzkestrel | 6 months ago