anotherjesse | 1 year ago
> To sample on Mac, uncomment line 21 in sample.py. To train on Mac, rename train_shakespeare_char_mac.py to train_shakespeare_char.py
The `mac` file changes several things, so instead I tried running training with the original config file, only changing `device` to `mps` and `compile` to `False`:
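A minimal sketch of those two overrides (assuming a standard PyTorch install; the `compile_model` variable name is mine, nanoGPT's config calls the flag `compile`):

```python
# Hedged sketch: pick the training device on Apple Silicon and disable
# torch.compile, mirroring the two config changes described above.
# Falls back to "cpu" if PyTorch is not installed at all.
try:
    import torch
    if torch.backends.mps.is_available():
        device = "mps"    # Apple's Metal backend
    elif torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
except ImportError:
    device = "cpu"

compile_model = False  # torch.compile was unreliable on MPS at the time
```

With nanoGPT you don't need to edit the file at all; its configurator accepts the same settings as command-line overrides, e.g. `--device=mps --compile=False`.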
iter 100: loss 2.0268, time 815.43ms, mfu 3.24%
iter 200: loss 1.8523, time 818.79ms, mfu 3.24%
iter 300: loss 1.7799, time 823.05ms, mfu 3.23%
iter 400: loss 1.6887, time 819.08ms, mfu 3.23%
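For context, the `mfu` column in logs like these is nanoGPT's model-FLOPs-utilization estimate: the FLOPs per second actually sustained, divided by the hardware's advertised peak. A hedged sketch of that arithmetic (function name and parameters here are illustrative; nanoGPT's own `estimate_mfu` is a method on its model class):

```python
def estimate_mfu(n_params, n_layer, n_head, head_dim, seq_len,
                 tokens_per_iter, dt_seconds, peak_flops):
    """Model-FLOPs utilization, per the PaLM-style formula nanoGPT uses."""
    # FLOPs per token: 6*N for the dense weights plus the attention term.
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * seq_len
    flops_per_iter = flops_per_token * tokens_per_iter
    achieved = flops_per_iter / dt_seconds  # FLOPs/second actually sustained
    return achieved / peak_flops            # fraction of the chip's peak

# Illustrative numbers only (not the run above): a small char-level model
# at ~820 ms/iter against a hypothetical 10 TFLOP/s peak.
mfu = estimate_mfu(n_params=10_000_000, n_layer=6, n_head=6, head_dim=64,
                   seq_len=256, tokens_per_iter=64 * 256,
                   dt_seconds=0.82, peak_flops=1e13)
```

A low single-digit MFU like the 3.2% above mostly says the backend isn't keeping the GPU saturated, not that the loss curve is wrong.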
Training is ~4x slower than the speed reported for the original multi-GPU run: https://wandb.ai/adam-karvonen/chess-gpt-batch/runs/zt5htyl6... Not bad for an M2 Studio that's running lots of other workloads at the same time.
a_wild_dandan | 1 year ago
nl | 1 year ago
It's possible MLX has some additional micro-optimizations, but in general, most people who have tried it against hand-written MPS-based training implementations haven't found great speedups yet.