Great post! I've been looking to get into the guts of large-scale model training (I'm halfway between the design and application layers of LLMs, mostly in Python, sometimes a bit of C++), and this will be a great reference to have.
PS: I'd appreciate it if anyone can recommend more material like this.