top | item 39079676

shipwright | 2 years ago

Getting to know some CUDA is a solid step forward. I'd recommend familiarizing yourself with Triton as a next move since you seem into that, alongside fundamentals like trying out a scalable deployment workflow with your framework > training env > inference env > orchestration tooling of choice. There's plenty of room to mix and match there (and it can often be done at no cost thanks to generous free trials/tiers) :). Simply finding something you love goes a long way, and there will always be opportunities to prove your worth, especially in infra. Here's a little example: a lot of popular streamlined training tools save checkpoints where tensors are nested oddly in subkeys, which makes deserialization iffy when converting to safetensors. That wouldn't be a hassle if it were a one-off thing, but the affected models are really big and popular, and that's before counting the sheer number of finetunes that also get deployed. Once you're aware of it you can just act accordingly, but it stopped enough projects in their tracks for a while until it was pointed out haha. I'm sure that whatever you pick will be a good choice, best of luck :)
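(As a rough sketch of the nesting issue above: the safetensors format only accepts a flat mapping of string keys to tensors, so a checkpoint whose tensors live inside nested sub-dicts has to be flattened into dotted keys before it can be converted. The function name and the toy checkpoint below are hypothetical, with plain lists standing in for tensors.)

```python
def flatten_state_dict(nested, prefix="", sep="."):
    """Recursively flatten nested checkpoint dicts into flat dotted keys,
    e.g. {"model": {"weight": t}} -> {"model.weight": t}."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dicts, carrying the accumulated key path.
            flat.update(flatten_state_dict(value, full_key, sep))
        else:
            # Leaf value: a tensor in a real checkpoint (a list here).
            flat[full_key] = value
    return flat

# Toy checkpoint with tensors nested in subkeys (placeholder values):
ckpt = {"model": {"layers": {"0": {"weight": [1.0]}}}, "step": [7]}
flat = flatten_state_dict(ckpt)
# flat now maps "model.layers.0.weight" and "step" to their values.
```

With real tensors, the flattened dict could then be handed to something like safetensors' `save_file`; the catch the comment describes is that tools disagree on (or don't document) the nesting, so the flattening/unflattening convention has to be recovered per checkpoint.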

minzi | 2 years ago

Thank you for your insight! I was planning on checking out Triton too once I had a baseline understanding of GPU programming in general. The problem you described is right in the realm of problems I'm interested in working on; I'm really looking to find efficiency gains in distributed training and inference.