brainbag | 1 year ago

Would you say more about your experience writing it in Rust? What worked well, what didn't, and anywhere you found that you struggled unexpectedly or that was easier than you expected?

airstrike | 1 year ago

Hey, thanks for asking. I'm far from an authority on this, so I encourage you to take everything I say with a grain of salt.

I was using the burn[0] crate which is pretty new but in active development and chock-full of features already. It comes with a lot of what you need out of the box including a TUI visualizer for the training and validation steps.

The fact that it's so full of features is a blessing and a curse. The code is very modular so you can use the pieces you want the way you want to use them, which is good, but the "flavor" of Rust in which it's written felt like a burden compared to the way I'm used to writing Rust (which, for context, is 99% using the glorious iced[1] GUI library). I can't fault burn entirely for this; after all, they are free to make their own design choices, and I was a beginner trying to do this in less than a week. I also think they are trying to solve for getting a practitioner up and running right away, whereas I was trying to build a modular configuration on top of the crate instead of a one-and-done type script.

But there were countless generic types, several traits to define and implement in order to make some generic parameter fit those bounds, and the crate has more proc_macro derives than I'd like (my target number is 0) such as `#[derive(Module, Config, new)]` because they obfuscate the code that I actually have to write and don't teach me anything.
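To give a flavor of what I mean, here's a toy sketch (NOT burn's actual API, just an illustration of the style): everything is generic over a backend type, and layers are structs parameterized by that backend, so you end up threading trait bounds through all your own code.

```rust
use std::marker::PhantomData;

// Stand-in for a compute backend trait (burn's real `Backend` trait is much richer).
trait Backend {
    fn name() -> &'static str;
}

// A concrete backend, analogous to picking e.g. a CPU or GPU backend.
struct Cpu;

impl Backend for Cpu {
    fn name() -> &'static str {
        "cpu"
    }
}

// A "layer" generic over the backend, mirroring the `Model<B: Backend>` shape
// you write a lot of when building on this kind of crate.
struct Linear<B: Backend> {
    weight: Vec<f32>,
    _backend: PhantomData<B>,
}

impl<B: Backend> Linear<B> {
    fn new(weight: Vec<f32>) -> Self {
        Self {
            weight,
            _backend: PhantomData,
        }
    }

    // A dot product as a stand-in for a forward pass.
    fn forward(&self, input: &[f32]) -> f32 {
        self.weight.iter().zip(input).map(|(w, x)| w * x).sum()
    }
}

fn main() {
    let layer: Linear<Cpu> = Linear::new(vec![1.0, 2.0, 3.0]);
    let y = layer.forward(&[1.0, 1.0, 1.0]);
    println!("y = {} on {}", y, Cpu::name());
}
```

Multiply that pattern across modules, configs, optimizers, and learners, and you can see how it gets verbose fast for a newcomer, even though the design itself is sound.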

TL;DR the crate felt super powerful but also very foreign. It didn't quite click to the point where I thought it was intuitive or I felt very fluent with it. But then again, I spent like 5 days with it.

One other minor annoying thing was that I couldn't download exactly what I wanted out of HuggingFace directly. I ended up having to use `HuggingfaceDatasetLoader::new("carlosejimenez/wikitext__wikitext-2-raw-v1")` instead of `HuggingfaceDatasetLoader::new("Salesforce/wikitext")` because the latter would get an auth error, but this may also be my ignorance about how HF is supposed to work...

Eventually, I got the whole thing to work quite neatly and was able to tweak hyperparameters and get my model to increasingly better perplexity. With more tweaks, a better tokenizer, possibly a better data source, and an NVIDIA GPU rather than Apple Silicon, I could have squeezed even more out of it. My original goal was to try to slap an iced GUI on the project so that I could tweak the hyperparameters there, compare models, plot the training and inference, etc. with a GUI instead of code. Sort of a no-code approach to training models. I think it's an area worth exploring more, but I have a main quest I need to finish first, so I just wrote down my findings in an unpublished "paper" and tabled it for now.
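In case it's useful context for anyone following along: the perplexity number I was chasing is just the exponential of the mean cross-entropy loss (in nats), so any reduction in validation loss translates directly into lower perplexity. A minimal sketch:

```rust
// Perplexity = exp(mean cross-entropy loss in nats).
// Lower loss therefore means lower (better) perplexity.
fn perplexity(mean_ce_nats: f64) -> f64 {
    mean_ce_nats.exp()
}

fn main() {
    // A mean loss of ln(100) ≈ 4.605 nats corresponds to perplexity ≈ 100,
    // and a loss of 0 corresponds to perplexity 1 (a perfect model).
    println!("{:.1}", perplexity(100.0_f64.ln()));
    println!("{:.1}", perplexity(0.0));
}
```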

________

[0]: https://github.com/tracel-ai/burn

[1]: https://github.com/iced-rs/iced