item 45051166

staticelf | 6 months ago

Reading an article like this makes me realize I am too stupid to ever build a foundation model from scratch.

oersted|6 months ago

Paper authors (and this post's author, apparently) like to throw in lots of scary-looking maths to signal that they are smart and that what they are doing has merit. The Reinforcement Learning field is particularly notorious for this, but it's all over ML. Often it is not on purpose: everyone is taught that this is the proper "formal" way to express these things, and that any other representation is imprecise or inappropriate in a scientific context.

In practice, when it comes down to code, even without higher-level libraries, it is surprisingly simple, concise and intuitive.

Most of the math elements used have quite straightforward properties and uses, but of course if you combine them all into big expressions with lots of single-character variables, it becomes really hard for anyone to understand. You kind of need to learn to squint and recognize the basic building blocks that the maths represent, but that wouldn't be necessary if it weren't obfuscated like this.
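As one illustration of that claim (an example chosen for this thread, not taken from the article being discussed), consider the clipped surrogate objective from PPO (Proximal Policy Optimization), a classic piece of intimidating-looking RL notation: L_CLIP(θ) = E_t[ min( r_t(θ) A_t, clip(r_t(θ), 1 - ε, 1 + ε) A_t ) ], where r_t(θ) is the ratio of the new policy's probability to the old policy's probability for the action taken at step t, and A_t is an advantage estimate. Below is a minimal sketch of it in plain Python; the function name and the list-based interface are assumptions for illustration, and a real implementation would operate on batched tensors in a framework such as PyTorch or JAX.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of timesteps.

    Illustrative sketch: inputs are plain lists of equal length.
    logp_new / logp_old: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates A_t for the same timesteps.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)            # r_t(theta) = pi_new / pi_old
        clipped = min(max(ratio, 1 - eps), 1 + eps)  # clip(r_t, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)     # pessimistic of the two terms
    return total / len(advantages)                   # empirical mean stands in for E_t[...]
```

The whole formula reduces to a ratio, a clamp, a minimum and an average. The objective is maximized, so in a training loop you would typically minimize its negative with a gradient-based optimizer.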

catgary|6 months ago

I’m going to push back on this a bit. I think a simpler explanation (or at least one that doesn’t involve projecting one’s own insecurities onto the authors) is that the people who write these papers are generally comfortable enough with mathematics that they don’t believe anything has been obfuscated. ML is a mathematical science and many people in ML were trained as physicists or mathematicians (I’m one of them). People write things this way because it makes symbolic manipulations easier and you can keep the full expression in your head; what you’re proposing would actually make it significantly harder to verify results in papers.

MattPalmer1086|6 months ago

Haha, I recognise this. I invented a fast search algorithm and worked with some academics to publish a paper on it last year.

They threw all the complex math into the paper. I could not initially understand it at all, despite having invented the damn algorithm!

Having said that, once I picked it apart and took a little time with it, it actually wasn't that hard, but it sure looked scary and incomprehensible at first!

godelski|6 months ago

I think you misunderstand what the math is for. The math is not for training the model but for understanding why the model can be formulated that way and why this training will work. It is the exact opposite of obfuscation.

Think of it this way

  You don't need math to train a good model but you need math to know why your model is wrong.

It isn't about lording over others; it is that in research you care why things work just as much as that they work. The reason for this is very simple: it's fucking hard to improve things when you don't understand them. If you just have a black box, then the only strategy you have available is brute force. But if you analyze things and build knowledge, then you don't have to brute force.

Also, the idea of using a paper to signal intelligence is kinda silly. Papers aren't being written for the general public; papers are the communication between scientists. Who are they impressing? Each other? The others who are going to call them out if they write bullshit or make convoluted arguments? I don't buy that. But maybe that's because I'm a researcher. I also don't think I need to use math to look smart; my PhD and publication record do a good enough job of that on their own. I don't even need it to flex to other researchers. The math in my papers is there because it is just easier to communicate. I'm sure there are concepts that you find easier to understand by reading code than by reading English. Same thing. Math and programming are great languages when you need high precision and when being pedantic is essential. Math is used because it is the best way to communicate, not as a flex. We flex on each other by showing how our ideas are the best. You can't do that if the other person doesn't understand you.

@staticelf and anyone else that feels that way:

That feeling is normal in the beginning. Basically, your first year of a PhD is spent going "what the fuck does any of this mean?!?!" It's rough, but also normal. You're working at the bounds of human knowledge, and papers are written in the context of other papers. It's hard to jump in because it is like jumping into the middle of a decades-long (or longer) conversation. If you didn't feel lost, then the conversation probably wasn't that complicated and we'd probably have solved those problems much earlier. So you sit down and read a lot of papers to get the context of that conversation.

My point is, don't put yourself down. The hill you need to climb looks steeper than it is. Unfortunately, it is also hard to track your progress, so you tend to feel like it's continually out of reach until it suddenly isn't. (It's also hard because everyone feels like an imposter and many are afraid to admit not knowing. But the whole job is about not knowing lol.) Probably the most important skill in a PhD is persistence. I doubt you're too stupid. I'm sure you can look back and see that you've done things that you or other people are really impressed with: things that looked like giant mountains to climb but, looking back, don't seem so giant anymore. We'd get nowhere in life if we didn't try to do things we initially thought were too hard. Truth is, you never know till you try. I'm not going to say it's easy (it isn't), but it isn't insurmountable. You can't compare yourself against others who have years of training. Instead, look at them and see that that's where this training can take you. But you can't get there if you don't try.

hoppp|6 months ago

It takes a while to get into; just like with everything, determination is key.

Also, there are libraries that abstract away most, if not all, of it, so you don't have to know everything.

staticelf|6 months ago

That's the thing: it's too hard to learn, so I'd rather do something else with the limited time I have left.

reactordev|6 months ago

Haha, I was just going to say the same. I was hoping, I guess naively, that this would explain the math rather than just show me math. While I love a good figure, I like pseudocode just as much :)