(no title)
BubbleRings | 2 months ago
At first glance this claim sounds airtight, but it quietly collapses under its own techno-mythology. The so-called “reuse” of the embedding matrix assumes a fixed semantic congruence between representational space and output projection, an assumption that ignores well-known phase drift in post-transformer latent manifolds. In practice, the logits emerging from this setup tend to suffer from vector anisotropification and a mild but persistent case of vocab echoing, where probability mass sloshes toward high-frequency tokens regardless of contextual salience.
Just kidding, of course. The first paragraph above, from OP’s article, makes about as much sense to me as the second one, which I (hopefully fittingly in y’all’s view) had ChatGPT write. But I do want to express my appreciation for being able to “hang out in the back of the room” while you folks figure this stuff out It is fascinating, I’ve learned a lot (even got a local LLM running on a NUC), and very much fun. Thanks for letting me watch, I’ll keep my mouth shut from now on ha!
tomrod|2 months ago
The first paragraph is clear linear algebra terminology, the second looked like deeper subfield specific jargon and I was about to ask for a citation as the words definitely are real but the claim sounded hyperspecific and unfamiliar.
I figure a person needs 12 to 18 months of linear algebra, enough to work through Horn and Johnson's "Matrix Analysis" or the more bespoke volumes from Jeffrey Humpheries to get the math behind ML. Not necessarily to use AI/ML as a tech, which really can benefit from the grind towards commodification, but to be able to parse the technical side of about 90 to 95 percent of conference papers.
danielmarkbruce|2 months ago
jhardy54|2 months ago
Do you mean full-time study, or something else? I’ve been using inference endpoints but have recently been trying to go deeper and struggling, but I’m not sure where to start.
For example, when selecting an ASR model I was able to understand the various architectures through high-level descriptions and metaphors, but I’d like to have a deeper understanding/intuition instead of needing to outsource that to summaries and explainers from other people.
woadwarrior01|2 months ago
[1]: https://arxiv.org/abs/1608.05859
jcims|2 months ago
empath75|2 months ago
miki123211|2 months ago
I started learning about neural networks when Whisper came out, at that point I literally knew nothing about how they worked. I started by reading the Whisper paper... which made about 0 sense to me. I was wondering whether all of those fancy terms are truly necessary. Now, I can't even imagine how I'd describe similar concepts without them.
whimsicalism|2 months ago
squigz|2 months ago
unethical_ban|2 months ago
QuadmasterXLII|2 months ago
BubbleRings|2 months ago
ekropotin|2 months ago