top | item 45091042

(no title)

Yeah... I'm far from an expert on state-of-the-art ML, but it feels like a new embedding would invalidate any of the layers you keep. Taking off a late layer makes sense to me, like in cases where you want to use an LLM with a different kind of output head for scoring or something like that, because the basic "understanding" layers are still happening in the same numerical space - they're still producing the same "concepts", that are just used in a different way, like applying a different algorithm to the same data structure. But if you have a brand new embedding, then you're taking the bottom layer off. Everything else is based on those dimensions. I suppose it's possible that this "just works", in that there's enough language-agnostic structure in the intermediate layers that the model can sort of self-heal over the initial embeddings... but that intuitively seems kind of incredible to me. A transformation over vectors from a completely different basis space feels vanishingly unlikely to do anything useful. And doubly so given that we're talking about a low-resource language, which might be more likely to have unusual grammatical or linguistic quirks which self-attention may not know how to handle.

discuss

No comments yet.