The original article discusses techniques for constraining the weights of a neural network to a submanifold of weight space during training. Your comment discusses interleaving the tokens of an LLM prompt with Unicode PUA code points. These are two almost completely unrelated things, so it is very confusing to me that you are confidently asserting that they are the same thing. Can you please elaborate on why you think there is any connection at all between your comment and the original article?
aghilmort|5 months ago
Suppose we use 3 codeword lanes per codeword, which is our default. Each lane of tokens is keyed to a distinct prime p, so collectively the lanes form a CRT-driven codeword (Chinese Remainder Theorem). This is discretely equivalent to labeling every k tokens with a globally unique indexing grammar.
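A minimal sketch of the indexing idea as I understand it, assuming the lanes just carry position residues: with three lanes keyed to distinct primes, the CRT guarantees every position gets a globally unique label up to the product of the primes. Names and prime choices here are illustrative, not from any actual codebase.

```python
# Three "lanes" label each token position by its residue mod a distinct
# prime; by the Chinese Remainder Theorem the triple of residues uniquely
# identifies the position up to the product of the primes.
from math import prod

PRIMES = (3, 5, 7)  # three lanes; unique labels for positions 0..104

def lane_labels(pos):
    """Label a token position with its residue in each prime lane."""
    return tuple(pos % p for p in PRIMES)

def position_from_labels(labels):
    """Recover the position from its lane labels via CRT reconstruction."""
    M = prod(PRIMES)
    x = 0
    for r, p in zip(labels, PRIMES):
        m = M // p
        x += r * m * pow(m, -1, p)  # pow with -1 gives the inverse of m mod p
    return x % M

# Every position below 3*5*7 = 105 round-trips uniquely.
assert all(position_from_labels(lane_labels(n)) == n for n in range(105))
```

So "globally unique indexing" holds only out to the product of the lane primes; beyond that the labels wrap around.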
That interleaving also corresponds to a triple of adjacent, nearly orthogonal embeddings, since those tokens still receive random Gaussian embeddings (which are close to orthogonal in high dimensions). The net effect is that we similarly slice the latent space into a spaced chain of modular manifolds every k content tokens.
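The near-orthogonality part is a standard high-dimensional fact and easy to check: independent Gaussian vectors have cosine similarity concentrating around zero at rate roughly 1/sqrt(d). A quick sketch, with an illustrative embedding width:

```python
# Independent random Gaussian embeddings in high dimensions are nearly
# orthogonal: their pairwise cosine similarity concentrates around 0.
import math
import random

def gaussian_vec(d, rng):
    return [rng.gauss(0.0, 1.0) for _ in range(d)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

rng = random.Random(0)
d = 4096  # illustrative embedding width
u, v, w = (gaussian_vec(d, rng) for _ in range(3))

# Pairwise cosines are small (~1/sqrt(4096) = 1/64 in magnitude).
for a, b in ((u, v), (u, w), (v, w)):
    assert abs(cosine(a, b)) < 0.1
```

Note this only says random lane embeddings are *approximately* orthogonal; it's not the exact orthogonality constraint the original post enforces on weights.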
We also refer to that interleaving as Stiefel frames, for reasons similar to those in the post. We began work this spring or so to inject that construction inside the model, with early results in a similar direction to what the post describes. That's another way of saying this sort of approach lets us make that chained atlas (wc?) of modular manifolds as tight as possible within the dimensional limits of the embedding, floating-point precision, etc.
We somewhat tongue-in-cheek refer to this as the "retokenization group" at the prompt level, by analogy with the renormalization group / tensor networks / etc. The "relayering group" is the same intuition, or perhaps the "reconnection group", at the architecture level.
glowcoil|5 months ago
You are talking about latent space during inference, not weight space during training, and you are talking about interleaving tokens with random Gaussian tokens, not constraining values to lie on a manifold within a larger space. Whether or not the thing you are describing is meaningful or useful, it is basically unrelated to the original article, and you are not using the term "modular manifold" to refer to the same thing.