top | item 42485330


aothms | 1 year ago

I think the first-hand distinction is questionable, e.g. https://en.wikipedia.org/wiki/Thing-in-itself. We too can only perceive reality through our sensory and neural pathways.

And with multimodal LLMs there is also some ability for multiple sensory inputs.


dartos | 1 year ago

I don’t think it is.

To oversimplify, our input system takes a continuous stream of raw input. We can’t stop it, really.

We get our input directly from the source. Even if it’s aliased by our neural pathways, when they receive that info initially, it’s unadulterated.

LLMs take fixed amounts of discrete tokens, which must be transformed before they even reach the training routine. Even multimodal models take in discrete tokens.
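To make the discreteness point concrete, here is a toy sketch (hypothetical names, not any real model's tokenizer — production LLMs use subword schemes like BPE) of how text is mapped to a sequence of integer IDs before training ever sees it:

```python
# Toy whitespace "tokenizer": the model only ever sees discrete integers.

def build_vocab(corpus):
    """Assign a unique integer ID to every distinct word."""
    vocab = {}
    for word in corpus.split():
        vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab, unk_id=-1):
    """Convert raw text into discrete token IDs. Anything outside the
    vocabulary collapses to a single 'unknown' ID -- one concrete way
    information is lost at the input boundary."""
    return [vocab.get(word, unk_id) for word in text.split()]

vocab = build_vocab("the model sees discrete tokens")
print(encode("the model sees tokens", vocab))      # -> [0, 1, 2, 4]
print(encode("the model sees continuity", vocab))  # unseen word -> [0, 1, 2, -1]
```

However the mapping is done, the continuous input has already been chopped into a finite alphabet by the time the model touches it.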

Information is lost when recording reality to video and even more is lost when converting that video into tokens.
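The loss is the same in kind as quantization. A minimal sketch (illustrative numbers, not tied to any real codec or tokenizer) of rounding a continuous signal to a handful of discrete levels:

```python
# Quantizing continuous samples to discrete levels loses information,
# analogous to reality -> frames -> pixels -> tokens.

def quantize(samples, levels):
    """Round each sample in [0, 1] to the nearest of `levels` evenly
    spaced representable values."""
    step = 1.0 / (levels - 1)
    return [round(s / step) * step for s in samples]

signal = [0.0, 0.127, 0.5, 0.613, 0.99]
coarse = quantize(signal, 4)  # only 4 representable values survive
error = [abs(a - b) for a, b in zip(signal, coarse)]
print(coarse)
print(max(error))  # the original can never be recovered exactly
```

Once quantized, distinct inputs that land in the same bin are indistinguishable downstream, which is the sense in which recording and tokenizing discard information.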

And LLMs only learn in fixed steps. We take in information while we generate whatever it is we generate (movement, a sense of self, an understanding of our surroundings and our place in them, the next sentence, etc.).
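The "fixed steps" point can be sketched with a toy one-parameter model (entirely hypothetical, not any real training setup): the weight changes only inside explicit training steps, and generation never touches it:

```python
# Learning happens only at discrete gradient steps; inference changes nothing.

class ToyModel:
    def __init__(self):
        self.w = 0.0  # single parameter

    def predict(self, x):
        return self.w * x

    def train_step(self, x, y, lr=0.1):
        # one discrete gradient step on squared error
        grad = 2 * (self.predict(x) - y) * x
        self.w -= lr * grad

m = ToyModel()
for _ in range(50):          # learning is confined to these steps
    m.train_step(2.0, 6.0)   # fit y = 3x at a single point
print(round(m.predict(2.0), 3))  # ~6.0 after training

w_before = m.w
m.predict(10.0)              # "generation": the weight is untouched
assert m.w == w_before
```

A deployed LLM works like the second half: it produces output from frozen weights, rather than updating itself while it generates the way the commenter describes humans doing.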

I'm talking specifically about the most popular transformer models.