clooper|1 year ago
The main practical issue is the size of the table, but I don't see any theoretical reason why this is incorrect. The neural network is simply a compressed representation of the uncompressed lookup table. Given that the two representations are theoretically equivalent, and that a lookup table does not perform any reasoning, we can conclude that no neural network is actually doing any thinking; it is merely decompressing the table and looking up the value corresponding to the input number.
Modern neural networks have some randomness, but that doesn't change the table in any meaningful way: instead of the output being a single number, it becomes a distribution over some finite range, which can again be turned into a table of tuples.
edflsafoiewq|1 year ago
Famously a simple lookup table for the transition function then suffices to compute any computable function.
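As a toy illustration, here is a minimal sketch of a machine whose entire transition function is literally a dict lookup (the machine, its states, and the `delta` table below are made up for the example; it just appends one `1` to a unary string):

```python
# delta maps (state, symbol) -> (new_state, write_symbol, head_move).
# This dict IS the transition function: computation is repeated lookup.
delta = {
    ("scan", "1"): ("scan", "1", +1),   # move right over the 1s
    ("scan", "_"): ("done", "1", 0),    # hit a blank: write a 1 and halt
}

def run(tape, state="scan", head=0):
    cells = dict(enumerate(tape))       # sparse tape; missing cells are blank "_"
    while state != "done":
        symbol = cells.get(head, "_")
        state, write, move = delta[(state, symbol)]  # the lookup-table step
        cells[head] = write
        head += move
    return "".join(cells[i] for i in sorted(cells))

print(run("111"))  # → 1111
```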
eru|1 year ago
Simplified for Post's correspondence problem, you have a set of playing cards with text written on the front and back. (You can make copies of cards in your set.)
The question is, can you arrange your cards in such a way, that they spell out the same total text on the front and back?
As an example your cards might be: [1] (a, baa), [2] (ab, aa), and [3] (bba, bb). One solution would be (3, 2, 3, 1) which spells out bbaabbbaa on both sides.
Figuring out whether a set of cards has a solution is undecidable (it's equivalent in difficulty to the halting problem).
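A brute-force sketch of the search, using the cards from the example above. Since the general problem only admits a semi-decision procedure, the `max_len` bound is an arbitrary cutoff: finding a match proves a solution exists, but finding none within the bound proves nothing.

```python
from itertools import product

# Cards from the comment: front text, back text.
cards = {1: ("a", "baa"), 2: ("ab", "aa"), 3: ("bba", "bb")}

def find_solution(cards, max_len=5):
    # Try every sequence of card indices up to max_len, shortest first.
    for n in range(1, max_len + 1):
        for seq in product(cards, repeat=n):
            front = "".join(cards[i][0] for i in seq)
            back = "".join(cards[i][1] for i in seq)
            if front == back:
                return seq
    return None  # no solution found within the bound (proves nothing)

print(find_solution(cards))  # → (3, 2, 3, 1)
```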
Centigonal|1 year ago
There are some ways to introduce stochasticity:
1. Add randomness. The temperature or "creativity" hyperparameter in most LLMs does this, as do some decoding strategies. The hardware these models run on can also introduce randomness.
2. Add some concept of state. RNNs do this, some of the approaches which give the LLM a scratch pad or external memory do this, and continuous pre-training sort of does this.
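Point 1 can be sketched with plain temperature scaling of logits before sampling (the logit values here are made up for illustration):

```python
import math
import random

def sample(logits, temperature=1.0):
    # Divide logits by temperature: high T flattens the distribution
    # (more randomness); T -> 0 approaches a deterministic argmax lookup.
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.1]
print(sample(logits, temperature=0.1))   # near-greedy: almost always index 0
print(sample(logits, temperature=5.0))   # flatter: any index is plausible
```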
How this affects people's perception of LLMs as thinking machines, I don't know. What if someone took every response I ever gave to every question that was ever asked of me in my life and made a Chinese Room[1] version of me? A lookup table that is functionally identical to my entire existence. In what contexts is the difference meaningful?
[1] https://en.wikipedia.org/wiki/Chinese_room
cryptoxchange|1 year ago
A LUT version of you is inductive. The observed input/output pairs do not uniquely identify your current state, much like a puddle left by a melted ice cube indicates its volume, but little to nothing of its shape.
Post LUT-you genesis, applying property-based fuzz testing would quickly reveal that LUT-you is one of an infinite number of LUT-yous that melt into the puddle of historical data, but not the LUT-you that is the original ice cube.
https://fsharpforfunandprofit.com/posts/property-based-testi...
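The fuzzing argument can be sketched with a stand-in "true process" (the function, domain, and sampling scheme below are arbitrary assumptions for illustration):

```python
import random

def true_process(x):
    # Stand-in for the "original ice cube": some underlying behavior.
    return x * x - x

# LUT-you: built only from sparsely observed input/output history.
history = {x: true_process(x) for x in range(0, 100, 7)}

def lut_me(x):
    return history.get(x)   # unseen input -> no answer

# Property: LUT-me should agree with the original on ANY input.
random.seed(1)
samples = [random.randrange(1000) for _ in range(1000)]
mismatches = [x for x in samples if lut_me(x) != true_process(x)]
print(len(mismatches) > 0)  # → True: fuzzing quickly tells the LUT apart
```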
clooper|1 year ago
I'm not making an abstract claim about neural networks: all numerical algorithms, neural networks included, can be reduced to a lookup table given a large enough hard drive. This is not practical, because the space required would exceed the number of atoms in the known universe, but the argument is sound. The same isn't true for people, unless a person is idealized and abstracted into a sequence of numbers. I'm not saying no one is allowed to think of people as a sequence of numbers, but that is clearly an abstraction of what it means to be a person. In the case of the neural network there is no abstraction: it really is a numerical function, which can be expanded into a large table representing its graph.
radarsat1|1 year ago
And that's why the concept of generalization is so important in machine learning, and, as a consequence, why the internal representation behind that "lookup" matters.
By definition, a lookup table can only store data it is given. The whole point of ML systems, however, is to predict outputs for inputs that are similar to, but not present in, their training data.
Interpolation and extrapolation, key components to applying ML systems to new data and therefore critical for actual usage, are enabled by internal representations that allow for modeling the space between and around data points. It so happens that multilayer neural networks accomplish this by general and smoothed (due to regularization tricks and inductive biases) iterative warpings of the representation (embedding) space.
Due to the manifold hypothesis, we can interpret this as determining underlying and semantically meaningful subspaces, and unfolding them to perform generalized operations such as logical manipulations and drawing classification boundaries in some relatively smooth semantic space, then refolding things to drive some output representation (pixels, classes, etc.)
Another view on this is that these manipulations allow a kind of compression by optimizing the representation to make manipulations easier, in other words they re-express the data in a form that allows algorithmic evaluation of some input program. This gives the chance of modeling intrinsic relationships such as infinite sequences as vector programs. (Here I mean things like mathematical recursions, etc.) When this is accomplished, and it happens due to the pressure to optimally compress data, you could say that "understanding" emerges, and the result is a program that extrapolates to unseen values of such sequences. At this point you could say that while the input-output relationship is like a lookup table, functionally it is not the same thing because the need to compress these input-output relationships has led to some representation which allows for extrapolation, aka "intelligence" by some definitions.
The fact that these systems are still very dumb sometimes is simply due to not developing these representations as well as we would like them to, for a variety of reasons. But theoretically this is the idea behind why emergence might occur in an NN but not in a lookup table.
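The contrast can be made concrete with a minimal sketch, assuming a least-squares line as a stand-in for a learned model (the data, y = 2x sampled at even integers, is an arbitrary choice):

```python
# Training data: y = 2x observed only at even integers.
train = {x: 2 * x for x in range(0, 10, 2)}

def lut_predict(x):
    return train.get(x)      # unseen input -> no entry, no answer

def fit_line(data):
    # Least-squares line through the points: a minimal "learned" representation
    # that models the space between data points instead of memorizing them.
    xs, ys = list(data), list(data.values())
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

model = fit_line(train)
print(lut_predict(5))   # → None: the table cannot interpolate
print(model(5))         # → 10.0: the fitted representation can
```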
calf|1 year ago
So, a neural network being a compressor/decompressor is nothing special.
Note, however, that supposing a context window of 1000 units, we are looking at K = 2^1000 ≈ 10^301 different entries in the truth table. Somehow, your LLM neural network is the result of compressing a 10^301-scale, exponential amount of possible information, which of course could never all be seen -- to compress a JPEG you at least have access to the original image, not just two pixels of it.
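The size estimate is easy to sanity-check: 2^1000 has 302 decimal digits, i.e. it is on the order of 10^301.

```python
# Number of distinct binary inputs for a 1000-unit context window.
entries = 2 ** 1000
print(len(str(entries)))  # → 302, i.e. entries ≈ 1.07 * 10**301
```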
Anyways, the philosophical debate is whether you believe programs can think, whether machine intelligence is meaningful at all by definition. Some say yes, others say no. When humans think, are not our abstractions and ideas a kind of compression?
bubblyworld|1 year ago
1. Modern physics suggests you can implement such a lookup table for any subset of our universe.
2. We are a subset of the universe.
3. Therefore we are representable by lookup tables too.
...so your argument appears to prove too much, namely that humans aren't thinking beings either. Which is fine, but personally I don't think that's a useful definition of "thinking".
mjburgess|1 year ago
I.e., when you compress text into an NN and use it to generate text, the generated text is just a synthesis of the compressed text.
Whereas when I type, I am not synthesising text. Rather, I have the skill of typing, I have an interior subjectivity of thoughts, I have memories which aren't text, and so on.
When my fingers move across the keyboard, it isn't because they are looking up text.
Our causal properties (experiencing, thinking, seeing, feeling, remembering, moving, speaking, growing, digesting...) are not each an "index on the total history of prior experience" or an "index on the total history of prior seeing". The world directly causes us to, e.g., see -- seeing isn't a lookup table of prior seeings.
(Also, the whole of physics is formulated in terms that cannot be made into a lookup table; and there is no evidence, only insistence, of the converse.)
clooper|1 year ago
My argument isn't abstract. Neural networks really are just numerical functions which can be expanded into their equivalent graph representations.
FrustratedMonky|1 year ago
Isn't this purely math? The equivalence of a function to a lookup table is well studied, and an NN, being composed of functions, can be boiled down to a table as posted.
How do we get from this mathematical concept of function = table to arguments about consciousness, free will, and the state space of the universe?
The table-NN equivalence doesn't seem to help people's understanding of NNs.
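The function = table equivalence over a finite domain is easy to demonstrate (the function and domain below are arbitrary):

```python
def f(x):
    # Any computable function of a finite domain will do.
    return x ** 3 - x

domain = range(-10, 11)
table = {x: f(x) for x in domain}   # the "expanded graph" of f

# Over the tabulated domain, lookup and evaluation are indistinguishable.
print(all(table[x] == f(x) for x in domain))  # → True
```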
calf|1 year ago
That said, the general debate is a valid one. Are LLMs just doing fancy statistical compression of data, or are they doing "reasoning" in some important sense, be that merely mechanistic logical reasoning, or "human-level intelligent reasoning"?
For that matter, did the paper's authors ever define "Reasoners" in their title, or did they leave it to the reader?
Terretta|1 year ago
You're proposing the lookup table as one possible mechanism in Searle's chinese room, then proposing Searle's conclusion?
“Searle argues that, without ‘understanding’ (or ‘intentionality’), we cannot describe what the machine is doing as ‘thinking’ and, since it does not think, it does not have a ‘mind’ in anything like the normal sense of the word. Therefore, he concludes that the ‘strong AI’ hypothesis is false.”
https://en.wikipedia.org/wiki/Chinese_room
I think you've described a Chinese room run as many times as it takes to cache the results for all possible sequences of Chinese characters, and then asked whether, running from that cache, it is still (or yet) ‘thinking’.
PS. Where did the arithmetic operations come from? How did they come to be as they are? Is iterating to an algo that does that, ‘learning’? What's the difference between this and lossy or non-lossy compression of information? Could it be said the arithmetic operations are a compression of the lookup table into that which has the ‘right’ response given the inputs? If two different sets of arithmetic operations give by and large the same outputs from inputs, is one of them more ‘reasoning’ than the other depending how it's derived? What do we mean by ‘learning’ and ‘reasoning’ when applying those words to humans? Are teachers telling students to ‘show your work’ searching for explainable intelligence? :-)
Legend2440|1 year ago
You can't even know whether the RNN will halt for a given input. Neural networks are stronger than lookup tables; they are programs.
still_grokking|1 year ago
Computer programs can only compute computable functions. Therefore any computer program is (in theory) equivalent to a table lookup.¹
¹ For finite inputs, the lookup table can be finite, and for infinite inputs, the lookup table can be infinite but still countable, as the set of computable functions is countable.