clooper | 1 year ago

I was recently thinking about how every neural network is equivalent to a lookup table, where the input is any number expressible within the context window and the output is the result of the arithmetic operations applied to that number. So every neural network is equivalent to T = {(i, f(i)) : i < K}, where K is the constant determined by the context window and f is the numerical function implemented by the network. Can someone ask a neural network if my reasoning is valid and correct?

The main practical issue is the size of the table, but I don't see any theoretical reason why this is incorrect. The neural network is simply a compressed representation of the uncompressed lookup table. Given that the two representations are theoretically equivalent, and that a lookup table does not perform any reasoning, we can conclude that no neural network is actually doing any thinking beyond uncompressing the table and looking up the value corresponding to the input number.

Modern neural networks have some randomness, but that doesn't change the table in any meaningful way: instead of the output being a number, it becomes a distribution over some finite range, which can again be turned into a table of tuples.
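To make the claim concrete, here is a minimal sketch (the weights, layer sizes, and 4-bit input domain are all made up for illustration): a tiny fixed-weight "network" tabulated into exactly the T = {(i, f(i)) : i < K} form described above.

```python
def tiny_net(x: int) -> int:
    # Decode the integer input into 4 binary features (LSB first).
    bits = [(x >> i) & 1 for i in range(4)]
    # One hidden layer with ReLU, then a linear readout. Integer
    # weights keep the example exact.
    w1 = [[1, -1, 2, 0], [0, 1, -1, 1]]
    h = [max(0, sum(w * b for w, b in zip(row, bits))) for row in w1]
    return 3 * h[0] - 2 * h[1]

# The equivalent lookup table T = {(i, f(i)) : i < K} with K = 16.
K = 16
table = {i: tiny_net(i) for i in range(K)}

# The two representations agree on every input in the finite domain.
assert all(table[i] == tiny_net(i) for i in range(K))
print(len(table))  # 16 entries replace the network entirely
```

The same construction works for any network over a finite input domain; only the table size changes.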

edflsafoiewq|1 year ago

This reminds me of the classic problem in computation, where the simplest form of computation, the lookup table, input -> output, is limited to a finite domain. Turing modified the computation to have a finite internal state and infinite external environment (tape), so it becomes a transition function (state, stimulus) -> (new state, response), applied recursively in a feedback loop, allowing it to operate on infinite domains.

Famously a simple lookup table for the transition function then suffices to compute any computable function.
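A toy illustration of that point (the machine and its encoding are invented for the example): the transition function is literally a finite dict, yet iterating it in a feedback loop over an unbounded tape lets it handle inputs of any length.

```python
def run_tm(tape):
    # The finite lookup table: (state, symbol) -> (new state, write, move).
    delta = {
        ("scan", 0): ("scan", 1, +1),
        ("scan", 1): ("scan", 0, +1),
        ("scan", None): ("halt", None, 0),  # blank cell: end of input
    }
    tape = dict(enumerate(tape))  # sparse dict models an infinite tape
    state, head = "scan", 0
    while state != "halt":
        state, write, move = delta[(state, tape.get(head))]
        if write is not None:
            tape[head] = write
        head += move
    return [tape[i] for i in range(len(tape))]

# This machine flips every bit of its input, whatever the length.
assert run_tm([0, 1, 1, 0]) == [1, 0, 0, 1]
```

The three-row dict never grows, but the loop around it works on an infinite domain, which is exactly the move beyond a plain input -> output table.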

eru|1 year ago

Have a look at Post's correspondence problem for even crazier universal models of computation. Or at Fractran.

Simplified for Post's correspondence problem, you have a set of playing cards with text written on the front and back. (You can make copies of cards in your set.)

The question is, can you arrange your cards in such a way that they spell out the same total text on the front and back?

As an example your cards might be: [1] (a, baa), [2] (ab, aa), and [3] (bba, bb). One solution would be (3, 2, 3, 1) which spells out bbaabbbaa on both sides.

Figuring out whether a set of cards has a solution is undecidable; the problem is powerful enough to encode arbitrary Turing machine computations.
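A bounded search for the card puzzle can be sketched in a few lines (the depth cap is an assumption forced by undecidability; only the example cards come from the comment above):

```python
from collections import deque

def pcp_solve(cards, max_len=10):
    # Breadth-first search over card sequences, shortest first.
    queue = deque([(i,) for i in range(len(cards))])
    while queue:
        seq = queue.popleft()
        top = "".join(cards[i][0] for i in seq)
        bot = "".join(cards[i][1] for i in seq)
        if top == bot:
            return seq  # both sides spell the same text
        # Prune: one side must be a prefix of the other to be extendable.
        if len(seq) < max_len and (top.startswith(bot) or bot.startswith(top)):
            queue.extend(seq + (i,) for i in range(len(cards)))
    return None  # no solution up to max_len (it may still exist beyond)

# The cards from the example, 0-indexed.
cards = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
sol = pcp_solve(cards)
print([i + 1 for i in sol])  # the shortest solution: [3, 2, 3, 1]
```

Because the general problem is undecidable, no `max_len` works for every card set; the cap only makes this instance searchable.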

clooper|1 year ago

That's a good point.

Centigonal|1 year ago

It sounds like you're asking whether the output of a neural network is a deterministic function of its input. For many LLMs, you can make that answer yes with the right combination of parameters (temperature = 0) and deterministic underlying compute (variance in floating-point calculations can still introduce randomness in model outputs even when the model should theoretically return the same answer every time).

There are some ways to introduce stochasticity:

1. Add randomness. The temperature or "creativity" hyperparameter in most LLMs does this, as do some decoders. The hardware these models run on can also add randomness.

2. Add some concept of state. RNNs do this, some of the approaches which give the LLM a scratch pad or external memory do this, and continuous pre-training sort of does this.
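Point 1 can be sketched with a toy sampler (the logits here are invented stand-ins, not output from any real model):

```python
import math
import random

def sample(logits, temperature):
    if temperature == 0:
        # Greedy decoding: deterministic argmax, same output every time.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax over temperature-scaled logits, then sample from it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]
assert sample(logits, 0) == 0  # temperature 0: always the top token
print(sample(logits, 1.0))     # temperature 1: the token varies run to run
```

At temperature 0 the function is a pure input -> output map (tabulatable); above 0 each input maps to a distribution, which is the "table of distributions" case from the top comment.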

How this affects people's perception of LLMs as thinking machines, I don't know. What if someone took every response I ever gave to every question that was ever asked of me in my life and made a Chinese Room[1] version of me? A lookup table that is functionally identical to my entire existence. In what contexts is the difference meaningful?

[1] https://en.wikipedia.org/wiki/Chinese_room

cryptoxchange|1 year ago

To your last point, https://en.m.wikipedia.org/wiki/Problem_of_induction

A LUT version of you is inductive. The set of observed input/output pairs does not uniquely identify your current state, much like a puddle left by a melted ice cube indicates its volume but little to nothing of its shape.

After LUT-you's genesis, applying property-based fuzz testing would quickly reveal that this LUT-you is one of an infinite number of LUT-yous that melt into the same puddle of historical data, but not necessarily the LUT-you that was the original ice cube.

https://fsharpforfunandprofit.com/posts/property-based-testi...

clooper|1 year ago

People cannot be reduced to lookup tables even in theory. No one even knows how a single cell does what it does, let alone an entire organism like a person.

I'm not making an abstract claim about neural networks: any numerical algorithm, neural networks included, can be reduced to a lookup table given a large enough hard drive. This is not practical, because the space required would exceed the number of atoms in the known universe, but the argument is sound. The same isn't true for people unless a person is idealized and abstracted into a sequence of numbers. I'm not saying no one is allowed to think of people as a sequence of numbers, but that is clearly an abstraction of what it means to be a person. In the case of the neural network there is no abstraction: it really is a numerical function which can be expanded into a large table representing its graph.

radarsat1|1 year ago

Yes, this is one view of machine learning, the idea that you are training some function to map input to output, similar to "looking up" what output is addressed by some input.

And that's why the concept of generalization is so important in machine learning, and, as a consequence, why the internal representation of that "lookup" matters.

By definition a lookup table can only store data it is given. The point of ML systems, however, is to predict outputs for inputs that are similar to, but not present in, their training data.

Interpolation and extrapolation, key components to applying ML systems to new data and therefore critical for actual usage, are enabled by internal representations that allow for modeling the space between and around data points. It so happens that multilayer neural networks accomplish this by general and smoothed (due to regularization tricks and inductive biases) iterative warpings of the representation (embedding) space.
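A toy contrast between the two (both the data and the linear model are illustrative assumptions, not anyone's real system): a dict can only answer for inputs it stored, while even the simplest fitted model interpolates.

```python
# "Training data" sampled from y = 2x + 1.
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

# Representation 1: a literal lookup table.
table = dict(train)

# Representation 2: closed-form least-squares fit of a line to the
# same three points.
n = len(train)
sx = sum(x for x, _ in train)
sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train)
sxy = sum(x * y for x, y in train)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

x_new = 1.5                      # not in the training data
assert x_new not in table        # the table simply has no entry
assert abs((slope * x_new + intercept) - 4.0) < 1e-9  # the fit interpolates
```

The fitted line is itself a compressed representation of the pairs, but the compression is what buys the answer at x = 1.5; the raw table has nothing to say there.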

Due to the manifold hypothesis, we can interpret this as determining underlying and semantically meaningful subspaces, and unfolding them to perform generalized operations such as logical manipulations and drawing classification boundaries in some relatively smooth semantic space, then refolding things to drive some output representation (pixels, classes, etc.)

Another view on this is that these manipulations allow a kind of compression by optimizing the representation to make manipulations easier, in other words they re-express the data in a form that allows algorithmic evaluation of some input program. This gives the chance of modeling intrinsic relationships such as infinite sequences as vector programs. (Here I mean things like mathematical recursions, etc.) When this is accomplished, and it happens due to the pressure to optimally compress data, you could say that "understanding" emerges, and the result is a program that extrapolates to unseen values of such sequences. At this point you could say that while the input-output relationship is like a lookup table, functionally it is not the same thing because the need to compress these input-output relationships has led to some representation which allows for extrapolation, aka "intelligence" by some definitions.

The fact that these systems are still very dumb sometimes is simply due to not developing these representations as well as we would like them to, for a variety of reasons. But theoretically this is the idea behind why emergence might occur in an NN but not in a lookup table.

pistachiopro|1 year ago

Take a relatively simple large language model like Llama 1. It has a context of 2048 tokens and each token can be one of 32,000 values. So the lookup table would need 32,000^2048 entries. That's not just impractically large, that's larger than cosmically large. There are only estimated to be about 10^80 atoms in the visible universe. So while a 32,000^2048 lookup table might be a valid concept mathematically, it's not anything you can intuit physically, and therefore not something you can say is incapable of reason.
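For what it's worth, the exponent above is easy to check:

```python
import math

# Order-of-magnitude check on the table size: a context of 2048 tokens,
# each one of 32,000 values, gives 32000^2048 possible inputs.
vocab, context = 32_000, 2048
digits = context * math.log10(vocab)  # log10(32000^2048)
print(round(digits))  # prints 9227: ~10^9227 entries, vs ~10^80 atoms
```

So the table is not merely astronomically large; it exceeds the atom count of the visible universe by a factor of roughly 10^9147.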

calf|1 year ago

Every program is a compressed representation of its output. This is from Kolmogorov complexity, which you learn in any CS complexity theory course.

So, a neural network being a compressor/decompressor is nothing special.

Note, however, that supposing a context window of 1000 bits, we are looking at K = 2^1000 ≈ 10^301 different entries in the truth table. Somehow, your LLM neural network is the result of compressing information on that exponential scale, which of course could never all be seen: to compress a JPEG you at least have access to the original image, not just two pixels of it.

Anyways, the philosophical debate is whether you believe programs can think, whether machine intelligence is meaningful at all by definition. Some say yes, others say no. When humans think, are not our abstractions and ideas a kind of compression?

bubblyworld|1 year ago

This is an old argument against determinism - I think a serious challenge is that:

1. Modern physics suggests you can implement such a lookup table for any subset of our universe.

2. We are a subset of the universe.

3. Therefore we are representable by lookup tables too.

...so your argument appears to prove too much, namely that humans aren't thinking beings either. Which is fine, but personally I don't think that's a useful definition of "thinking".

mjburgess|1 year ago

We're not a lookup table of the things we're, e.g., saying or doing. Nor are we looking things up, in this sense, when we act.

ie., when you compress text into an NN and use it to generate text, the generated text is just a synthesis of the compressed text.

Whereas when I type, I am not synthesising text. Rather, I have the skill of typing, I have an interior subjectivity of thoughts, I have memories which aren't text, and so on.

When my fingers move across the keyboard it isn't because they are looking up text.

Our causal properties (experiencing, thinking, seeing, feeling, remembering, moving, speaking, growing, digesting ...) are not each an "index on the total history of prior experience" or an "index on the total history of prior seeing". The world directly causes us to, e.g., see; seeing isn't a lookup table of prior seeings.

( Also, the whole of physics is formulated in terms that cannot be made into a lookup table; and there is no evidence, only insistence, of the converse. )

clooper|1 year ago

How are people lookup tables? In the case of neural networks the representation of the table is obvious, it's just numbers. What would be the equivalent table for the liver?

My argument isn't abstract. Neural networks really are just numerical functions which can be expanded into their equivalent graph representations.

microtonal|1 year ago

Suppose that we used embeddings as the input of the model rather than piece identifiers plus an embedding lookup table. This is possible with every transformer model and some libraries provide an API to do this. Moreover, we convert the parameters and ops to use arbitrary-precision types. Then the network cannot be represented as a lookup table: given that there is an infinite number of inputs, there is also an infinite number of outputs. But the arbitrary-precision network does not operate fundamentally differently from the original network. It has the same parameters, ops, etc., yet you cannot store it as a (finite) lookup table.

clooper|1 year ago

Even if you increase the precision, I can still generate a table T(P) for each fixed precision P. So the table is parametrized by P, but it's still a table. The entire table T = colim T(P) is the colimit over all precision values, but for every finite precision it is still a table.
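A sketch of the T(P) construction (the function f is a stand-in for a network, and the fixed-point encoding of inputs in [0, 1) is an assumption for illustration):

```python
def f(x: float) -> float:
    # Stand-in for the network's numerical function.
    return x * x + 0.25

def table_at_precision(P: int):
    # T(P): tabulate f over all 2^P inputs with P fractional bits.
    return {i / 2**P: f(i / 2**P) for i in range(2**P)}

T3 = table_at_precision(3)   # 8 entries
T4 = table_at_precision(4)   # 16 entries
assert len(T3) == 8 and len(T4) == 16

# Every T(3) input reappears in T(4) with the same output: the tables
# are compatible refinements, which is what makes the limit over P
# well-defined.
assert all(T4[x] == y for x, y in T3.items())
```

Each T(P) is finite; only the limit object over all P is infinite, which matches the parent's point.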

cesaref|1 year ago

Just like to point out that RNNs have internal state which isn't captured in this view, so yes, lots of NNs can be considered this way, but not all. It's the DSP equivalent of FIRs vs IIRs.
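The DSP analogy can be made concrete with toy filters (the coefficients are invented for illustration): an FIR's response to an input dies out once its finite window of taps is exhausted, while an IIR's feedback state persists indefinitely, like an RNN's hidden state.

```python
def fir(xs, taps=(0.5, 0.5)):
    # Output depends only on a finite window of past inputs.
    out = []
    for i in range(len(xs)):
        window = [xs[i - j] if i - j >= 0 else 0.0 for j in range(len(taps))]
        out.append(sum(t * w for t, w in zip(taps, window)))
    return out

def iir(xs, a=0.5):
    # Feedback: the internal state carries forward at every step.
    out, state = [], 0.0
    for x in xs:
        state = a * state + x
        out.append(state)
    return out

print(fir([1.0, 0.0, 0.0, 0.0]))  # [0.5, 0.5, 0.0, 0.0]: dies after 2 taps
print(iir([1.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]: never fully dies
```

The FIR over a bounded input alphabet is tabulatable per window; the IIR's output at step n depends on the entire history, which is the RNN situation.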

FrustratedMonky|1 year ago

This whole thread on lookup tables seems to be confused.

Isn't this purely math? The equivalence of a function to a lookup table is well studied, and an NN, being composed of functions, can be boiled down to a table as posted.

How do we get from this mathematical concept of function = table to arguments about consciousness, free will, and the state space of the universe?

The table-NN equivalence doesn't seem to help people's understanding of NNs.

calf|1 year ago

People are just outright abusing the terminology. OP's argument would also conclude that a sorting algorithm is not a "real" algorithm because it too can be done by an infinite lookup table.

That said, the general debate is a valid one. Are LLMs just doing fancy statistical compression of data, or are they doing "reasoning" in some important sense, be that merely mechanistic logical reasoning, or "human-level intelligent reasoning"?

For that matter, did the paper authors ever define "Reasoners" in their title, or leave it to the reader?

Terretta|1 year ago

> thinking how every neural network is equivalent to a lookup table where the input is all numbers up to what can be expressed within the context window and the output is the result of the arithmetic operations applied to that number... no neural network is actually doing any thinking other than uncompressing the table and looking up the value corresponding to the input number

You're proposing the lookup table as one possible mechanism in Searle's Chinese room, then proposing Searle's conclusion?

“Searle argues that, without ‘understanding’ (or ‘intentionality’), we cannot describe what the machine is doing as ‘thinking’ and, since it does not think, it does not have a ‘mind’ in anything like the normal sense of the word. Therefore, he concludes that the ‘strong AI’ hypothesis is false.”

https://en.wikipedia.org/wiki/Chinese_room

I think you've described a Chinese room run as many times as it takes, over all possible sequences of Chinese characters, to cache the results; then, running from that cache, asking whether it's still (or yet) ‘thinking’.

PS. Where did the arithmetic operations come from? How did they come to be as they are? Is iterating to an algo that does that, ‘learning’? What's the difference between this and lossy or non-lossy compression of information? Could it be said the arithmetic operations are a compression of the lookup table into that which has the ‘right’ response given the inputs? If two different sets of arithmetic operations give by and large the same outputs from inputs, is one of them more ‘reasoning’ than the other depending how it's derived? What do we mean by ‘learning’ and ‘reasoning’ when applying those words to humans? Are teachers telling students to ‘show your work’ searching for explainable intelligence? :-)

Legend2440|1 year ago

Here's a counterexample. Suppose I create a simple neural network that computes f(x) = x^2 + c (where x and c are complex numbers) and then run it as an RNN. This RNN will compute the Mandelbrot set, which can't be represented by a lookup table.

You can't even know if the RNN will halt for a given input. Neural networks are stronger than lookup tables; they are programs.
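The recurrence can be sketched directly (the escape bound and iteration cap are the standard Mandelbrot conventions, not from the comment):

```python
def escapes(c: complex, max_iter: int = 1000) -> bool:
    # Iterate z <- z^2 + c from z = 0 and check for escape.
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:  # once |z| > 2 the orbit provably diverges
            return True
    return False        # undecided within the iteration budget

assert escapes(1 + 0j)  # 1 lies outside the Mandelbrot set
assert not escapes(0j)  # 0 lies inside: the orbit never escapes
```

Note the `max_iter` cap: without it, the loop for an interior point never terminates, which is exactly the halting issue raised above.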

still_grokking|1 year ago

Every computable function can be represented by a (possibly infinite)¹ lookup table.

Computer programs can only compute computable functions. Therefore any computer program is (in theory) equivalent to a table lookup.

¹ For finite inputs, the lookup table can be finite, and for infinite inputs, the lookup table can be infinite but still countable, as the set of computable functions is countable.
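A small sketch of that countability point: the infinite table for a computable f over the naturals can be enumerated row by row (f here is just an example function):

```python
from itertools import islice

def lookup_table(f):
    # Lazily enumerate the (infinite) lookup table of f, one row at a time.
    i = 0
    while True:
        yield (i, f(i))
        i += 1

# Take the first five rows of the table for f(n) = n^2.
rows = list(islice(lookup_table(lambda n: n * n), 5))
assert rows == [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
```

The table is never materialized in full, but every row is eventually produced, which is all countability requires.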

cjfd|1 year ago

I am sorry to be this blunt, but this is really utter and complete nonsense. The claim that the Mandelbrot set can't be represented in a lookup table is true as such, but that is because nothing you do with finite-precision numbers can represent the Mandelbrot set, since it is essentially an infinite object. The function f(x) = x^2 + c as an RNN also cannot compute the Mandelbrot set if the numbers it uses are of finite precision. That is exactly the same limitation the lookup table faces, so there is no fundamental difference between the two.