top | item 43843515

edding4500 | 10 months ago

This is silly. Behind an LLM sits a deterministic algorithm. So no, it is not possible without inserting randomness by other means into the algo, for example by setting a temperature for sampling.

Why are all these posts and news about LLMs so uninformed? This is human-built technology. You can actually read up on how these things work. And yet they are treated as if they were an alien species that must be examined by sociological means and methods where that is not necessary. Grinds my gears every time :D

whoami_nr|10 months ago

Author here. I know it’s silly. I understand to some extent how they work. I was just doing this for fun. Took about 1hr for everything and it all started when a friend asked me whether we can use them for a coin toss.

edding4500|10 months ago

Sorry, I did not mean to talk down the blog post :) I did not mean silly as in stupid. It's rather the title that I think is misleading. Can an LLM do randomness? Well, PRNGs are part of it, so the question boils down to whether PRNGs can do randomness. As mentioned here before, setting the temperature of, say, GPT-2 to zero makes the output deterministic. I was 99% sure that you as the author knew about this :)

alew1|10 months ago

The algorithms are not deterministic: they output a probability distribution over next tokens, which is then sampled. That’s why clicking “retry” gives you a different answer. An LM could easily (in principle) compute a 50/50 distribution when asked to flip a coin.
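To make the "distribution, then sample" step concrete, here is a minimal sketch in plain Python. The two-token vocabulary and the logit values are made up for illustration; a real model produces logits over its whole vocabulary, and nothing here reflects any particular model's numbers.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities. Lower temperature sharpens the
    distribution; temperature must be > 0 in this sketch."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(tokens, logits, rng, temperature=1.0):
    """Draw one token according to the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["heads", "tails"]
fair_logits = [0.0, 0.0]    # hypothetical: the model is indifferent
skewed_logits = [2.0, 0.0]  # hypothetical: the model prefers "heads"

rng = random.Random()       # unseeded, so repeated runs can differ
print(softmax(fair_logits))
print(softmax(skewed_logits))
print(sample_token(tokens, skewed_logits, rng))
```

With equal logits the coin is fair; with the skewed logits "heads" gets roughly 88% of the probability mass, so the sampler is random but the coin is not fair.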

im3w1l|10 months ago

Yes so it's basically asking whether that probability distribution is 50/50 or not. And it turns out that it's sometimes very skewed. Which is a non-obvious result.

kurikuri|10 months ago

So, what ‘algorithms’ are you talking about? The randomness comes from the input value (the random seed). Once you give it a random seed, a special number generator (PRNG) makes a sequence from that seed. When the LLM needs to ‘flip a coin,’ it just consumes a value from the PRNG’s output sequence.

Think of each new ‘interaction’ with the LLM as having two things that can change: the context and the PRNG state. We can also think of the PRNG state as having two things: the random seed (which makes the output sequence), and the index of the last consumed random value from the PRNG. If the context, random seed, and index are the same, then the LLM will always give the same answer. Just to be clear, the only ‘randomness’ in these state values comes from the random seed itself.

The LLM doesn’t make any randomness, it needs randomness as an input (hyper)parameter.
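A toy sketch of the context / seed / index picture described above. The "model" here is a stand-in (`toy_p_heads` is a hypothetical function that maps a context to a fixed probability of heads), so this only illustrates the bookkeeping, not a real LLM.

```python
import random

def toy_p_heads(context):
    """Hypothetical stand-in for a model: probability of 'heads' given the context."""
    return 0.5 if "fair" in context else 0.7

def generate(context, seed, skip=0, length=5):
    """Sample `length` coin flips. `seed` fixes the PRNG sequence and `skip`
    is the index of the first value consumed from that sequence."""
    rng = random.Random(seed)
    for _ in range(skip):
        rng.random()  # advance the PRNG without using the value
    p_heads = toy_p_heads(context)
    return ["heads" if rng.random() < p_heads else "tails" for _ in range(length)]

first = generate("flip a coin", seed=42)
second = generate("flip a coin", seed=42)
print(first == second)  # same context, seed, and index: identical output

# Resuming at index 2 reproduces the tail of the original sequence.
print(generate("flip a coin", seed=42, skip=2, length=3) == first[2:])
```

Changing any of the three inputs (context, seed, or index) can change the output; holding all three fixed cannot.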

kbelder|10 months ago

But the randomness doesn't directly translate to a random outcome in results. It may randomly choose from a thousand possible choices, where 90% of the choices are some variant of 'the coin comes up heads'.

I think a more useful approach is to give the LLM access to an api that returns a random number, and let it ask for one during response formulation, when needed.
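A rough sketch of what such a tool could look like on the application side. The tool name `random_int` and the shape of the call dictionary are assumptions made up for this example; they are not any vendor's actual tool-calling format.

```python
import secrets

def random_int_tool(low, high):
    """Return a uniformly distributed integer in [low, high] from the OS CSPRNG."""
    if low > high:
        raise ValueError("low must not exceed high")
    return low + secrets.randbelow(high - low + 1)

# Hypothetical registry of tools the model is allowed to request.
TOOLS = {"random_int": random_int_tool}

def handle_tool_call(call):
    """Dispatch a tool request. `call` is an assumed dict such as
    {"name": "random_int", "arguments": {"low": 0, "high": 1}}."""
    func = TOOLS[call["name"]]
    return func(**call["arguments"])

# The model would emit a request like this instead of guessing "heads" itself.
result = handle_tool_call({"name": "random_int", "arguments": {"low": 0, "high": 1}})
print("heads" if result == 0 else "tails")
```

The point of the design is that the fairness of the coin then depends on the random source behind the tool, not on the model's next-token distribution.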

throwawaymaths|10 months ago

i think gp would consider the sampling bit a part of the API, not a part of the algorithm.

kerkeslager|10 months ago

The algorithms are definitely not deterministic. That said I agree with your general point that experimenting on LLMs as if they're black boxes with unknown internals is silly.

EDIT: I'm seeing another poster saying "Deterministic with a random seed?" That's a good point--all the non-determinism comes from the seed, which isn't particularly critical to the algorithm. One could easily make an LLM deterministic by simply always using the same seed.

dist-epoch|10 months ago

> all the non-determinism comes from the seed

not fully true, when using floating point the order of operations matters, and it can vary slightly due to parallelism. I've seen LLMs return different outputs with the same seed.
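The floating-point part is easy to see in isolation. This sketch only shows that addition order changes the result; the `greedy` pick afterwards is a contrived illustration of how a last-bit difference could flip a near-tie, not a claim about any specific inference stack.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one summation order
right = a + (b + c)  # another order, as a parallel reduction might use

print(left == right)  # False: floating-point addition is not associative
print(left, right)

def greedy(scores):
    """Pick the highest-scoring token; ties go to the first entry."""
    return max(scores, key=scores.get)

# Contrived near-tie: the winner depends on how the "heads" score was summed.
print(greedy({"tails": 0.6, "heads": left}))
print(greedy({"tails": 0.6, "heads": right}))
```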

_joel|10 months ago

Deterministic with a random seed?

edding4500|10 months ago

But then the random seed is the source of randomness and not the training data. So the question "Can LLMs do randomness" would actually boil down to "Can PRNGs do randomness".

chaoz_|10 months ago

"You can actually read up on how these things work."

While you can definitely read about how some parts of a very complex neural network function, it's very challenging to understand the underlying patterns.

That's why even the people who invented components of these networks still invest in areas like mechanistic interpretability, trying to develop a model of how these systems actually operate. See https://www.transformer-circuits.pub/2022/mech-interp-essay (Chris Olah)

kaibee|10 months ago

Yes, but sometimes asking dumb questions is the first step to asking smart questions. And OP's investigation does raise some questions to me at least.

1. Give a model a context with some # of actually random numbers and then ask it to generate the next random number. How random is that number? Repeat N times, graph the results, is there anything interesting about the results?

2. I remember reading about how brains/etc are kinda edge-balanced chaotic systems. So if a model is bad at outputting random numbers (i.e. it needs a very high temperature for the experiment from step 1 to produce a good distribution of random numbers), what, if anything, does that tell us about the model?

3. Can we add a training step/fine-tuning step that makes the model better at the experiment from step #2? What effect does that have on its benchmarks?
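For the "how random is that number" part of experiment 1, one simple option is a chi-square goodness-of-fit statistic against a uniform distribution. This is a sketch under assumptions: the samples below are hand-made stand-ins for digits collected from a model, and the chi-square test is just one possible check, not something from the original post.

```python
from collections import Counter

def chi_square_uniform(samples, categories):
    """Chi-square statistic of `samples` against a uniform distribution
    over `categories`. Larger values mean a worse fit to uniform."""
    counts = Counter(samples)
    expected = len(samples) / len(categories)
    return sum((counts.get(c, 0) - expected) ** 2 / expected for c in categories)

digits = list(range(10))

# Stand-in data; in the real experiment these would be model outputs.
uniform_like = digits * 10          # every digit exactly 10 times
biased = [7] * 50 + digits * 5      # a model that loves the digit 7

# Critical value for 9 degrees of freedom at the 5% significance level.
CRITICAL_95_DF9 = 16.919

for name, data in [("uniform_like", uniform_like), ("biased", biased)]:
    stat = chi_square_uniform(data, digits)
    verdict = "looks non-uniform" if stat > CRITICAL_95_DF9 else "no evidence against uniform"
    print(name, stat, verdict)
```

Note that this only checks digit frequencies. A sequence like 0, 1, 2, ..., 9 repeated passes it perfectly, so a fuller analysis would also look at dependence between successive outputs.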

I'm not an ML researcher, so maybe this is still nonsense.