top | item 44041722


thatnerd | 9 months ago

I think that's an invalid hypothesis here, not just an unlikely one, because that's not my understanding of how LLMs work.

I believe you're suggesting (correctly) that a prediction algorithm trained on a data set where women outperform men with equal resumes would have a bias that would at least be valid when applied to its training data, and possibly (if it's representative data) for other data sets. That's correct for inference models, but not LLMs.

An LLM is a "choose the next word" algorithm trained on (basically) the sum of everything humans have written (including Q&A text), with weights chosen to make it sound credible and personable to some group of decision makers. It's not trained to predict anything except the next word.
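To make the "choose the next word" loop concrete, here is a minimal toy sketch of what inference looks like: score every vocabulary token given the context, turn scores into probabilities, sample, repeat. The vocabulary and the scoring function are invented stand-ins, not a real model.

```python
import math
import random

# Hypothetical toy vocabulary -- a real model has tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def fake_logits(context):
    """Stand-in for a real network: produces pseudo-random per-token scores
    from the context. In an LLM, all the 'knowledge' lives in how these
    scores are computed from the preceding tokens."""
    rng = random.Random(hash(tuple(context)) % (2**32))
    return [rng.uniform(-1, 1) for _ in VOCAB]

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, steps=5, seed=0):
    """The whole inference loop: score, sample one token, append, repeat."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        probs = softmax(fake_logits(tokens))
        next_tok = rng.choices(VOCAB, weights=probs, k=1)[0]
        tokens.append(next_tok)
    return tokens

print(generate(["the", "cat"]))
```

Nothing in the loop itself predicts anything but the next token; the debate upthread is about what the scoring function has to learn internally in order to do that well.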

Here's (I think) a more reasonable version of your hypothesis for how this bias could have come to be:

If the weight-adjusted training data tended to mention male-coded names fewer times than female-coded names, that could cause the model to bring up the female-coded names in its responses more often.
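That frequency hypothesis can be sketched in a few lines: if a sampler's probabilities mirror corpus frequencies, names that appear more often in the training data get emitted proportionally more often. The corpus and names below are invented purely for illustration.

```python
from collections import Counter
import random

# Hypothetical training corpus where one name simply appears more often.
corpus = ["Alice"] * 70 + ["Bob"] * 30

# A frequency-matching "model": token probability = corpus frequency.
counts = Counter(corpus)
total = sum(counts.values())
probs = {name: c / total for name, c in counts.items()}

# Sampling from that distribution reproduces the imbalance (~70/30),
# even though nothing in the sampler "prefers" either name.
rng = random.Random(42)
sample = rng.choices(list(probs), weights=probs.values(), k=10_000)
drawn = Counter(sample)
print(drawn["Alice"] / len(sample), drawn["Bob"] / len(sample))
```

A real LLM's output distribution is far more context-dependent than this unigram toy, but the basic point stands: imbalanced mention counts in the training data can surface as imbalanced mention counts in generations.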

aetherson | 9 months ago

People need to divorce the training method from the result.

Imagine that you were given a very large corpus of reddit posts about some ridiculously complicated fantasy world, filled with very large numbers of proper names and complex magic systems and species and so forth. Your job is, given the first half of a reddit post, predict the second half. You are incentivized in such a way as to take this seriously, and you work on it eight hours a day for months or years.

You will eventually learn about this fantasy world and graduate from just sort of making blind guesses based on grammar and words you've seen before to saying, "Okay, I've seen enough to know that such-and-such proper name is a country, such-and-such is a person, that this person is not just 'mentioned alongside this country,' but that this person is an official of the country." Your knowledge may still be incomplete or have embarrassing wrong facts, but because your underlying brain architecture is capable of learning a world model, you will learn that world model, even if somewhat inefficiently.

vessenes | 9 months ago

To chime in on one point here: I think you're wrong about what an LLM is. You're technically correct about how an LLM is designed and built, but I don't think your conclusions are correct or supported by most research and researchers.

In terms of the Jedi IQ Bell curve meme:

Left: "LLMs think like people a lot of the time"

Middle: "LLMs are tensor operations that predict the next token, and therefore do not think like people."

Right: "LLMs think like people a lot of the time"

There's a good body of research indicating emergent abilities, theory of mind, and other evidence that these models do deep levels of summarization and pattern matching during training as they scale up.

Notice that your own example assumes models summarize "male-coded" vs. "female-coded" names; I'm sure they do. Interpretability research seems to indicate they also summarize extremely exotic and interesting concepts like "occasional bad actor when triggered," for instance. Upshot - I propose they're close enough here to anthropomorphize usefully in some instances.