SemioticStandrd | 2 years ago

I see the logic here, but I’m highly skeptical about how valid such a tool would be.

If a researcher comes out and says, “Surveys show that people want X, and they do not like Y,” and then others ask the researcher if they surveyed people, the answer would be “no.”

Fundamentally, people who want feedback from humans will not get it by using your product.

The best you can say is this: “Our product is guessing people will say X.”

famouswaffles|2 years ago

Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? (https://arxiv.org/abs/2301.07543)

Out of One, Many: Using Language Models to Simulate Human Samples (https://arxiv.org/abs/2209.06899)

There's been some research in this vein. To answer your question: seemingly very valid.

puppy_nap|2 years ago

These papers suggest that LLMs do something a lot more specific (when asked to simulate a certain political background, they're able to give responses to questions in a way that's consistent with that political background). That's not particularly surprising to me, as I would expect a human also to be able to simulate this kind of thing pretty accurately. I don't think it implies that LLMs would be good at answering typical business survey questions.

timshell|2 years ago

We're trying to figure out the optimal use case for this, i.e. whether it's internal or client-facing (your example).

Internal purposes include stuff like optimally rewording questions and getting priors.

A hybrid approach would be something like: hey, let's not ask someone 100 questions, because we can accurately predict the answers to 80 of them. Let's just ask them the 20 hard-to-estimate questions.
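A minimal sketch of that hybrid routing, in Python, assuming a hypothetical predict_answer function (the name, interface, and 0.8 threshold are all invented here, not the product's actual API):

    # Route questions: keep confident model predictions, send the rest to humans.
    # predict_answer is a stand-in for whatever LLM call the product would make;
    # it is assumed to return (answer, confidence) for a single question.
    from typing import Callable

    def split_questions(
        questions: list[str],
        predict_answer: Callable[[str], tuple[str, float]],
        confidence_threshold: float = 0.8,
    ) -> tuple[dict[str, str], list[str]]:
        predicted: dict[str, str] = {}   # answers we trust (the ~80%)
        ask_human: list[str] = []        # the hard-to-estimate ~20%
        for q in questions:
            answer, confidence = predict_answer(q)
            if confidence >= confidence_threshold:
                predicted[q] = answer
            else:
                ask_human.append(q)
        return predicted, ask_human

Whether this actually saves money depends on how well the model's confidence is calibrated; an overconfident model silently replaces real respondents with guesses.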

tcgv|2 years ago

I think it's less about "prediction" and more about mapped cohort behaviors and opinions, especially those that change slowly over time. The LLM will likely be a picture of how the population and each demographic group behaved and what they believed during a specific time window (i.e. when the data set was collected), and will produce answers that reflect that. It will most likely lag behind new trends and how they shape population behaviors and beliefs over time. In any case, I think even the most experienced market research professionals would agree that discovering new trends before they become mainstream is really challenging.

quadrature|2 years ago

> optimally rewording questions

This kind of concerns me, because you could use it to bias surveys in different directions. This obviously already happens, so maybe it's just part of the status quo.

tchock23|2 years ago

I’ve worked in this industry for a while, and in the ‘faster, cheaper, better - pick two’ trade-off, many will select faster and cheaper. That’s only speaking for corporate market research, though; I can’t say the same for academic researchers.

I suspect people would use this product as a quick gut check to decide if it is warranted to spend the time and money on a full scale quant study.

DriverDaily|2 years ago

You want a 90/10: 90% of the benefit, 10% of the effort.

This is like a 10/10.

Shrezzing|2 years ago

The tool would be useful as a QA step to test for leading questions in survey design. See Yes Minister's[1] explanation for how they can work. A simulation to check whether the questions get the same responses irrespective of the order they were asked in could improve survey quality (a sketch of this is after the link below). Obviously, the tool could be used in the opposite way too, to help design surveys that say exactly what the company/govt/charity wants them to.

[1] https://www.youtube.com/watch?v=G0ZZJXw4MTA
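A minimal sketch of that order check, assuming a hypothetical simulate_respondent function (an invented stand-in for an LLM call that answers a list of questions in the order given):

    # Shuffle the questionnaire several times, run the simulated respondent on
    # each ordering, and flag questions whose answers change with position -
    # a hint that earlier questions are leading the respondent.
    import random
    from typing import Callable

    def order_sensitive_questions(
        questions: list[str],
        simulate_respondent: Callable[[list[str]], list[str]],
        trials: int = 10,
        seed: int = 0,
    ) -> dict[str, int]:
        rng = random.Random(seed)
        seen: dict[str, set[str]] = {q: set() for q in questions}
        for _ in range(trials):
            order = list(questions)
            rng.shuffle(order)
            for q, a in zip(order, simulate_respondent(order)):
                seen[q].add(a)
        # Map each unstable question to how many distinct answers it produced.
        return {q: len(a) for q, a in seen.items() if len(a) > 1}

This only detects order effects the simulated respondent happens to reproduce, so a clean result here is evidence, not proof, that the survey isn't leading.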

helsinkiandrew|2 years ago

> I see the logic here, but I’m highly skeptical about how valid such a tool would be.

I see the problem as: although you can create lots of examples where the output is correct/follows real-world opinion, you can never prove that the answer to a particular question is correct/follows real-world opinion. I'm not sure who would trust the output enough to rely on it for decision making.

digitcatphd|2 years ago

I have been using AI-generated surveys via the playground and have found them quite effective at simulating responses. In fact, they are incredibly similar to my experience asking the same questions IRL. The challenge is that people don't trust them, and AI still has this negative association. So yes, I mean to say it's yet another human error.