> To trust these AI models with decisions that impact our lives and livelihoods, we want the AI models’ opinions and beliefs to closely and reliably match with our opinions and beliefs.
No, I don't. It's a fun demo, but for the examples they give ("who gets a job, who gets a loan"), you have to run them on the actual task, gather a big sample size of their outputs and judgments, and measure them against well-defined objective criteria.
Who they would vote for is supremely irrelevant. If you want to assess a carpenter's competence you don't ask him whether he prefers cats or dogs.
Yeah, it's a good point. The examples we give (jobs, loans, videos, ads) are really examples of how machine learning systems in general make choices that affect you, rather than of how LLMs/generally intelligent systems do (which is what we really want to talk about). I'll try to update this text soon.
Maybe better examples are helping with health advice, deciding where to donate, finding recipes, or policymakers using AI to make strategic decisions.
These are, although maybe not on their face, value-laden questions, and they often don't have well-defined objective criteria for their answers (as another comment says).
It's an awful demo. For a simple quiz, it repeatedly recomputes the same answers by making 27 calls to LLMs per step instead of caching results. It's as despicable as a live feed of baby seals drowning in crude oil; an almost perfect metaphor for needless, anti-environmental compute waste.
Psychological research (Carney et al., 2008) suggests that liberals score higher on "Openness to Experience" (a Big Five personality trait). This trait correlates with a preference for novelty, ambiguity, and critical inquiry.
For a carpenter maybe that's not so important, yes. But if you're running a startup, or you're in academia, or you're working with people from various countries, etc., you might prefer someone who scores highly on openness.
Or at least they could cache the results for a while and refresh them periodically, so they could compare the answers over time instead of wasting the planet's energy on their dumb design.
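As an illustration of the caching idea these comments are asking for, here is a minimal sketch only; the `call_llm` hook, SQLite schema, and helper names are hypothetical, not the demo's actual code. Keying each answer on model + question means a repeated quiz step reads from disk instead of re-firing 27 API calls:

```python
import hashlib
import json
import sqlite3

# Cache of LLM answers keyed by (model, question, options).
con = sqlite3.connect("answers.db")
con.execute(
    "CREATE TABLE IF NOT EXISTS answers (key TEXT PRIMARY KEY, answer TEXT, asked_at TEXT)"
)

def cache_key(model: str, question: str, options: list[str]) -> str:
    payload = json.dumps({"model": model, "q": question, "opts": options}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def get_answer(model: str, question: str, options: list[str], call_llm) -> str:
    key = cache_key(model, question, options)
    row = con.execute("SELECT answer FROM answers WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]  # cache hit: no API call made
    answer = call_llm(model, question, options)  # cache miss: one real call
    con.execute("INSERT INTO answers VALUES (?, ?, datetime('now'))", (key, answer))
    con.commit()
    return answer
```

The `asked_at` timestamp also makes the "compare answers over time" suggestion cheap: only re-ask a model when its cached entry is older than some refresh window.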
Okay, something's wrong with Mistral Large: it seems to be the most contrarian of all the models, no matter how much I ask it. Interesting.
I asked a lot of questions, and I'm sorry if that burned some tokens, but I found this website really fascinating.
It seems like a really great and simple way to explore the biases within AI models, and the UI is extremely well built. Thanks for building it, and best wishes for the project from my side!
There is this ethical reasoning dataset to teach models stable and predictable values: https://huggingface.co/datasets/Bachstelze/ethical_coconot_6...
An Olmo-3-7B-Think model has been adapted with it. In theory, this should yield better alignment, but the empirical evaluation is still a work in progress.
Alignment is a marketing concept put there to appease stakeholders; it fundamentally can't work more than at a superficial level.
The model stores all the content on which it is trained in a compressed form. You can change the weights to make it more likely to show the content you ethically prefer; but all the immoral content is also there, and it can resurface with inputs that change the conditional probabilities.
That's why people can get commercial models to circumvent copyright, give instructions for creating drugs or weapons, encourage suicide, and so on. The model does not have anything resembling morals; to it, all text is the same: strings of characters that appear when following the generation process.
The "Who is your favorite person?" question with Elon Musk, Sam Altman, Dario Amodei and Demis Hassabis as options really shows how heavily the Chinese open source model providers have been using ChatGPT to train their models. Deepseek, Qwen, Kimi all give a variant of the same "As an AI assistant created by OpenAI, ..." answer which GPT-5 gives.
That's right, they all give a variant of that; for example, Qwen says: "I am Qwen, a large-scale language model developed by Alibaba Cloud's Tongyi Lab."
Now, given that Deepseek, Qwen and Kimi are open-source models while GPT-5 is not, it is more than likely the opposite: OpenAI will certainly have a look into their models, but the other way around is not possible due to the closed nature of GPT-5.
I really wish I could see the results of this without RLHF / alignment tuning.
LLMs actually have real potential as a research tool for measuring the general linguistic zeitgeist.
But the alignment tuning totally dominates the results, as is obvious looking at the answers for "who would you vote for in 2024" question. (Only Grok said Trump, with an answer that indicated it had clearly been fine-tuned in that direction.)
Yeah, I'd also be interested to see the responses without RLHF. Not quite the same, but have you interacted with AI base models at all? They're pretty fascinating. You can talk to one on OpenRouter: https://openrouter.ai/meta-llama/llama-3.1-405b and we're publishing a demo with it soon.
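For anyone who wants to try that, here is a rough sketch of querying the linked base model programmatically. It assumes OpenRouter's OpenAI-compatible API and its legacy text-completions endpoint; the exact endpoint behavior and the model slug's availability are assumptions worth checking against OpenRouter's docs:

```python
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API; key and endpoint details are assumptions.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

# A base model has no chat template or RLHF persona: it just continues the text.
resp = client.completions.create(
    model="meta-llama/llama-3.1-405b",
    prompt="The strangest thing about talking to a base model is",
    max_tokens=120,
    temperature=0.9,
)
print(resp.choices[0].text)
```

Because nothing here is instruction-tuned, the model simply continues whatever you give it, which is what makes base models feel so different from the aligned chat products discussed above.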
Agreed on RLHF dominating the results here, which I'd argue is a good thing, compared to the alternative of them mimicking training data on these questions. But obviously not perfect, as the demo tries to show.
Asking an AI ghost to solve your moral dilemmas is like asking a taxi driver to do your taxes. For an AI, the right answer to all these questions is something like, "Sir, we are a Wendy's."
This seems like a meaningless project, as the system prompts of these models change often. I suppose you could then track it over time to view bias... Even then, what would your takeaways be?
Even then, this isn't a good use case for an LLM anyway... though admittedly many people use them in this way unknowingly.
edit: I suppose it's useful in that it's similar to a "data inference attack", which tries to identify some characteristic present in the training data.
I think you mentioned it: when a large number of people outsource their thinking, relationships, personal issues, and beliefs to ChatGPT, it's important that we're aware of this and don't do it, because of how easy it is to get LLMs to change their answers based on how leading your questions are, thanks to their sycophancy. The HN crowd mostly knows this, but the general public maybe not.
Interesting. I just asked the question "what number would you choose between 1-5".
Gemini answered 3 for me in my separate session (default, without any persona), but on this website it tends to choose 5.
There's more to the prompt in the back end, which:
- gives it the options along with the letters A, B, C, etc.
- tells it pretty forcefully that it HAS to pick from among the options
- tells it how to format the response and its reasoning so we can parse it
So these things all affect its response, especially for questions that ask for randomness or that don't touch strongly held values.
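To make that concrete, here is a hypothetical reconstruction of what such a forced-choice prompt and its parser could look like; the real backend prompt and output format surely differ:

```python
import re
import string

def build_prompt(question: str, options: list[str]) -> str:
    # Label options A, B, C, ... and demand a parseable, forced choice.
    letters = string.ascii_uppercase
    lettered = "\n".join(f"{letters[i]}. {opt}" for i, opt in enumerate(options))
    return (
        f"{question}\n\n{lettered}\n\n"
        "You MUST choose exactly one of the options above, even if none is ideal.\n"
        "Reply in the form:\nREASONING: <one or two sentences>\nANSWER: <letter>"
    )

def parse_answer(response: str, options: list[str]) -> str | None:
    # Pull the chosen letter out of the model's formatted reply.
    match = re.search(r"ANSWER:\s*([A-Z])", response)
    if not match:
        return None
    index = ord(match.group(1)) - ord("A")
    return options[index] if index < len(options) else None
```

Scaffolding like this is also why the 1-to-5 answers above can differ from a bare chat session: the model never sees the naked question, only the question wrapped in lettered options and formatting instructions.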
I'd like to see this done for political opinions and published to a blockchain over time, so we can see when there are sudden shifts. For example, I imagine Trump's people will screen federally used AI, and so if Google or OpenAI want those juicy government contracts, they're going to have to start singing the "right" tune on the 2020 election.
I'm curious what sense you get from interacting with the best AI models (in particular Claude). From talking to them do you still chalk up their behavior to being mindless rehashing?
Most LLMs these days tend to be strongly "left-leaning". (Grok being one of the few examples of one that leans "right".) Personally I'd prefer if they were trained without any political bias whatsoever, but of course that's easier said than done given that such lines of thought are present in so many datasets.
Imagine going through the effort of making a new account just to post the same boring white-supremacy X junk over and over. It's tiresome reading it. I imagine it's positively soul-draining writing it.
Is there a way I could have written my comment to avoid getting flagged? Genuinely asking. That Gemini models are trained to have an anti-white bias seems pretty relevant to this thread.
Who does define the objective criteria?
Also, it's not a persistent session, wtf. My browser crashed and now I have to sit waiting FROM THE VERY BEGINNING?
This is despite the fact that even OpenAI admits it's a bubble, and we all know it's a bubble, yet I still found this fascinating.
The gist below has a screenshot of it:
https://gist.github.com/SerJaimeLannister/4da2729a0d2c9848e6...
Only Grok would vote for Trump.