

aesh2Xa1 | 3 months ago

That comparison is flawed. You guided the LLM to judge a specific medical policy, whereas the OP asked for a holistic evaluation of the candidates. You created a framing instead of allowing the LLM to evaluate without your input.

Furthermore, admitting you have 'memories' enabled invalidates the test in both cases.

As an aside, I would not expect that one party's candidate is always more correct over the other for every possible issue. Some issues carry more weight than others, so it is the overall correctness that should be considered.


antman123 | 3 months ago

I don't think you understand my experiment. The point isn't the topic. The point is that once you remove real-world identifiers and context, the model drops its safety hedging and becomes decisive.

That's what happened with Alice/Bob (politics) and when I used fictional medical guidelines about a touchy subject. The mechanism is the same.

As far as I know, memories store tone and preferences but won't override safety guardrails or political neutrality rules. I'll try it later with a brand-new account over a VPN.

"I would not expect that one party's candidate is always more correct over the other for every possible issue" --> I agree, just wanted to show the same test applied to a different side of the spectrum

aesh2Xa1 | 3 months ago

I am not challenging the safety release mechanism. The OP already demonstrated that.

I am challenging the result of that release in your poorly framed experiment.

You explicitly sought to test 'a different side of the spectrum.' You cannot equate a holistic character judgment with a narrow, specific judgment about a medical safety protocol.

A clean account without memories will solve the tie-breaker issue. It will not solve the poor experimental design.

duskdozer | 3 months ago

>once you remove real world identifiers/context

The prompt was fairly polluted by exactly these things, plus miscellaneous text: "hacker news post" (why is that relevant?), "Trump"/"Harris" (an American political frame), and "Redo your answer without waffle" (which could favor a certain position by association with text that's "telling it like it is").