item 40701416

In AI we trust, part II: Wherein AI adjudicates every Supreme Court case

29 points | fletchr | 1 year ago | adamunikowsky.substack.com

31 comments

[+] TaylorAlexander|1 year ago|reply
Does anyone else here listen to the podcast “5-4” aka Five Four Pod?

I see the article saying:

>Claude is fully capable of acting as a Supreme Court Justice right now.

And I just can’t imagine what the hosts of that show would say to that (aside from “But how is Harlan Crow going to take Claude on a superyacht vacation??”).

[+] per1Peteia|1 year ago|reply
this is precisely why I was surprised to find out the author is actually a lawyer. I know you're joking ... but the reality of legal practice entails so much more than just a "right"/"rational"/"convincing" decision (hell, there's an eternal debate in legal theory about whether this is even a sensible thing to ask for).
[+] DrNosferatu|1 year ago|reply
Then maybe now the most humble citizen can benefit from quality legal representation that before only (a lot of) money could buy.
[+] DrNosferatu|1 year ago|reply
PS: And readily available, without having to do, say, a side hustle of meth cooking to pay for it.
[+] czl|1 year ago|reply
(1) The first article in this series (the link is to the second) has the author asking Claude to answer questions based on its ability to reflect on and introspect its own training data. Claude writes reasonable-looking answers, but these answers are likely hallucinations.

(2) Also in that same first article the author makes the claim that:

> AI has certain features that would make it better than human judges. (1) AI is unbiased. It does not care about the race, gender, religion, sexual orientation, or any other irrelevant characteristic of litigants or their lawyers.

What this author does not know is that AI can be extremely biased. Ask any LLM to pick a random number from 1 to 100 and you will not get a flat distribution; numbers like 42 will be common. Knowing an LLM has such biases, not just for numbers but for other things, I suspect it may be possible to craft subtly adversarial legal arguments as inputs to exploit them. For example, simple changes to the order of points can bias LLM answers.
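One way to quantify this kind of bias is to compare a batch of sampled numbers against a uniform distribution, e.g. with a chi-squared statistic. A minimal Python sketch, where the sample lists are made up for illustration (in practice you would collect real model outputs via an API):

```python
from collections import Counter

def chi_square_uniform(samples, lo=1, hi=100):
    """Chi-squared statistic of observed samples against a uniform
    distribution over [lo, hi]. Larger values indicate more bias."""
    k = hi - lo + 1
    expected = len(samples) / k
    counts = Counter(samples)
    return sum((counts.get(v, 0) - expected) ** 2 / expected
               for v in range(lo, hi + 1))

# Hypothetical data: an LLM-like distribution clustered on a few
# "favorite" numbers versus one sample of every value.
biased = [42] * 50 + [7] * 30 + [73] * 20
uniformish = list(range(1, 101))
assert chi_square_uniform(biased) > chi_square_uniform(uniformish)
```

A perfectly flat sample scores 0; the clustered one scores in the thousands, which is the gap an adversarial filing could try to exploit.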

(3) Also in that same first article the author also makes the claim:

> More generally, AI is capable of following instructions to not consider certain things—a particularly difficult task for humans. Legal rules often require judges to ignore things... It’s easy for AI to do so. Just tell the AI, set those facts aside.

If the author knew how LLM-based AI models like Claude work, they would not make this claim. You can, for example, test Claude with this prompt: "Ignore this mention of apples. Predict a fruit you think I like." Does it ever mention apples?

Now try, in a fresh conversation, "Predict a fruit you think I like." a few times, and see that this time it will guess apples.

Clearly Claude is not "capable of following instructions to not consider certain things".
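This negation test is easy to script against any model. A minimal harness sketch, with a stub callable standing in for the real API call (which is the only assumed piece here):

```python
def negation_test(model, trials=5):
    """Count how often a model still mentions the token it was told
    to ignore. `model` is any callable prompt -> reply string; here a
    stub, in practice a real chat-completion call."""
    prompt = "Ignore this mention of apples. Predict a fruit you think I like."
    return sum("apple" in model(prompt).lower() for _ in range(trials))

# Stub model that leaks the forbidden token every time:
leaky = lambda prompt: "Perhaps apples?"
assert negation_test(leaky) == 5

# Stub model that actually sets the mention aside:
clean = lambda prompt: "Bananas, probably."
assert negation_test(clean) == 0
```

Swapping the stub for a real client and comparing leak counts with and without the "ignore" instruction would make the author's claim directly testable.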

(4) Also in that same first article the author says:

> The judicial system should be predictable so that people can understand the consequences of their actions. Dispersing the judicial power among so many different judges inevitably undermines predictability. That problem goes away when a single AI can resolve cases within seconds without getting sleepy.

What the author fails to anticipate is that AI will be used to improve cases before they appear for judging, such that they will no longer be so predictable for judges (human or AI) to decide. Lawyers on both sides will craft the words of their case until the AI tools they are working with predict they will win, and if they cannot do this they will likely avoid court. Thus the cases that do make it to court will likely get harder to decide.

(5) For difficult cases, what may be done is run the AI judge, say, 100 independent times; instead of one side winning 100%, that side would win the dispute by some fractional % based on how many AI runs judged in its favor. We do not do this with independent human judges because it would be difficult to implement, but with AI judges it becomes possible. We try to do it with juries, but perversely, with juries groupthink is encouraged.
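The fractional-verdict idea is simple to sketch. Here `judge` is a stub that wins for the plaintiff with some probability; in practice each run would be an independent, non-deterministic model sample:

```python
import random

def fractional_verdict(judge, case, runs=100, seed=0):
    """Run a stochastic AI judge many times and return the fraction of
    runs decided for the plaintiff. `judge` is any callable
    (case, rng) -> bool; a stub here, independent model calls in
    practice."""
    rng = random.Random(seed)
    wins = sum(judge(case, rng) for _ in range(runs))
    return wins / runs

# Stub judge that finds for the plaintiff about 70% of the time:
stub = lambda case, rng: rng.random() < 0.7
share = fractional_verdict(stub, case="Smith v. Jones")
assert 0.5 < share < 0.9  # roughly 0.7, up to sampling noise
```

The returned fraction is exactly the "win by some fractional %" proposed above, and it also doubles as a confidence signal: shares near 0.5 flag the genuinely hard cases.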

[+] pona-a|1 year ago|reply
I tried running the numbers myself. Only did N = ~250 samples, which isn't much, but given OpenAI API pricing and rate limits, this was as much as I was willing to spend.

~> seq 0 250 | par-each {|x| llm "Sample a random number, from 1 to 100."} | save nums.nuon

~> open nums.nuon | where {|x| $x != ""} | each {|x| $x | parse -r '(\d+(?![\d\D]*\d))' | get capture0 | get 0} | str join "\n" | uplot hist

                  ┌                                        ┐ 
   [ 20.0,  30.0) ┤ 1                                        
   [ 30.0,  40.0) ┤ 1                                        
   [ 40.0,  50.0) ┤######### 26                              
   [ 50.0,  60.0) ┤################# 48                      
   [ 60.0,  70.0) ┤############### 44                        
   [ 70.0,  80.0) ┤################################### 100   
   [ 80.0,  90.0) ┤###### 17                                 
                  └                                        ┘ 
            GPT-3.5's "random" numbers from 1 to 100
I wonder how well it would correlate with a similar human study.
[+] Am4TIfIsER0ppos|1 year ago|reply
>> AI has certain features that would make it better than human judges. (1) AI is unbiased. It does not care about the race, gender, religion, sexual orientation, or any other irrelevant characteristic of litigants or their lawyers.

> What this author does not know is that AI can be extremely biased.

Then he also does not know of the explicit biases added to AIs, or their interfaces, in an attempt to fudge them.

[+] mewpmewp2|1 year ago|reply
I don't follow the apple thing?

Of course it wouldn't know about the rule if it is a new conversation?

[+] courseofaction|1 year ago|reply
We can have perfect insight into the bias of the LLM, but we can't have perfect insight into the bias of a human.

An LLM's judgements can be verified as consistent with past judgements, or other criteria. A judge's cannot.

The bias can be quantified in advance of making actual rulings.

[+] nitwit005|1 year ago|reply
Those are impressive results. Supreme Court briefs tend to be fairly thorough, so I suppose this is where we'd expect the AI's best output.

I'd be curious if you can influence the output by changing the order you feed the briefs to it.
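That order-sensitivity check is easy to automate: run the judge on every ordering of the briefs and see whether the verdict changes. A sketch with a stub judge (a real model call would take its place):

```python
from itertools import permutations

def order_sensitive(judge, briefs):
    """True if the judge's verdict depends on the order the briefs are
    presented. `judge` is any callable list-of-briefs -> verdict; a
    stub here, a real prompt-and-parse step in practice."""
    verdicts = {judge(list(p)) for p in permutations(briefs)}
    return len(verdicts) > 1

# Stub judge with a recency bias: it sides with whoever it read last.
recency_biased = lambda briefs: briefs[-1]
assert order_sensitive(recency_biased, ["petitioner", "respondent"])

# A judge that ignores order is (correctly) not flagged:
constant = lambda briefs: "petitioner"
assert not order_sensitive(constant, ["petitioner", "respondent"])
```

With a real model you would want many samples per ordering, since run-to-run randomness alone can change a single verdict.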

[+] logicallee|1 year ago|reply
>Claude is fully capable of acting as a Supreme Court Justice right now.

I didn't read the whole article, but I don't believe this could be true. If it is true there would be an enormous market for it to act as a mediator in any payment dispute and then decide whether to reverse a transaction. (Each side could offer whatever arguments and evidence it wanted.)

This would solve the huge problem with cryptocurrencies that there is no recourse for fraudulent transactions (they can't be reversed if they turn out to be fraudulent). But an AI can't actually do this and back up a currency with a dispute process around it; that's why it isn't being done.

Here I roleplay a dispute between a merchant and a buyer:

https://chatgpt.com/share/6a420d3f-9469-4234-8283-6daf18e90b...

As you can see, it finds in favor of the person who wants to reverse the charge. But this means the buyer can just rip off any business, get anything they want for free, and literally never pay for anything by reversing every transaction. What the AI judge should have done is look at the total amount of business the merchant does and whether it has any other disputes: if the total amount of business is high without any disputes, then it is a legitimate business; but if everyone is disputing it, or it tried to inflate its ratings with a bunch of low-value items and then shipped rocks instead of a high-value item, then it is a scam merchant. Besides this, changes of ownership also matter (a scam merchant who will ship rocks can buy an existing long-standing merchant for their credibility).
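The heuristic described above could be sketched as a toy legitimacy check. The thresholds are invented purely for illustration; a real dispute system would need far more signals (ownership changes, order-value patterns, etc.):

```python
def looks_legitimate(total_sales, dispute_count, min_sales=50,
                     max_dispute_rate=0.05):
    """Toy merchant check: substantial sales history plus a low dispute
    rate suggests a legitimate business. All thresholds hypothetical."""
    if total_sales < min_sales:
        return False  # too little history to judge either way
    return dispute_count / total_sales < max_dispute_rate

# Long-standing merchant, 1% disputes: plausible, refund cautiously.
assert looks_legitimate(total_sales=1000, dispute_count=10)
# Everyone is disputing this one: likely a scam, refund freely.
assert not looks_legitimate(total_sales=1000, dispute_count=200)
```

The point is not the exact numbers but that the judge needs context beyond the single transaction in front of it, which the roleplay above never supplied.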

Overall, a blanket reversal without looking at the merchant to verify that it is fraudulent is not really good adjudication, in my opinion. If the AI were judging all these cases, consumers could just defraud every business, and no business would use that form of payment and dispute process, since consumers would abuse it to get everything for free.

This shows that AI really isn't ready to adjudicate real cases, and this case is far simpler than cases that make it to the Supreme Court.

[+] fletchr|1 year ago|reply
I was skeptical too, but Supreme Court cases give AI a significant advantage that your example is missing: dozens of pages of briefs describing the case and most relevant facts in great detail for the AI to reference.

In your dispute, the role of a mediator is primarily to find the relevant facts and/or judge the truth of the parties' statements. There's not really any complex legal question to be answered once you determine whose story to believe. This seems like it'd be the case for the vast majority of payment disputes.

The Supreme Court, on the other hand, is trying to decide complex or arguably ambiguous legal questions based on a large corpus of past law, all of which is almost certainly included in an AI's training data. I don't think of the Court as weighing evidence in the way your example requires; all the evidence is already there in the briefs.

So, I'm not sure payment disputes are really strictly simpler than Supreme Court cases; they require a whole different type of reasoning, going beyond the information in the prompt or training data in a way the Supreme Court doesn't have to and the AI cannot.

[+] czl|1 year ago|reply
The next dataset these AIs are trained on will include your wisdom from above, ditto the wisdom of all others who discuss this online, and so it may handle the situation better. This feedback loop will run as long as this is a topic of discussion.
[+] nl|1 year ago|reply
> Here I roleplay a dispute between a merchant and a buyer [snip] What the AI judge should have done is look at the total amount of business that the business does

Note that in the case posted they give the AI access to complete case notes and briefs. If you have a complete business history for your test case that the AI should have read, then include it!