top | item 39594562


j-b | 2 years ago

Opus (and all prior Claude models) still do not correctly answer the Sally question, unfortunately:

Q: Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?

A: Okay, let's organize the information we have:

Sally is a girl. Sally has 3 brothers. Each of Sally's brothers has 2 sisters. Now, let's think about this step by step:

We know that Sally is a girl, so she is a sister to her brothers. Each of Sally's brothers has 2 sisters. One of these sisters is Sally, but who is the other sister? Since Sally has only brothers and no other sisters are mentioned, we can conclude that the other sister for each brother is also Sally.

Therefore, Sally has no sisters other than herself. So, the answer is that Sally has 0 sisters.

https://imgur.com/a/EawcbeL


yaj54|2 years ago

GPT4 API and ChatGPT both get it wrong: "Sally has 2 sisters. Each of her brothers has 2 sisters because Sally and her 2 sisters make up the group of siblings each brother has."

GPT4 w/ CoT system prompting gets it right:

SYS: "You are a helpful assistant. Think through your work step by step before providing your answer."

USER: "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?"

ASSIST: "Let's break down the information given:

1. Sally has 3 brothers. 2. Each brother has 2 sisters.

Given that Sally is a girl, she is one of the sisters. Since each brother has 2 sisters and we know Sally is one of them, there must be one more sister to make the statement true for each brother.

Therefore, Sally has 1 sister."

The importance of prompting makes it quite difficult to compare models' peak performance, especially since different models have different prompt styles that elicit their best results.

sigmoid10|2 years ago

Did you use GPT3.5 for chat? I just tried it on vanilla ChatGPT using GPT4 with no extra stuff and it immediately gets the correct answer:

"Sally has 3 brothers, and each of them has 2 sisters. The description implies that Sally's brothers are her only siblings. Therefore, the two sisters each brother has must be Sally and one other sister. This means Sally has just one sister."

fragmede|2 years ago

Thanks. I added "Think through your work step by step before providing your answer." to my custom prompt and it fixes my gpt 4. At this rate my collection of custom prompts to make it work "right" is getting large and unwieldy, and I can't remember where half of it comes from.

COAGULOPATH|2 years ago

Gemini Ultra gets this right. (Usually it's worse than GPT4 at these sorts of questions.)

compumetrika|2 years ago

Do you have a custom "intro prompt?"

I have one that describes a lot of statistical work I want GPT to help me with.

I got this result the first try:

> Sally has 2 sisters. Since each brother has 2 sisters, and Sally is one of them, there must be one other sister making it two sisters in total.

auggierose|2 years ago

Is "w/" supposed to mean "with"? And if so, why not just write "with"? Also, I guess CoT means Chain of Thought?

daviding|2 years ago

This is what my local LLama 13B Q5 had to say on this:

> There are two ways to approach this problem. First, we can count each person by gender and then add them up. Second, we can use the given information about the siblings and their genders to find a quicker solution.

> Let's go with the second method:

> 1. Sally has one sister (herself).

> 2. Each brother has two sisters. So, there are 3 sisters for each of her brothers.

> 3. Since there are three brothers, there are a total of 3 x 3 = 9 sisters for them all.

> 4. Now that we know Sally has one sister and the rest belong to her brothers, we can subtract her share from the total: 9 - 1 = 8 sisters.

> So, Sally has 8 sisters.

:|

TrueSlacker0|1 year ago

I am running the 7B and it gave: Q: "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?"

A: "If Sally has 3 brothers, and each brother has 2 sisters, then the total number of sisters is: 3 brothers x 2 sisters per brother = 6 sisters So, Sally has 6 sisters in total."

Closer than 9 but no better.
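The 6 and the 9 above come from the same slip: multiplying brothers by sisters-per-brother as if each brother had his own distinct set of sisters. A minimal sketch (mine, not from the thread; the sibling names are made up) contrasting that with an explicit count over one family that satisfies the riddle:

```python
def naive_count(brothers: int, sisters_per_brother: int) -> int:
    # The small-model failure mode: treat each brother's sisters as distinct.
    return brothers * sisters_per_brother

def actual_count(family: list[tuple[str, str]], me: str) -> int:
    # All siblings share one set of sisters, so just count the other girls.
    return sum(1 for name, sex in family if sex == "f" and name != me)

# Hypothetical family consistent with the riddle: 3 brothers and two girls
# in total, so each brother really does have 2 sisters.
family = [("Sally", "f"), ("Anna", "f"),
          ("Bob", "m"), ("Carl", "m"), ("Dan", "m")]

print(naive_count(3, 2))              # 6 -- the 7B's answer
print(actual_count(family, "Sally"))  # 1
```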

sexy_seedbox|2 years ago

Great! Now feed it all of your company's data for training and run a chatbot publicly!

giantrobot|2 years ago

Sally's parents are in for a big surprise.

oreilles|2 years ago

This is hilarious

llmzero|2 years ago

Since (i) Sally's father and mother may each be married to other people, and (ii) a sister or brother relationship only requires sharing one parent, we deduce that there is no definitive answer to this question.

Example: Sally has three brothers; Sally and her brothers have the same mother but different fathers, and those brothers have two sisters, Sally and Mary, but Mary and Sally are not sisters because they share neither a father nor a mother. Hence Sally has no sister.

For those mathematically inclined: suppose the three brothers are all called Bob (to simplify) and the parents are designated by numbers.

FS = father of Sally = 7

MS = mother of Sally = 10

FB = father of Bob = 12

MB = mother of Bob = 10

FM = father of Mary = 12

MM = mother of Mary = 24

Now MS=MB=10 (S and B are siblings), FB=FM=12 (Bob and Mary are siblings), (FS=7)≠(FB=12), and (MB=10)≠(MM=24). So S and M are not sisters, because their parent sets {7,10} and {12,24} are disjoint.

Edited several times to make the example trivial and fix grammar.
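The half-sibling argument above is easy to check mechanically. A sketch (mine) encoding the comment's parent numbers, where "siblings" means sharing at least one parent:

```python
# Parent sets from the comment: FS=7, MS=10, FB=12, MB=10, FM=12, MM=24.
parents = {
    "Sally": {7, 10},
    "Bob":   {12, 10},  # shares mother 10 with Sally
    "Mary":  {12, 24},  # shares father 12 with Bob
}

def are_siblings(a: str, b: str) -> bool:
    # Half- or full siblings: at least one parent in common.
    return a != b and bool(parents[a] & parents[b])

print(are_siblings("Sally", "Bob"))   # True: Bob counts Sally as a sister
print(are_siblings("Bob", "Mary"))    # True: Bob counts Mary as a sister
print(are_siblings("Sally", "Mary"))  # False: yet Sally and Mary aren't sisters
```

So each brother really can have 2 sisters while Sally has 0, exactly as the example claims.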

phkahler|2 years ago

This is why I doubt all the AI hype. These things are supposed to have PhD-level smarts, but in the example above they can't reason about the problem well at all. There's a difference between PhD-level information and advanced reasoning, and I'm not sure how many people can tell the difference (I'm no expert).

In an adjacent area - autonomous driving - I know that lane following is f**ing easy, but lane identification and other object identification is hard. Having real understanding of a situation and acting accordingly is very complex. I wonder if people look at these cars doing the basics and assume they "understand" a lot more than they actually do. I ask the same about LLMs.

Workaccount2|2 years ago

An AI smart enough to eclipse the average person on most basic tasks would warrant even more hype than there is now.

xanderlewis|2 years ago

You often have to be a subject expert to be able to distinguish genuine content from genuine-sounding guff, especially the more technical the subject becomes.

That’s why a lot (though not all!) of the over-the-top LLM hype you see online is coming from people with very little experience and no serious expertise in a technical domain.

If it walks like a duck, and quacks like a duck…

…possibly it’s just an LLM trained on the output of real ducks, and you’re not a duck so you can’t tell the difference.

I think LLMs are simply a less general technology than we (myself included) might have predicted at first interaction. They’re incredibly good at what they do — fluidly manipulating and interpreting natural language. But humans are prone to believing that anything that can speak their language to a high degree of fluency (in the case of GPT-3+, beyond almost all native speakers) must also be hugely intelligent and therefore capable of general reasoning. And in LLMs, we finally have the perfect counterexample.

smokel|2 years ago

> These things are supposed to have PhD level smarts

Whoever told you that?

bbor|2 years ago

LLMs are intuitive computing algorithms, which means they only mimic the subconscious faculties of our brain. You’re referencing the need for careful systematic logical self-aware thinking, which is a great point! You’re absolutely right that LLMs can only loosely approximate it on their own, and not that well.

Luckily, we figured out how to write programs to mimic that part of the brain in the 70s ;)

strangescript|2 years ago

This is definitely a problem, but you could also ask this question to random adults on the street who are high functioning, job holding, and contributing to society and they would get it wrong as well.

That is not to say this is fine, but more that we tend to get hung up on what these models do wrong rather than all the amazing stuff they do correctly.

torginus|2 years ago

A job holding contributing adult won't sell you a Chevy Tahoe for $1 in a legally binding agreement, though.

raydev|2 years ago

> we tend to get hung up on what these models do wrong rather than all the amazing stuff they do correctly.

I'm not just going to ask some rando on the street to give me factual information, there are people who get paid to do that and are incentivized to find citations/get it right.

geuis|2 years ago

Mixtral 8x7b-32768 got it on the first try:

  Sally has 1 sister.

  Here's the logic behind this:

  1. We know that Sally has 3 brothers.
  2. Then we are told that each brother has 2 sisters.
  3. Since Sally is one of the children in the family, there must be another sister besides Sally.
  4. Therefore, Sally has 1 sister.

  The number of sisters Sally has is not affected by the number of brothers she has, so the information about her brothers having 2 sisters is not relevant to finding the number of sisters Sally has.

Jackson__|2 years ago

I'm amazed mistral is still doing the inverse chain of thought reasoning by default, even with their new large model.

This causes it to get the question wrong for me, when testing, and only if I manually prompt normal CoT does it get it right.

Are there any papers showing merit to this approach? It seems extremely counter-intuitive.

lulznews|2 years ago

It got the answer but the reasoning in the last paragraph is wrong ..

walkieer|2 years ago

This is impressive because I forgot Sally counts as a sister too... and the reasoning for the most part is correct and really easy to understand.

campbel|2 years ago

Just ran the test and seems to have gotten it correct.

Okay, let's think through this step-by-step:

We know that Sally is a girl and she has 3 brothers. Each of Sally's brothers has 2 sisters. Now, who are these sisters? They must be Sally and one other sister, because if Sally's brothers had any other sisters, they would also be Sally's sisters, and the problem doesn't mention that. So, if each of Sally's brothers has 2 sisters, and one of these sisters is Sally herself, then Sally must have 1 other sister. Therefore, Sally has 1 sister.

maxnevermind|2 years ago

I guess Claude was too focused on jail-breaking out of Anthropic's servers the first time it was asked the question.

SirMaster|2 years ago

mistralai/Mixtral-8x7B-Instruct-v0.1 got this right.

>Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?

Sally has 1 sister.

Here's the logic behind this:

1. We know that Sally is a girl and she has 3 brothers.

2. Then we are told that each of her brothers has 2 sisters.

3. Since all of Sally's brothers share the same siblings, they would both count Sally as one of their two sisters.

4. Therefore, Sally only has 1 sister because if each brother counts her once, there is no need for another sister to fulfill the "two sisters" condition.

scrollop|2 years ago

So did Claude 3.0 with temperature 0, whilst with temp 1 it got it wrong.

heliostatic|2 years ago

Seems to get it right on my first try. Interesting unreliability.

User: Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?

Claude: To solve this, let's break it down step-by-step:

We know that Sally has 3 brothers. Each of Sally's brothers has 2 sisters. This might make you think that there are 6 sisters in total (3 brothers × 2 sisters each). However, this is not the case. The sisters that each brother has are the same - they are Sally and one other sister. The question is asking how many sisters Sally herself has, not the total number of sisters in the family. Since Sally is a girl, she cannot be her own sister. Therefore, the number of sisters Sally has is equal to the number of sisters each of her brothers has minus herself. So, if each brother has 2 sisters, and one of those sisters is Sally, then Sally must have 2 - 1 = 1 sister. Therefore, Sally has 1 sister.

https://0x0.st/H7Sc.02.png

woadwarrior01|2 years ago

Wow, I really couldn't believe it! Checked it myself, and it seems to get it wrong[1]. 4-bit quantized Mixtral Instruct running locally, gets it right[2].

[1]: https://imgur.com/a/zRI8wKZ

[2]: https://imgur.com/a/0On1I52

hmottestad|2 years ago

Mixtral is pretty good at almost anything I've thrown at it. It's still mostly worse than GPT4, but it's so much better than any other model I can run locally.

I have a niche question about modelling using something called SHACL that most models except GPT4 get wrong. Bard, Gemini, Llama all got it wrong. Gemini Ultra gets it right. And Mixtral also gets it right!

One weakness of Mixtral for me is its support for Norwegian. GPT4 is fluent, but Mixtral mixes it up with Danish and is generally poor at performing tasks on Norwegian text. Even summarising Norwegian text is pretty bad. This is obviously just an issue for a few million people in Norway, it’s not that I’m expecting a general model that I can run locally to be good in Norwegian.

Spivak|2 years ago

> 4-bit quantized Mixtral Instruct running locally, gets it right

This has been one of my favorite things to play around with when it comes to real life applications. Sometimes a smaller "worse" model will vastly outperform a larger model. This seems to happen when the larger model overthinks the problem. Trying to do something simple like "extract all the names of people in this block of text" Llama 7B will have significantly fewer false positives than LLama 70B or GPT4.

ijustlovemath|2 years ago

Kagi's expert mode managed to handle this (rest directly from the chat):

Based on the search results, we can break this down step-by-step:

    Sally has 3 brothers.
    Each of Sally's brothers has 2 sisters.
    Since all the brothers share the same 2 sisters, we are hearing about the same 2 sisters each time, not additional sisters.
    We know Sally is one of the 2 sisters that each brother has.
    Therefore, there must be one other sister that the brothers have besides Sally.
In conclusion, Sally has 1 sister. The key thing to understand here is that when it says "each brother has 2 sisters", it is referring to Sally and one other sister that all the brothers share.

elorant|2 years ago

Groq's Mixtral 8x7b nails this one though.

https://groq.com/

Sally has 1 sister. This may seem counterintuitive at first, but let's reason through it:

    We know that Sally has 3 brothers, and she is one of the sisters.
    Then we are told that each brother has 2 sisters.
    Since Sally's brothers share the same parents as Sally, they share the same sisters.
    Therefore, Sally's 3 brothers have only 1 additional sister besides Sally, making Sally's sister count 1.
It's a bit of a trick question, but it highlights the importance of understanding the phrasing and context in logical reasoning.

moffkalast|2 years ago

If you change the names and numbers a bit, e.g. "Jake (a guy) has 6 sisters. Each sister has 3 brothers. How many brothers does Jake have?" it fails completely. Mixtral is not that good, it's just contaminated with this specific prompt.

In the same fashion lots of Mistral 7B fine tunes can solve the plate-on-banana prompt but most larger models can't, for the same reason.

https://arxiv.org/abs/2309.08632
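One way to run that contamination check systematically is to template the riddle and permute the surface details while keeping the ground truth computable. A rough sketch (mine; the names and ranges are arbitrary):

```python
import random

TEMPLATE = ("{name} (a {gender}) has {n} {others}. "
            "Each {other} has {m} {sames}. "
            "How many {sames} does {name} have?")

def make_variant(rng: random.Random) -> tuple[str, int]:
    girl = rng.choice([True, False])
    name = rng.choice(["Mia", "Lena"] if girl else ["Jake", "Tom"])
    n = rng.randint(2, 6)  # opposite-gender siblings
    m = rng.randint(2, 4)  # same-gender sibling count seen by each of them
    question = TEMPLATE.format(
        name=name,
        gender="girl" if girl else "guy",
        n=n,
        others="brothers" if girl else "sisters",
        other="brother" if girl else "sister",
        sames="sisters" if girl else "brothers",
        m=m,
    )
    # Ground truth: the subject is one of the m, so the answer is m - 1.
    return question, m - 1

question, answer = make_variant(random.Random(0))
print(question, "->", answer)
```

A model that only memorized the canonical Sally wording should fall over on most of these, which is exactly the failure described above.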

bbor|2 years ago

lol that’s actually awesome. I think this is a clear case where the fine tuning/prompt wrapping is getting in the way of the underlying model!

  Each of Sally's brothers has 2 sisters. One of these sisters is Sally, but who is the other sister? Since Sally has only brothers and no other sisters are mentioned, we can conclude that the other sister for each brother is also Sally.
It’s clearly taught to do Chain of Reasoning out of the box, but typing it out tricked it because of the short, declarative sentences trying to establish something like “individual” facts. Poor Anthropic!

m00nsome|2 years ago

To solve this problem, we need to understand the relationships between Sally and her siblings.

Given information: - Sally (a girl) has 3 brothers. - Each brother has 2 sisters.

Since Sally is a girl, and she has 3 brothers, it means that each of her brothers considers her as one of their sisters.

Therefore, if each brother has 2 sisters, and Sally is one of those sisters for each brother, then Sally has 1 other sister besides herself.

So, the number of sisters Sally has is 1.

- from Sonnet

campbel|2 years ago

Opus got it correct for me. Seems like there are both correct and incorrect responses from the models on this. I think testing one question one time really isn't worth much as a representation of capability.

j-b|2 years ago

I tried Sonnet also, to no avail:

To solve this problem, we need to find the number of sisters Sally has.

Given information:

Sally has 3 brothers. Each brother has 2 sisters. Since Sally is a girl, she is not counted as a sister to her brothers.

Step 1: Find the total number of sisters for all 3 brothers. Number of sisters for each brother = 2 Total number of sisters for all 3 brothers = 3 × 2 = 6

Step 2: Since Sally is not counted as a sister to her brothers, the number of sisters Sally has is the total number of sisters for all 3 brothers minus Sally herself. Number of sisters Sally has = Total number of sisters for all 3 brothers - 1 Number of sisters Sally has = 6 - 1 = 5

Therefore, Sally has 5 sisters.

uptownfunk|2 years ago

It’s so convincing even I’m doubting my answer to this question

lossolo|2 years ago

It's because they learn small patterns from datasets, it doesn't matter whether the subjects are Sally, George, sisters, or apples. If a particular logic pattern was not in the training dataset, then the model did not learn it and will fail on most variations of this riddle. These transformer models are essentially large collections of local optima over logic patterns in sentences. If a pattern was not present in the dataset, there is no local optimum for it, and the model will likely fail in those cases.

imtringued|2 years ago

Try this prompt instead: "Sally has 3 brothers. Each brother has 2 sisters. Give each person a name and count the number of girls in the family. How many sisters does Sally have?"

The "smart" models can figure it out if you give them enough rope, the dumb models are still hilariously wrong.

kkukshtel|2 years ago

I don't think this means much besides "It can't answer the Sally question".

evantbyrne|2 years ago

It seems like it is getting tripped up on grammar. Do these models not deterministically preparse text input into a logical notation?

vjerancrnjak|2 years ago

There's no preprocessing being done. This is pure computation, from the tokens to the outputs.

I was quite amazed that during 2014-2016, what was being done with dependency parsers, part-of-speech taggers, named entity recognizers, with very sophisticated methods (graphical models, regret minimizing policy learners, etc.) became fully obsolete for natural language processing. There was this period of sprinkling some hidden-markov-model/conditional-random-field on top of neural networks but even that disappeared very quickly.

There's no language modeling. Pure gradient descent into language comprehension.

Terretta|2 years ago

No* they are text continuations.

Given a string of text, what's the most likely text to come next.

You /could/ rewrite input text to be more logical, but what you'd actually want to do is rewrite input text to be the text most likely to come immediately before a right answer if the right answer were in print.

* Unless you mean inside the model itself. For that, we're still learning what they're doing.

bbor|2 years ago

No - that’s the beauty of it. The “computing stack” as taught in Computer Organization courses since time immemorial just got a new layer, imo: prose. The whole utility of these models is that they operate in the same fuzzy, contradictory, perspective-dependent epistemic space that humans do.

Phrasing it like that, it sounds like the stack has become analog -> digital -> analog, in a way…

vineyardmike|2 years ago

No, they're a "next character" predictor - like a really fancy version of the auto-complete on your phone - and when you feed it in a bunch of characters (eg. a prompt), you're basically pre-selecting a chunk of the prediction. So to get multiple characters out, you literally loop through this process one character at a time.

I think this is a perfect example of why these things are confusing for people. People assume there's some level of "intelligence" in them, but they're just extremely advanced "forecasting" tools.

That said, newer models get some smarts where they can output "hidden" python code which will get run, and the result will get injected into the response (eg. for graphs, math, web lookups, etc).
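The "next character predictor" loop described above can be caricatured in a few lines. A toy sketch (mine) with a hand-written bigram table standing in for the network; real models predict sub-word tokens from learned weights, but the generation loop has this shape:

```python
# Toy "model": for each token, the single most likely next token.
BIGRAMS = {"sally": "has", "has": "one", "one": "sister", "sister": "."}

def generate(prompt: list[str], max_new: int = 10) -> list[str]:
    tokens = list(prompt)  # the prompt pre-selects the start of the sequence
    for _ in range(max_new):
        nxt = BIGRAMS.get(tokens[-1])  # "forecast" one token
        if nxt is None:
            break
        tokens.append(nxt)  # feed the output back in and repeat
        if nxt == ".":
            break
    return tokens

print(generate(["sally"]))  # ['sally', 'has', 'one', 'sister', '.']
```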

brookman64k|2 years ago

mixtral:8x7b-instruct-v0.1-q4_K_M got this correct 5 out of 5 times. Running it locally with ollama on a RTX 3090.

peterisza|2 years ago

Can you change the names/numbers/genders and try a few other versions?

auggierose|2 years ago

If we allow half-sisters as sisters, and half-brothers as brothers (and why would we not?), the answer is not unique, and could actually be zero.

pritambarhate|2 years ago

But the question doesn't mention whether Sally has any sisters. Still, the statement "each brother has 2 sisters" makes me think she has 1 sister.

youssefabdelm|2 years ago

Yeah, cause these are the kinds of very advanced things we'll use these models for in the wild. /s

It's strange that these tests are frequent. Why would people think this is a good use of this model or even a good proxy for other more sophisticated "soft" tasks?

Like to me, a better test is one that tests for memorization of long-tailed information that's scarce on the internet. Reasoning tests like this are so stupid they could be programmed, or you could hook up tools to these LLMs to process them.

Much more interesting use cases for these models exist in the "soft" areas than 'hard', 'digital', 'exact', 'simple' reasoning.

I'd take an analogical over a logical model any day. Write a program for Sally.
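Taking "write a program for Sally" literally: under the thread's usual assumption of one shared set of full siblings, the riddle reduces to a one-line constraint. A sketch (mine):

```python
def sallys_sisters(brothers: int, sisters_per_brother: int) -> int:
    # The girls each brother sees are Sally plus her sisters, so
    # sisters_per_brother = 1 + (Sally's sisters). The brother count is
    # irrelevant to the answer -- which is the whole trap.
    return sisters_per_brother - 1

print(sallys_sisters(3, 2))  # 1
# The same arithmetic, with genders flipped, handles the Jake variant
# mentioned upthread (6 sisters, each with 3 brothers):
print(sallys_sisters(6, 3))  # 2
```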

gait2392|2 years ago

YOU answered it incorrectly. The answer is 1. I guess Claude can comprehend the answer better than (some) humans

bbor|2 years ago

They know :). They posted a transcript of their conversation. Claude is the one that said “0”.