ploynog|1 year ago
Here is a logic puzzle that I need some help solving: Samantha is a girl and has two brothers and four sisters. Alex is a man and also one of Samantha's brothers. How many brothers and sisters does Alex have? Assume that Samantha and Alex share all siblings.
And I get back a very well-written, multi-step response that leaves no doubt in anyone's mind:
To solve this logic puzzle:
Samantha has 2 brothers and 4 sisters.
This means there are 7 children in total (Samantha, her 2 brothers, and her 4 sisters).
Alex is one of Samantha's brothers. Since Samantha and Alex share all siblings, Alex has:
1 brother (the other brother besides himself).
4 sisters.
Final Answer:
Alex has 1 brother and 4 sisters.
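(For reference, a quick sanity check, a hypothetical Python sketch not taken from the thread, shows where the model slips: Samantha herself counts as one of Alex's sisters, so the answer should be 1 brother and 5 sisters.)

```python
# Sanity check of the sibling puzzle.
# Samantha is a girl with 2 brothers and 4 sisters; all siblings are shared.
boys = 2        # Samantha's brothers (Alex is one of them)
girls = 1 + 4   # Samantha herself plus her 4 sisters

# Alex is a boy, so exclude him from his own brother count.
alex_brothers = boys - 1
# Every girl in the family, including Samantha, is Alex's sister.
alex_sisters = girls

print(alex_brothers, alex_sisters)  # → 1 5
```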
Maybe it's like with Apple and I am using it wrong. To get back to the "intern" comparison: I could usually tell when an intern was struggling; there were human telltale signs. When AI is wrong, it still presents its results with the confidence of someone who is extremely deep in the Dunning-Kruger hole but can still write like a long-time expert on the topic.
cube2222|1 year ago
What I've learned is that it's good primarily for tasks like the following:
- Tasks which take time to do, but are then easy to verify.
- Tasks which effectively boil down to translating something from one format to another. Which might e.g. be "read this technical document and implement it in code; as for style, look at these sample code files as a reference".
- Tasks which are about exploring unknown unknowns. E.g. I write down a design, and then I ask the AI to roast it. The point is not that all the points it makes are good and that I need to appease the AI; it's that out of the 20 points it lists, 2-3 might both make sense and be things I hadn't thought of myself.
Finally, AI requires good writing skills, and asking questions in an unbiased way, otherwise the AI will gladly hallucinate to reinforce your bias.
Logic exercises which are easy to verify are a moderately good fit for "reasoning models", which go through many iterations of an LLM and basically write out the whole reasoning process. In practice, though, this can be very expensive to get good results with.
chikere232|1 year ago
Your takeaways are good and fit that model
doctoboggan|1 year ago
> I read all of these amazing things that they supposedly all can do.
You seem to be implying people are confused (or lying?) about the things they are able to get LLMs to do.
If you give it an honest effort to solve some real problems you are facing then you may be able to speak with more authority. Often it comes down to prompting skill. Try to read about different prompting approaches as that may help you.
In general, you need to be specific about what you need, and you need to give all relevant details. Like the post author said, treat it like a junior programmer or an intern.
ploynog|1 year ago
What you call a "gotcha word problem" I'd compare to typical math problems where you need to understand a text, extract the required information, solve the problem, and then present your results. Maybe this is a toy example, but compared to reading the specs of some microprocessors, this is rather easy. These AIs are apparently able to solve school- or even college-level math problems. Shouldn't my example be a walk in the park, then? Especially since it's a large LANGUAGE model?
> You seem to be implying people are confused (or lying?) about the things they are able to get LLMs to do.
I am merely stating observations and was hoping for an explanation. What good does it do me if I accuse people of lying?
> Often it comes down to prompting skill. Try to read about different prompting approaches as that may help you.
"You are using it wrong" it is, then. So how do I tell whether a good-sounding but wrong answer is due to my apparent lack of prompting skills or something else? The answers all sound equally good; they just start "being wrong" at some point.
> In general, you need to be specific about what you need, and you need to give all relevant details.
What details should I have added in the given example? The prompt was probably more comprehensive and detailed than if this task had been given in primary school.
> Like the post author said, treat it like a junior programmer or an intern.
I would, if it acted like a junior programmer or an intern. With them, you can usually see if they are unsure or making things up (if they do these things at all). From an AI, I've yet to see something like "hey, I might be wrong about this, but this is my best effort, maybe we can have a look together."
dartos|1 year ago
The back and forth wasn’t fun and it flat out refused to use seaborn for some reason, but it worked and was fine overall.
I then used aider+claude to help me work with yjs. Led me down a rabbit hole based on an incorrect description of the yjs sync protocol. Took 2 days to untangle everything. Yjs is fairly new though, so I didn’t fault it too much.
I then tried using it for work to deal with some surprisingly intricate back-button logic. Again, an incorrect understanding (on both our parts) of the underlying API caused a few days of headache. I would've been better off just reading the docs than trying to use an AI assistant.
Using AI actually frustrated me to the point where it convinced me to suck it up and just read the specs and sources of the tools I'm using. I've been doing that for a few months now, and just RTFM has worked better for me than AI assistants have.
johnfn|1 year ago
By the way, o1 has no trouble with this problem: https://chatgpt.com/share/6785429c-d1d4-800c-8209-02c542468a...
nunez|1 year ago
I do this every time I'm in a different city. Most of the data that I use is from the last three years.
MattRix|1 year ago
ChatGPT o1 got the answer correct with no tweaks to the prompt.
fragmede|1 year ago
o1 does fine with that btw. https://chatgpt.com/share/6785626a-6ea8-8009-83ab-673433465c...
corry|1 year ago
Do you search Google or Reddit and wish you could just 'get the answer' instead of wading into pages/posts?
Do you compare two long documents together and not want to invest a few hours into a close reading of them?
Do you write code that consists of trivial functions or trivial text manipulation?
Do you want a 3-hour podcast summarized into a few bullet points for a particular audience?
Do you want to send a saucy limerick to your friend on their birthday?
Do you want to compare Kant's view on <topic> with <new_metaphysical_school_of_thought>?
Do you want to analyze 250k rows in an Excel file of user support tickets and summarize the top issues?
etc etc etc.
Totally fine if you don't do any of these things, but these are the things most people are using LLMs for.
coldpie|1 year ago
Well, yeah. You are. It's built to answer questions people actually ask, not solve new logic puzzles.
Despite what the marketing says, it's not a perfect-infinite-knowledge oracle. You should think of it more as a really, really big database with all of the Internet's "knowledge". When you ask it "2 + 2 = ?", it isn't parsing those into numbers and math operations, it's searching its database for occurrences on the Internet where someone answered the question "2 + 2 = ?" and filling in the closest answer it found. If you ask it what "120938120938120931 + 1209389120381208390" is, it'll probably get it wrong, because no one has asked that before. But you should probably be using a calculator instead.
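(Side note: the arithmetic in question is trivial for an actual calculator. A Python interpreter, for instance, uses arbitrary-precision integers, so the sum above is computed exactly rather than pattern-matched:)

```python
# Python ints have arbitrary precision, so this sum is computed
# digit-perfectly, unlike an LLM's pattern-matched guess.
a = 120938120938120931
b = 1209389120381208390
print(a + b)  # → 1330327241319329321
```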
If you ask it something it hasn't seen before such as your logic puzzle, it's not going to parse it like a person would and synthesize an answer. It's going to try to find something similar to what it's seen before and return that. Odds are good this will be a wrong answer, since it's not addressing what you actually asked.
However, if you ask it something it has seen before, like a programming problem, it will return something appropriate. It turns out the Internet is pretty big, so it has seen a lot of stuff, and so often works pretty well. Hence the success you're seeing from others who are using it as-intended, i.e., asking it real questions, not logic puzzles.
It's not very hard to come up with a scenario that has never been put on the Internet, so it's pretty easy to make it dig up the "wrong" answer and do something stupid, as you've found.
The real trouble is that it can't tell you whether it's guessing, or found an actual match. Hence the "confidently wrong" thing, which absolutely destroys user trust. If it's confidently wrong about this thing I know a lot about, how can I trust it to be accurate for something I know little about?
chikere232|1 year ago
Isn't the causality inverted here? It's trained on questions people have asked before, so that's what it's better at. New logic puzzles illustrate this flaw
plastic3169|1 year ago
> "First, list out Samantha's siblings explicitly:
> • Brothers (2 total): Alex + one other brother
> • Sisters (4 total): Samantha + three other sisters
>
> Since Alex shares all siblings with Samantha, let's see it from Alex's perspective. Alex himself is one of the 2 brothers. Therefore, from Alex's point of view:
> • He has 1 other brother (the second brother besides himself).
> • He has 4 sisters (including Samantha).
>
> Thus, Alex has 1 brother and 4 sisters."
This just moves the goalpost. I'm sure someone can give the next example where it fails. I find it useless as well, but at the same time it really feels like criticizing a talking dog for its lack of understanding.
klooney|1 year ago
It's non-deterministic, everyone gets different responses
dwaltrip|1 year ago