top | item 42905986

geoffhill | 1 year ago

Idk, `o3-mini-high` was able to pop this Prolog code out in about 20 seconds:

  solve(WaterDrinker, ZebraOwner) :-
      % H01: Five houses with positions 1..5.
      Houses = [ house(1, _, norwegian, _, _, _),  % H10: Norwegian lives in the first house.
                 house(2, blue, _, _, _, _),       % H15: Since the Norwegian lives next to the blue house,
                 house(3, _, _, milk, _, _),        %       and house1 is Norwegian, house2 must be blue.
                 house(4, _, _, _, _, _),
                 house(5, _, _, _, _, _) ],
  
      % H02: The Englishman lives in the red house.
      member(house(_, red, englishman, _, _, _), Houses),
      % H03: The Spaniard owns the dog.
      member(house(_, _, spaniard, _, dog, _), Houses),
      % H04: Coffee is drunk in the green house.
      member(house(_, green, _, coffee, _, _), Houses),
      % H05: The Ukrainian drinks tea.
      member(house(_, _, ukrainian, tea, _, _), Houses),
      % H06: The green house is immediately to the right of the ivory house.
      right_of(house(_, green, _, _, _, _), house(_, ivory, _, _, _, _), Houses),
      % H07: The Old Gold smoker owns snails.
      member(house(_, _, _, _, snails, old_gold), Houses),
      % H08: Kools are smoked in the yellow house.
      member(house(_, yellow, _, _, _, kools), Houses),
      % H11: The man who smokes Chesterfields lives in the house next to the man with the fox.
      next_to(house(_, _, _, _, _, chesterfields), house(_, _, _, _, fox, _), Houses),
      % H12: Kools are smoked in a house next to the house where the horse is kept.
      next_to(house(_, _, _, _, horse, _), house(_, _, _, _, _, kools), Houses),
      % H13: The Lucky Strike smoker drinks orange juice.
      member(house(_, _, _, orange_juice, _, lucky_strike), Houses),
      % H14: The Japanese smokes Parliaments.
      member(house(_, _, japanese, _, _, parliaments), Houses),
      % (H09 is built in: Milk is drunk in the middle house, i.e. house3.)
      
      % Finally, find out:
      % Q1: Who drinks water?
      member(house(_, _, WaterDrinker, water, _, _), Houses),
      % Q2: Who owns the zebra?
      member(house(_, _, ZebraOwner, _, zebra, _), Houses).
  
  right_of(Right, Left, Houses) :-
      nextto(Left, Right, Houses).
  
  next_to(X, Y, Houses) :-
      nextto(X, Y, Houses);
      nextto(Y, X, Houses).

Seems ok to me.

   ?- solve(WaterDrinker, ZebraOwner).
   WaterDrinker = norwegian,
   ZebraOwner = japanese .
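
One portability note: `nextto/3`, which the helpers lean on, is the SWI-Prolog library(lists) predicate that succeeds when its first argument immediately precedes the second in the list. On a Prolog system that lacks it, the usual two-clause definition can be added:

  % nextto(X, Y, List): Y directly follows X in List.
  nextto(X, Y, [X, Y | _]).
  nextto(X, Y, [_ | T]) :-
      nextto(X, Y, T).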

orbital-decay | 1 year ago

That's because it uses a long CoT. The actual paper [1] [2] talks about the limitations of decoder-only transformers predicting the reply directly, although it also establishes the benefits of CoT for composition.

This has all been known for a long time and makes intuitive sense: you can't squeeze more computation out of it than it can provide. The authors just formally proved it (which is no small feat). And Quanta is being dramatic with its conclusions and headlines, as always.

[1] https://arxiv.org/abs/2412.02975

[2] https://news.ycombinator.com/item?id=42889786

antirez | 1 year ago

LLMs using CoT are also decoder-only; it's not a paradigm shift, whatever people now claim so they don't have to admit they were wrong: it's still next-token prediction, just forced to explore more possibilities in the space it contains. And with R1-Zero we also know that LLMs can train themselves to do so.

teruakohatu | 1 year ago

gpt-4o, asked to produce SWI-Prolog code, gets the same result with very similar code. gpt-4-turbo can do it with slightly less nice code. gpt-3.5-turbo struggled to get the syntax correct, but I think with better prompting it could manage.

CoT is definitely optional. Although I am sure all LLMs have seen this problem explained and solved in their training data.

mycall | 1 year ago

This doesn't cover encoder-decoder transformer fusion for machine translation, or encoder-only models like BERT for text classification or named entity recognition.

leonidasv | 1 year ago

Also, notice that the original study is from 2023.

echelon | 1 year ago

The LLM doesn't understand it's doing this, though. It pattern matched against your "steering" in a way that generalized. And it didn't hallucinate in this particular case. That's still cherry picking, and you wouldn't trust this to turn a $500k screw.

I feel like we're at 2004 Darpa Grand Challenge level, but we're nowhere near solving all of the issues required to run this on public streets. It's impressive, but leaves an enormous amount to be desired.

I think we'll get there, but I don't think it'll be in just a few short years. The companies hyping an accelerated timeline as just around the corner are doing so out of an existential need to keep the funding flowing.

simonw | 1 year ago

Solving it with Prolog is neat, and a very realistic way of how LLMs with tools should be expected to handle this kind of thing.

EdwardDiego | 1 year ago

I would've been very surprised if Prolog code solving this wasn't something the model had already ingested.

Early AI hype cycles, after all, are where Prolog, like Lisp, shone.

lsy | 1 year ago

If the LLM’s user indicates that the input can and should be translated as a logic problem, and then the user runs that definition in an external Prolog solver, what’s the LLM really doing here? Probabilistically mapping a logic problem to Prolog? That’s not quite the LLM solving the problem.

xyzzy123 | 1 year ago

Do you feel differently if it runs the Prolog in a tool call?

baq | 1 year ago

But the problem is solved. It depends on what you care about.

endofreach | 1 year ago

Psst, don't tell my clients that it's not actually me but the language's syntax I use that's solving their problem.

choeger | 1 year ago

So you asked an LLM to translate. It excels at translation. But ask it to solve, and it will inevitably fail. That's also expected, though.

The interesting question is: Given a C compiler and the problem, could an LLM come up with something like Prolog on its own?

charlieyu1 | 1 year ago

I think it could even solve it; these kinds of riddles are heavily represented in training data.

intended | 1 year ago

Science is not in the proving of it.

It’s in the disproving of it, and in the finding of the terms that help others understand the limits.

I don't know why it took me so long to come to that sentence. Yes, everyone can trot out their core examples that reinforce the point.

The research is motivated by these examples in the first place.

Agraillo | 1 year ago

Good point. LLMs can be treated as "theories", and then they definitely meet falsifiability [1], allowing researchers to find "black swans" for years to come. The theories in this case can differ. But if the theory is that of a logical or symbolic solver, then Wolfram's Mathematica may struggle with understanding human language as input; when it comes to evaluating the results, though, I think Stephen (Wolfram) can sleep soundly, at least for now.

[1] https://en.wikipedia.org/wiki/Falsifiability

est | 1 year ago

I'd say it's not only LLMs that struggle with these kinds of problems; 99% of humans do too.

tuatoru | 1 year ago

    solve (make me a sandwich)
Moravec's Paradox is still a thing.

AtlasBarfed | 1 year ago

Can it port sed to Java? I just tried to do that in chatgippity and it failed.