I think this is the biggest flaw in LLMs and what is likely going to sour a lot of businesses on their usage (at least in their current state). It is preferable to give the right answer to a query; it is acceptable to be unable to answer a query. We run into real issues, though, when a query is confidently answered incorrectly. This recently caused a major headache for Air Canada - businesses should be held to the statements they make, even if those statements were made by an AI or a call center employee.
I mean, in this context I agree. But most people doing math in high school or university are graded on their working of a problem, with the final result usually accounting for only a small proportion of the total marks.
I don't know... here's a prompt for a standard problem in introductory integral calculus, and it moves pretty smoothly from a discrete arithmetic series to the continuous integral:
"Consider the following word problem: 'A 100 meter long chain is hanging off the end of a cliff. It weighs one metric ton. How much physical work is required to pull the chain to the top of the cliff if we discretize the problem such that one meter is pulled up at a time?' Note that the remaining chain gets lighter after each lifting step. Find the equation that describes this discrete problem and, from that, generate the continuous expression and provide the LaTeX code for it."
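For what it's worth, the discrete sum and its continuous limit can be checked numerically. A minimal sketch (my own check, not output from the thread; assumes g = 9.8 m/s² and a uniform chain, i.e. 10 kg per meter):

```python
# Chain problem: 100 m chain, 1000 kg total (10 kg/m), hauled up 1 m at a time.
g = 9.8  # standard gravity, m/s^2

# Discrete version: before step k, (100 - k) meters still hang over the edge;
# hauling in one meter raises that remaining weight by ~1 m.
# W = sum_{k=0}^{99} 10 * g * (100 - k)
work_discrete = sum(10 * g * (100 - k) for k in range(100))  # joules

# Continuous limit: W = ∫₀¹⁰⁰ 10·g·(100 − x) dx = 10·g·(100·100 − 100²/2)
work_continuous = 10 * g * (100 * 100 - 100**2 / 2)  # joules

print(work_discrete)    # 494900.0
print(work_continuous)  # 490000.0
```

The two agree to within one step's worth of work, as you'd expect: the discrete scheme treats the chain's weight as constant over each one-meter lift.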
It has gotten quite impressive at handling calculus word problems. GPT-4 (original) failed miserably on this problem (it attempted to set it up using constant-acceleration equations); GPT-4o finally gets it correct:
> I am driving a car at 65 miles per hour and release the gas pedal. The only force my car is now experiencing is air resistance, which in this problem can be assumed to be linearly proportional to my velocity.
> When my car has decelerated to 55 miles per hour, I have traveled 300 feet since I released the gas pedal.
> How much further will I travel until my car is moving at only 30 miles per hour?
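The neat part of this problem is that with drag proportional to velocity, m·v·dv/dx = −k·v, so dv/dx = −k/m is constant: speed falls linearly with distance, and no value of k or m is needed. A minimal sketch of that reasoning (my own check, not model output):

```python
# With F = -k*v: m*(dv/dt) = -k*v, and dv/dt = v*(dv/dx), so dv/dx = -k/m,
# a constant. Speed therefore decreases linearly with distance traveled.

# Calibrate from the given data: 65 -> 55 mph over 300 ft.
ft_per_mph = 300 / (65 - 55)       # 30 ft of travel per mph of speed lost

# Distance to slow from 55 mph to 30 mph:
further = ft_per_mph * (55 - 30)
print(further)  # 750.0 ft
```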
Does it get the answer right every single time you ask the question the same way? If not, it doesn't matter how it's arriving at an answer: it's not consistently correct and therefore not dependable. That's what the article was exploring.