COAGULOPATH|1 year ago
You will not unlock "o1-like" reasoning by making a model think step by step. This is an old trick that people were using on GPT-3 in 2020. If it were that simple, it wouldn't have taken OpenAI so long to release it.
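For reference, the "step by step" trick the parent is referring to is zero-shot chain-of-thought prompting: append a trigger phrase to the question and read the final line of the model's reply as the answer. A minimal sketch (the helper names and the last-line answer convention are illustrative assumptions, not any vendor's API):

```python
def build_cot_prompt(question: str) -> str:
    """Append the classic zero-shot CoT trigger to a question."""
    return f"{question}\nLet's think step by step."

def extract_final_answer(completion: str) -> str:
    """Take the last non-empty line of the completion as the final answer."""
    lines = [ln.strip() for ln in completion.splitlines() if ln.strip()]
    return lines[-1] if lines else ""

prompt = build_cot_prompt("What is 17 * 24?")
# A model's completion might look like this (hand-written example):
completion = "17 * 24 = 17 * 20 + 17 * 4\n= 340 + 68\n= 408\nAnswer: 408"
print(extract_final_answer(completion))  # → Answer: 408
```

The point of the parent comment is that this prompt-side trick alone does not produce o1-style behavior; o1 is trained to generate and use long reasoning traces, not merely prompted to.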
Additionally, some of the prompt seems counterproductive:
>Be aware of your limitations as an llm and what you can and cannot do.
The LLM doesn't have a good idea of its limitations (any more than humans do). I expect this will create false refusals, as the model becomes overcautious.
anshumankmr|1 year ago
Can it not be trained to do so? From my anecdotal observations, the knowledge cutoff is one limitation that LLMs are trained to know about, and they handle it reliably. Why can a model not likewise be trained to know that it is frequently bad at math, or that the code it produces is sometimes inaccurate?
Humans work the same way: some people know that certain things are just not their cup of tea. Sure, people sometimes have half-baked knowledge, but one can usually tell which things one is good at and which not so much.
fudged71|1 year ago
regularfry|1 year ago
whimsicalism|1 year ago
Alignment is a tough problem, and aligning long reasoning sequences to correct answers is also a tough problem. Collecting high-quality CoT from experts is yet another tough problem. They started this project in October; it's more than plausible it could take this long.
TrapLord_Rhodo|1 year ago
Meganet|1 year ago
An LLM has ingested a huge amount of data. It can create character profiles, audiences, personas, etc.
Why wouldn't it potentially have learned to 'understand' what 'being aware of your limitations' means?
Right now, the 'change of reasoning' feels to me a bit like querying the existing meta space through the reasoning process: basically priming the model.
I also wouldn't just call it a 'trick'. It looks simple or weird, but I do believe this is part of research into the AI thinking process.
It's a good question, though, what they actually trained. A new architecture? More parameters? Is this training a mix of experiments they did? Some auto-optimization mechanism?
Hugsun|1 year ago