juxtaposicion | 1 year ago
Under the hood, Reflection 70B seems to be a Llama 3.1 finetune that encourages the model to emit <think>, <reflection>, and <output> tokens and corresponding phases. This is an evolution of Chain-of-Thought's "think step by step" -- but instead of being a prompting technique, the fine-tune bakes examples of these phases directly into the model. The model produces an initial draft, 'reflects' on it, and then issues a final output.
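To make the phase structure concrete, here is a small sketch (mine, not from the model card) of what a transcript in this format might look like and how a client could pull out just the final answer. The transcript text is an illustrative assumption; only the tag names come from the description above.

```python
import re

# Hypothetical transcript in the reflection-style format described above:
# the model drafts inside <think>, self-corrects inside <reflection>,
# and commits to an answer inside <output>.
transcript = """<think>
9.11 has more digits after the decimal point, so 9.11 > 9.9.
</think>
<reflection>
That reasoning is wrong: compare place values. 9.9 = 9.90, and 0.90 > 0.11,
so 9.9 is greater.
</reflection>
<output>
9.9 is greater than 9.11.
</output>"""

def extract_phase(text: str, tag: str) -> str:
    """Return the contents of the first <tag>...</tag> block, or ''."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return m.group(1).strip() if m else ""

# A client would typically show the user only the <output> phase.
print(extract_phase(transcript, "output"))  # 9.9 is greater than 9.11.
```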
The extra tokens, which effectively let the model 'think more', appear to let it handle prompts that other strong models (4o, 3.5 Sonnet) fumble. For example, when asked "which is greater, 9.11 or 9.9?", Reflection 70B initially gets the wrong answer, then 'reflects' on it, and then emits the correct output.
Personally, the comparison to Claude and 4o doesn't quite seem apples-to-apples. If 4o or Claude were given multiple rounds to review and reflect on their initial drafts, would we see similar gains? I suspect they would improve substantially as well.
rgbrgb|1 year ago
They may already implement this technique; we can't know.
bluejay2387|1 year ago
I have been testing the model for the last few hours, and it does seem to be an improvement on Llama 3.1, on which it is based. I have not tried to compare it to Claude or GPT-4o because I don't expect a 70B model to outperform models of that class, no matter how good it is. I would be happy to be wrong, though...
cedws|1 year ago
[0]: https://news.ycombinator.com/item?id=41377042
HanClinto|1 year ago
[0]: https://github.com/ggerganov/llama.cpp/blob/master/grammars/...