item 41915735

Show HN: Steiner – An open-source reasoning model inspired by OpenAI o1

83 points | peakji | 1 year ago | medium.com

Steiner is a series of reasoning models trained on synthetic data using reinforcement learning. These models can explore multiple reasoning paths in an autoregressive manner during inference and autonomously verify or backtrack when necessary, enabling a linear traversal of the implicit search tree.
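The "linear traversal of the implicit search tree" idea can be sketched in miniature. The toy below (all names are illustrative, not from the Steiner code) flattens a depth-first search into one linear trace, with explicit "backtrack" markers standing in for the backtracking behavior the model learns to express autoregressively:

```python
# Hypothetical sketch: encode a depth-first search over a reasoning tree
# as a single flat token stream, emitting "backtrack" markers whenever a
# branch fails verification -- analogous to how the post describes the
# model traversing its implicit search tree in one autoregressive pass.

def linearize(tree, goal):
    """DFS over a {node: [children]} adjacency dict; returns a flat trace
    of (action, node) steps, including explicit backtracks at dead ends."""
    trace = []

    def visit(node):
        trace.append(("explore", node))
        if node == goal:                      # verification succeeds
            trace.append(("verified", node))
            return True
        for child in tree.get(node, []):
            if visit(child):
                return True
        trace.append(("backtrack", node))     # dead end: step back
        return False

    visit("root")
    return trace

tree = {"root": ["a", "b"], "a": ["a1"], "b": ["b1"]}
print(linearize(tree, "b1"))
# -> [('explore', 'root'), ('explore', 'a'), ('explore', 'a1'),
#     ('backtrack', 'a1'), ('backtrack', 'a'), ('explore', 'b'),
#     ('explore', 'b1'), ('verified', 'b1')]
```

The point is that the whole branching search fits in one linear sequence, which is exactly the property that lets an autoregressive model "explore and backtrack" without any external search loop.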

Blog: https://medium.com/@peakji/a-small-step-towards-reproducing-...

Hugging Face: https://huggingface.co/collections/peakji/steiner-preview-67...

19 comments


nxobject|1 year ago

As someone without specific background in the subfield (I do embedded programming) – thanks for spelling out what people "in the know" seem to understand about o1's functioning!

zby|1 year ago

Can it be mixed with the sampling based approaches from optillm (https://github.com/codelion/optillm)?

peakji|1 year ago

Approaches like best-of-n sampling and majority voting are definitely feasible. But I don't recommend trying things related to CoT prompting, as it might interfere with the internalized reasoning patterns.
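Of the two approaches mentioned, majority voting (self-consistency) is the easiest to sketch, since best-of-n additionally needs a scoring model. A minimal, hypothetical version over already-sampled answer strings (the sample list is made up for illustration):

```python
# Minimal sketch of majority voting over n sampled answers.
# In practice each string would come from a temperature > 0 sample
# of the model; here the list is hard-coded for illustration.
from collections import Counter

def majority_vote(answers):
    """Return the most common answer and its vote fraction."""
    votes = Counter(answers)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(answers)

samples = ["42", "41", "42", "42", "6*7=42", "42"]
print(majority_vote(samples))  # -> ('42', 0.666...)
```

Note that voting here is by exact string match, so an equivalent answer phrased differently ("6*7=42") splits the vote; real implementations usually normalize or extract the final answer before counting.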

nwnwhwje|1 year ago

Silly question time.

Is this a fine-tuned LLM, for example a drop-in replacement for Llama etc.?

Or is it some algorithm on top of an LLM, doing some chain of reasoning?

peakji|1 year ago

It is an LLM fine-tuned using a new type of dataset and RL reward. It's good at reasoning, but I would not recommend it as a replacement for Llama on general tasks.

Mr_Bees69|1 year ago

Really hope this goes somewhere; o1 without OpenAI's costs and restrictions would be sweet.

peakji|1 year ago

The model can already answer some tricky questions that other models (including GPT-4o) have failed to address, achieving a +5.56 improvement on the GPQA-Diamond dataset. Unfortunately, it has not yet managed to reproduce inference-time scaling. I will continue to explore different approaches!

ActorNightly|1 year ago

OpenAI's o1 isn't really going that far though. It's definitely better in some areas, but not overall.

I'm wondering if we can abstract chain of thought further down into the computation levels to replace a lot of the matrix multiplies. Like smaller transformers with fewer parameters, and more selection of which transformer to use through search.