This reminded me of ViperGPT[1] from a couple of years ago, which is similar but specific to vision-language models. Both have a root LLM which, given a query, produces a Python program that decomposes the query into separate steps, with the generated program calling a sub-model. One difference is that this model has a mutable environment in the notebook, but I'm not sure how meaningful that difference is.

[1] https://viper.cs.columbia.edu/static/viper_paper.pdf
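
For anyone who hasn't seen the pattern, here's a rough sketch of what I mean by "root model writes a program that calls a sub-model". This is not code from either paper. `root_model`, `sub_model`, and `run` are hypothetical names, and both models are stubbed with plain Python so the snippet runs on its own. The `env` dict plays the role of the mutable environment: the generated program reads from it and writes its result back into it.

```python
def sub_model(question: str, context: str) -> str:
    """Hypothetical stand-in for the sub-model.

    A real system would call an LLM or VLM here; this stub just counts
    how often the last word of the question appears in the context.
    """
    word = question.split()[-1].lower()
    return str(context.lower().split().count(word))


def root_model(query: str) -> str:
    """Hypothetical stand-in for the root LLM.

    A real root model would generate this program from the query; the
    stub returns a fixed program that fans out over chunks and sums.
    """
    return (
        "parts = [sub_model('count ' + target, chunk) for chunk in chunks]\n"
        "answer = sum(int(p) for p in parts)\n"
    )


def run(query: str, chunks: list, target: str) -> int:
    # Mutable environment shared with the generated program.
    env = {"sub_model": sub_model, "chunks": chunks, "target": target}
    program = root_model(query)
    # Executing model-written code is unsafe outside a sandbox.
    exec(program, env)
    return env["answer"]


if __name__ == "__main__":
    print(run("how many times does cat appear?",
              ["the cat sat", "a cat and a cat"], "cat"))
```

Since `env` survives the `exec` call, a later generated program could reuse `parts` or `answer`, which is roughly the notebook-style difference I was wondering about.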