I have a saying: "any sufficiently advanced agent is indistinguishable from a DSL"
You mean manually pre-baking a DAG from the user query, then “spawning” other LLMs to resolve each node and pass their input up the graph? This is the approach we take too. It seems to be a sufficiently performant approach that is - intuitively - generically useful regardless of ontology / domain, but would love to hear others’ experiences.
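A minimal sketch of that DAG-resolution idea in plain Python (the node names, prompts, and the `call_llm` stub are all hypothetical; `call_llm` stands in for a real chat-completion request):

```python
# Resolve a pre-baked DAG of sub-questions bottom-up: each node is answered
# by one model call, and its answer is substituted into its parents' prompts.
from graphlib import TopologicalSorter

def call_llm(prompt: str) -> str:
    # Placeholder "model" so the sketch runs offline; swap in a real API call.
    return f"answer({prompt})"

# Each node: the nodes it depends on, plus a prompt template to fill in.
dag = {
    "facts": {"deps": [], "prompt": "List key facts"},
    "tone":  {"deps": [], "prompt": "Pick a tone"},
    "final": {"deps": ["facts", "tone"], "prompt": "Combine: {facts} {tone}"},
}

def resolve(dag: dict) -> dict:
    results = {}
    order = TopologicalSorter({k: v["deps"] for k, v in dag.items()}).static_order()
    for node in order:  # dependencies are always yielded first
        filled = dag[node]["prompt"].format(**{d: results[d] for d in dag[node]["deps"]})
        results[node] = call_llm(filled)
    return results

print(resolve(dag)["final"])
```

Independent nodes (here `facts` and `tone`) could just as well be resolved concurrently; the topological order only constrains parents to run after their children.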
I think you're imagining a scenario where you're using the LLM manually. Tools are designed to serve as a backend for other GPT-like products.
I have developed multiple multi-step LLM workflows, expressible as both conditional and parallel DAGs, using mostly plain Python, and I still don't understand why these langchain-type libraries feel the need to exist. Plain Python is quite sufficient for advanced LLM workflows if you know how to use it.
If you wanted to compare OpenAI models against Anthropic's or Google's, wouldn't a framework help a lot? Breaking APIs is more about bad framework development than about frameworks in general.
Always felt the same way, but could never put it in words as eloquently as you just did. Python (or any other programming language) already is the best glue. With these frameworks, you just waste brain cycles on learning APIs that change and break every couple of months.
What’s the difference between tool calling and “agents”?
It's something we know will backfire spectacularly, but it might happen anyway.
I could see this backfiring and causing an industry-wide reversal to other technologies. I have a gut feeling this has already happened in other areas, like companies going back to bare metal, but that's not really the best example since cloud was, and still is, the best solution for most companies.
Sensational news: LLM can flip multiple bits at once with one request! This is so awesome. How could our CPUs ever work without LLMs built in? I bet IBM had a secret LLM whisperer in all of their mainframes. To this day.
madrox|1 year ago
If I'm really leaning into multi-tool use for anything resembling a mutation, then I'd like to see an execution plan first. In my experience, asking an AI to code up a script that calls some functions with the same signature as tools and then executing that script actually ends up being more accurate than asking it to internalize its algorithm. Plus, I can audit it before I run it. This is effectively the same as asking it to "think step by step."
I like the idea of Command R+ but multitool feels like barking up the wrong tree. Maybe my use cases are too myopic.
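That pattern can be sketched roughly like this (the tool names, the example plan, and the `run_plan` helper are all hypothetical; the point is that the model writes the script against stubs with the tool signatures, and a human reads it before anything executes):

```python
# "Generate a plan script, audit it, then run it": the model emits Python
# source that only calls whitelisted stubs matching the real tool signatures.

def search_orders(customer_id: str) -> list:
    """Stub with the same signature as a (hypothetical) read-only tool."""
    return [{"order_id": "A1", "status": "shipped"}]

def refund_order(order_id: str) -> str:
    """Stub for a mutating tool -- the call you most want to audit."""
    return f"refunded {order_id}"

ALLOWED_TOOLS = {"search_orders": search_orders, "refund_order": refund_order}

def run_plan(plan_source: str, approved: bool) -> dict:
    """Execute an LLM-written plan only after human approval, and only with
    the whitelisted tools in scope (no builtins)."""
    if not approved:
        raise PermissionError("plan was not audited/approved")
    scope = {"__builtins__": {}, **ALLOWED_TOOLS}
    exec(plan_source, scope)
    return scope

# A plan the model might have produced; you read this before running it.
plan = (
    "orders = search_orders('cust-42')\n"
    "results = [refund_order(o['order_id']) for o in orders"
    " if o['status'] == 'shipped']\n"
)
scope = run_plan(plan, approved=True)
print(scope["results"])  # ['refunded A1']
```

Because the plan is plain source code, the audit step is just reading it, and the "execution plan" the commenter asks for falls out for free.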
darkteflon|1 year ago
It would be nice to know if this is sort of how OpenAI’s native “file_search” retriever works - that’s certainly the suggestion in some of the documentation but it hasn’t, to my knowledge, been confirmed.
fzeindl|1 year ago
This is called defunctionalization and useful without LLMs as well.
TZubiri|1 year ago
You don't have the capacity to "audit" stuff.
Furthermore, tool execution occurs not in the LLM but in the code that calls the LLM through the API. So whatever code executes the tool also orders the calling-sequence graph. You don't need to audit it; you are calling it.
ai4ever|1 year ago
Whereas a DSL still aims for accurate, deterministic modeling of the specific use case.
RecycledEle|1 year ago
I don't think you mean Digital Subscriber Line, so may I ask: What is a DSL in this context?
patleeman|1 year ago
Has anyone seen anyone using this approach? Any resources available?
TZubiri|1 year ago
But Agent != Language
Oras|1 year ago
That said, it’s a bit annoying to see langchain examples everywhere. Not everyone uses it, and many consider it bloated and hard to maintain.
Would be great just to have a simple example in Python showing the capabilities.
el-ai-ne|1 year ago
The following cookbooks contain slightly more advanced code examples, using just the Cohere API for multi-step tool use:
https://docs.cohere.com/page/calendar-agent
https://docs.cohere.com/page/pdf-extractor
https://docs.cohere.com/page/agentic-multi-stage-rag
Cheers
OutOfHere|1 year ago
LLMs are innately unreliable, and they require a lot of hand-holding and prompt-tuning to get them to work well. Getting into the low-level details of the prompts is essential. I don't want any libraries to get in the way, because I have to be able to find and cleverly prevent the failure cases that happen just 1 in 500 times.
These libraries seem to mainly just advertise each other. If I am missing something, I don't know what it is.
etse|1 year ago
I think frameworks tend to provide an escape hatch. LlamaIndex comes to mind. It seems to me that by not learning and using an existing framework, you're building your own, which is a calculated tradeoff.
leobg|1 year ago
SeriousStorm|1 year ago
Are you just running an LLM server (Ollama, llama.cpp, etc.) and then making API calls to that server with plain Python, or is it more than that?
TZubiri|1 year ago
Before they were called tools, they were called function calls in ChatGPT.
Before that we had response_format = "json_object"
And even before that we were prompting with function signatures and asking it to output parameters.
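That oldest approach can be sketched like so (the function names, the prompt, and the hard-coded model reply are invented for illustration; in practice the reply would come back from a chat API):

```python
# Pre-"tools" function calling: put the signatures in the prompt, ask for
# JSON, and parse/dispatch the model's reply yourself.
import json

SYSTEM_PROMPT = """You can call these functions. Reply ONLY with JSON:
{"name": "<function>", "arguments": {...}}

def get_weather(city: str) -> str: ...
def get_time(tz: str) -> str: ...
"""

def get_weather(city: str) -> str:
    return f"sunny in {city}"  # stub

def get_time(tz: str) -> str:
    return f"12:00 {tz}"  # stub

REGISTRY = {"get_weather": get_weather, "get_time": get_time}

def dispatch(model_reply: str) -> str:
    """Parse the model's JSON and call the named function with its args."""
    call = json.loads(model_reply)
    return REGISTRY[call["name"]](**call["arguments"])

# What the model might have replied to "what's the weather in Oslo?":
reply = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(dispatch(reply))  # sunny in Oslo
```

Native tool calling mostly moves the prompt scaffolding and the JSON discipline from your code into the API, but the dispatch loop on the caller's side looks much the same.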
bhl|1 year ago
How would you handle map-reduce type of tool calls where you have a lot of parallel tools that you want to merge later on? What’s a good way to scale that without running into API limits?
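One common answer, sketched here assuming an asyncio-based client (`fetch_chunk` is a stand-in for a real tool/API call): bound concurrency with a semaphore, gather the fan-out, then run the merge step over the collected results.

```python
# Map-reduce over many parallel tool calls, with at most N requests in
# flight at once to stay under API rate limits.
import asyncio

async def fetch_chunk(i: int, sem: asyncio.Semaphore) -> int:
    async with sem:              # at most max_in_flight of these run at once
        await asyncio.sleep(0)   # stand-in for the network call
        return i * i             # stand-in for a per-chunk tool result

async def map_reduce(n_chunks: int, max_in_flight: int = 5) -> int:
    sem = asyncio.Semaphore(max_in_flight)
    results = await asyncio.gather(*(fetch_chunk(i, sem) for i in range(n_chunks)))
    return sum(results)          # the "reduce" step, e.g. a final merge prompt

print(asyncio.run(map_reduce(10)))  # 285
```

A real deployment would also want retry-with-backoff on 429 responses, and possibly the provider's batch endpoint if one exists, but the semaphore covers the basic "never exceed N in flight" constraint.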
cpursley|1 year ago
openmajestic|1 year ago
laborcontract|1 year ago
I know they're not considered the leader in the foundational model space, but their developer documentation is great, their api is really nice to use, and they have a set of products that really differentiate themselves from OpenAI and Anthropic and others. I'm rooting for the success of this company.
That said, we as an industry need to be moving away from langchain, not embedding ourselves more deeply in that monstrosity. It’s just way too much of its own thing now, and you can totally start to see how the VC funding is shaping their incentives. They put everyone who uses it in a position of massive technical debt, create more abstractions like langgraph to lock people into their tools, and then create paid tools on top to solve the problems that they created (langsmith).
mostelato|1 year ago
Check out the examples here: https://docs.cohere.com/docs/multi-step-tool-use
and this notebook https://github.com/cohere-ai/notebooks/blob/main/notebooks/a...
walterbell|1 year ago
esafak|1 year ago
politelemon|1 year ago
https://github.com/cohere-ai/notebooks/blob/main/notebooks/D...
mostelato|1 year ago
And here are the docs: https://docs.cohere.com/docs/multi-step-tool-use
danw1979|1 year ago
lnrd|1 year ago
Does anyone with more experience than me have memories of similar things happening? Where a technology was hyped and adopted anywhere until something happened that caused an industry-wide reversal to more established ways of doing things?
classified|1 year ago