[+] [-] _moog|7 months ago|reply
I started diving into LLMs a few weeks ago, and one thing that immediately caught me off guard was how little standardization there is across the various pieces you'd use to build a chat stack.
Want to swap out your client for a different one? Good luck - it probably expects a completely different schema. Trying a new model? Hope you're ready to deal with a different chat template. It felt like every layer had its own way of doing things, which made understanding the flow pretty frustrating for a newbie.
So I sketched out a diagram that maps out what (rough) schema is being used at each step of the process - from the initial request all the way through Ollama and an MCP server with OpenAI-compatible endpoints - showing what transformations occur where.
Figured I'd share it as it may help someone else.
https://moog.sh/posts/openai_ollama_mcp_flow.html
Somewhat ironically, Claude built the JS hooks for my SVG with about five minutes of prompting.
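To make the starting point concrete, here's roughly what the first hop looks like - a minimal OpenAI-style chat completion payload with one tool attached (field names follow the OpenAI chat schema; the model name and weather tool are illustrative stand-ins):

```python
import json

# A minimal OpenAI-style /v1/chat/completions payload with one tool attached.
# Field names follow the OpenAI chat schema; the model and the weather tool
# are made-up stand-ins.
request = {
    "model": "llama3.1",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Fetch current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(request, indent=2))
```

From there, the runner re-renders `messages` through the model's chat template, which is where the per-model divergence comes in.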
[+] [-] youdont|7 months ago|reply
Have you tried BAML? We use it to manage APIs and clients, as well as prompts and types. It gives great low-level control over your prompts and logic, but acts as a nice standardisation layer.
[+] [-] 1dom|7 months ago|reply
I thought it funny to think how this is all to give the impression to the user that the AI, for example, _knows_ the weather. The AI doesn't: it's just getting it from a weather API and wrapping some text around it.
Now, imagine being given a requirement 5 years ago like: "When the user asks, we need to be able to show them the weather from this API, and wrap some text around it". Imagine something like your diagram coming back as the proposed solution! Not at all a criticism of any of your stuff, but it blows my mind how tech develops.
[+] [-] nimchimpsky|7 months ago|reply
[deleted]
[+] [-] upghost|7 months ago|reply
I think it's interesting and odd that tool calling took the form of this gnarly JSON blob. I much prefer the NexusRaven [1] style, where you provide Python function stubs with docstrings and get back Python function invocations with the arguments populated. Of course, I don't really understand why MCP is popular over REST or a CLI, either.
[1]: https://github.com/nexusflowai/NexusRaven-V2
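A rough sketch of that style, with made-up names - build the prompt from a stub's signature and docstring, then parse the returned invocation with `ast` instead of `eval` (this is the general idea, not NexusRaven's exact prompt format):

```python
import ast
import inspect

def get_weather(city: str, units: str = "celsius"):
    """Fetch the current weather for a city."""

# Build the prompt section from a stub: its signature plus docstring,
# in the "function stubs in, invocation out" spirit described above.
def stub_prompt(fn):
    sig = inspect.signature(fn)
    return f'def {fn.__name__}{sig}:\n    """{inspect.getdoc(fn)}"""'

# Parse a model reply like get_weather(city="Paris") without eval().
def parse_invocation(reply: str):
    call = ast.parse(reply.strip(), mode="eval").body
    assert isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return call.func.id, kwargs

print(stub_prompt(get_weather))
print(parse_invocation('get_weather(city="Paris")'))
```

Using `ast.literal_eval` on the arguments keeps the parser trivial while refusing anything that isn't a plain literal.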
[+] [-] max-privatevoid|7 months ago|reply
The actual API call is still going to be JSON. How do you deal with that? Pack your Python function definitions into an array of huge opaque strings? And who would want to write a parser for that?
[+] [-] unknown|7 months ago|reply
[deleted]
[+] [-] rapidaneurism|7 months ago|reply
[+] [-] therealpygon|7 months ago|reply
Easy. The LLM is never making MCP calls. The LLM simply identifies an endpoint it thinks would be useful and provides the required request parameters (like the text to be searched for or processed). As far as an LLM is concerned, MCP calls are handled “client-side” (from its perspective). This is why you configure MCP servers in your client and not on the server. (Yes, some providers allow you to configure MCP servers, but that is just a layer between you and the LLM, not a feature of the LLM itself.)
So back to the credentials: that means the credentials are managed “client-side” and the LLM never needs to see any of that. Think of it like this: say you set up an MCP URL (my-mcp.com); the LLM knows nothing of this URL, or of which MCP server you use. If instead you called my-mcp.com/<some-long-string>/, the LLM still doesn't know. Now, instead of a URL parameter, your tool calls the MCP server with a header (Authorization: Bearer <token>); the LLM still doesn't know, and you've accessed an OAuth endpoint.
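A sketch of that split, with a made-up URL and token - the model only ever produces a tool name and arguments; the client owns the endpoint and attaches the credential:

```python
import json
import urllib.request

# Illustrative: the URL and token live in client config the model never sees.
MCP_URL = "https://my-mcp.example/messages"
TOKEN = "secret-oauth-token"  # obtained out of band, never shown to the model

def call_tool(name: str, arguments: dict):
    # JSON-RPC-style tools/call body, as used by MCP's HTTP transport.
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }).encode()
    req = urllib.request.Request(
        MCP_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",  # credential added client-side
        },
    )
    return req  # a real client would urlopen(req) and feed the result back

req = call_tool("search", {"query": "weather in Paris"})
print(req.get_header("Authorization"))
```

The model's output stops at `("search", {"query": ...})`; everything below that line is plumbing it never observes.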
[+] [-] asabla|7 months ago|reply
I know I'm a bit late, but for MCP servers running over HTTP (or a custom transport), it should use OAuth. If it's served via stdio, it should use configuration files and/or environment variables.
ref: https://modelcontextprotocol.io/specification/2025-03-26/bas...
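Illustratively, the client-side configuration for that split might look like this (the shape mirrors common MCP client config files, but the server names, keys, and paths here are made up):

```python
import json
import os

# Illustrative MCP client configuration: a stdio server gets its credential
# via an env var; an HTTP server carries no token in config, because the
# client runs the OAuth flow and attaches the bearer token per request.
config = {
    "mcpServers": {
        "local-files": {                      # stdio transport
            "command": "my-files-server",
            "args": ["--root", "/tmp/docs"],
            "env": {"FILES_API_KEY": os.environ.get("FILES_API_KEY", "")},
        },
        "remote-search": {                    # HTTP transport
            "url": "https://search.example/mcp",
        },
    }
}

print(json.dumps(config, indent=2))
```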
[+] [-] theblazehen|7 months ago|reply
I found this really helpful. I've read a few different bits around this area, and being able to quickly click and scroll around this has confirmed my understanding of it now - thanks!
[+] [-] nullorempty|7 months ago|reply
[+] [-] sorokod|7 months ago|reply
One of the many issues with Spring is that the abstractions it provides are extremely leaky [1]. They leak frequently, and when they do, an engineer is faced with the need to comprehend the pile of technology [2] that was supposed to be abstracted away in the first place.
- [1] https://en.wikipedia.org/wiki/Leaky_abstraction
- [2] https://github.com/spring-projects/spring-ai
[+] [-] greenchair|7 months ago|reply
[+] [-] esafak|7 months ago|reply