top | item 44849695


cle | 6 months ago

I just tried this in Claude Code. I made an MCP server with a tool whose output is declared as an integer but which returns a string at runtime.

Claude Code validated the response against the schema and did not pass the response to the LLM.

     test - test_tool (MCP)(input: "foo")
      ⎿  Error: Output validation error: 'bar' is not of type 'integer'
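For anyone who wants to see the shape of that check, here is a minimal sketch. It is not Claude Code's implementation: `matches_type`, `validate_tool_output`, and `OUTPUT_SCHEMA` are hypothetical names, and it only handles the `type` keyword for two types, where a real client would use a full JSON Schema validator.

```python
def matches_type(value, schema):
    """Check a value against the `type` keyword of a tiny JSON Schema subset."""
    expected = schema.get("type")
    if expected == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if expected == "string":
        return isinstance(value, str)
    raise ValueError(f"unsupported type in this sketch: {expected!r}")


def validate_tool_output(value, schema):
    """Return the value unchanged, or raise before anything reaches the model."""
    if not matches_type(value, schema):
        raise TypeError(
            f"Output validation error: {value!r} is not of type {schema['type']!r}"
        )
    return value


# Hypothetical declared output schema for the test tool.
OUTPUT_SCHEMA = {"type": "integer"}

try:
    validate_tool_output("bar", OUTPUT_SCHEMA)
except TypeError as err:
    message = str(err)
    print(message)
```

The check is ordinary code with no model in the loop, so the same input gives the same verdict on every run.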


whoknowsidont|6 months ago

How many times does this need to be repeated?

It works in this instance. On this run. It is not guaranteed to work next time. There is an error rate here that makes it _INEVITABLE_ that eventually, with enough executions, the validation will pass when it should fail.

It will choose not to pass this to the validator, at some point in the future. It will create its own validator, at some point in the future. It will simply pretend that it did any of the above, at some point in the future.

This might be fine for your B2B use case. It is not fine for underlying infrastructure for a financial firm or communications.

cle|6 months ago

Every time the LLM uses this tool, the response schema is validated--deterministically. The LLM will never see a non-integer value as output from the tool.

ohdeargodno|6 months ago

This time.

Can you guarantee it will validate it every time? Can you guarantee that the way MCPs/tool calling are implemented (which is already an incredible joke that only Python-brained developers would inflict upon the world) will always go through the validation layer? Are you even sure which part of Claude handles this validation? Sure, it didn't cast an int into a Toyota Yaris. Will it cast "70Y074" into one? Maybe a 2022 one. What if there are parsing rules embedded in a string, will it respect them every time? What if you use it outside of Claude Code and just ask nicely through the API, can you guarantee this validation still works? Or that they won't break it next week?

The whole point of it is, whichever LLM you're using is already too dumb to not trip when lacing its own shoes. Why you'd trust it to reliably and properly parse input badly described by a terrible format is beyond me.

dragonwriter|6 months ago

> Can you guarantee it will validate it every time ?

Yes, to the extent that you can guarantee the behavior of any third-party software (which you can't really, no matter what spec the software supposedly implements, so the gaps aren't an MCP issue). “The app enforces schema compliance before handing the results to the LLM” is deterministic behavior in the traditional app that provides the toolchain sitting between the tools (and the user) and the LLM. It is not non-deterministic behavior driven by the LLM. Hence, “before handing the results to the LLM”.

> The whole point of it is, whichever LLM you're using is already too dumb to not trip when lacing its own shoes. Why you'd trust it to reliably and properly parse input badly described by a terrible format is beyond me.

The toolchain is parsing, validating, and mapping the data into the format preferred by the chosen model's prompt template. The LLM has nothing to do with that, because by definition it has to happen before the LLM can see the data.

You aren't trusting the LLM.
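A rough sketch of that ordering, with everything in it hypothetical (`run_tool_step`, `stub_model`, and `is_integer` are illustrative names, not part of any MCP SDK or client): the host runs the tool, checks the result, and reaches the model call only on the valid path.

```python
def run_tool_step(tool, tool_input, is_valid, call_model):
    """Host-side step: run the tool, validate, and only then involve the model."""
    result = tool(tool_input)
    if not is_valid(result):
        # The model callable is never invoked on this path.
        return {"error": f"Output validation error: {result!r}"}
    return {"reply": call_model(f"Tool returned: {result}")}


model_calls = []


def stub_model(prompt):
    """Stand-in for LLM inference that records every prompt it receives."""
    model_calls.append(prompt)
    return "ok"


def is_integer(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


bad = run_tool_step(lambda _: "bar", "foo", is_integer, stub_model)
good = run_tool_step(lambda _: 42, "foo", is_integer, stub_model)

print(bad)
print(good)
print(model_calls)
```

After both calls, `model_calls` holds only the prompt built from the valid result, which is the sense in which the validation does not depend on the model.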

cle|6 months ago

This is deterministic: it validates the response using a JSON Schema validator and refuses to pass it to LLM inference.

I can't guarantee that the behavior will remain the same, any more than I can for any other software. But all of this happens before the LLM is even involved.

You are describing why MCP supports JSON Schema. It requires parsing & validating the input using deterministic software, not LLMs.

thwarted|6 months ago

As an example.

"1979010112345" is a Unix timestamp (in milliseconds) that looks like it might be a Jan 1 1979 datetime formatted as an integer, but is really Sep 17 2032 05:01:52 UTC.
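The two readings can be checked with the standard library. This is only a sketch: the variable names are made up, and treating the first eight digits as a `YYYYMMDD` date is one guess at how someone might misread the value.

```python
from datetime import datetime, timezone

raw = "1979010112345"

# Reading 1: milliseconds since the Unix epoch (integer division drops the ms part).
as_epoch_ms = datetime.fromtimestamp(int(raw) // 1000, tz=timezone.utc)

# Reading 2: the leading eight digits taken as a YYYYMMDD calendar date.
as_digits = datetime.strptime(raw[:8], "%Y%m%d")

print(as_epoch_ms.isoformat())
print(as_digits.date().isoformat())
```

Both readings are valid integers, so a schema that only says `integer` accepts the value either way.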