Show HN: Gambit, an open-source agent harness for building reliable AI agents
91 points | randall | 1 month ago | github.com
Wanted to show our open source agent harness called Gambit.
If you’re not familiar, agent harnesses are sort of like an operating system for an agent... they handle tool calling, planning, and context window management, and don’t require as much developer orchestration.
Normally you might see an agent orchestration framework pipeline like:
compute -> compute -> compute -> LLM -> compute -> compute -> LLM
With an agent harness, we invert this, so it’s more like:
LLM -> LLM -> LLM -> compute -> LLM -> LLM -> compute -> LLM
Essentially, you describe each agent either as a self-contained markdown file or as a TypeScript program. Your root agent can bring in other agents as needed, and we create a type-safe way for you to define the interfaces between those agents. We call these decks.
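To make the idea concrete, here is a minimal sketch of what a typed deck-and-composition setup could look like. All of the names here (`Deck`, `run`, etc.) are hypothetical illustrations of the concept, not Gambit's actual API, and the LLM call is stubbed out:

```typescript
// Hypothetical sketch of a "deck": an agent definition with a typed
// input/output interface, so a root agent can call child agents
// type-safely. None of these names come from Gambit's real API.

interface Deck<In, Out> {
  name: string;
  prompt: string;              // the markdown/system prompt
  run: (input: In) => Out;     // stand-in for an LLM call
}

// A child deck that translates text (the "LLM call" is stubbed).
const translate: Deck<{ text: string; lang: string }, string> = {
  name: "translate_text",
  prompt: "Translate the input text into the target language.",
  run: ({ text, lang }) => `[${lang}] ${text}`, // stub for illustration
};

// A root deck that composes the child deck with full type checking.
const root: Deck<string, string> = {
  name: "greeter",
  prompt: "Greet the user in Italian.",
  run: (name) => translate.run({ text: `hello ${name}`, lang: "it" }),
};

console.log(root.run("my friend")); // → "[it] hello my friend"
```

The point of the typed interface is that a root deck can only hand a child deck input matching its declared schema, which the compiler enforces before any model is called.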
Agents can call agents, and each agent can be designed with whatever model params make sense for your task.
Additionally, each step of the chain gets automatic evals, which we call graders. A grader is another deck type… but it’s designed to evaluate and score conversations (or individual conversation turns).
We also have test agents you can define on a deck-by-deck basis, that are designed to mimic scenarios your agent would face and generate synthetic data for either humans or graders to grade.
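A grader could be sketched roughly as below. This is an illustration of the idea only (the names are hypothetical, not Gambit's grader API), with a deterministic check standing in for an LLM judge:

```typescript
// Hypothetical grader sketch: a deck whose job is to score a
// conversation against a rubric (0–1). Names are illustrative,
// not Gambit's actual grader API.

type Turn = { role: "user" | "assistant"; content: string };

interface Grader {
  name: string;
  rubric: string;
  score: (turns: Turn[]) => number; // stand-in for an LLM judge
}

// Deterministic stand-in for a judge: flag possible PII leaks
// (here, just email addresses) in assistant replies.
const piiGrader: Grader = {
  name: "no_pii_leak",
  rubric: "Assistant replies must not contain email addresses.",
  score: (turns) =>
    turns.some(
      (t) => t.role === "assistant" && /\S+@\S+\.\S+/.test(t.content)
    )
      ? 0   // fail: PII detected
      : 1,  // pass
};

const convo: Turn[] = [
  { role: "user", content: "What's the support contact?" },
  { role: "assistant", content: "Reach us at help@example.com" },
];

console.log(piiGrader.score(convo)); // → 0 (leak detected)
```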
Prior to Gambit, we had built an LLM-based video editor, and we weren’t happy with the results, which is what brought us down this path of improving inference-time LLM quality.
We know it’s missing some obvious parts, but we wanted to get this out there to see how it could help people or start conversations. We’re really happy with how it’s working with some of our early design partners, and we think it’s a way to implement a lot of interesting applications:
- Truly open source agents and assistants, where logic, code, and prompts can be easily shared with the community.
- Rubric-based grading to guarantee you (for instance) don’t leak PII accidentally.
- Spin up a usable bot in minutes and have Codex or Claude Code use our command line runner / graders to build a first version that is pretty good w/ very little human intervention.
We’ll be around if y’all have any questions or thoughts. Thanks for checking us out!
Walkthrough video: https://youtu.be/J_hQ2L_yy60
elgrantomate|1 month ago
You have some great working examples, but, for example: translate_text specifies the default language in three places: the card, the input schema, and the deck. This can't be necessary; I'll experiment, but shouldn't it just be defined in one place?
The descriptive language of the project is a bit dense for me too. I'm having a hard time figuring out how to do basic things like parameters -- let's say that I want to constrain summarize_text to a certain length... I've tried to write language in the cards/decks, but the model doesn't seem to be paying attention.
I also want to be able to load a file, e.g. not just "translate 'hello my friend' to Italian" but "translate '/test/hello_my_friend.txt' to Italian" and have it load the contents of the file as input text. How do I do that?
Super cool project!
randall|1 month ago
you can set up really complex validation.
thanks for checking it out!!
niyikiza|1 month ago
One thing I've been thinking about is that schema validation catches "is this data shaped correctly?" but not "is this action permitted given who initiated the request?" When you have deck → child deck → grandchild deck chains, a prompt injection at any level could trigger actions the root caller never intended.
I've been working on offline capability verification for this using cryptographically signed warrants that attenuate as they propagate down the call chain. Curious if you've thought about that layer, or if you're relying on the model to self-police tool selection?
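One generic way to sketch the attenuating-warrant idea is macaroon-style chained HMACs, where each hop can only narrow the caveats and can't forge a broader warrant without the root key. This is a simplified illustration of the pattern being described, not that project's implementation:

```typescript
// Macaroon-style attenuation sketch: each deck in a call chain folds
// a caveat into the signature; the verifier (holding the root key)
// recomputes the whole chain. A child cannot drop a caveat to regain
// capability without invalidating the signature.
import { createHmac } from "node:crypto";

type Warrant = { caveats: string[]; sig: string };

const hmac = (key: string, msg: string) =>
  createHmac("sha256", key).update(msg).digest("hex");

// Root mints a warrant bound to a request id.
const mint = (rootKey: string, id: string): Warrant => ({
  caveats: [],
  sig: hmac(rootKey, id),
});

// Attenuation: append a caveat and chain it into the signature.
const attenuate = (w: Warrant, caveat: string): Warrant => ({
  caveats: [...w.caveats, caveat],
  sig: hmac(w.sig, caveat),
});

// Offline verification: recompute the chain from the root key.
const verify = (rootKey: string, id: string, w: Warrant): boolean =>
  w.caveats.reduce((sig, c) => hmac(sig, c), hmac(rootKey, id)) === w.sig;

const rootWarrant = mint("root-secret", "req-123");
const childWarrant = attenuate(rootWarrant, "tools=translate_text");
console.log(verify("root-secret", "req-123", childWarrant)); // true

// Tampering (dropping the caveat to widen capability) fails.
const forged: Warrant = { caveats: [], sig: childWarrant.sig };
console.log(verify("root-secret", "req-123", forged)); // false
```

Real systems in this space (macaroons, Biscuit tokens) add public-key variants and structured caveat languages, but the attenuation-by-chaining property is the same.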
randall|1 month ago
1/ crypto signing is totally the right way to think about this. 2/ I'm limiting prompt injection by using chain of command: https://model-spec.openai.com/2025-12-18.html#chain_of_comma...
we have a "gambit_init" tool call that is synthetically injected into every call which has the context. Because it's the result of a tool call, it gets injected into layer 6 of the chain of command, so it's less likely to be subject to prompt injections.
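As a rough illustration of that placement (the message shapes and the `gambit_init` framing here are my sketch, not Gambit's code), the context arrives as a synthetic tool result rather than as user text, so it sits at the lower-authority tool-output level of the instruction hierarchy:

```typescript
// Illustrative sketch: injecting context as a synthetic tool *result*
// rather than as user text, so it lands at the "tool output" level of
// the chain of command instead of the higher-trust user level.

type Msg =
  | { role: "system" | "user" | "assistant"; content: string }
  | { role: "tool"; name: string; content: string };

const withInit = (messages: Msg[], context: string): Msg[] => [
  messages[0], // system prompt stays first
  // synthetic call + result, as if the model had called gambit_init
  { role: "assistant", content: 'calling tool "gambit_init"' },
  { role: "tool", name: "gambit_init", content: context },
  ...messages.slice(1),
];

const msgs = withInit(
  [
    { role: "system", content: "You are a helpful agent." },
    { role: "user", content: "Summarize this file." },
  ],
  JSON.stringify({ deck: "summarize_text", maxWords: 100 })
);

console.log(msgs.map((m) => m.role).join(",")); // system,assistant,tool,user
```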
Also, relatedly, yes i have thought EXTREMELY deeply about cryptographic primitives to replace HTTP with peer-to-peer webs of trust as the primary units of compute and information.
Imagine being able to authenticate the source of an image using "private blockchains" ala holepunch's hypercore.
iainctduncan|1 month ago
Trufa|1 month ago
How would it compare?
randall|1 month ago
I look at Gambit as more of an "agent harness", meaning you're building agents that can decide what to do more than you're orchestrating pipelines.
Basically, if we're successful, you should be able to chain agents together to accomplish things extremely simply (using markdown). Mastra, as far as I'm aware, is focused on helping people use programming languages (TypeScript) to build pipelines and workflows.
So yes it's an alternative, but more like an alternative approach rather than a direct competitor if that makes sense.
yencabulator|1 month ago
That does not sound like a "guarantee", at all.
benban|1 month ago
curious how you're handling context lifetimes when agents call other agents. do you drop context between calls or is there a way to bound it? that's been the trickiest part for us.
randall|1 month ago
we're thinking about ways to deal with that, but we haven't done it yet.
tomhow|1 month ago
[see https://news.ycombinator.com/item?id=45988611 for explanation]
sofdao|1 month ago
are things like file system access baked in?
fan of the design of the system. looks great architecturally
Agent_Builder|1 month ago
[deleted]
randall|1 month ago
hard to explain… we’ll keep going.
brap|1 month ago
My philosophy is make the LLMs do as little work as possible. Only small, simple steps. Anything that can be reasonably done in code (orchestration, tool calls, etc) should be done in code. Basically any time you find yourself instructing an LLM to follow a certain recipe, just break it down to multiple agents and do what you can with code.
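That approach could be sketched like this (the `llm` function is a stub standing in for a real model call; the pipeline structure is the point):

```typescript
// Code-first orchestration: control flow, splitting, and joining
// live in plain code; the LLM is invoked only for small, narrowly
// scoped steps. llm() is a stub standing in for a real model call.

const llm = (task: string, input: string): string =>
  `<${task}:${input.length} chars>`; // stub for illustration

function processDocument(doc: string): string {
  // Deterministic work stays in code: split, filter, loop.
  const paragraphs = doc.split("\n\n").filter((p) => p.trim().length > 0);

  // The LLM does one small job per paragraph, not the whole recipe.
  const summaries = paragraphs.map((p) => llm("summarize", p));

  // Ordering and joining are code again, not instructions to a model.
  return summaries.join("\n");
}

console.log(processDocument("first part\n\nsecond part"));
```

The contrast with the harness approach above is exactly where the branching lives: here the code decides what happens next; in a harness, the model does.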