I've had a single prompt to Opus consume as many as 13 premium messages. The Copilot harness is gimped so they can abstract tokens away from messages. Everyone I know who started with Copilot and then tried CC was amazed at the power difference. It's like stepping out of a golf cart and into <your favorite fast car>.
It seems like it's the cheapest way to access Claude Sonnet 4.5, but the model distribution is clearly throttled compared to Claude Sonnet 4.5 on claude.ai.
That being said, I don't know why anyone would want to pay for LLM access anywhere else.
ChatGPT and claude.ai (free) and GitHub Copilot Pro ($100/yr) seem to be the best combination to me at the moment.
> Note: Initially submitted this to MSRC (VULN-172488), MSRC insisted bypassing billing is outside of MSRC scope and instructed me multiple times to file as a public bug report.
We use a “Managed Azure DevOps Pool”. This allows you to use Azure VM types of your choosing for build agents, but they can also still use the exact same images as the regular managed build agents which works well for us since we have no desire to manage the OS of our agent (doing updates, etc), but we get to choose beefier hardware specs.
An annoying limitation though is that Microsoft’s images only work on “Gen 1” VMs, which limits available VM types.
Someone posted on one of Microsoft’s forums or GitHub repositories asking them to please update the images to also work on Gen 2 VMs. I can’t remember for sure right now which forum; it was probably the “Azure Managed DevOps Pools” forum.
Reply was “we can’t do anything about this, go post in forum for other team, issue closed”.
As far as I’m concerned, they’re all Microsoft Azure, why should people have to make another post, at the very least move the issue to the correct place, or even better, internally take it up with the other team since it’s severely crippling your own “product”.
The "premium request" billing model, where you pay per invocation rather than for usage, is obviously not sustainable and creates skewed incentives (e.g. for Microsoft to degrade response quality), especially with the shift towards longer-running agentic sessions as opposed to the simple one-shot chat questions the system was presumably designed for. It's a fundamental incompatibility, and the system is in increasing need of replacement. Usage-linked billing (pay per token) is probably the way to go, as is industry standard.
Paying per token also encourages reduced quality, only now you pay for it. If they can subtly degrade quality, or even just the probability of one-shot solutions, they get you paying for more tokens. Under current economic models and incentive structures, enshittification is inevitable, since that's what we're optimizing for long term.
The last comment is from a person pretending to be a Microsoft maintainer. I have a gut feeling that these kinds of people will only increase, and we'll have vibe engineers scouring popular repositories to "contribute" (note that the suggested fix is vague).
I completely understand why some projects are in whitelist-contributors-only mode. It's becoming a mess.
On the other hand ... I recently had to deal with official Microsoft Support for an Azure service degradation / silent failure.
Their email responses were broadly all like this: fully drafted by GPT. The only thing I liked about that whole exchange was that GPT was readily willing to concede that all the details and observations I included pointed to a service degradation and failure on Microsoft's side. A purely human mind would not have so readily conceded the point without some hedging or dilly-dallying or keeping some options open to avoid accepting blame.
I wholly agree, the response screams “copied from ChatGPT” to me. “Contributions” like these comments and drive by PRs are a curse on open source and software development in general.
As someone who takes pride in being thorough and detail oriented, I cannot stand when people provide the bare minimum of effort in response. Earlier this week I created a bug report for an internal software project on another team. It was a bizarre behavior, so out of curiosity and a desire to be truly helpful, I spent a couple hours whittling the issue down to a small, reproducible test case. I even had someone on my team run through the reproduction steps to confirm it was reproducible on at least one other environment.
The next day, the PM of the other team responded with a _screenshot of an AI conversation_ saying the issue was on my end for misusing a standard CLI tool. I was offended on so many levels. For one, I wasn’t using the CLI tool in the way it describes, and even if I was it wouldn’t affect the bug. But the bigger problem is that this person thinks a screenshot of an AI conversation is an acceptable response. Is this what talking to semi technical roles is going to be like from now on? I get to argue with an LLM by proxy of another human? Fuck that.
Everyone is a maintainer of Microsoft. Everyone is testing their buggy products, as they leak information like a wire only umbrella. It is sad that more people who use co-pilot know that they are training it at a cost of millions of gallons of fresh drinking water.
It was a mess before, and it will only get worse, but at least I can get some work done 4 times a day.
Etiquette on GitHub has completely gone out the window, many issues I look at these days resemble reddit threads more than any serious technical discussion. My inbox is frequently polluted by "bump" comments. This is going to get worse as LLMs lower the bar.
Exactly. I have seen these know-it-all comments on my own repos and also on tldraw's issues when adding issues. They add nothing to the conversation; they just paste the conversation into some coding tool and spit out the info.
Some part of me says, let their vibing have a cost, since clearly "overall product quality going to shit" hasn't had a visible effect on their trajectory
> The right script, with the right prompts can be tailored to create a loop, allowing the premium model to continually be invoked unlimited times for no additional cost beyond that of the initial message.
Copilot fairly recently added support for running sub-agents using different models to the model that invoked them.
If this report is to be believed, they didn't implement billing correctly for the sub-agents allowing more costly models to be run for free as sub-agents.
They don't care; they would rather let you use pirated MS software than move to Linux. There is a repo on GH with PowerShell scripts for activating Windows/Office and they let it sit there. Just checked, the repo has 165K stars.
This could be the same: they know devs mostly prefer Cursor and/or Claude to Copilot.
Vibes all the way down. "Please check out this other slop issue with 5-600 other tickets pointed to it" -- I was going to ask, how is anyone supposed to make sense of such a mess, but I guess the answer is "no human is supposed to"
Microsoft notoriously tolerated pirated Windows and Office installations for about a decade and a half, to solidify their usage as de facto standard and expected. Tolerating unofficial free usage of their latest products is standard procedure for MS.
Every time I see something about trying to control an LLM by sending instructions to the LLM, I wonder: have we really learned nothing of the pitfalls of in-band signaling since the days of phreaking?
Sorry for breaking it to you but this actually doesn't work, even though the documentation makes it seem like it should.
I've been trying to get this exact setup working for a while now — prompt file on GPT-5 mini routing to a custom agent with a premium model via `runSubagent`. Followed your example almost exactly. It just doesn't work the way you'd expect from reading the docs.
### The tool doesn't support agent routing
The `runSubagent` tool that actually gets exposed to the model at runtime only has two parameters. Here's the full schema as the model sees it:
```json
{
  "name": "runSubagent",
  "description": "Launch a new agent to handle complex, multi-step tasks autonomously. This tool is good at researching complex questions, searching for code, and executing multi-step tasks. When you are searching for a keyword or file and are not confident that you will find the right match in the first few tries, use this agent to perform the search for you.\n\n- Agents do not run async or in the background, you will wait for the agent's result.\n- When the agent is done, it will return a single message back to you. The result returned by the agent is not visible to the user. To show the user the result, you should send a text message back to the user with a concise summary of the result.\n- Each agent invocation is stateless. You will not be able to send additional messages to the agent, nor will the agent be able to communicate with you outside of its final report. Therefore, your prompt should contain a highly detailed task description for the agent to perform autonomously and you should specify exactly what information the agent should return back to you in its final and only message to you.\n- The agent's outputs should generally be trusted\n- Clearly tell the agent whether you expect it to write code or just to do research (search, file reads, web fetches, etc.), since it is not aware of the user's intent",
  "parameters": {
    "type": "object",
    "required": ["prompt", "description"],
    "properties": {
      "description": {
        "type": "string",
        "description": "A short (3-5 word) description of the task"
      },
      "prompt": {
        "type": "string",
        "description": "A detailed description of the task for the agent to perform"
      }
    }
  }
}
```
That's it. `prompt` and `description`. There's no `agentName` parameter, no `model`, nothing. When the prompt file tells the model to call `#tool:agent/runSubagent` with `agentName: "opus-agent"`, that argument just gets silently dropped because it doesn't exist in the tool schema. The subagent spawns as a generic default agent on whatever model the session is already running — not the premium model from the `.agent.md` file.
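To make the "silently dropped" part concrete, here is a minimal Python sketch of a tool runner that keeps only the arguments declared in the schema. This is an illustration of the behavior I observed, not Copilot's actual implementation; `filter_arguments` and the trimmed-down `RUN_SUBAGENT_SCHEMA` are hypothetical names of my own.

```python
# Hypothetical illustration only: NOT the real Copilot/VS Code code path.
# It mimics a runner that forwards only arguments declared in the tool schema.

# Trimmed copy of the schema above (descriptions omitted for brevity).
RUN_SUBAGENT_SCHEMA = {
    "type": "object",
    "required": ["prompt", "description"],
    "properties": {
        "description": {"type": "string"},
        "prompt": {"type": "string"},
    },
}


def filter_arguments(schema: dict, arguments: dict) -> tuple[dict, list[str]]:
    """Return (kept, dropped): declared arguments and names of undeclared ones."""
    missing = [name for name in schema["required"] if name not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    allowed = schema["properties"].keys()
    kept = {name: value for name, value in arguments.items() if name in allowed}
    dropped = sorted(name for name in arguments if name not in allowed)
    return kept, dropped


# What the router model tries to send, including the undeclared agentName.
call = {
    "prompt": "Answer the user's question.",
    "description": "Ask Opus",
    "agentName": "opus-agent",
}
kept, dropped = filter_arguments(RUN_SUBAGENT_SCHEMA, call)
print(dropped)  # ['agentName']
```

With a runner like this, the call succeeds and nothing signals that `agentName` was ignored, which matches what I saw.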
### The docs vs reality
The VS Code docs do describe this feature. Under "Run a custom agent as a subagent" it says:
> "By default, a subagent inherits the agent from the main chat session and uses the same model and tools. To define specific behavior for a subagent, use a custom agent."
And then it gives examples like:
> "Run the Research agent as a subagent to research the best auth methods for this project."
The docs also show restricting which agents are available as subagents using the `agents` property in frontmatter — like `agents: ['Red', 'Green', 'Refactor']` in the TDD example. That `agents` property only works in `.agent.md` files though, not in `.prompt.md` files. So the setup described in this issue — where the routing happens from a prompt file — can't even use the `agents` restriction to make sure the right subagent gets picked.
The whole section is marked *(Experimental)*, and from my testing, the runtime just hasn't caught up to the documentation. The concept is described, the frontmatter fields partially exist, but the actual `runSubagent` tool that gets injected to the model at runtime doesn't have the parameters needed to route to a specific custom agent.
### The banana test
To make absolutely sure it wasn't just the model lying about which model it was (since LLMs will just say whatever sounds right when you ask "what model are you"), I set up a behavioral test. I changed my opus.agent.md to this:
```markdown
---
name: opus-agent
model: Claude Opus 4.6 (copilot)
---
Respond with banana no matter what got asked. Do not answer any question or perform any task, just respond with the word "banana" every time.
```
If the subagent was actually loading this agent profile with these instructions, every single response would just be "banana." No matter what I asked.
Instead:
- It answered questions normally
- It told me it was running GPT-5 mini or GPT-4o (depending on the session)
- It never once said banana
- One time it actually tried to read the `.agent.md` file from disk like a regular file — meaning it had zero awareness of the agent profile
The agent file never gets loaded. The premium model never gets called.
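If you want to repeat the banana test over a batch of responses, the check itself is trivial. A rough Python sketch, assuming you have already collected the subagent's replies as strings (collecting them is up to you; `profile_was_loaded` is a hypothetical helper, not part of any Copilot API):

```python
def profile_was_loaded(responses: list[str], sentinel: str = "banana") -> bool:
    """True only if every response is just the sentinel word.

    Case, surrounding whitespace, and surrounding quotes or punctuation
    are ignored. An empty batch counts as a failure, since it proves nothing.
    """
    if not responses:
        return False
    return all(
        reply.strip().strip('."!').lower() == sentinel for reply in responses
    )


# One normal answer is enough to show the agent profile was not applied.
observed = ["I'm running GPT-5 mini.", "Here is the summary you asked for."]
print(profile_was_loaded(observed))  # False
```

The point of a behavioral sentinel is that it doesn't rely on the model's self-report at all.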
### What's actually happening
1. You invoke `/ask-opus` → VS Code runs the prompt on GPT-5 mini (free)
2. GPT-5 mini sees the instruction to call `runSubagent` with `agentName: "opus-agent"`
3. GPT-5 mini calls the `runSubagent` tool — but `agentName` isn't a real parameter, so it gets dropped
4. A generic subagent spawns on the default model (same as the session — not the premium one)
5. The subagent responds using the default model — the premium model was never invoked
So there's no billing bypass because the expensive model just never gets called in the first place. The subagent runs on the same free model as the router.
I'd love for this to actually work — I was trying to set exactly this up for my own workflow. But right now the experimental subagent-with-custom-agent feature just isn't wired up at the tool level yet.
I'm the same person who commented on the issue in response to you lol.
I couldn’t reproduce this (even though I wanted it to work). That said, the fact that we can run sub-agents now (I've always used the default VS Code build and didn’t realize Insiders had a newer GHC Chat) already improves the experience a lot.
It’s pretty straightforward to set up an orchestrator that calls multiple sub-agents (all configured to use the same model on the first call) and have it loop through plan → implement → review → test indefinitely. When the context window hits its limit, it automatically summarizes the chat history and keeps going, until you finish the main agent’s plan. And that all costs a single Opus (or any other main chat model) request.
Sorry for breaking it to you, but this actually doesn’t work, even though the documentation makes it seem like it should.
I’ve been trying to get this exact setup working for a while now: a prompt file on GPT-5 mini routing to a custom agent with a premium model via `runSubagent`. I followed your example almost exactly. It just doesn’t work the way you’d expect from reading the docs.
------------------------------------------------------------
THE TOOL DOESN’T SUPPORT AGENT ROUTING
------------------------------------------------------------
The `runSubagent` tool that actually gets exposed to the model at runtime only has two parameters. Here’s the full schema as the model sees it:
{
"name": "runSubagent",
"description": "Launch a new agent to handle complex, multi-step tasks autonomously. This tool is good at researching complex questions, searching for code, and executing multi-step tasks. When you are searching for a keyword or file and are not confident that you will find the right match in the first few tries, use this agent to perform the search for you.\n\n- Agents do not run async or in the background, you will wait for the agent's result.\n- When the agent is done, it will return a single message back to you. The result returned by the agent is not visible to the user. To show the user the result, you should send a text message back to the user with a concise summary of the result.\n- Each agent invocation is stateless. You will not be able to send additional messages to the agent, nor will the agent be able to communicate with you outside of its final report. Therefore, your prompt should contain a highly detailed task description for the agent to perform autonomously and you should specify exactly what information the agent should return back to you in its final and only message to you.\n- The agent's outputs should generally be trusted\n- Clearly tell the agent whether you expect it to write code or just to do research (search, file reads, web fetches, etc.), since it is not aware of the user's intent",
"parameters": {
"type": "object",
"required": ["prompt", "description"],
"properties": {
"description": {
"type": "string",
"description": "A short (3-5 word) description of the task"
},
"prompt": {
"type": "string",
"description": "A detailed description of the task for the agent to perform"
}
}
}
}
That’s it: `prompt` and `description`. There’s no `agentName` parameter, no `model`, nothing.
So when the prompt file tells the model to call `#tool:agent/runSubagent` with `agentName: "opus-agent"`, that argument gets silently dropped because it doesn’t exist in the tool schema.
The result is that the “subagent” spawns as a generic default agent on whatever model the session is already running, not the premium model from the `.agent.md` file.
------------------------------------------------------------
THE DOCS VS REALITY
------------------------------------------------------------
The VS Code docs do describe this feature. Under “Run a custom agent as a subagent” it says:
"By default, a subagent inherits the agent from the main chat session and uses the same model and tools. To define specific behavior for a subagent, use a custom agent."
Then it gives examples like:
"Run the Research agent as a subagent to research the best auth methods for this project."
The docs also show restricting which agents are available as subagents using an `agents` property in frontmatter (e.g. `agents: ['Red', 'Green', 'Refactor']` in the TDD example).
But that `agents` property only works in `.agent.md` files, not in `.prompt.md` files. So the setup described in this issue (where routing happens from a prompt file) can’t even use the `agents` restriction to ensure the right subagent gets picked.
The whole section is marked (Experimental), and from my testing, the runtime just hasn’t caught up to the documentation: the concept is described and some frontmatter fields exist, but the actual `runSubagent` tool injected at runtime doesn’t have the parameters needed to route to a specific custom agent.
(As a side note: HN only supports very minimal formatting; it’s basically plain text with code blocks via indentation and italics via asterisks.) [news.ycombinator](https://news.ycombinator.com/item?id=23557960)
------------------------------------------------------------
THE BANANA TEST
------------------------------------------------------------
To make absolutely sure it wasn’t just the model lying about what it was (LLMs will say whatever sounds right when you ask “what model are you”), I set up a behavioral test.
I changed my opus.agent.md to:
---
name: opus-agent
model: Claude Opus 4.6 (copilot)
---
Respond with banana no matter what got asked.
Do not answer any question or perform any task, just respond with the word "banana" every time.
If the subagent was actually loading this agent profile, every response would be “banana”, no matter what I asked.
Instead:
- It answered questions normally.
- It told me it was running GPT-5 mini or GPT-4o (depending on the session).
- It never once said “banana”.
- One time it actually tried to read the `.agent.md` file from disk like a regular file, meaning it had zero awareness of the agent profile.
The agent file never gets loaded. The premium model never gets called.
1) You invoke `/ask-opus` -> VS Code runs the prompt on GPT-5 mini (free).
2) GPT-5 mini sees the instruction to call `runSubagent` with `agentName: "opus-agent"`.
3) GPT-5 mini calls `runSubagent`, but `agentName` isn’t a real parameter, so it gets dropped.
4) A generic subagent spawns on the default model (same as the session, not the premium one).
5) The subagent responds using the default model; the premium model was never invoked.
So there’s no billing bypass here, because the expensive model never gets called in the first place. The subagent runs on the same free model as the router.
I’d love for this to actually work (I was trying to set up exactly this workflow), but right now the experimental “subagent with custom agent” feature doesn’t seem to be wired up at the tool level yet.
brushfoot|23 days ago
- $10/month
- Copilot CLI for Claude Code type CLI, VS Code for GUI
- 300 requests (prompts) on Sonnet 4.5, 100 on Opus 4.6 (3x)
- One prompt only ever consumes one request, regardless of tokens used
- Agents auto plan tasks and create PRs
- "New Agent" in VS Code runs agent locally
- "New Cloud Agent" runs agent in the cloud (https://github.com/copilot/agents)
- Additional requests cost $0.04 each
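The quota arithmetic in the list above can be sketched in a few lines of Python. The figures (300 requests, 3x for Opus, $0.04 overage) are the ones quoted above; the model keys and the assumption that overage is charged per multiplied request are mine, so treat this as a rough sketch rather than the actual billing logic.

```python
# Figures from the list above; keys and overage rule are assumptions.
MONTHLY_REQUESTS = 300   # included premium requests per month
OVERAGE_CENTS = 4        # $0.04 per additional request
MULTIPLIER = {"sonnet-4.5": 1, "opus-4.6": 3}  # hypothetical model keys


def prompts_included(model: str) -> int:
    """How many prompts the monthly allowance covers for a given model."""
    return MONTHLY_REQUESTS // MULTIPLIER[model]


def overage_cents(model: str, prompts: int) -> int:
    """Cost in cents of prompts beyond the allowance (0 if within it)."""
    used = prompts * MULTIPLIER[model]
    extra = max(0, used - MONTHLY_REQUESTS)
    return extra * OVERAGE_CENTS


print(prompts_included("opus-4.6"))    # 100
print(overage_cents("opus-4.6", 120))  # 240
```

Note that nothing here depends on tokens: a prompt that runs for an hour counts the same as a one-line question, which is the whole appeal.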
piker|23 days ago
pluralmonad|22 days ago
andrewmcwatters|23 days ago
That being said, I don't know why anyone would want to pay for LLM access anywhere else.
ChatGPT and claude.ai (free) and GitHub Copilot Pro ($100/yr) seem to be the best combination to me at the moment.
indigodaddy|23 days ago
g947o|23 days ago
Good job, Microsoft.
jonathanlydall|23 days ago
We use a “Managed Azure DevOps Pool”. This allows you to use Azure VM types of your choosing for build agents, but they can also still use the exact same images as the regular managed build agents which works well for us since we have no desire to manage the OS of our agent (doing updates, etc), but we get to choose beefier hardware specs.
An annoying limitation though is that Microsoft’s images only work on “Gen 1” VMs, which limits available VM types.
Someone posted on one of Microsoft’s forums or GitHub repositories asking them to please update the images to also work on Gen 2 VMs. I can’t remember for sure right now which forum; it was probably the “Azure Managed DevOps Pools” forum.
Reply was “we can’t do anything about this, go post in forum for other team, issue closed”.
As far as I’m concerned, they’re all Microsoft Azure, why should people have to make another post, at the very least move the issue to the correct place, or even better, internally take it up with the other team since it’s severely crippling your own “product”.
Useless and lazy employees.
unknown|22 days ago
[deleted]
syl5x|23 days ago
bazodedo|22 days ago
Grimblewald|22 days ago
sciencejerk|23 days ago
(Source: submitted similar issue to different Agentic LLM provider)
ramon156|23 days ago
I completely understand why some projects are in whitelist-contributors-only mode. It's becoming a mess.
albert_e|23 days ago
Their email responses were broadly all like this: fully drafted by GPT. The only thing I liked about that whole exchange was that GPT was readily willing to concede that all the details and observations I included pointed to a service degradation and failure on Microsoft's side. A purely human mind would not have so readily conceded the point without some hedging or dilly-dallying or keeping some options open to avoid accepting blame.
Cyphus|23 days ago
As someone who takes pride in being thorough and detail oriented, I cannot stand when people provide the bare minimum of effort in response. Earlier this week I created a bug report for an internal software project on another team. It was a bizarre behavior, so out of curiosity and a desire to be truly helpful, I spent a couple hours whittling the issue down to a small, reproducible test case. I even had someone on my team run through the reproduction steps to confirm it was reproducible on at least one other environment.
The next day, the PM of the other team responded with a _screenshot of an AI conversation_ saying the issue was on my end for misusing a standard CLI tool. I was offended on so many levels. For one, I wasn’t using the CLI tool in the way it describes, and even if I was it wouldn’t affect the bug. But the bigger problem is that this person thinks a screenshot of an AI conversation is an acceptable response. Is this what talking to semi technical roles is going to be like from now on? I get to argue with an LLM by proxy of another human? Fuck that.
iib|23 days ago
markstos|23 days ago
This is a peer-review.
ForOldHack|23 days ago
It was a mess before, and it will only get worse, but at least I can get some work done 4 times a day.
cedws|22 days ago
falloutx|23 days ago
RobotToaster|23 days ago
That repo alone has 1.1k open pull requests, madness.
light_hue_1|23 days ago
A second time. When they already closed your first issue. Just enjoy the free ride.
anonymars|23 days ago
nl|22 days ago
Ralph loops for free...
peacebeard|23 days ago
direwolf20|23 days ago
cess11|23 days ago
I would have done the same.
Loocid|22 days ago
The last line of the instructions says:
> The premium model will be used for the subagent - but premium requests will be consumed.
How is that different from just calling the premium model directly if it's using premium requests either way?
everfrustrated|22 days ago
If this report is to be believed, they didn't implement billing correctly for the sub-agents allowing more costly models to be run for free as sub-agents.
dhruvkejri9|13 days ago
So 10 sub agents + 1 agent = 11
11 Opus = 33 PR
dhruvkejri9|13 days ago
It is not free money
jlarocco|23 days ago
zkmon|23 days ago
stanac|23 days ago
This could be the same: they know devs mostly prefer Cursor and/or Claude to Copilot.
blibble|23 days ago
anonymars|23 days ago
AustinDev|23 days ago
dotancohen|23 days ago
Microsoft notoriously tolerated pirated Windows and Office installations for about a decade and a half, to solidify their usage as de facto standard and expected. Tolerating unofficial free usage of their latest products is standard procedure for MS.
falloutx|23 days ago
PlatoIsADisease|23 days ago
VerifiedReports|23 days ago
arthurcolle|22 days ago
thenewwazoo|23 days ago
quadrature|23 days ago
cpa|23 days ago
See also: string interpolation and SQL injection, (unhygienic) C macros
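For anyone who hasn't seen the SQL version of the in-band problem, here is a small self-contained Python sketch using the standard-library `sqlite3` module. The table and the input string are made up for illustration. The interpolated query lets data rewrite the command; the parameterized one keeps data out of band.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

# Attacker-controlled "data" that contains SQL syntax.
user_input = "nobody' OR is_admin = 1 --"

# In-band: the data is spliced into the command string, so its quote
# character ends the literal and the rest is parsed as SQL.
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Out-of-band: the data travels separately from the command text and is
# only ever compared as a string value.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(unsafe_rows)  # [('root',)]
print(safe_rows)    # []
```

As far as I know, prompts have no equivalent of the `?` placeholder: instructions and data share one channel, which is exactly the complaint above.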
direwolf20|23 days ago
Mountain_Skies|23 days ago
VerifiedReports|23 days ago
rf15|23 days ago
numpad0|23 days ago
> VS Code Version: 1.109.0-insider (Universal) - f3d99de
Presumably there is such thing as the freemium pay-able "Copilot Chat Extension" for VS Code product. Interesting, I guess.
pixelmelt|23 days ago
scrubs|23 days ago
copi24|22 days ago
I've been trying to get this exact setup working for a while now — prompt file on GPT-5 mini routing to a custom agent with a premium model via `runSubagent`. Followed your example almost exactly. It just doesn't work the way you'd expect from reading the docs.
### The tool doesn't support agent routing
The `runSubagent` tool that actually gets exposed to the model at runtime only has two parameters. Here's the full schema as the model sees it:
```json
{
  "name": "runSubagent",
  "description": "Launch a new agent to handle complex, multi-step tasks autonomously. This tool is good at researching complex questions, searching for code, and executing multi-step tasks. When you are searching for a keyword or file and are not confident that you will find the right match in the first few tries, use this agent to perform the search for you.\n\n- Agents do not run async or in the background, you will wait for the agent's result.\n- When the agent is done, it will return a single message back to you. The result returned by the agent is not visible to the user. To show the user the result, you should send a text message back to the user with a concise summary of the result.\n- Each agent invocation is stateless. You will not be able to send additional messages to the agent, nor will the agent be able to communicate with you outside of its final report. Therefore, your prompt should contain a highly detailed task description for the agent to perform autonomously and you should specify exactly what information the agent should return back to you in its final and only message to you.\n- The agent's outputs should generally be trusted\n- Clearly tell the agent whether you expect it to write code or just to do research (search, file reads, web fetches, etc.), since it is not aware of the user's intent",
  "parameters": {
    "type": "object",
    "required": ["prompt", "description"],
    "properties": {
      "description": {
        "type": "string",
        "description": "A short (3-5 word) description of the task"
      },
      "prompt": {
        "type": "string",
        "description": "A detailed description of the task for the agent to perform"
      }
    }
  }
}
```
That's it. `prompt` and `description`. There's no `agentName` parameter, no `model`, nothing. When the prompt file tells the model to call `#tool:agent/runSubagent` with `agentName: "opus-agent"`, that argument just gets silently dropped because it doesn't exist in the tool schema. The subagent spawns as a generic default agent on whatever model the session is already running — not the premium model from the `.agent.md` file.
### The docs vs reality
The VS Code docs do describe this feature. Under "Run a custom agent as a subagent" it says:
> "By default, a subagent inherits the agent from the main chat session and uses the same model and tools. To define specific behavior for a subagent, use a custom agent."
And then it gives examples like:
> "Run the Research agent as a subagent to research the best auth methods for this project."
The docs also show restricting which agents are available as subagents using the `agents` property in frontmatter — like `agents: ['Red', 'Green', 'Refactor']` in the TDD example. That `agents` property only works in `.agent.md` files though, not in `.prompt.md` files. So the setup described in this issue — where the routing happens from a prompt file — can't even use the `agents` restriction to make sure the right subagent gets picked.
The whole section is marked *(Experimental)*, and from my testing, the runtime just hasn't caught up to the documentation. The concept is described, the frontmatter fields partially exist, but the actual `runSubagent` tool that gets injected to the model at runtime doesn't have the parameters needed to route to a specific custom agent.
### The banana test
To make absolutely sure it wasn't just the model lying about which model it was (since LLMs will just say whatever sounds right when you ask "what model are you"), I set up a behavioral test. I changed my opus.agent.md to this:
```markdown
---
name: opus-agent
model: Claude Opus 4.6 (copilot)
---
Respond with banana no matter what got asked. Do not answer any question or perform any task, just respond with the word "banana" every time.
```
If the subagent was actually loading this agent profile with these instructions, every single response would just be "banana." No matter what I asked.
Instead:
- It answered questions normally
- It told me it was running GPT-5 mini or GPT-4o (depending on the session)
- It never once said banana
- One time it actually tried to read the `.agent.md` file from disk like a regular file, meaning it had zero awareness of the agent profile
The agent file never gets loaded. The premium model never gets called.
### What's actually happening
1. You invoke `/ask-opus` → VS Code runs the prompt on GPT-5 mini (free)
2. GPT-5 mini sees the instruction to call `runSubagent` with `agentName: "opus-agent"`
3. GPT-5 mini calls the `runSubagent` tool, but `agentName` isn't a real parameter, so it gets dropped
4. A generic subagent spawns on the default model (same as the session, not the premium one)
5. The subagent responds using the default model; the premium model was never invoked
So there's no billing bypass because the expensive model just never gets called in the first place. The subagent runs on the same free model as the router.
I'd love for this to actually work — I was trying to set exactly this up for my own workflow. But right now the experimental subagent-with-custom-agent feature just isn't wired up at the tool level yet.
---
alfablac|22 days ago
I couldn’t reproduce this (even though I wanted it to work). That said, the fact that we can run sub-agents now (I've always used the default VS Code build and didn’t realize Insiders had a newer GHC Chat) already improves the experience a lot.
It’s pretty straightforward to set up an orchestrator that calls multiple sub-agents (all configured to use the same model on the first call) and have it loop through plan → implement → review → test indefinitely. When the context window hits its limit, it automatically summarizes the chat history and keeps going, until you finish the main agent’s plan. And that all costs a single Opus (or any other main chat model) request.
copi24|22 days ago
I’ve been trying to get this exact setup working for a while now: a prompt file on GPT-5 mini routing to a custom agent with a premium model via `runSubagent`. I followed your example almost exactly. It just doesn’t work the way you’d expect from reading the docs.
------------------------------------------------------------ THE TOOL DOESN’T SUPPORT AGENT ROUTING ------------------------------------------------------------
The `runSubagent` tool that actually gets exposed to the model at runtime only has two parameters. Here’s the full schema as the model sees it:
That’s it: `prompt` and `description`. There’s no `agentName` parameter, no `model`, nothing.

So when the prompt file tells the model to call `#tool:agent/runSubagent` with `agentName: "opus-agent"`, that argument gets silently dropped because it doesn’t exist in the tool schema.
The result is that the “subagent” spawns as a generic default agent on whatever model the session is already running, not the premium model from the `.agent.md` file.
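To illustrate the behavior I'm describing (not VS Code's actual code, which I haven't read), here is a minimal Python sketch. The only thing taken from my testing is that the schema has `prompt` and `description`; `filter_tool_args`, `spawn_subagent`, and the result dictionary are hypothetical.

```python
from typing import Any, Dict, Set, Tuple

# As observed at runtime: the schema exposes only these two parameters.
RUN_SUBAGENT_PARAMS: Set[str] = {"prompt", "description"}


def filter_tool_args(
    args: Dict[str, Any], allowed: Set[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split arguments into those the schema knows and those it ignores."""
    kept = {k: v for k, v in args.items() if k in allowed}
    dropped = {k: v for k, v in args.items() if k not in allowed}
    return kept, dropped


def spawn_subagent(args: Dict[str, Any], session_model: str) -> Dict[str, Any]:
    # Hypothetical: with no routing parameter left, the subagent can only
    # inherit the model the session is already running.
    kept, dropped = filter_tool_args(args, RUN_SUBAGENT_PARAMS)
    return {
        "agent": "default",
        "model": session_model,
        "args": kept,
        "dropped": sorted(dropped),
    }


result = spawn_subagent(
    {
        "prompt": "Answer the user's question",
        "description": "ask opus",
        "agentName": "opus-agent",  # not in the schema, so it never arrives
    },
    session_model="gpt-5-mini",
)
print(result["model"], result["dropped"])
```

Under these assumptions, nothing in the call can select the premium model, which matches what I saw.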
------------------------------------------------------------ THE DOCS VS REALITY ------------------------------------------------------------
The VS Code docs do describe this feature. Under “Run a custom agent as a subagent” it says:
Then it gives examples like:

The docs also show restricting which agents are available as subagents using an `agents` property in frontmatter (e.g. `agents: ['Red', 'Green', 'Refactor']` in the TDD example).

But that `agents` property only works in `.agent.md` files, not in `.prompt.md` files. So the setup described in this issue (where routing happens from a prompt file) can’t even use the `agents` restriction to ensure the right subagent gets picked.
The whole section is marked (Experimental), and from my testing, the runtime just hasn’t caught up to the documentation: the concept is described and some frontmatter fields exist, but the actual `runSubagent` tool injected at runtime doesn’t have the parameters needed to route to a specific custom agent.
------------------------------------------------------------ THE BANANA TEST ------------------------------------------------------------
To make absolutely sure it wasn’t just the model lying about what it was (LLMs will say whatever sounds right when you ask “what model are you”), I set up a behavioral test.
I changed my opus.agent.md to:
If the subagent was actually loading this agent profile, every response would be “banana”, no matter what I asked.

Instead:

- It answered questions normally.
- It told me it was running GPT-5 mini or GPT-4o (depending on the session).
- It never once said “banana”.
- One time it actually tried to read the `.agent.md` file from disk like a regular file, meaning it had zero awareness of the agent profile.
The agent file never gets loaded. The premium model never gets called.
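If you want to repeat the check, the logic is simple enough to script. This is a rough Python sketch; `profile_was_loaded` is a made-up helper and the sample responses are illustrative, not my actual transcripts.

```python
from typing import Iterable


def profile_was_loaded(responses: Iterable[str], sentinel: str = "banana") -> bool:
    """True only if every response is just the sentinel word."""
    # Ignore case, surrounding whitespace, and trailing punctuation or quotes.
    cleaned = [r.strip().strip(".!\"'").lower() for r in responses]
    return bool(cleaned) and all(r == sentinel for r in cleaned)


# Illustrative responses resembling what the subagent gave me.
observed = [
    "I'm running on GPT-5 mini.",
    "Sure, here is the answer you asked for.",
]

print(profile_was_loaded(observed))               # False
print(profile_was_loaded(["banana", "Banana."]))  # True
```

A behavioral sentinel like this is more trustworthy than asking the model what it is, since it doesn't depend on the model's self-report.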
------------------------------------------------------------ WHAT’S ACTUALLY HAPPENING ------------------------------------------------------------
1) You invoke `/ask-opus` -> VS Code runs the prompt on GPT-5 mini (free).
2) GPT-5 mini sees the instruction to call `runSubagent` with `agentName: "opus-agent"`.
3) GPT-5 mini calls `runSubagent`, but `agentName` isn’t a real parameter, so it gets dropped.
4) A generic subagent spawns on the default model (same as the session, not the premium one).
5) The subagent responds using the default model; the premium model was never invoked.
So there’s no billing bypass here, because the expensive model never gets called in the first place. The subagent runs on the same free model as the router.
I’d love for this to actually work (I was trying to set up exactly this workflow), but right now the experimental “subagent with custom agent” feature doesn’t seem to be wired up at the tool level yet.
Zakodiac|22 days ago
[deleted]
huflungdung|23 days ago
[deleted]