alexc05 | 10 days ago
1. Assume it's running a better model, even a dedicated coding model: high-scoring, but obviously not Opus 4.5. 2. Instead of the standard send-receive paradigm, we set up a pipeline of agents, each of which parses the output of the previous one.
At 17k tokens/sec running locally, you could effectively spin up tasks like "you are an agent who adds semicolons to the end of each line in JavaScript". With some sort of dedicated software in the style of Claude Code, you could load an array of 20 agents, each with a role to play in improving outputs.
take user input and gather context from codebase -> rewrite what you think the human asked you in the form of an LLM-optimized instructional prompt -> examine the prompt for uncertainties and gaps in your understanding or ability to execute -> <assume more steps as relevant> -> execute the work
Could you effectively set up something that is configurable to the individual developer - a folder of system prompts that every request loops through?
Do you really need the best model if you can pass your responses through a medium-tier model that engages in rapid self-improvement 30 times in a row before your Claude server has returned its first-shot response?
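The "folder of system prompts that every request loops through" could be sketched as an ordered list of stages, each one feeding its output into the next. Everything below is hypothetical: `run_pipeline` and `call_model` are made-up names, and the stand-in model does no real inference, it just shows the chaining.

```python
from typing import Callable

def run_pipeline(
    user_input: str,
    system_prompts: list[str],
    call_model: Callable[[str, str], str],
) -> str:
    """Pipe the request through every agent stage, in order.

    Each stage gets its own system prompt and receives the previous
    stage's output as its input.
    """
    text = user_input
    for prompt in system_prompts:
        text = call_model(prompt, text)
    return text

# Toy stand-in for a fast local model: it just tags the text with the
# stage it passed through, so the chaining is visible without a backend.
def fake_model(system_prompt: str, text: str) -> str:
    return f"{text} -> [{system_prompt}]"

stages = [
    "rewrite the request as an LLM-optimized prompt",
    "list uncertainties and gaps in your understanding",
    "execute the work",
]

result = run_pipeline("add semicolons to this file", stages, fake_model)
print(result)
```

With real inference behind `call_model`, the per-developer configuration is just the `stages` list, loaded from a folder of prompt files.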
dalenw | 10 days ago
So in my opinion, in a scenario like this, where token output is near-instant but you're running a lower-tier model, good tooling can make up the difference versus a frontier cloud model.
rustyhancock | 10 days ago
Basically, logistically it's going to need to be in a data centre.

It's ideal for small-context, high-throughput work: parsing huge text piles, like if you had the entire Epstein files as text.

I think Claude Code benefits from larger context, to keep your entire project in view, and from deep reasoning.

What this would certainly replace is the cases where Claude dispatches to Haiku for manual NLP tasks.
runeks | 9 days ago
I wonder how you cool a 3×3 cm die that outputs 2.5 kW of heat. In the article they mention that the traditional setup requires water cooling, but surely this does as well, right?
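Back-of-the-envelope, the figures in the comment imply a power density far beyond a typical desktop CPU (often cited in the rough range of 50-100 W/cm²), which is why exotic cooling comes up at all. The numbers here are taken from the comment, not from any chip's datasheet:

```python
# Power density implied by the comment's figures (assumptions, not specs):
die_side_cm = 3.0          # 3 cm x 3 cm die
power_w = 2500.0           # 2.5 kW dissipated as heat

area_cm2 = die_side_cm ** 2            # 9 cm^2
flux_w_per_cm2 = power_w / area_cm2    # ~278 W/cm^2

print(f"{flux_w_per_cm2:.0f} W/cm^2")  # prints "278 W/cm^2"
```

At roughly 278 W/cm², air cooling is implausible, so some form of liquid (or better) cooling seems unavoidable, as the comment suspects.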