alexc05 | 10 days ago
1. Assume it's running a better model, even a dedicated coding model: high-scoring, but obviously not Opus 4.5. 2. Instead of the standard send-receive paradigm, we set up a pipeline of agents, each of which parses the output of the previous one.
At 17k tokens/sec running locally, you could effectively spin up tasks like "you are an agent who adds semicolons to the end of each line in JavaScript". With some sort of dedicated software in the style of Claude Code, you could load an array of 20 agents, each with a role to play in improving outputs.
take user input and gather context from codebase -> rewrite what you think the human asked you in the form of an LLM-optimized instructional prompt -> examine the prompt for uncertainties and gaps in your understanding or ability to execute -> <assume more steps as relevant> -> execute the work
Could you effectively set up something that is configurable to the individual developer - a folder of system prompts that every request loops through?
Do you really need the best model if you can pass your responses through a medium-tier model that engages in rapid self-improvement 30 times in a row before your Claude server has returned its first-shot response?
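The "folder of system prompts that every request loops through" could be sketched as an ordered list of stages, each one feeding its output into the next. Everything below is hypothetical: `run_pipeline` and `call_model` are made-up names, and the stand-in model does no real inference, it just shows the chaining.

```python
from typing import Callable

def run_pipeline(
    user_input: str,
    system_prompts: list[str],
    call_model: Callable[[str, str], str],
) -> str:
    """Pipe the request through every agent stage, in order.

    Each stage gets its own system prompt and receives the previous
    stage's output as its input.
    """
    text = user_input
    for prompt in system_prompts:
        text = call_model(prompt, text)
    return text

# Toy stand-in for a fast local model: it just tags the text with the
# stage it passed through, so the chaining is visible without a backend.
def fake_model(system_prompt: str, text: str) -> str:
    return f"{text} -> [{system_prompt}]"

stages = [
    "rewrite the request as an LLM-optimized prompt",
    "list uncertainties and gaps in your understanding",
    "execute the work",
]

result = run_pipeline("add semicolons to this file", stages, fake_model)
print(result)
```

With real inference behind `call_model`, the per-developer configuration is just the `stages` list, loaded from a folder of prompt files.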
dalenw | 10 days ago
So in my opinion, in a scenario like this, where token output is near-instant but you're running a lower-tier model, good tooling can make up the difference versus a frontier cloud model.
rustyhancock | 10 days ago
Basically, logistically it's going to need to be in a data centre.

It's ideal for small-context, high-throughput work: parsing huge text piles, like if you had the entire Epstein files as text.

I think Claude Code benefits from larger context, to keep your entire project in view, and from deep reasoning.

What this would certainly replace is the cases where Claude dispatches to Haiku for manual NLP tasks.
runeks | 9 days ago
I wonder how you cool a 3×3 cm die that outputs 2.5 kW of heat. In the article they mention that the traditional setup requires water cooling, but surely this does as well, right?
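Back-of-the-envelope, the figures in the comment imply a power density far beyond a typical desktop CPU (often cited in the rough range of 50-100 W/cm²), which is why exotic cooling comes up at all. The numbers here are taken from the comment, not from any chip's datasheet:

```python
# Power density implied by the comment's figures (assumptions, not specs):
die_side_cm = 3.0          # 3 cm x 3 cm die
power_w = 2500.0           # 2.5 kW dissipated as heat

area_cm2 = die_side_cm ** 2            # 9 cm^2
flux_w_per_cm2 = power_w / area_cm2    # ~278 W/cm^2

print(f"{flux_w_per_cm2:.0f} W/cm^2")  # prints "278 W/cm^2"
```

At roughly 278 W/cm², air cooling is implausible, so some form of liquid (or better) cooling seems unavoidable, as the comment suspects.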