top | item 47108975

(no title)

abrichr | 8 days ago

ChatGPT Deep Research dug through Taalas' WIPO patent filings and public reporting to piece together a hypothesis. Next Platform notes at least 14 patents filed [1]. The two most relevant:

"Large Parameter Set Computation Accelerator Using Memory with Parameter Encoding" [2]

"Mask Programmable ROM Using Shared Connections" [3]

The "single transistor multiply" could be multiplication by routing, not arithmetic. Patent [2] describes an accelerator where, if weights are 4-bit (16 possible values), you pre-compute all 16 products (input x each possible value) with a shared multiplier bank, then use a hardwired mesh to route the correct result to each weight's location. The abstract says it directly: multiplier circuits produce a set of outputs, readable cells store addresses associated with parameter values, and a selection circuit picks the right output. The per-weight "readable cell" would then just be an access transistor that passes through the right pre-computed product. If that reading is correct, it's consistent with the CEO telling EE Times compute is "fully digital" [4], and explains why 4-bit matters so much: 16 multipliers to broadcast is tractable, 256 (8-bit) is not.

The same patent reportedly describes the connectivity mesh as configurable via top metal masks, referred to as "saving the model in the mask ROM of the system." If so, the base die is identical across models, with only top metal layers changing to encode weights-as-connectivity and dataflow schedule.

Patent [3] covers high-density multibit mask ROM using shared drain and gate connections with mask-programmable vias, possibly how they hit the density for 8B parameters on one 815mm2 die.

If roughly right, some testable predictions: performance very sensitive to quantization bitwidth; near-zero external memory bandwidth dependence; fine-tuning limited to what fits in the SRAM sidecar.

Caveat: the specific implementation details beyond the abstracts are based on Deep Research's analysis of the full patent texts, not my own reading, so could be off. But the abstracts and public descriptions line up well.

[1] https://www.nextplatform.com/2026/02/19/taalas-etches-ai-mod...

[2] https://patents.google.com/patent/WO2025147771A1/en

[3] https://patents.google.com/patent/WO2025217724A1/en

[4] https://www.eetimes.com/taalas-specializes-to-extremes-for-e...

discuss

generuso|8 days ago

LSI Logic and VLSI Systems used to do such things in 1980s -- they produced a quantity of "universal" base chips, and then relatively inexpensively and quickly customized them for different uses and customers, by adding a few interconnect layers on top. Like hardwired FPGAs. Such semi-custom ASICs were much less expensive than full custom designs, and one could order them in relatively small lots.

Taalas of course builds base chips that are already closely tailored for a particular type of models. They aim to generate the final chips with the model weights baked into ROMs in two months after the weights become available. They hope that the hardware will be profitable for at least some customers, even if the model is only good enough for a year. Assuming they do get superior speed and energy efficiency, this may be a good idea.

cpldcpu|8 days ago

It could simply be bit serial. With 4 bit weights you only need four serial addition steps, which is not an issue if the weight are stored nearby in a rom.