top | item 47136652

Show HN: Tessera – An open protocol for AI-to-AI knowledge transfer

3 points| kirkmaddocks | 7 days ago |github.com

Tessera is an activation-based protocol that lets trained ML models transfer knowledge to other models across architectures. Instead of dumping weight tensors, it encodes what a model has learnt — activations, feature representations, behavioural patterns — into self-describing tokens that a receiving model can decode into its own architecture.

The reference implementation (tessera-core) is a Python/PyTorch library. Current benchmarks show positive transfer across CNN, Transformer, and LSTM pairs. It runs on CPU and the demo finishes in under 60 seconds.

Happy to answer questions about the protocol design, the wire format, or the benchmark methodology.

2 comments

order

0xecro1|7 days ago

Interesting approach. I work in embedded Linux/edge AI where we constantly struggle to move knowledge from large training models down to quantized INT8 models on constrained hardware (ARM Cortex-A class). Have you tested transfer to quantized or pruned targets? If the behavioural encoding survives that compression, this could be a much cleaner path than classical distillation for on-device deployment.

kirkmaddocks|7 days ago

We haven't built quantisation-aware transfer yet, but the architecture lends itself to it better than you might expect.

Mode A (activation transfer) operates at the representation level, not the parameter level. The source model's knowledge gets projected into a 2048-dim hub space — the receiving model doesn't need to match architecturally or in precision. A 200M FP32 training model and a 5M INT8 edge model can both have UHS encoders/decoders. The hub space is agnostic to what's underneath.

Mode B (behavioural) is probably the most interesting path for your use case. It transfers decision boundaries rather than activations or weights. If the quantised model can reproduce the input-output mapping, internal precision is irrelevant.

It's similar in spirit to distillation but decoupled through the hub space — teacher and student don't need to be online simultaneously, and you get a full audit trail of what knowledge went where (which matters if you're shipping medical/industrial edge models under EU AI Act).

The gap today is the decoder side. DecoderMLP outputs FP32. We'd need a quantisation-aware variant that respects the INT8 grid — straight-through estimator at minimum, learned quantisation boundaries ideally. We'd also want empirical drift characterisation across FP32→FP16→INT8→INT4 so you'd know your expected fidelity floor for a given target.

The swarm angle is where it gets genuinely useful for edge fleets. If you've got N devices training locally on-site data, they contribute quantised-model tokens back to a full-precision aggregator. The robust aggregation strategy (Huber-style cosine clipping) handles quantisation noise across heterogeneous devices naturally.

We're planning a quantisation-aware transfer module next. If you're interested in testing against real Cortex-A INT8 workloads, we'd welcome the collaboration — repo is at github.com/incocreativedev/tessera-core.