Show HN: Zero-power photonic language model – code
18 points | damir00 | 3 months ago | zenodo.org
Despite using only unitary operations and no attention mechanism, a 1024×32 model achieves coherent TinyStories generation after < 1.8 hours of training on a single consumer GPU.
This is Part 1 - the next step is physical implementation with $50 of optics from AliExpress.
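The submission doesn't spell out the architecture here, but a standard way to build the kind of unitary (norm-preserving) layer it describes is the Cayley transform of a skew-symmetric parameter matrix, which is always orthogonal. A minimal sketch, assuming nothing about the author's actual parameterization (the function name and the 32-dim width are illustrative, the latter matching the model's stated 1024×32 shape):

```python
import numpy as np

def unitary_layer(params: np.ndarray) -> np.ndarray:
    """Build an orthogonal matrix via the Cayley transform.

    A = params - params.T is skew-symmetric (A.T == -A), and
    U = (I - A) @ inv(I + A) is then orthogonal: U @ U.T == I.
    This is one illustrative parameterization, not necessarily
    the one used in the linked code.
    """
    d = params.shape[0]
    skew = params - params.T
    eye = np.eye(d)
    return (eye - skew) @ np.linalg.inv(eye + skew)

rng = np.random.default_rng(0)
d = 32  # matches the 32-dim width mentioned in the submission
U = unitary_layer(rng.normal(size=(d, d)))

# Orthogonality means the layer preserves vector norms -- no signal
# gain or loss, which is what makes lossless optical hardware plausible.
x = rng.normal(size=d)
print(np.allclose(U @ U.T, np.eye(d)))                        # True
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))   # True
```

Norm preservation is the relevant property: a cascade of such layers neither amplifies nor attenuates the signal, which is why purely passive optics can, in principle, implement them.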
damir00|3 months ago
I apologize for not being clearer.
The goal isn't actually "zero power" - the goal is "so little heat that dissipation in orbit is easy".
bastawhiz|3 months ago
If it does work, I think one of the biggest challenges will be adding enough complexity to it for it to do real, useful computation. Running the equivalent of GPT-2 is a cool tech demo, but if there's not an obvious path to scaling it up, it's a bit of a dead end.
damir00|3 months ago
I expect to have an answer this week...