top | item 45970391

OrthoRoute – GPU-accelerated autorouting for KiCad

220 points| wanderingjew | 3 months ago |bbenchoff.github.io

29 comments

morellt|3 months ago

I would love to know the application of this ludicrous PCB, and I'd be even more interested to see the quote price

Hey, guy who made this here. This probably deserves a little explanation. First off, I'd like to tell you I'm really, really unemployed, and have the freedom to do some cool stuff. So I came up with a project idea. This is only a small part of a project I'm working on, but you'll see where this is going.

I was inspired by this video: https://www.youtube.com/watch?v=HRfbQJ6FdF0 from bitluni that's a cluster of $0.10-0.20 RISC-V microcontrollers. For ten or twenty cents, these have a lot of GPIOs compared to other extremely low-cost microcontrollers. 18 GPIOs on the CH32V006F4U6. This got me thinking, what if I built a cluster of these chips. Basically re-doing bitluni's build.

But then I started thinking, at ten cents a chip, you could scale this to thousands. But how do you connect them? That problem was already solved in the 80s, with the Connection Machine. The basic idea here is to get 2^(whatever) chips, and connect them so each chip connects to (whatever) many other chips. The Connection Machine sold this as a hypercube, but it's better described as a hamming-distance-one graph or something.

So I started building that. I did the LEDs first, just to get a handle on thousands of parts: https://x.com/ViolenceWorks/status/1987596162954903808 and started laying out the 'cards' of this thing. With a 'hypecube topology' you can split up the cube into different parts, so this thing is made of sixteen cards (2^4), with 256 chips on each card (2^8), meaning 4096 (2^12) chips in total. This requires a backplane. A huge backplane with 8196 nets. Non-trivial stuff.

So the real stumbling block for this project is the backplane, and this is basically the only way I could figure out how to build it; write an autorouter. It's a fun project that really couldn't have been done before the launch of KiCad 9; the new IPC API was a necessity to make this a reality. After that it's just some CuPy because of sparse matrices and a few blockers trying to adapt PathFinder to circuit boards.

Last week I finished up the 'cloud routing' functionality and was able to run this on an A100 80GB instance on Vast.io; the board wouldn't fit in my 16GB 5080 I used for testing. That instance took 41 hours to route the board, and now I have the result back on my main battlestation ready for the bit of hand routing that's still needed. No, it's not perfect, but it's an autorouter. It's never going to be perfect.

This was a fun project but what I really should have been doing the past three months or so is grinding leetcode. It's hard out there, and given that I've been rejected from every technician job I've applied to, I don't think this project is going to help me. Either way, this project.... is not useful. There's probably a dozen engineers out there in the world that this _could_ help.

So, while it's working for my weird project, this is really not what hiring managers want to see.

beachy|3 months ago

It would be interesting to know if the finished board would work at all, given there must be some non-zero failure rate for each via.

NoiseBert69|3 months ago

A4-sized, 32 layers.. I'd guess something around 1500€ at JLC.

RicoElectrico|3 months ago

Me too. I can't imagine a backplane where the connections would be so irregular as to require bringing out such big guns.

zeroping|3 months ago

Beautiful closing to the write-up. "Never trust the autorouter. But at least this one is fast."

X-Ryl669|3 months ago

I don't get why you process the whole connector board at once. If I understand correctly, you're connecting individual & identic boards to your board. So each connector on you giant board is actually dealing a bunch of small board right?

In that case, can't you exploit the inherent symmetry in the design here to only route a quarter of your connectors and then mirror/rotate the result for the other one? Or, if you have a X*X matrix, route one size minus the corners and replicate to the other sides?

Also, with such a huge connection board, it smells a NIH issue here. I think you'd better serialize the IO to a bus (whatever) and few lines and perform the connection in software (in a GoWin FPGA for example, both extremely cheap and quite powerful). Just think of the harness you'll need to build to fit the connectors in. The obvious routing bugs, and so on. Any maintenance will be a nightmare, if you need to swap 2 pins on a connector or re-run the routing.

wanderingjew|3 months ago

Hi, author here, for this project, the backplane is as much of the computer as the 'daughter cards'. Think of it like the wire-wrap boards of _really old_ minicomputers. I'm using the PDP straight-8 as an analogy here because that's the oldest computer I've been inside of, but the backplane connects the different daughter cards together in a way such that the backplane _is_ the computer.

As far as symmetry goes, there really isn't any. For example, Board 0 conects to 1, 2, 4, and 8. Board 1 connects to 0, 3, 5, and 9. Board 3 connections to 1, 2 , 7, and 11.

There's one way I can think of to make this routing easier. Of of the 16 daughter boards, make the pinout unique to each daughter board. If I was doing this as a product, for manufacturing, this is exactly what I would do. I'd rearrange the pins on each daughter card so it would be easier to route. The drawback of this technique is that there would be 16 different varieties of daughter cards; not economical if you're just building one of these things.

So, with those constraints the only real optimization I have left is ensuring that the existing net plan is optimal. I already did that when I generated the netlist; used simulated annealing to ensure the minimal net length for the board before I even imported it into KiCad.

And yeah, serializing the IO would be better, but even better than that would be to emulate the entire system in a giant black box of compute. But then I wouldn't have written a GPU autorouter. I'm trying not to, but there is some optimization for _cool_ here, you know?

actionfromafar|3 months ago

But then you need 4096 CPUs plus 4096 FPGAs?

bsder|3 months ago

Was there a particular reason why "Think real hard on the symmetry and write a Python script." wouldn't do the job? Or was it simply "It would be cool to abuse some GPUs"?

Either option is cool, though.

wanderingjew|3 months ago

Author here – I did do the ‘think real hard and write Python’ part, but at the netlist level. I used simulated annealing on the 8196 nets to minimize total length / crossings before importing into KiCad.

Where the GPU router comes in is the geometric part: obeying layer stack, via rules, keepouts, blind-via constraints, etc. You can absolutely hand-encode one or two nice symmetric patterns in code; this board is ‘what if we made the search space big enough that you want Dijkstra + PathFinder + sparse GPU data structures to do it for you’.

NoiseBert69|3 months ago

Being that engineer in a PCB Fab in China who has to punch all the vias by hand

heart attack

varispeed|3 months ago

Are these pushed by hand? I'd believe if you said it happens in the UK. Fabs are still stuck in the 90s.