Jack000 | 2 years ago
This kind of reminds me of DALL-E 1, where the image is represented as 256 image tokens and then generated one token at a time. That approach is the most direct way to adapt a causal-LM architecture, but it clearly didn't make a lot of sense, because images don't have a natural top-to-bottom, left-to-right order.
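To make the token-at-a-time idea concrete, here is a toy sketch of that generation loop, with a random stand-in for the model (the grid and codebook sizes here are illustrative, not DALL-E's actual configuration; a real system would sample from a trained causal LM conditioned on the text prompt and all previous image tokens):

```python
import random

VOCAB_SIZE = 512   # toy codebook size (hypothetical, not DALL-E's)
GRID = 16          # 16x16 = 256 image tokens, as described above

def sample_token(prefix):
    # Stand-in for a causal LM's next-token distribution: a real model
    # would condition on the prompt plus every previously emitted token.
    rng = random.Random(hash(tuple(prefix)) & 0xFFFF)
    return rng.randrange(VOCAB_SIZE)

def generate_image_tokens():
    tokens = []
    for _ in range(GRID * GRID):
        # strictly left-to-right, top-to-bottom -- the imposed raster
        # order the comment above is objecting to
        tokens.append(sample_token(tokens))
    return tokens

tokens = generate_image_tokens()
```

The point of the sketch is the loop structure: each of the 256 tokens is sampled conditioned only on the tokens before it in an arbitrary raster order, even though neighboring image patches above and below are just as relevant as the one to the left.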
For vector graphics, the closest analogue to pixel-wise convolution would be the Minkowski sum. I wonder whether a Minkowski-sum-based diffusion model would work for SVG images.
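For readers unfamiliar with the operation: the Minkowski sum of two shapes A and B is {a + b : a in A, b in B}, which dilates A by B much as convolving an image with a kernel spreads mass around each pixel. A minimal sketch for the convex-polygon case (where the sum is just the convex hull of all pairwise vertex sums; general polygons need a decomposition step this sketch omits):

```python
from itertools import product

def convex_hull(points):
    # Andrew's monotone chain; returns hull vertices in CCW order.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def minkowski_sum(poly_a, poly_b):
    # Convex polygons only: the sum is the hull of all pairwise vertex sums.
    sums = [(ax + bx, ay + by) for (ax, ay), (bx, by) in product(poly_a, poly_b)]
    return convex_hull(sums)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
kernel = [(0.5, 0), (0, 0.5), (-0.5, 0), (0, -0.5)]  # small diamond "kernel"
dilated = minkowski_sum(square, kernel)  # octagon: square with beveled corners
```

Summing the unit square with a small diamond rounds the square out into an octagon, the vector-graphics analogue of blurring an image with a small kernel.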
Jack000 | 2 years ago
You could start off with a random polygon and the reverse diffusion process would slowly turn it into a text glyph.