oneearedrabbit | 2 years ago

Not directly related to the post, but still feels somewhat relevant.

Back in March I used somewhat more elaborate multi-step prompts for GPT-3.5 to generate amusing pictures and published a gallery [1]. Eventually, though, I reached a point where changing the prompts no longer consistently improved the results. At the end of the day, the quality of the images is only as good as the training dataset, and GPT is a black box.

For something different, I ran another experiment to test whether visual content can be "compressed" specifically for GPT. SVG is a verbose format, so generating a detailed image takes time and becomes expensive. I translated a subset of SVG elements into Forth words [2], which have a nice synergy with GPT tokens: this allowed me to render pictures progressively and produce smaller outputs without sacrificing much quality.
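To make the idea concrete, here is a minimal sketch of such a translation. The word names and the values-then-word ordering are made up for illustration; the actual vocabulary is the one defined on the gpt-forth page [2].

```python
# Sketch: map a tiny SVG subset onto terse Forth-like words so the same
# picture needs far fewer tokens than the equivalent XML. The word names
# ("circ", "rect", "line") and the stack convention are assumptions for
# illustration, not the vocabulary from the linked page.

def svg_to_forth(elements):
    """Translate a list of (tag, attrs) SVG elements into Forth-like words."""
    words = []
    for tag, attrs in elements:
        if tag == "circle":
            words.append(f'{attrs["cx"]} {attrs["cy"]} {attrs["r"]} circ')
        elif tag == "rect":
            words.append(f'{attrs["x"]} {attrs["y"]} {attrs["width"]} {attrs["height"]} rect')
        elif tag == "line":
            words.append(f'{attrs["x1"]} {attrs["y1"]} {attrs["x2"]} {attrs["y2"]} line')
        else:
            raise ValueError(f"unsupported element: {tag}")
    return " ".join(words)

svg = [("circle", {"cx": 50, "cy": 50, "r": 20}),
       ("rect", {"x": 10, "y": 10, "width": 30, "height": 15})]
print(svg_to_forth(svg))  # → 50 50 20 circ 10 10 30 15 rect
```

Compare that output to the XML it replaces (`<circle cx="50" cy="50" r="20"/><rect .../>`): every attribute name and bracket disappears, which is where the token savings come from, and the model can emit the drawing word by word so it renders progressively.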

Finally, I trained my own GPT-2-like model on the QuickDraw dataset [3]. It's not surprising that a sequence transformer can learn to produce coherent brush strokes and recognizable images, as long as there is a way to translate graphical content into a sequence of tokens. That said, I ended up with more questions than I started with, and I'm trying other ideas now.
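The "translate graphical content into a sequence of tokens" step might look something like this. The vocabulary layout (one token per quantized coordinate, plus a pen-lift token between strokes) is an assumed scheme, not necessarily the one used on the training page [3]; QuickDraw's simplified format already stores each stroke as parallel x and y arrays on a 0-255 grid.

```python
# Sketch: flatten QuickDraw-style strokes into one token sequence that a
# GPT-2-like model can be trained on. Tokens 0-255 are coordinate bins;
# PEN_UP is an assumed special token marking the end of a stroke.

PEN_UP = 256

def strokes_to_tokens(strokes, size=256):
    """Flatten strokes [(xs, ys), ...] into [x, y, x, y, ..., PEN_UP, ...]."""
    tokens = []
    for xs, ys in strokes:
        for x, y in zip(xs, ys):
            tokens.append(min(max(x, 0), size - 1))  # clamp x into vocab range
            tokens.append(min(max(y, 0), size - 1))  # clamp y into vocab range
        tokens.append(PEN_UP)                        # stroke boundary
    return tokens

drawing = [([10, 20, 30], [5, 15, 25]),  # first stroke: three points
           ([40, 50], [40, 50])]         # second stroke: two points
print(strokes_to_tokens(drawing))
# → [10, 5, 20, 15, 30, 25, 256, 40, 40, 50, 50, 256]
```

Once drawings are in this form, training is ordinary next-token prediction, and decoding a generated sequence back into strokes is just the inverse walk over the same layout.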

[1] https://drawmeasheep.net/pages/about.html

[2] https://drawmeasheep.net/pages/gpt-forth.html

[3] https://drawmeasheep.net/pages/nn-training.html
