fzzt | 3 years ago

The prospect of the images getting "structurally" garbled in unpredictable ways would probably limit real-world applications: https://miro.medium.com/max/4800/1*RCG7lcPNGAUnpkeSsYGGbg.pn...

There's something to be said about compression algorithms being predictable, deterministic, and only capable of introducing defects that stand out as compression artifacts.

Plus, decoding performance and power consumption matter, especially on mobile devices (which also happen to be the setting where bandwidth gains are most meaningful).

kevincox | 3 years ago

While that is kind of true, it is also sort of the point.

The optimal lossy compression algorithm would be based on humans as a target. It would remove details that we wouldn't notice in order to reduce the target size. If you show me a photo of a face in front of some grass, the optimal solution would likely be to reproduce that face in high detail but replace the grass with "stock imagery".

I guess it comes down to what is important. In the past, algorithms were focused on visual perception, but maybe we are getting so good at convincingly removing unnecessary detail that we need to spend more time teaching the compressor which details are important. For example, if I know the person in the grass, preserving the face is important. If I don't know them, then it could be replaced by a stock face as well. Maybe the optimal compression of a crowd of people is the 2 faces of people I know preserved accurately and the rest replaced with "stock" faces.

anilakar | 3 years ago

Remember the Xerox scan-to-email scandal in which tiling compression was replacing numbers in structural drawings? We're talking about similar repercussions here.

behnamoh | 3 years ago

This reminds me of a question I have about SD: why can't it do a simple OCR pass to know those are characters, not random shapes? It's baffling that neither SD nor DE2 have any understanding of the content they produce.

Xcelerate | 3 years ago

You could certainly apply a “duct tape” solution like that, but the issue is that neural networks were developed to replace what were previously entire solutions built on a “duct tape” collection of rule-based approaches (see the early attempts at image recognition). So it would be nice to solve the problem in a more general way.

nl | 3 years ago

> why can’t it do a simple OCR to know those are characters not random shapes?

It's pretty easy to add this if you wanted to.

But a better method would be to fine-tune on a bunch of machine-generated images of words if you want your model to be good at generating characters. You'll need to consider which of the many Unicode character sets you want your model to specialize in, though.

cma | 3 years ago

With compression you often make a prediction and then encode the delta off of it. A structurally garbled prediction could be discarded, or it would just result in a worse baseline for the delta.
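The prediction-plus-delta idea can be sketched in a few lines of Python (the predictor here is trivial; a neural model could supply better predictions without changing the scheme, and a bad prediction only inflates the residuals rather than corrupting the output):

```python
def delta_encode(samples):
    """Encode each sample as its difference from a predicted value.

    Predictor: the previous sample. Any predictor works as long as
    the decoder uses the same one.
    """
    residuals = []
    prediction = 0
    for s in samples:
        residuals.append(s - prediction)
        prediction = s  # next prediction = current sample
    return residuals


def delta_decode(residuals):
    """Reverse the encoding by adding residuals to the running prediction."""
    samples = []
    prediction = 0
    for r in residuals:
        prediction += r
        samples.append(prediction)
    return samples


pixels = [100, 101, 103, 103, 90]
encoded = delta_encode(pixels)    # [100, 1, 2, 0, -13]
assert delta_decode(encoded) == pixels  # lossless round trip
```

The residuals cluster near zero when the predictor is good, which is what makes them cheap to entropy-code afterwards.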

montebicyclelo | 3 years ago

Just a note that Stable Diffusion is/can be deterministic (if you set an RNG seed).
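The determinism is the ordinary seeded-RNG kind: with the same seed, the sampler draws the same initial noise and so produces the same image. A minimal illustration using only Python's standard library (Stable Diffusion itself seeds a framework generator such as `torch.Generator`, but the principle is identical):

```python
import random


def sample_noise(seed, n=4):
    """Draw n pseudo-random values from a freshly seeded generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Same seed -> identical "noise", hence identical downstream output.
assert sample_noise(42) == sample_noise(42)

# Different seeds diverge.
assert sample_noise(42) != sample_noise(43)
```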

shrx | 3 years ago

I was told (on the Unstable Diffusion Discord, so this info might not be reliable) that even with the same seed, the results will differ if the model is running on a different GPU. This was also my experience when I couldn't reproduce the results generated by the Discord's SD txt2img bot.