Lossy compression has the same problem it has always had: lossy metadata.
The contextual information surrounding intentional data loss needs to be preserved. Without that context, we become ignorant of the missing data. Worst case, you get replaced numbers. Average case, you get lossy->lossy transcodes, which is why we end up with degraded content.
There are only two places to put that contextual information: metadata and watermarks. Metadata can be written to a file, but there is no guarantee it will be copied with that data. Watermarks fundamentally degrade the content once, and may not be preserved in derivative works.
I wish that the generative model explosion would result in a better culture of metadata preservation. Unfortunately, it looks like the focus is on metadata instead.
This JBIG2 "myth" is too widespread. It is true that Xerox's algorithm mangled some numbers in its JBIG2 output, but that is not an inherent flaw of JBIG2 to begin with, and Xerox's encoder misbehaved almost exclusively at lower resolutions; scans at 300 dpi or above were barely affected. Other artifacts at low resolution can exhibit similar mangling as well (the specifics vary, of course), and no similar incident has occurred since. So I don't feel it is even a worthy concern at this point.
There was an earlier article (Sep 20, 2022) about using the Stable Diffusion VAE to perform image compression. It uses the VAE to map from pixel space to latent space, dithers the latents down to 256 colors, and then de-noises the result at decompression time.
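For anyone curious about the shape of that pipeline, here is a rough sketch in plain numpy. The linear encoder/decoder is a toy stand-in made up for illustration; the actual article uses the real SD VAE (via diffusers) plus a de-noising pass at decode time:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the SD VAE: a fixed random linear projection.
# The real pipeline uses an 8x-downsampling, 4-channel convolutional VAE
# and a diffusion de-noising step at decompression time.
D_PIX, D_LAT = 64, 16
W_enc = rng.standard_normal((D_PIX, D_LAT)) / np.sqrt(D_PIX)
W_dec = np.linalg.pinv(W_enc)  # decoder approximately inverts the encoder

def encode(x):              # pixel space -> latent space
    return x @ W_enc

def quantize(z, bits=8):    # "dither the latent space down to 256 colors"
    lo, hi = z.min(), z.max()
    levels = 2 ** bits - 1
    q = np.round((z - lo) / (hi - lo) * levels)
    return q.astype(np.uint8 if bits <= 8 else np.uint16), lo, hi

def dequantize(q, lo, hi, bits=8):
    return q / (2 ** bits - 1) * (hi - lo) + lo

def decode(z):              # latent space -> pixel space (no de-noising here)
    return z @ W_dec

x = rng.standard_normal((10, D_PIX))        # a batch of "images"
q, lo, hi = quantize(encode(x), bits=8)     # this is what you'd store
x_hat = decode(dequantize(q, lo, hi, bits=8))
err = float(np.abs(x - x_hat).mean())       # reconstruction error
```

The stored payload is just the quantized latents (plus the range), which is where the compression comes from.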
I've done a bunch of experiments on my own on the Stable Diffusion VAE.
Even when going down to 4-6 bits per latent space pixel the results are surprisingly good.
It's also interesting what happens if you ablate individual channels; ablating channel 0 results in faithful color but shitty edges, ablating channel 2 results in shitty color but good edges, etc.
The one thing it fails catastrophically on though is small text in images. The Stable Diffusion VAE is not designed to represent text faithfully. (It's possible to train a VAE that does slightly better at this, though.)
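A toy version of those experiments, using random stand-in latents shaped like the SD VAE's 4-channel output (not real encodings), looks something like:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in latent tensor shaped like the SD VAE's output:
# (channels=4, H/8, W/8). Real latents would come from the VAE encoder.
z = rng.standard_normal((4, 32, 32))

def quantize_roundtrip(z, bits):
    """Quantize to 2**bits levels (per-tensor range) and back."""
    lo, hi = z.min(), z.max()
    levels = 2 ** bits - 1
    q = np.round((z - lo) / (hi - lo) * levels)
    return q / levels * (hi - lo) + lo

# Bit-depth sweep: error grows as bits per latent "pixel" drop.
errs = {bits: float(np.abs(z - quantize_roundtrip(z, bits)).mean())
        for bits in (8, 6, 4, 2)}

# Channel ablation: zero one channel before decoding to see what it encodes.
def ablate(z, channel):
    z2 = z.copy()
    z2[channel] = 0.0
    return z2

z_no_ch0 = ablate(z, 0)  # decode this to see which image property degrades
```

The interesting part is decoding each ablated tensor and eyeballing what breaks (color vs. edges, as described above).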
If you look at the winners of the Hutter Prize, or especially the Large Text Compression Benchmark, almost every entry uses some kind of machine-learned adaptive probability model, paired with either arithmetic coding or rANS for the lossless entropy coding.
This is intuitive, as the competition organisers say: compression is prediction.
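It's easy to demonstrate why: the ideal code length for a symbol predicted with probability p is -log2(p) bits, so a better predictor directly means a smaller file. A minimal sketch with an adaptive order-0 model (add-one smoothing; the actual winning entries use far stronger context-mixing and neural models):

```python
import math

def ideal_code_length_bits(data, alphabet):
    """Bits an arithmetic coder would need (ignoring ~2 bits of overhead)
    with an adaptive order-0 model and add-one smoothing."""
    counts = {s: 1 for s in alphabet}  # add-one prior
    total = len(alphabet)
    bits = 0.0
    for symbol in data:
        p = counts[symbol] / total     # the model's prediction
        bits += -math.log2(p)          # ideal code length for this symbol
        counts[symbol] += 1            # adapt *after* coding, like the decoder
        total += 1
    return bits

# Same length, same alphabet -- the predictable stream codes far smaller.
skewed = "a" * 95 + "b" * 5      # easy to predict
balanced = "ab" * 50             # maximally uncertain for this model
bits_skewed = ideal_code_length_bits(skewed, "ab")
bits_balanced = ideal_code_length_bits(balanced, "ab")
```

The decoder can reproduce the same adaptive counts symbol by symbol, which is why no model needs to be transmitted.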
I'd recommend trying it. It takes a few tries to get the correct input parameters, and I've noticed anything approaching 4× scale tends to add unwanted hallucinations.
For example, I had a picture of a bear I made with Midjourney. At a scale of 2×, it looked great. At a scale of 4×, it added bear faces into the fur. It also tends to turn human faces into completely different people if they start too small.
When it works, though, it really works. The detail it adds can be incredibly realistic.
Look for super-resolution. These models typically come as a GAN, a normalizing flow (or score model / NODE), or more recently diffusion (or SNODE), or some combination. The one you want will depend on your computational resources, how lossy you are willing to be, and your image domain (if you're unwilling to tune). Real time (>60 fps) is typically going to be a GAN or a flow.
Make sure to test the models before you deploy. Nothing will be lossless doing super-resolution, but flows can get you lossless compression.
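Seconding the testing point: even a crude PSNR check catches gross regressions before deploy. A minimal sketch, assuming 8-bit images (real evaluations would add SSIM/LPIPS on a held-out set from your actual domain):

```python
import numpy as np

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher is better, inf if identical
    (the lossless case, which flows can actually hit in compression)."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noisy = np.clip(img.astype(int) + rng.integers(-5, 6, size=img.shape),
                0, 255).astype(np.uint8)

score_lossless = psnr(img, img)    # identical -> inf
score_noisy = psnr(img, noisy)     # mild noise -> finite, still high
```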
I haven't explored the current SOTA recently, but super-resolution has been pretty good for a lot of tasks for a few years at least. Probably just start with Hugging Face [0] and try a few out, especially diffusion-based models.
This is called super resolution (SR). 2x SR is pretty safe and easy (so every pixel in becomes 2x2 out, in your example 800x600->1600x1200). Higher scalings are a lot harder and prone to hallucination, weird texturing, etc.
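The pixel mapping can be sketched as the trivial nearest-neighbour 2x baseline; an SR model makes exactly the same shape change but fills each 2x2 block with learned detail instead of copies:

```python
import numpy as np

def upscale_2x_nearest(img):
    """Every input pixel becomes a 2x2 block: (H, W) -> (2H, 2W).
    A super-resolution model replaces this copy with learned detail."""
    return np.kron(img, np.ones((2, 2), dtype=img.dtype))

rng = np.random.default_rng(0)
# An "800x600" image; numpy shape is (height, width) = (600, 800).
img = rng.integers(0, 256, size=(600, 800), dtype=np.uint8)
big = upscale_2x_nearest(img)  # (1200, 1600), i.e. 1600x1200
```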
It is not going to take off unless it is significantly better and has browser support. WebP took off thanks to Chrome, while JPEG2000 floundered. Failing native browser support, maybe the codec could be shipped via WASM or something?
The interesting diagram to me is the last one, for computational cost, which shows the 10x penalty of the ML-based codecs.
The thing about ML models is that the penalty is a function of parameter count and precision. It sounds like the researchers cranked them to the max to get the very best compression. Maybe later they will take that same model, flatten layers and quantize the weights to get it running 100x faster, and see how well it still compresses. I feel like neural networks have a lot of potential in compression. Their whole job is finding patterns.
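For the quantization part, even post-training int8 quantization alone is a 4x memory/bandwidth win per weight. A minimal per-tensor sketch (symmetric scaling; real deployments usually quantize per-channel and calibrate on data):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # one float32 weight matrix

# Symmetric per-tensor int8 quantization: w ~= scale * q
scale = float(np.abs(w).max()) / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale  # dequantized approximation

size_ratio = w.nbytes / q.nbytes          # float32 -> int8 is 4x smaller
max_err = float(np.abs(w - w_hat).max())  # rounding error, bounded by scale/2
```

Whether the codec still compresses well with those approximated weights is the empirical question the comment raises.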
Did JPEG2000 really flounder? If you think of it as a consumer-facing product, a direct replacement for JPEG, then I could see it being unsuccessful in that respect. However, JPEG2000 has found its place on the professional side of things.
I think it is an interesting discussion and a learning experience (no pun intended). I think this is more of a stop on a research project than a proposal, but I could be wrong.
How much VRAM is needed? And how much computing power? To open a webpage you'll soon need 24 GB of VRAM and two seconds at 1000 watts to decompress the images, while bandwidth is reduced from 2 MB to only 20 kB.
Valid point. Conventional codecs draw things on screen that are not in the original, too, but we are used to low-quality images and videos, and have learned to unconsciously ignore the block edges and smudges. NN models "recover" much more complex and plausible-looking features. It is possible that some future general-purpose image compressor would do the same thing to small numbers that lossy JBIG2 did.
How do we know whether it's an image with 16 fingers or it just looks like 16 fingers to us?
I looked at the bear example above and I could see how either the AI thought that there was an animal face embedded in the fur or we just see the face in the fur. We see all kinds of faces on toast even though neither the bread slicers nor the toasters intend to create them.
_kb|1 year ago
One key item with emerging 'AI compression' techniques is that the information loss is not deterministic, which somewhat complicates assessing suitability.
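Right: with a sampling-based decoder (e.g. diffusion), the same compressed payload can decode to different images, so any quality metric becomes a distribution over decodes rather than a single number. A toy illustration with a made-up stochastic decoder:

```python
import numpy as np

def stochastic_decode(latent, seed):
    """Stand-in for a sampling-based (e.g. diffusion) decoder: the
    reconstruction depends on the random seed, not just the latent."""
    rng = np.random.default_rng(seed)
    return latent + 0.1 * rng.standard_normal(latent.shape)

latent = np.linspace(0.0, 1.0, 100)   # the "compressed payload"
a = stochastic_decode(latent, seed=1)
b = stochastic_decode(latent, seed=2)

# Two decodes of the same payload disagree, which is exactly what makes
# suitability assessment harder than for a deterministic codec.
disagreement = float(np.abs(a - b).max())
```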
Dwedit|1 year ago
https://pub.towardsai.net/stable-diffusion-based-image-compr...
HN discussion: https://news.ycombinator.com/item?id=32907494
rottc0dd|1 year ago
https://bellard.org/nncp/
mbtwl|1 year ago
The best overview is probably the “JPEG AI Overview Slides”.
davidbarker|1 year ago
However, this weekend someone released an open-source version which has a similar output. (https://replicate.com/philipp1337x/clarity-upscaler)
Example bear images:
1. The original from Midjourney: https://i.imgur.com/HNlofCw.jpeg
2. Upscaled 2×: https://i.imgur.com/wvcG6j3.jpeg
3. Upscaled 4×: https://i.imgur.com/Et9Gfgj.jpeg
----------
The same person also released a lower-level version with more parameters to tinker with. (https://replicate.com/philipp1337x/multidiffusion-upscaler)
hansvm|1 year ago
[0] https://huggingface.co/docs/diffusers/api/pipelines/stable_d...
cuuupid|1 year ago
https://replicate.com/collections/super-resolution
guappa|1 year ago
Plus the entire model, which comes with incorrect cache headers and must be redownloaded all the time.