item 34816918

The dangers behind image resizing (2021)

306 points | qwertyforce | 3 years ago | zuru.tech

97 comments

[+] planede|3 years ago|reply
Problems with image resizing are a much deeper rabbit hole than this article suggests. Some important talking points:

1. The form of interpolation (this article).

2. The colorspace used for doing the arithmetic for interpolation. You most likely want a linear colorspace here.

3. Clipping. Resizing is typically done in two phases, once resizing in x then in y direction, not necessarily in this order. If the kernel used has values outside of the range [0, 1] (like Lanczos) and for intermediate results you only capture the range [0,1], then you might get clipping in the intermediate image, which can cause artifacts.

4. Quantization and dithering.

5. If you have an alpha channel, using pre-multiplied alpha for interpolation arithmetic.

I'm not trying to be exhaustive here. ImageWorsener's page has a nice reading list[1].

[1] https://entropymine.com/imageworsener/
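Point 2 can be sketched in a few lines of NumPy. This is an illustrative toy (exact sRGB transfer functions plus a 2x box downscale), not any particular library's implementation, and the function names are invented for the example:

```python
import numpy as np

def srgb_to_linear(s):
    """Inverse sRGB transfer function, for values in [0, 1]."""
    return np.where(s <= 0.04045, s / 12.92, ((s + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(l):
    """Forward sRGB transfer function, for values in [0, 1]."""
    return np.where(l <= 0.0031308, l * 12.92, 1.055 * l ** (1 / 2.4) - 0.055)

def downscale_2x_linear_light(img):
    """2x box downscale performed in linear light, not on raw sRGB values."""
    lin = srgb_to_linear(img)
    h, w = lin.shape[:2]
    lin = lin[:h - h % 2, :w - w % 2]              # crop to an even size
    blocks = lin.reshape(h // 2, 2, w // 2, 2, -1)
    return linear_to_srgb(blocks.mean(axis=(1, 3)))

# A 1-pixel black/white checkerboard averages to ~0.735 in sRGB (a true
# mid-gray in linear light), not the 0.5 a naive sRGB average would give.
checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(np.float64)
out = downscale_2x_linear_light(checker[..., None])
```

The ~0.735 midpoint (rather than 0.5) is exactly the brightening that averaging raw sRGB codes loses.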

[+] PaulHoule|3 years ago|reply
I've definitely learned a lot about these problems from the viewpoint of art and graphic design. When using Pillow I convert to linear light with high dynamic range and work in that space.

One pet peeve of mine is algorithms for making thumbnails. Most of the algorithms from the image processing books don't really apply: they usually interpolate between points based on a small neighborhood, whereas if you are downscaling by a large factor (say 10) the obvious thing to do is sample all the pixels in the input image that intersect with the pixel in the output image (100 of them in that case).

That box averaging is a pretty expensive convolution, so most libraries downscale images by powers of 2 and then interpolate from the closest such image, which I think is not quite perfect; you could do better.
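The box averaging described above, for an integer scale factor, can be sketched like this (a toy NumPy version with invented names, not production code):

```python
import numpy as np

def box_downscale(img, factor):
    """Downscale by an integer factor by averaging every factor-by-factor
    block of input pixels -- i.e. each output pixel is the mean of all the
    input pixels that intersect it, rather than a 4- or 16-tap lookup."""
    h, w = img.shape[:2]
    img = img[:h - h % factor, :w - w % factor]    # crop to a multiple
    blocks = img.reshape(h // factor, factor, w // factor, factor, -1)
    return blocks.mean(axis=(1, 3))

img = np.arange(100.0).reshape(10, 10, 1)
small = box_downscale(img, 10)   # one output pixel: mean of all 100 inputs
```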

[+] phkahler|3 years ago|reply
Yeah I was shocked at how naive this quote is:

>> The definition of scaling function is mathematical and should never be a function of the library being used.

I could just as easily say, "hey, why is your NN affected by image artifacts, isn't it supposed to be robust?"

[+] ChrisMarshallNY|3 years ago|reply
> 3. Clipping. Resizing is typically done in two phases, once resizing in x then in y direction, not necessarily in this order. If the kernel used has values outside of the range [0, 1] (like Lanczos) and for intermediate results you only capture the range [0,1], then you might get clipping in the intermediate image, which can cause artifacts.

Also, gamut clipping and interpolation[0]. That's a real rabbithole.

[0] https://www.cis.rit.edu/people/faculty/montag/PDFs/057.PDF (Downloads a PDF)

[+] actionfromafar|3 years ago|reply
Wow, points 2, 3 and 5 wouldn't have occurred to me even if I tried. Thanks. I now have a mental note to look this stuff up if my resizing ever gives results I'm not happy with. :)
[+] abainbridge|3 years ago|reply
I'd also add speed to that list. Resizing is an expensive operation, and correctness is often traded off for speed. I've written code that deliberately ignored the conversion to a linear color space and back in order to gain speed.
[+] SuchAnonMuchWow|3 years ago|reply
A connected rabbit hole is the decoding of lossy image formats such as JPEG: in my experience, depending on the library used (OpenCV vs TensorFlow vs Pillow), you get RGB values that vary by 1-2% from each other with the default decoders.
[+] BlueTemplar|3 years ago|reply
And also (for humans at least) the rabbit hole coming from effectively displaying the resulting image : various forms of subpixel rendering for screens, various forms of printing... which are likely to have a big influence on what is "acceptable quality" or not.
[+] guruparan18|3 years ago|reply
Another thing I experienced: a document picture that I had downsized to a mandatory upload size had a character/number randomly changed (6 to b or d, I don't remember exactly). I had to convert the document to a PDF, which handled it better.
[+] peepee1982|3 years ago|reply
Wouldn't the clipping be solved by using floating point numbers during the filtering process?
[+] contravariant|3 years ago|reply
If you're doing interpolation you probably don't want a linear colourspace. At least not linear in the way that light works. Interpolation minimizes deviations in the colourspace you're in, so you want it to be somewhat perceptual to get it right.

Of course if you're not interpolating but downscaling the image (which isn't really an interpolation, the value at a particular position in the image does not remain the same) then you do want a linear colourspace to avoid brightening / darkening details, but you need a perceptual colourspace to minimize ringing etc. It's an interesting puzzle.

[+] version_five|3 years ago|reply
I'd argue that if your ML model is sensitive to the anti-aliasing filter used in image resizing, you've got bigger problems than that, unless it's actually making a visible change that spoils whatever the model is supposed to be looking for. To use the standard cat/dog example, the filter or resampling choice is not going to change what you've got a picture of, and if your model is classifying based on features that change with resampling, it's not trustworthy.

If one is concerned about this, one could intentionally vary the resampling or deliberately add different blurring filters during training to make the model robust to these variations.
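A minimal sketch of that augmentation idea with Pillow; the filter list and the function name are just illustrative choices, not a prescription:

```python
import random
from PIL import Image

# Filters to sample from during training; the exact list is a free choice.
FILTERS = [Image.NEAREST, Image.BILINEAR, Image.BICUBIC, Image.LANCZOS, Image.BOX]

def random_resize(img, size, rng=random):
    """Resize with a randomly chosen resampling filter, so the model never
    sees one fixed anti-aliasing characteristic during training."""
    return img.resize(size, resample=rng.choice(FILTERS))

# Usage inside a training-time augmentation pipeline, e.g.:
# batch = [random_resize(im, (224, 224)) for im in pil_images]
```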

[+] hprotagonist|3 years ago|reply
> I'd argue that if your ML model is sensitive to the anti-aliasing filter used in image resizing, you've got bigger problems than that.

I’ve seen it cause trouble in every model architecture i’ve tried.

[+] derefr|3 years ago|reply
You say that “if your model is classifying based in features that change with resampling, it’s not trustworthy.”

I say that choice of resampling algorithm is what determines whether a model can learn the rule “zebras can be recognized by their uniform-width stripes” or not; as a bad resample will result in non-uniform-width stripes (or, at sufficiently small scales, loss of stripes!)

[+] brucethemoose2|3 years ago|reply
For those going down this rabbit hole, perceptual downscaling is state of the art, and the closest thing we have to a Python implementation is here (with a citation of the original paper): https://github.com/WolframRhodium/muvsfunc/blob/master/muvsf...

Other supposedly better CUDA/ML filters give me strange results.

[+] thrdbndndn|3 years ago|reply
There are so many gems in VapourSynth scene.

I really wish there were some better general-purpose imaging libraries that steadily implemented/copied these useful filters, so that more people could use them out of the box.

Most languages I've worked with are surprisingly lacking in this regard, despite the huge potential use cases.

In the case of Python, for example, Pillow is fine but it has nothing fancy. You can't even fine-tune the parameters of bicubic, let alone use the billions of new algorithms from the video communities.

OpenCV and the ML tools like to reinvent the wheel themselves, but often implement only the most basic algorithms (and badly, as noted in this article).

[+] anotheryou|3 years ago|reply
Hm, any examples of that?

I found https://dl.acm.org/doi/10.1145/2766891 but I don't like the comparisons. Any designer will tell you, after down-scaling you do a minimal sharpening pass. The "perceptual downscaling" looks slightly over-sharpened to me.

I'd love to compare something I sharpened in photoshop with these results.

[+] account42|3 years ago|reply
> The definition of scaling function is mathematical and should never be a function of the library being used.

Horseshit. Image resizing, or any other kind of resampling, is essentially always about filling in missing information. There is no mathematical model that will tell you for certain what the missing information is.

[+] planede|3 years ago|reply
Arguably downscaling does not fill in missing information; it only throws away information. Still, implementations vary a lot here. There might not be a consensus on a single correct way to do downscaling, but there are certain things that you certainly don't want to do, like naive linear arithmetic on sRGB color values.
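A quick illustration of why naive arithmetic on sRGB values goes wrong, using the common gamma-2.2 approximation rather than the exact sRGB curve (helper names are invented for the example):

```python
def average_srgb_naive(a, b):
    """What you don't want: averaging 8-bit sRGB code values directly."""
    return (a + b) / 2

def average_srgb_linear(a, b, gamma=2.2):
    """Decode to (approximately) linear light, average, re-encode."""
    lin = ((a / 255) ** gamma + (b / 255) ** gamma) / 2
    return 255 * lin ** (1 / gamma)

naive = average_srgb_naive(0, 255)      # 127.5 -> renders too dark
linear = average_srgb_linear(0, 255)    # ~186  -> perceptual mid-gray
```

Averaging the 8-bit codes for black and white gives ~128, which displays noticeably darker than a true mid-gray; averaging in linear light gives roughly 186.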
[+] actionfromafar|3 years ago|reply
The article talks about downsampling, not upsampling, just so we are clear about that.

And besides, a ranty blog post pointing out pitfalls can still be useful for someone else coming from the same naïve (in a good/neutral way) place as the author.

[+] jcynix|3 years ago|reply
Now that's an interesting topic for photographers who like to experiment with anamorphic lenses for panoramas.

An anamorphic lens (optically) "squeezes" the image onto the sensor, and afterwards the digital image has to be "desqueezed" (i.e. upscaled in one axis) to give you the "final" image. Which in turn is downscaled to be viewed on either a monitor or a printout.

But the resulting images I've seen so far nevertheless look good. I think that's because natural images don't contain that many pixel-level details, and we mostly see downscaled images on the web or in YouTube videos anyway...

[+] thrdbndndn|3 years ago|reply
I'm shocked. I didn't even know this was a thing.

By that I mean, I know what the bilinear/bicubic/Lanczos resizing algorithms are, and I know they should at least give acceptable results (compared to nearest-neighbor).

But I didn't know that famous libraries (especially OpenCV, which is a computer vision library!) could give such poor results.

Also, a side note: IIRC bilinear and bicubic have constants in their equations, so technically when you're comparing different implementations you need to make sure these parameters are the same. But this shouldn't excuse the extremely poor results in some.
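For reference, the free constant being alluded to is the `a` in the Keys cubic convolution kernel; common choices include -0.5 (Catmull-Rom) and -0.75. A toy sketch (invented helper names) showing the same input producing different outputs depending on that constant:

```python
def cubic_kernel(x, a):
    """Keys cubic convolution kernel; `a` is the free constant that
    differs between implementations (e.g. -0.5 for Catmull-Rom)."""
    x = abs(x)
    if x < 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def interp_midpoint(samples, a):
    """Bicubic value halfway between samples[1] and samples[2]."""
    offsets = (-1.5, -0.5, 0.5, 1.5)
    return sum(cubic_kernel(o, a) * s for o, s in zip(offsets, samples))

v1 = interp_midpoint([0, 1, 1, 0], a=-0.5)    # 1.125
v2 = interp_midpoint([0, 1, 1, 0], a=-0.75)   # 1.1875
```

Note that both results overshoot 1.0, which is also where the clipping issue from the top comment comes in.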

[+] NohatCoder|3 years ago|reply
At least bilinear and bicubic have a widely agreed upon specific definition. The poor results are the result of that definition. They work reasonably for upscaling, but downscaling more than a trivial amount causes them to weigh a few input pixels highly and outright ignore most of the rest.
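A toy sketch of this failure mode: a pointwise bilinear downscale by 10x taps only 4 input pixels per output pixel, so an isolated detail can disappear entirely. This is a naive illustrative implementation (NumPy only, invented names), not any particular library's code:

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Classic 4-tap bilinear lookup at a continuous (y, x) coordinate."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, img.shape[0] - 1), min(x0 + 1, img.shape[1] - 1)
    fy, fx = y - y0, x - x0
    return ((1 - fy) * (1 - fx) * img[y0, x0] + (1 - fy) * fx * img[y0, x1]
            + fy * (1 - fx) * img[y1, x0] + fy * fx * img[y1, x1])

def naive_bilinear_downscale(img, factor):
    """Downscale by sampling the bilinear surface at output-pixel centers.
    At a large factor, each output pixel still sees only 4 input pixels."""
    h, w = img.shape[0] // factor, img.shape[1] // factor
    out = np.empty((h, w))
    for j in range(h):
        for i in range(w):
            out[j, i] = bilinear_sample(img, (j + 0.5) * factor - 0.5,
                                        (i + 0.5) * factor - 0.5)
    return out

img = np.zeros((100, 100))
img[2, 2] = 1.0                        # one isolated bright pixel
small = naive_bilinear_downscale(img, 10)
# The bright pixel vanishes: the 4 taps for output (0, 0) land around input
# coordinate (4.5, 4.5) and never touch (2, 2).
```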
[+] pacaro|3 years ago|reply
I've seen more than one team find that reimplementing an OpenCV capability that they use gains them both quality and performance.

This isn't necessarily a criticism of OpenCV; often the OpenCV implementation is, of necessity, quite general, and a specific use case can enable optimizations not available in the general case.

[+] godshatter|3 years ago|reply
If their worry is the differences between algorithms in libraries across different execution environments, shouldn't they either find a library they like that can be called from all such environments, or, if no single library works everywhere, just write their own using their favorite algorithm? Why make all libraries do this the same way? Which one is undeniably correct?
[+] TechBro8615|3 years ago|reply
That's basically what they did, which they mention in the last paragraph of the article. They released a wrapper library [0] for Pillow so that it can be called from C++:

> Since we noticed that the most correct behavior is given by the Pillow resize and we are interested in deploying our applications in C++, it could be useful to use it in C++. The Pillow image processing algorithms are almost all written in C, but they cannot be directly used because they are designed to be part of the Python wrapper. We, therefore, released a porting of the resize method in a new standalone library that works on cv::Mat so it would be compatible with all OpenCV algorithms. You can find the library here: pillow-resize.

[0] https://github.com/zurutech/pillow-resize

[+] JackFr|3 years ago|reply
Hmmm. With respect to feeding an ML system, are visual glitches and artifacts important? Wouldn't the most important thing be to use a transformation that preserves as much information as possible and captures the relevant structure? If the intermediate picture doesn't look great, who cares, as long as the result is good.

Ooops. Just thought about generative systems. Nevermind.

[+] brucethemoose2|3 years ago|reply
Just speaking from experience, GAN upscalers pick up artifacts in the training dataset like a bloodhound.

You can use this to your advantage by purposely introducing them into the lowres inputs so they will be removed.

[+] IYasha|3 years ago|reply
So, what are the dangers? (What's the point of the article?) That you'll get a different model with the same originals processed by different algorithms?

The comparison of resizing algorithms is nothing new, the importance of adequate input data is obvious, and differences in the availability of image processing algorithms are also understandable. Clickbaity.

[+] azubinski|3 years ago|reply
A friend of mine decided to take up image resizing on the third lane of a six-lane highway.

And he was hit by a truck.

So it's true about the danger of image resizing.

[+] TechBro8615|3 years ago|reply
If you read to the end, they link to a library they made to solve the problem by wrapping Pillow's C functions so they are callable from C++.
[+] ricardobeat|3 years ago|reply
Was hoping to see libvips in the comparison, which is widely used.

I wonder why it's not adopted by any of these frameworks?

[+] intrasight|3 years ago|reply
I was sort of expecting them to describe this danger of resizing: one can feed a piece of an image into one of these new massive ML models and get back the full image, including things you didn't want to share. Like cropping out my ex.

Is ML sort of like a universal hologram in that respect?

[+] pallas_athena|3 years ago|reply
If you upscale (with interpolation) some sensitive image (think security camera), could that be dismissed in court as it "creates" new information that wasn't there in the original image?
[+] hgomersall|3 years ago|reply
The bigger problem is that the pixel domain is not a very good domain to be operating in. How many hours of training and thousands of images are spent essentially learning about Gabor filters?
[+] biscuits1|3 years ago|reply
This article throws a red flag on proving negatives, which is impossible with maths. The void is filled by human subjectivity: in a graphical sense, "visual taste."
[+] mythz|3 years ago|reply
What are some good image upscaling libraries? I'm assuming the high-quality ones would need to use some AI model to fill in missing detail.
[+] brucethemoose2|3 years ago|reply
Depends on your needs!

Zimg is a gold standard to me, but yeah, you can get better output depending on the nature of your content and hardware. I think ESRGAN is state-of-the-art above 2x scales, with the right community model from upscale.wiki, but it is slow and artifacty. And pixel art, for instance, may look better upscaled with xBRZ.

[+] erulabs|3 years ago|reply
Image resizing is one of those things that most companies seem to build in-house over and over. There are several hosted services, but obviously sending your users' photos to a third party is pretty weak. For those of us looking for a middle ground: I've had great success with imgproxy (https://github.com/imgproxy/imgproxy), which wraps libvips and is well maintained.
[+] singularity2001|3 years ago|reply
Funny that they use TF and PyTorch in this context without even mentioning their fantastic upsampling capabilities.