item 34816918

The dangers behind image resizing (2021)

306 points | qwertyforce | 3 years ago | zuru.tech

97 comments

[+] planede|3 years ago|reply
Problems with image resizing are a much deeper rabbit hole than this article suggests. Some important talking points:

1. The form of interpolation (this article).

2. The colorspace used for doing the arithmetic for interpolation. You most likely want a linear colorspace here.

3. Clipping. Resizing is typically done in two phases, once resizing in x then in y direction, not necessarily in this order. If the kernel used has values outside of the range [0, 1] (like Lanczos) and for intermediate results you only capture the range [0,1], then you might get clipping in the intermediate image, which can cause artifacts.

4. Quantization and dithering.

5. If you have an alpha channel, using pre-multiplied alpha for interpolation arithmetic.

I'm not trying to be exhaustive here. ImageWorsener's page has a nice reading list[1].

[1] https://entropymine.com/imageworsener/
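Point 2 can be sketched in a few lines of NumPy. This is an illustrative toy (exact sRGB transfer functions plus a 2x box downscale), not any particular library's implementation, and the function names are invented for the example:

```python
import numpy as np

def srgb_to_linear(s):
    """Inverse sRGB transfer function, for values in [0, 1]."""
    return np.where(s <= 0.04045, s / 12.92, ((s + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(l):
    """Forward sRGB transfer function, for values in [0, 1]."""
    return np.where(l <= 0.0031308, l * 12.92, 1.055 * l ** (1 / 2.4) - 0.055)

def downscale_2x_linear_light(img):
    """2x box downscale performed in linear light, not on raw sRGB values."""
    lin = srgb_to_linear(img)
    h, w = lin.shape[:2]
    lin = lin[:h - h % 2, :w - w % 2]              # crop to an even size
    blocks = lin.reshape(h // 2, 2, w // 2, 2, -1)
    return linear_to_srgb(blocks.mean(axis=(1, 3)))

# A 1-pixel black/white checkerboard averages to ~0.735 in sRGB (a true
# mid-gray in linear light), not the 0.5 a naive sRGB average would give.
checker = (np.indices((4, 4)).sum(axis=0) % 2).astype(np.float64)
out = downscale_2x_linear_light(checker[..., None])
```

The ~0.735 midpoint (rather than 0.5) is exactly the brightening that averaging raw sRGB codes loses.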

[+] PaulHoule|3 years ago|reply
I've definitely learned a lot about these problems from the viewpoint of art and graphic design. When using Pillow I convert to linear light with high dynamic range and work in that space.

One pet peeve of mine is algorithms for making thumbnails. Most of the algorithms from the image processing books don't really apply: they usually interpolate between points based on a small neighborhood, whereas if you are downscaling by a large factor (say 10) the obvious thing to do is sample all the pixels in the input image that intersect with the pixel in the output image (100 of them in that case).

That box averaging is a pretty expensive convolution, so most libraries downscale images by powers of 2 and then interpolate from the closest such image, which I think is not quite perfect; you could do better.
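The box averaging described above, for an integer scale factor, can be sketched like this (a toy NumPy version with invented names, not production code):

```python
import numpy as np

def box_downscale(img, factor):
    """Downscale by an integer factor by averaging every factor-by-factor
    block of input pixels -- i.e. each output pixel is the mean of all the
    input pixels that intersect it, rather than a 4- or 16-tap lookup."""
    h, w = img.shape[:2]
    img = img[:h - h % factor, :w - w % factor]    # crop to a multiple
    blocks = img.reshape(h // factor, factor, w // factor, factor, -1)
    return blocks.mean(axis=(1, 3))

img = np.arange(100.0).reshape(10, 10, 1)
small = box_downscale(img, 10)   # one output pixel: mean of all 100 inputs
```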

[+] phkahler|3 years ago|reply
Yeah I was shocked at how naive this quote is:

>> The definition of scaling function is mathematical and should never be a function of the library being used.

I could just as easily say, "hey, why is your NN affected by image artifacts, isn't it supposed to be robust?"

[+] ChrisMarshallNY|3 years ago|reply
> 3. Clipping. Resizing is typically done in two phases, once resizing in x then in y direction, not necessarily in this order. If the kernel used has values outside of the range [0, 1] (like Lanczos) and for intermediate results you only capture the range [0,1], then you might get clipping in the intermediate image, which can cause artifacts.

Also, gamut clipping and interpolation[0]. That's a real rabbithole.

[0] https://www.cis.rit.edu/people/faculty/montag/PDFs/057.PDF (Downloads a PDF)

[+] actionfromafar|3 years ago|reply
Wow, points 2, 3 and 5 wouldn't have occurred to me even if I tried. Thanks. I now have a mental note to look this stuff up if my resizing ever gives results I'm not happy with. :)
[+] abainbridge|3 years ago|reply
I'd also add speed to that list. Resizing is an expensive operation, and correctness is often traded off for speed. I've written code that deliberately ignored the conversion to a linear color space and back in order to gain speed.
[+] SuchAnonMuchWow|3 years ago|reply
A connected rabbit hole is the decoding of lossy image formats such as JPEG: in my experience, depending on the library used (OpenCV vs TensorFlow vs Pillow), you get RGB values that vary by 1-2% from each other with the default decoders.
[+] BlueTemplar|3 years ago|reply
And also (for humans at least) the rabbit hole coming from effectively displaying the resulting image : various forms of subpixel rendering for screens, various forms of printing... which are likely to have a big influence on what is "acceptable quality" or not.
[+] guruparan18|3 years ago|reply
Another thing I experienced: a document picture that I had downsized to a mandatory upload size had a character/number randomly changed (6 to b or d, I don't remember exactly). I had to convert the document to a PDF, which handled it better.
[+] peepee1982|3 years ago|reply
Wouldn't the clipping be solved by using floating point numbers during the filtering process?
[+] contravariant|3 years ago|reply
If you're doing interpolation you probably don't want a linear colourspace. At least not linear in the way that light works. Interpolation minimizes deviations in the colourspace you're in, so you want it to be somewhat perceptual to get it right.

Of course if you're not interpolating but downscaling the image (which isn't really an interpolation, the value at a particular position in the image does not remain the same) then you do want a linear colourspace to avoid brightening / darkening details, but you need a perceptual colourspace to minimize ringing etc. It's an interesting puzzle.

[+] version_five|3 years ago|reply
I'd argue that if your ML model is sensitive to the anti-aliasing filter used in image resizing, you've got bigger problems than that, unless it's actually making a visible change that spoils whatever the model is supposed to be looking for. To use the standard cat/dog example, the filter or resampling choice is not going to change what you've got a picture of, and if your model is classifying based on features that change with resampling, it's not trustworthy.

If one is concerned about this, one could intentionally vary the resampling or deliberately add different blurring filters during training to make the model robust to these variations.
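A minimal sketch of that augmentation idea with Pillow; the filter list and the function name are just illustrative choices, not a prescription:

```python
import random
from PIL import Image

# Filters to sample from during training; the exact list is a free choice.
FILTERS = [Image.NEAREST, Image.BILINEAR, Image.BICUBIC, Image.LANCZOS, Image.BOX]

def random_resize(img, size, rng=random):
    """Resize with a randomly chosen resampling filter, so the model never
    sees one fixed anti-aliasing characteristic during training."""
    return img.resize(size, resample=rng.choice(FILTERS))

# Usage inside a training-time augmentation pipeline, e.g.:
# batch = [random_resize(im, (224, 224)) for im in pil_images]
```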

[+] hprotagonist|3 years ago|reply
> I'd argue that if your ML model is sensitive to the anti-aliasing filter used in image resizing, you've got bigger problems than that.

I’ve seen it cause trouble in every model architecture i’ve tried.

[+] derefr|3 years ago|reply
You say that “if your model is classifying based in features that change with resampling, it’s not trustworthy.”

I say that choice of resampling algorithm is what determines whether a model can learn the rule “zebras can be recognized by their uniform-width stripes” or not; as a bad resample will result in non-uniform-width stripes (or, at sufficiently small scales, loss of stripes!)

[+] brucethemoose2|3 years ago|reply
For those going down this rabbit hole, perceptual downscaling is state of the art, and the closest thing we have to a Python implementation is here (with a citation of the original paper): https://github.com/WolframRhodium/muvsfunc/blob/master/muvsf...

Other supposedly better CUDA/ML filters give me strange results.

[+] thrdbndndn|3 years ago|reply
There are so many gems in VapourSynth scene.

I really wish there were some better general-purpose imaging libraries that steadily implemented/copied these useful filters, so that more people could use them out of the box.

Most languages I've worked with are surprisingly lacking in this regard, despite the huge potential use cases.

In the case of Python, for example, Pillow is fine but it has nothing fancy. You can't even fine-tune the parameters of bicubic, let alone use the billions of new algorithms from the video communities.

OpenCV and the ML tools like to reinvent the wheel themselves, but often implement only the most basic algorithms (and badly, as noted in this article).

[+] anotheryou|3 years ago|reply
Hm, any examples of that?

I found https://dl.acm.org/doi/10.1145/2766891 but I don't like the comparisons. Any designer will tell you, after down-scaling you do a minimal sharpening pass. The "perceptual downscaling" looks slightly over-sharpened to me.

I'd love to compare something I sharpened in photoshop with these results.

[+] account42|3 years ago|reply
> The definition of scaling function is mathematical and should never be a function of the library being used.

Horseshit. Image resizing, or any other kind of resampling, is essentially always about filling in missing information. There is no mathematical model that will tell you for certain what the missing information is.

[+] planede|3 years ago|reply
Arguably downscaling does not fill in missing information; it only throws away information. Still, implementations vary a lot here. There might not be a consensus on a single correct way to do downscaling, but there are certain things that you certainly don't want to do, like naive linear arithmetic on sRGB color values.
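A quick illustration of why naive arithmetic on sRGB values goes wrong, using the common gamma-2.2 approximation rather than the exact sRGB curve (helper names are invented for the example):

```python
def average_srgb_naive(a, b):
    """What you don't want: averaging 8-bit sRGB code values directly."""
    return (a + b) / 2

def average_srgb_linear(a, b, gamma=2.2):
    """Decode to (approximately) linear light, average, re-encode."""
    lin = ((a / 255) ** gamma + (b / 255) ** gamma) / 2
    return 255 * lin ** (1 / gamma)

naive = average_srgb_naive(0, 255)      # 127.5 -> renders too dark
linear = average_srgb_linear(0, 255)    # ~186  -> perceptual mid-gray
```

Averaging the 8-bit codes for black and white gives ~128, which displays noticeably darker than a true mid-gray; averaging in linear light gives roughly 186.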
[+] actionfromafar|3 years ago|reply
The article talks about downsampling, not upsampling, just so we are clear about that.

And besides, a ranty blog post pointing out pitfalls can still be useful for someone else coming from the same naïve (in a good/neutral way) place as the author.

[+] jcynix|3 years ago|reply
Now that's an interesting topic for photographers who like to experiment with anamorphic lenses for panoramas.

An anamorphic lens (optically) "squeezes" the image onto the sensor, and afterwards the digital image has to be "desqueezed" (i.e. upscaled in one axis) to give you the "final" image. Which in turn is downscaled to be viewed on either a monitor or a printout.

But the resulting images I've seen so far nevertheless look good. I think that's because natural images don't contain that many pixel-level details, and we mostly see downscaled images on the web or in YouTube videos anyway...

[+] thrdbndndn|3 years ago|reply
I'm shocked. I didn't even know this was a thing.

By that I mean, I know what the bilinear/bicubic/Lanczos resizing algorithms are, and I know they should at least give acceptable results (compared to nearest-neighbor).

But I didn't know that famous libraries (especially OpenCV, which is a computer vision library!) could give such poor results.

Also, a side note: IIRC bilinear and bicubic have constants in their equations, so technically when you're comparing different implementations you need to make sure these parameters are the same. But this shouldn't excuse the extremely poor results in some.
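For reference, the free constant being alluded to is the `a` in the Keys cubic convolution kernel; common choices include -0.5 (Catmull-Rom) and -0.75. A toy sketch (invented helper names) showing the same input producing different outputs depending on that constant:

```python
def cubic_kernel(x, a):
    """Keys cubic convolution kernel; `a` is the free constant that
    differs between implementations (e.g. -0.5 for Catmull-Rom)."""
    x = abs(x)
    if x < 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def interp_midpoint(samples, a):
    """Bicubic value halfway between samples[1] and samples[2]."""
    offsets = (-1.5, -0.5, 0.5, 1.5)
    return sum(cubic_kernel(o, a) * s for o, s in zip(offsets, samples))

v1 = interp_midpoint([0, 1, 1, 0], a=-0.5)    # 1.125
v2 = interp_midpoint([0, 1, 1, 0], a=-0.75)   # 1.1875
```

Note that both results overshoot 1.0, which is also where the clipping issue from the top comment comes in.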

[+] NohatCoder|3 years ago|reply
At least bilinear and bicubic have a widely agreed upon specific definition. The poor results are the result of that definition. They work reasonably for upscaling, but downscaling more than a trivial amount causes them to weigh a few input pixels highly and outright ignore most of the rest.
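A toy sketch of this failure mode: a pointwise bilinear downscale by 10x taps only 4 input pixels per output pixel, so an isolated detail can disappear entirely. This is a naive illustrative implementation (NumPy only, invented names), not any particular library's code:

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Classic 4-tap bilinear lookup at a continuous (y, x) coordinate."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, img.shape[0] - 1), min(x0 + 1, img.shape[1] - 1)
    fy, fx = y - y0, x - x0
    return ((1 - fy) * (1 - fx) * img[y0, x0] + (1 - fy) * fx * img[y0, x1]
            + fy * (1 - fx) * img[y1, x0] + fy * fx * img[y1, x1])

def naive_bilinear_downscale(img, factor):
    """Downscale by sampling the bilinear surface at output-pixel centers.
    At a large factor, each output pixel still sees only 4 input pixels."""
    h, w = img.shape[0] // factor, img.shape[1] // factor
    out = np.empty((h, w))
    for j in range(h):
        for i in range(w):
            out[j, i] = bilinear_sample(img, (j + 0.5) * factor - 0.5,
                                        (i + 0.5) * factor - 0.5)
    return out

img = np.zeros((100, 100))
img[2, 2] = 1.0                        # one isolated bright pixel
small = naive_bilinear_downscale(img, 10)
# The bright pixel vanishes: the 4 taps for output (0, 0) land around input
# coordinate (4.5, 4.5) and never touch (2, 2).
```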
[+] pacaro|3 years ago|reply
I've seen more than one team find that reimplementing an OpenCV capability that they use gains them both quality and performance.

This isn't necessarily a criticism of OpenCV; often the OpenCV implementation is, of necessity, quite general, and a specific use case can enable optimizations not available in the general case.

[+] godshatter|3 years ago|reply
If their worry is the differences between algorithms in libraries across different execution environments, shouldn't they either find a library they like that can be called from all such environments, or, if no single library works everywhere, just write their own using their favorite algorithm? Why make all libraries do this the same way? Which one is undeniably correct?
[+] TechBro8615|3 years ago|reply
That's basically what they did, which they mention in the last paragraph of the article. They released a wrapper library [0] for Pillow so that it can be called from C++:

> Since we noticed that the most correct behavior is given by the Pillow resize and we are interested in deploying our applications in C++, it could be useful to use it in C++. The Pillow image processing algorithms are almost all written in C, but they cannot be directly used because they are designed to be part of the Python wrapper. We, therefore, released a porting of the resize method in a new standalone library that works on cv::Mat so it would be compatible with all OpenCV algorithms. You can find the library here: pillow-resize.

[0] https://github.com/zurutech/pillow-resize

[+] JackFr|3 years ago|reply
Hmmm. With respect to feeding an ML system, are visual glitches and artifacts important? Wouldn't the most important thing be to use a transformation that preserves as much information as possible and captures the relevant structure? If the intermediate picture doesn't look great, who cares, as long as the result is good.

Ooops. Just thought about generative systems. Nevermind.

[+] brucethemoose2|3 years ago|reply
Just speaking from experience, GAN upscalers pick up artifacts in the training dataset like a bloodhound.

You can use this to your advantage by purposely introducing them into the lowres inputs so they will be removed.

[+] IYasha|3 years ago|reply
So, what are the dangers? (What's the point of the article?) That you'll get a different model with the same originals processed by different algorithms?

The comparison of resizing algorithms is nothing new, the importance of adequate input data is obvious, and differences in the availability of image processing algorithms are also understandable. Clickbaity.

[+] azubinski|3 years ago|reply
A friend of mine decided to take up image resizing on the third lane of a six-lane highway.

And he was hit by a truck.

So it's true about the danger of image resizing.

[+] TechBro8615|3 years ago|reply
If you read to the end, they link to a library they made to solve the problem by wrapping Pillow's C functions so they are callable from C++.
[+] ricardobeat|3 years ago|reply
Was hoping to see libvips in the comparison, which is widely used.

I wonder why it's not adopted by any of these frameworks?

[+] intrasight|3 years ago|reply
I was sort of expecting them to describe this danger of resizing: one can feed a piece of an image into one of these new massive ML models and get back the full image, including things you didn't want to share. Like cropping out my ex.

Is ML sort of like a universal hologram in that respect?

[+] pallas_athena|3 years ago|reply
If you upscale (with interpolation) some sensitive image (think security camera), could that be dismissed in court as it "creates" new information that wasn't there in the original image?
[+] hgomersall|3 years ago|reply
The bigger problem is that the pixel domain is not a very good domain to be operating in. How many hours of training and thousands of images are spent essentially learning about Gabor filters?
[+] biscuits1|3 years ago|reply
This article throws a red flag on proving negatives, which is impossible with maths. The void is filled by human subjectivity: in a graphical sense, "visual taste."
[+] mythz|3 years ago|reply
What are some good image upscaling libraries? I'm assuming the high-quality ones would need to use some AI model to fill in missing detail.
[+] brucethemoose2|3 years ago|reply
Depends on your needs!

Zimg is a gold standard to me, but yeah, you can get better output depending on the nature of your content and hardware. I think ESRGAN is state-of-the-art above 2x scales, with the right community model from upscale.wiki, but it is slow and artifacty. And pixel art, for instance, may look better upscaled with xBRZ.

[+] erulabs|3 years ago|reply
Image resizing is one of those things that most companies seem to build in-house over and over. There are several hosted services, but obviously sending your users' photos to a third party is pretty weak. For those of us looking for a middle ground: I've had great success with imgproxy (https://github.com/imgproxy/imgproxy), which wraps libvips and is well maintained.
[+] singularity2001|3 years ago|reply
Funny that they use TF and PyTorch in this context without even mentioning their fantastic upsampling capabilities.