So, basically, this is the thing in a crime detective movie where the forensic analyst is looking at a terrible pixelated surveillance camera still and says "enhance," and the computer magically increases the resolution to reveal the culprit's face.
Just another entry on the "things that are supposed to be impossible that convolutional nets can do now."
Yup, up to a certain point! There are information-theoretic limits, though. You can fill in information, but it will be biased to some degree, in this case by the training dataset. If the "enhance" is too strong, we should be careful about what we do with the results in forensics.
but man, it can make your internet pics look smooth! :)
thanks for the comment!
I think it's always problematic to compare against images upscaled via nearest-neighbor. The big pixels are hard for our brains to parse; we lock onto all the blocky edges.
A comparison against a good content-unaware upscaling would be nice (one of the default Photoshop algorithms).
I also wonder what they used for the downscaling. I see 4x4 pixel blocks, but also some with 3 px or 7 px edge lengths.
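For reference, benchmark low-res inputs are usually produced by block-averaging the high-res image by the scale factor (that's my assumption about this repo; the paper's exact filter may differ), and nearest-neighbor is the blocky baseline being criticized here. A minimal NumPy sketch of both:

```python
import numpy as np

def downscale_box(img, r):
    """Average each r x r block: one common way to create the low-res input."""
    h, w, c = img.shape
    return img.reshape(h // r, r, w // r, r, c).mean(axis=(1, 3))

def upscale_nearest(img, r):
    """Repeat every pixel r times in each direction: the blocky baseline."""
    return np.repeat(np.repeat(img, r, axis=0), r, axis=1)

hr = np.random.default_rng(0).random((16, 16, 3))  # stand-in high-res image
lr = downscale_box(hr, 4)          # 4x4 blocks, matching what the images show
blocky = upscale_nearest(lr, 4)    # same size as hr, but piecewise constant
```

If the repo used a different downscaling filter (bicubic, say), the low-res pixels wouldn't be exact block averages, which could explain the irregular 3 px / 7 px block lengths.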
The pic with the boat on page 13 is interesting. In the SRGAN version I would take the shore for some sort of cliff, while the original shows separated boulders.
I'm not familiar enough with the field to understand how the "neural net" part feeds in, other than to do parallel computation on the x-pos, y-pos, (RGB) color-type-intensity tensor interpolated/weighted into a larger/finer tensor.
(linear algebra speak for upscaling my old DVD to HD, that sort of thing)
At the risk of exposing my ignorance, this has nothing to do with "AI", right? It's "just" parallel computation?
Yeah, no AI. It's low-level computer vision; there is no implicit understanding of the scene being enhanced here. We show the neural net several examples of low- and high-quality images, and it learns a function that makes the low-quality ones look more like the high-quality ones.
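To make that "show it pairs and it learns a mapping" recipe concrete, here is a toy sketch of the same supervised setup (illustrative only, not the authors' code): a single learned weight per subpixel position stands in for the conv net, and nearest-neighbor blow-ups stand in for real high-resolution targets.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 2                                   # upscaling factor
W = rng.normal(size=(r, r))             # toy "network": one weight per subpixel

lr = rng.random((100, 8, 8))            # batch of low-res images
hr = np.repeat(np.repeat(lr, r, axis=1), r, axis=2)  # toy high-res targets

def predict(x):
    # Upscale, then modulate each subpixel position (i, j) by W[i, j].
    up = np.repeat(np.repeat(x, r, axis=1), r, axis=2)
    return up * np.tile(W, (x.shape[1], x.shape[2]))

for _ in range(200):                    # plain gradient descent on squared error
    up = np.repeat(np.repeat(lr, r, axis=1), r, axis=2)
    err = predict(lr) - hr
    # Accumulate the error for each subpixel offset (i, j) across all blocks.
    grad = (err * up).reshape(100, 8, r, 8, r).sum(axis=(0, 1, 3)) / err.size
    W -= 5.0 * grad
```

Because the toy targets are exact nearest-neighbor copies, gradient descent drives every weight toward 1. With real data the model is a conv net and the targets are true high-res photos, but the loop is the same idea.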
This may be disappointing for now, but in the write-up we are also pitching this same module for use in generative networks and other models that do build an understanding of the scene. Let's see what the community (and we ourselves) can do next...
It seems that this subpixel convolution layer is equivalent to what is known in the neural-net community as the "deconvolution layer", but it is much more memory- and computation-efficient. The interlacing rainbow picture was a bit hard to understand until I read this: https://export.arxiv.org/ftp/arxiv/papers/1609/1609.07009.pd...
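The rearrangement the rainbow figure depicts (called "periodic shuffling" in the paper, "pixel shuffle" or depth-to-space elsewhere) takes only a few lines of NumPy; note that the exact channel ordering below is one common convention, and frameworks differ:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (H, W, C*r*r) feature maps into an (H*r, W*r, C) image.
    Per the linked note, this is output-equivalent to a deconvolution
    layer, but all the convolutions can stay at low resolution; only this
    cheap reshuffle runs at high resolution."""
    h, w, crr = x.shape
    c = crr // (r * r)
    x = x.reshape(h, w, r, r, c)      # split channels into an r x r offset grid
    x = x.transpose(0, 2, 1, 3, 4)    # interleave offsets with spatial dims
    return x.reshape(h * r, w * r, c)

# Each group of r*r channels at one low-res location fills one r x r block:
x = np.arange(4.0).reshape(1, 1, 4)   # one pixel, four channels, r = 2
out = pixel_shuffle(x, 2)             # a 2x2 single-channel block
```

The claimed efficiency win is exactly this: every learned layer operates on the small grid, and the only full-resolution operation is a reshape/transpose with no arithmetic.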
I'm not sure, but there seems to be something wonky in the input images. They are very blocky, so I thought that they would be just pixel doubled (or quadrupled) from low-res pictures, but the blockiness lacks the regularity I'd expect from pixel-doubled images.
Super wonky indeed. It should also be compared to something like Photoshop's bicubic enlargement, or shown at the original size, because the brain gets stuck on the pixel edges.
The explanation in the README of the github project is excellent and well-written! Here's a really great set of animations by Vincent Dumoulin on how various conv operators work: https://github.com/vdumoulin/conv_arithmetic
This is impressive! But I'll be really impressed once this 'new thing' brings us roto masks in motion, that is, isolating objects from the background in a movie with pixel-perfect accuracy. It will put a lot of people out of a job and make a lot of people happy at the same time.
The problem with subpixel images is that there are both RGB and BGR monitors. Not only that, there are horizontal and vertical variations. And there's no way to tell which one the user is using on the web. And that's not even counting all the mobile layouts like PenTile.
It's still useful though, browsers, for instance, could use it for displaying downscaled images.
This project is using 'subpixel' not to refer to monitor subpixels, but to the lost information between existing pixels in an image.
You're right though, and that's why chroma hinting for subpixel AA has fallen out of favor. It also doesn't work on mobile where the screen can be rotated from RGB-horz to RGB-vert at a moment's notice. This was changed for ClearType in Windows 8 (DirectWrite never did chroma hinting).
This is supposed to be used in the data-processing step: you load your image from JPEG, or your video using ffmpeg, enhance the images, and then pass them to the next step where color rendering is done. You can do that in the browser or on mobile just fine.
dankohn1 | 9 years ago
(Created by the super talented duncanrobson)
anotheryou | 9 years ago
This looks pixely and is supposed to be a source file?: https://raw.githubusercontent.com/Tetrachrome/subpixel/d2e28...
anotheryou | 9 years ago
https://arxiv.org/abs/1609.04802
amelius | 9 years ago
[1] http://waifu2x.udp.jp/
zokier | 9 years ago
How were the input images prepared?