Video Stabilization on YouTube

196 points | Garbage | 14 years ago | googleresearch.blogspot.in

48 comments

ajb | 14 years ago
Seems to work well. I just tried it on a tiger cub video I've had for years, taken with an el-cheapo digital camera. Shaky version: http://youtu.be/g5P9WwHdOSI Stabilised version: http://youtu.be/wXfbRUk_1Bg

There is some motion blur (around 24 seconds, when the momma tiger lies down) which is a bit puzzling once the image has been stabilised. But hey, it's better than it was before.

drewyeaton | 14 years ago
The motion blur was there in the first place; you just don't notice it because there's also _motion_. When you remove the motion (shaking, jerking the camera), you're left with just the blur.
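A toy 1D illustration of that point, assuming a single bright dot and a camera that drifts at constant speed while the shutter is open (all names and numbers are made up for this sketch): the exposure records a smear, and the whole-frame shift a stabilizer applies only moves the smear around; it can't make it shorter.

```python
def expose(width, start, speed_px, steps=100):
    """Integrate a bright point that moves speed_px pixels during one exposure.

    Returns a 1D 'frame' where each pixel holds the fraction of the exposure
    time the point spent on it.
    """
    frame = [0.0] * width
    for i in range(steps):
        x = start + speed_px * i / steps  # point position at this sub-instant
        frame[int(x) % width] += 1.0 / steps
    return frame


def shift(frame, dx):
    """Translate a frame by an integer number of pixels, as a stabilizer would."""
    n = len(frame)
    return [frame[(i - dx) % n] for i in range(n)]


def smear_width(frame):
    """Number of pixels that received any light."""
    return sum(1 for v in frame if v > 0)


blurred = expose(width=64, start=10, speed_px=8)
stabilized = shift(blurred, -6)
print(smear_width(blurred), smear_width(stabilized))  # prints: 8 8
```

The smear is 8 pixels wide before and after the shift; only deblurring (a much harder, ill-posed problem) could shrink it.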
CoffeeAndCoffee | 14 years ago
It also looks like the stabilized video is magnified a little bit. Regardless, these tools are really neat and they are still in their infancy. It's good to see that GooTube still believes in user-generated content.
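The slight magnification is what you'd expect from a crop-window stabilizer: each output frame is a fixed-size window that must stay inside every input frame as it moves, so the window is smaller than the frame and gets scaled back up. A back-of-the-envelope sketch, assuming the window has to absorb a known maximum horizontal and vertical excursion (the function name and numbers are illustrative):

```python
def crop_zoom(width, height, max_dx, max_dy):
    """Return (crop_w, crop_h, zoom) for a crop window that can move by
    +/-max_dx and +/-max_dy pixels without leaving the frame.

    Scaling by the larger ratio makes the crop fill the output in both
    dimensions; a real implementation would trim the other axis further to
    keep the aspect ratio.
    """
    crop_w = width - 2 * max_dx
    crop_h = height - 2 * max_dy
    zoom = max(width / crop_w, height / crop_h)
    return crop_w, crop_h, zoom


w, h, z = crop_zoom(1280, 720, max_dx=64, max_dy=36)
print(w, h, round(z, 3))  # prints: 1152 648 1.111
```

So leaving 5% of margin on each side already means an 11% zoom, which is in line with "magnified a little bit".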
apu | 14 years ago
As mentioned in the blog post, the rolling-shutter version of this won the best paper prize at the International Conference on Computational Photography (ICCP), which was held last weekend in Seattle. This is a fairly new but very high-quality conference. In many respects, I prefer it to the standard-bearing vision conferences like CVPR, ICCV, or ECCV, although of course ICCP is more narrowly focused on computational imaging and photography applications.

In their talk, the authors of this work showed many more video results and they were all quite impressive. In fact, they were good enough to fall into an "uncanny valley of motion", similar to the "uncanny valley" of faces or humans [1] that most people are familiar with. I.e., the motion correction was almost perfect, but just enough off that something felt vaguely surreal about the results. Nevertheless, it's a nice step forward.

Also, as others have pointed out, this is a fully uncalibrated method, requiring no knowledge of how the video was captured. If you do have some knowledge, you can often exploit it to do better. But the authors mentioned that most videos uploaded to YouTube have either no calibration information or, when it is present, information that is often incorrect. So it made sense for them to focus on the uncalibrated case.

Finally, I should point out that rolling shutter, standard on most mobile cameras, is causing all sorts of problems for traditional image and video analysis algorithms, which often make the assumption, sometimes implicitly, that the entire frame was captured at a single instant in time. This is not true anymore, and it can lead to gross errors in many methods. Hence the many recent papers on correcting for, and in some cases exploiting, rolling shutter effects.
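To make the single-instant assumption concrete, here is a minimal rolling-shutter model, assuming rows are read out top to bottom with a constant line delay and the camera pans horizontally at constant speed (all numbers are illustrative). Each row sees the scene at a different time, so a vertical edge comes out slanted.

```python
def rolling_shutter_offsets(rows, line_delay_s, pan_px_per_s):
    """Horizontal displacement (pixels) of the scene in each row, relative to row 0.

    Row r starts its exposure r * line_delay_s seconds after row 0, and during a
    constant-speed pan the scene moves pan_px_per_s * that time across the sensor.
    A global shutter would give 0 for every row.
    """
    return [pan_px_per_s * line_delay_s * r for r in range(rows)]


# 1080 rows read out over 30 ms, camera panning at 500 px/s.
offsets = rolling_shutter_offsets(rows=1080, line_delay_s=30e-3 / 1080, pan_px_per_s=500)
print(round(offsets[-1], 2))  # prints: 14.99 (a vertical edge leans ~15 px top to bottom)
```

Undoing this for a known, constant pan would just mean shifting row r back by offsets[r]; the hard part, which the uncalibrated method tackles, is estimating that motion from the video alone.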

A recent interesting work along these lines from my former lab at Columbia University is "coded rolling shutter photography: flexible space-time photography" [2]. This paper takes advantage of the fact that different rows in an image see the world at slightly different instants in time to do things like high-speed photography, HDR imaging, etc.

[1] http://en.wikipedia.org/wiki/Uncanny_valley

[2] http://www.cs.columbia.edu/CAVE/projects/crsp/

elithrar | 14 years ago
> Finally, I should point out that rolling shutter, standard on most mobile cameras, is causing all sorts of problems for traditional image and video analysis algorithms, which often make the assumption, sometimes implicitly, that the entire frame was captured at a single instant in time. This is not true anymore, and can lead to gross errors in many methods. Hence the many recent papers on correcting for, and in some cases exploiting, rolling shutter effects.

This is true not just of mobile phones but of almost any current imaging device with a CMOS sensor (which is most of them on the market), compact cameras and SLRs included.

lith7 | 14 years ago
You can try something similar on Linux using transcode with these two lines:

    # Pass 1: analyse the camera motion and write the per-frame transforms to a file
    transcode -J stabilize --mplayer_probe -i "$infile"
    # Pass 2: apply the smoothed transforms and encode the result with Xvid
    transcode -J transform --mplayer_probe -i "$infile" -y xvid4 -o "$outfile"

I found this blog post with more info: http://kevin.deldycke.com/tag/transcode/
hahainternet | 14 years ago
A note for anyone using this: you'll want to read through the options as well. The right amount of smoothing is hard to pick automatically, and you will want different values for different effects.

Having said that, my results with this tool have been excellent in the past.
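Why the smoothing amount matters: two-pass stabilizers like this one estimate the camera path and then move each frame onto a smoothed version of it, and the smoothing window decides how much jitter is removed versus how much deliberate motion survives. A minimal sketch, assuming a 1D camera path in pixels and a plain centered moving average as the low-pass filter; `radius` is an illustrative parameter, not the name of a transcode option.

```python
def smooth_path(path, radius):
    """Centered moving average of a 1D camera path.

    Near the ends the window is truncated, so each frame averages over the
    neighbours that exist.
    """
    out = []
    for i in range(len(path)):
        lo, hi = max(0, i - radius), min(len(path), i + radius + 1)
        out.append(sum(path[lo:hi]) / (hi - lo))
    return out


def corrections(path, radius):
    """Per-frame shift that moves each frame from the shaky path onto the smooth one."""
    return [s - p for p, s in zip(path, smooth_path(path, radius))]


# A slow pan (2 px/frame) with +/-3 px of alternating jitter on top.
shaky = [2 * t + (3 if t % 2 else -3) for t in range(20)]
smooth = smooth_path(shaky, radius=3)
# Residual jitter away from the ends is 3 / (2 * radius + 1), about 0.43 px here.
print(max(abs(smooth[t] - 2 * t) for t in range(3, 17)))
```

A bigger radius removes more jitter, but it also drags the path off deliberate pans wherever the window is truncated or the motion changes (at t = 0 above, the smoothed path sits 3 px ahead of the true pan), which is why no single default suits every clip.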

vasco | 14 years ago
I really hope this stays optional, because otherwise a lot of the authentic value of the videos will be lost. Also, the demo they showed looked like it was algorithmically deteriorated to make the change more noticeable. There are shaky hands, and then there's Parkinson's-level shaking, which is what the demo showed...
delinka | 14 years ago
"...algorithmically deteriorated..."

Or the person holding the camera was shaking like mad to demonstrate the abilities of the algorithm. Artistically speaking, shake removal is also authenticity removal. But most of the time, personal videos are shot by people with no eye for framing and stability, while artistic (and professional) videos are shot with an eye toward these things.

I can't foresee a reason not to keep this feature as an option rather than enforcing it on all uploads.

sp332 | 14 years ago
Sometimes you want authentic value, and sometimes motion sickness isn't what you were aiming for in your video. This gives video makers more tools to express their vision.

Edit: also, what's new here isn't the stabilization; it's that they will also fix "rolling shutter" artifacts within each frame. Rolling shutter is something photographers generally dislike.

TazeTSchnitzel | 14 years ago
Of course it's optional. You can choose to apply it in the editor if you wish, and can vary how much shaking is removed.
hmottestad | 14 years ago
If they used a telephoto lens, the shaking would be much more noticeable. The motion also hints at someone actively trying to correct for the movements, but because the lens is zoomed so far in, every movement made to counteract the shaking is greatly magnified.
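Rough numbers for why a long lens magnifies shake, assuming a pinhole camera, a purely rotational wobble, and made-up sensor figures: rotating the camera by an angle θ moves a distant scene by about f·tan(θ) on the sensor, so the shift in pixels grows linearly with focal length.

```python
import math


def shake_shift_px(focal_mm, shake_deg, pixel_pitch_um):
    """Image displacement in pixels caused by rotating the camera by shake_deg.

    Pinhole model: a ray at angle theta from the optical axis lands
    focal_mm * tan(theta) from the image centre, so rotating the camera by
    theta moves a distant scene by roughly that distance on the sensor.
    """
    shift_mm = focal_mm * math.tan(math.radians(shake_deg))
    return shift_mm / (pixel_pitch_um / 1000.0)  # um -> mm


# Same 0.1-degree wobble and 5 um pixels, wide-angle vs. telephoto lens.
for focal in (28, 300):
    print(focal, "mm:", round(shake_shift_px(focal, 0.1, 5.0), 1), "px")
# prints: 28 mm: 9.8 px
#         300 mm: 104.7 px
```

The same tiny hand movement that costs about 10 px at 28 mm costs over 100 px at 300 mm, and so does every corrective movement.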
ck2 | 14 years ago
That video demo was quite impressive.

I've used video stabilizer filters in VirtualDub, but I doubt they could fix as much as was done in that demo.

Another interesting stabilization demo http://www.youtube.com/watch?v=_Pr_fpbAok8

nextparadigms | 14 years ago
I'd like to see this come to Android, so that the videos you shoot and record come out without the shaking automatically. It would be a great selling point, in my opinion.
objclxt | 14 years ago
You should already see that starting to work its way out: the iPhone 4S does video stabilisation, and I'm sure high-end Android phones do or soon will as well.

The algorithm being discussed here is specifically designed for when information about the camera or environment is not available. There are much better ways of stabilising the image on the device itself, such as using accelerometer data to compensate, or, in more advanced cameras (DSLRs, for example), compensating by physically moving the lens itself.
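A sketch of the sensor-assisted idea. Since camera shake shows up mostly as rotation, and an accelerometer measures linear acceleration rather than rotation rate, this sketch assumes a gyroscope reporting angular rate at a fixed sample rate; the samples and the focal length in pixels are invented. Integrate the rate into an angle, then shift the image the opposite way.

```python
def integrate_rate(rates_rad_s, dt_s):
    """Turn angular-rate samples (rad/s) into an angle per sample (rad).

    Trapezoidal integration; the angle at the first sample is taken as 0.
    """
    angles = [0.0]
    for prev, cur in zip(rates_rad_s, rates_rad_s[1:]):
        angles.append(angles[-1] + 0.5 * (prev + cur) * dt_s)
    return angles


def pixel_corrections(angles_rad, focal_px):
    """Per-sample image shift that cancels the rotation.

    Small-angle approximation: a rotation of a radians moves the image by about
    focal_px * a pixels, so the correction is the opposite shift.
    """
    return [-focal_px * a for a in angles_rad]


# Hypothetical 200 Hz yaw-rate log: the camera twitches one way, then back.
rates = [0.0, 0.2, 0.2, 0.0, -0.2, -0.2, 0.0]
angles = integrate_rate(rates, dt_s=1 / 200)
print([round(c, 2) for c in pixel_corrections(angles, focal_px=1500)])
```

Unlike the video-only approach, nothing here depends on finding trackable features in the image, which is why on-device data helps so much when it is available.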

hamoid | 14 years ago
I wonder when we will get this quality: http://pages.cs.wisc.edu/~fliu/project/3dstab.htm
chriszf | 14 years ago
If you read the Google paper, you'll notice that they actually refer to this and other work by Liu et al. The overall technique is the same: estimate the original camera path, calculate an optimal camera path, and retarget the input frames to a crop window that follows the optimal path.
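The "calculate an optimal camera path" step is where linear programming comes in. Roughly, the L1 formulation minimizes the L1 norms of the path's first, second and third derivatives while keeping the crop window inside every frame. The simplified sketch below keeps only the first-derivative term, works on a 1D path, and models the crop constraint as "stay within ±r pixels of the original path". It assumes SciPy is installed (`linprog` with `method="highs"`, SciPy 1.6 or later).

```python
from scipy.optimize import linprog


def l1_smooth_path(path, r):
    """Find P minimizing sum_t |P[t+1] - P[t]| subject to |P[t] - path[t]| <= r.

    Variables are [P_0..P_{n-1}, e_0..e_{n-2}]; each slack e_t bounds
    |P[t+1] - P[t]| from above, so minimizing sum(e) minimizes the L1 norm.
    Assumes len(path) >= 2.
    """
    n = len(path)
    c = [0.0] * n + [1.0] * (n - 1)
    A_ub, b_ub = [], []
    for t in range(n - 1):
        row_pos = [0.0] * (2 * n - 1)
        row_pos[t + 1], row_pos[t], row_pos[n + t] = 1.0, -1.0, -1.0
        A_ub.append(row_pos)  # (P[t+1] - P[t]) - e_t <= 0
        row_neg = [-v for v in row_pos]
        row_neg[n + t] = -1.0
        A_ub.append(row_neg)  # -(P[t+1] - P[t]) - e_t <= 0
        b_ub += [0.0, 0.0]
    bounds = [(p - r, p + r) for p in path] + [(0, None)] * (n - 1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return [float(x) for x in res.x[:n]]


def total_variation(p):
    return sum(abs(b - a) for a, b in zip(p, p[1:]))


shaky = [0, 3, -2, 4, 1, 5, 2, 6]
smooth = l1_smooth_path(shaky, r=3)
# The original path's total variation is 28; the optimized one is about 2.
print(total_variation(shaky), round(total_variation(smooth), 6))
```

With only the first-derivative term the optimum tends to be piecewise constant (a "locked-off tripod" look); the higher-derivative terms are what allow smooth pans as well.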

The primary difference seems to be the estimation and calculation technique. Liu's work does a structure-from-motion reconstruction, i.e., it rebuilds a 3D model of the original scene. Google's work instead uses something called pyramidal Lucas-Kanade to do 'feature tracking'. This is a sort of localized reconstruction; it seems to care only about the viewport differences from frame to frame. They then feed the result through some linear programming voodoo to get the best path.
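For the feature-tracking side, the core of Lucas-Kanade is a small least-squares problem: assume the brightness pattern simply translates by (u, v) between frames, linearize, and solve the 2x2 normal equations built from image gradients. A minimal single-window, single-level, one-step sketch in plain Python; the pyramidal version runs this coarse-to-fine over an image pyramid with iterations, and real trackers apply it per feature patch rather than to the whole frame. The test pattern is made up.

```python
import math


def lucas_kanade_shift(img1, img2):
    """Estimate a global translation (u, v) taking img1 to img2.

    Images are lists of rows. Solves
        [sum Ix*Ix  sum Ix*Iy] [u]   [-sum Ix*It]
        [sum Ix*Iy  sum Iy*Iy] [v] = [-sum Iy*It]
    using central differences for the spatial gradients.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, len(img1) - 1):
        for x in range(1, len(img1[0]) - 1):
            ix = (img1[y][x + 1] - img1[y][x - 1]) / 2.0
            iy = (img1[y + 1][x] - img1[y - 1][x]) / 2.0
            it = img2[y][x] - img1[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy  # near zero => textureless window (aperture problem)
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def pattern(x, y):
    return math.sin(0.3 * x) + math.cos(0.2 * y) + 0.5 * math.sin(0.15 * (x + y))


true_u, true_v = 0.4, -0.25
frame1 = [[pattern(x, y) for x in range(48)] for y in range(48)]
# frame2 is the same scene moved by (true_u, true_v): I2(x, y) = I1(x - u, y - v)
frame2 = [[pattern(x - true_u, y - true_v) for x in range(48)] for y in range(48)]
u, v = lucas_kanade_shift(frame1, frame2)
print(round(u, 3), round(v, 3))  # close to (0.4, -0.25)
```

Each window only needs two consecutive frames and a local neighbourhood, which is consistent with the guess below that this kind of tracking parallelizes easily.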

I don't understand either well enough to say why one is better than the other, although I'd guess that because Lucas-Kanade is temporally and spatially localized, it's easier to farm out to a parallel cluster than an SfM technique.

There also seems to be a difference at the back end of the technique: having feature detection allows them to add 'saliency' constraints, i.e., to retarget based on the inclusion of certain features, like a person's face. Again, the math is beyond my understanding, but it seems like this isn't part of Liu's work.

modeless | 14 years ago
Have you tried it? I have, and I'd say the quality is pretty close, if not the same. The much bigger problem to solve now is that shaky videos shot in less-than-perfect lighting contain motion blur, which is extremely hard to remove. You'll notice that all of these demo videos were conveniently shot outside in direct sunlight and contain no motion blur at all.
ChuckMcM | 14 years ago
Thanks for that link, this should have a submission of its own. I can easily see how something like this would make a 'point and shoot' video camera really useful. Think "Flip Camera meets James Cameron"
MrMike | 14 years ago
This is amazing. I wonder if there is a commercial implementation yet.
Pelayo | 14 years ago
They should apply this to movie fight scenes. Then we might actually see the fight instead of the blur caused by "exciting" camera work.
objclxt | 14 years ago
Going off on a tangent here, but from distant memories of my film studies degree, one of the reasons you get so much fast cutting and hard-to-make-out action in modern fight scenes is that choreographing and shooting a fight scene properly is hard work, particularly if your actors aren't very experienced in stage combat. It's a big cheat, designed to make shooting fight scenes much easier (this is especially true if you're shooting a fight scene where one participant is CG'd in).
modeless | 14 years ago
Higher frame rates would help a lot with the blur. I think Peter Jackson made a mistake shooting The Hobbit at 48 FPS for the entire movie. He should have shot most of it at the traditional 24 FPS but used 48 or 72 for fast motion shots. Hopefully his blunder won't poison high FPS forever in the minds of filmgoers.
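Some rough arithmetic on why higher frame rates cut blur, assuming the traditional 180-degree shutter (each frame is exposed for half the frame interval) and an object crossing the frame at a made-up 2000 px/s: the blur length in pixels is simply speed × exposure time.

```python
def blur_px(speed_px_s, fps, shutter_angle_deg=180.0):
    """Length of the motion-blur smear for an object moving at speed_px_s.

    A rotary shutter of angle A exposes each frame for (A / 360) / fps seconds.
    """
    exposure_s = (shutter_angle_deg / 360.0) / fps
    return speed_px_s * exposure_s


for fps in (24, 48, 72):
    print(fps, "fps:", round(blur_px(2000, fps), 1), "px of blur")
# prints 41.7, 20.8 and 13.9 px respectively
```

Doubling the frame rate at the same shutter angle halves the smear, which is the case for using higher rates on fast-motion shots.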
ck2 | 14 years ago
I wish my tv/movie playback devices had a "de-lensflare" filter.
rabidsnail | 14 years ago
Protip: add ?m=1 to the end of Blogger URLs to remove the pointless JavaScript bloat.
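If you'd rather not edit URLs by hand, here's a small helper using Python's standard `urllib.parse` that sets `m=1` correctly even when the link already has a query string or fragment. The parameter comes from the tip above; the `example.blogspot.com` URL is a hypothetical placeholder, and this sketch doesn't check whether Blogger still honours the parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def blogger_mobile_url(url):
    """Return url with m=1 set in its query string, keeping other parameters.

    Note: repeated query keys are collapsed to their last value.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["m"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))


print(blogger_mobile_url("http://example.blogspot.com/2012/05/some-post.html"))
# prints: http://example.blogspot.com/2012/05/some-post.html?m=1
```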
adrianwaj | 14 years ago
Isn't quality inherently lost because the same video has to be re-encoded without the shakes? Also, I just tested it on a video and it looked slightly smudgy. OK, so if I'm filming while driving down a dirt road, or after half a bottle of Jack Daniel's (or both), it'd be good; otherwise it does more harm than good.
obtu | 14 years ago
As I understand it, the motion blur is a product of lossy compression (a CCD has very short pixel-local exposure times; the shearing the article refers to appears when sweeping across the whole image). That means that, yes, stabilisation algorithms would work best with source data that hasn't been compressed using a perceptual model of motion blur.
Too | 14 years ago
What's the next step? Include accelerometer data from the camera, synced with the video, to use as support for stabilizing algorithms.
est | 14 years ago
That's actually a really good idea. Maybe video container formats could add support for that metadata.
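A rough sketch of what "synced with the video" would involve, assuming the camera logs timestamped sensor samples and per-frame timestamps on the same clock (the data below is invented): each frame's orientation is linearly interpolated from the two sensor samples that bracket its timestamp. In practice the two clocks may be offset, which this sketch ignores.

```python
from bisect import bisect_right


def sample_at(times, values, t):
    """Linearly interpolate a sensor reading at time t (clamped at the ends).

    `times` must be sorted ascending and the same length as `values`.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    return values[i - 1] + w * (values[i] - values[i - 1])


# Hypothetical 100 Hz orientation log (radians) and 30 fps frame timestamps.
sensor_t = [k / 100 for k in range(11)]
sensor_angle = [0.001 * k for k in range(11)]
frame_t = [k / 30 for k in range(4)]
print([round(sample_at(sensor_t, sensor_angle, t), 5) for t in frame_t])
```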
tambourine_man | 14 years ago
Submitting video to a streaming service makes your content look better.

It shows how far into this whole cloud era we are.