Seems to work well. I just tried it on a tiger cub video that I've had for years, taken with an el-cheapo digital camera:
Shaky version: http://youtu.be/g5P9WwHdOSI
Stabilised version: http://youtu.be/wXfbRUk_1Bg
There is some motion blur (around 24 seconds, when the momma tiger lies down) which is a bit puzzling once the image has been stabilised. But hey, it's better than it was before.
The motion blur was there in the first place; you just don't notice it because there's also _motion_. When you remove the motion (shaking, jerking the camera), you're left with just the blur.
It also looks like the stabilized video is magnified a little bit. Regardless, these tools are really neat and they are still in their infancy. It's good to see that GooTube still believes in user-generated content.
As mentioned in the blog post, the rolling-shutter version of this won the best paper prize at the International Conference on Computational Photography (ICCP), which was held last weekend in Seattle. This is a fairly new but very high quality conference. In many respects, I prefer it to the standard-bearing vision conferences like CVPR, ICCV, or ECCV -- although of course, ICCP is more narrowly focused on computational imaging and photography applications.
In their talk, the authors of this work showed many more video results and they were all quite impressive. In fact, they were good enough to fall into an "uncanny valley of motion", similar to the "uncanny valley" of faces or humans [1] that most people are familiar with. I.e., the motion correction was almost perfect, but just enough off that something felt vaguely surreal about the results. Nevertheless, it's a nice step forward.
Also, as others have pointed out, this is a fully uncalibrated method -- requiring no knowledge of how the video was captured. If you do have some knowledge, then you can often exploit it to do better. But the authors mentioned that most videos uploaded to YouTube have either no calibration information or, if it is present, it's often incorrect. As such, it made sense for them to focus on the uncalibrated case.
Finally, I should point out that rolling shutter, standard on most mobile cameras, is causing all sorts of problems for traditional image and video analysis algorithms, which often assume, sometimes implicitly, that the entire frame was captured at a single instant in time. This is no longer true, and can lead to gross errors in many methods. Hence the many recent papers on correcting for, and in some cases exploiting, rolling shutter effects.
A recent interesting work along these lines from my former lab at Columbia University is "Coded Rolling Shutter Photography: Flexible Space-Time Sampling" [2]. This paper takes advantage of the fact that different rows in an image see the world at slightly different instants in time to do things like high-speed photography, HDR imaging, etc.
> Finally, I should point out that rolling shutter, standard on most mobile cameras, is causing all sorts of problems for traditional image and video analysis algorithms, which often assume, sometimes implicitly, that the entire frame was captured at a single instant in time. This is no longer true, and can lead to gross errors in many methods. Hence the many recent papers on correcting for, and in some cases exploiting, rolling shutter effects.
This is not just true of mobile phones, but of any current imaging device with a CMOS sensor (which is most of them on the market), compact cameras and SLRs included.
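For anyone wondering what "different rows see the world at different instants" means in numbers, here is a rough Python sketch. It assumes rows are read out top to bottom at a constant rate and that an object moves sideways at a constant speed; the function names, the 1/30 s readout time and the speed are all made up for illustration and don't come from any of the papers mentioned.

```python
def row_capture_times(num_rows, frame_start_s, readout_s):
    """Capture time of each row, assuming a constant top-to-bottom readout."""
    line_time_s = readout_s / num_rows  # time between two consecutive rows
    return [frame_start_s + row * line_time_s for row in range(num_rows)]


def horizontal_skew_px(num_rows, readout_s, speed_px_per_s):
    """Sideways offset of each row for an object moving at constant speed.

    Row 0 is the reference; later rows are captured later, so the object
    has moved further by the time they are read. This produces the slanted
    (sheared) look of vertical edges.
    """
    times = row_capture_times(num_rows, 0.0, readout_s)
    return [speed_px_per_s * t for t in times]


if __name__ == "__main__":
    # Hypothetical numbers: 1080 rows read out in 1/30 s, object at 600 px/s.
    skew = horizontal_skew_px(1080, 1.0 / 30.0, 600.0)
    print(f"bottom row is offset by about {skew[-1]:.1f} px relative to the top row")
```

A global-shutter model is the special case where `readout_s` is zero, which is the assumption a lot of older vision code silently makes.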
Just as a note for anyone using this, you'll want to read some of the options as well. The amount of smoothing is hard to get right automatically, and you will want different values for different effects.
Having said that, my results with this tool have been excellent in the past.
I really hope this stays optional, because otherwise a lot of the authentic value of the videos will be lost. Also, the demo they showed looked like it was algorithmically deteriorated to make the change more noticeable. There are shaky hands and then there's Parkinson's-level shaking, which is what the demo showed...
Or the person holding the camera was shaking like mad to demonstrate the abilities of the algorithm. Artistically speaking, shake removal is also authenticity removal. But most times, personal videos are shot by people with no eye for framing and stability. And most times, artistic (and professional) videos are shot with an eye toward these things.
I can't foresee a reason not to keep this feature as an option rather than enforcing it on all uploads.
Sometimes you want authentic value, and sometimes motion sickness isn't what you were aiming for in your video. This gives video makers more tools to express their vision.
Edit: also what's new here isn't the stabilization, it's that they will fix "rolling shutter" artifacts in each frame as well. Rolling shutter is something photographers generally dislike.
If they used a telephoto lens then the shaking would be much more noticeable. Also, the motion hints at someone actually attempting to correct for the movements, but because it's zoomed in so far, every movement made to counteract the shaking is greatly magnified.
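A back-of-the-envelope Python sketch of why zooming in magnifies shake: under a simple pinhole model, a small rotation of the camera by an angle theta moves the image by roughly f * tan(theta) on the sensor, so the shift in pixels scales with focal length. The function name, the 0.1 degree wobble and the 5 micron pixel pitch are assumptions picked for illustration.

```python
import math


def shake_shift_px(focal_length_mm, shake_deg, pixel_pitch_um):
    """Approximate image shift in pixels for a small camera rotation.

    Pinhole-camera approximation: shift on the sensor = f * tan(theta).
    """
    shift_mm = focal_length_mm * math.tan(math.radians(shake_deg))
    return shift_mm * 1000.0 / pixel_pitch_um  # convert mm to microns, then to pixels


if __name__ == "__main__":
    # Same hypothetical 0.1 degree hand wobble at a wide and a telephoto focal length.
    for f_mm in (25.0, 200.0):
        print(f"{f_mm:.0f} mm lens: ~{shake_shift_px(f_mm, 0.1, 5.0):.1f} px of shift")
```

With these numbers the 200 mm lens shows eight times the pixel shift of the 25 mm lens for the same wobble.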
I'd like to see this come in Android so you can automatically shoot and record videos without the shaking in them. It would be a great selling point in my opinion.
You should already start to see that working its way out: the iPhone 4S does video stabilisation, and I am sure high-end Android phones do or will start to do the same.
The algorithm being discussed here is specifically designed for when information about the camera or environment is not available: there are much better ways of carrying out digital image stabilisation on the device itself, such as using the accelerometer data to compensate, or in significantly advanced cameras (DSLRs, for example) compensating by moving the lens itself.
If you read the Google paper, you'll notice that they actually refer to this and other work by Liu et al. The overall technique is the same: estimate the original camera path, calculate an optimal camera path, and retarget the input frames to a crop window that fits the optimal path.
The primary difference seems to be the estimation and calculation technique. Liu's work does a structure-from-motion (SfM) reconstruction, i.e., it rebuilds a 3D model of the original scene. Google's work uses something called pyramidal Lucas-Kanade to do 'feature tracking' instead. This is a sort of localized reconstruction; it seems to only care about the viewport differences from frame to frame. They then feed it through some linear programming voodoo to get the best path.
I don't understand either well enough to say why one is better than the other, although I'd guess that, because Lucas-Kanade is temporally and spatially localized, it's easier to farm out to a parallel cluster than an SfM technique.
There also seems to be a difference on the back end of the technique: having feature detection allows them to add 'saliency' constraints, i.e., retarget based on the inclusion of certain features, like a person's face. Again, the math is beyond my understanding, but it seems like this isn't part of Liu's work.
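To make the three steps concrete (estimate the original camera path, compute a smoother one, retarget each frame to a crop window), here is a minimal Python sketch. It is not the method from either paper: the per-frame motions are assumed to be plain 2D translations handed in by some tracker, and a simple moving average stands in for the linear-programming path optimization. All function names are made up for illustration.

```python
def accumulate_path(frame_motions):
    """Integrate per-frame (dx, dy) motions into an absolute camera path."""
    x = y = 0.0
    path = [(0.0, 0.0)]  # camera position at the first frame
    for dx, dy in frame_motions:
        x += dx
        y += dy
        path.append((x, y))
    return path


def smooth_path(path, radius):
    """Moving average over +/- radius frames (window shrinks at the ends).

    Stand-in for the real path optimization, which is assumed to be more
    sophisticated than this.
    """
    smoothed = []
    for i in range(len(path)):
        window = path[max(0, i - radius):min(len(path), i + radius + 1)]
        n = len(window)
        smoothed.append((sum(p[0] for p in window) / n,
                         sum(p[1] for p in window) / n))
    return smoothed


def crop_offsets(path, smoothed, max_shift):
    """Per-frame crop-window shift (smoothed minus original), clamped.

    max_shift is the margin, in pixels, left around the crop window; the
    window cannot move further than that without leaving the frame.
    """
    def clamp(v):
        return max(-max_shift, min(max_shift, v))

    return [(clamp(sx - px), clamp(sy - py))
            for (px, py), (sx, sy) in zip(path, smoothed)]


if __name__ == "__main__":
    # Hypothetical jittery pan: steady 2 px/frame drift plus alternating shake.
    motions = [(2.0 + (3.0 if i % 2 else -3.0), 0.0) for i in range(10)]
    path = accumulate_path(motions)
    offsets = crop_offsets(path, smooth_path(path, radius=2), max_shift=20.0)
    for i, (ox, oy) in enumerate(offsets):
        print(f"frame {i}: shift crop window by ({ox:+.2f}, {oy:+.2f}) px")
```

The `max_shift` margin is also why stabilised output tends to look slightly zoomed in: the crop window has to be smaller than the frame so there is room to move it around.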
Have you tried it? I have and I'd say the quality is pretty close if not the same. The much bigger problem to solve now is that shaky videos shot in less-than-perfect lighting contain motion blur, which is extremely hard to remove. You'll notice that all of these demo videos were conveniently shot outside in direct sunlight and contain no motion blur at all.
Thanks for that link, this should have a submission of its own. I can easily see how something like this would make a 'point and shoot' video camera really useful. Think "Flip Camera meets James Cameron"
Going off on a tangent here, but from the distant memory of my film studies degree days one of the reasons you get so much fast cutting and hard-to-make-out action in modern fight scenes is because choreographing and shooting a fight scene properly is hard work, particularly if your actors aren't that experienced in stage combat. It's a big cheat, designed to make shooting fight scenes much easier (this is especially true if you're shooting a fight-scene where one participant is CG'd in).
Higher frame rates would help a lot with the blur. I think Peter Jackson made a mistake shooting The Hobbit at 48 FPS for the entire movie. He should have shot most of it at the traditional 24 FPS but used 48 or 72 for fast motion shots. Hopefully his blunder won't poison high FPS forever in the minds of filmgoers.
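Rough numbers for the frame-rate point, as a Python sketch: the length of a motion-blur streak is roughly the object's speed across the frame times the exposure time, and film exposure is conventionally described by a shutter angle (180 degrees means the shutter is open for half of each frame interval). The 180 degree default and the 1200 px/s speed are assumptions for illustration, not figures from any particular production.

```python
def blur_length_px(speed_px_per_s, fps, shutter_angle_deg=180.0):
    """Approximate blur streak length for an object moving at constant speed."""
    exposure_s = (shutter_angle_deg / 360.0) / fps  # time the shutter is open per frame
    return speed_px_per_s * exposure_s


if __name__ == "__main__":
    for fps in (24, 48, 72):
        print(f"{fps} fps: ~{blur_length_px(1200.0, fps):.1f} px of blur")
```

At the same shutter angle, doubling the frame rate halves the streak length, which is the effect being described.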
Isn't quality inherently lost because the same video has to be reencoded again but without the shakes? Also, I just tested on a video and it looked slightly smudgy. OK, so if I am filming driving down a dirt road or after half a bottle of Jack Daniel's (or both) then it'd be good, otherwise it does more harm than good.
As I understand it the motion blur is a product of lossy compression (CCD has very short pixel-local exposure times; the shearing the article refers to appears when sweeping the whole image); which means that yes, stabilisation algorithms would work best with source data that hasn't been compressed using a perceptual model of motion blur.
[+] [-] ajb|14 years ago|reply
[+] [-] drewyeaton|14 years ago|reply
[+] [-] CoffeeAndCoffee|14 years ago|reply
[+] [-] apu|14 years ago|reply
[1] http://en.wikipedia.org/wiki/Uncanny_valley
[2] http://www.cs.columbia.edu/CAVE/projects/crsp/
[+] [-] elithrar|14 years ago|reply
[+] [-] lith7|14 years ago|reply
[+] [-] hahainternet|14 years ago|reply
[+] [-] vasco|14 years ago|reply
[+] [-] delinka|14 years ago|reply
[+] [-] sp332|14 years ago|reply
[+] [-] TazeTSchnitzel|14 years ago|reply
[+] [-] hmottestad|14 years ago|reply
[+] [-] unknown|14 years ago|reply
[deleted]
[+] [-] thrownaway2424|14 years ago|reply
[+] [-] ck2|14 years ago|reply
I've used video stabilizer filters on virtualdub but I doubt they could fix as much as was done in that demo.
Another interesting stabilization demo http://www.youtube.com/watch?v=_Pr_fpbAok8
[+] [-] nextparadigms|14 years ago|reply
[+] [-] objclxt|14 years ago|reply
[+] [-] hamoid|14 years ago|reply
[+] [-] chriszf|14 years ago|reply
[+] [-] modeless|14 years ago|reply
[+] [-] ChuckMcM|14 years ago|reply
[+] [-] MrMike|14 years ago|reply
[+] [-] Pelayo|14 years ago|reply
[+] [-] objclxt|14 years ago|reply
[+] [-] modeless|14 years ago|reply
[+] [-] ck2|14 years ago|reply
[+] [-] rabidsnail|14 years ago|reply
[+] [-] adrianwaj|14 years ago|reply
[+] [-] obtu|14 years ago|reply
[+] [-] Too|14 years ago|reply
[+] [-] est|14 years ago|reply
[+] [-] tambourine_man|14 years ago|reply
Shows how far we are in this whole cloud era.
[+] [-] CubicleNinjas|14 years ago|reply
sigh