
Chrome: Heap buffer overflow in WebP

333 points | skilled | 2 years ago | chromereleases.googleblog.com

262 comments

[+] londons_explore|2 years ago|reply
Within Google Chrome, WebP images are decoded by the renderer, and therefore any exploit can only get renderer code execution - which is inside a sandbox.

Since the renderer is very complex, there are lots of exploits discovered each year for it. However, when someone manages to get code execution in a renderer they don't get much more access than a webpage would normally get. Specifically, they can't see or leave files on the filesystem of your machine, nor even read cookies for other domains.

[+] dspillett|2 years ago|reply
> when someone manages to get code execution in a renderer they don't get much more access than a webpage would normally get

Unless, of course, there is a sandbox break-out exploit to combine this with. So it's not an immediate priority, unless there is such an exploit in the wild that I don't know about, but it's important to patch as soon as it's not terribly inconvenient.

[+] sigmoid10|2 years ago|reply
Getting code execution in a single website's user context is already enough, considering the vast amount of user-uploaded images on the modern internet. It's a bit like XSS on steroids, because it isn't limited to one site or frontend.
[+] amelius|2 years ago|reply
Are you saying that the renderer can't send information back to the website?

Also: perhaps you turned off JavaScript, and now, all of a sudden, the website can still execute code?

[+] acdha|2 years ago|reply
This is why I’m more sympathetic to browser developers being slow to adopt new formats. WebP offers a marginal advantage over JPEG (mostly transparency) and hasn’t seen much success, but now that’s translated into multiple high-priority security holes, and we’re all going to spend the next month deploying patches to everything that links against libwebp.

That doesn’t mean we shouldn’t do new things, but I think as developers we’re prone to underestimating the cost pretty heavily.

[+] dvhh|2 years ago|reply
To be fair, issues in JPEG decoding libraries have also been used in the past as vectors for malware payloads. While the WebP ecosystem is far less mature, I am pretty sure that "older" format handling also has its fair share of security issues.

But your reasoning is valid: it seems like only a few weeks ago netizens were arguing that JPEG XL should be adopted as fast as possible, and that for that to happen the browser developers "only needed to include the reference decoder code into their codebase" at "very little cost".

[+] frankjr|2 years ago|reply
But would these bugs have been discovered if the browsers had not adopted the new format? There are many other image formats whose libraries are most likely riddled with bugs, but nobody cares because they are not used by major software (certainly not the people capable of finding and exploiting these bugs, given the poor time-to-reward ratio). Being unused for longer wouldn't make them any less buggy.
[+] miohtama|2 years ago|reply
The security holes are caused by C/C++ being unsafe languages to develop in, not by new image formats. If image (and other) encoders and decoders didn’t use unsafe languages, it’s unlikely they would introduce such bugs.
[+] userbinator|2 years ago|reply
It's also a far more complex format than JPEG, and complexity almost directly correlates with defect density.

That said, I also blame the culture of making code more complex than necessary, along with developers who barely understand the details.

[+] eviks|2 years ago|reply
So that slowness has bought nothing? You have a bad format with exploits in the wild while using that as an excuse to gate a great format (which the browsers could "slowly" RIIR instead, to address the safety issue directly).
[+] keepamovin|2 years ago|reply
Agree. This is one of the reasons it's better to go with the older, more reliable JPEG for viewport streaming. An exploit chain would need to travel through the screen-capture images passed to the client. Browser zero days do occur, and this is why it's important to have protection. For added protection people often use browser isolation, which specifically adds another layer of defense against such zero days.

If you're interested in an open source (AGPL-3.0) solution for zero trust browser isolation, check out BrowserBox on GitHub. We are using JPEG (not WebP) for viewport pixels: https://github.com/dosyago/BrowserBoxPro

Technically, we did try WebP, because of its significant bandwidth gains. However, the compute overhead of encoding versus JPEG introduced unacceptable latency into our streaming pipeline, so for now we're still against it. Security is an additional mark against the newer standard, as good as it is!

[+] o1y32|2 years ago|reply
Eh, I think websites adopt WebP for its smaller file size more than anything else.
[+] SanderNL|2 years ago|reply
It would be cool if an app could use user-provided “codecs” for all sorts of common (media) things. That way I could determine an acceptable risk myself, customized for my circumstances.

Maybe I’ll use an open, safe, but feature-incomplete WebP implementation because I don’t care if three pixels are missing or the colors are not quite correct. Maybe I’ll provide a NULL renderer because I just don’t care.

I know this sounds stupid, but a man can wonder.

[+] saagarjha|2 years ago|reply
Did you know the path for decoding JPEGs on Apple hardware goes through the kernel?
[+] lost_tourist|2 years ago|reply
It also has lossless capability; that's a big one to me. It can replace PNG and JPEG for 99% of use cases.
[+] blurbleblurble|2 years ago|reply
It's much better from my experience.
[+] est31|2 years ago|reply
FTR there is a WebP decoder implementation in safe Rust in the image crate: https://github.com/image-rs/image

It was quite incomplete for a long time, but work over the last year implemented many WebP features. Chromium now has a policy allowing the use of Rust dependencies, so maybe Chromium could start adopting it?

[+] insanitybit|2 years ago|reply
If Chrome had a flag for "use the safe versions of XYZ libraries, even if they might be slower" I would absolutely flip that switch and take the performance hit.
[+] NelsonMinar|2 years ago|reply
It's crazy to me that in 2023 we're still writing new C/C++ code for something with as enormous an attack surface as a web browser.
[+] lifthrasiir|2 years ago|reply
The original commit in question: https://github.com/webmproject/libwebp/commit/f75dfbf23d1df1...

The commit that fixes this bug: https://github.com/webmproject/libwebp/commit/902bc919033134...

The original commit optimizes a Huffman decoder. The decoder uses a well-known optimization: it reads N bits in advance and determines how many bits actually have to be consumed and which symbol should be decoded, or, if those N bits are a prefix of multiple symbols, which table should be consulted for the remaining bits.

The old version did use lookup tables for short symbols, but longer symbols needed a graph traversal. The new version improved this by using an array of lookup tables. Each entry contains (nbits, value), where `nbits` is the number of bits to be consumed and `value` is normally a symbol; but if `nbits` exceeds N, `value` is interpreted as a table index and `nbits` is reinterpreted as the longest code length in that subtree. So each secondary table needs `2^(nbits - N)` entries (the root table is always fixed at 2^N entries).
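
To make the scheme concrete, here is a minimal sketch in Python (not libwebp's actual code; the toy alphabet, table layout, and function names are all mine) of the two-level lookup described above, with N = 2:

```python
ROOT_BITS = 2  # N: bits peeked for the root table

# Toy two-level table for the canonical code
#   A = 0, B = 10, C = 110, D = 111   (MSB-first)
# Entries are (nbits, value). nbits > ROOT_BITS marks a long code:
# value is then the base index of a secondary table, and nbits is the
# longest code length in that subtree.
TABLE = [
    (1, "A"),  # root slot 00 -> A (consume 1 bit)
    (1, "A"),  # root slot 01 -> A
    (2, "B"),  # root slot 10 -> B
    (3, 4),    # root slot 11 -> secondary table starting at index 4
    (1, "C"),  # 11|0 -> C (consume 1 more bit after the root 2)
    (1, "D"),  # 11|1 -> D
]

def decode_symbol(table, bits, pos):
    """Decode one symbol from the bit string `bits` starting at `pos`."""
    peek = bits[pos:pos + ROOT_BITS].ljust(ROOT_BITS, "0")
    nbits, value = table[int(peek, 2)]
    if nbits > ROOT_BITS:
        # Long code: consume the root bits, then index the secondary
        # table with the next (nbits - ROOT_BITS) bits.
        extra = nbits - ROOT_BITS
        pos += ROOT_BITS
        sub = bits[pos:pos + extra].ljust(extra, "0")
        nbits, value = table[value + int(sub, 2)]
    return value, pos + nbits

def decode_all(table, bits):
    out, pos = [], 0
    while pos < len(bits):
        sym, pos = decode_symbol(table, bits, pos)
        out.append(sym)
    return out

print(decode_all(TABLE, "110010"))  # ['C', 'A', 'B']
```

A long code costs one extra table lookup but no per-bit graph traversal, which is the whole point of the optimization.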

The new version calculated the maximum number of entries based on the number of symbols (kTableSize). Of course, the Huffman tree comes from an untrusted source, and you can easily imagine cases where `nbits` gets very big. VP8 Lossless specifically allows up to 15 bits, so the largest possible table has 2^N + 2^15 entries when every single root slot is mapped to its own secondary table, and this doesn't need that many symbols (you only need 16-N symbols for each table). Amusingly enough, the code itself had a mode where only the table size is calculated (VP8LBuildHuffmanTable called with `root_table == NULL`), but somehow it wasn't used here, and a fixed maximum size was assumed instead. So if the Huffman tree was crafted to maximize the number of entries, it would overflow the allocation.
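
For illustration, the worst-case arithmetic from the paragraph above, as a hypothetical helper (not libwebp code; the 8-bit root table used in the example call is an assumption on my part):

```python
def worst_case_entries(root_bits, max_code_len):
    # Worst case: every one of the 2^root_bits root slots heads its own
    # full-depth secondary table of 2^(max_code_len - root_bits) entries,
    # so the secondaries total 2^max_code_len regardless of root_bits.
    root = 2 ** root_bits
    return root + root * 2 ** (max_code_len - root_bits)

# VP8 Lossless allows code lengths up to 15 bits, so with an assumed
# 8-bit root table the true bound is 2^8 + 2^15 entries.
print(worst_case_entries(8, 15))  # 33024
```

Any fixed allocation smaller than this bound can be overflowed by a suitably crafted tree.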

To be fair, I can see why this happened; the Huffman decoding step is one of the most computationally intensive parts of many compression formats, and any small improvement matters. The Huffman decoder optimization described above is well known, but the long-code case is commonly considered less important to optimize, because long codes should rarely appear in practice. The original commit message refuted this, which is why it could be merged. I'm not even sure a memory-safe language could have prevented this; this is a rare case where you actively want to avoid the overflow check [1].

[1] I should note that the memory corruption occurs during table construction, which is not a tight loop, so a partial overflow check would have been very helpful. The actual fix never touched the `ReadSymbol` function at all! But the safety of the tight loop still has to be justified somehow, and a wrong justification can ruin everything.

[+] tialaramex|2 years ago|reply
> I'm not even sure that a memory safe language could have prevented this; this is a rare case where you actively want to avoid the overflow check.

This component should be written in WUFFS. If you were correct that the bounds check isn't needed, that's fine: WUFFS doesn't emit runtime bounds checks. If, as was the case here, the software is wrong because it has a bounds miss, it won't compile in WUFFS.

You might be thinking "That's impossible", and if WUFFS were a general-purpose programming language you'd be correct: by Rice's Theorem, non-trivial semantic properties are undecidable. Fortunately, WUFFS isn't a general-purpose language. The vast majority of software can't be written in WUFFS. But you can write image codecs.

[+] avg_dev|2 years ago|reply
Thanks for your explanation.

I haven’t been a C programmer in over a decade and was never very good at it anyway. But based on your description, I agree that a bounds check would have caught the issue; I’m also curious whether an automated test could have been constructed to catch this sort of thing. I personally work with code where you can factor calculations out into their own functions and test them in isolation. Perhaps that would be tough here because of performance; I am really not sure.
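
For what it's worth, the isolated test seems feasible: the entry count is a pure function of the code lengths, so it can be factored out and unit-tested against hand-built trees. A sketch with my own names and layout, not libwebp's (it assumes every symbol has a nonzero code length, i.e. a complete code):

```python
def canonical_codes(lengths):
    """Assign canonical (MSB-first) Huffman codes to symbols 0..n-1."""
    order = sorted(range(len(lengths)), key=lambda s: (lengths[s], s))
    code, prev, out = 0, 0, {}
    for s in order:
        code <<= lengths[s] - prev
        prev = lengths[s]
        out[s] = (code, lengths[s])
        code += 1
    return out

def table_entries(lengths, root_bits):
    """Exact number of LUT entries a two-level table needs."""
    subtrees = {}  # root-table prefix -> deepest code length under it
    for code, ln in canonical_codes(lengths).values():
        if ln > root_bits:
            prefix = code >> (ln - root_bits)
            subtrees[prefix] = max(subtrees.get(prefix, 0), ln)
    # Root table plus one secondary table per occupied prefix, each sized
    # by the longest code in that subtree.
    return 2 ** root_bits + sum(2 ** (ln - root_bits)
                                for ln in subtrees.values())

# Hand-built cases: A=0, B=10, C=110, D=111 needs a 4-entry root table
# plus one 2-entry secondary; four 2-bit codes fit entirely in the root.
print(table_entries([1, 2, 3, 3], root_bits=2))  # 6
print(table_entries([2, 2, 2, 2], root_bits=2))  # 4
```

Note that [1, 2, 3, 3] and [2, 2, 2, 2] have the same symbol count but different table sizes, so any bound keyed on symbol count alone has to be a true maximum over all tree shapes; a test like this, run over adversarially skewed trees, would have exposed a bound that wasn't.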

[+] icelusxl|2 years ago|reply
Seems to be reported by Apple and looks a lot like this security update: https://support.apple.com/en-us/HT213906
[+] masklinn|2 years ago|reply
Reported by Apple “and the citizen lab at the UoT Munk School”, which is exactly who’s attributed on the page you link to. So yeah, seems pretty likely: either Apple uses libwebp internally in ImageIO or they made a similar mistake.
[+] brigade|2 years ago|reply
It is interesting to see how different the reactions are now than last week
[+] PaulHoule|2 years ago|reply
There is a long history of vulnerabilities in image codecs. The actual image processing is nice linear code you could write in memory-safe FORTRAN IV, but compression means you have a lot of variable-length data structures, pointer chasing, etc. And then there's the pressure to make it all run fast.
[+] badrabbit|2 years ago|reply
Is there a realistic path to exploiting this? From what I hear, heap spraying on 64-bit is no longer practical. Is there a predictable object in memory that could be overwritten?
[+] nocsi|2 years ago|reply
And people wonder why the Pentagon made users run browsers in a VM or print out websites to read as flattened PDFs.
[+] londons_explore|2 years ago|reply
I'm sure that in 2023, no mainstream software runs image decoders outside a sandbox right??
[+] intelVISA|2 years ago|reply
Yep, not following best practices was banned in late 2022, so thankfully your fave outsourced shop is only shipping High Quality and Secure Code (tm)

Oh wait, that whole LLM thing...

[+] olliej|2 years ago|reply
Right, so all attacks now go:

1. Send payload to a process (image in a browser say)

2. Use payload to get code execution

3. If your current process has all the access you need then go forth and conquer; else

4. Send a payload to another process, go to step 2

Sandboxes are mitigations and roadblocks that increase the complexity of an attack; they don’t make attacks go away.

[+] _ache_|2 years ago|reply
The WebP and WebP2 libs are far more tested than those of other formats. I tried the official JXL lib and the HEIC tools, and I was able to identify bugs in both.

I'm pretty sure that you can find a buffer overflow in these decoders.

[+] irundebian|2 years ago|reply
It would be interesting to know how this vulnerability was found.
[+] kouteiheika|2 years ago|reply
It's 2023; surely this is not yet another memory-unsafety bug that could have been avoided if we'd stop writing critical code that deals with extremely complex untrusted input (media codecs) in memory-unsafe languages?

Yep, of course it is: https://github.com/webmproject/libwebp/commit/902bc919033134...

I guess libwebp could be excused as it was started when there were no alternatives, but even for new projects today we're still committing the same mistake[1][2][3].

[1] -- https://code.videolan.org/videolan/dav1d

[2] -- https://github.com/AOMediaCodec/libavif

[3] -- https://github.com/AOMediaCodec/libiamf

Yep. Keep writing these in C; surely nothing will go wrong.

[+] montzark|2 years ago|reply
Imo, other code by the person who made that mistake should be audited; there could be a similar mistake somewhere else.

Additionally, imo it's possible that some developers intentionally make such mistakes to sell them to interested parties; there are millions to be made here.

[+] vohu43|2 years ago|reply
This really does not sound good.
[+] gardaani|2 years ago|reply
This is a vulnerability in the WebP library, so this isn't only about Chrome. Which software is affected by this?

- Android apps?

- Cross-platform Flutter apps?

- Electron apps?

- Microsoft Edge and Brave (and all other Chromium based web browsers)?

- Signal and Telegram displaying WebP stickers?

- Which image editors are affected? Affinity tools, Gimp and Inkscape seem to use libwebp.

- LibreOffice displaying WebP images?

- server software such as WordPress (does it decode images to re-encode them?)?

Apple devices were also affected, but they already got a fix.

Anything else?

[+] smith7018|2 years ago|reply
I mean, it would most likely affect any software that uses libwebp. Taking a quick look at Arch's libwebp page:

* allegro

* emacs (lmao)

* ffmpeg

* freeimage

* gd (required for gnuplot, fceux, graphviz, etc)

* godot

* openimageio

* qt5/6-imageformats

* sdl2_image

* thunderbird

* webkit2gtk

* etc

optionally:

* gdal

* imagemagick

* python-pillow

* python-piexif

* etc

Should note that a vuln doesn't mean an exploit, of course.

[+] duringmath|2 years ago|reply
Firefox?

Android WebView got patched so there's that.