top | item 4677364

Billion laughs

276 points | khet | 13 years ago | en.wikipedia.org

63 comments

[+] astrojams|13 years ago|reply
It isn't obvious at first glance that this small XML file actually expands to a billion "lols". You really have to give the bad guys credit for ingenuity.
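For anyone who hasn't seen the payload itself, here's a scaled-down sketch: 3 entity levels instead of the Wikipedia example's 9, so it expands to only 10^3 "lol"s rather than 10^9. Python's stdlib parser (built on expat) will happily expand it, which is exactly the behavior the attack abuses:

```python
import xml.etree.ElementTree as ET

# Scaled-down billion laughs: each level references the previous one
# ten times, so the single &lol3; below expands to 10^3 = 1,000 "lol"s.
payload = """<?xml version="1.0"?>
<!DOCTYPE lolz [
 <!ENTITY lol "lol">
 <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
 <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
 <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
]>
<lolz>&lol3;</lolz>"""

root = ET.fromstring(payload)
print(len(root.text))  # 3000 characters from a ~400-byte document
```

The real attack just adds six more levels; the parser's output grows by a factor of ten with each one.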
[+] tisme|13 years ago|reply
:(){ :|: & };:

Even simple bash scripts can do weird things like this. And that's a lot smaller.

[+] dguido|13 years ago|reply
Probably should rename this to "billion reposts."

Can we move beyond this simple issue and discuss more complicated aspects of security on HN?

[+] tisme|13 years ago|reply
The problem here is that a lot of newly minted IT professionals will write stuff that allows all those old exploits to be reused all over again.

Occasionally warning people about known dangers doesn't hurt. Sure, that's not exactly moving the needle, but it may help mitigate a few lurking problems that people weren't yet aware of.

Think of it as a 'wear your seatbelt' advertisement. If you're already wearing your seatbelt then it wasn't meant for you.

[+] ftwinnovations|13 years ago|reply
I'm so tired of these "seen it before, I'm smarter and more senior than you" responses.

Look, if people vote it up, it means they like it. Just because it's been talked about before doesn't make it less valuable.

[+] VMG|13 years ago|reply
Disagree. I didn't know about that attack and found it very interesting.
[+] marmot1101|13 years ago|reply
10 year deep IT professional, recent HN member. This is the first I've seen of this attack. Reposts can be a good thing.
[+] angrycoder|13 years ago|reply
Achievement Unlocked: You've read all of the Internet.
[+] ctdonath|13 years ago|reply
The Internet is a much bigger place than some people realize.

If you would like a part of it improved & expanded, please proceed to do so.

[+] wtallis|13 years ago|reply
So, how much memory would a real-world parser actually consume given this file? I'd try it, but I had to RMA my workstation's motherboard yesterday, leaving me with a machine that only has 3GB, which is the obvious minimum for a full expansion. But I could imagine an XML parser might use UCS-2 internally, inflating this to 6GB. Or, some parsers might be clever and not attempt a full expansion.
[+] zurn|13 years ago|reply
So you're asking how much memory this resource exhaustion attack consumes when you run it.

Each line in the WP example amplifies by a factor of 10, and there are 9 such lines, so the expansion is 10^9. That's a billion copies of "lol", about 3 billion bytes, which is just about enough to exhaust the 32-bit virtual address space in common operating systems.
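The arithmetic, spelled out:

```python
# Back-of-the-envelope amplification for the Wikipedia payload.
levels = 9                   # entity definitions lol1 .. lol9
fanout = 10                  # each level references the previous one 10 times
copies = fanout ** levels    # copies of the base "lol" entity
size = copies * len("lol")   # bytes, assuming 1 byte per character

print(copies)                # 1000000000
print(size)                  # 3000000000, i.e. ~2.8 GiB
```

A ~1 KB input producing ~3 GB of output is an amplification factor of roughly three million, which is why a naive parser falls over.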

Of course the XML implementation could be smart and short-circuit this while preserving the semantics.

[+] ghshephard|13 years ago|reply
You could try using an iteration that only required 300 MB for a full expansion...
[+] caseydurfee|13 years ago|reply
Is there a legitimate use case for being able to recursively define entities like that?
[+] halter73|13 years ago|reply
Look again. There's no actual recursion going on. It's fairly trivial to identify recursion in DTD entities.
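A sketch of what "identify recursion" could look like: treat the entity definitions as a reference graph and run a cycle check. This is a hypothetical helper for illustration, not any real parser's code:

```python
import re

def find_entity_cycles(entities):
    """entities: dict mapping entity name -> replacement text.
    Returns True if any entity (transitively) references itself."""
    # Build the reference graph: which entities does each one mention?
    refs = {name: set(re.findall(r"&(\w+);", text))
            for name, text in entities.items()}

    def has_cycle(name, stack):
        if name in stack:          # already on the current path -> cycle
            return True
        stack.add(name)
        for ref in refs.get(name, ()):
            if has_cycle(ref, stack):
                return True
        stack.discard(name)
        return False

    return any(has_cycle(name, set()) for name in refs)

# The billion-laughs entities form a DAG -- no cycle, so no recursion:
lolz = {"lol": "lol", "lol1": "&lol;" * 10, "lol2": "&lol1;" * 10}
print(find_entity_cycles(lolz))                      # False

# Actual recursion, which a conforming parser must reject:
print(find_entity_cycles({"a": "&b;", "b": "&a;"}))  # True
```

The point is that the attack is perfectly acyclic: it's amplification, not recursion, so a recursion check alone doesn't stop it.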
[+] lukeschlather|13 years ago|reply
I don't see any recursion. Recursion would be an infinite loop and presumably a good parser would catch that.
[+] andrewcooke|13 years ago|reply
it's really the underlying macro system that's recursive - http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.3.2

and that's probably because macro systems are typically recursive, since they are otherwise fairly limited; recursion is a way to give them sufficient expressive power. so the explanation is probably "tradition" (and it likely made more sense in sgml, which was the precursor to html).

[+] alexrbarlow|13 years ago|reply
I have to say, I love this. Crazy, for a language that is really just for transferring data.

I guess you could do this with YAML too?

[+] aardvark179|13 years ago|reply
To pull this particular style of trick you need a schema definition that allows one object to be expanded into a whole set of objects, and for the resulting data structure to be a tree rather than simply a rooted directed graph.

I don't know YAML well, but I believe that if you tried this trick with something like alias nodes you would end up with a lol9 node with ten separate connections to a single lol8 node, which has ten separate connections to a single lol7 node, and so on. This would not produce the same problem in the parser, though it might trigger problems in whatever processed the resulting graph.
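That intuition can be checked directly. Here's a sketch in plain Python of the shared-graph shape that alias nodes would produce: each level holds ten references to the same object one level down, so the structure "contains" 10^9 strings while only a handful of objects actually exist:

```python
# Build the billion-laughs shape as a graph with sharing:
# ten POINTERS per level to the same child, not ten copies.
node = "lol"
for _ in range(9):
    node = [node] * 10

def count_distinct(obj, seen=None):
    """Count distinct objects reachable from obj (by identity)."""
    seen = set() if seen is None else seen
    if id(obj) not in seen:
        seen.add(id(obj))
        if isinstance(obj, list):
            for child in obj:
                count_distinct(child, seen)
    return len(seen)

print(count_distinct(node))  # 10 distinct objects: 9 lists + 1 string
```

The parser is fine; the danger moves downstream. Anything that flattens or serializes this graph (a naive `"".join`-style walk, for instance) would still expand it to the full 3 GB.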

[+] clarkevans|13 years ago|reply
I don't think so. YAML by itself doesn't have any entity-expansion or include capabilities -- you'd have to rely on extensions or something else that isn't on by default. The reference (alias/anchor) mechanism just rebuilds the serialized graph, so there's no expansion going on there. That said, I'm sure implementation issues are quite possible, like in most software.
[+] mbq|13 years ago|reply
To some extent, every protocol that doesn't carry the message and payload sizes at a fixed offset in the header can be called "crazy", since it's vulnerable to all the problems of live parsing, terminators, and unpredictable memory requirements.
[+] 055static|13 years ago|reply
This doesn't work with my sed-based XML parsers. :(
[+] ilcavero|13 years ago|reply
so, how do I protect myself against this?
[+] praptak|13 years ago|reply
Assign a limited memory pool to the parser, it's in the article.
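The article's suggestion is a memory cap on the parser; another common mitigation is to refuse documents that declare entities at all. A sketch using Python's stdlib expat bindings (assuming your documents never legitimately need a DTD, which is true for most data-interchange XML):

```python
from xml.parsers import expat

def reject_entities(entity_name, *args):
    # Fired for every <!ENTITY ...> declaration; bail out immediately.
    raise ValueError("entity declaration refused: %r" % entity_name)

def safe_parse(data):
    parser = expat.ParserCreate()
    parser.EntityDeclHandler = reject_entities
    parser.Parse(data, True)
    return True

print(safe_parse("<root>hello</root>"))  # True: plain XML is fine
try:
    safe_parse('<!DOCTYPE r [<!ENTITY lol "lol">]><r>&lol;</r>')
except ValueError as e:
    print("blocked:", e)
```

Modern expat also ships built-in amplification limits, but rejecting entity declarations up front is simpler to reason about when you control both ends of the pipe.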
[+] davyjones|13 years ago|reply
Most of the xml libraries already have the fix.
[+] njharman|13 years ago|reply
Stop using XML.
[+] Evbn|13 years ago|reply
Use a parser or data structure that doesn't duplicate identical objects.

Functional programming wins here.