It isn't obvious at first glance that this small XML file actually expands to a billion "lols". You really have to give the bad guys credit for ingenuity.
The problem here is that a lot of newly minted IT professionals will write stuff that will allow a re-use of all those old exploits all over again.
Occasionally warning people about known dangers does no harm. Sure, that's not exactly moving the needle, but it may help to mitigate a few lurking problems that people were not yet aware of.
Think of it as a 'wear your seatbelt' advertisement. If you're already wearing your seatbelt then it wasn't meant for you.
So, how much memory would a real-world parser actually consume given this file? I'd try it, but I had to RMA my workstation's motherboard yesterday, leaving me with a machine that only has 3GB, which is the obvious minimum for a full expansion. But I could imagine an XML parser might use UCS-2 internally, inflating this to 6GB. Or, some parsers might be clever and not attempt a full expansion.
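For a rough sense of scale, here is a back-of-the-envelope calculation in Python. It assumes the Wikipedia-style example (a 3-character base entity, 9 levels, fan-out of 10) and ignores whatever per-string or per-node overhead a real parser adds. `expanded_size` is just a made-up helper name, not anything from an XML library.

```python
def expanded_size(levels=9, fanout=10, base="lol", bytes_per_char=1):
    """Bytes needed to hold the fully expanded text, ignoring parser overhead."""
    return (fanout ** levels) * len(base) * bytes_per_char

# Assumption: 1 byte per character (e.g. ASCII text stored as UTF-8).
one_byte = expanded_size()
# Assumption: 2 bytes per character (e.g. a UCS-2 internal representation).
two_byte = expanded_size(bytes_per_char=2)

print(one_byte, two_byte)
```

That only gives a lower bound for a parser that really does expand everything; an implementation that refuses or defers expansion would use far less.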
So you're asking how much memory this resource exhaustion attack consumes when you run it.
Each line in the WP example amplifies by a factor of 10. It has 9 lines, so that's 10^9. That's a billion times 3 bytes, which is just enough to exhaust the 32-bit virtual memory space in common operating systems.
Of course the XML implementation could be smart and short-circuit this while preserving the semantics.
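One possible reading of "short-circuit" is to work out how large the expansion would be without ever building it, and refuse anything over a budget. The sketch below is not how any particular XML parser does it. The entity table, the `&name;` regex, and the `expanded_length` / `check_budget` names are all assumptions made up for illustration, and it relies on entity definitions being non-recursive (which XML requires).

```python
import re
from functools import lru_cache

# Hypothetical entity table in the style of the WP example:
# each level references the previous one ten times.
ENTITIES = {"lol": "lol"}
prev = "lol"
for i in range(1, 10):
    name = f"lol{i}"
    ENTITIES[name] = f"&{prev};" * 10
    prev = name

REF = re.compile(r"&(\w+);")  # simplified reference syntax for this sketch

@lru_cache(maxsize=None)
def expanded_length(name):
    """Length of the fully expanded entity, computed without building it."""
    text = ENTITIES[name]
    literal = len(REF.sub("", text))  # characters that are not references
    return literal + sum(expanded_length(ref) for ref in REF.findall(text))

def check_budget(name, limit=1_000_000):
    """Raise if expanding `name` would exceed `limit` characters."""
    size = expanded_length(name)
    if size > limit:
        raise ValueError(f"&{name}; expands to {size} characters (limit {limit})")
    return size
```

Because each entity's length is memoized, the cost is proportional to the size of the definitions, not to the size of the expansion.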
And that's probably because macro systems are typically recursive, since they are otherwise fairly limited, so this is a way to give them sufficient expressive power. So the explanation is probably "tradition" (and it likely made more sense in SGML, which was the precursor to HTML).
To pull off this particular style of trick you require a schema definition that allows for one object to be expanded into a whole set of objects, and for the resulting data structure to be a tree rather than simply a rooted directed graph.
I don't know YAML well, but I believe if you tried this trick with something like alias nodes then you would end up with a lol9 node with ten separate connections to a single lol8 node, which has ten separate connections to a single lol7 node, and so on. This would not produce the same problem in the parser, though it might trigger problems in whatever processed the resulting graph.
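To make the tree-versus-graph distinction concrete, here is a small Python model of it. It does not use a YAML library at all; plain nested lists stand in for what a loader might build from alias nodes, and the function names are invented for this sketch. It uses 4 levels instead of 9 so it runs instantly.

```python
def build_shared(levels, fanout=10):
    """Nested lists where each level holds `fanout` references
    to the *same* lower-level list, mimicking alias nodes."""
    node = ["lol"]
    for _ in range(levels):
        node = [node] * fanout  # ten references, one shared object
    return node

def count_distinct(node, seen=None):
    """Number of distinct list objects in the graph (what was actually built)."""
    seen = set() if seen is None else seen
    if not isinstance(node, list) or id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_distinct(child, seen) for child in node)

def count_leaves(node):
    """Leaves visited by a consumer that walks the graph as if it were a tree."""
    if not isinstance(node, list):
        return 1
    return sum(count_leaves(child) for child in node)

graph = build_shared(levels=4)
print(count_distinct(graph), count_leaves(graph))
```

The graph itself stays tiny (one object per level), but a naive recursive walk sees `fanout ** levels` leaves, which is where the downstream trouble would come from.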
I don't think so. YAML by itself doesn't have any entity expansion or include capabilities -- you'd have to rely on extensions or something else that isn't on by default. The reference (alias/anchor) mechanism just rebuilds the serialized graph, so there's no expansion going on there. That said, I'm sure there are quite possibly implementation issues, like most software.
To some extent, every protocol that does not transfer the message and payload size at a fixed offset in the header can be called "crazy", as it is vulnerable to all the problems with live parsing, terminators and unpredictable memory requirements.
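As an illustration of the alternative being described, here is a minimal length-prefixed framing sketch in Python. The 4-byte big-endian length field and the 1 MiB cap are arbitrary choices for this example, not taken from any real protocol, and `encode_frame` / `read_frame` are hypothetical names.

```python
import io
import struct

HEADER = struct.Struct("!I")  # assumed layout: 4-byte big-endian length at offset 0
MAX_PAYLOAD = 1 << 20         # arbitrary 1 MiB cap chosen for this sketch

def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length."""
    return HEADER.pack(len(payload)) + payload

def read_frame(stream) -> bytes:
    """Read one frame, rejecting oversized payloads before reading them."""
    header = stream.read(HEADER.size)
    if len(header) != HEADER.size:
        raise EOFError("truncated header")
    (length,) = HEADER.unpack(header)
    if length > MAX_PAYLOAD:
        raise ValueError(f"payload of {length} bytes exceeds limit")
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("truncated payload")
    return payload

frame = encode_frame(b"hello")
print(read_frame(io.BytesIO(frame)))
```

The receiver knows the memory requirement after reading four bytes and can refuse before allocating anything. On a real socket a single `read` may return fewer bytes than requested, so a loop would be needed there; `io.BytesIO` hides that.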
[+] [-] astrojams|13 years ago|reply
[+] [-] tisme|13 years ago|reply
Even simple bash scripts can do weird things like this. And that's a lot smaller.
[+] [-] lifeformed|13 years ago|reply
[+] [-] dguido|13 years ago|reply
Can we move beyond this simple issue and discuss more complicated aspects of security on HN?
[+] [-] dguido|13 years ago|reply
https://news.ycombinator.com/item?id=259458
https://news.ycombinator.com/item?id=3859853
https://news.ycombinator.com/item?id=1674911
https://news.ycombinator.com/item?id=301296
https://news.ycombinator.com/item?id=4619344
People that exploit these kinds of things continue to innovate, but HN seems to be stuck with XSS, SQLi, and malformed XML.
[+] [-] tisme|13 years ago|reply
[+] [-] ftwinnovations|13 years ago|reply
Look, if people vote it up it means they like it. Just because it's been talked about before doesn't make it less valuable.
[+] [-] VMG|13 years ago|reply
[+] [-] marmot1101|13 years ago|reply
[+] [-] angrycoder|13 years ago|reply
[+] [-] ctdonath|13 years ago|reply
If you would like a part of it improved & expanded, please proceed to do so.
[+] [-] wtallis|13 years ago|reply
[+] [-] zurn|13 years ago|reply
[+] [-] ghshephard|13 years ago|reply
[+] [-] caseydurfee|13 years ago|reply
[+] [-] halter73|13 years ago|reply
[+] [-] lukeschlather|13 years ago|reply
[+] [-] andrewcooke|13 years ago|reply
[+] [-] alexrbarlow|13 years ago|reply
I guess you could do this with YAML too?
[+] [-] aardvark179|13 years ago|reply
[+] [-] clarkevans|13 years ago|reply
[+] [-] mbq|13 years ago|reply
[+] [-] 055static|13 years ago|reply
[+] [-] ilcavero|13 years ago|reply
[+] [-] praptak|13 years ago|reply
[+] [-] davyjones|13 years ago|reply
[+] [-] njharman|13 years ago|reply
[+] [-] Evbn|13 years ago|reply
Functional programming wins here.