
Logging in Python like a pro

307 points | EntICOnc | 3 years ago | guicommits.com

112 comments

[+] orf|3 years ago|reply
The Python logging module is one of the worst modules in the stdlib. You can see its Java inspiration all over it; being one of the oldest modules, it was created before anyone really knew what “pythonic” code was.

The thing I really dislike about it is the lack of “context”. Usually log messages are about something - often a request. You can’t attach a request ID or other context to a nested set of function calls that log.

The whole library is just really rubbish and has not aged well. There’s no standard way of outputting JSON logs, but there’s a built-in way to spawn a socket server that allows the logging configuration to be updated remotely via a bespoke protocol? And everything is built mostly around logging to files.

It also hits that perfect sweet spot of being both over-engineered and totally inflexible. I wanted to funnel our log messages to S3 via the “s3fs” module, which provides a “file-like” object that gets written to S3. I had to hack around the various handlers to support this, because they assume a “real” file.

Hate it.

[+] rgovostes|3 years ago|reply
I've been writing Python daily for 15 years but when I have to do any configuration beyond `basicConfig()`, I still groan. In many instances I have wanted to do something like temporarily turn on logging of HTTP requests, which turns into a much more involved effort than it ought to be.

I just recently discovered that there's no documented way to dump the current configuration (https://stackoverflow.com/q/72624883) so you can see what loggers exist and where their output is going.

[+] paulfurtado|3 years ago|reply
It's definitely sad that there isn't built-in context support, but if you're looking for a solution to this, it is actually very easy to attach context to logs using threadlocal and a logging filter.

When running a webserver, though, an even simpler trick is to add middleware that sets the current request's info as the current thread's name, and then include threadName in your log format.

[+] formerly_proven|3 years ago|reply
It's unfortunate how blind copying of log4j into many languages seems to have inhibited improvements in logging in the last ~20ish years.
[+] systemvoltage|3 years ago|reply
Agreed. One of the most confusing libraries, with a ton of magic going on behind the scenes. I also detest pytest; although not in the stdlib, it's one of the worst in terms of explicitness. It's very useful, but that's orthogonal. I've seen newcomers to Python struggle to grasp how pytest manipulates a billion things behind the scenes.

Overall, I prefer less magic and more explicitness.

[+] dikei|3 years ago|reply
>> You can see its Java inspiration all over it; being one of the oldest modules, it was created before anyone really knew what “pythonic” code was.

The irony is that Java logging moved on: newer libraries offer better support for lazy evaluation, structured logging, and multiple output formats, while Python logging seems stuck with an ancient copy of log4j 1.x.

[+] doliveira|3 years ago|reply
> there’s a built in way to spawn a socket server that allows the logging configuration to be updated remotely via a bespoke protocol

That sounds like there's a Log4Shell-like vulnerability waiting to be found. At first I couldn't even find it; I found SocketHandler, but that's just a log destination.

EDIT: shockingly, I did find it: https://docs.python.org/3/howto/logging-cookbook.html#config...

[+] drzoltar|3 years ago|reply
Are there any good 3rd party alternatives that work better?
[+] helsinki|3 years ago|reply
It’s rock solid, at least. I’ve also had the displeasure of working in the logging package. Totally outdated. I created a JSON formatter (with context support as well). Huge pain, but it always works.
[+] emmelaich|3 years ago|reply
That might explain why it lacks a NOTICE level. That bugged the hell out of me with Java.

Presumably these levels derive from syslog or older. I've no idea why they omitted NOTICE.

[+] LtWorf|3 years ago|reply
I normally just import syslog or journald. The latter allows for arbitrary fields to be defined and then searched, but it's not in the stdlib.
[+] kortex|3 years ago|reply
You should try the loguru library. I was able to roll a rolling-upload-to-S3 adapter in under an hour. Switching to JSON logs is one bool flag away. Plus, it's gorgeous.

https://github.com/Delgan/loguru

Also iirc s3's "file-like interface" does not actually obey the file protocol, which is obnoxious.

[+] miohtama|3 years ago|reply
Not to defend the legacy of good old logging, but if you have ever tried to log in Node.js applications Python logging starts to feel godsent.
[+] wheelerof4te|3 years ago|reply
Do you know about any functional-style logging libraries? Or just plain old procedural ones?
[+] 1337shadow|3 years ago|reply
Maybe worth trying pylogctx and python-logging-loki or see how it's done there.
[+] sorokod|3 years ago|reply
For whatever it's worth, java logging has mapped diagnostic context (MDC).
[+] MichaelRazum|3 years ago|reply
Yep. Have to agree. It works, but is really messy.
[+] thechao|3 years ago|reply
I can't agree with you more. I switched to Python's logger off of my janky thing about 6 months ago, and I'm hating every second of it. I still have no clue how to turn it on, turn it off, get it to fork output (stdout, stderr, file), nor how to override its behavior.

It's literally easier to just have an IOBuffer laying around.
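For reference, the fan-out being described is doable in the stdlib, just verbosely: one logger, three handlers (stdout, stderr, file), each with its own level. A sketch:

```python
import logging
import sys

logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)

# One handler per destination; each can have its own level and format.
stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.setLevel(logging.INFO)

stderr_handler = logging.StreamHandler(sys.stderr)
stderr_handler.setLevel(logging.ERROR)

file_handler = logging.FileHandler("app.log")
file_handler.setLevel(logging.DEBUG)

formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
for handler in (stdout_handler, stderr_handler, file_handler):
    handler.setFormatter(formatter)
    logger.addHandler(handler)

logger.info("reaches stdout and the file")
logger.error("reaches stdout, stderr, and the file")

logger.disabled = True   # "turning it off"
logger.info("silenced")
```

Messages fan out to every handler whose level they clear, which is how one call reaches several destinations.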

[+] connorbrinton|3 years ago|reply
Another great project for Python logging is loguru: https://loguru.readthedocs.io/en/stable/overview.html

One of my favorite features of loguru is the ability to contextualize logging messages using a context manager.

  with logger.contextualize(user_name=user_name):
    handle_request(...)
Within the context manager, all loguru logging messages include the contextual information. This is especially nice when used in combination with loguru's native JSON formatting support and a JSON-compatible log archiving system.

One downside of loguru is that it doesn't include messages from Python's standard logging system, but it can be integrated as a handler for the standard logging system. At the company where I work, we've created a helper similar to Datadog's ddtrace-run executable to automatically set up these integrations and log message formatting with loguru.

[+] memco|3 years ago|reply
I mention this almost every time logging and Python come up, but one of my favorite features is logging.exception: https://docs.python.org/3/library/logging.html#logging.Logge.... It allows you to attach the whole exception with stack trace without having to explicitly use “as e” and then use traceback or embed the error in the message. Something like logger.exception(“Failed to connect.”) will print the message and the exception in one go.
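For reference, the pattern in full (stdlib only; the failing connect() is a stand-in):

```python
import logging

logging.basicConfig()
logger = logging.getLogger("db")

def connect():
    # Stand-in for a real connection attempt.
    raise ConnectionError("connection refused")

try:
    connect()
except ConnectionError:
    # Logs at ERROR level and appends the active exception's traceback;
    # no "as e" and no traceback module needed.
    logger.exception("Failed to connect.")
```

The call must happen inside an except block, since it reads the currently handled exception.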
[+] atorodius|3 years ago|reply
I'm using this too, but I always found it very unreadable for people unfamiliar with it. It's so surprising that it catches the exception for you! I always end up adding a comment:

    logger.exception("Failed to connect.")  # Will print the exception.

[+] pR0Ps|3 years ago|reply
You can actually do this with any level of log message by passing `exc_info=True` to it. This is super useful in cases where you want to log the exception context, but don't need to be at a high logging level.

Ex:

    logger.debug("A non-critical exception occurred", exc_info=True)
[+] Narushia|3 years ago|reply
I second this, very useful. I kind of expected the article to mention it, but it didn’t!
[+] efitz|3 years ago|reply
Very good advice.

The advice I give to people is "think about logging like a feature with customers, just like any other feature." If you think about logging this way, you ideally put yourself in the position of your logs' "customer"- a bleary-eyed teammate who just got paged out of bed at 2am, or a security guy who isn't intimately familiar with your code. Providing extra context for those users is really kind, as they don't have a "secret decoder ring".

Since the article discusses it, I also highly recommend AWS CloudWatch logs. It's super easy to log to CloudWatch from Lambda with the Python logger class. It's a little harder from EC2, but not really that difficult. The advantage is that you don't need to SSH somewhere and navigate to a file and then locate the interesting stuff in that file - You just keep the CloudWatch console open (or keep a link to it handy) and the latest stuff is on top with easy filtering.

[+] jstarfish|3 years ago|reply
My standard, to the extent possible: make your logs clear enough that your attorney would be able to understand them.

You may well find yourself wishing you did this if you also output DEBUG logs in production.

[+] pid-1|3 years ago|reply
Cloudwatch is cheap (even free) for small environments and has very small operational overhead.

The cost, however, can easily get out of hand, and AWS does not have per-resource billing enabled by default, so it might not be obvious which Log Group is responsible for which charge.

I also think its console is far inferior to something like Grafana for searching and visualisation.

[+] AmpsterMan|3 years ago|reply
I don't know if it's the way CloudWatch works at my current company or what, but I preferred SSHing into the server and grepping for what I want, lol.

I miss being able to do that.

[+] dragonwriter|3 years ago|reply
> It's super easy to log to CloudWatch from Lambda with the Python logger class

Or even just print(): each Lambda gets its own log group, CloudWatch automatically timestamps entries, and a Lambda's stdout and stderr both go to the current log stream in that CloudWatch log group.

[+] jasonpeacock|3 years ago|reply
This is a very long (but thorough!) article about implementing structured logging.

Shorter version: Use structured logging!

If using Python, use `structlog`: https://www.structlog.org/en/stable/

[+] matheusmoreira|3 years ago|reply
Thank you so much for posting this. Very well designed library. I wish logging worked like this on every language.
[+] aunty_helen|3 years ago|reply
"%(asctime)s - %(levelname)-5s - %(name)-20.20s - %(message)s - (%(filename)s:%(lineno)s)"

Here’s a logging format that I use, and you probably should too. Among other things, it gets around most of the context issues by outputting the file and line number of the code that called the log.

Before, I would try to make every log message unique or mostly unique so there weren’t double-ups.

I don’t agree with the author about logging clutter, though. If you’re manually eyeballing the logs looking for a message, you’re already doing it wrong. Use search, have IDs in the messages, etc.

Remember folks, keep debug logging on, because if it’s not logged, then when it breaks you don’t have the logs…
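For reference, that format string drops straight into basicConfig:

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s - %(levelname)-5s - %(name)-20.20s"
           " - %(message)s - (%(filename)s:%(lineno)s)",
)

logging.getLogger("app.worker").debug("queue drained")
# each line carries timestamp, level, padded logger name, message,
# and the caller's file:line
```

The `-20.20s` spec pads and truncates the logger name to 20 characters, which keeps columns aligned when scanning.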

[+] gregmac|3 years ago|reply
I think this is a good overview, I especially like "descriptive, contextual, reactive." I like to ask "who is going to see this message and why? What action should they take?"

I've also come to find a few "wide" log entries significantly better than having many smaller log messages (especially as the system grows).

Ideally, one operation == one log message, and contains all context: userid, sessionid, url path, domain, error codes/messages, timing, etc. log error/warning if it fails, info if success. It's much easier to follow than one operation spread across a half dozen log messages. In dev, with one thing happening at a time, it's no big deal, but in production there's way more stuff happening and it is interspersed with hundreds of lines of logs of other operations as well as other threads running the same operation concurrently.
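For reference, a sketch of the wide-entry idea using the stdlib's extra= and a hand-rolled JSON formatter (the stdlib has none built in; field names are illustrative):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    # Context fields callers may supply via extra= (names are illustrative).
    FIELDS = ("user_id", "session_id", "path", "status", "duration_ms")

    def format(self, record):
        event = {"level": record.levelname, "message": record.getMessage()}
        for field in self.FIELDS:
            if hasattr(record, field):
                event[field] = getattr(record, field)
        return json.dumps(event)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("ops")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# One operation, one wide entry, all context attached:
logger.info(
    "checkout completed",
    extra={"user_id": 7, "session_id": "s-1", "path": "/checkout",
           "status": 200, "duration_ms": 143},
)
```

Everything about the operation lands on one line, so concurrent operations never interleave within an entry.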

[+] mekster|3 years ago|reply
Does anyone ever log in multi dimensional way at all?

Everyone seems to just go,

  logger.info('Doing that')
But logs really should cover the context instead of being single-dimensional and placed at a single execution point, which would also make the code far more readable.

  # Sorry, I don't know Python well, but the idea is to pass a lambda as a parameter:
  myLogger.info('Doing that', lambda: (
    doThat(),
    whatNot()))
And then, you know exactly what the log is covering as well as take a benchmark of the said part and report how long it took to run.

You don't need 'Done that' logging: if you throw inside, the log simply doesn't happen.

You might want to know 'Doing that' ran even if it threw inside, but if logging is paired with an error-reporting tool like Sentry (which is quite essential, rather than depending on simple logging alone), you can easily get the stack trace to track which part of the code ran.

It seems no logging libraries do it the multidimensional way, but it's an essential part of how I do logging.

[+] mr-karan|3 years ago|reply
Honestly, I like to keep the logger as decoupled and minimalist as possible. There’s https://vector.dev, which can basically ship your logs from source A to destination B (it supports tonnes of use cases). Separation of concerns makes things much easier.
[+] pR0Ps|3 years ago|reply
Maybe it's Stockholm syndrome, but contrary to what seems like most people here, I don't actually mind the built-in logging library. Sure, it reeks of its Java roots and can be a bit of a pain to set up and configure, but once that's done it's incredibly flexible and generally works really well.

One of the main gripes I do have with it is its reliance on the old-school %-based formatting for log messages:

    logger.debug("Invalid object '%s'", obj)
As there are real advantages to providing the template and the args separately, this is a bit of a shame, since it pushes people to use pre-formatted strings without any args instead.

I fixed this for myself by writing bracelogger[0]. If this is a pain point for you too, you might find it useful.

Ex:

    logger.debug("Invalid object of type {0.__class__.__name__}: {0}", obj)
[0]: https://github.com/pR0Ps/bracelogger
[+] reubano|3 years ago|reply
I'd just like to plug my lib, pygogo (https://github.com/reubano/pygogo). Here's a structured log example taken from the docs.

  import pygogo as gogo

  kwargs = {'contextual': True}
  extra = {'additional': True}
  logger = gogo.Gogo('basic').get_structured_logger('base', **kwargs)
  logger.debug('message', extra=extra)

  # Prints the following to `stdout`:
  {"additional": true, "contextual": true, "message": "message"}
[+] Jeff_Brown|3 years ago|reply
In the first example, the author says not to log who the user is when a connection succeeds, only when it fails, because on success there's nothing you need to do. If that's true, then why log it at all?
[+] kordlessagain|3 years ago|reply
As a logger in a previous lifetime, I approve of this content.
[+] frays|3 years ago|reply
Great article. Didn't know about the `extras` keyword.
[+] alanwreath|3 years ago|reply
Given you’ve configured logging somewhere early in the bootstrap phase of your app, there’s really little reason to go above and beyond having these two lines in all your logging modules:

  import logging
  logger = logging.getLogger(__name__)
But boy oh boy have I seen some funny business, where people attach the logger to their objects like so:

   class Wat:
     def __init__(self):
       self.logger = logging.getLogger(…)
All for what? For WAT?!
[+] Nimitz14|3 years ago|reply
That is a super common mistake from beginner python programmers.
[+] 1337shadow|3 years ago|reply
To inherit the logger attribute in subclasses? To save an import?
[+] bakpakin|3 years ago|reply
Anyone else just define a function log(msg) that prints to stderr (or maybe stdout) and use that instead for simple programs/daemons? I never really got the big value proposition of seven-level logging or its complexity. If I need to customize it, I just change my log function.
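For completeness, the minimal version of that (a sketch):

```python
import sys
from datetime import datetime, timezone

def log(msg):
    """Timestamped line to stderr; no levels, no handlers, no config."""
    ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
    print(f"{ts} {msg}", file=sys.stderr, flush=True)

log("starting up")
```

Writing to stderr keeps log lines out of the program's actual output, and flush=True means nothing is lost if the daemon dies.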