The Python logging module is one of the worst modules in the stdlib. You can see its Java inspiration all over it; as one of the oldest modules, it was created before anyone really knew what “pythonic” code was.
The thing I really dislike about it is the lack of “context”. Usually log messages are about something, often a request. You can’t attach a request ID or other context to a nested set of function calls that log.
The whole library is just really rubbish and has not aged well. There’s no standard way of outputting JSON logs, but there’s a built-in way to spawn a socket server that allows the logging configuration to be updated remotely via a bespoke protocol? And everything is built mostly around logging to files.
It also hits that perfect sweet spot of being both over-engineered and totally inflexible. I wanted to funnel our log messages to S3 via the “s3fs” module, which supports a “file-like” object that gets written to S3. I had to hack around the various handlers to support this because they assume it’s a “real” file. Hate it.
I've been writing Python daily for 15 years but when I have to do any configuration beyond `basicConfig()`, I still groan. In many instances I have wanted to do something like temporarily turn on logging of HTTP requests, which turns into a much more involved effort than it ought to be.
I just recently discovered that there's no documented way to dump the current configuration (https://stackoverflow.com/q/72624883) so you can see what loggers exist and where their output is going.
It's definitely sad that there isn't built-in context support, but if you're looking for a solution to this, it is actually very easy to attach context to logs using a thread-local and a logging filter.
When running a webserver though, an even simpler trick is to add middleware that sets the current request's info as the current thread's name, and then including threadName in your log format.
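A sketch of that trick for a WSGI app (the middleware class and the choice of `PATH_INFO` are illustrative, not from the comment):

```python
import threading

class RequestNameMiddleware:
    """WSGI middleware: stash request info in the current thread's name,
    so that %(threadName)s in the log format carries it for free."""
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        threading.current_thread().name = environ.get("PATH_INFO", "-")
        return self.app(environ, start_response)
```

With a format like `"%(threadName)s %(levelname)s %(message)s"`, every line logged while handling the request is tagged automatically.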
Agreed. One of the most confusing libraries, with a ton of magic going on behind the scenes. I also detest pytest; although it's not in the stdlib, it's one of the worst in terms of explicitness. It's very useful, but that's orthogonal. I've seen newcomers to Python struggle to grasp how pytest just manipulates a billion things behind the scenes.
Overall, I prefer less magic and more explicitness.
>> You can see its Java inspiration all over it, as being one of the oldest modules it was created before anyone really knew what “pythonic” code was.
The irony is that Java logging moved on: newer libraries offer better support for lazy evaluation, structured logging, and multiple output formats, while Python logging seems stuck with an ancient copy of log4j 1.x.
> there’s a built in way to spawn a socket server that allows the logging configuration to be updated remotely via a bespoke protocol
That sounds like there's a Log4Shell-like vulnerability waiting to be found. But I couldn't even find it; I found the SocketHandler, but that's just a log destination. EDIT: shockingly, I did find it: https://docs.python.org/3/howto/logging-cookbook.html#config...
It’s rock solid, at least. I’ve also had the displeasure of working in the logging package. Totally outdated. I created a JSON formatter (with support for context, too). Huge pain, but it always works.
You should try the loguru library: https://github.com/Delgan/loguru. I was able to roll a rolling-upload-to-S3 adapter in under an hour. Switching to JSON logs is one bool flag away. Plus it's gorgeous. Also, IIRC, S3's "file-like interface" does not actually obey the file protocol, which is obnoxious.
I can't agree with you more. I switched to Python's logger off of my janky thing about 6 months ago, and I'm hating every second of it. I still have no clue how to turn it on, turn it off, get it to fork output (stdout, stderr, file), nor how to override its behavior.
It's literally easier to just have an IOBuffer laying around.
One of my favorite features of loguru is the ability to contextualize logging messages using a context manager.
```
with logger.contextualize(user_name=user_name):
    handle_request(...)
```
Within the context manager, all loguru logging messages include the contextual information. This is especially nice when used in combination with loguru's native JSON formatting support and a JSON-compatible log archiving system.
One downside of loguru is that it doesn't capture messages from Python's standard logging system, but it can be integrated as a handler for the standard logging system. At the company where I work, we've created a helper similar to Datadog's ddtrace-run executable to automatically set up these integrations and log message formatting with loguru.
I mention this almost every time logging and Python come up, but one of my favorite features is logging.exception: https://docs.python.org/3/library/logging.html#logging.Logge.... It allows you to attach the whole exception with its stack trace without having to explicitly use “as e” and then use traceback or embed the error in the message. Something like logger.exception(“Failed to connect.”) will print the message and the exception in one go.
I'm using this too, but I always found it very unreadable for people unfamiliar with it. It's so surprising that it catches the exception for you! I end up always adding a comment:
```
logger.exception("Failed to connect.") # Will print exception.
```
You can actually do this with any level of log message by passing `exc_info=True` to it. This is super useful in cases where you want to log the exception context, but don't need to be at a high logging level.
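A quick sketch of both forms (the error text and logger setup are illustrative):

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

try:
    raise ConnectionError("connection refused")
except ConnectionError:
    # .exception() logs at ERROR level and appends the full traceback.
    log.exception("Failed to connect.")
    # Any level can carry the traceback by passing exc_info=True.
    log.debug("Connection attempt details.", exc_info=True)
```

Both lines include the `Traceback (most recent call last):` block without ever naming the exception object.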
The advice I give to people is "think about logging like a feature with customers, just like any other feature." If you think about logging this way, you ideally put yourself in the position of your logs' "customer"- a bleary-eyed teammate who just got paged out of bed at 2am, or a security guy who isn't intimately familiar with your code. Providing extra context for those users is really kind, as they don't have a "secret decoder ring".
Since the article discusses it, I also highly recommend AWS CloudWatch logs. It's super easy to log to CloudWatch from Lambda with the Python logger class. It's a little harder from EC2, but not really that difficult. The advantage is that you don't need to SSH somewhere and navigate to a file and then locate the interesting stuff in that file - You just keep the CloudWatch console open (or keep a link to it handy) and the latest stuff is on top with easy filtering.
Cloudwatch is cheap (even free) for small environments and has very small operational overhead.
The cost, however, can easily get out of hand, and AWS does not have per-resource billing enabled by default, so it might not be obvious which Log Group is responsible for which charge.
I also think its console is way inferior to something like Grafana for searching and visualisation.
> It's super easy to log to CloudWatch from Lambda with the Python logger class
Or even just print(): you've got a unique log group per Lambda, CloudWatch Logs automatically timestamps entries, and stdout & stderr from a Lambda both go to the current log stream in the Lambda’s CloudWatch log group.
Here’s a logging format that I use, and you probably should too. Among other things, it gets around most of the context issues by outputting the file and line number of the code that called the log.
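The format itself didn't survive the copy; here's a sketch of one like it, using the stdlib's built-in `%(filename)s` and `%(lineno)d` record attributes (the exact layout is my assumption, not the commenter's):

```python
import logging

# Every LogRecord already carries the caller's file and line number;
# the format string just has to ask for them.
logging.basicConfig(
    format="%(asctime)s %(levelname)-8s %(filename)s:%(lineno)d %(message)s",
    level=logging.INFO,
)
logging.getLogger(__name__).info("connected")
```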
Before, I would try to make every log message unique, or mostly unique, so there weren’t double-ups.
I don’t agree with the author about logging clutter, though. If you’re manually eyeballing the logs looking for a message, you’re already doing it wrong. Use search, have IDs in the messages, etc.
Remember folks, keep debug logging on, because if it’s not logged, then when it breaks you don’t have the logs…
I think this is a good overview, I especially like "descriptive, contextual, reactive." I like to ask "who is going to see this message and why? What action should they take?"
I've also come to find a few "wide" log entries significantly better than having many smaller log messages (especially as the system grows).
Ideally, one operation == one log message that contains all the context: user ID, session ID, URL path, domain, error codes/messages, timing, etc. Log error/warning if it fails, info if it succeeds. It's much easier to follow than one operation spread across half a dozen log messages. In dev, with one thing happening at a time, it's no big deal, but in production there's way more going on, and it's interspersed with hundreds of lines of logs from other operations, as well as other threads running the same operation concurrently.
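A minimal sketch of such a "wide" entry using stdlib logging plus JSON (the field names and helper are illustrative, not from the comment):

```python
import json
import logging
import time

log = logging.getLogger("requests")

def log_operation(*, user_id, session_id, path, error=None, started):
    """Emit one JSON line carrying the whole operation's context."""
    entry = {
        "user_id": user_id,
        "session_id": session_id,
        "path": path,
        "error": error,
        "duration_ms": round((time.monotonic() - started) * 1000, 1),
    }
    # One line per operation: info on success, error on failure.
    level = logging.INFO if error is None else logging.ERROR
    log.log(level, json.dumps(entry))

# Usage:
# started = time.monotonic()
# ... handle the request ...
# log_operation(user_id=7, session_id="s1", path="/cart", started=started)
```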
Does anyone ever log in a multi-dimensional way at all?
Everyone seems to just go:

```
logger.info('Doing that')
```
But logs really should cover a context instead of being single-dimensional statements placed at one execution point, which would also make the code far more readable.
```
# Sorry, I don't know Python well, but the idea is to pass a lambda as a parameter:
myLogger.info('Doing that', lambda: (
    doThat(),
    whatNot(),
))
```
And then you know exactly what the log is covering, and you can also benchmark that section and report how long it took to run.

You don't need 'Done that' logging, since if the block throws, the log simply doesn't happen.

You might want to know that 'Doing that' ran even if it threw inside, but if logging is paired with an error-reporting tool like Sentry (which is quite essential, rather than depending on simple logging alone), you can easily get the stack trace to track which part of the code ran.

It seems no logging libraries do it this multi-dimensional way, but it's an essential part of how I do logging.
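No stdlib API does this, but the idea can be roughly sketched as a context manager over standard logging (the `scoped` helper is an invention for illustration, not an existing library):

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger(__name__)

@contextmanager
def scoped(message):
    """Log `message`, run the enclosed block, then report its duration.
    If the block raises, the completion line is skipped."""
    log.info(message)
    start = time.monotonic()
    yield
    log.info("%s took %.1f ms", message, (time.monotonic() - start) * 1000)

# Usage: the log line now covers a span of code, not a single point.
# with scoped("Doing that"):
#     do_that()
#     what_not()
```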
Honestly, I like to keep the logger as decoupled and minimalist as possible. There’s https://vector.dev, which can ship your logs from source A to destination B (it supports tonnes of use cases). Separation of concerns makes things much easier.
Maybe it's Stockholm syndrome, but contrary to what seems like most people, I don't actually mind the built-in logging library. Sure, it reeks of its Java roots and can be a bit of a pain to set up and configure, but once that's done it's incredibly flexible and generally works really well.
One of the main gripes I do have with it is its reliance on the old-school %-based formatting for log messages:

```
logger.debug("Invalid object '%s'", obj)
```
As there are real advantages to providing a template and object args separately, this is a bit of a shame since it pushes people to use pre-formatted strings without any args instead.
I fixed this for myself by writing bracelogger[0]. If this is a pain point for you too, you might find it useful.
Ex:

```
logger.debug("Invalid object of type {0.__class__.__name__}: {0}", obj)
```

[0]: https://github.com/pR0Ps/bracelogger
```
import pygogo as gogo

kwargs = {'contextual': True}
extra = {'additional': True}

logger = gogo.Gogo('basic').get_structured_logger('base', **kwargs)
logger.debug('message', extra=extra)

# Prints the following to `stdout`:
# {"additional": true, "contextual": true, "message": "message"}
```
The best way to find the logs connected to one of your API requests is to use correlation IDs, which are injected into each log message. A friend and I wrote libraries to do this for both Django[1] and ASGI apps (Starlette, FastAPI)[2].

[1]: https://github.com/snok/django-guid
[2]: https://github.com/snok/asgi-correlation-id
In the first example, the author says not to log who the user is if a connection is successful, only if it fails, because if it succeeds there's nothing you need to do. If that's true, then why log it at all?
Given you've configured logging somewhere early on, in the bootstrap phase of your app, there's really little reason to go above and beyond having more than these two lines in all your modules that log.
Anyone else just define a function log(msg) that prints to stderr (or maybe stdout) and use that instead for simple programs/daemons? I never really got the big value proposition of seven-leveled logging, or the complexity of it. If I need to customize it, I just change my log function.
Presumably these levels derive from syslog or older. I've no idea why they omitted NOTICE.
You may well find yourself wishing you did this if you also output DEBUG logs in production.
Shorter version: Use structured logging!
If using Python, use `structlog`: https://www.structlog.org/en/stable/
It can be tricky to forward all logs through it [0] but it's definitely worth it.
[0] https://www.structlog.org/en/stable/standard-library.html
For powerful log channeling and notifications to Telegram, Slack, etc. there is this Python package:
https://pypi.org/project/notifiers/
I also created a logging handler for Discord, where one can get notifications on errors to a private channel:
https://github.com/tradingstrategy-ai/python-logging-discord...