We've been one of 666 repos, and I'm not too happy of having our repo used as advertising space. Some thoughts:
- I'm happy to receive fix-a-typo PRs from human users. In this case the other side demonstrated that they care by putting in a bit of manual effort, and a small PR often paves the way towards larger contributions. I also know that open source beginners get really excited about their first small contributions, and I'm honestly happy to support that.
- In contrast, the marginal effort for bot PRs is ~0. It's very easy to generate a small amount of work for a lot of people, and the nice side effect is that the bot's platform is advertised everywhere. As a maintainer, I have never given consent to this and I have no choice to opt out.
We are very happy users of some GitHub bots, but I feel it needs to be an active adoption decision by the maintainer. If you want to pitch me your service you may send me an unsolicited email, but don't use our public space to advertise your product without asking.
Edit: I don't want to be too harsh to OP here - at least they pointed out a small but valid issue in our case. I very much appreciate their apology at https://news.ycombinator.com/item?id=31210245
I just... think you should reconsider your stance on this. If you made a mistake in a public repo and someone else caught it (via scan of your repo or otherwise), it's a pretty bad look to be anything but grateful at that point, PR benefits for the bot aside.
I understand the sentiment but you should be judging the PR, not the source. Ask yourself: would you have happily accepted the same PR that the bot sent if it came from a human?
By all means, I am not against having bots identify themselves properly, my point is that "effort from bot PR is ~0", "it advertises their platform" are simply not the right reasons to judge this situation by.
> I have never given consent to this and I have no choice to opt out.
You have a public repository on GitHub. You are free to switch it to private, but otherwise this absolutely illogical. No one needs your consent to submit PRs to and highlight a public repository.
This is equivalent to having a website and then getting angry about linking to it, or putting your artwork up for public viewing and then getting angry at someone pointing out a small tear in the fabric.
Actually, now that I think of it, it’s better comparable to someone bringing in a handheld scanner with a company name on it, scanning the artwork and then pointing out the tear.
Which is still totally fine. You’ve given implied consent by making it available to the public. You have decided to make it possible for the public to view it, criticize it and link to it.
Quote Wikipedia,
> Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, *study*, change, and distribute the software and its source code to anyone *and for any purpose*
I feel the same way about those bots that tell you about insignificant security vulnerabilities in some project you abandoned. It's basically spam.
That said, this does seem like it is a bit more useful. As long as they actually read the changes and make sure they aren't false positives. Which I'm guessing they didn't do for 666 repos.
> > We may be looking too deep into this but it seems like many developers think when string concatenation occurs it’s enough to declare the first string as an f-string and the other strings are turned into f-strings by osmosis. It doesn’t. We’re not suggesting this is the case for all developers that accidentally did this error, but interesting nonetheless.
I highly doubt that people believed that f-strings worked this way. Far more likely is that, for example, the expression started as one line, then got split onto two, or some such similar scenario.
This may be also caused by confusing syntax highlighting in some editors, for example in VSCode [1]
The variable in the second string gets highlighted (with slightly different color, but still) because it would still work with `str.format()`. GitHub doesn't seem to do this.
I don't mind it's a bot, and I really appreciate that apparently a human made the final review before sending the PR. But I don't like that the tittle of the commit is:
I expect a more neutral title for a commit, something like
> Fix fstring in <name-of-file>
Each maintainer/project has their own (weird) rules about titles, and if any other files must log the changes, and regression test, and whatever they like. But I think no maintainer/project expect to see the name of the author in the commit tile.
The article links to some docs for the logging module here: https://docs.python.org/3/howto/logging.html#optimization asserting that f-strings are less optimal but the docs do not say that they do not optimize our the expression evaluation of f-strings: only that the logging module tried to perform evaluation as late as possible: where is the f-string described as suboptimal?
Relatedly the logging optimization suggests setting: raiseExceptions to false for production logging code: where is that set? On the logger, handler or something else?
I was also confused by the expression evaluation thing. Reading between the lines, it seems like
logger.debug("hello %s", foo)
may be better than
logger.debug(f"hello {foo}")
in the case when loglevel is higher than debug. In the first version, the final string does not have to be computed, while in the second version, we might construct the string and then do nothing since the loglevel is excluded.
I guess it's implicit in that f-strings, as arguments, will be evaluated before the logging function can even run whereas `debug("...", heavy_obj)` will avoid a potentially expensive `str(heavy_obj)` (or whatever the string conversion warrants.)
As for raiseExceptions, I'm not sure it's for optimization. It looks like an old sanity check for bad logging configurations.
The "Use logging's interpolation" warning has always annoyed me. If logging is in your hot path, that might be an issue. Me f-string interpolating some info logs that run once for convenience is not.
I find it ironic that the article points out that relying on error from humans to find errors is something of a hit or miss proposition and suggests that automating error finding is an appropriate course instead of making it less likely to make the error in the first place.
For example, I wonder how many errors would have been found if the definition of a format string was the default? That is, how many times would people have written something like "hello {previously-defined-variable}" and not meant to substitute the value of that previously defined variable at runtime?
I don't think this makes sense. Plain strings and format strings are not interchangeable, and using one where the other was meant is probably a bug.
Would you expect that a user input like "{secret} please" is interpolated? If so, we hopefully agree that this would blow major security holes into any python script processing untrusted user input. And if not... Why not?
To be fair, your suggestion might make for a more resilient default, but it's also a great way to leak data and add overhead for the default case. There are tradeoffs.
>...and suggests that automating error finding is an appropriate course instead of making it less likely to make the error in the first place.
You can't fix the syntax and standard lib of the language. It is what it is. Similarly, how many bugs would you prevent if Python had compiler support to catch those types of syntax (and type) errors.
This is how bash works. Any string with a $ in it will be interpolated unless you double escape it. Also depending on if you use double or single quoted strings.
Spaces as list separator could also fall into this philosophical question of what makes most sense as string separators. Some times it is super convenient, until you have actual spaces in your string and it becomes a pita.
See also the yaml Norway problem for what happens when implicit goes over explicit.
It generates about the same amount of bugs, if not more, and would also end up with a code-review-doctor suggesting you to use /$ over $. In the end, regardless of syntax, a human always have to make the final call on whether interpolation is wanted or not.
I was actually surprised to read that people would ignore or be annoyed by a bot raising a valid PR that can be easily merged after a quick glance. What would be the reason for that?
Automated checking of potential bugs in f-strings is hard. There are lots of false positives. You can see some discussion around this kind of rule in pylint [0]. At the end of the day, the choice to run automated linting tools on a repo is up to the maintainers. Autogenerating PRs like this is incredibly noisy and comes off to me as a blatant advertisement for their "code review doctor" product.
In reactions they conveniently left out "false positives we still hadn't weeded out". On top of that it can be annoying to have bots making trivial PRs in their own format when you've got a well defined process for it. Lastly it was basically spamming an ad link for the service at the end of the PR comment - even if the other issues didn't come up it's not always well received to do that.
Looking at 1 bot it doesn't sound bad, when you have everyones bot doing this kind of stuff it can quickly become more of a nuisance than a help.
You're assuming that the PR is valid, but a maintainer can't make that assumption. They have to do the thankless work to figure out the context and handle the fallout if they get it wrong. Let's look at who wins:
* Small benefit to bot creator
* Tiny benefit to project
* Modest cost to maintainer
Waves of low-effort resume-padding commits are already a thing. Not a big problem, but bots clearly have the potential to multiply the small problem into a big problem.
I'm still open to the idea that bots could be a net win, because most projects really do have heaps of small simple mistakes lying around. I'm sympathetic to the maintainers though. They always seem to get the short end of the stick.
I would never tolerate ads in my commit history. That's ridiculous.
It's basically using open source repos as an advertising platform for their static-analysis bot.
If they want to offer services, they can reach out to the maintainers. This is different than a human opening a valid PR on a OS repo since the commit message includes an ad and now they're advertising on HN.
I expect to see the entire gamut of possible reactions with a sufficient number of bot PRs. But in looking at 10 of them at random, I didn't find a single "negative response."
(I don't think ignoring it is invalid or wrong by any means, given there's so many reasons one might not engage in a timely manner, or at all, in the issues section or PRs. I don't monitor my repos issues because I just don't feel interested in supporting my code. Feel free to fork or ignore!)
> After creating 69 pull requests the reaction ranged from:
> Annoyance that a bot with no context on their codebase was raising pull requests. A few accepted the bugs were simple enough for a bot to fix and merged the pull request, but a few closed the pull requests and issues without fixing. Fair enough. Open source developers are busy people and are not being paid to interact with what they think it just a bot. We’re not entitled to their limited time.
> Neutral silence. They just merged the PR.
> Gratitude. “thanks” or “good bot”.
I appreciate their self awareness about responses from maintainers.
I like python although I don't use it too often. Would it be unfairly critical of me to say that this is the outcome of a bad design choice? Ideally languages should be designed in a way that a bug like this which is so widespread and easy to create, should be caught via some mechanism, either linting or some part of the process.
The f-strings are a recent (may be not so recent now) addition to the language, so all the errors stem from it being "new" where people's reflexes / carefulness hasn't adjusted to them yet.
I think in addition to the suggestion for linters, updating IDE/editors to incorporate them would help. Syntax highlighting is the primary reason not terminating strings isn't that common of an error anymore, coloring it differently than a normal string might help (or may be it would make things ugly, I don't know).
Compare to Typescript having a very small but significant difference. They use backticks instead of an f-prefix.
I think this minor difference eliminates all confusion of whether concatenating f strings and normal strings propagate. Same when you split an existing f-string in two because it became too long, there is no risk to forget a backtick on the second pet, in the same way you would with f-prefix, because if you do the closing tick doesn’t match the opening.
Linting the existing f-strings is, as shown by this bot, unfortunately very difficult.
To play devils advocate, that's a matter of perspective. There's nothing special about "{text}". It's just characters. One could claim that the default analysis should be "that's fine", with the option to make it special available with the "f". ;)
But, perhaps you're right, and the total number of bugs would be reduced with f-strings, but that would require making them default back in python 1.0.
The linter I use has warnings for things-that-look-like-f-strings on by default. But, some of my projects have f-string like text, so special text to tell the linter to ignore them are required all over the place.
Most developers will require an arsenal of static analysis tools to achieve maximum productivity. Linters are an example of such a tool, but they don't exist as part of the language spec itself, AFAIK.
Maybe catenation of an fString and a string should yield an fString by type promotion? String is morally "any" so it feels to me like a contextual narrowing of type.
mhils|3 years ago
- I'm happy to receive fix-a-typo PRs from human users. In this case the other side demonstrated that they care by putting in a bit of manual effort, and a small PR often paves the way towards larger contributions. I also know that open source beginners get really excited about their first small contributions, and I'm honestly happy to support that.
- In contrast, the marginal effort for bot PRs is ~0. It's very easy to generate a small amount of work for a lot of people, and the nice side effect is that the bot's platform is advertised everywhere. As a maintainer, I have never given consent to this and I have no choice to opt out.
We are very happy users of some GitHub bots, but I feel it needs to be an active adoption decision by the maintainer. If you want to pitch me your service you may send me an unsolicited email, but don't use our public space to advertise your product without asking.
Edit: I don't want to be too harsh to OP here - at least they pointed out a small but valid issue in our case. I very much appreciate their apology at https://news.ycombinator.com/item?id=31210245
TameAntelope|3 years ago
omegalulw|3 years ago
By all means, I am not against having bots identify themselves properly, my point is that "effort from bot PR is ~0", "it advertises their platform" are simply not the right reasons to judge this situation by.
Mo3|3 years ago
You have a public repository on GitHub. You are free to switch it to private, but otherwise this absolutely illogical. No one needs your consent to submit PRs to and highlight a public repository.
This is equivalent to having a website and then getting angry about linking to it, or putting your artwork up for public viewing and then getting angry at someone pointing out a small tear in the fabric.
Actually, now that I think of it, it’s better comparable to someone bringing in a handheld scanner with a company name on it, scanning the artwork and then pointing out the tear.
Which is still totally fine. You’ve given implied consent by making it available to the public. You have decided to make it possible for the public to view it, criticize it and link to it.
Quote Wikipedia,
> Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, *study*, change, and distribute the software and its source code to anyone *and for any purpose*
Case closed.
dataflow|3 years ago
tus666|3 years ago
IshKebab|3 years ago
That said, this does seem like it is a bit more useful. As long as they actually read the changes and make sure they aren't false positives. Which I'm guessing they didn't do for 666 repos.
scotty79|3 years ago
Your repo is public so you can't prevent people and bots from looking into it and having opinions on it (even public ones).
HL33tibCe7|3 years ago
I highly doubt that people believed that f-strings worked this way. Far more likely is that, for example, the expression started as one line, then got split onto two, or some such similar scenario.
johnny_castaway|3 years ago
The variable in the second string gets highlighted (with slightly different color, but still) because it would still work with `str.format()`. GitHub doesn't seem to do this.
[1] https://imgur.com/a/9KGWVG0
badrabbit|3 years ago
groestl|3 years ago
jldugger|3 years ago
gus_massa|3 years ago
> Fix issue probably-meant-fstring found at https://codereview.doctor
I expect a more neutral title for a commit, something like
> Fix fstring in <name-of-file>
Each maintainer/project has their own (weird) rules about titles, and if any other files must log the changes, and regression test, and whatever they like. But I think no maintainer/project expect to see the name of the author in the commit tile.
bvinc|3 years ago
defterGoose|3 years ago
InfiniteRand|3 years ago
racl101|3 years ago
saagarjha|3 years ago
memco|3 years ago
Relatedly the logging optimization suggests setting: raiseExceptions to false for production logging code: where is that set? On the logger, handler or something else?
bobbiechen|3 years ago
aib|3 years ago
I guess it's implicit in that f-strings, as arguments, will be evaluated before the logging function can even run whereas `debug("...", heavy_obj)` will avoid a potentially expensive `str(heavy_obj)` (or whatever the string conversion warrants.)
As for raiseExceptions, I'm not sure it's for optimization. It looks like an old sanity check for bad logging configurations.
nimish|3 years ago
Low value junk like this is not helpful.
readthenotes1|3 years ago
For example, I wonder how many errors would have been found if the definition of a format string was the default? That is, how many times would people have written something like "hello {previously-defined-variable}" and not meant to substitute the value of that previously defined variable at runtime?
MauranKilom|3 years ago
Would you expect that a user input like "{secret} please" is interpolated? If so, we hopefully agree that this would blow major security holes into any python script processing untrusted user input. And if not... Why not?
HL33tibCe7|3 years ago
swatcoder|3 years ago
macspoofing|3 years ago
You can't fix the syntax and standard lib of the language. It is what it is. Similarly, how many bugs would you prevent if Python had compiler support to catch those types of syntax (and type) errors.
Too|3 years ago
Spaces as list separator could also fall into this philosophical question of what makes most sense as string separators. Some times it is super convenient, until you have actual spaces in your string and it becomes a pita.
See also the yaml Norway problem for what happens when implicit goes over explicit.
It generates about the same amount of bugs, if not more, and would also end up with a code-review-doctor suggesting you to use /$ over $. In the end, regardless of syntax, a human always have to make the final call on whether interpolation is wanted or not.
boxed|3 years ago
rikatee|3 years ago
dewey|3 years ago
That link seems to be broken: https://github.com/issues?q=is%3Aissue+author%3Acode-review-...
I was actually surprised to read that people would ignore or be annoyed by a bot raising a valid PR that can be easily merged after a quick glance. What would be the reason for that?
TrickardRixx|3 years ago
[0] https://github.com/PyCQA/pylint/issues/5039
zamadatix|3 years ago
Looking at 1 bot it doesn't sound bad, when you have everyones bot doing this kind of stuff it can quickly become more of a nuisance than a help.
jjoonathan|3 years ago
I'm still open to the idea that bots could be a net win, because most projects really do have heaps of small simple mistakes lying around. I'm sympathetic to the maintainers though. They always seem to get the short end of the stick.
cinntaile|3 years ago
That guy was not happy. I do agree that it's basically advertising and that's annoying.
llbeansandrice|3 years ago
It's basically using open source repos as an advertising platform for their static-analysis bot.
If they want to offer services, they can reach out to the maintainers. This is different than a human opening a valid PR on a OS repo since the commit message includes an ad and now they're advertising on HN.
mhils|3 years ago
Forge36|3 years ago
1. It's hard to explain the impact to the application of the current problem. Thus it looks like a theoretical issue
2. Sometimes people rely on the bug for their code to work
3. Surprise work can be poorly received (ie: not the current priority)
brandonbloom|3 years ago
Waterluvian|3 years ago
(I don't think ignoring it is invalid or wrong by any means, given there's so many reasons one might not engage in a timely manner, or at all, in the issues section or PRs. I don't monitor my repos issues because I just don't feel interested in supporting my code. Feel free to fork or ignore!)
VWWHFSfQ|3 years ago
mjs7231|3 years ago
rikatee|3 years ago
f7fg_u-_h|3 years ago
> Annoyance that a bot with no context on their codebase was raising pull requests. A few accepted the bugs were simple enough for a bot to fix and merged the pull request, but a few closed the pull requests and issues without fixing. Fair enough. Open source developers are busy people and are not being paid to interact with what they think it just a bot. We’re not entitled to their limited time.
> Neutral silence. They just merged the PR.
> Gratitude. “thanks” or “good bot”.
I appreciate their self awareness about responses from maintainers.
malcolmgreaves|3 years ago
rikatee|3 years ago
However, Code Review Doctor is more of a "this MIGHT be a problem. have you considered..." rather than "it wrong"
fareesh|3 years ago
noobermin|3 years ago
I think in addition to the suggestion for linters, updating IDE/editors to incorporate them would help. Syntax highlighting is the primary reason not terminating strings isn't that common of an error anymore, coloring it differently than a normal string might help (or may be it would make things ugly, I don't know).
Too|3 years ago
I think this minor difference eliminates all confusion of whether concatenating f strings and normal strings propagate. Same when you split an existing f-string in two because it became too long, there is no risk to forget a backtick on the second pet, in the same way you would with f-prefix, because if you do the closing tick doesn’t match the opening.
Linting the existing f-strings is, as shown by this bot, unfortunately very difficult.
nomel|3 years ago
But, perhaps you're right, and the total number of bugs would be reduced with f-strings, but that would require making them default back in python 1.0.
The linter I use has warnings for things-that-look-like-f-strings on by default. But, some of my projects have f-string like text, so special text to tell the linter to ignore them are required all over the place.
polio|3 years ago
pabs3|3 years ago
zikohh|3 years ago
ggm|3 years ago
carl_dr|3 years ago
Plus it would lead to subtle security vulns and other bugs. A contrived example :
f”{bot} spammed my repo saying” + “touch(‘{path}’) was wrong lol.”
Now my path var has been disclosed.
SnowflakeOnIce|3 years ago
baisq|3 years ago
safwan|3 years ago
hn_saver|3 years ago
[deleted]
dattboii|3 years ago
[deleted]
bayareabadboy|3 years ago
[deleted]