
The “Chad” bug

220 points| mrb | 10 years ago |plus.google.com | reply

153 comments

[+] grardb|10 years ago|reply
This is remarkable. I always find it interesting when bugs like this occur.

It reminds me of a hackathon I attended where a food ordering startup (I forget the name, but they were chosen to feed us dinner that night) had a similar bug, which baffled me beyond belief. Without going into crazy detail about my password, it typically follows a certain pattern but is never the same across websites. For some reason, the website kept saying my password was invalid. It met all the password requirements that the website asked for (length, capital letter, etc.).

I forget the exact details, but it ended up being the exact location of a capital letter, the location of a number, or some combination of both. I could never figure out how a bug like that could even be coded up. My best guess is that it was some poorly-formed regex.

> Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

[+] Animats|10 years ago|reply
"Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems."

Yes. My favorite was a Coyote Point load balancer bug. If the last character of the HTTP options is "m", the connection will not get past the load balancer.[1] I found this because a web crawler was having trouble with one site. Fortunately, I knew someone with their own Coyote Point load balancer, and was able to establish that the connection went into the load balancer and never came out.

The load balancer has a big file of rules which contain regular expressions. Somewhere, I think there's a "\m" where they meant "\n". Reporting this to the vendor, along with a Python program to demonstrate the problem, was of course futile; they suggested "upgrading the software". I demonstrated that the bug existed on their own load balancer on their own site. I finally added a completely useless field to the HTTP header so that the last character was not "m".

[1] https://www.webmasterworld.com/webmaster_hardware/3312997.ht...

[+] 0xcde4c3db|10 years ago|reply
Besides the usual regex aches and pains, the grammar for email addresses is far more complex than most people realize. According to a highly-voted Stack Overflow answer [1], the current RFC-specified grammar for addresses can't even be matched with regex alone. Combining the edge cases of the grammar with (say) Unicode normalization sounds like a recipe for hours of fun.

[1] https://stackoverflow.com/questions/201323/using-a-regular-e...

[+] annnnd|10 years ago|reply
> Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Great quote. However, I think regexes got a bad reputation just because of the way people use them. In essence they are a pretty reliable way of parsing because the parsing engine is well tested. But the expression should be kept as simple as possible, and the developer should avoid using any nonstandard / nonexplicit extensions. I even avoid using \w because, well, what IS a word character? I am sure it is defined somewhere... but I'll always use the explicit form (like "[a-zA-Z]" when I want ASCII chars) instead.
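To make the \w point concrete, here is a minimal sketch in Python 3, where \w in a str pattern is Unicode-aware unless re.ASCII is passed. The sample strings are made up for illustration, and other regex engines may define \w differently.

```python
import re

# In Python 3, \w in a str pattern matches letters from any script,
# digits, and the underscore, not just ASCII letters.
unicode_word = re.compile(r"\w+")
ascii_word = re.compile(r"\w+", re.ASCII)  # \w limited to [a-zA-Z0-9_]
explicit = re.compile(r"[a-zA-Z]+")        # the explicit form from the comment

# Illustrative inputs: plain ASCII, ASCII with digits and underscore,
# Latin with a diaeresis, and Cyrillic.
samples = ["chad", "chad_99", "ch\u00e4d", "\u0447\u0430\u0434"]

for s in samples:
    print(
        repr(s),
        bool(unicode_word.fullmatch(s)),
        bool(ascii_word.fullmatch(s)),
        bool(explicit.fullmatch(s)),
    )
```

The three patterns only agree on the first sample, which is the ambiguity the explicit character class avoids.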

Anyway, if you use the form as used in the regex puzzle [0], you'll be fine. As long as you use regex only for what it was meant for, of course [1]...

[0] https://news.ycombinator.com/item?id=10787509

[1] http://blog.codinghorror.com/parsing-html-the-cthulhu-way/

[+] cookiecaper|10 years ago|reply
I find a lot of websites will eat passwords that contain special characters. I don't mean that they'll tell you it doesn't match the password policy, I mean that they'll accept the password and then tell you the password is wrong when you come back to sign in. I eventually had to teach my password generator to use only a few usually-properly-handled special characters when generating to avoid the hassle of having to reset the password every time. The same thing is often true of long passwords -- websites will accept the password at the UI level, but it probably gets truncated somewhere in the processing, and you don't know which character you got cut off at, so you have to reset to something shorter.
[+] raverbashing|10 years ago|reply
From what I've seen of "regular people" writing regular expressions, they don't seem to have the slightest clue how to do it

And then they put it into the program without testing it properly

So, sorry, the issue is not regexes, but people just going at it in a "trial and error" fashion (and sometimes just trial)

[+] akerl_|10 years ago|reply
It should be common knowledge at this point, but just in case:

If you're doing regex or any other text manipulation on user input when you ask them to set a password, you're doing it wrong.
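A minimal sketch of that advice, assuming a PBKDF2-based scheme from the Python standard library. The function names, the iteration count, and the sample password are illustrative assumptions, not any particular site's implementation. The only transformation applied to the input is UTF-8 encoding.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed value for the sketch; tune for real use


def hash_password(password: str, salt: bytes) -> bytes:
    # Encode and hash exactly what the user typed: no strip(), no lower(),
    # no truncation, no regex cleanup.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    # Constant-time comparison of the recomputed hash with the stored one.
    return hmac.compare_digest(hash_password(password, salt), expected)


salt = os.urandom(16)
typed = " Tr1cky;pass'word\u00e9 "  # spaces, quote, semicolon, non-ASCII
stored = hash_password(typed, salt)

print(verify_password(typed, salt, stored))          # same input verifies
print(verify_password(typed.strip(), salt, stored))  # a "cleaned" input does not
```

Because both the set-password and login paths call the same function on the raw input, there is no place for the two paths to disagree about special characters or length.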

[+] ifdefdebug|10 years ago|reply
> Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Yeah sure. I think I heard that quote before... just about a million times?

People expect regex to be an easy-to-use tool. Well it's not, and it's a foot gun if you don't take your time to learn it right. But no, people hack up some expressions, hit their feet and blame... the tool of course, not themselves.

Just learn it right, it's a great tool if you know how to make it work for you :)

[+] noonespecial|10 years ago|reply
Regexes in Perl 5 are what introduced me to test automation. Hard.
[+] raldi|10 years ago|reply
I bet there's a hashtable involved somewhere, and Chad's address just happens to hash to, like, 0x00000, and it turns out when that happens, there's a bug.

As a workaround, I bet you can use CHAD@... or chad+blah@...

[+] mrb|10 years ago|reply
A hashtable bug seems possible.

(Hangouts Dialer still does not see him if saved as CHAD@. It sees him when saved as chad+bla@ but it's annoying because then his email is wrong in my contact list as his email provider does not support + aliases.)

[+] edent|10 years ago|reply
It's a pity there's no way to report bugs like this to Google.

The only way I've found of getting anything resolved is to forward issues to a friend inside the company, or hope that you can write a blog post which gets enough attention.

I get that filtering and testing millions of random bug reports from all corners of the Internet is hard - but it's a problem which Google desperately needs to solve if it wants to retain the trust of its users.

[+] asuffield|10 years ago|reply
(Tedious disclaimer: not speaking for anybody else, my opinion only, etc. I'm an SRE at Google.)

> It's a pity there's no way to report bugs like this to Google.

This is a popular myth.

General instructions are here: https://www.google.com/tools/feedback/intl/en/

In this particular case, it's an android app, so what you do is tap on the hamburger menu, hit "help and feedback", then "send feedback".

[+] chris_wot|10 years ago|reply
LibreOffice, RedHat, Debian, Canonical and Mozilla can do it. This is not a particularly hard problem to solve.
[+] packetized|10 years ago|reply
I wonder if this is related to i18n or country lookup. Chad is the only semi-common English-language name that's also a country name, that I can think of.
[+] benplumley|10 years ago|reply
Jordan, Georgia. I feel like if this were the cause then the bug would be a lot more common.

My guess is the dialler hashes some parts of the contact to get a UUID, but for this contact it happens to be outside the range the dialler can look at - perhaps off-by-one, where the dialler looks for UUIDs of 1 and above and this happens to hash to 0.
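The kind of bug guessed at here can be sketched with a toy example. Everything below is hypothetical: the hash function, the index class, and the bucket count are invented for illustration and say nothing about how the real dialler works. The toy hash happens to map "chad" to 0, and the lookup treats 0 as "no ID".

```python
def toy_hash(key: str, buckets: int = 16) -> int:
    # Deliberately simplistic: sum of code points modulo the bucket count.
    return sum(ord(ch) for ch in key) % buckets


class BuggyIndex:
    """Toy index keyed by toy_hash; collisions are ignored for brevity."""

    def __init__(self) -> None:
        self._by_slot = {}

    def add(self, key: str) -> None:
        self._by_slot[toy_hash(key)] = key

    def find(self, key: str):
        slot = toy_hash(key)
        # BUG: 0 is falsy, so a perfectly valid slot 0 is treated as "missing".
        if not slot:
            return None
        stored = self._by_slot.get(slot)
        return stored if stored == key else None


index = BuggyIndex()
for address in ("chad", "mrb"):
    index.add(address)

print(toy_hash("chad"), index.find("chad"))  # slot 0, lookup fails
print(toy_hash("mrb"), index.find("mrb"))    # non-zero slot, lookup works
```

The contact is stored correctly and only the lookup path loses it, which is why a bug of this shape would affect a handful of unlucky keys rather than every contact.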

[+] stevoski|10 years ago|reply
My niece is called "Ireland".

Continents too: I met a girl called "Africa", and "Asia" is certainly used as a name.

However I don't ever expect to meet someone called "Democratic People's Republic of Korea"

[+] NLips|10 years ago|reply
There are also:

  India
  Georgia
  Jordan
I'd guess they are all as common or more common than Chad.
[+] ryporter|10 years ago|reply
Somewhat similarly, I encountered a possible bug in Google Docs many years back. I was reorganizing my documents, and I temporarily changed one of the names to "delete". Poof -- I could no longer find it anywhere (or even search for words that I knew were in it). I forget how I got back to it (maybe via my browser history), but I changed the name a bit, and then the document was "found".

This could have simply been a race condition unrelated to the filename, but it's much more amusing to speculate that it was due to a hack introduced during development. I now regret not trying to reproduce it, but I was pretty frustrated after I found my document again. I did contact support, but didn't hear anything back.

[+] nbakshi|10 years ago|reply
This reminds me of my favorite name for testing: "McNulla". I have seen quite a few web forms with a regex that removes any NULL string, so they would not accept this name because it contains the string "Null".
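A minimal sketch of how such a filter mangles the name. The sanitizer below is a hypothetical reconstruction of the behavior described, not code from any real form.

```python
import re


def strip_null_naive(value: str) -> str:
    # Hypothetical over-eager sanitizer: deletes "null" anywhere in the
    # input, ignoring case.
    return re.sub(r"null", "", value, flags=re.IGNORECASE)


for name in ("McNulla", "Null", "Chad"):
    print(repr(name), "->", repr(strip_null_naive(name)))
```

"McNulla" comes out as "Mca" and the surname "Null" becomes an empty string. Representing a missing value with a real null (None in Python) instead of a magic string avoids the need for this kind of filter.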
[+] EvanAnderson|10 years ago|reply
I knew a family w/ the last name of "Null" from high school. I wonder, from time to time, if they have suboptimal experiences using the 'net.
[+] PhasmaFelis|10 years ago|reply
> I exported the contacts and looked at the raw Google CSV data. One of the 2 problematic contacts had a whitespace character at the end of its phone number. I removed it. Bingo, Dialer can now find it!

This is kind of horrifying. Google being tripped up by trailing whitespace?
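A small sketch of how a trailing space can hide a record from an exact-match lookup. The CSV below is made up; its column names and phone numbers are assumptions and do not reflect the real Google contacts export format.

```python
import csv
import io

# Invented export: note the trailing space after the first phone number.
raw = "Name,Phone\nChad,+15550100 \nOlly,+15550101\n"
rows = list(csv.DictReader(io.StringIO(raw)))


def find_exact(rows, phone):
    # Exact string comparison: the stray space makes the match fail.
    return [r["Name"] for r in rows if r["Phone"] == phone]


def find_trimmed(rows, phone):
    # Compare after trimming surrounding whitespace on both sides.
    return [r["Name"] for r in rows if r["Phone"].strip() == phone.strip()]


print(find_exact(rows, "+15550100"))
print(find_trimmed(rows, "+15550100"))
```

The csv module keeps the trailing space as part of the field, so the first lookup finds nothing while the second finds the contact.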

[+] timberburn|10 years ago|reply
When I was setting up an account on Comcast's website, I was consistently getting a nondescript internal server error when submitting the form.

Took me quite a while and many failed attempts to find that Comcast will throw an error when your requested username contains "comcast".

[+] josegonzalez|10 years ago|reply
Same for ConEd.

Moneygram will freeze payments with the word "moneygram" in the associated email. Which is great for those of us that use catch-all emails and use the email address to discern what to do with an email...

[+] rincebrain|10 years ago|reply
I've had a number of fun failure modes like that.

My current favorite two include when the change password form permitted longer passwords than the login page, and one where the change password form happily allowed special characters, but if there was e.g. a semicolon in the password, submitting it from the login page would throw a SQL error.
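The SQL error on login is the classic symptom of building a query by string concatenation. Below is a sketch using Python's sqlite3 module; the table layout and function names are invented for the example, and a real system would store a password hash rather than the raw value used here to keep the demonstration short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; storing the raw secret is for demonstration only.
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")

secret = "pa;ss'word"  # contains a semicolon and a single quote
conn.execute("INSERT INTO users VALUES (?, ?)", ("chad", secret))


def lookup_concatenated(conn, name, secret):
    # Unsafe: user input is pasted into the SQL text, so the quote inside
    # the secret ends the string literal early.
    query = f"SELECT name FROM users WHERE name = '{name}' AND secret = '{secret}'"
    return conn.execute(query).fetchall()


def lookup_parameterized(conn, name, secret):
    # Safe: values are bound as parameters and never parsed as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ? AND secret = ?", (name, secret)
    ).fetchall()


try:
    lookup_concatenated(conn, "chad", secret)
    concatenated_error = None
except sqlite3.Error as exc:
    concatenated_error = exc

print("concatenated:", type(concatenated_error).__name__)
print("parameterized:", lookup_parameterized(conn, "chad", secret))
```

If the change-password form uses bound parameters and the login page concatenates strings, the site accepts a password it can never verify, which matches the failure described.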

[+] hidroto|10 years ago|reply
I wonder if that is to stop people from using names like comcastSucks or worse.
[+] incepted|10 years ago|reply
> Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

jwz's wit was a lot of fun in the early 2000s but that quote is too often used wrong.

This quote is not about regexps, it's about using the wrong tools for the job. Using it without context makes it sound dumb. "Well, what if regexp is exactly the right solution for that problem?".

[+] lmm|10 years ago|reply
A regexp is almost always the wrong solution. It's a way of representing a finite state machine that obscures the states, which are the only valuable part of the state machine abstraction (they're inherently incomprehensible otherwise). And most implementations these days have random extensions, meaning you have all the performance and safety issues of a turing-complete programming language - but a much worse UX. They may have made sense in the days of ed and the teletype, when a terse incomprehensible expression was better than a slightly longer readable one, but they don't now.
[+] abhishekash|10 years ago|reply
Do people with other Android versions or phone makes face the same issue when using this email address?
[+] frik|10 years ago|reply
Many sites don't support the plus in email addresses ("+" = comment, supported e.g. by GMail). Not so funny if the registration process works but the login or password reset features are broken.

Example: a site let me register and login with the plus. But resetting the password was hard, I had to escape the plus to get it working.
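One plausible reason escaping helps, sketched with Python's urllib.parse: in form-encoded query strings a literal "+" decodes to a space, so an unescaped address changes in transit. The address below is a made-up example, and this is an assumption about the cause, not a diagnosis of that site.

```python
from urllib.parse import parse_qs, urlencode

email = "chad+blah@example.com"  # illustrative address

# Pasting the address into a query string unescaped: "+" decodes to a space.
unescaped = parse_qs("email=" + email)["email"][0]

# Letting the library encode it: "+" becomes %2B and survives the round trip.
escaped_query = urlencode({"email": email})
round_tripped = parse_qs(escaped_query)["email"][0]

print(repr(unescaped))
print(escaped_query)
print(repr(round_tripped))
```

A reset link built by hand would look up "chad blah@example.com", which no account matches, while registration and login via a properly encoded form work fine.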

[+] db48x|10 years ago|reply
Is that "chad@" or "chаd@" (homoglyphs)?
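A quick way to check for homoglyphs, sketched with Python's unicodedata module. The second string below replaces the Latin "a" with the Cyrillic "а" (U+0430); the helper name is an invention for the example.

```python
import unicodedata

latin = "chad"
mixed = "ch\u0430d"  # third character is CYRILLIC SMALL LETTER A


def non_ascii_positions(text: str):
    # List (index, Unicode name) for every character outside ASCII.
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ord(ch) > 127]


print(latin == mixed)
print(non_ascii_positions(latin))
print(non_ascii_positions(mixed))
```

The two strings render identically in most fonts but compare unequal, and the helper reports exactly which character is the impostor.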
[+] mrb|10 years ago|reply
No homoglyphs.
[+] Gravityloss|10 years ago|reply
So, when computing power increases, we just add useless parsing at every level of software, decreasing performance and causing bugs like these.
[+] GotAnyMegadeth|10 years ago|reply
I also see this with one of my brothers' names which is Olly <surname>.
[+] tyingq|10 years ago|reply
A potential workaround...quote the local part:

"chad"@example.com

It seems to be supported by Exchange, Gmail, and a few other MTAs I tested, and gets routed to the right place.

[+] Eyas|10 years ago|reply
I wonder if this is the only case where the name/first name of the contact is exactly the same as the recipient's address.
[+] tmaly|10 years ago|reply
Regex bugs are bad; I have been bitten by one before. It's pure technical debt.