PHP Bug #18556: Setting locale to 'tr_TR' lowercases class names

Mithrandir | 13 years ago

I think this was a good explanation:

"No, the problem arises because lowercase i (in most languages) and uppercase I (in most languages) are not actually considered to be the lower/upper variants of the same letter in Turkish. In Turkish, the undotted ı is the lowercase of I, and the dotted İ is the uppercase of i. If you have a class named Image, it will break if the locale is changed to Turkish, because class_exists() uses zend_str_tolower() and lowercases every class name, since class names are supposed to be case-insensitive. Someone else above explained it very well:

'class_exists() uses zend_str_tolower(). zend_str_tolower() uses zend_tolower(). zend_tolower() uses _tolower_l() on Windows and tolower() on other OSes. _tolower_l() is not locale-aware. tolower() is LC_CTYPE-aware.'"
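To make the quoted call chain concrete, here is a small Python model of it. It is not PHP's C source: `tolower_c`, `tolower_tr` and `str_tolower` are hypothetical stand-ins for `tolower()` in the "C" locale, `tolower()` under an assumed Turkish LC_CTYPE, and a character-by-character loop of the kind the comment attributes to `zend_str_tolower()`. The Turkish table is an assumption that follows Turkish orthography (I lowercases to dotless ı).

```python
# Hypothetical model of the call chain, not PHP's actual implementation.

def tolower_c(ch: str) -> str:
    """Model of tolower() in the "C" locale: only ASCII A-Z change."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + 32)
    return ch


def tolower_tr(ch: str) -> str:
    """Assumed model of tolower() under a Turkish LC_CTYPE."""
    if ch == "I":
        return "\u0131"  # I lowercases to dotless small ı
    if ch == "\u0130":
        return "i"       # dotted capital İ lowercases to plain i
    return tolower_c(ch)


def str_tolower(name: str, tolower) -> str:
    """Lowercase an identifier one character at a time with the given tolower."""
    return "".join(tolower(ch) for ch in name)


# The same class name produces two different lookup keys.
key_c = str_tolower("Image", tolower_c)    # "image"
key_tr = str_tolower("Image", tolower_tr)  # "\u0131mage", i.e. "ımage"
```

Any table that stores one key and looks up the other will report that `Image` does not exist.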

Edit: Someone else later said the following (I'm wondering if it's true):

"In practice, this can't be fixed, mainly because there's no way to know whether 'I' is the uppercase of 'i' or 'ı', since Turkish 'I' has no separate place in code tables. The same holds for 'i' (there's no way to know whether it's the lowercase of 'I' or 'İ'). I said it 2 years ago and will say it again: PHP should provide a way to turn off case-insensitive function/class name lookup. No good programmer uses this BASIC-style language feature, since identifiers are case-sensitive in all real languages, like Python, Ruby, C#, and Java."
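The ambiguity described in that quote can be checked with Python's built-in, locale-independent Unicode case mapping. `possible_lowercase` is a hypothetical helper for illustration; the only library calls are `str.upper()` and `str.lower()`. Both plain i and dotless ı uppercase to the same code point, so a function that sees only I cannot know which lowercase letter was meant without also knowing the language.

```python
def possible_lowercase(upper: str, letters: str) -> list:
    """Hypothetical helper: every letter in `letters` whose uppercase is `upper`."""
    return [ch for ch in letters if ch.upper() == upper]


# Plain i (U+0069) and Turkish dotless ı (U+0131) both uppercase to I (U+0049),
# so I alone does not say which lowercase letter it came from.
candidates = possible_lowercase("I", "i\u0131")   # ["i", "\u0131"]

# Going the other way, Unicode's default (non-Turkish) rule lowercases
# İ (U+0130) to "i" followed by COMBINING DOT ABOVE (U+0307), not to plain "i".
dotted_lower = "\u0130".lower()                   # "i\u0307"
```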

simias | 13 years ago

But why should the locale change the way PHP code is interpreted? Shouldn't LC_ALL be "C" when parsing the code?

Maybe it breaks if you embed Unicode strings or something. What do other languages do?

dools | 13 years ago

If so many people in all of Turkey depend on PHP and on all the code written in it, why doesn't someone in Turkey fix the problem?

Or anywhere for that matter?

There is no "they" in this equation. There is no person who should be held more accountable than you or I for fixing this problem.

The choices are simple:

1) Fix the problem

2) Find a work around

3) Don't use PHP

What's that? There's a lot of free, open source software written in PHP that does just what you need, except for this tiny, trivial thing that should be easy to fix? Well, too bad!

Weigh the cost of fixing it against the cost of rewriting the big, free, open source PHP package you wanted to use in the programming language of your choice, and stop complaining.

simias | 13 years ago

It might not be an easy fix if you're not familiar with PHP's guts. It's the kind of fix that can induce a lot of unexpected regressions.

Not wanting to fix a bug because it's not worth the time or risks breaking backward compatibility is perfectly fine by me. But at least make a decision and say something.

If they don't plan on fixing it they should say something like "We believe this is a minor bug that only concerns a small number of users. In order to fix this we'd need to change X, Y and Z and make sure we don't introduce regressions. If you want to try and do it we'll be glad to review your patches. In the meantime you can use this workaround: [...]".

I hate it when I submit a bug report and it gets ignored. You also built a straw man argument with the "lot of open source software that you wanted to use for free". It's a bug and should be fixed (even if the fix is closing the ticket as "wontfix").

gouranga | 13 years ago

You are the only reasonable voice on the matter I've heard so far. Infinite upvotes from me.

We all know PHP has its shortcomings, but there appears to be a witch hunt going on here.

slurgfest | 13 years ago

The PHP project has ensured that "3) Don't use PHP" is the lowest-cost option. These bugs are not trivial, and there are far too many of them.

It is important that people be fully aware of the technical liability they take on when they adopt PHP for nontrivial projects.

It isn't reasonable to demand that other people fix the huge collection of weird bugs in your project. Particularly when they are not invested in PHP (any more). PHP's bug collection is a strong reason not to invest in PHP (any more). If it is important to you to encourage PHP adoption, then YOU fix the bugs.

I am not wasting my life working around this nonsense because there is no reason why I should have to. There are alternatives which already work correctly.

Don't trade off against the cost of rewriting.

Trade off against the cost of using any of the well-developed alternatives which do not have the same bugs, the same volume of bugs, or the same internal processes which generate and shelter bugs for years on end.

alpb | 13 years ago

This is a huge bug. Believe it or not, many developers in Turkey use the tr_TR locale (which is perfectly normal), and when they begin to use "any" off-the-shelf PHP library/class with an uppercase I in it, it does not work at all. A small example: if APC has a class with an I in its name, it won't work on your tr_TR-configured Windows Server.

PHP is crap. Not even classic ASP had such bugs; it passed the Turkey test (http://www.codinghorror.com/blog/2008/03/whats-wrong-with-tu...) perfectly, and Unicode-supporting languages such as Java and Python don't have such a bug.

PHP is crap. This bug is clearly a WONTFIX; it's been 10 years since it was reported. I remember this bug from when I was 14; thank God I moved on to other languages afterwards.

kalleboo | 13 years ago

If this is such a dealbreaker for developers in Turkey, why have none of them, in the 10 years this bug has been alive, submitted a patch for it? PHP is open source, it relies on code submissions.

edit: not trolling, just curious. What drives people to complain about specific, well-defined open source bugs without making any effort to fix them? I understand hard-to-nail-down issues like user experience, but this shouldn't be that hard to plan out and fix independently.

mikeash | 13 years ago

Every time an article critical of PHP appears, defenders come out of the woodwork. It's a great language, they say. It's no more flawed than any other language. Critics are just biased. It has problems, but other languages have problems too. People build large apps with PHP, so it must be good.

But come on. This language is complete crap. Code spontaneously fails depending on the locale? And the bug has been open for ten years and still is not fixed? And this is only one bizarre and inexplicable bug out of hundreds, maybe thousands, of bizarre and inexplicable bugs in PHP.

This language isn't defensible. If you want to say that it's worth dealing with the flaws due to the ecosystem, fine, fair enough. But don't tell us that PHP is no worse than any other language. It's far worse.

drostie | 13 years ago

Of course, this isn't an article critical of PHP, and I'm sure those defenders would just as readily say, "Every time PHP appears on Hacker News, the PHP-haters come out."

While I tend to agree that I would not use PHP for new projects, I would disagree that it's indefensible. All you need to defend it is, "it's easy." In the sense of, "it's nearby, it's within reach." If it happens to be the language installed on your system, its use is automatically defensible on those grounds alone.

It might not nurture you and love you and cherish you; hell, it may abuse you at times, as any language with idiosyncrasies does. It might even have more idiosyncrasies than other languages do. But those do not make a relationship indefensible -- merely difficult. And in some cases, the difficulty makes the love even more binding -- which is why we still have people who program in low-level languages, for example, even though those have all the more tendency to abuse you for the tiniest mistake you make.

lonnyk | 13 years ago

> Code spontaneously fails depending on the locale?

It doesn't spontaneously fail. The language's functions are case-insensitive, and this is documented [1] [2]. When you change the locale to Turkish, the case mapping of the letters changes, so the lowercased class name changes and no longer works as expected.

So it may not work as expected, but it is documented, and it is not spontaneous.

[1] http://www.php.net/manual/en/functions.user-defined.php

[2] https://gist.github.com/3033533

rimantas | 13 years ago

> Code spontaneously fails depending on the locale?

Code spontaneously hangs when converting 2.2250738585072012e-308? Now count how many languages were affected by that.

yuvadam | 13 years ago

It's kind of hard not to bash PHP for crap like this.

Yes, the PHP ecosystem is friendly, easy, cheap, etc. etc.

But as a programming language per se... Come on, PHP.

j_col | 13 years ago

PHP is a big legacy open source project, worked on by many volunteers whenever they can spare the time, just like any other open source project.

It is wildly successful despite this and many other bugs.

I only wish that the people who spend so much time endlessly attacking PHP and its developers would instead focus some of that energy on helping to improve PHP, but I guess some of us are just negatively charged.

Sad that yet another anti-PHP post is hitting the front page of HN within a few days; let the hating recommence (again)...

samdk | 13 years ago

Responses like this to people who don't like PHP are just as bad as the people constantly and loudly attacking it. Neither accomplishes anything other than building animosity.

Your suggestion that people improve PHP instead of attacking it is naive. PHP is, as you said, a big legacy open source project. As a result of that, it's basically impossible to make the extreme, breaking changes that many people (me included) think would be required to make it a reasonable competitor to the existing options. (And the PHP community is not especially inclined to change. It took years for short array syntax to get added to the language. If something as obviously beneficial as that is going to be hotly debated, making real, breaking changes is impossible.)

Faced with the alternatives of trying to radically change PHP (which is, as I said above, impossible) or to use and improve other languages and frameworks, I think the choice is obvious. It was one thing 5-10 years ago when there weren't necessarily good or mature alternatives, but we have many choices now. In my opinion, it makes very little sense to use something with as much extraordinarily painful legacy baggage as PHP unless you have an exceptionally good reason for doing so.

zokier | 13 years ago

PHP cannot be improved. An "improved PHP" would be a completely different language, and probably a rewrite of the codebase. The difference between "improved PHP" and PHP as it stands today would be at least as great as the difference between Perl and Perl 6.

And if you are going to design a new language (that's what "improved PHP" would be), you have very little to gain in basing your work on PHP. The ecosystem is in the current PHP, and it is as likely to transition to completely different language as it would be to transition to your "improved PHP".

Raticide | 13 years ago

I think the people that attack it would rather see it die off than be "improved".

acdha | 13 years ago

I stopped using PHP years ago because their open-source community is a broken insider network. After wasting a few bug reports repeatedly arguing with certain core developers who took the position that code and documentation not being in sync wasn't a bug, I quit trying.

mikeash | 13 years ago

I'd rather spend my time making good languages better than making awful languages ever so slightly less awful.

gokhan | 13 years ago

That's why, for example, the .NET world has .ToLowerInvariant() and .ToUpperInvariant(), and developers are advised to use them for internal operations. Interpreting or parsing a language is clearly an internal task and shouldn't be affected by locale changes.

TazeTSchnitzel | 13 years ago

Yep. .NET also has a nice set of string comparison classes for dictionary lookups to avoid exactly this kind of thing.
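The .NET approach mentioned above (an invariant, locale-independent fold for internal names) can be sketched in Python as an analogue, not a port of any .NET API. `fold_invariant` and `ClassTable` are hypothetical names; the fold deliberately touches only ASCII A-Z and never consults the locale, which is an assumption about what "internal" identifier comparison should mean here. Python's own `str.lower()` also ignores the locale, but it applies full Unicode rules, so the sketch uses an explicit ASCII table instead.

```python
import string

# Map only ASCII A-Z to a-z; every other character passes through untouched.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def fold_invariant(name: str) -> str:
    """Hypothetical locale-independent lowercase for identifiers (ASCII only)."""
    return name.translate(_ASCII_FOLD)


class ClassTable:
    """Hypothetical case-insensitive class registry keyed by the invariant fold."""

    def __init__(self) -> None:
        self._classes = {}

    def declare(self, name: str, definition: str) -> None:
        self._classes[fold_invariant(name)] = definition

    def lookup(self, name: str):
        """Return the definition, or None if no class folds to the same key."""
        return self._classes.get(fold_invariant(name))


table = ClassTable()
table.declare("Image", "definition of Image")
found_upper = table.lookup("IMAGE")   # "definition of Image"
found_lower = table.lookup("image")   # "definition of Image"
```

Because the fold is fixed, the result is the same whatever the user's language settings are; the trade-off is that non-ASCII identifiers are compared exactly rather than case-insensitively.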

Draiken | 13 years ago

Unfortunately you can't compare .NET to PHP. Ever.

fmavituna | 13 years ago

It's also referred to as the Turkey Test: http://www.moserware.com/2008/02/does-your-code-pass-turkey-...

http://www.codinghorror.com/blog/2008/03/whats-wrong-with-tu...

celalo | 13 years ago

I is not the capital of i in Turkish. Instead, İ is the capital of i, and I is the capital of ı. They are two different letters.
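The two letter pairs described above can be written down as a small, locale-independent case mapping. This is a sketch: `upper_tr` and `lower_tr` are hypothetical helpers that override only the dotted/dotless pairs and fall back to Python's default `str.upper()`/`str.lower()` for everything else. A complete Turkish implementation also has to handle the combining-dot cases listed for Turkish in Unicode's SpecialCasing.txt.

```python
# Turkish pairs that differ from the default Unicode case mappings.
_TR_UPPER = {"i": "\u0130", "\u0131": "I"}   # i -> İ, ı -> I
_TR_LOWER = {"I": "\u0131", "\u0130": "i"}   # I -> ı, İ -> i


def upper_tr(text: str) -> str:
    """Uppercase using the Turkish dotted/dotless pairs, str.upper() otherwise."""
    return "".join(_TR_UPPER.get(ch, ch.upper()) for ch in text)


def lower_tr(text: str) -> str:
    """Lowercase using the Turkish dotted/dotless pairs, str.lower() otherwise."""
    return "".join(_TR_LOWER.get(ch, ch.lower()) for ch in text)


default_upper = "istanbul".upper()     # "ISTANBUL"
turkish_upper = upper_tr("istanbul")   # "\u0130STANBUL", i.e. "İSTANBUL"
round_trip = lower_tr(turkish_upper)   # "istanbul"
```

Note that the default and Turkish results differ for the very first letter, which is exactly the kind of mismatch that breaks a case-insensitive name lookup.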

Raticide | 13 years ago

No other language has this problem. The locale is irrelevant. The class name is just a series of bytes; it shouldn't need to transform the case.

endtime | 13 years ago

And this should affect class names why?

patio11 | 13 years ago

The situation is not helped by the frequent OSS community suggestion: "Just patch Turkish." while mumbling "Bloody non-ASCII ingrates."

robryan | 13 years ago

What was the advantage of case-insensitive class and function names? Sounds to me like something implemented very early on without great reasons and then kept for backwards compatibility. In all my programming in PHP, I have never thought to take advantage of this.

RobAley | 13 years ago

I'm assuming the original reason is lost in the mists of time, but one advantage it has when calling/using external/3rd-party code concerns style conventions. If my convention is to use functionNames but yours is functionnames or FunctionNames, I can still code in my style after include()ing your file. A small advantage, granted.

meepmorp | 13 years ago

> What was the advantage of case insensitive class and function names?

The programmer can be sloppy/lazy and still have things turn out largely as expected. If you're just learning how to program, this makes it a bit easier, since a whole class of possible problems goes away.

viraptor | 13 years ago

I don't really understand what the problem with fixing this is. I would completely agree that making "Info" and "info" class names compatible is "not fixable", but what is the problem with making "Info" work if both the definition and the usage use the same case? The bug says that this is exactly backwards: mixed case works, but same case doesn't.

The only way to make it not work is to first change the case in one locale and then case-insensitive compare it in another locale. Why would this kind of operation ever happen? Any sane situation should "just work":

- in declaration convert to lower-case and save, in usage convert to lower-case and lookup -> has to work

- in declaration save original, in usage search all classes with case-insensitive compare -> has to work

How was that bug ever created in the first place? I get the fact that "I" doesn't match to lower-case "i" in tr_TR, but why does it matter when comparing strings which should be equal? Just be consistent in how both the declarations and usages are converted...
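The two consistent strategies above, and the one inconsistent case that breaks them, can be checked with a toy registry. Everything here is hypothetical: `fold_ascii` models lowercasing in the "C" locale, `fold_turkish` models an assumed Turkish rule (I becomes dotless ı), and the helpers are not PHP's internals. The "mixed rules" case assumes, without confirmation from the thread, that the declared name is folded before the script calls setlocale() and the requested name after it; under that assumption it reproduces the reported symptom that `Info` fails while `info` works.

```python
def fold_ascii(name: str) -> str:
    """Lowercase A-Z only, the way tolower() behaves in the "C" locale."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def fold_turkish(name: str) -> str:
    """Hypothetical Turkish rules: I -> dotless ı, İ -> i, other A-Z as usual."""
    return fold_ascii(name.replace("I", "\u0131").replace("\u0130", "i"))


def declare(registry: dict, name: str, fold) -> None:
    """Strategy 1, declaration side: store the class under its folded name."""
    registry[fold(name)] = name


def is_declared(registry: dict, name: str, fold) -> bool:
    """Strategy 1, usage side: fold the requested name and look it up."""
    return fold(name) in registry


def matches_insensitive(declared: list, wanted: str, fold) -> bool:
    """Strategy 2: keep original spellings and compare folded forms at use time."""
    return any(fold(name) == fold(wanted) for name in declared)


# Consistent folding works, even with the Turkish rules.
same = {}
declare(same, "Info", fold_turkish)
ok_strategy_1 = is_declared(same, "Info", fold_turkish)
ok_strategy_2 = matches_insensitive(["Info"], "Info", fold_turkish)

# Mixed rules: declared under the "C" fold, looked up under the Turkish fold.
mixed = {}
declare(mixed, "Info", fold_ascii)                     # key "info"
same_case = is_declared(mixed, "Info", fold_turkish)   # "\u0131nfo" is not a key
other_case = is_declared(mixed, "info", fold_turkish)  # "info" is a key
```

Both strategies succeed as long as a single set of rules is used on both sides; the failure only appears when the two sides disagree, which matches the point made in the comment.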

robryan | 13 years ago

It is likely that no one who regularly commits to PHP is affected by this issue, and fixing it requires nontrivial changes to the way PHP works.

Granted, given PHP's usual pragmatism, someone should have just hacked something in by now.

cocoflunchy | 13 years ago

I'm looking forward to the tenth birthday of this bug... Only 24 days to go!

dkhenry | 13 years ago

This is my biggest problem with PHP. Aside from poor language construction and the plethora of poorly written code, the core language has lots of problems. When upgrading to PHP 5.4.3, I found six or seven show-stopper bugs in PHP and some of its extensions (one of which has never worked). I am still waiting on the fix for one of them. https://bugs.php.net/bug.php?id=62302

jister | 13 years ago

If this is a known bug and it's been there for 10 years, then why the hell did the developer STILL choose to use PHP in the first place?

Draiken | 13 years ago

Unfortunately on legacy systems someone else chose PHP for him a long time ago... Poor developer that has to deal with this stuff. Been there.

kitsune_ | 13 years ago

Words fail me.

dasil003 | 13 years ago

I know it seems insane, but Turkish capitalization is not fun to work with as a programmer. When they Latinized the alphabet 100 years ago or so, they were short on vowels, so it must have seemed pretty clever and convenient to make i and I separate letters, with İ and ı as their respective case pairs. From a Western programmer's perspective, though, it's one of the worst Unicode special cases, owing to its combination of unexpectedness and commonness.

Just as an example, text-transform: uppercase was broken for Turkish in all major browsers until, I believe, Firefox finally fixed it late last year, after having a bug open for nearly a decade.

ebiester | 13 years ago

It's funny; the first thing I thought was, "Someone was having trouble with the Turkish I and tried a hackaround, and now it's unfixable."

I blame Atatürk. If I had a time machine, I'd skip killing Hitler and travel back to the language reform time. "Do you know how much trouble this is going to cause us? Reuse the X, make one a dotted e. I don't care, this is going to fuck everything up!"
