Reasons to Not Parse Localized Strings

OptionOfT|1 year ago

On Windows one can change how a date is rendered, without changing the locale. I need to look up if this is propagated to browsers.

Also, I hate DOB selectors which don't allow me to manually enter the date, and default to today, and don't have a year << arrow. Only month.

Now I need to click the < arrow at least (age - 1) * 12 times.

In general, I wish more websites would use native date / number / dropdown pickers.

Workday is the worst offender here.

toast0|1 year ago

Try clicking the year number and the month name. Those often show a pop up with less clicking required to get to where you want.

1718627440|1 year ago

Is that analogous to setting LC_TIME, per application or temporary?

frizlab|1 year ago

> On Windows one can change how a date is rendered

I think that’s true on all OSes

teddyh|1 year ago

“People whose date formats break my system are weird outliers. They should have had solid, acceptable formats, like 平成10年8月1日.”

(With apologies to patio11.)

RainyDayTmrw|1 year ago

For context, in case anyone needs, that's a common date format in Japan. Aside from using kanji characters, the big surprise to most of the rest of the world is that the largest epoch is specified as a royal era name[1], corresponding to the Japanese monarchy.
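To make the era arithmetic concrete, here is a minimal Python sketch that converts a date written in exactly this form to a Gregorian date. The function name `parse_era_date` and the lookup table are made up for illustration. The sketch assumes the fixed pattern era + year + 年 + month + 月 + day + 日, does not handle 元年 (the usual way to write year 1), and does not check that the date actually falls inside the named era.

```python
import re
from datetime import date

# Gregorian year in which each modern era began (era year 1).
ERA_START_YEAR = {"明治": 1868, "大正": 1912, "昭和": 1926, "平成": 1989, "令和": 2019}

# Hypothetical fixed pattern: <era><year>年<month>月<day>日
PATTERN = re.compile(r"(明治|大正|昭和|平成|令和)(\d+)年(\d+)月(\d+)日")

def parse_era_date(text):
    """Convert an era-based date string to a Gregorian date (sketch only)."""
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not in the expected era format: {text!r}")
    era = match.group(1)
    era_year, month, day = (int(g) for g in match.groups()[1:])
    # Era year 1 is the start year itself, hence the "- 1".
    return date(ERA_START_YEAR[era] + era_year - 1, month, day)

heisei_example = parse_era_date("平成10年8月1日")
print(heisei_example.isoformat())
```

Heisei 1 is 1989, so Heisei 10 maps to 1998. This only works because the input format is known up front, which is rarely true of free-form localized input.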

The quote parallels, and the apology to patio11 refers to, this article[2], which has since become famous on HN. It ends with a similar anecdote from the author's time as an American expatriate in a less populous and less cosmopolitan part of Japan: a clerk remarked that Patrick McKenzie was a troublesome name to have in Japan, and asked why he didn't change it to something convenient and ordinary like Tanaka Taro[3].

This has since become HN folklore.

[1] https://en.wikipedia.org/wiki/Japanese_era_name

[2] https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

[3] https://news.ycombinator.com/item?id=6145768

legulere|1 year ago

Microsoft Excel is the worst offender here. When you're on a locale with , as the decimal separator, it's not able to read CSVs that use . as the decimal separator. It also expects ; instead of , as the field separator.
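When you control the import yourself, one way around this is to state both separators explicitly instead of inheriting them from the machine's locale. A minimal Python sketch, assuming a made-up semicolon-separated export with decimal commas (the sample data and the `read_prices` helper are hypothetical):

```python
import csv
import io
from decimal import Decimal

# Hypothetical export from a spreadsheet in a decimal-comma locale:
# fields separated by ";", decimals written with ",".
SAMPLE = "item;price\nwidget;1,5\ngadget;1004,25\n"

def read_prices(text, delimiter=";", decimal_sep=","):
    """Parse rows with separators stated by the caller, not guessed from the locale."""
    rows = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # Normalize the decimal separator to "." before handing the text to Decimal.
    return {row["item"]: Decimal(row["price"].replace(decimal_sep, ".")) for row in rows}

prices = read_prices(SAMPLE)
print(prices)
```

The point is that the file format is declared by whoever knows it, so the same file reads the same way on every machine.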

Delegating parsing user input is a good idea, but sometimes the input methods you can rely on just don't cut it.

By the way: The international way to express a decimal separator is a (thin non-breaking) space. There's no misunderstanding possible.

LegionMammal978|1 year ago

> By the way: The international way to express a decimal separator is a (thin non-breaking) space. There's no misunderstanding possible.

According to whom? The CGPM recommends thin spaces as thousands separators, and either points or commas as decimal separators. NIST, ISO, etc. generally copy this, sometimes stipulating the decimal separator as one or the other.
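For illustration, a small Python sketch of that convention: digit groups separated by a narrow no-break space (U+202F), with the decimal marker chosen by the caller. The helper name `format_grouped` is made up, and the sketch only groups the integer part.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, used here as the group separator

def format_grouped(value, decimals=3, decimal_sep="."):
    """Format a number with thin-space digit grouping (integer part only)."""
    text = f"{value:,.{decimals}f}"  # e.g. "1,234,567.891"
    integer_part, _, fraction = text.partition(".")
    grouped = integer_part.replace(",", NNBSP)
    return grouped + (decimal_sep + fraction if fraction else "")

print(format_grouped(1234567.891))
print(format_grouped(1004.5, decimals=1, decimal_sep=","))
```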

layer8|1 year ago

This is a relic from the olden times when application data was rarely exchanged across locales, and people expected software to conform to the local conventions (and they largely still expect it). Microsoft never changed this because it would have broken (and would still break) a vast number of systems and workflows.

stefs|1 year ago

> decimal separator is a (thin non-breaking) space

i really hope you mean thousands separator ...

croes|1 year ago

If you import the data from a file you can select the separator

cytocync|1 year ago

Take a moment to know that it’s *crucial* not to localize strings in health software services because it can lead to data leaks and performance degradation. It’s better to work with global APIs so you’re protected from all sorts of risks.

RadiozRadioz|1 year ago

Performance degradation?

Retr0id|1 year ago

> Parsing Is Not a Science

It can and should be, though. I feel like we should have a separate word for parsing when the rules are not well-defined - something like "fuzzy parsing" (in a similar vein to fuzzy string comparison)

eyelidlessness|1 year ago

Renaming the problem doesn’t make it go away. It might be useful for identifying the subset of parsing which is problematic, but I think the article already achieves this well by specifying the subset of input under discussion.

pwdisswordfishz|1 year ago

It's "scraping".

xigoi|1 year ago

It’s called “guessing”.

teddyh|1 year ago

OT: Why does almost every comment in this thread currently say “2 hours ago”, when they were probably written when this story was first featured, about 3 days ago?

tmiku|1 year ago

Hovering over the time-ago item on the comment header displays the exact post time, and interestingly it shows times from Feb 16 (3 days ago) for many of the "2 hours ago" comments. Must be an artifact of some moderation tool.

Waterluvian|1 year ago

How do people who use commas as decimals disambiguate 1,004 and 1.004 without changing the precision implied by number of decimal places?

yndoendo|1 year ago

You are trying to apply what you know versus what others know. No different than Fahrenheit vs Celsius or Yard vs Meter.

Personally, I think the MM/DD/YYYY format, which is standard in the USA, needs to die and be replaced with YYYY-MM-DD.

Same with 12 hour time and replacing it with 24 hour. As the saying goes, Americans use am and pm because they can't count past 12. AM and PM are a waste of code and display area. What fits in 2 characters takes up 5 characters.
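A short Python sketch of what YYYY-MM-DD and 24-hour time buy you in practice: the text round-trips without any locale information, and plain string sorting matches chronological order. The sample dates are arbitrary.

```python
from datetime import date, datetime

d = date(2025, 3, 1)
iso = d.isoformat()                    # YYYY-MM-DD
round_trip = date.fromisoformat(iso)   # parses back with no locale involved

# 24-hour clock: no AM/PM marker needed.
twenty_four = datetime(2025, 3, 1, 17, 5).strftime("%H:%M")

# ISO dates sort chronologically even when compared as plain strings.
stamps = ["2025-03-01", "2024-12-31", "2025-01-03"]
in_order = sorted(stamps)

print(iso, twenty_four, in_order)
```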

trinix912|1 year ago

It depends on the local norms. Where I live, 1,004 is a decimal, and 1 004 or 1'004 is 1004, which makes it even clearer than the en-US default. That is, the 1.004 variant is never used, and if it is, it is assumed to be a misspelling of the decimal 1,004.
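In code, that amounts to making the convention an explicit input rather than something inferred from the string. A minimal Python sketch, with a made-up helper `parse_number`. It uses `Decimal` so that trailing zeros (the implied precision) survive, and it rejects the other convention's separator instead of guessing, which is stricter than treating 1.004 as a misspelling.

```python
from decimal import Decimal

def parse_number(text, decimal_sep, group_seps=""):
    """Parse `text` under a convention the caller states explicitly."""
    for sep in group_seps:
        text = text.replace(sep, "")
    # Whichever of "." and "," is not the decimal separator must not remain.
    other = "," if decimal_sep == "." else "."
    if other in text:
        raise ValueError(f"unexpected {other!r} in {text!r}")
    return Decimal(text.replace(decimal_sep, "."))

# Convention described above: "," is decimal, space or "'" groups thousands.
local_decimal = parse_number("1,004", ",", " '")
local_thousand = parse_number("1'004", ",", " '")
# en-US convention: "." is decimal, "," groups thousands.
us_value = parse_number("1,004.004", ".", ",")

print(local_decimal, local_thousand, us_value)
```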

OptionOfT|1 year ago

You don't. It's ambiguous. Just like the string 01/03/2025 is if you don't know the source's locale.
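A quick Python illustration of the date case: the same string parses cleanly under both conventions and yields two different days, so nothing in the input tells you which reading was meant.

```python
from datetime import datetime

raw = "01/03/2025"

# Month-first reading (en-US style).
as_month_first = datetime.strptime(raw, "%m/%d/%Y").date()
# Day-first reading (common in much of Europe).
as_day_first = datetime.strptime(raw, "%d/%m/%Y").date()

print(as_month_first.isoformat(), as_day_first.isoformat())
```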

But it can be worse. Los Angeles, Sunday, November 2, 2025, 1:30:00 am is ambiguous. Is it PST or PDT?
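Python's `zoneinfo` exposes this through the `fold` attribute. A minimal sketch, assuming the IANA time zone database is available on the system; the helper `offsets_for` is made up. It uses 1:30 am, a wall-clock time inside the hour that is repeated when Los Angeles leaves daylight saving time on that date.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")

def offsets_for(wall):
    """UTC offsets of a naive wall-clock time under fold=0 and fold=1."""
    first = wall.replace(tzinfo=LA, fold=0).utcoffset()   # first occurrence
    second = wall.replace(tzinfo=LA, fold=1).utcoffset()  # second occurrence
    return first, second

# 1:30 am happens twice on 2025-11-02: once in PDT, once in PST.
repeated = offsets_for(datetime(2025, 11, 2, 1, 30))
print(repeated)
```

The two offsets differ by an hour, so the wall-clock string alone cannot identify the instant. Storing the UTC offset (or a UTC timestamp) alongside it removes the ambiguity.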

munch117|1 year ago

What I used to do is set the thousands separator to ' in the operating system settings. That made Excel read CSV files with 1,004 and 1.004 the same, as one and four thousandths. No one puts thousands separators in CSV files anyway, so that worked out. And it looked nice too.

In today's Windows 11 I can't find that setting. You can't set the thousands separator separately, not anywhere that I can find. It's a tragedy. I see Excel misreading CSV files all the time. I don't use Excel that much myself and I understand what's going on, so it doesn't affect me all that much directly, but for my Excel warrior colleagues, it's another matter.

alkonaut|1 year ago

How do people who use periods disambiguate it? It’s simply ambiguous without context.

toast0|1 year ago

I mean, if that's how you write numbers, it's the same as disambiguating 1,004 and 1.004 if you use en-US norms.

Which is to say, you assume it's as written, unless context suggests otherwise.

ValleZ|1 year ago

The same way as people who use periods. 1,004.004

47 comments