flymasterv | 4 months ago

It states the cargo culted reasons, but not the actual truth.

1) Pronunciation is either solved by a) automatic language detection, or b) doesn't matter. If I am reading a book, and I see text in a language I recognize, I will pronounce it correctly, just like the screen reader will. If I see text in a language I don't recognize, I won't pronounce it correctly, and neither will the screen reader. There's no benefit to my screen reader pronouncing Hungarian correctly to me, a person who doesn't speak Hungarian. On the off chance that the screen reader gets it wrong, even though I do speak Hungarian, I can certainly tell that I'm hearing English-pronounced Hungarian. But there's no reason that the screen reader will get it wrong, because "Mit csináljunk, hogy boldogok legyünk?" isn't ambiguous. It's just simply Hungarian, and if I have a Hungarian screen reader installed, it's trivial to figure that out.

2) Again, if you can translate it, you already know what language it is in. If you don't know what language it is in, then you can't read it from a book, either.

3) See above. Locale is mildly useful, but the example linked in the article was strictly language, and spell checking will either a) fail, in the case of en-US/en-GB, or b) be obvious, in the case of 1) above.

The lang attribute adds nothing to the process.

bilkow|4 months ago

Your whole comment assumes language identification is both trivial and fail-safe. It is neither, and it gets worse once you consider, for example, pages whose elements are in different languages, or languages that closely resemble one another.

Even if language identification were very simple, you're still putting the burden on the user's tools to identify something the writer already knew.
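
To make the mixed-language case concrete, here is a minimal sketch of how a tool might read the language the writer declared instead of guessing it. It assumes Python's standard `html.parser` module and well-nested markup; `LangCollector`, `declared_languages`, and `SAMPLE` are hypothetical names made up for this illustration, not part of any screen reader or browser.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LangCollector(HTMLParser):
    """Collect (declared_language, text) pairs, inheriting lang from ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = [None]  # effective language per nesting level; None = undeclared
        self.runs = []       # (language, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        declared = dict(attrs).get("lang")
        # An element without its own lang inherits the nearest ancestor's value.
        self.stack.append(declared or self.stack[-1])

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.stack[-1], text))


def declared_languages(html):
    """Return the text runs of an HTML string with their declared language."""
    parser = LangCollector()
    parser.feed(html)
    parser.close()
    return parser.runs


SAMPLE = (
    '<html lang="en"><body>'
    '<p>A Hungarian sentence:</p>'
    '<blockquote lang="hu">Mit csináljunk, hogy boldogok legyünk?</blockquote>'
    '<p>Gift</p><p lang="de">Gift</p>'
    '</body></html>'
)

for lang, text in declared_languages(SAMPLE):
    print(lang, "|", text)
# en | A Hungarian sentence:
# hu | Mit csináljunk, hogy boldogok legyünk?
# en | Gift
# de | Gift
```

The last two runs are the same string, "Gift", which is an ordinary English word and also the German word for poison. With the attribute present, the tool does not have to guess which one was meant.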

flymasterv|4 months ago

Language detection (where “language” == one of the 200 languages that are actually used) IS trivial, given a paragraph of text.

And the fact is that the author of the web page doesn’t know the language of the content, if there’s anything user contributed. Should you have to label every comment on HN as “English”? That’s a huge burden on literally every internet user. Other written language has never specified its language. Imposing data-entry requirements on humans to satisfy a computer is never the ideal solution.
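
As a rough illustration of what detection from the text alone involves, here is a toy sketch that scores text against small function-word lists. The word lists and the `guess_language` helper are hypothetical and deliberately tiny; real detectors use much larger statistical models, so this shows the general idea and is not how any particular screen reader works.

```python
import re

# Tiny hand-picked function-word lists; hypothetical and far from complete.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "that", "in", "it", "a", "will"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu", "mit", "in"},
    "hu": {"a", "az", "és", "hogy", "nem", "is", "mit", "egy", "van", "meg"},
}


def guess_language(text):
    """Return the best-matching language code, or None if there is no clear winner."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in vocab for w in words)
              for lang, vocab in STOPWORDS.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, runner_up) = ranked[0], ranked[1]
    # No hits at all, or a tie for first place, means we cannot decide.
    if best_score == 0 or best_score == runner_up:
        return None
    return best


print(guess_language("Mit csináljunk, hogy boldogok legyünk?"))  # hu
print(guess_language("It is the burden of the writer"))          # en
print(guess_language("Gift"))                                    # None
print(guess_language("Hand in Hand"))                            # None
```

Even this crude scorer handles a full sentence, while a one-word label or a phrase that is valid in two languages gives it nothing to work with. How often real pages contain such short or ambiguous fragments is the part the thread disagrees on.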

janwillemb|4 months ago

This comment contains a few logical fallacies.

> It states the cargo culted reasons, but not the actual truth

This dismisses existing explanations without engaging with the mentioned reasons. The following text then doesn't provide any arguments for this.

> Pronunciation is either solved by a) automatic language detection, or b) doesn't matter.

There are more possibilities than a and b. For example, it may matter for things other than pronunciation. It may also improve automatic detection or make it superfluous.

> If I am reading a book [...] I will pronounce it correctly, just like the screen reader will. If I see text in a language I don't recognize, I won't pronounce it correctly, and neither will the screen reader.

A generalization of your own experience to all users and systems. Screen readers aim to convey information accessibly, not mirror human ignorance.

> There's no reason that the screen reader will get it wrong, because <hungarian sentence> isn't ambiguous

This is circular reasoning. The statement is based on the assumption that automatic detection is always accurate - which is precisely what is under debate.

> If you can translate it, you already know what language it is in.

This is a non sequitur. Even if someone can translate text, that doesn't mean software or search engines can automatically identify that language.

> The lang attribute adds nothing to the process.

This absolute claim adds nothing to the logic.