A coworker once implemented a name validation regex that would reject his own name. It still mystifies me how much convincing it took to get him to make it less strict.
I worked with an office of Germans who insisted that ASCII was sufficient. The German language uses letters that cannot be represented in ASCII.
In fairness, they mostly wanted stuff to be in English, and when necessary, to transliterate German characters into their English counterparts (in German there is a standardised way of doing this), so I can understand why they didn't see it was necessary. I just never understood why I, as the non-German, was forever the one trying to convince them that Germans would probably prefer to use their software in German...
In certain cultures yes. Where I live, you can only select from a central, though frequently updated, list of names when naming your child. So theoretically only (given) names that are on that list can occur.
Family names are not part of this, but maybe that exists too elsewhere. I don't know how people whose name has been given to them before this list was established is handled however.
An alternative method, which is again culture dependent, is to use virtual governmental IDs for this purpose. Whether this is viable in practice I don't know, never implemented such a thing. But just on the surface, should be.
Don't validate names, use transliteration to make them safe for postal services (or whatever). In SQL this is COLLATE, in the command line you can use uconv:
There are of course some people who'll point you to a blog post saying no validation is possible.
However, for every 1 user you get whose full legal name is bob@example.com you'll get 100 users who put their e-mail into the name field by accident
And for every 1 user who wants to be called e.e. cummings you'll get 100 who just didn't reach for the shift key and who actually prefer E.E. Cummings. But you'll also get 100 McCarthys and O'Connors and al-Rahmans who don't need their "wrong" capitalisation "fixed" thank you very much.
Certainly, I think you can quite reasonably say a name should be comprised of between 2 and 75 characters, with no newlines, nulls, emojis, leading or trailing spaces, invalid unicode code points, or angle brackets.
If you just use the {Alphabetic} Unicode character class (100K code points), together with a space, hyphen, and maybe comma, that might get you close. It includes diacritics.
I'm curious if anyone can think of any other non-alphabetic characters used in legal names around the world, in other scripts?
I wondered about numbers, but the most famous example of that has been overturned:
"Originally named X Æ A-12, the child (whom they call X) had to have his name officially changed to X Æ A-Xii in order to align with California laws regarding birth certificates."
(Of course I'm not saying you should do this. It is fun to wonder though.)
throw310822|1 year ago
MrJohz|1 year ago
In fairness, they mostly wanted stuff to be in English, and when necessary, to transliterate German characters into their English counterparts (in German there is a standardised way of doing this), so I can understand why they didn't see it was necessary. I just never understood why I, as the non-German, was forever the one trying to convince them that Germans would probably prefer to use their software in German...
guappa|1 year ago
The coworker who made that happen said I'm a weirdo for setting my machine in my own language. According to him I should have set it to english.
This of course happened in a non english speaking country.
croes|1 year ago
perching_aix|1 year ago
Family names are not part of this, but maybe that exists too elsewhere. I don't know how people whose name has been given to them before this list was established is handled however.
An alternative method, which is again culture dependent, is to use virtual governmental IDs for this purpose. Whether this is viable in practice I don't know, never implemented such a thing. But just on the surface, should be.
armada651|1 year ago
ValentinA23|1 year ago
>echo "'Lódź'" | uconv -f "UTF-8" -t "UTF-8" -x "Latin-ASCII"
>'Lodz'
poizan42|1 year ago
zarzavat|1 year ago
michaelt|1 year ago
However, for every 1 user you get whose full legal name is bob@example.com you'll get 100 users who put their e-mail into the name field by accident
And for every 1 user who wants to be called e.e. cummings you'll get 100 who just didn't reach for the shift key and who actually prefer E.E. Cummings. But you'll also get 100 McCarthys and O'Connors and al-Rahmans who don't need their "wrong" capitalisation "fixed" thank you very much.
Certainly, I think you can quite reasonably say a name should be comprised of between 2 and 75 characters, with no newlines, nulls, emojis, leading or trailing spaces, invalid unicode code points, or angle brackets.
crazygringo|1 year ago
I'm curious if anyone can think of any other non-alphabetic characters used in legal names around the world, in other scripts?
I wondered about numbers, but the most famous example of that has been overturned:
"Originally named X Æ A-12, the child (whom they call X) had to have his name officially changed to X Æ A-Xii in order to align with California laws regarding birth certificates."
(Of course I'm not saying you should do this. It is fun to wonder though.)
gmuslera|1 year ago
nkrisc|1 year ago
barryrandall|1 year ago
majkinetor|1 year ago
rsynnott|1 year ago