EBCDIC is incompatible with GDPR

[+] CWuestefeld|4 years ago|reply

At the time I moved out of New Jersey 8 years ago, the state was still unable to represent my completely vanilla name on my driver's license. My first name is "Christopher", but their computers can't/couldn't handle an 11-character name. It was always truncated on my driver's license.

This led to problems when they instituted their trusted ID compliance. When renewing the license we were required to provide some combination of documentation to corroborate our identity, and obviously that documentation needs to match the name shown on the driver's license - and of course mine did not.

There was one way out for Christophers like myself. A birth certificate was considered the ultimate truth, so as long as I had a notarized (with the raised seal) birth certificate to prove my identity, they would allow me to renew my license.

The State of New Jersey is very awful at IT. My wife, who works in healthcare finance, told me about problems she was having with the State because - get this - their field for what amounts to "Medicaid ID#" was too narrow, so they had to recycle ID#s for new recipients! And to make that worse, they discarded old backup data so when checking the data for a patient several years ago, it's only possible to find that of the latest owner of ID# 12345.

[+] jjice|4 years ago|reply

The absurdity of not being able to support a name as common as Christopher or anything as long or longer just screams "government work". What the hell went through everyone's head when they built this system? Absolutely no testing or real data was used either, but that shouldn't matter, because having a max limit on someone's (very common) name is honestly impressive. The fact they developed this system without addressing this issue is a testament to the quality of government software development.

I'm sure there is good government software out there, but there are plenty of showcases of the opposite (especially since these are systems that NEED to work).

[+] pickledcods|4 years ago|reply

I have the exact same problem with my European passport. There is a maximum name length: the total of first+middle+surname must be less than 30 characters.

Officially I am not who I am.

[+] adolph|4 years ago|reply

Cue patio11 link:

Falsehoods Programmers Believe About Names

https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-...

[+] Loranubi|4 years ago|reply

Using an ASCII name in a Chinese-speaking country is fun too. The majority of Chinese names are 2 to 4 characters long. So even if you have a short English name, it will either get abbreviated to 5 chars or so or just printed over whatever comes next to the name field.

[+] gregw2|4 years ago|reply

Transitioning off a long established core mainframe/AS400 app is not necessarily so easy as just changing to UTF-8 as the article author implies.

If you have no mainframe or enterprise experience to relate to that observation, consider the effort involved to transition from Python 2 to (UTF-8 clean) Python 3!

That said, I am not even clear from the article which diacritical markings are missing from EBCDIC and if the lawyers' arguments to "not change" were legitimate in the way the article implies... you do realize there are hundreds of EBCDIC code pages covering at least all the European languages ... since these are markets which IBM has sold into for 50+ years now, right?

I only learned about EBCDIC code pages when trying to proactively and properly set up character encoding handling for data extraction from one of my employer's long running AS400s... "Which EBCDIC?" is not that different a headache from "which extended ASCII code page?"... EBCDIC is not just like 7-bit (non extended) ASCII as the article implies.
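
To make the "Which EBCDIC?" headache concrete, here is a minimal sketch using three EBCDIC codecs that ship with CPython (cp037, cp500, cp273). Which code page any particular AS/400 actually uses is an assumption here, and the sample string and helper names are made up for illustration.

```python
# Sketch: the same text yields different bytes under different EBCDIC code pages.
# cp037, cp500 and cp273 are codecs bundled with CPython; the choice of these
# three is an assumption, not something the thread specifies.
SAMPLE = "André [x]!"


def encode_all(text, codecs=("cp037", "cp500", "cp273")):
    """Return {codec name: hex string of the encoded bytes}."""
    return {name: text.encode(name).hex() for name in codecs}


def misread(text, written_as, read_as):
    """Encode with one code page and decode with another (the classic mix-up)."""
    return text.encode(written_as).decode(read_as)


for codec, hexbytes in encode_all(SAMPLE).items():
    print(codec, hexbytes)

# Letters usually survive a cp037/cp500 mix-up; punctuation such as brackets does not.
print(misread(SAMPLE, "cp037", "cp500"))
```

Because the bytes carry no label saying which code page wrote them, an extraction job has to be told the code page out of band, exactly as with extended ASCII.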

[+] toyg|4 years ago|reply

> Transitioning off a long established core mainframe app is not necessarily so easy

I don't think the author implied it was easy, just that it should have been done at some point in the 25 years since the system was first implemented. The last paragraph is just an exhortation to use Unicode everywhere all the time, today.

[+] WorldMaker|4 years ago|reply

> "Which EBCDIC?" is not that different a headache from "which extended ASCII code page?"

Sure, but that's still a massive headache. You've probably never had a headache like needing to switch ASCII or EBCDIC code pages. You generally can't just switch code pages per-record in a file, storing mixed code page data to disk is generally a bad idea, and in some operating systems you can barely switch code pages per application and sometimes need ROM hacks and entire mainframe restarts to switch code pages. (Modern z/OS supports something more like modern Linux locale switching with environment variables before running applications so should at least allow per-application code pages.)

Even if the lowest common denominator code page you choose to run your application in is a full bit or two more than the 7-bit ASCII lowest common denominator a single code page per application is still never going to cover the breadth of UTF-8 without nasty hacks. (That's of course assuming you don't have other problems such as intermediate tools that presume you are only using ASCII compatible EBCDIC subsets of code pages, which may be the case when you've got an eclectic evolution of code accreted around your mainframe apps.)

[+] nemoniac|4 years ago|reply

[deleted]

[+] erk__|4 years ago|reply

Time to break out UTF-EBCDIC!

https://en.wikipedia.org/wiki/UTF-EBCDIC

[+] retrac|4 years ago|reply

In slightly related news, Ontario just last year finally allowed people to use accented characters in their official legal names, birth certificates, and so on. French has been an official language in Ontario for over half a century. The reason it wasn't possible until recently was entirely technical. The systems were limited by ASCII or, yeah, possibly EBCDIC. (I don't have the details.) Still no guidance on how the average government clerk with the very common US-style layout is supposed to type them in, though.

https://news.ontario.ca/en/release/58538/ontario-introduces-...

[+] coldacid|4 years ago|reply

There /is/ a US-International layout which uses both AltGr style and compose style entry of accented characters, although it's not the best. I actually made my own customized version of US-International for Windows in order to support more options for accented characters and certain extended Latin characters used in Old and Middle English.

[+] gspr|4 years ago|reply

What I don't understand when I hear stories like these is why the hell not just use someone else's solution? Surely neighboring Québec had this sorted out ages ago – why not just duplicate whatever they did? Problem solved in no time.

Going further, I wonder why for example the EU doesn't try to get schemes going that facilitate the copying of IT solutions between member states. Why does every country have to reinvent the wheel?

[+] oldie|4 years ago|reply

Remember all the ghastliness with code pages that sprang up around Ascii, such that systems configured for different languages didn't agree about what characters most code points were supposed to represent? Well, good news: Ebcdic supports that. For example, here's a code page that can represent all the characters you're likely to need in French:

https://en.everybodywiki.com/EBCDIC_297

So, to be unable to represent á, è, ô, ü, ç, etc, the application would have to be locked into not just Ebcdic but also a particular Ebcdic code page that seems unsuited to the locale where the program was running.

Admittedly, an Ebcdic system will have difficulty representing French, Greek and Russian names at the same time, because there's no code page that encodes all the necessary characters.

An application hard-coded to US-Ascii would also be unable to support accented characters, and an application using any one Ascii code page (as opposed to Unicode) would have the same difficulty representing French, Greek and Russian names at the same time. Which is why, in 2021, we don't do that.
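
A small sketch of that point, assuming three made-up sample names and the codecs bundled with CPython (cp037 is a Latin-1 flavoured EBCDIC page, cp875 a Greek one). It only checks which names each single code page can encode at all.

```python
# Hypothetical sample names: French, Greek, Russian.
NAMES = ["André", "Ωμέγα", "Юрий"]
# One 7-bit charset, two single EBCDIC code pages, and UTF-8 for comparison.
CODECS = ["ascii", "cp037", "cp875", "utf-8"]


def can_encode(text, codec):
    """True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage(names=NAMES, codecs=CODECS):
    """Map each codec to the subset of names it can represent."""
    return {codec: [n for n in names if can_encode(n, codec)] for codec in codecs}


for codec, ok in coverage().items():
    print(f"{codec}: {len(ok)}/{len(NAMES)} names representable")
```

No single-byte page in the list covers all three names; only the Unicode encoding does.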

[+] tyteen4a03|4 years ago|reply

This ruling is interesting. As a person with names in Chinese, I could technically force my bank to support UTF-8 simply by saying I do not wish to be known as my English name, which is the phonetic spelling of my Chinese one.

Now since I'm Hongkongese where my English legal name is as legal as the Chinese one the law might be different but for Chinese people though...

[+] nroets|4 years ago|reply

Even the Dutch have words that cannot be encoded in Ebcdic[1]. And I suppose many Dutch have names like André.

https://blogs.transparent.com/dutch/tremas-e-i-u-o-a/

[+] consp|4 years ago|reply

You might be able to but I wonder if you want to. (Considering this is in Western Europe, Belgium) Most of the people will not be able to convert the characters into something they can process, even if they wanted to. While maybe legal, it would speed your processing up a lot to use the phonetic writing in the extended Latin character set.

The diacritical marks however have some familiarity and are in common use.

On a sidenote: lots of airlines also have this issue where an accent or other diacritical mark will remove the character completely, making your name different from the one in your passport. Could be quite annoying.

edit: thought it was in the Netherlands but it was in Flanders/Belgium.
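
The airline side note describes the character disappearing entirely. A hedged sketch of the difference between dropping an unencodable character and folding it to its base letter, using only the Python standard library; the function names are made up, and real booking systems may well do something else.

```python
import unicodedata


def drop_non_ascii(name):
    """The failure mode described above: unencodable characters vanish."""
    return name.encode("ascii", "ignore").decode("ascii")


def fold_to_ascii(name):
    """Decompose accented letters (NFKD), then remove only the combining marks."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(drop_non_ascii("André"))  # the accented letter is lost
print(fold_to_ascii("André"))   # the base letter is kept
```

Folding is still lossy, and letters without a decomposition (for example Ł) pass through unchanged, so it is a fallback rather than a correct spelling.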

[+] tdeck|4 years ago|reply

When you write your name in Chinese characters, how do people know whether to pronounce it in Cantonese or Mandarin (or some other Chinese language)? Does that ambiguity ever come up?

[+] Eduard|4 years ago|reply

https://en.wikipedia.org/wiki/Languages_of_the_European_Unio...

[+] caf|4 years ago|reply

Same for those with Arabic, Persian, Korean, Thai, Russian ... names

[+] jan_Inkepa|4 years ago|reply

They don't say what the outcome of the case is? I guess it's still in progress (seems to be 2 years old though)? Really interesting use though!

Edit: ah on the linked wiki article it says:

> The Court of Appeal of Brussels held that, in accordance with Article 16 GDPR, the data subject has the right for their name to be correctly spelled when processed by the computer systems of the Bank

So the plaintiff won, but no word on if/how the bank actually fixed it.

[+] Luc|4 years ago|reply

The lower court ordered the bank to spell the name correctly. The court of appeal upheld this judgement.

Source (Dutch): https://www.gegevensbeschermingsautoriteit.be/publications/a...

This tweet says it was ING Bank: https://twitter.com/simonhania/status/1270812210584043521

[+] markstos|4 years ago|reply

I once worked for a newspaper while they were researching if dead people were voting in the state of Kentucky. The project would compare voter records with those of the deceased. The State responded to one of their open record requests with a magnetic reel about a foot in diameter, which I was tasked with decoding into a spreadsheet.

I took the magnetic reel to college with me that summer and asked around. Turns out they had a magnetic tape reader for reels of this size hooked up to their VAX system. A friendly sysadmin tried to read the data for me, but it came back as gibberish.

I wasn't surprised. Then he said "Aha! EBCDIC!" I hadn't heard of it, but as the reel spun and the names of the dead spun off the reel, he spun his own yarn about this arcane format that was as ancient as the magnetic tape reel I'd brought in.

And yes, there were some dead people voting in Kentucky.

[+] pavel_lishin|4 years ago|reply

When suspected necromancy is afoot, of course you'd need to see a wizard about it.

[+] cesaref|4 years ago|reply

The international banking system is coordinated by the SWIFT network, and all inter-bank messages are encoded in EBCDIC. If you transfer money between countries, or get statements from a broker, chances are it lived in EBCDIC at one point in its journey.

[+] cupcake-unicorn|4 years ago|reply

Good on this consumer for dragging the bank through this. I'm sure the consumer probably got crap from friends/family about why they were doing this but this is sheer laziness on behalf of the bank and they deserve to be dragged through this to force them to uphold reasonable tech standards for all their customers. Glad that the EU has this option, I'm in the US and would use it more for stuff here :/

[+] jimmaswell|4 years ago|reply

Not spending millions of dollars to appease people who are disproportionately upset over such a minor thing as missing accent marks is sheer laziness to you?

[+] qwerty456127|4 years ago|reply

The first comment saying "TrÃ¨s intÃ©ressant !" looks hilarious in this context. I wonder if it has been made to look like this intentionally or not.

[+] capitainenemo|4 years ago|reply

Certainly looks like a joke to me, especially given all the correctly rendered text, and the various encoding related comments. Was probably rendered like this.

$ echo "très intéressant" | iconv -f iso-8859-1 -t utf-8

trÃ¨s intÃ©ressant
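
The same effect can be sketched in Python: UTF-8 bytes misread as ISO-8859-1 (Latin-1) produce exactly this kind of mojibake, and the mistake is reversible as long as no bytes were lost along the way. The function names are made up for illustration.

```python
def mojibake(text):
    """Encode as UTF-8, then wrongly decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Undo the mix-up: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = mojibake("très intéressant")
print(garbled)
print(repair(garbled))
```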

[+] nerdponx|4 years ago|reply

I'm sure it was deliberate. Got a good laugh out of it!

[+] gpderetta|4 years ago|reply

Every problem can be solved with an additional level of indirection. For example use html character entities for characters that are not representable in the DB character set.
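
A minimal sketch of that indirection, assuming the database stores a single EBCDIC code page (cp037 here, purely as an example) and that the hypothetical `store`/`load` helpers are the only code touching the field. Python's `xmlcharrefreplace` error handler writes numeric character references for anything the codec cannot represent.

```python
import html


def store(name, codec="cp037"):
    """Encode for a legacy field; unrepresentable characters become &#NNN; references."""
    # Escape literal '&' first so the stored text can be decoded unambiguously.
    escaped = html.escape(name, quote=False)
    return escaped.encode(codec, "xmlcharrefreplace")


def load(raw, codec="cp037"):
    """Decode the legacy bytes and expand the character references again."""
    return html.unescape(raw.decode(codec))


raw = store("Ελένη & André")
print(len(raw), "bytes stored")
print(load(raw))
```

The catch is size: each escaped character costs several bytes, so a fixed-width name field fills up much faster, and every other program that reads the column sees the raw references.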

[+] caf|4 years ago|reply

And/or rename the existing "Name" field to "Name-based Index Key" and add a new field for Name.

[+] unknown|4 years ago|reply

[deleted]

[+] dhosek|4 years ago|reply

One of the fun things about EBCDIC is that 370 assembler has opcode-level support for converting an EBCDIC-encoded numeric string into an integer (and maybe the other way around too, it's been a while). This is one of two things I remember about my now-ancient 370 assembler knowledge. The other is that there is no built-in support for maintaining a call stack. It is up to each subroutine to handle this and there were some weird declarations around this to indicate whether a subroutine was reentrant, the definition of which escapes me now.
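
The comment does not name the instructions. As far as I know the usual pair is PACK (zoned decimal to packed decimal) followed by CVB (packed decimal to binary), so treat those names as an assumption. The sketch below is a loose Python model of that idea, not an emulation of the real instructions, which work on fixed-length operands.

```python
def pack(zoned):
    """Model of PACK: keep each digit nibble, move the last byte's zone to the sign slot."""
    digits = [b & 0x0F for b in zoned[:-1]]
    last = zoned[-1]
    nibbles = digits + [last & 0x0F, last >> 4]  # final digit, then sign nibble
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad with a leading zero digit to fill whole bytes
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def cvb(packed):
    """Model of CVB: turn packed decimal (digits plus trailing sign nibble) into an int."""
    nibbles = []
    for b in packed:
        nibbles += [b >> 4, b & 0x0F]
    sign, digits = nibbles[-1], nibbles[:-1]
    value = 0
    for d in digits:
        value = value * 10 + d
    return -value if sign in (0x0B, 0x0D) else value


zoned = "123".encode("cp037")  # EBCDIC digits are 0xF0..0xF9
print(zoned.hex(), pack(zoned).hex(), cvb(pack(zoned)))
```

The trick only works because EBCDIC digits keep their value in the low nibble, which is why the conversion is so cheap at the opcode level.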

And people shouldn't criticize EBCDIC too much, after all Windows still dumps a lot of crap in legacy 8-bit coding that can cause applications to break (there was a recent post on HN about someone being unable to run the IntelliJ debugger because of an accent in their username). At least EBCDIC is clear about its limitations.¹

⸻⸻⸻

1. I'd be remiss if I didn't point out one other EBCDIC weirdness: It has two vertical bars, | and ¦ which always caused complications in translations between EBCDIC and ASCII. IIRC, ¦ was the more common symbol in EBCDIC coding but some converters wanted to translate | to | instead (or maybe it was the other way around—the last time I did IBM big metal was 30 years ago).

[+] toyg|4 years ago|reply

> there was a recent post on HN about someone being unable to run the IntelliJ debugger because of an accent

That's not Windows, that's JVM weirdness. Using the right calls, this sort of thing has been fine in Windows for some time.

[+] CoastalCoder|4 years ago|reply

Can someone comment on what assumptions those banks are permitted to make regarding names?

E.g., can they assume that names can be expressed as a sequence of (current) Unicode characters with some specific maximum length? Can they assume that names have no leading / trailing spaces?

[+] tialaramex|4 years ago|reply

Probably reasonable assumptions. When you're not sure, assume the standard will be reasonableness, because that's what the law assumes when it isn't specified.

So, you can make reasonable assumptions. What is reasonable will change, which is fine because the way courts figure out what's reasonable in some particular case is to either have the judge decide, or have a jury decide, and people change too.

The nice thing about reasonableness is that you are equipped to make a first pass at judging it yourself, since you are presumably a reasonable person. If you need second guessing, have a team mate consider it, and, if you're worried that your collective idea of "reasonable" might be distorted in an important way, that'll be why your organisation probably encouraged diversity to avoid that.

You might say, this seems awful because it isn't precise enough to say, implement it as a Javascript library. That's true, but intentional. Justice will necessarily involve such judgement calls, and trying to evade that by specifying everything precisely with no room for judgement is a bug not a feature.

[+] PeterisP|4 years ago|reply

I believe that the main assumption they can make is that they can use the name on the ID forms issued by the government or, in case of foreign citizens, their passports. Due to history of international diplomacy, the general standard for passports expects that in addition to whatever script the country uses, they will also include the name of the person in English or French - so this is the key source of the problem, as for passports in e.g. Russian you will get an "English" name that you might use, however, you may get passports with names only in French, so you would have to support the English and French alphabets but perhaps not necessarily any others.

Regarding trailing spaces etc, IMHO the standard would be "as shown in passport" i.e. trailing spaces definitely would not matter, but spaces and punctuation between words would (e.g. D'Artagnan as a name). I looked for but did not find any specific restrictions on name length. In general, the country will have regulations on what they accept as names in their official IDs, and again you may piggyback on other institutions - as long as you accept everything for which your government have issued documents, you should be fine; and if someone has an interesting case that requires changing the process, let that fight happen between them and the government first.

[+] mqus|4 years ago|reply

I think that it has to be reasonable. Assuming that your French-speaking target region has only names without accents is unreasonable. Assuming a maximum length of 200(?) UTF-8 code points (or even bytes) seems reasonable (defensible) in court. Same for leading/trailing spaces.
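
Code points and bytes are not the same limit, which matters when picking such a number. A hedged sketch, with the 200 figure taken from the comment above (it is not a legal requirement) and helper names invented for illustration:

```python
import unicodedata

MAX_CODE_POINTS = 200  # the figure floated above, not a legal threshold


def measure(name):
    """Strip outer whitespace, normalise to NFC, and report both length measures."""
    nfc = unicodedata.normalize("NFC", name.strip())
    return {"code_points": len(nfc), "utf8_bytes": len(nfc.encode("utf-8"))}


def acceptable(name, limit=MAX_CODE_POINTS):
    """True for non-empty names whose normalised form fits within the limit."""
    return 0 < measure(name)["code_points"] <= limit


for sample in ["  André ", "Andre\u0301", "李小龍"]:
    print(repr(sample), measure(sample))
```

Normalising first means a name typed with a combining accent and the same name typed with a precomposed letter count the same, so the limit does not depend on the keyboard used.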

[+] contravariant|4 years ago|reply

I'm somewhat wondering to what extent a bank is required to support storing the names natively.

I mean something like "${name} spelled with an acute accent on the e" would be technically a correct description even if it is impractical to use. The GDPR does grant you the right to correct your personal information but doesn't specify how this information is represented.

As far as I can tell the GDPR also doesn't grant the customer the right to have their name represented correctly on their bank pass (otherwise everyone with a long surname would require impractically long bank passes), the court only ruled that the inability of the bank to store the name correctly simply isn't an excuse.

[+] mindcrime|4 years ago|reply

I wonder if there are any limits on this from the GDPR perspective? What if my name has 2^40 characters in it? Are companies required to support that? What if I change my name from whatever it is today (say, "Phillip") to a name that has 2^40 characters? Would the bank be required to accommodate that? etc..

[+] N19PEDL2|4 years ago|reply

If the bank still relies on legacy software and IT standards, well it's just its fault. They cannot expect people with diacritics or other non-ASCII characters in their name just to spell it incorrectly because their systems do not support Unicode in the twenty-twenties.

Maybe their IT team had other priorities than replacing EBCDIC with Unicode (or whatever they find more appropriate for their systems), but this is an indicator of poor interest in technological progress by the bank itself. It reminds me of some banks that gave millions to Microsoft to keep ATMs running Windows XP after its end of life.

Edit: I elaborated a bit more and I realized that it might be more difficult than just replacing the character encoding standard with a more modern one. For example, the name of the account owner likely needs to match exactly the holder name on the credit card associated with the account, and I'm not sure if diacritics can be embossed correctly on the card.

[+] edwinjm|4 years ago|reply

Heh, if you're looking for a good example of Technical Debt…

Yes, already in 1995 Unicode was an established standard (even Windows 95 started to support it). The bank should have known it would be a requirement in the future.

[+] coldacid|4 years ago|reply

Unicode's old enough that Windows NT was built to work with it natively. In fact, all the "ANSI" Windows API calls in NT were just wrapper functions around the Unicode equivalents handling Unicode/code-page conversions. And this was 1993.

[+] theragra|4 years ago|reply

My friend often has issues with flying, because his name is written as Maksims, and old booking systems think the "Ms" at the end means missus. (He is male.)

Crazy shit everywhere in these old systems.

[+] po1nt|4 years ago|reply

Imagine you maintain this system and somebody named X Æ A-XII Musk will try to register.

Jokes aside. I know a person named exactly like me just with a small diacritic difference. I realize they use secondary identifiers but this is identity theft waiting to happen.

[+] gpvos|4 years ago|reply

More articles should have a "Dance" section.

[+] LeoPanthera|4 years ago|reply

I have two middle names and a hyphenated family name. It is too long for US passports. But not too long, for some reason, for my California driver's license.

This makes checking in for flights a nightmare, since the names do not match. Domestic flights are OK, but I cannot check in online for any international flight, I have to go to the airport to check in.

[+] andrewaylett|4 years ago|reply

Sounds like the perfect use-case for UTF-7? https://en.wikipedia.org/wiki/UTF-7

No, I'm not entirely serious.

[+] krallja|4 years ago|reply

You probably want UTF-EBCDIC instead: https://news.ycombinator.com/item?id=28987256

267 comments