pzb | 4 years ago

A couple of years ago I ran into the same confusion over the "TeletexString"/"T61String" data type in ASN.1. After going down the rabbit hole of what T.61 is and trying to map it to Unicode, I reread the ASN.1 (X.690) spec and realized that the authors never actually referenced T.61. Ever since the first edition of ASN.1 in 1988, those strings have not used T.61. They use a character set that is easily mapped to Unicode: https://www.itscj-ipsj.jp/ir/102.pdf, a subset of US ASCII.

That is not to say the rest of the spec is notably better. A full implementation has to support escape codes within strings that switch character sets. I've never seen valid escape codes in real-world data, but they probably exist.

As the original article shows, ASN.1 has lots of other challenges and complexity. Writing a code generator that supports all of that complexity is no trivial task, and the only open source one I've seen generates only C code. Protobuf has the advantage of modern language support (including multiple type-safe and memory-safe languages).

jsmith45|4 years ago

Eh... It does have a transitive normative reference to T.61, but only by way of special restrictions on the use of three characters.

T61String is defined in terms of ISO 2022, with the default G0 character set set to ISO-IR-102 (as you linked). ISO-IR-102 defines the set of graphical characters, but also places a condition on the use of three of them by reference to T.61. It also requires that the control character set C0 be set to ISO-IR-106 by default, and C1 to ISO-IR-107.

The net effect is that the default character set of T61String is almost the T.61 character set; to get the full T.61 character set, you need to include the escape sequence that sets G1 to ISO-IR-103: ESC 2/9 7/6.
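For anyone unfamiliar with the column/row notation ISO 2022 uses: each `c/r` pair names the byte `16*c + r`. A minimal Python sketch that turns the sequence above into raw bytes (the helper name `colrow_to_bytes` is made up for illustration):

```python
def colrow_to_bytes(seq: str) -> bytes:
    """Convert ISO 2022 notation such as 'ESC 2/9 7/6' into raw bytes."""
    out = bytearray()
    for token in seq.split():
        if token == "ESC":
            out.append(0x1B)  # ESC sits at position 1/11
        else:
            col, row = token.split("/")
            out.append(int(col) * 16 + int(row))  # column/row -> byte value
    return bytes(out)


# The G1 designation quoted above, as it would appear on the wire.
designate_g1_ir103 = colrow_to_bytes("ESC 2/9 7/6")
print(designate_g1_ir103.hex(" "))  # 1b 29 76
```

So a string that wants the ISO-IR-103 characters would be expected to carry those three bytes before using them.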

A conforming T61String implementation does need to support the escape sequences and resulting encodings from ISO-IR-6, ISO-IR-87, ISO-IR-102, ISO-IR-103, ISO-IR-106, ISO-IR-107, ISO-IR-126, ISO-IR-144, ISO-IR-150, ISO-IR-153, ISO-IR-156, ISO-IR-164, ISO-IR-165, ISO-IR-168.

Since the control character sets include shift prefixes, etc., properly parsing T61Strings into Unicode is non-trivial.
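One pragmatic way to contain that complexity, sketched below in Python, is to classify a string before trying to decode it. If every byte stays in the 0x20 to 0x7E range, no escape sequence, shift function, or G1 character can be involved, so only the default G0 set matters. The function names and the two-way split are my own assumptions, not something the spec prescribes, and the sketch deliberately does not attempt the character mapping itself.

```python
ESC = 0x1B


def needs_iso2022_handling(value: bytes) -> bool:
    """Return True if the contents use anything beyond plain GL bytes.

    In an 8-bit ISO 2022 code, 0x00-0x1F are C0 controls (ESC and the
    shift functions live here), 0x7F is DELETE, 0x80-0x9F are C1 controls,
    and 0xA0-0xFF pick characters from whatever set is invoked on the right.
    """
    return any(b < 0x20 or b > 0x7E for b in value)


def escape_offsets(value: bytes) -> list:
    """Offsets of every ESC byte, i.e. where a designation may start."""
    return [i for i, b in enumerate(value) if b == ESC]


print(needs_iso2022_handling(b"Example CA"))      # False
print(needs_iso2022_handling(b"\x1b\x29\x76abc"))  # True
print(escape_offsets(b"ab\x1b\x29\x76cd"))         # [2]
```

A decoder built this way could take a simple table-lookup path for the first kind of string and reject or fully parse the second, depending on how much of ISO 2022 it is willing to implement.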

This is actually a pretty good reflection of the complexity in ASN.1. Technically the ASN.1 spec proper only requires that a T61String support exactly the set of characters specified in the above registrations. It does not mandate any particular format for them. It is the BER encoding that requires that ISO 2022 be used to encode these. A different encoding could specify that all strings are encoded as UTF-8, and that the different types are just various subsets of allowed characters.
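To make that last idea concrete, here is a hypothetical Python sketch in which every string is serialized as UTF-8 and each restricted type is nothing more than an allowed-character set. The PrintableString and NumericString repertoires are the ones ASN.1 defines; the `ALLOWED` table, the `check` function, and the encoding itself are illustrative assumptions, not an existing set of encoding rules.

```python
import string

# Character repertoires for two of the restricted string types.
ALLOWED = {
    "NumericString": set(string.digits + " "),
    "PrintableString": set(
        string.ascii_letters + string.digits + " '()+,-./:=?"
    ),
}


def check(type_name: str, value: str) -> bytes:
    """Validate value against the type's repertoire, then encode as UTF-8."""
    bad = sorted(set(value) - ALLOWED[type_name])
    if bad:
        raise ValueError(f"{type_name} does not allow {bad!r}")
    return value.encode("utf-8")


print(check("PrintableString", "Example CA, Inc."))
print(check("NumericString", "12 34"))
```

Under this scheme a T61String would simply be one more entry in the table, with the repertoire built from the registered sets, and the wire format would never need escape sequences.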

cryptonector|4 years ago

Heimdal's ASN.1 compiler generates C code. It also generates bytecode with C bindings. Two options.

Also, I've made it generate JSON dumps of the ASN.1 modules. My goal is to eventually replace the C-coded backends that generate C / bytecode with jq-coded backends that can generate C, Java, Rust, etc.