I learnt C, more than 20 years ago, from the book The C Programming Language written by Brian W. Kernighan and Dennis M. Ritchie, also known as K&R. I read the book almost cover to cover all the way from the preface at the beginning to its three appendices at the end while solving all the exercises that each chapter presented. As someone who knew very little about programming languages back then, this book was formative in my journey of becoming a programmer.
Appendix A (Reference Manual) of the book broadened my outlook on programming languages by providing me a glimpse of what goes into formally specifying a programming language. Section A.12 (Preprocessing) of this appendix specifies trigraph sequences. Quoting from the section:
> Preprocessing itself takes place in several logically successive phases that may, in a particular implementation, be condensed.
> 1. First, trigraph sequences as described in Par.A.12.1 are replaced by their equivalents. Should the operating system environment require it, newline characters are introduced between the lines of the source file.
Section A.12.1 (Trigraph Sequences) then describes trigraph sequences in more detail. Quoting from that section:
> The character set of C source programs is contained within seven-bit ASCII, but is a superset of the ISO 646-1983 Invariant Code Set. In order to enable programs to be represented in the reduced set, all occurrences of the following trigraph sequences are replaced by the corresponding single character. This replacement occurs before any other processing.
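To make the quoted rule concrete, here is a minimal sketch of that replacement step as an ordinary C function working on a string. The names `trigraph_replacement` and `replace_trigraphs` are invented for this illustration, and a real compiler does this inside its translation phases rather than through a helper like this. The nine mappings are the standard ones: `??=` to `#`, `??(` to `[`, `??/` to `\`, `??)` to `]`, `??'` to `^`, `??<` to `{`, `??!` to `|`, `??>` to `}` and `??-` to `~`.

```c
#include <stddef.h>

/* Hypothetical helper: given the third character of a candidate trigraph,
 * return the character it stands for, or '\0' if it is not a trigraph. */
static char trigraph_replacement(char third) {
    switch (third) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return '\0';
    }
}

/* Hypothetical helper: replace every trigraph in text, in place, scanning
 * left to right. The result is never longer than the input. Returns the
 * number of trigraphs that were replaced. */
size_t replace_trigraphs(char *text) {
    size_t read = 0, write = 0, count = 0;

    while (text[read] != '\0') {
        char replacement = '\0';

        /* Only look at the third character after seeing two question marks,
         * so we never read past the terminating NUL. */
        if (text[read] == '?' && text[read + 1] == '?')
            replacement = trigraph_replacement(text[read + 2]);

        if (replacement != '\0') {
            text[write++] = replacement;
            read += 3;
            count++;
        } else {
            text[write++] = text[read++];
        }
    }
    text[write] = '\0';
    return count;
}
```

Because the scan ignores context, the replacement happens inside string literals and comments too, which is what makes trigraphs surprising in practice.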
I learnt it a few years before that, and I remember how reading K&R once I got it felt like having someone turn UP the lights, open the blinds, wash the windows and basically TURN UP THE SUN compared to things I read before. So much clarity.
Number of times I've seen trigraphs in "real code": still zero. I hope it's the same for you.
In 1990 IBM donated a 9370 computer to our university. The default code page for German EBCDIC did not support square brackets.
I don't remember whether trigraphs were not supported by the compiler at the time or whether we just wanted to avoid completely unreadable code. Not being experienced in VM/370 administration, we spent weeks modifying the system to use some international EBCDIC code page.
The system never saw much use, everybody preferred Unix workstations where programming in C was a natural thing.
I, too, learned C by reading K&R cover to cover and solving all the exercises (in front of a Sun 3/160 running SunOS 3.5-ish). Even back in those ancient days, it was obvious trigraphs were evil and should have been banished to a special place in hell.
Fantastic book. I used Learn C in 21 days and that is what started everything for me. I had a second book on Linux administration and installed Slackware from 1.44 MB floppy disks, ultimately setting up pppd and using Mosaic.
I've pasted it here for convenience (formatting fixed, thanks child comment!):
// Are you there god??/
??=define _(please, help)
??=define _____(i,m, v,e,r,y) r%:%:m
??=define ____ _____(a,f,r,a,i,d)
main(__)<%____(!_(-~-??-((-~-??-!__<<-
??-!!__)<<-??-(!!__<<!!__))+-~-~-??--~-~
-~-~-~-~-??-(-~-~-~-~-??-!!__<<-~!!__),-
??-!__))<%??>%>_(__,___)??<____
(printf("please let me die??/r%d bottle%s"
" of bee%s""""??/n",(!(___
%-~-~!!___))?--__+!___++:__+!___++,!(__-!!___)
&&___%-~-~!!___??!??!!(___%-~-~!!___??!??!__
-(-~!!___))?"":"s",___%-~-??-!!___<-??-!!___?
"r on the wall":"eeeeeeer! Take one down,pass ??/
it around")&&__&&_(__,___),"mercy I'm in pain")??<??>??>
But trigraphs have gotten old even for IOCCC. In the guidelines for recent years, they specifically mention "We tend to dislike programs that ... obfuscate by excessive use of ANSI tri-graphs":
https://www.ioccc.org/2020/guidelines.txt
Note that this uses not only trigraphs but also digraphs (here `<%`, `%>` and `%:`), which are similar to trigraphs in intended usage but behave quite differently: a digraph is a proper token, not a preprocessor substitution pattern. With trigraphs enabled, `printf("??(foo??)<:bar:>%c", "quux"<:1:>)` prints `[foo]<:bar:>u`, for example: the trigraphs inside the string literal are replaced, the digraphs inside it are not. Therefore digraphs are deemed less dangerous (however obscure) than trigraphs and do not require any compiler options.
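A small sketch of the digraph half of that claim, which should compile with any C95-or-later compiler without special flags (an assumption about your toolchain; strict C90 modes may reject it). The function names `second_letter` and `literal_length` are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Digraph spelling of: static const char word[] = "quux"; */
static const char word<::> = "quux";

/* <% and %> are alternative tokens for braces, <: and :> for brackets. */
char second_letter(void) <%
    return word<:1:>;
%>

/* Inside a string literal the same characters are ordinary text,
 * because digraphs are tokens and not a textual substitution. */
size_t literal_length(void) <%
    return strlen("<:bar:>");
%>
```

Here `second_letter()` yields `'u'`, and `literal_length()` yields 7 because the literal keeps all seven characters.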
And sort of the opposite of that, I once had someone say they wanted to contribute to the C++ portion of our codebase, but the only problem was they didn't know how to make the "->" character, and did they need to get a special keyboard?
> Almost every country needed an adapted version of ASCII, since ASCII suited the needs of only the US and a few other countries. For example, Canada had its own version that supported French characters.
> Many other countries developed variants of ASCII to include non-English letters (e.g. é, ñ, ß, Ł), currency symbols (e.g. £, ¥), etc. See also YUSCII (Yugoslavia).
> It would share most characters in common, but assign other locally useful characters to several code points reserved for "national use". […]
> Because the bracket and brace characters of ASCII were assigned to "national use" code points that were used for accented letters in other national variants of ISO/IEC 646, a German, French, or Swedish, etc. programmer using their national variant of ISO/IEC 646, rather than ASCII, had to write, and, thus, read, something such as
ä aÄiÜ = 'Ön'; ü
instead of
{ a[i] = '\n'; }
> C trigraphs were created to solve this problem for ANSI C, although their late introduction and inconsistent implementation in compilers limited their use. Many programmers kept their computers on US-ASCII, so plain-text in Swedish, German etc. (for example, in e-mail or Usenet) contained "{, }" and similar variants in the middle of words, something those programmers got used to. For example, a Swedish programmer mailing another programmer asking if they should go for lunch, could get "N{ jag har sm|rg}sar" as the answer, which should be "Nä jag har smörgåsar" meaning "No I've got sandwiches".
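A rough sketch of why that line of code looked the way it did on a German terminal: the German variant of ISO/IEC 646 (DIN 66003) shows umlauts at the code points where ASCII has brackets, braces, backslash and the vertical bar. The helper names `din66003_glyph` and `render_din66003` are invented for this illustration, the table covers only those six code points, and the output is assumed to be UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: UTF-8 text a DIN 66003 terminal would show for an
 * ASCII code point, or NULL where both character sets agree (partial table). */
static const char *din66003_glyph(char c) {
    switch (c) {
    case '[':  return "\xc3\x84";  /* A with diaeresis */
    case '\\': return "\xc3\x96";  /* O with diaeresis */
    case ']':  return "\xc3\x9c";  /* U with diaeresis */
    case '{':  return "\xc3\xa4";  /* a with diaeresis */
    case '|':  return "\xc3\xb6";  /* o with diaeresis */
    case '}':  return "\xc3\xbc";  /* u with diaeresis */
    default:   return NULL;
    }
}

/* Hypothetical helper: render ASCII source text the way such a terminal
 * would display it. Writes at most out_size - 1 bytes plus a NUL and
 * returns the number of bytes written. */
size_t render_din66003(const char *ascii, char *out, size_t out_size) {
    size_t n = 0;

    if (out_size == 0)
        return 0;
    for (; *ascii != '\0'; ascii++) {
        char single[2] = { *ascii, '\0' };
        const char *glyph = din66003_glyph(*ascii);
        const char *piece = glyph ? glyph : single;
        size_t len = strlen(piece);

        if (n + len >= out_size)
            break;  /* stop rather than overflow the buffer */
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return n;
}
```

Rendering the C line `{ a[i] = '\n'; }` this way reproduces the umlaut-laden line quoted above.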
One of the challenges of | is that it was never entirely clear whether the ASCII | should be equivalent to EBCDIC’s | or ¦. As I recall, Waterloo C wanted ¦ as its vertical bar character, although I could be wrong. On the IBM system that I used back in the 80s, we had ASCII terminals which were run through a muxer to the actual system (which was part of the magic that allowed it to have thousands of concurrent users all getting real-time access—a lot of UI was offloaded to these systems which were essentially minicomputers on their own).
Great article (that appeared on HN somewhat recently) from Ken Shirriff on the history of display terminals, and a great photo of the IBM 2848 Display Controller.
The next generation was far more common: the IBM 3270 terminal hooked to a local controller that talked to the mainframe. You could also hook a printer to the controller and print screens and simple forms independently of the mainframe.
You know all this, but I've always thought it was cool, and try to refresh my understanding of the setup. I no doubt have many details wrong.
There's also iso646.h, which allows you to write some particularly Python-looking stuff:
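#include <iso646.h>
#include <stdbool.h>
#include <stdio.h>

#define is ==

/* True for the three whitespace characters this example cares about. */
bool is_whitespace(int c) {
    if (c is ' ' or c is '\n' or c is '\t') {
        return true;
    }
    return false;
}

int main(void) {
    int current;
    int previous = ' ';  /* initialized so the first comparison is well defined */

    while ((current = getchar()) not_eq EOF) {
        /* Replace the first whitespace character after a word with a
           newline; copy everything else unchanged. */
        if (is_whitespace(current) and not is_whitespace(previous)) {
            putchar('\n');
        } else {
            putchar(current);
        }
        previous = current;
    }
    return 0;
}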
I think the only remaining purpose for trigraphs is when you are at the very end of a C interview, and your amazing candidate has answered every question perfectly, and you just have to find something they might not know about--only then do you reach for the trigraphs.
This reminds me of a comment on a Python discussion >2 years ago, of which I think often:
"Whether it's computer languages or human ones, as soon as you get into a discussion about the correct parsing of a statement, you've lost and need to rewrite in a way that's unambiguous. Too many people pride themselves on knowing more or less obscure rules and, honestly, no one else cares."
Completely agree with that. In fact, it's the first thing I thought of when I saw the code snippet in question. Even if you replace the trigraph with the regular || operator, it's still hard to read that piece of code. Syntactic sugars and short circuits are cool and all but most of the time they have no place in production code that's meant to be read by other developers.
I'd say, "Congratulations! You're one of today's lucky 10,000!", but trigraphs aren't really much fun. Just another reminder that C is old, and computing is even older.
I've used uppercase-only terminals, and I've used ancient C, but not at the same time.
Years ago I wrote a perfectly reasonable comment like /* WTF??!?!!?!???? */ and the old C compiler complained about "invalid trigraph". A syntax error in the middle of a comment!
Took me a while to figure out that "trigraph" was referring to some part of "??!?!!?!????" and not "WTF".
There are two aspects to this, the trigraph, and using the short circuiting behaviour of the binary logic operator for control flow.
The latter is a very common idiom in Julia code, which I found obscure and puerile at first (“look how smart I am”), but have come to appreciate as concise and natural by now.
For example:
function fact(n::Int)
    n >= 0 || error("n must be non-negative")
    n == 0 && return 1
    n * fact(n-1)
end
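For comparison, the same short-circuit idiom can be sketched in C, which is roughly what the snippet under discussion does once `??!??!` is read as `||`. All names here (`no_error`, `handle_error`, `check`, `check_explicit`, `handled_count`) are made up for the illustration. Unlike Julia, C does not allow `return` on the right-hand side, because `return` is a statement and not an expression.

```c
#include <stdbool.h>

/* Counts how often the handler ran, so the effect is observable. */
int handled_count = 0;

static bool no_error(int status) { return status == 0; }

static bool handle_error(void) {
    handled_count++;
    return true;
}

/* Short-circuit form: the right operand of || is evaluated only when
 * the left operand is false, so the handler runs only on failure. */
void check(int status) {
    (void)(no_error(status) || handle_error());
}

/* The same control flow spelled out with an if statement. */
void check_explicit(int status) {
    if (!no_error(status)) {
        handle_error();
    }
}
```

Both functions call the handler exactly when `status` is nonzero; the second form is the one most style guides would prefer in C.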
The instructor at the branch college where I learned C++ in the late 90's taught us that those were the preferred operators and that the old operators belonged in the wastebasket of history along with printf and str* functions.
It made for some amusing group projects when I got to university, when classmates had never seen those operators and were trying to figure out where they were coming from and why I would write such silly things. I trolled them by replacing all my brackets with `begin` and `end` in the next assignment before moving to the standard use of C operators for the rest of the class.
Anecdote: An online judge website (which is pretty well known in Korea) has an easy problem[0] asking to write a program which adds "??!" to input. A lot of beginners' C/C++ submissions got "Wrong Answer" verdict because of trigraphs.
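One way to sidestep that trap, sketched here with an invented function name `append_surprise`: write one of the question marks as the escape sequence `\?`, which denotes a plain question mark and so never forms a trigraph. Splitting the literal as `"?" "?!"` should also work, since adjacent string literals are concatenated long after trigraph replacement has happened.

```c
#include <stdio.h>

/* Hypothetical helper: writes id followed by two question marks and an
 * exclamation mark into out. Returns what snprintf returns, i.e. the
 * length the full result needs, excluding the terminating NUL. */
int append_surprise(const char *id, char *out, size_t out_size) {
    /* The second question mark is written as an escape sequence, so the
     * source never contains the three-character trigraph spelling. */
    return snprintf(out, out_size, "%s?\?!", id);
}
```

The result is the same whether or not the compiler has trigraph replacement enabled.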
This sort of practice goes back to BCPL, which Wikipedia says is the first braced programming language. Because { and } weren't universally available, compilers also supported the sequences $( and $) to represent them, which were typeable and printable on just about anything.
[+] [-] susam|3 years ago|reply
Appendix A (Reference Manual) of the book broadened my outlook on programming languages by providing me a glimpse of what goes into formally specifying a programming language. Section A.12 (Preprocessing) of this appendix specifies trigraph sequences. Quoting from the section:
> Preprocessing itself takes place in several logically successive phases that may, in a particular implementation, be condensed.
> 1. First, trigraph sequences as described in Par.A.12.1 are replaced by their equivalents. Should the operating system environment require it, newline characters are introduced between the lines of the source file.
Section A.12.1 (Trigraph Sequences) then describes trigraph sequences in more detail. Quoting from that section:
> The character set of C source programs is contained within seven-bit ASCII, but is a superset of the ISO 646-1983 Invariant Code Set. In order to enable programs to be represented in the reduced set, all occurrences of the following trigraph sequences are replaced by the corresponding single character. This replacement occurs before any other processing.
> No other such replacements occur.
> Trigraph sequences are new with the ANSI standard.
[+] [-] vonwoodson|3 years ago|reply
As opposed to say, “Learn You a Haskell for Great Good! A Beginner's Guide” which is 881 pages and doesn’t even moderately cover the prelude.
Anyway, C is an amazing language and I keep a copy of K&R on my phone as a PDF.
[+] [-] unwind|3 years ago|reply
[+] [-] usr1106|3 years ago|reply
[+] [-] kjs3|3 years ago|reply
[+] [-] Taniwha|3 years ago|reply
[+] [-] dijonman2|3 years ago|reply
Great memories!
[+] [-] bradford|3 years ago|reply
[+] [-] omoikane|3 years ago|reply
[+] [-] thamer|3 years ago|reply
[+] [-] sargstuff|3 years ago|reply
But think the cpp has to go away first, after enough sed.
https://grayson.sh/blogs/using-piphilology-to-hide-strings
https://www.gnu.org/software/gawk/manual/gawk.html#Signature...
[+] [-] lifthrasiir|3 years ago|reply
[+] [-] rdlw|3 years ago|reply
https://stackoverflow.com/q/1642028
[+] [-] falcor84|3 years ago|reply
[+] [-] furyofantares|3 years ago|reply
[+] [-] teawrecks|3 years ago|reply
[+] [-] layer8|3 years ago|reply
[+] [-] dhosek|3 years ago|reply
[+] [-] watersb|3 years ago|reply
http://www.righto.com/2019/11/ibm-sonic-delay-lines-and-hist...
[+] [-] NegativeLatency|3 years ago|reply
[+] [-] garaetjjte|3 years ago|reply
[+] [-] gpderetta|3 years ago|reply
I quite like them, but then again, I have been writing way too much python lately.
[+] [-] chromatin|3 years ago|reply
edited to add: I really like "Modern C" and just re-checked -- no mention of the preprocessor feature!
https://hal.inria.fr/hal-02383654/file/ModernC.pdf
[+] [-] ryandrake|3 years ago|reply
[+] [-] richbell|3 years ago|reply
https://stackoverflow.com/a/1642035
[+] [-] Natsu|3 years ago|reply
[+] [-] billpg|3 years ago|reply
"Are question marks fine?"
"Yes."
"I'll come up with something."
[+] [-] daptaq|3 years ago|reply
[+] [-] cl3misch|3 years ago|reply
https://news.ycombinator.com/item?id=23051202
[+] [-] halileohalilei|3 years ago|reply
[+] [-] kbob|3 years ago|reply
[+] [-] kenniskrag|3 years ago|reply
https://en.m.wikipedia.org/wiki/C%2B%2B17#Removed_features
[+] [-] amelius|3 years ago|reply
[+] [-] pjmlp|3 years ago|reply
[+] [-] piesquaredarr|3 years ago|reply
[+] [-] DonHopkins|3 years ago|reply
[+] [-] Agentlien|3 years ago|reply
http://stackoverflow.com/questions/53315710/ddg#53315821
[+] [-] FabHK|3 years ago|reply
https://docs.julialang.org/en/v1/manual/control-flow/#Short-...
[+] [-] divbzero|3 years ago|reply
[+] [-] pwdisswordfish9|3 years ago|reply
[+] [-] pavon|3 years ago|reply
[+] [-] curling_grad|3 years ago|reply
[0]: https://www.acmicpc.net/problem/10926
[+] [-] hgs3|3 years ago|reply
[1] https://stackoverflow.com/questions/1642028/what-is-the-oper...
[+] [-] cesaref|3 years ago|reply
https://en.wikipedia.org/wiki/BCPL
This is the earliest example of this sort of thing I'm aware of. Is there an earlier one?
Also, BCPL supported // for comments, again, probably the first use of this sequence.
[+] [-] virtualritz|3 years ago|reply
This comment on the SO post made my day. :D
[+] [-] anfractuosity|3 years ago|reply