
What does the ??!??! operator do in C?

636 points | isomorph | 3 years ago | stackoverflow.com

166 comments

susam | 3 years ago
I learnt C, more than 20 years ago, from the book The C Programming Language written by Brian W. Kernighan and Dennis M. Ritchie, also known as K&R. I read the book almost cover to cover all the way from the preface at the beginning to its three appendices at the end while solving all the exercises that each chapter presented. As someone who knew very little about programming languages back then, this book was formative in my journey of becoming a programmer.

Appendix A (Reference Manual) of the book broadened my outlook on programming languages by providing me a glimpse of what goes into formally specifying a programming language. Section A.12 (Preprocessing) of this appendix specifies trigraph sequences. Quoting from the section:

> Preprocessing itself takes place in several logically successive phases that may, in a particular implementation, be condensed.

> 1. First, trigraph sequences as described in Par.A.12.1 are replaced by their equivalents. Should the operating system environment require it, newline characters are introduced between the lines of the source file.

Section A.12.1 (Trigraph Sequences) then describes trigraph sequences in more detail. Quoting this section below:

> The character set of C source programs is contained within seven-bit ASCII, but is a superset of the ISO 646-1983 Invariant Code Set. In order to enable programs to be represented in the reduced set, all occurrences of the following trigraph sequences are replaced by the corresponding single character. This replacement occurs before any other processing.

  ??=  #
  ??/  \
  ??'  ^
  ??(  [
  ??)  ]
  ??!  |
  ??<  {
  ??>  }
  ??-  ~
> No other such replacements occur.

> Trigraph sequences are new with the ANSI standard.

vonwoodson | 3 years ago
To be fair, and definitely a part of its appeal, the K&R is only 312 pages long. It covers the language and most of the standard library you’ll need.

As opposed to, say, “Learn You a Haskell for Great Good! A Beginner's Guide”, which is 881 pages and doesn’t even moderately cover the Prelude.

Anyway, C is an amazing language, and I keep a copy of K&R on my phone as a PDF.

unwind | 3 years ago
I learnt it a few years before that, and I remember how reading K&R once I got it felt like having someone turn UP the lights, open the blinds, wash the windows and basically TURN UP THE SUN compared to things I read before. So much clarity.

Number of times I've seen trigraphs in "real code": still zero. I hope it's the same for you.

usr1106 | 3 years ago
In 1990 IBM donated a 9370 computer to our university. The default code page for German EBCDIC did not support square brackets.

I don't remember whether the compiler didn't support trigraphs at the time or whether we just wanted to avoid completely unreadable code. Being inexperienced in VM/370 administration, we spent weeks modifying the system to use an international EBCDIC code page.

The system never saw much use, everybody preferred Unix workstations where programming in C was a natural thing.

kjs3 | 3 years ago
I, too, learned C by reading K&R cover to cover and solving all the exercises (in front of a Sun 3/160 running SunOS 3.5-ish). Even back in those ancient days, it was obvious that trigraphs were evil and should have been banished to a special place in hell.
Taniwha | 3 years ago
This is because IBM 029 card punches don't support these characters, right?
dijonman2 | 3 years ago
Fantastic book. I used Learn C in 21 days and that is what started everything for me. I had a second book on Linux administration and installed Slackware from 1.44 MB floppy disks, ultimately setting up pppd and using Mosaic.

Great memories!

bradford | 3 years ago
Trigraphs make this obfuscated C submission possible: (https://gist.github.com/Property404/e31b99deb3527159e183)

I've pasted it here for convenience (formatting fixed, thanks child comment!):

   //  Are you there god??/
   ??=define _(please, help)
   ??=define _____(i,m, v,e,r,y) r%:%:m
   ??=define ____ _____(a,f,r,a,i,d)
   main(__)<%____(!_(-~-??-((-~-??-!__<<-
   ??-!!__)<<-??-(!!__<<!!__))+-~-~-??--~-~
   -~-~-~-~-??-(-~-~-~-~-??-!!__<<-~!!__),-
   ??-!__))<%??>%>_(__,___)??<____
   (printf("please let me die??/r%d bottle%s"
   " of bee%s""""??/n",(!(___
   %-~-~!!___))?--__+!___++:__+!___++,!(__-!!___)
   &&___%-~-~!!___??!??!!(___%-~-~!!___??!??!__
   -(-~!!___))?"":"s",___%-~-??-!!___<-??-!!___?
   "r on the wall":"eeeeeeer! Take one down,pass ??/
   it around")&&__&&_(__,___),"mercy I'm in pain")??<??>??>
lifthrasiir | 3 years ago
Note that this uses not only trigraphs but also digraphs (here `<%`, `%>` and `%:`). Digraphs have a similar intended use but behave very differently from trigraphs: a digraph is a proper token, not a preprocessor substitution pattern. For example, with trigraphs enabled, `printf("??(foo??)<:bar:>%c", "quux"<:1:>)` prints `[foo]<:bar:>u`. Therefore digraphs are deemed less dangerous (however obscure) than trigraphs and do not require any compiler options.
rdlw | 3 years ago
See also: "What is the "-->" operator in C++?"

https://stackoverflow.com/q/1642028

furyofantares | 3 years ago
And sort of the opposite of that, I once had someone say they wanted to contribute to the C++ portion of our codebase, but the only problem was they didn't know how to make the "->" character, and did they need to get a special keyboard?
teawrecks | 3 years ago
Yeah, I thought this was going to involve the ternary operator. TIL about trigraphs.
layer8 | 3 years ago
From the ASCII Wikipedia page (https://en.wikipedia.org/wiki/ASCII#7-bit_codes):

> Almost every country needed an adapted version of ASCII, since ASCII suited the needs of only the US and a few other countries. For example, Canada had its own version that supported French characters.

> Many other countries developed variants of ASCII to include non-English letters (e.g. é, ñ, ß, Ł), currency symbols (e.g. £, ¥), etc. See also YUSCII (Yugoslavia).

> It would share most characters in common, but assign other locally useful characters to several code points reserved for "national use". […]

> Because the bracket and brace characters of ASCII were assigned to "national use" code points that were used for accented letters in other national variants of ISO/IEC 646, a German, French, or Swedish, etc. programmer using their national variant of ISO/IEC 646, rather than ASCII, had to write, and, thus, read, something such as

  ä aÄiÜ = 'Ön'; ü
instead of

  { a[i] = '\n'; }
> C trigraphs were created to solve this problem for ANSI C, although their late introduction and inconsistent implementation in compilers limited their use. Many programmers kept their computers on US-ASCII, so plain-text in Swedish, German etc. (for example, in e-mail or Usenet) contained "{, }" and similar variants in the middle of words, something those programmers got used to. For example, a Swedish programmer mailing another programmer asking if they should go for lunch, could get "N{ jag har sm|rg}sar" as the answer, which should be "Nä jag har smörgåsar" meaning "No I've got sandwiches".
dhosek | 3 years ago
One of the challenges of | is that it was never entirely clear whether the ASCII | should be equivalent to EBCDIC’s | or ¦. As I recall, Waterloo C wanted ¦ as its vertical bar character, although I could be wrong. On the IBM system that I used back in the 80s, we had ASCII terminals which were run through a muxer to the actual system (which was part of the magic that allowed it to have thousands of concurrent users all getting real-time access—a lot of UI was offloaded to these systems which were essentially minicomputers on their own).
watersb | 3 years ago
Great article (that appeared on HN somewhat recently) from Ken Shirriff on the history of display terminals, with a great photo of the IBM 2848 Display Controller.

http://www.righto.com/2019/11/ibm-sonic-delay-lines-and-hist...

The next generation was far more common: the IBM 3270 terminal connected to a local controller that talked to the mainframe. You could also attach a printer to the controller and print screens and simple forms independently of the mainframe.

You know all this, but I've always thought it was cool, and try to refresh my understanding of the setup. I no doubt have many details wrong.

NegativeLatency | 3 years ago
There's also iso646.h, which lets you write some particularly Python-looking code:

  #include <iso646.h>
  #include <stdbool.h>
  #include <stdio.h>
  #define is ==
  
  bool is_whitespace(int c) {
    if (c is ' ' or c is '\n' or c is '\t') {
      return true;
    }
    return false;
  }
  
  int main(void) {
    // previous starts as a space so it is never read uninitialized
    int current, previous = ' ';
  
    while ((current = getchar()) not_eq EOF) {
      if (is_whitespace(current) and not is_whitespace(previous)) {
        putchar('\n');
      } else {
        putchar(current);
      }
      previous = current;
    }
  
    return 0;
  }
gpderetta | 3 years ago
In C++ these are genuine operators and do not require the macros from iso646.

I quite like them, but then again, I have been writing way too much Python lately.

chromatin | 3 years ago
Wow, and I thought I knew C pretty well. Great post.

edited to add: I really like "Modern C" and just re-checked -- no mention of the preprocessor feature!

https://hal.inria.fr/hal-02383654/file/ModernC.pdf

ryandrake | 3 years ago
I think the only remaining purpose for trigraphs is when you are at the very end of a C interview, and your amazing candidate has answered every question perfectly, and you just have to find something they might not know about--only then do you reach for the trigraphs.
Natsu | 3 years ago
Honestly, I thought this was about a programming language called C? rather than C.
cl3misch | 3 years ago
This reminds me of a comment from a Python discussion more than 2 years ago that I think of often:

"Whether it's computer languages or human ones, as soon as you get into a discussion about the correct parsing of a statement, you've lost and need to rewrite in a way that's unambiguous. Too many people pride themselves on knowing more or less obscure rules and, honestly, no one else cares."

https://news.ycombinator.com/item?id=23051202

halileohalilei | 3 years ago
Completely agree with that. In fact, it's the first thing I thought of when I saw the code snippet in question. Even if you replace the trigraphs with the regular || operator, that piece of code is still hard to read. Syntactic sugar and short-circuit tricks are cool, but most of the time they have no place in production code that's meant to be read by other developers.
kbob | 3 years ago
I'd say, "Congratulations! You're one of today's lucky 10,000!", but trigraphs aren't really much fun. Just another reminder that C is old, and computing is even older.

I've used uppercase-only terminals, and I've used ancient C, but not at the same time.

DonHopkins | 3 years ago
Years ago I wrote a perfectly reasonable comment like /* WTF??!?!!?!???? */ and the old C compiler complained about "invalid trigraph". A syntax error in the middle of a comment!

Took me a while to figure out that "trigraph" was referring to some part of "??!?!!?!????" and not "WTF".

FabHK | 3 years ago
There are two aspects to this, the trigraph, and using the short circuiting behaviour of the binary logic operator for control flow.

The latter is a very common idiom in Julia code, which I found obscure and puerile at first (“look how smart I am”), but have come to appreciate as concise and natural by now.

For example:

  function fact(n::Int)
     n >= 0 || error("n must be non-negative")
     n == 0 && return 1
     n * fact(n-1)
  end
https://docs.julialang.org/en/v1/manual/control-flow/#Short-...
divbzero | 3 years ago
In addition to trigraphs, there is apparently a set of C alternative tokens, defined as macros in <iso646.h>:

  #define and &&
  #define and_eq &=
  #define bitand &
  #define bitor |
  #define compl ~
  #define not !
  #define not_eq !=
  #define or ||
  #define or_eq |=
  #define xor ^
  #define xor_eq ^=
I suppose that allows for code like this:

  if (x or not y or not z) {
      return 1;
  }
https://en.wikipedia.org/wiki/C_alternative_tokens
pwdisswordfish9 | 3 years ago
Makes for great obfuscated C++.

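    #include <iostream>

    template <typename T>
    void print(T const bitand foo) {  // bitand is another spelling of &, so foo is a const reference
        std::cout << foo << std::endl;
    }
    int main() { print("hello"); }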
pavon | 3 years ago
The instructor at the branch college where I learned C++ in the late 90's taught us that those were the preferred operators and that the old operators belonged in the wastebasket of history along with printf and str* functions.

It made for some amusing group projects when I got to university, where classmates had never seen those operators and were trying to figure out where they came from and why I would write such silly things. I trolled them by replacing all my curly brackets with `begin` and `end` in the next assignment, before switching to the standard C operators for the rest of the class.

curling_grad | 3 years ago
Anecdote: an online judge website (which is pretty well known in Korea) has an easy problem[0] that asks you to write a program that appends "??!" to the input. A lot of beginners' C/C++ submissions got a "Wrong Answer" verdict because of trigraphs.

[0]: https://www.acmicpc.net/problem/10926

cesaref | 3 years ago
This sort of practice goes back to BCPL, which Wikipedia says is the first braced programming language. Because { and } weren't universally available, compilers also supported the sequences $( and $) to represent them, which were typeable and printable on just about anything.

https://en.wikipedia.org/wiki/BCPL

This is the earliest example of this sort of thing I'm aware of. Is there an earlier example?

Also, BCPL supported // for comments, again, probably the first use of this sequence.

virtualritz | 3 years ago
> Has Microsoft Windows finally been open-sourced or where did this come from?

This comment on the SO post made my day. :D

anfractuosity | 3 years ago
In gcc I got:

    1.c:1:11: warning: trigraph ??< ignored, use -trigraphs to enable [-Wtrigraphs]
Out of curiosity, is there a preprocessor directive to enable support?