What does the ??!??! operator do in C?

[+] susam|3 years ago|reply

I learnt C, more than 20 years ago, from the book The C Programming Language written by Brian W. Kernighan and Dennis M. Ritchie, also known as K&R. I read the book almost cover to cover all the way from the preface at the beginning to its three appendices at the end while solving all the exercises that each chapter presented. As someone who knew very little about programming languages back then, this book was formative in my journey of becoming a programmer.

Appendix A (Reference Manual) of the book broadened my outlook on programming languages by providing me a glimpse of what goes into formally specifying a programming language. Section A.12 (Preprocessing) of this appendix specifies trigraph sequences. Quoting from the section:

> Preprocessing itself takes place in several logically successive phases that may, in a particular implementation, be condensed.

> 1. First, trigraph sequences as described in Par.A.12.1 are replaced by their equivalents. Should the operating system environment require it, newline characters are introduced between the lines of the source file.

Then section A.12.1 (Trigraph Sequences) describes trigraph sequences in more detail. Quoting this section below:

> The character set of C source programs is contained within seven-bit ASCII, but is a superset of the ISO 646-1983 Invariant Code Set. In order to enable programs to be represented in the reduced set, all occurrences of the following trigraph sequences are replaced by the corresponding single character. This replacement occurs before any other processing.

  ??=  #
  ??/  \
  ??'  ^
  ??(  [
  ??)  ]
  ??!  |
  ??<  {
  ??>  }
  ??-  ~

> No other such replacements occur.

> Trigraph sequences are new with the ANSI standard.
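To make the replacement rule concrete, here is a minimal sketch of that first phase applied to a string. The helper name `replace_trigraphs` is made up for illustration and is not from K&R or any real compiler; it only assumes the nine-entry table quoted above and a left-to-right scan.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from K&R or any real compiler: copies src
 * to dst while replacing the nine trigraph sequences from the table above.
 * dst must have room for strlen(src) + 1 bytes, since output never grows. */
void replace_trigraphs(const char *src, char *dst)
{
    /* Third character of each trigraph and the character it stands for.
     * The two arrays are index-aligned. */
    static const char third[]  = "=/'()!<>-";
    static const char single[] = "#\\^[]|{}~";

    while (*src != '\0') {
        const char *hit = NULL;
        /* Only look up the third character if two question marks precede it. */
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0')
            hit = strchr(third, src[2]);
        if (hit != NULL) {
            *dst++ = single[hit - third];
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

If you call it from C, write the question marks in your test strings as `?\?` so that a compiler running with trigraphs enabled does not rewrite the input before the function ever sees it.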

[+] vonwoodson|3 years ago|reply

To be fair, and definitely a part of its appeal, the K&R is only 312 pages long. It covers the language and most of the standard library you’ll need.

As opposed to say, “Learn You a Haskell for Great Good! A Beginner's Guide” which is 881 pages and doesn’t even moderately cover the prelude.

Anyway, C is an amazing language and I keep a K&R on my phone as a PDF.

[+] unwind|3 years ago|reply

I learnt it a few years before that, and I remember how reading K&R once I got it felt like having someone turn UP the lights, open the blinds, wash the windows and basically TURN UP THE SUN compared to things I read before. So much clarity.

Number of times I've seen trigraphs in "real code": still zero. I hope it's the same for you.

[+] usr1106|3 years ago|reply

In 1990 IBM donated a 9370 computer to our university. The default code page for German EBCDIC did not support square brackets.

I don't remember whether trigraphs were not supported by the compiler at the time or whether we just wanted to avoid completely unreadable code. Not being experienced in VM/370 administration, we spent weeks modifying the system to use some international EBCDIC codepage.

The system never saw much use, everybody preferred Unix workstations where programming in C was a natural thing.

[+] kjs3|3 years ago|reply

I, too, learned C by reading K&R cover to cover and solving all the exercises (in front of a Sun 3/160 running SunOS 3.5-ish). Even back in those ancient days, it was obvious trigraphs were evil and should have been banished to a special place in hell.

[+] Taniwha|3 years ago|reply

This is because IBM 029 card punches don't support these characters, right?

[+] dijonman2|3 years ago|reply

Fantastic book. I used Learn C in 21 days and that is what started everything for me. I had a second book on Linux administration and installed Slackware from 1.44 MB disks, ultimately setting up pppd and using Mosaic.

Great memories!

[+] bradford|3 years ago|reply

Trigraphs make this obfuscated C submission possible: (https://gist.github.com/Property404/e31b99deb3527159e183)

I've pasted it here for convenience (formatting fixed, thanks child comment!):

   //  Are you there god??/
   ??=define _(please, help)
   ??=define _____(i,m, v,e,r,y) r%:%:m
   ??=define ____ _____(a,f,r,a,i,d)
   main(__)<%____(!_(-~-??-((-~-??-!__<<-
   ??-!!__)<<-??-(!!__<<!!__))+-~-~-??--~-~
   -~-~-~-~-??-(-~-~-~-~-??-!!__<<-~!!__),-
   ??-!__))<%??>%>_(__,___)??<____
   (printf("please let me die??/r%d bottle%s"
   " of bee%s""""??/n",(!(___
   %-~-~!!___))?--__+!___++:__+!___++,!(__-!!___)
   &&___%-~-~!!___??!??!!(___%-~-~!!___??!??!__
   -(-~!!___))?"":"s",___%-~-??-!!___<-??-!!___?
   "r on the wall":"eeeeeeer! Take one down,pass ??/
   it around")&&__&&_(__,___),"mercy I'm in pain")??<??>??>

[+] omoikane|3 years ago|reply

Roughly the only good use of trigraphs these days is for obfuscated code, for example here: https://www.ioccc.org/years.html#1990_scjones

But trigraphs have gotten old even for IOCCC. In the guidelines for recent years, they specifically mention "We tend to dislike programs that ... obfuscate by excessive use of ANSI tri-graphs": https://www.ioccc.org/2020/guidelines.txt

[+] thamer|3 years ago|reply

How to format text on HN: https://news.ycombinator.com/formatdoc

  For code blocks, prefix each line with two or more spaces.

[+] sargstuff|3 years ago|reply

Guess with tri-graph elimination & awk getting unicode support will have to gawk C with cpp using pipology theory.

But think the cpp has to go away first, after enough sed.

https://grayson.sh/blogs/using-piphilology-to-hide-strings

https://www.gnu.org/software/gawk/manual/gawk.html#Signature...

[+] lifthrasiir|3 years ago|reply

Note that this uses not only trigraphs but also digraphs (here `<%`, `%>` and `%:`), which are similar to trigraphs in intended usage but behave very differently: a digraph is a proper token, not a preprocessor substitution pattern. With trigraphs enabled, `printf("??(foo??)<:bar:>%c", "quux"<:1:>)` prints `[foo]<:bar:>u`, for example. Therefore digraphs are deemed less dangerous (however obscure) than trigraphs and do not require any compiler options.
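A small sketch of the token-versus-substitution difference, using only digraphs so that it should compile without extra options on a C99 or later compiler. The function names are hypothetical, chosen just for this example.

```c
#include <string.h>

/* Hypothetical function names for illustration. The digraphs <% and %>
 * stand for braces, and <: and :> stand for square brackets. */
char second_char_of_quux(void)
<%
    return "quux"<:1:>;   /* same as "quux"[1] */
%>

/* Inside a string literal a digraph is not a token at all, just two
 * ordinary characters, so the literal keeps its two-byte length. */
size_t bracket_digraph_length(void)
{
    return strlen("<:");
}
```

The first function returns the character at index 1 of the literal, and the second shows that the same two characters inside quotes are left alone, which is exactly what a trigraph would not do.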

[+] rdlw|3 years ago|reply

See also: "What is the "-->" operator in C++?"

https://stackoverflow.com/q/1642028

[+] falcor84|3 years ago|reply

And of course, its cousin the slides-to operator, described in the answer https://stackoverflow.com/a/8909176/493553 with the following example:

    while (x --\
                \
                 \
                  \
                   > 0)
         printf("%d ", x);

[+] furyofantares|3 years ago|reply

And sort of the opposite of that, I once had someone say they wanted to contribute to the C++ portion of our codebase, but the only problem was they didn't know how to make the "->" character, and did they need to get a special keyboard?

[+] teawrecks|3 years ago|reply

Yeah, I thought this was going to involve the ternary operator. TIL about trigraphs.

[+] layer8|3 years ago|reply

From the ASCII Wikipedia page (https://en.wikipedia.org/wiki/ASCII#7-bit_codes):

> Almost every country needed an adapted version of ASCII, since ASCII suited the needs of only the US and a few other countries. For example, Canada had its own version that supported French characters.

> Many other countries developed variants of ASCII to include non-English letters (e.g. é, ñ, ß, Ł), currency symbols (e.g. £, ¥), etc. See also YUSCII (Yugoslavia).

> It would share most characters in common, but assign other locally useful characters to several code points reserved for "national use". […]

> Because the bracket and brace characters of ASCII were assigned to "national use" code points that were used for accented letters in other national variants of ISO/IEC 646, a German, French, or Swedish, etc. programmer using their national variant of ISO/IEC 646, rather than ASCII, had to write, and, thus, read, something such as

  ä aÄiÜ = 'Ön'; ü

instead of

  { a[i] = '\n'; }

> C trigraphs were created to solve this problem for ANSI C, although their late introduction and inconsistent implementation in compilers limited their use. Many programmers kept their computers on US-ASCII, so plain-text in Swedish, German etc. (for example, in e-mail or Usenet) contained "{, }" and similar variants in the middle of words, something those programmers got used to. For example, a Swedish programmer mailing another programmer asking if they should go for lunch, could get "N{ jag har sm|rg}sar" as the answer, which should be "Nä jag har smörgåsar" meaning "No I've got sandwiches".

[+] dhosek|3 years ago|reply

One of the challenges of | is that it was never entirely clear whether the ASCII | should be equivalent to EBCDIC’s | or ¦. As I recall, Waterloo C wanted ¦ as its vertical bar character, although I could be wrong. On the IBM system that I used back in the 80s, we had ASCII terminals which were run through a muxer to the actual system (which was part of the magic that allowed it to have thousands of concurrent users all getting real-time access—a lot of UI was offloaded to these systems which were essentially minicomputers on their own).

[+] watersb|3 years ago|reply

Great article (that appeared on HN somewhat recently) from Ken Shirriff on the history of display terminals, and a great photo of the IBM 2848 Display Controller.

http://www.righto.com/2019/11/ibm-sonic-delay-lines-and-hist...

The next generation was far more common: the IBM 3270 terminal hooked to a local controller that talked to the mainframe. You could also hook a printer to the controller, so you could print screens and simple forms independently of the mainframe.

You know all this, but I've always thought it was cool, and try to refresh my understanding of the setup. I no doubt have many details wrong.

[+] NegativeLatency|3 years ago|reply

There's also iso646.h, which allows you to do some particularly Python-looking stuff:

  #include <iso646.h>
  #include <stdbool.h>
  #include <stdio.h>
  #define is ==
  
  bool is_whitespace(int c) {
    if (c is ' ' or c is '\n' or c is '\t') {
      return true;
    }
    return false;
  }
  
  int main(void) {
    /* Treat the start of input as whitespace so previous is never read uninitialized. */
    int current, previous = ' ';
  
    while ((current = getchar()) not_eq EOF) {
      if (is_whitespace(current) and not is_whitespace(previous)) {
        putchar('\n');
      } else {
        putchar(current);
      }
      previous = current;
    }
  
    return 0;
  }

[+] garaetjjte|3 years ago|reply

Of course when you are willing to use preprocessor, you can do things like Bournegol: http://oldhome.schmorp.de/marc/bournegol.html

[+] gpderetta|3 years ago|reply

In C++ these are genuine operators and do not require the macros from iso646.

I quite like them, but then again, I have been writing way too much Python lately.

[+] chromatin|3 years ago|reply

Wow, and I thought I knew C pretty well. Great post.

edited to add: I really like "Modern C" and just re-checked -- no mention of the preprocessor feature!

https://hal.inria.fr/hal-02383654/file/ModernC.pdf

[+] ryandrake|3 years ago|reply

I think the only remaining purpose for trigraphs is when you are at the very end of a C interview, and your amazing candidate has answered every question perfectly, and you just have to find something they might not know about--only then do you reach for the trigraphs.

[+] richbell|3 years ago|reply

I think C also has the elusive "down to" operator.

https://stackoverflow.com/a/1642035

[+] Natsu|3 years ago|reply

Honestly, I thought this was about a programming language called C? rather than C.

[+] billpg|3 years ago|reply

"There's a problem. Some machines don't have some braces and vertical bars and such. We'll have to add keywords like OR and BEGIN and END."

"Are question marks fine?"

"Yes."

"I'll come up with something."

[+] daptaq|3 years ago|reply

See iso646.h, and https://en.cppreference.com/w/c/language/operator_alternativ....

[+] cl3misch|3 years ago|reply

This reminds me of a comment on a Python discussion >2 years ago, of which I think often:

"Whether it's computer languages or human ones, as soon as you get into a discussion about the correct parsing of a statement, you've lost and need to rewrite in a way that's unambiguous. Too many people pride themselves on knowing more or less obscure rules and, honestly, no one else cares."

https://news.ycombinator.com/item?id=23051202

[+] halileohalilei|3 years ago|reply

Completely agree with that. In fact, it's the first thing I thought of when I saw the code snippet in question. Even if you replace the trigraph with the regular || operator, it's still hard to read that piece of code. Syntactic sugars and short circuits are cool and all but most of the time they have no place in production code that's meant to be read by other developers.

[+] kbob|3 years ago|reply

I'd say, "Congratulations! You're one of today's lucky 10,000!", but trigraphs aren't really much fun. Just another reminder that C is old, and computing is even older.

I've used uppercase-only terminals, and I've used ancient C, but not at the same time.

[+] kenniskrag|3 years ago|reply

Trigraphs were removed in C++17.

https://en.m.wikipedia.org/wiki/C%2B%2B17#Removed_features

[+] amelius|3 years ago|reply

I've never seen them used anywhere.

[+] pjmlp|3 years ago|reply

They are still around in C though.

[+] piesquaredarr|3 years ago|reply

Huh, I never realized that C++ standards were removing C features. Time to be more careful about using g++ for everything.

[+] DonHopkins|3 years ago|reply

Years ago I wrote a perfectly reasonable comment like /* WTF??!?!!?!???? */ and the old C compiler complained about "invalid trigraph". A syntax error in the middle of a comment!

Took me a while to figure out that "trigraph" was referring to some part of "??!?!!?!????" and not "WTF".

[+] Agentlien|3 years ago|reply

Every time I hear about trigraphs I think of this horror:

http://stackoverflow.com/questions/53315710/ddg#53315821

[+] FabHK|3 years ago|reply

There are two aspects to this, the trigraph, and using the short circuiting behaviour of the binary logic operator for control flow.

The latter is a very common idiom in Julia code, which I found obscure and puerile at first (“look how smart I am”), but have come to appreciate as concise and natural by now.

For example:

  function fact(n::Int)
     n >= 0 || error("n must be non-negative")
     n == 0 && return 1
     n * fact(n-1)
  end

https://docs.julialang.org/en/v1/manual/control-flow/#Short-...
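For the C side of this, here is a rough sketch of the same idiom. I'm assuming the statement under discussion has the shape "not error, OR handle it", and every name below is a hypothetical stand-in rather than the code from the original question.

```c
/* Hypothetical stand-ins for the functions in the snippet under discussion. */
static int handled = 0;

static int handle_error(void)
{
    handled++;
    return 1;
}

/* With trigraph replacement, the six characters in the title become the
 * two-character logical OR operator, so the statement is equivalent to this. */
void check(int error_has_occurred)
{
    /* handle_error() runs only when the left operand is false (0),
     * that is, only when an error has occurred. */
    (void)(!error_has_occurred || handle_error());
}

/* Reports how many times the handler has run so far. */
int times_handled(void)
{
    return handled;
}
```

An ordinary `if (error_has_occurred) handle_error();` does the same thing and is what most C code bases would expect to see; the cast to `void` is only there to silence unused-value warnings.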

[+] divbzero|3 years ago|reply

In addition to trigraphs, there is apparently a set of C alternative tokens defined as follows:

  #define and &&
  #define and_eq &=
  #define bitand &
  #define bitor |
  #define compl ~
  #define not !
  #define not_eq !=
  #define or ||
  #define or_eq |=
  #define xor ^
  #define xor_eq ^=

I suppose that allows for code like this:

  if (x or not y or not z) {
      return 1;
  }

https://en.wikipedia.org/wiki/C_alternative_tokens

[+] pwdisswordfish9|3 years ago|reply

Makes for great obfuscated C++.

    #include <iostream>

    template <typename T>
    void print(T const bitand foo) {
        std::cout << foo << std::endl;
    }

[+] pavon|3 years ago|reply

The instructor at the branch college where I learned C++ in the late 90's taught us that those were the preferred operators and that the old operators belonged in the wastebasket of history along with printf and str* functions.

It made for some amusing group projects when I got to university, when classmates had never seen those operators and were trying to figure out where they were coming from and why I would write such silly things. I trolled them by replacing all my brackets with `begin` and `end` in the next assignment before moving to the standard use of C operators for the rest of the class.

[+] curling_grad|3 years ago|reply

Anecdote: An online judge website (which is pretty well known in Korea) has an easy problem[0] asking to write a program which adds "??!" to input. A lot of beginners' C/C++ submissions got a "Wrong Answer" verdict because of trigraphs.

[0]: https://www.acmicpc.net/problem/10926
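For anyone hitting this, a sketch of two ways to spell that suffix so it survives a compiler that has trigraphs enabled. The function names and the buffer-based interface are my own assumptions for the example, not anything the judge requires.

```c
#include <stdio.h>

/* Hypothetical helper: writes name followed by two question marks and an
 * exclamation mark into out (at most n bytes, including the terminator).
 * The \? escape keeps the preprocessor from seeing a trigraph. */
int add_surprise(const char *name, char *out, size_t n)
{
    return snprintf(out, n, "%s?\?!", name);
}

/* Alternative spelling: adjacent literals are joined only after trigraph
 * replacement has already happened, so no trigraph is ever formed. */
int add_surprise_split(const char *name, char *out, size_t n)
{
    return snprintf(out, n, "%s?" "?!", name);
}
```

Both versions produce the same bytes whether or not the compiler performs trigraph replacement, which is the property the failing submissions were missing.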

[+] hgs3|3 years ago|reply

Reminds me of the "goes to" operator [1]

[1] https://stackoverflow.com/questions/1642028/what-is-the-oper...

[+] cesaref|3 years ago|reply

This sort of practice goes back to BCPL, which Wikipedia says is the first braced programming language. Because { and } weren't universally available, compilers also supported the sequences $( and $) to represent these, which were typeable and printable on just about anything.

https://en.wikipedia.org/wiki/BCPL

This is the earliest example of this sort of thing I'm aware of - is there an earlier example?

Also, BCPL supported // for comments, again, probably the first use of this sequence.

[+] virtualritz|3 years ago|reply

> Has Microsoft Windows finally been open-sourced or where did this come from?

This comment on the SO post made my day. :D

[+] anfractuosity|3 years ago|reply

In gcc I got:

    1.c:1:11: warning: trigraph ??< ignored, use -trigraphs to enable [-Wtrigraphs]

Out of curiosity, is there a preprocessor directive to enable support?

166 comments