No implicit casts except for literals and void* (explicit compile-time/runtime casts), one loop statement (loop{}), no switch/enum/generic/_thread/typeof/etc., no integer promotion, only sized primitive types (u64, s32, f32, etc.), no anonymous code blocks, real, compiler-enforced compile-time constant declarations, and many operators have to go (--, ++, a?b:c, etc.)... and everything I am forgetting right now (the dangerous struct pack attribute...). But we do need inline keywords for memory barriers and atomics, for programming modern hardware architectures.
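To make the "no integer promotion" item concrete: in C, operands narrower than int (such as uint8_t) are converted to int before arithmetic. A minimal sketch, assuming a C99 compiler; the function names are hypothetical:

```c
#include <stdint.h>

/* Both uint8_t operands are promoted to int before '+', so the sum does
   not wrap at 8 bits (int is always at least 16 bits wide). */
int promoted_sum(uint8_t a, uint8_t b) {
    return a + b;
}

/* Only an explicit conversion back to uint8_t gives 8-bit wraparound. */
uint8_t wrapped_sum(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* ~x is applied to the promoted int, so ~ of a zero uint8_t is -1, not 255. */
int complement_of(uint8_t x) {
    return ~x;
}
```

So promoted_sum(200, 100) yields 300 rather than 44, and complement_of(0) yields -1 rather than 255; a language without promotion would have to make both cases explicit.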
There is C0, a stripped-down version of C popular in academia [1]. It's great for teaching because it's conceptually simple and easy to write a compiler for. With a couple of additions (like sized primitive types), it might match what you are imagining.
The first two are obvious, but the third is also legal. It works because array indexing is just sugar for pointer arithmetic, so array[2]=4 is identical in meaning to *(array+2)=4. Therefore 3[array]=27 is identical to *(3+array)=27 and so is legal. But just because you can doesn't mean you should.
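A small sketch of the equivalence, assuming nothing beyond standard C (index_demo is a hypothetical name). The standard defines E1[E2] as (*((E1)+(E2))), and because addition is commutative the operands can be swapped:

```c
/* Hypothetical demo: all three stores use the same rule,
   E1[E2] == *((E1) + (E2)), and '+' is commutative. */
int index_demo(void) {
    int array[4] = {0};
    array[1] = 1;       /* the usual spelling */
    *(array + 2) = 4;   /* what array[2] = 4 means */
    3[array] = 27;      /* *(3 + array) = 27: legal, but please don't */
    return array[1] + 2[array] + array[3];   /* 1 + 4 + 27 */
}
```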
Note that this is GNU C, not standard C. GNU has extended the normal C language with features such as forward parameter declarations and numeric ranges in switch cases. Lots of people don't know about these things.
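Numeric ranges in switch cases can be sketched like this. byte_class is a hypothetical function, and the letter ranges assume an ASCII-compatible character set. This is GNU C, not ISO C; GCC and Clang accept it, and -pedantic warns:

```c
/* GNU C case ranges: "case low ... high:" matches every value from low
   to high inclusive. Keep spaces around "..." (with integer literals,
   "1...5" would be mis-tokenized). */
int byte_class(int c) {
    switch (c) {
    case '0' ... '9':
        return 1;            /* digit */
    case 'a' ... 'z':
    case 'A' ... 'Z':
        return 2;            /* ASCII letter */
    default:
        return 0;            /* anything else */
    }
}
```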
I'd have to argue that function typedefs are not useless; I've come across two uses.
The obvious one is using them instead of a function pointer typedef, so that the subsequent use in a struct is obviously a pointer. This helps when others are first reading unfamiliar structures.
The other case is somewhat related: using the typedef as an assertion/check when writing such handler functions and, more importantly, when updating them.
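typedef int handler_ty(int a);   /* a function type, not a pointer type */
handler_ty some_handler;          /* declaration checked against the typedef */
int some_handler(int a) { /* ... */ }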
When updating code, this made compiler errors easier to decode if the expected type of handler_ty was changed and some specific handler was updated incorrectly, or not at all.
The error would generally point directly at the inconsistency with the prior line, rather than at the distant use in the initialisation of 'table'.
As I recall, this mechanism has been around since at least C89; I don't recall using it in K&R C.
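A self-contained sketch of both uses. handler_ty and some_handler come from the comment above; other_handler, struct entry, table and dispatch are hypothetical names filled in for illustration:

```c
#include <stddef.h>

/* A function type (not a pointer type). */
typedef int handler_ty(int a);

/* These declarations are checked against handler_ty. If the typedef changes
   and a definition below is not updated, the compiler reports the conflict
   at that definition, against the declaration just before it. */
handler_ty some_handler;
handler_ty other_handler;

int some_handler(int a) { return a + 1; }
int other_handler(int a) { return a * 2; }

/* In the struct, the explicit '*' shows that the member is a pointer. */
struct entry {
    const char *name;
    handler_ty *fn;
};

static const struct entry table[] = {
    { "some",  some_handler },
    { "other", other_handler },
};

/* Calls table entry i with arg, or returns -1 if i is out of range. */
int dispatch(size_t i, int arg) {
    if (i >= sizeof table / sizeof table[0])
        return -1;
    return table[i].fn(arg);
}
```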
I'm going to speculate a bit on why these silly things are in C.
C was developed on a PDP-11 that had 64KB of memory. That's hardly any at all. Therefore, the compiler had to be extremely tightly coded.
The fundamental rules of the C language are pretty simple. But articles like these expose consequences of such simple rules. Fixing them requires adding more code. Adding more code means less room for the code being compiled.
Therefore, if the intended use of the language works, the pragmatic approach would be to simply not worry about the quirky consequences.
A more interesting question would be "why do these characteristics persist in modern C compilers?"
The stock answer is "backwards compatibility", "Obfuscated C Code contests" and "gotcha job interview questions". My argument would be that there is no reason for the persistence of such "junk DNA" and it should be deprecated and removed.
I've done my part. D doesn't support that stuff, even though the basic use of the language is easily confused with C.
For example:
#include <stdio.h>
void main()
{
    int i;
    for (i = 0; i < 10; ++i);
        printf("%d\n", i);
}
I've died on that hill. I know others who lost an entire day staring at it wondering what's wrong with it. I saw it on X recently as "99% of C programmers will not be able to find the bug."
The equivalent D code:
import core.stdc.stdio;
void main()
{
    int i;
    for (i = 0; i < 10; ++i);
        printf("%d\n", i);
}
gets you:
test.d(5): Error: use `{ }` for an empty statement, not `;`
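For comparison, a C sketch of the convention the D error message enforces: put the intended statement inside braces, and when an empty body really is wanted, write it as { } with a comment. print_each and value_after_loop are hypothetical names, and this is a style convention that standard C does not require:

```c
#include <stdio.h>

/* If the intent is to print each value, the statement goes inside braces. */
int print_each(int n) {
    int count = 0;
    for (int i = 0; i < n; ++i) {
        printf("%d\n", i);
        count++;
    }
    return count;   /* number of lines printed */
}

/* If the loop body really is meant to be empty, spell it { }. */
int value_after_loop(int n) {
    int i;
    for (i = 0; i < n; ++i) {
        /* intentionally empty */
    }
    return i;
}
```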
> I know others who lost an entire day staring at it wondering what's wrong with it. I saw it on X recently as "99% of C programmers will not be able to find the bug."
Both gcc and clang give a warning[1] for that code with just "-Wall", so it's hard to imagine it being a real problem these days.
$ gcc-12 -g -O2 -std=c11 -Wall -Wextra -Wpedantic -Werror c-error.c
c-error.c:2:10: error: return type of ‘main’ is not ‘int’ [-Werror=main]
2 | void main()
| ^~~~
c-error.c: In function ‘main’:
c-error.c:5:9: error: this ‘for’ clause does not guard... [-Werror=misleading-indentation]
5 | for (i = 0; i < 10; ++i);
| ^~~
c-error.c:6:13: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘for’
6 | printf("%d\n", i);
| ^~~~~~
cc1: all warnings being treated as errors
and:
$ clang-14 -g -O2 -std=c11 -Wall -Wextra -Wpedantic -Werror c-error.c
c-error.c:2:5: error: 'main' must return 'int'
void main()
^~~~
int
c-error.c:5:33: error: for loop has empty body [-Werror,-Wempty-body]
for (i = 0; i < 10; ++i);
^
c-error.c:5:33: note: put the semicolon on a separate line to silence this warning
2 errors generated.
Now granted, those are specific implementations, not things mandated by language changes.
Without information about how identifiers are declared, you do not know how to parse this:
(A)(B);
It could be a cast of B to type A, or function A being called with argument B.
Or this (like the puts(puts) in the article):
A(B);
Could be a declaration of B as an identifier of type A, or a call to a function A with argument B.
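A sketch showing that the same tokens change meaning with the declarations in scope. A and B are the names from the text; paren_cast, paren_call, stmt_decl, stmt_call and the helpers are hypothetical. A parameter named A hides the file-scope typedef, which is ordinary C scoping:

```c
typedef double A;   /* at file scope, A names a type */

/* With the typedef in scope, (A)(B) casts B to double. */
double paren_cast(int B) { return (A)(B); }

/* Same tokens, but the parameter A hides the typedef: (A)(B) is a call. */
int paren_call(int (*A)(int), int B) { return (A)(B); }

/* With the typedef in scope, the statement A(B); declares B as a double. */
double stmt_decl(void) {
    A(B);
    B = 2.5;
    return B;
}

/* Same tokens again: with A a function pointer, A(B); is a call. */
void stmt_call(void (*A)(int), int B) { A(B); }

/* Small helpers for trying the functions above. */
int twice(int x) { return 2 * x; }
int call_total;
void add_to_total(int n) { call_total += n; }
```

A parser that has not seen the declarations cannot tell these pairs apart, which is the problem the sfx parser below has to work around.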
Back in 1999 I made a small C module called "sfx" (side effects) which parses and identifies C expressions that could plausibly contain side effects. This is one of the bits provided in a small collection called Kazlib.
This can be used to make macros safer; it lets you write a #define macro that inserts an argument multiple times into the expansion. Such a macro could be unsafe if the argument has side effects. With this module, you can write the macro in such a way that it will catch the situation (albeit at run time!). It's like a valgrind for side effects in macros, so to speak.
In the sfx.c module, there is a rudimentary C expression parser which has to work in the absence of declaration info. In other words it has to make sense of an input like (A)(B).
I made it so that when the parser encounters an ambiguity, it will try parsing it both ways, using backtracking via exception handling (provided by except.c). When it hits a syntax error, it can backtrack to an earlier point and parse alternatively.
Consider (A)(A+B). When we are looking at the left part (A), that could plausibly be a cast or declaration. In recursive descent mode, we are going left to right and looking at left derivations. If we parse it as a declaration, we will hit a syntax error on the +, because there is no such operator in the declarator grammar. So we backtrack and parse it as a cast expression, and then we are good.
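A drastically simplified sketch of that backtracking idea, under stated assumptions: the two toy grammars below are hypothetical stand-ins for C's declarator and expression grammars, and plain setjmp/longjmp stands in for the except.c exception module. This is not the sfx.c code:

```c
#include <ctype.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical toy grammars (nothing like full C):
 *   decl := '(' IDENT ')' '(' IDENT { ',' IDENT } ')'
 *   expr := '(' IDENT ')' '(' IDENT { '+' IDENT } ')'
 * "(A)(B)" fits both; "(A)(A+B)" only fits expr. */
enum parse_kind { PARSE_NONE, PARSE_DECL, PARSE_EXPR };

struct parser {
    const char *src;   /* input text */
    size_t pos;        /* current offset; rewound on backtrack */
    jmp_buf fail;      /* where a syntax error jumps to */
};

static void skip_space(struct parser *p) {
    while (isspace((unsigned char)p->src[p->pos]))
        p->pos++;
}

static int peek(struct parser *p, char c) {
    skip_space(p);
    return p->src[p->pos] == c;
}

/* Consume c or "throw" a syntax error. */
static void expect(struct parser *p, char c) {
    if (!peek(p, c))
        longjmp(p->fail, 1);
    p->pos++;
}

static void ident(struct parser *p) {
    skip_space(p);
    if (!isalpha((unsigned char)p->src[p->pos]))
        longjmp(p->fail, 1);
    while (isalnum((unsigned char)p->src[p->pos]))
        p->pos++;
}

/* "(IDENT)(IDENT", then more IDENTs joined by sep, then ")" and end of input. */
static void parse_with(struct parser *p, char sep) {
    expect(p, '('); ident(p); expect(p, ')');
    expect(p, '('); ident(p);
    while (peek(p, sep)) {
        p->pos++;
        ident(p);
    }
    expect(p, ')');
    if (!peek(p, '\0'))
        longjmp(p->fail, 1);   /* trailing junk */
}

enum parse_kind classify(const char *src) {
    struct parser p;
    p.src = src;
    p.pos = 0;
    /* First try the declaration reading; a syntax error lands back here. */
    if (setjmp(p.fail) == 0) {
        parse_with(&p, ',');   /* no '+' in the declarator grammar */
        return PARSE_DECL;
    }
    /* Backtrack: p.pos changed since setjmp, so it is indeterminate now;
       rewind it explicitly and try the expression reading. */
    p.pos = 0;
    if (setjmp(p.fail) == 0) {
        parse_with(&p, '+');
        return PARSE_EXPR;
    }
    return PARSE_NONE;
}
```

With this sketch, classify("(A)(A+B)") fails the declaration attempt at the '+' and succeeds as an expression, the same sequence described above.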
Hard to believe that was 26 years ago now. I think I was just on the verge of getting into Lisp.
I see the sfx.c code assumes it would never deal with negative character values, so it cheerfully uses the <ctype.h> functions without a cast to unsigned char. It's a reasonable assumption there since the inputs under the intended use case would be expressions in the user's program, stringified by the preprocessor. Funny bytes would only occur in a multi-byte string literal (e.g. UTF-8). When I review code today, this kind of potential issue immediately stands out.
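The fix alluded to is a cast to unsigned char: the <ctype.h> functions are only defined for EOF and values representable as unsigned char. A minimal sketch (count_spaces is a hypothetical name):

```c
#include <ctype.h>
#include <stddef.h>

/* Passing a negative char (e.g. a UTF-8 byte where plain char is signed)
   straight to isspace() is undefined behavior; converting through
   unsigned char keeps the argument in the valid range. */
size_t count_spaces(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s))
            n++;
    }
    return n;
}
```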
The same exception module is (still?) used in the Ethereal/Wireshark packet capture and analysis tool. It's used to abort "dissecting" packets that are corrupt or truncated.
I had read the GCC documentation and I did not know about the forward parameter declaration. I did know about the other stuff that is mentioned there (and in the first part).
Declarations in for loops are something that I had only ever used in macros (I had not found them useful in other circumstances), such as:
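The macro itself didn't survive in this copy of the thread, so here is a hypothetical sketch of the same trick: a for loop that runs its body exactly once, with the for-declaration providing a private variable. struct doc, doc_begin, doc_end, WITH_SECTION and section_demo are all invented for illustration and are not the lpt_document macro:

```c
#include <string.h>

/* Hypothetical output buffer for the demo. */
struct doc {
    char text[128];
};

static void doc_begin(struct doc *d, const char *title) {
    strcat(d->text, "<");
    strcat(d->text, title);
    strcat(d->text, ">");
}

static void doc_end(struct doc *d) {
    strcat(d->text, "</>");
}

/* Runs the following statement once: the initializer runs the prologue,
   the declared flag ends the loop, and the third clause runs the epilogue
   after the body. Caveat: a 'break' in the body skips doc_end. */
#define WITH_SECTION(d, title)                                            \
    for (int section_once_ = (doc_begin((d), (title)), 1); section_once_; \
         section_once_ = 0, doc_end(d))

/* Usage: returns "<intro>body</>". */
const char *section_demo(void) {
    static struct doc d;
    d.text[0] = '\0';
    WITH_SECTION(&d, "intro")
        strcat(d.text, "body");
    return d.text;
}
```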
C is a small language. People confuse simple with small quite often. As languages get smaller, using them gets more difficult once below a certain size. The "Turing tarpit" languages like Brainfuck are extremely difficult to write complex programs in, mostly because they're so small.
C is clearly too small to be simple. C++ is too large to be simple. Somewhere in between, there may exist a simple language waiting to be invented.
It creates a compound literal [1] of type array of int, and initializes the specified array positions using designated initializers [2] with the results of calls to puts().
Using designated initializers without the = symbol is an obsolete extension.
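A sketch of the same shape, with a hypothetical name_len() standing in for puts() so the result is checkable (sparse_at is also hypothetical):

```c
#include <stddef.h>

/* Hypothetical stand-in for puts(): returns the string's length. */
int name_len(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return (int)n;
}

/* (int[5]){ ... } is a compound literal, an unnamed array object.
   [3] = ... and [0] = ... are designated initializers; elements not
   named are zero-initialized. At block scope the initializers may be
   function calls, and they are indeterminately sequenced. */
int sparse_at(int i) {
    int *a = (int[5]){ [3] = name_len("abc"), [0] = name_len("hello") };
    if (i < 0 || i >= 5)
        return -1;   /* out of range */
    return a[i];
}
```

The obsolete GNU spelling mentioned above omits the =, as in [3] name_len("abc"); the sketch uses the standard C99 form.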
That's not valid standard C; gcc and clang give a warning with '-pedantic'. It's valid C++ though.
And IMO it's quite a nice feature, sometimes useful for reducing boilerplate in early returns. It's the obvious consequence if you don't treat void as some extremely special syntax but rather as just another type, perhaps akin to an empty struct (though that's not valid C either ¯\_(ツ)_/¯), that's implicitly returned at the end of a void-returning function, with a "return;" statement implicitly "creating" a value of void.
In fact in Rust (and probably a bunch of other languages that I'm too lazy to remember) void-returning functions are done via returning a 0-tuple.
Usually I'm more in the camp of "let's preserve everything we can as cultural heritage, yes, even that awful Nazi propaganda material", and I'm confident that some distant archaeologist (or current close neighbor) will be glad we did.
But as time passes, I'm more and more convinced that wiping out every piece of C that was ever produced would be one of the greatest possible gestures for the future of humanity.
I also have a theory that the universe exists so we can have some opportunities to taste chocolate. Surely from that perspective, even C can be an unfortunate but acceptable byproduct.
Too many people still don't understand C's greatest failure: undefined behavior. Most people assume that if you write past the end of an array, the result may be a program crash; but undefined behavior actually includes other wonderful options such as stealing all your money, encrypting all your files and extorting you for some bitcoin, or even partially destroying a nuclear isotope processing facility. Undefined really means undefined: theoretically, demons may start flying out of your nose and it would be completely up to spec. If you think that this is justified by "performance gains" or some other nonsense, then I really don't know what to tell you!
Remember that C's contemporary languages were either inefficient (e.g. ALGOL 68, PL/1, Lisp), functionally obsolete (e.g. FORTRAN didn't have recursion or heap allocation), or even lower level (Assembly, B). C eliminated the need for assembly in programs that were low level (like OS kernels) or high performance (math, graphics, signal processing), and that was surely a huge improvement in type safety and expressiveness.
I don't get that C hate. That the terse syntax can be misused to produce unreadable code in C does not change the fact that I usually find it more readable than more verbose syntax.
[+] [-] sylware|1 year ago|reply
We need a C- or µC:
[+] [-] wongarsu|1 year ago|reply
1: https://c0.cs.cmu.edu/docs/c0-reference.pdf
[+] [-] glouwbug|1 year ago|reply
[+] [-] butterisgood|1 year ago|reply
[+] [-] accelbred|1 year ago|reply
[+] [-] short_sells_poo|1 year ago|reply
[+] [-] mystified5016|1 year ago|reply
Read up on Forth languages. It's pretty much exactly what you're after.
[+] [-] mhandley|1 year ago|reply
[+] [-] macintux|1 year ago|reply
https://www.goodreads.com/book/show/198207.Expert_C_Programm...
[+] [-] WalterBright|1 year ago|reply
D doesn't have that bug!
In 44 years of C programming, I've never encountered a legitimate use for the 3rd (other than Obfuscated C, that is).
[+] [-] matheusmoreira|1 year ago|reply
[+] [-] dzaima|1 year ago|reply
[+] [-] dfawcus|1 year ago|reply
[+] [-] WalterBright|1 year ago|reply
C'mon, Standard C! Fix that!
[+] [-] moefh|1 year ago|reply
[1] https://godbolt.org/z/vfPzhc596
[+] [-] HeliumHydride|1 year ago|reply
[+] [-] dfawcus|1 year ago|reply
[+] [-] WalterBright|1 year ago|reply
But I'm feeling much better.
[+] [-] mystified5016|1 year ago|reply
I can't wait to slip this into some production code to confuse the hell out of some intern in a few years
[+] [-] svilen_dobrev|1 year ago|reply
How to Get Fired Using Switch Statements & Statement Expressions:
https://blog.robertelder.org/switch-statements-statement-exp...
[+] [-] kazinator|1 year ago|reply
https://git.savannah.gnu.org/cgit/kazlib.git/tree/sfx.c
[+] [-] jwilk|1 year ago|reply
https://news.ycombinator.com/item?id=40835274 (113 comments)
[+] [-] zzo38computer|1 year ago|reply
(The compiler will optimize out the loop and the declared variable in the use of the lpt_document macro; I had tested this.)
[+] [-] teddyh|1 year ago|reply
[+] [-] hulitu|1 year ago|reply
[+] [-] GrantMoyer|1 year ago|reply
[+] [-] SAI_Peregrinus|1 year ago|reply
[+] [-] tpoacher|1 year ago|reply
That's not to say you can't create interesting monstrosities out of it!
[+] [-] betimsl|1 year ago|reply
[+] [-] hackyhacky|1 year ago|reply
In short, you can initialize an array like this, by specifying each element in order:
However, you can also initialize specific array elements: "BASIC compatibility" mode uses the above syntax.
[+] [-] unknown|1 year ago|reply
[deleted]
[+] [-] GranPC|1 year ago|reply
[+] [-] andreyv|1 year ago|reply
[1] https://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html
[2] https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
[+] [-] a12k|1 year ago|reply
[+] [-] utopcell|1 year ago|reply
void g(); void f() { return g(); }
[+] [-] dzaima|1 year ago|reply
[+] [-] gpderetta|1 year ago|reply
[+] [-] AKluge|1 year ago|reply
[+] [-] rramadass|1 year ago|reply
Money quote:
We stopped when we got a clean compile on the following syntax:
for(;P("\n"),R--;P("|"))for(e=C;e--;P("_"+(u++/8)%2))P("|"+(u/4)%2);
I am NOT going to try it out.
[+] [-] unknown|1 year ago|reply
[deleted]
[+] [-] psychoslave|1 year ago|reply
[+] [-] H8crilA|1 year ago|reply
[+] [-] adonovan|1 year ago|reply
[+] [-] uecker|1 year ago|reply
[+] [-] chasil|1 year ago|reply
Should this misfortune befall you, please don't get on an airplane (with me).