There is a need for volatile in portable C programming, in two circumstances:
1. When an asynchronous signal handler modifies a variable that is inspected by the interrupted mainline code (e.g. to set a flag indicating that the signal went off), that variable must be of type "volatile sig_atomic_t". (The article points to some outside discussion by Hans Boehm about this in relation to Unix, but it's not just for Unix; it's in ISO C.)
2. When a function, after saving a context with setjmp(), modifies automatic local variables, and then the context is restored with longjmp(), those variables must be declared volatile, or else they will not reliably have the correct values. (E.g. the longjmp may or may not restore the values which they had at setjmp time, or do it for some of those variables but not others.)
No matter how a C compiler treats volatile, in order to be ISO C conforming, if the program correctly uses volatile in the above situations, it must somehow work. Even if it is useless for anything else: threads, hardware, ...
#include <setjmp.h>
#include <stdio.h>

jmp_buf env;

void demo(void)
{
    volatile int i = 0;
    if (setjmp(env) == 0) {
        i++;
        longjmp(env, 1);
    } else {
        printf("i == %d\n", i);
    }
}
Here, the printf should produce i == 1, which is not required if the volatile is removed.
For instance, if i is located in a machine register, and setjmp/longjmp work by saving and restoring registers (or just some registers, including that one), the longjmp operation will restore the value that existed in the register at the time setjmp was called, which is zero.
If that's a problem in a given compiler, then even if it has a garbage implementation of volatile otherwise, it has to pay attention to the fact that volatile is being used in code that calls setjmp.
For 2. that seems like a defect in the specification of setjmp()? The equivalent "just works" for functions like pthread_mutex_lock() etc. Those calls implicitly add barriers to force reloading.
For some reason, this is an older version of my comment. I am sure I updated it to remove "gaping omissions", and noted that the article references a discussion by Hans Boehm of signal handlers in Unix programs. It is not Unix-specific, though; ISO C specifies signals to some extent, including that bit about volatile sig_atomic_t. It can exist on any platform.
In C++ there's actually a lot more freedom. You can access non-atomic, non-volatile std::sig_atomic_t variables as long as you don't violate the data race rules.
The post is still accurate, but in 2011 C and C++ added atomics, which are a more portable alternative to uses of volatile for atomicity. They can be more efficient in some cases than the locks suggested by the post, especially in CPUs with higher core counts. (Note that dual-core consumer CPUs were around by 2010 but had only existed for a few years. Linux only finished removing the Big Kernel Lock in 2011.)
C11 did add _Atomic, BUT it is not more portable than using volatile.
In C11, any type of any size can have an atomic qualifier. That means you can have a 50-byte struct that is atomic. No hardware has a 50-byte atomic instruction, so that is not implementable using native atomics. The standard gets around this by letting an implementation use a hidden mutex to guarantee that the operations will be atomic.
The problem with this is Windows. Windows lets an application dynamically load shared libraries (DLLs). This breaks the C11 atomics model. Let me illustrate with an example:
Application A creates an atomic data structure, and the implementation creates a mutex for it. Application B does the same thing. Application A wants to share this data structure with DLL X, so it has to share its mutex with the DLL so that the DLL and the application use the same synchronization primitive. Now Application B wants to do the same thing; the problem is that DLL X can't use Application B's mutex, because it is required to use Application A's mutex.
C11's atomics will never be implemented on Windows because they can't be! Besides, all major compilers support intrinsic atomic operations on volatile variables that are nearly identical (and in some ways better understood), so that's what I recommend using. Linus has indicated that he thinks the C11 concurrency memory model is broken, so the kernel will continue to use volatile and intrinsics.
> The post is still accurate, but in 2011 C and C++ added atomics, which are a more portable alternative to uses of volatile for atomicity.
Atomics and volatile solve different problems, though. Atomics ensure a read or write completes uninterrupted. Volatile ensures that a read or write is never optimised away.
I think C11 atomics can be optimised away (for example, reading a value twice in a row might result in only a single actual read).
In the Linux kernel, instead of marking variables/types as volatile, you mark _accesses_ as volatile. There's a pair of macros, READ_ONCE/WRITE_ONCE, that temporarily cast the pointer for you. I think this is a better way to use volatile.
Even then I think it's rarely useful outside of x86-specific code (where the CPU gives you quite a lot of memory ordering guarantees). Would be interesting to check how often it gets used elsewhere.
This is a very good post; too many people (even myself, sometimes) forget that volatile doesn't mean that the statement containing it cannot be reordered.
[EDIT: the one thing he missed, which I would have liked to know, is about using volatile with int (or sig_atomic_t) as an "eventually consistent" value, for example one global `sig_atomic_t end_flag = 0;`, a single writer (a SIGINT handler to set it to 1), and many threads with `while (end_flag == 0) { ... }` loops.
I've been using this pattern for a while with no obvious problems - access to `end_flag` can be rearranged by the compiler, barriers are irrelevant, the value can be corrupted by a race on every read and it won't matter - the thread will get the eventual value of end_flag on the next loop and end.]
> one global `sig_atomic_t end_flag = 0;`, a single writer (a SIGINT handler to set it to 1), and many threads with `while (end_flag == 0) { ... }` loops.
While unlikely to cause problems, this is a data race (a set of at least two concurrent accesses, of which at least one is not an atomic access and at least one is a write) and therefore constitutes undefined behavior.
sig_atomic_t is only safe to use from one thread, where concurrency is given by a signal handler.
I suspect sig_atomic_t does work fine when we're talking about POSIX signals, but OP was probably thinking more from an embedded programming perspective, with hardware interrupt handlers, which don't conform to POSIX signal semantics.
A place where volatile shows up in C today, and is stunningly handy, is when writing eBPF programs.
In eBPF you have to appease a verifier which is trying to prove safety and liveness properties of the compiled program. To prove safety properties you often need to ensure that some offset into a buffer (bpf map) will be in bounds. Even if you judiciously sprinkle such bound checks into your code, the compiler may eliminate them entirely or perform them on some different register or stack value it knows to be semantically sufficient. Unfortunately the verifier is not as smart as the compiler.
Using volatile to reload some offset just before bounds checking it and using it to index the map is a very reliable approach to getting code to verify.
I'm not entirely convinced that the ninth case is necessarily a miscompilation. My understanding of the standard is that C99 6.7.3 p6 and C11 6.7.3 p7 allow exactly this behavior. One can argue (as the author does in the linked paper) that the last sentence is about what it means hardware-wise, but completely optimizing away statements that have no effect on the abstract machine level ('x;', 'x = x;', ...) is something that seems not only permitted but even reasonable.
But it does not make much sense to actually want to force a read for side-effect in portable code. For a MMIO register of some MCU/SoC the code is not going to be portable in any meaningful sense anyway, and for things like portable OS drivers for weird hardware (like VGA...) you have to use some kind of OS-provided macro that does the right thing wrt. caching and barriers anyway.
In a discussion like this you have to mention that Microsoft's compiler extended volatile to mean atomic[1], although its default behavior apparently depends on the target ISA. Regardless, just use C11/C++11 atomics at this point.
Yes but they consider the additional semantics a mistake:
"we strongly recommend that you specify /volatile:iso, and use explicit synchronization primitives and compiler intrinsics when you are dealing with memory that is shared across threads."
/volatile:iso is the default for ARM as the extended semantics would be extremely penalising.
Reminded me of this article: http://www.ddj.com/cpp/184403766 - "volatile: The Multithreaded Programmer's Best Friend" by Andrei Alexandrescu, February 01, 2001
As a former embedded engineer on older Motorola and ARM processors, I have seen reordering happen, and more than once I had to check the generated assembly code. The other items more or less make sense if you don't expect too much from your compiler, for example using volatile to get atomicity.
Using volatile on multi-threaded code is ok as long as you know what you are doing, for example kicking a watchdog at a defined physical address could be fine from different threads.
I've been bitten by reordering. In my case, the toolchain developers implemented the reordering step in the assembler as an extra optimization step (on by default of course), so I had to disassemble the binary to even find the problem. They had redefined the assembly language semantics to require "volatile" keywords wherever you needed ordering maintained. I turned that particular optimization off.
(Meta: As already pointed out by @comex, this post is from 2010 and it would be helpful with a tag in the title to make that clearer.)
That said, it's an awesome post (not surprising considering the source). I found the initial set-up explaining the concept of C's abstract machine very succinct and nice; it's something I wish more people discussing the language were (well) aware of.
I can add a bit of historical context here. John wrote that during the development of WINAVR, the first GCC compiler for AVR. The discussions can be found in the AVR-GCC mailing list archives.
The 'Small' optimization in AVR GCC is aggressive and often leads to broken code if 'volatile' is not properly used.
AVR GCC, using Small optimization, would remove any variable that has no side effects from the compiler's view. Setting a value in an interrupt handler would be outside of the view of the compiler's abstract machine, so 'flag' variables were often removed. Changes in hardware registers fall into the same category.
These removals result in broken code.
Volatile is never a replacement for atomics or proper locks/mutexes/semaphores.
If hardware such as an interrupt handler or hardware register is not involved, then using volatile is creating a race condition, which may or may not ever become apparent.
Erich Styger wrote more recently about volatile here.
https://mcuoneclipse.com/2021/10/12/spilling-the-beans-volat...
John and I both make an appearance in the parent article linked from that one.
An earlier one of Erich's:
https://mcuoneclipse.com/2013/08/14/volatile-can-be-harmful/
If changing the optimization level breaks something, it is probably a missing volatile when doing bare hardware.
> that variable must be of type "volatile sig_atomic_t"
That alone is not enough for:
1. Larger than word size variables
2. Out of order CPUs
3. Multicore CPUs where another core handles the signal
Atomics must be used here for proper synchronization, when they are available. If not, architecture-specific mechanisms should be used.
[1] https://learn.microsoft.com/en-us/cpp/cpp/volatile-cpp?view=...