Nowadays compilers are extremely capable of eliminating 'dead' code. But in benchmarking we need some seemingly 'useless' code to do the work we want, for example to do computation in registers in order to measure instruction latency and throughput. In these cases, we need a mechanism to temporarily disable optimization.

This document keeps an up-to-date collection of techniques to temporarily disable compiler optimizations. These techniques were tested on a recent gcc or icc compiler.

## Using volatile

volatile makes compilers extremely careful (or conservative) about the memory loads and stores of a variable: every access written in the source has to be performed. Most compilers tend to do no optimization at all on such accesses, even in provably dead code. This makes volatile the most commonly used technique.

For example:

    void doTest() {
      volatile int a = 0;
      a += 1;
    }

g++ -std=c++98 -O2 compiles the above code down to

    movl   $0x0,-0x4(%rsp)
    mov    -0x4(%rsp),%eax
    add    $0x1,%eax
    mov    %eax,-0x4(%rsp)
    retq

icpc -std=c++98 -O2 compiles it down to

    movl   $0x0,-0x8(%rsp)
    incl   -0x8(%rsp)
    retq

## Empty assembly

Another current weakness of compilers is that they are careful with inline assembly. Some versions of gcc and icc tend to leave variables touched by inline assembly intact. This becomes the second technique to prevent optimizations.

For example, Facebook’s Folly library uses the following doNotOptimizeAway function to prevent optimizing an expression:

    template <typename T>
    inline void doNotOptimizeAway(T&& datum) {
      asm volatile ("" : "+r" (datum));
    }

    void doTest() {
      int a;
      doNotOptimizeAway(a = 0);
      doNotOptimizeAway(a += 16);
    }

icpc -std=c++11 -O2 compiles the above down to

    movl   $0x0,-0x8(%rsp)
    mov    -0x8(%rsp),%eax
    mov    %eax,-0x8(%rsp)
    addl   $0x10,-0x8(%rsp)
    mov    -0x8(%rsp),%edx
    mov    %edx,-0x8(%rsp)
    incl   -0x8(%rsp)
    mov    -0x8(%rsp),%ecx
    mov    %ecx,-0x8(%rsp)
    retq

g++ -std=c++11 -O2 compiles it straight down to

    xor    %eax,%eax
    add    $0x10,%eax
    retq
The +r constraint and the volatile qualifier on the assembly are both essential. +r means the datum is both read and written by the assembly, so the compiler cannot optimize it out. volatile stops the compiler from removing the empty assembly. If you try =r instead of +r, gcc will optimize it away but icc will keep it.
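To show how the +r constraint keeps a computation alive inside a loop, here is a minimal sketch. The names keepValue and addChain are hypothetical and do not come from Folly. The sketch assumes a compiler that accepts GCC-style extended asm (gcc, clang, icc).

```cpp
// Hypothetical helper modeled on the empty-assembly idea above.
// Assumes GCC-style extended asm; it will not compile with MSVC.
template <typename T>
inline void keepValue(T& value) {
    // "+r": the empty asm claims to read and write `value` in a register,
    // so the compiler must materialize it before and reload it after.
    asm volatile("" : "+r"(value));
}

// Dependent chain of additions. Without the barrier, an optimizing
// compiler is free to fold the whole loop into a single multiplication.
unsigned long addChain(unsigned long iterations) {
    unsigned long acc = 0;
    for (unsigned long i = 0; i < iterations; ++i) {
        acc += 3;
        keepValue(acc);  // the value itself is left unchanged
    }
    return acc;
}
```

The empty assembly emits no instructions, so the result is the same as without the barrier. Only the compiler's freedom to fold or delete the additions is taken away.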

## Compiler specific pragma

Compilers provide ways to control their optimizers. gcc provides pragma GCC as a way to temporarily control the compiler's behavior. With pragma GCC optimize("O0"), the optimization level can be set to zero, which means no optimization at all for gcc.

For example:

    #pragma GCC push_options
    #pragma GCC optimize("O0")
    void doTest() {
      int a;
      a = 15;
      a += 1;
    }
    #pragma GCC pop_options

g++ -std=c++98 -O2 compiles it straightforwardly down to

    movl   $0xf,-0x4(%rsp)
    addl   $0x1,-0x4(%rsp)
    retq

icpc -std=c++98 -O2 compiles it down to

    sub    $0x10,%rsp
    movl   $0xf,-0x10(%rbp)
    mov    $0x1,%eax
    retq
The above pragma can be replaced by the function attribute __attribute__((optimize("O0"))).
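A minimal sketch of the attribute form follows. The macro NO_OPT and the function doTestAttr are hypothetical names introduced only for illustration. The guard assumes that only gcc honors this attribute and makes it expand to nothing on other compilers.

```cpp
// Hypothetical macro: apply the gcc-specific attribute only where it is
// assumed to be supported, so the code still builds with other compilers.
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
#define NO_OPT __attribute__((optimize("O0")))
#else
#define NO_OPT
#endif

// Same body as the pragma example, but the optimization level is
// attached to this one function instead of a push/pop region.
NO_OPT int doTestAttr() {
    int a;
    a = 15;
    a += 1;
    return a;  // returned only so the result can be checked
}
```

Compared with the pragma pair, the attribute travels with the function declaration, so there is no push_options/pop_options region to keep balanced.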