I ran into some surprisingly weird output from both Clang and gcc for a simple code snippet, and I thought I’d share it.
Consider the following C++ function which, in a roundabout way, checks whether an std::array passed as an argument only contains zeros:
#include <array>
static constexpr int arraySize = 1;
bool isAllZeros (const std::array<int, arraySize> &array) {
    std::array<int, arraySize> allZeros {};
    return array == allZeros;
}
In case you’re wondering, this is correct because initializing an std::array with { } results in each element being value-initialized, which for int means being set to zero.
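To make the value-initialization claim concrete, here is a minimal sketch that checks it both at compile time and at run time. The names makeZeros, zeros and checkBracesGiveZeros are hypothetical and exist only for this illustration; they are not part of the snippet above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original snippet): builds an array
// the same way allZeros is built, using empty braces.
template <std::size_t N>
constexpr std::array<int, N> makeZeros() {
    return std::array<int, N>{};  // every element is value-initialized
}

// For int, value-initialization is zero-initialization, so this holds
// at compile time.
constexpr std::array<int, 3> zeros = makeZeros<3>();
static_assert(zeros[0] == 0 && zeros[1] == 0 && zeros[2] == 0,
              "empty braces zero every element");

// Runtime version of the same check.
void checkBracesGiveZeros() {
    std::array<int, 4> values {};
    for (int value : values) {
        assert(value == 0);
    }
}
```

Note that the braces matter: a local std::array of int declared without them is default-initialized, and its elements hold indeterminate values.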
What happens if we compile this with gcc? Using godbolt and the latest gcc version (15.2), with optimizations on (“-O3”) we get the following x86-64 Assembly code:
isAllZeros(std::array<int, 1ul> const&):
    mov eax, DWORD PTR [rdi]
    test eax, eax
    sete al
    ret
Already here we get fairly non-intuitive output! We have set arraySize to 1, so we’re effectively checking whether a single integer value is 0. The generated code does this by loading the integer into eax, AND’ing it with itself using the test instruction (which leaves the value unchanged but sets the CPU’s zero flag if the value is 0), and then using sete to set the return value of the function equal to the zero flag. This is not how your average Assembly programmer would do it, but it’s still easy enough to understand (if perhaps wasteful-looking).
Let’s see what happens if we set arraySize to 2:
isAllZeros(std::array<int, 2ul> const&):
    cmp QWORD PTR [rdi], 0
    sete al
    ret
That’s more like it! Now we’re simply taking a QWORD-sized block of memory (8 bytes, which corresponds to two integers), comparing it to 0 and setting the return value to be the result of the comparison operation. This is a lot more intuitive than the arraySize = 1 case, and it’s not clear why gcc doesn’t use the same direct comparison there.
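For comparison, this is roughly what the two-element check looks like if you write the “treat both ints as one 64-bit value” idea by hand. It is only a sketch: it assumes a 4-byte int and no padding inside std::array<int, 2> (true for the x86-64 targets used here), and the names isAllZerosWide and checkIsAllZerosWide are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical hand-written equivalent of gcc's output for arraySize = 2.
// Assumes int is 4 bytes and std::array<int, 2> has no padding.
bool isAllZerosWide(const std::array<int, 2> &array) {
    static_assert(sizeof(std::array<int, 2>) == sizeof(std::uint64_t),
                  "two ints must fill exactly one 64-bit word");
    std::uint64_t bits;
    std::memcpy(&bits, array.data(), sizeof(bits));  // one 8-byte load
    return bits == 0;                                // cmp ..., 0 ; sete al
}

// Small self-check: a nonzero value in either element must be detected.
void checkIsAllZerosWide() {
    const std::array<int, 2> zeros {};
    const std::array<int, 2> lowSet {{1, 0}};
    const std::array<int, 2> highSet {{0, 1}};
    assert(isAllZerosWide(zeros));
    assert(!isAllZerosWide(lowSet));
    assert(!isAllZerosWide(highSet));
}
```

The memcpy is there to keep the wide read well-defined in C++; it is not meant to suggest that the compiler emits a call.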
How about arraySize = 3, meaning a 12-byte block?
isAllZeros(std::array<int, 3ul> const&):
    cmp QWORD PTR [rdi], 0
    je .L5
.L2:
    mov eax, 1
    test eax, eax
    sete al
    ret
.L5:
    mov eax, DWORD PTR [rdi+8]
    test eax, eax
    jne .L2
    xor eax, eax
    test eax, eax
    sete al
    ret
Now things are getting really hectic. This time gcc decided to use a mixture of both strategies, using a cmp instruction to check whether the first 8 bytes are zero, and the test instruction to check the remaining 4 bytes.
That’s not the weirdest part though. The strangest bit is the block in between the “.L2” and “.L5” labels, which as far as I can tell uses a very odd sequence of instructions (load the constant 1 into eax, test it, then sete) simply to set the return value to 0. A nearly-identical sequence (xor, test, sete) is used at the end of the code to set the return value to 1.
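Transcribed back into C++, the control flow of this listing looks roughly like the sketch below. It is my own reading of the assembly, not anything gcc documents: each path first materializes a “differs” flag in eax and only afterwards converts it into the boolean result, even though the flag is a known constant on every path. The names isAllZerosGccShape and checkIsAllZerosGccShape are hypothetical, and a 4-byte int is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical C++ transcription of gcc's arraySize = 3 output.
// Assumes int is 4 bytes, as on x86-64.
bool isAllZerosGccShape(const std::array<int, 3> &array) {
    static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");
    std::uint64_t firstTwo;
    std::memcpy(&firstTwo, array.data(), sizeof(firstTwo));  // cmp QWORD PTR [rdi], 0

    std::uint32_t differs;          // plays the role of eax
    if (firstTwo != 0) {
        differs = 1;                // .L2: mov eax, 1
    } else if (array[2] != 0) {     // .L5: mov eax, [rdi+8] ; test ; jne .L2
        differs = 1;                // .L2 again
    } else {
        differs = 0;                // xor eax, eax
    }
    return differs == 0;            // test eax, eax ; sete al
}

// Small self-check covering a nonzero value in each position.
void checkIsAllZerosGccShape() {
    const std::array<int, 3> zeros {};
    const std::array<int, 3> firstSet {{1, 0, 0}};
    const std::array<int, 3> secondSet {{0, -5, 0}};
    const std::array<int, 3> thirdSet {{0, 0, 7}};
    assert(isAllZerosGccShape(zeros));
    assert(!isAllZerosGccShape(firstSet));
    assert(!isAllZerosGccShape(secondSet));
    assert(!isAllZerosGccShape(thirdSet));
}
```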
How about Clang? Surely we won’t see two different compilers behaving oddly here? Again we use “-O3” and the latest version on godbolt, which is Clang 21.1.0.
With arraySize = 1:
isAllZeros(std::array<int, 1ul> const&):
    cmp dword ptr [rdi], 0
    sete al
    ret
Phew! Looks good and normal.
How about with size 2?
isAllZeros(std::array<int, 2ul> const&):
    mov qword ptr [rsp - 8], 0
    cmp qword ptr [rdi], 0
    sete al
    ret
Mmmm. The actual comparison code looks just as good, but there’s a new inefficiency here which gcc avoided: the first instruction writes a zero to the stack (just below the stack pointer, at [rsp - 8]). This corresponds to initializing the allZeros variable on the stack, even though the rest of the Assembly code never reads this value.
Lastly and for completeness, here is the output from Clang with size 3:
isAllZeros(std::array<int, 3ul> const&):
    mov dword ptr [rsp - 8], 0
    mov qword ptr [rsp - 16], 0
    mov eax, dword ptr [rdi + 8]
    or rax, qword ptr [rdi]
    sete al
    ret
Here we see Clang using bit manipulation: it loads the third element into eax, then uses the or instruction to combine it with the first 8 bytes, producing a value that is nonzero when any of the input elements is nonzero, and sets the return value based on whether that result is zero. Clever really! However, the unnecessary initialisation of allZeros on the stack is still present. One last thing which is unclear is: why did Clang not perform these unnecessary writes when arraySize was just 1?
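The or trick can be written out in C++ as the sketch below, which mirrors the two loads and the single bitwise or in the listing above. It assumes a 4-byte int, and the names isAllZerosOr and checkIsAllZerosOr are hypothetical, used only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical hand-written equivalent of Clang's arraySize = 3 comparison.
// Assumes int is 4 bytes, as on x86-64.
bool isAllZerosOr(const std::array<int, 3> &array) {
    static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");
    std::uint64_t firstTwo;
    std::memcpy(&firstTwo, array.data(), sizeof(firstTwo));  // qword ptr [rdi]
    std::uint32_t third;
    std::memcpy(&third, array.data() + 2, sizeof(third));    // mov eax, dword ptr [rdi + 8]
    // third is zero-extended to 64 bits before the or, just as writing
    // eax clears the upper half of rax.
    return (firstTwo | third) == 0;                          // or rax, ... ; sete al
}

// Small self-check covering a nonzero value in each position.
void checkIsAllZerosOr() {
    const std::array<int, 3> zeros {};
    const std::array<int, 3> firstSet {{1, 0, 0}};
    const std::array<int, 3> secondSet {{0, 1, 0}};
    const std::array<int, 3> thirdSet {{0, 0, -1}};
    assert(isAllZerosOr(zeros));
    assert(!isAllZerosOr(firstSet));
    assert(!isAllZerosOr(secondSet));
    assert(!isAllZerosOr(thirdSet));
}
```

The result of the or is nonzero exactly when at least one input bit is set, so a single flag check replaces the two separate comparisons and the branch that gcc emitted.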
Moral of the story? As advanced as compilers are, we certainly can’t trust them to generate optimal code, or even to be predictable when we make seemingly trivial changes to code (such as changing the size of an array).
Note: If I missed anything I apologise in advance – please send any feedback to: the name of this blog (see the URL) @gmail.com