Tuesday, May 23rd 2017, 19:58
-- Debugging an application which only works in the debug build
Recently, I ran into an interesting bug in one of my C applications: it worked in the debug build but not in the release build (-O3 etc.). It turned out to be a really stupid mistake of mine, the kind where you finally find it and think 'How could that ever have worked?!'. However, it also opened my eyes to how powerful GCC's optimizer really is.
In this article, I will explain the mistake I made, how I went about solving it and what I learned about gcc in the process.
Finding the Problem
Usually, one would use a debugger to track the issue down to a specific line of code. That, however, was not an option here, as the program only fails in the release build, which contains no (or at least much less) debug information. Therefore, I added some printfs to the code to roughly locate the error. That led me to a function which returned NULL in the release build even though it returned a valid pointer in the debug build.
That function was the following "constructor". It allocates some memory for a struct, initializes it and returns the pointer (can you spot the error already?):
So, to debug it, I looked at its assembly in both the release build and the debug build using radare2.
What, where did my function go?! Only the return statement is left! *Staring intensely at the C-Code* *Adding some debug printfs to print pointer values* *What? NULL?!* *More staring...*
→ Ohh I forgot the return statement!! After adding that, everything worked perfectly fine.
Why Did It Work in the Debug Build?!
Only one thing left to ask: How could that ever work?! Let's take a look at the debug build assembly for that.
Okay, first of all, it is not optimized away. Cool. So let's see why this function still returns the pointer. This is by no means a tutorial on x86_64 assembly, but the important thing to know here is that on x86_64, integer and pointer return values are passed back in register rax when the function executes ret. This means that the malloc'ed pointer needs to be in rax before we exit this function. malloc itself is a function (called @ 0x00407968), which means the malloc'ed address is in rax right after it returns. Makes sense: the "test rax, rax" checks whether malloc returned NULL and skips the value initialization if it did. If it didn't, we initialize the values (the many mov instructions) and then... just return. Nothing copies the result into the return register. By sheer luck, the malloc'ed pointer is still in rax when returning, which is why this worked in the first place.
However, the truly interesting thing here is the "power" of GCC. This function calls malloc (which certainly touches memory!) and then even writes to that memory before simply exiting. Despite all of that, GCC noticed that none of it has any observable effect: the pointer never leaves the function, so nothing can ever read that memory, which makes the allocation and all of the stores dead code. So it removed the entire body of the function!
That was the point at which I checked my CMake file and noticed that I was compiling with only very few warnings enabled. After adding -Wall, sure enough, the compiler told me about the missing return (*cough* and a few other things *cough*).
Just for reference, the project is this one.
TLDR / Lessons learned:
- Always compile with
- GCC is scary