You know how back in the day they used punched cards to store data? In the 1950s, programmers would pick out a random punched card, feed it in, and see if the program revealed any bugs.
Now this has nothing to do with fuzzing directly, but the idea is quite similar: if we give a program a random input, it could reveal undesired behavior. Fuzzing gives programs random inputs and observes their behavior.
So what does this mean?
Well, for starters, a bug found by fuzzing is usually guarded by loose constraints, unless you're super lucky. If x > 0 causes a program to break, then fuzzing would find it! However, if only one specific value causes problems, then fuzzing is very unlikely to find it.
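To make this concrete, here's a tiny sketch (my own toy example, not from the paper) comparing a loose constraint with a magic-value check:

```python
import random

def loose_bug(x):
    # "crashes" whenever x > 0 -- random inputs hit this about half the time
    return x > 0

def strict_bug(x):
    # "crashes" only on one specific 32-bit value -- a 1-in-2**32 chance
    return x == 0x4E754E75

random.seed(0)
inputs = [random.getrandbits(32) for _ in range(10_000)]
# interpret as signed so loose_bug triggers on roughly half of them
signed = [v - 2**32 if v >= 2**31 else v for v in inputs]

loose_hits = sum(loose_bug(x) for x in signed)
strict_hits = sum(strict_bug(x) for x in inputs)
print(loose_hits, strict_hits)  # thousands of loose hits, almost surely 0 strict hits
```

Ten thousand random tries find the loose bug thousands of times and the magic-value bug essentially never.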
So what do we do then?
Fancy word, right? Not really.
Concrete + Symbolic = Concolic
It's a combination of concrete execution and symbolic execution. Concrete execution just runs the program with fixed values (which is what happens when a fuzzer executes a test case -- plain old execution, really).
(Symbolic execution treats the input as a symbolic value, and at each branch it forks, tracking the constraint for each side.)
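Here's a toy sketch of that forking idea (my own illustration; brute force over a tiny domain stands in for a real SMT solver):

```python
# Toy symbolic execution: x is unknown, so at each branch we fork and
# record the condition on each side, keeping only feasible paths.
DOMAIN = range(-10, 11)

def sat(constraints):
    # a real engine would ask an SMT solver; brute force works for a toy
    return any(all(c(x) for c in constraints) for x in DOMAIN)

def explore(constraints, branches, path=()):
    if not branches:
        return [path]
    cond, *rest = branches
    paths = []
    if sat(constraints + [cond]):              # true side feasible?
        paths += explore(constraints + [cond], rest, path + (True,))
    neg = lambda x, c=cond: not c(x)
    if sat(constraints + [neg]):               # false side feasible?
        paths += explore(constraints + [neg], rest, path + (False,))
    return paths

# program: if x > 0: ...  then  if x > 5: ...
paths = explore([], [lambda x: x > 0, lambda x: x > 5])
print(paths)  # 3 feasible paths: x > 5, 0 < x <= 5, x <= 0
```

Note the fourth combination (x <= 0 but x > 5) is pruned as infeasible -- that's the "split" doing real work.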
The pros and cons of concolic execution are basically those of symbolic execution. As far as I remember, the reason concolic execution was useful has to do with symbolic execution's trouble with the external environment (system calls into the kernel and such): the concrete side can actually interact with the environment.
Now, as expected, concolic execution can find those specific values that fuzzing misses.
But let's say there is a loop with a branch in it. Concolic execution will keep forking at every iteration, causing a path explosion.
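A quick sketch of why this blows up (toy numbers, not from the paper):

```python
# Path explosion: a branch inside a loop doubles the number of paths at
# every iteration, so n iterations give up to 2**n paths to explore.
def count_paths(n):
    if n == 0:
        return 1
    # each iteration the executor forks on the branch: taken / not taken
    return 2 * count_paths(n - 1)

for n in (1, 10, 20):
    print(n, count_paths(n))  # 2, 1024, 1048576
```

Twenty loop iterations already mean over a million paths -- no executor keeps up with that.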
You know how when there are two things that each have their own strength, people usually try combining the two? Hybrid fuzzing is exactly that.
The concolic executor generates test cases and gives them to the fuzzer to execute. The fuzzer reports code coverage back to the concolic executor, which comes up with new test cases.
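A minimal sketch of that loop (all names and numbers here are made up for illustration -- the "concolic step" is faked by handing over the magic value directly):

```python
import random

def target(x):
    if x > 0:               # the fuzzer finds this side easily
        if x == 1337:       # ...but random mutation almost never passes this
            return "bug"
        return "deep"
    return "shallow"

def fuzz(seed_inputs, rounds=1000):
    random.seed(0)
    corpus = list(seed_inputs)
    coverage = {target(x) for x in corpus}
    for _ in range(rounds):
        x = random.choice(corpus) + random.randint(-5, 5)  # dumb mutation
        coverage.add(target(x))
        corpus.append(x)
    return corpus, coverage

def concolic_step():
    # stand-in for the concolic executor: it solves x == 1337 directly
    # instead of hoping a random mutation hits it
    return [1337]

corpus, cov = fuzz([0])
_, cov2 = fuzz(corpus + concolic_step())
print(sorted(cov), sorted(cov2))  # "bug" only shows up after the concolic step
```

The fuzzer alone covers the loose branches; the concolic step hands it the one input it could never mutate its way to.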
Hybrid fuzzers show promising results. Driller, a hybrid fuzzer, demonstrated its effectiveness by ranking 3rd in the DARPA Cyber Grand Challenge (CGC). Unfortunately, hybrid fuzzers are not really used in the real world.
Now this is the problem this paper attempts to address.
Why are hybrid fuzzers not utilized outside of small-scale studies? How can we change that?
The problem the paper points to is that the concolic executor is too slow. QSYM redesigns the concolic executor to reduce the overhead and even improve performance.
One reason concolic execution is slow is the intermediate representation (IR). When emulating machine instructions, the emulator first translates them into IR, which reduces implementation difficulty. For example, Intel 64-bit has 1,795 instructions and a 2,000-page manual, so working with IR is much easier than handling raw machine instructions.
The IR is RISC-like (reduced instruction set), with far fewer instruction kinds than a CISC (complex instruction set) architecture like Intel's.
So one CISC instruction turns into multiple RISC-like IR instructions -- roughly a 4.69x increase in instruction count. This adds to the overhead of using IR.
It's not like the people using IR didn't know IR is kind of slow, so they made some efforts at optimization. The emulator processes IR instructions in basic blocks, and when a block does not touch any symbolic variable, that block is not executed symbolically.
This is a neat optimization, but the problem is the block granularity. Within a block that does touch symbolic state, only about 30% of the instructions actually matter. This means further optimization is possible! Good, right?
No -- the IR prevents this from happening, due to another optimization.
The IR translation overhead is itself a problem, so emulators cache translated basic blocks. Because of the cache, the emulator can only really work at block granularity, even though only ~30% of each block is useful.
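Here's a toy model of the block-granularity problem (the ~30% figure comes from the discussion above; the block contents and register names are my own illustration):

```python
# Each "instruction" is (destination, source registers). Existing emulators
# decide symbolic vs. concrete per *block*: if any instruction touches
# tainted (symbolic) state, the whole cached block takes the slow symbolic
# path, even though only a fraction of its instructions involve symbolic data.
def block_touches_symbolic(block, tainted):
    return any(set(srcs) & tainted or dst in tainted for dst, srcs in block)

def relevant_fraction(block, tainted):
    # instruction-level view: how many instructions actually matter?
    hits = sum(1 for dst, srcs in block if set(srcs) & tainted)
    return hits / len(block)

tainted = {"eax"}
block = [("ebx", ["eax"]),   # depends on symbolic eax
         ("ecx", ["esp"]),   # purely concrete
         ("edx", ["esp"]),   # purely concrete
         ("esi", ["edi"])]   # purely concrete

print(block_touches_symbolic(block, tainted))  # True -> whole block goes symbolic
print(relevant_fraction(block, tainted))       # 0.25 -> but only 25% matters
```

An instruction-level check could skip three of those four instructions -- exactly what block caching forbids.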
QSYM removes the IR translation and uses dynamic binary translation (DBT) to turn machine instructions directly into constraints (for the concolic executor).
Is this all upside? Well, QSYM only works on the Intel architecture. Why is that? The whole reason people used IR was that handling raw instructions directly is hard to implement, so QSYM presumably trades architecture flexibility for speed. (I'm not entirely sure on this.)
So without the IR slowing us down, efficient execution is possible -- and that efficiency enables further optimizations.
Another problem with hybrid fuzzers is snapshotting. Snapshots come from traditional concolic executors. What do they do? Well, as you saw, the emulator is really slow, so re-executing from scratch is inefficient. The good news is that in a traditional concolic executor, the different executions actually share branches. So if we save a snapshot at a branch and return to it later, that's really efficient.
However, a hybrid fuzzer's test cases come from the fuzzer and don't really share branches. Snapshots were kept anyway, because re-execution was considered too inefficient.
Furthermore, snapshots pull the program back to a past execution state, but the external environment (the kernel) can't be rolled back with it. So these concolic executors have to model the system calls and the external environment themselves.
Two ways exist to do this:
1. Emulate the external environment with a handwritten model. The model is inevitably incomplete, which makes execution inexact -- itself a problem.
2. Model the environment fully. This makes everything go really slow.
Driller emulates the external environment (basically option 1 up there), so its model is inaccurate: Linux kernel 2.6 has 337 system calls, but angr, the symbolic tracer Driller uses, models only 22 of them.
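Here's a toy version of what such a handwritten environment model looks like, and where it falls over (class and syscall names are my own illustration):

```python
# Toy environment model: after restoring a snapshot, the executor can't
# re-enter the real kernel, so it answers syscalls from a handwritten model.
# Any syscall the model doesn't cover makes execution diverge from reality.
class TinyEnvModel:
    def __init__(self, stdin_data):
        self.stdin = stdin_data
        self.offset = 0

    def sys_read(self, n):
        # modeled: hand back the next n bytes of (possibly symbolic) stdin
        data = self.stdin[self.offset:self.offset + n]
        self.offset += len(data)
        return data

    def syscall(self, name, *args):
        handler = getattr(self, "sys_" + name, None)
        if handler is None:
            # this is the Driller/angr situation: 22 of 337 syscalls modeled
            raise NotImplementedError(f"unmodeled syscall: {name}")
        return handler(*args)

env = TinyEnvModel(b"hello")
print(env.syscall("read", 3))   # b'hel'
try:
    env.syscall("ioctl", 0)
except NotImplementedError as e:
    print(e)                    # unmodeled syscall: ioctl
```

Every unmodeled syscall is a point where the emulated program and the real program part ways.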
Well, as we saw previously, QSYM executes efficiently since no IR is slowing it down. So it can just re-execute from the start every time! Re-execution keeps the concrete state real, so the external environment does not cause any problems.
Concolic executors do a bit of logic as they go down a path: they collect the branch constraints, solve them, and use the solution to generate test cases (inputs for the fuzzer).
Well, not all constraints are solvable (in reasonable time, anyway).
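The basic mechanism looks like this (a toy sketch; brute force stands in for a real SMT solver like Z3):

```python
# Toy constraint solving: the concolic executor collects the branch
# conditions along a path and asks a solver for an input satisfying all of
# them; the solution becomes a new test case for the fuzzer.
def solve(constraints, domain=range(256)):
    for x in domain:
        if all(c(x) for c in constraints):
            return x          # a concrete input driving the new path
    return None               # solver "gives up": no test case generated

path_constraints = [lambda x: x > 10, lambda x: x % 7 == 0, lambda x: x < 50]
print(solve(path_constraints))  # 14
```

When the constraints get big or contradictory, that `None` case is exactly the "solver gives up" problem described next.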
In a concolic executor, we want execution to actually go where we think it's going. This is why constraint collection has traditionally had to be complete and sound.
This is slow, and sometimes the solver just gives up without generating a test case.
So the concolic executor is so tied up in keeping the constraints complete that in certain situations it produces no test cases at all.
But why do we have the fuzzer? The fuzzer can simply execute a candidate input and check whether the path is actually taken. So we don't need to worry about the constraints being complete.
So QSYM collects an incomplete set of constraints, and if a path is over-constrained, it solves only a portion of them. This actually leads to more code coverage (which is a good thing).
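A toy sketch of that optimistic solving idea (my own simplification: "a portion" here means just the last branch condition):

```python
# Optimistic solving: if the full path constraint is unsatisfiable
# (over-constrained), drop everything except the last branch condition and
# solve just that. The result may not follow the intended path, but the
# fuzzer will simply run it and find out -- a wrong guess costs almost nothing.
def solve(constraints, domain=range(256)):
    for x in domain:
        if all(c(x) for c in constraints):
            return x
    return None

def optimistic_solve(constraints):
    full = solve(constraints)
    if full is not None:
        return full
    return solve(constraints[-1:])   # keep only the last branch condition

# over-constrained: no x satisfies both, but the last condition alone does
constraints = [lambda x: x < 10, lambda x: x == 42]
print(solve(constraints), optimistic_solve(constraints))  # None 42
```

A sound solver returns nothing here; the optimistic one hands the fuzzer 42 and lets execution be the judge.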
So, to recap: no IR -- dynamic binary translation (DBT) on raw instructions instead. Re-execution instead of snapshots. Optimistic constraint solving that ignores some constraints rather than getting stuck on over-constrained paths, letting the fuzzer take care of the small problems.
Basically, the symbolic solver clears the hard obstacles, leaving the way open for the fuzzer to explore deeper into the code.
Now, overly complex constraints tend to result from repeatedly executed basic blocks (think loops), so QSYM prunes such blocks. There is a slight problem here: pruning too aggressively is bad for QSYM.
So two measures are put in place. First, runs of repeated executions are grouped and counted as one, so the count doesn't shoot up and prune everything.
Second, the pruning mechanism is careful with context: strcmp(buf, "good") and strcmp(buf, "bad") execute the same code, but we can't count them together, right?
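Here's my loose sketch of those two measures (log-scale counting stands in for the grouping, and per-context keys stand in for the context sensitivity -- the details are simplified from the paper):

```python
import math

counts = {}

def record(context, block):
    # counts are keyed by (calling context, block), not by block alone
    counts[(context, block)] = counts.get((context, block), 0) + 1

def effective_count(context, block):
    # grouping: 2**k raw executions only count as k + 1, so a tight loop
    # doesn't blow past the pruning threshold after a few iterations
    n = counts.get((context, block), 0)
    return 0 if n == 0 else int(math.log2(n)) + 1

# the same strcmp loop, executed from two different call sites
for _ in range(1000):
    record("good_check", "strcmp_loop")
record("bad_check", "strcmp_loop")

print(effective_count("good_check", "strcmp_loop"))  # 10, not 1000
print(effective_count("bad_check", "strcmp_loop"))   # 1
```

The hot "good" context gets a tame count of 10 instead of 1000, and the "bad" context isn't punished for the other call site's loop at all.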
Actually, QSYM did quite well in the paper's evaluation.
It performed better than existing fuzzers and hybrid fuzzers. Interestingly, the improvement in efficiency also improved effectiveness: the pruning perhaps allowed better coverage, so the fuzzer could go deeper into the code.
QSYM uses AFL as its fuzzer, so its contributions are concentrated in the concolic executor portion. Clearly, the ideas here could be applied to concolic executors in general, not just to hybrid fuzzing. Furthermore, other fuzzers could be plugged in to demonstrate even better performance.