Assembly Obfuscation: the dark side of the code

Modern obfuscation techniques rely on control-flow or virtualization-based obfuscation, and powerful commercial solutions like VMProtect or Themida exist. The program’s control flow becomes convoluted and extremely hard to follow, leaving little chance of gaining insight through static analysis. Hard life for the reverse engineer. But the analyst still gets one piece of information from the very start: if you analyze a program protected by virtualization, you will spot immediately that something is unusual. You will recognize that obfuscation is in place, and this will trigger an alarm, putting you in a defensive posture (if a malicious program is to be analyzed) or an offensive one (if it’s to be pwned).

Despite the difficulty, you will know in advance that something malicious, or something protected from observation, is there.

What if, instead, an obfuscation technique makes the program look like it is not obfuscated at all?

You will see a simple program doing nothing strange. You execute it and nothing unusual happens. You look at it, and almost everything looks fine. Nothing strange triggers your sense of danger.

What actually happens is that, under certain conditions, a program hidden inside the visible one gets executed: instructions intertwined with the normal execution flow are fetched and run by the CPU.

The technique I’m gonna describe here is not as advanced as modern obfuscation techniques, but it can be very elegant if applied in depth.

Overlapping assembly instruction

Assembly obfuscation techniques rely on the fact that some instruction sets, like x86 (or x86_64), have variable-length instructions, ranging from 1 to 15 bytes. This, combined with the fact that some of the bytes in certain instructions can be chosen arbitrarily, provides suitable tools to hide code.

What if a jmp to malicious code is encapsulated in a nop instruction?

Some multi-byte nop instructions are available that suit this purpose. In certain forms, nop takes a memory operand that is never accessed and does not change the behaviour of the instruction, whose sole purpose is to occupy CPU cycles.

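NOP DWORD PTR [EAX+EAX*1+00000000]    0F 1F 84 00 00 00 00 00

The last 4 bytes of this instruction (its 32-bit displacement) can be chosen arbitrarily.

0F 1F 84 00 FF 25 XX XX
            └──┴──┴──┴─ JMP [address] (FF 25 takes a 4-byte address, so its last two bytes spill into the bytes that follow the NOP)

0F 1F 84 00 FF 20 00 00
            └──┴─ JMP [EAX]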

Relative jumps, or absolute jumps through a memory address or a register, can be encoded deceptively this way.
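To make the byte layout concrete, here is a minimal C sketch that assembles the 8-byte NOP around an arbitrary 4-byte payload. The helper name build_nop8 is made up for this illustration; only the byte values come from the encoding shown above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Opcode, ModRM and SIB bytes of NOP DWORD PTR [EAX+EAX*1+disp32]. */
static const uint8_t NOP8_HEAD[4] = {0x0F, 0x1F, 0x84, 0x00};

/* Hypothetical helper: writes the 8-byte NOP into out, using payload as
   the 4-byte displacement. When the NOP is executed from its first byte,
   the displacement is never used to access memory. */
void build_nop8(uint8_t out[8], const uint8_t payload[4])
{
    assert(out != NULL && payload != NULL);
    memcpy(out, NOP8_HEAD, 4);     /* the part that makes it decode as a NOP */
    memcpy(out + 4, payload, 4);   /* the part the author is free to choose */
}
```

Calling build_nop8 with the payload FF 20 00 00 produces the second layout above: a NOP when decoded from byte 0, a JMP [EAX] when decoded from byte 4.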

Tricking disassembler

A static disassembler can use either a linear sweep or a recursive traversal algorithm.

Linear disassembler

A linear sweep algorithm starts at the first executable byte and proceeds by disassembling one instruction after another. If it runs into data bytes in the middle of the code, it can get confused.

A classic example from the literature is to place a data byte such as 0xe8 inside the .text section and skip over it at run time (e.g. with a relative jmp). The disassembler takes that byte as the opcode of an instruction that is never executed.

section .text
global _start

_start:
    jmp hidden_call + 1
    
hidden_call:
    db 0xE8                      ; CALL opcode (never executed linearly)
    
    db 0xFF, 0xE0, 0x90, 0x90    ; These 4 bytes are call displacement
                                 ; But also: JMP EAX + 2 NOPs!
    
    nop
    nop
    
    ; Exit
    mov eax, 1
    xor ebx, ebx
    int 0x80

To assemble it, run:

$ nasm -f elf32 simple_trick -o simple_trick.o

This is the output from objdump:

00000000 <_start>:
   0:	eb 01                	jmp    3 <hidden_call+0x1>

00000002 <hidden_call>:
   2:	e8 ff e0 90 90       	call   9090e106 <hidden_call+0x9090e104>
   7:	90                   	nop
   8:	90                   	nop
   9:	b8 01 00 00 00       	mov    eax,0x1
   e:	31 db                	xor    ebx,ebx
  10:	cd 80                	int    0x80

As you can see, objdump disassembles a call instruction because it does not recognize that 0xe8 is a data byte. Inside that call, a “jmp somewhere else” is hidden. The disassembler’s output is therefore useless for following the real code flow: the instructions that are actually executed are encapsulated in a fake instruction, like candy inside a wrapper.
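The behaviour of objdump can be reproduced with a toy linear sweep. The sketch below is only an illustration: the function names are invented, and the length table covers just the few opcodes that appear in this example (register forms only), nothing like a real x86 decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of the instruction at p, for the opcodes of this example only.
   Returns 0 for anything the toy table does not know or that is truncated. */
size_t toy_insn_len(const uint8_t *p, size_t avail)
{
    assert(p != NULL);
    if (avail == 0) return 0;
    size_t len;
    switch (p[0]) {
    case 0x90: len = 1; break;   /* nop */
    case 0xEB: len = 2; break;   /* jmp rel8 */
    case 0xCD: len = 2; break;   /* int imm8 */
    case 0x31: len = 2; break;   /* xor r32, r32 (register form assumed) */
    case 0xFF: len = 2; break;   /* jmp/call r32 (register form assumed) */
    case 0xE8: len = 5; break;   /* call rel32 */
    case 0xB8: len = 5; break;   /* mov eax, imm32 */
    default:   return 0;
    }
    return len <= avail ? len : 0;
}

/* Linear sweep: decode back to back from start and record the offset of
   every instruction found. Returns how many offsets were written to out. */
size_t linear_sweep(const uint8_t *code, size_t size, size_t start,
                    size_t *out, size_t max_out)
{
    size_t n = 0;
    size_t off = start;
    while (off < size && n < max_out) {
        size_t len = toy_insn_len(code + off, size - off);
        if (len == 0) break;
        out[n++] = off;
        off += len;
    }
    return n;
}
```

Fed with the 18 bytes of the listing above, a sweep from offset 0 finds instructions at 0, 2, 7, 8, 9, 14 and 16, matching objdump. A sweep started at offset 3, where the CPU really lands, finds the hidden jmp eax at 3 followed by four nops at 5, 6, 7 and 8.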

Link the program:

$ ld -m elf_i386 simple_trick.o -o trick

and let’s have a look at gdb output:

●→  0x8049000 <_start+0000>      jmp    0x8049003 <hidden_call+1>
    0x8049002 <hidden_call+0000> call   0x98957106
    0x8049007 <hidden_call+0005> nop    
    0x8049008 <hidden_call+0006> nop    
    0x8049009 <hidden_call+0007> mov    eax, 0x1
    0x804900e <hidden_call+000c> xor    ebx, ebx

gdb, too, believes that a call follows the relative jmp. The wrapper could also be a more innocuous nop, if a multi-byte nop is used as seen in the section Overlapping assembly instruction. With a nop the analyst may believe that nothing will happen, but its potential side effects lie there like a mine; this can be extremely dangerous.
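The bogus call targets printed by objdump and gdb are plain arithmetic on the hidden bytes: the target of a CALL rel32 is the address of the next instruction plus the little-endian displacement. A small sketch (the function name is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Target of a CALL rel32 located at addr: the 32-bit little-endian
   displacement is added to the address of the next instruction (addr + 5).
   The sum wraps modulo 2^32, as it does in 32-bit mode. */
uint32_t call_rel32_target(uint32_t addr, const uint8_t insn[5])
{
    assert(insn != NULL && insn[0] == 0xE8);
    uint32_t rel = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;
    return addr + 5u + rel;
}
```

With the bytes e8 ff e0 90 90 this gives 0x9090e106 for address 0x2 (the objdump listing) and 0x98957106 for address 0x8049002 (the gdb listing): the "displacement" is nothing but jmp eax and two nops read as a number.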

gdb> nexti

 →  0x8049003 <hidden_call+0001> jmp    eax
    0x8049005 <hidden_call+0003> nop    
    0x8049006 <hidden_call+0004> nop    
    0x8049007 <hidden_call+0005> nop    
    0x8049008 <hidden_call+0006> nop

Once execution reaches the obfuscated instruction, the debugger disassembles again from the current address and uncovers the real code:

jmp eax

Recursive disassembler

A recursive traversal algorithm instead tries to follow the program flow, so it is able to step around the data byte. In this case, as described in the literature, opaque predicates can be used to confuse the flow analysis. The idea is to place complementary branch instructions: the programmer knows that one of the branches will always be taken, but the disassembler does not.

_start:
    xor eax, eax               ; EAX = 0 (ALWAYS)
    test eax, eax              ; Test if zero
    jz hidden + 1              ; ALWAYS taken (jump into hidden code)
    jnz fake_code              ; NEVER taken
    
hidden:
    db 0xE8                    ; CALL wrapper
    db 0x90, 0x90, 0x90, 0x90  ; hide code (just nop for example purpose)
    
fake_code:
    int 0x80
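To see why this confuses a flow-following disassembler, here is a toy recursive traversal in C. It is a sketch under loud assumptions: the names are invented, only the opcodes of this snippet are decoded, both jumps are assumed to be assembled in their 2-byte short form, and call targets are not followed. Since the toy cannot evaluate the predicate, it explores both the branch target and the fall-through, and ends up with two conflicting decodings of the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX 64

/* Per-byte marks filled in by the traversal. */
typedef struct {
    uint8_t is_start[TOY_MAX];  /* an instruction was decoded at this offset */
    uint8_t is_body[TOY_MAX];   /* byte is a non-first byte of an instruction */
} toy_marks;

static void visit(const uint8_t *code, size_t size, size_t off, toy_marks *m)
{
    while (off < size && !m->is_start[off]) {
        size_t len;
        int is_branch = 0;
        switch (code[off]) {
        case 0x90: len = 1; break;                            /* nop */
        case 0x31: case 0x85: len = 2; break;                 /* xor/test, register form */
        case 0xCD: len = 2; break;                            /* int imm8 */
        case 0x74: case 0x75: len = 2; is_branch = 1; break;  /* jz/jnz rel8 */
        case 0xE8: len = 5; break;                            /* call rel32, target not followed */
        default: return;                                      /* unknown to this toy decoder */
        }
        if (off + len > size) return;
        m->is_start[off] = 1;
        for (size_t i = 1; i < len; i++) m->is_body[off + i] = 1;
        if (is_branch) {
            /* target = address of next instruction + signed 8-bit displacement */
            size_t target = off + 2 + (size_t)(int8_t)code[off + 1];
            if (target < size) visit(code, size, target, m);
        }
        off += len;  /* then keep following the fall-through path */
    }
}

/* Traverses from entry and returns the number of offsets that are both an
   instruction start and the inside of another instruction (overlaps). */
size_t toy_traverse(const uint8_t *code, size_t size, size_t entry, toy_marks *m)
{
    assert(code != NULL && m != NULL && size <= TOY_MAX);
    for (size_t i = 0; i < TOY_MAX; i++) m->is_start[i] = m->is_body[i] = 0;
    visit(code, size, entry, m);
    size_t overlaps = 0;
    for (size_t i = 0; i < size; i++)
        if (m->is_start[i] && m->is_body[i]) overlaps++;
    return overlaps;
}
```

On the bytes 31 C0 85 C0 74 03 75 05 E8 90 90 90 90 CD 80 the toy decodes a call at offset 8 (the fall-through of jnz) and four nops at offsets 9 to 12 (the target of jz), so it reports 4 overlapping offsets. A real tool has to decide which of the two decodings to show.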

Let’s combine these techniques and try to craft a program that apparently does nothing strange, but hides malicious behaviour.

Code in the Shadows

I’m gonna write a simple test program that is supposed to print the well-known phrase Hello World! but, under conditions decided by the programmer, prints Bad world! instead.

For a primer on the inline assembly feature in C, you can read [4] and [5].

We are gonna analyze the program with Ghidra, the powerful open source static analysis tool that everybody reading this blog article probably knows.

Ghidra tries to defeat obfuscation snippets using constant propagation, control flow analysis or possibly symbolic execution. It can remove dead branches (code that is never executed) or resolve opaque predicates. Crafting a real-world shellcode that neutralizes these mechanisms would be a nice, fun task.

Let’s see for example what happens with basic opaque predicates. I will call this sample hello_world.c because it cannot overcome Ghidra.

hello_world.c

#include <stdio.h>

void hello_world() {
    printf("Hello World!\n");
}

void bad_world() {
    printf("Bad world!\n");
}

int main() {
    __asm__ __volatile__(
        // Setup: load bad_world into RSI
        "movq %1, %%rsi\n"
        
        // opaque predicate
        "xor %%rcx, %%rcx\n"              // RCX = 0
        "test %%rcx, %%rcx\n"             // ZF = 1
        "jnz fake_branch\n"
        "jz fake_branch+2\n"              // ALWAYS taken, jumps into the middle of MOV
        "fake_branch:\n"
        // OVERLAPPED INSTRUCTION:
        // Ghidra sees: MOV RBX, imm64
        ".byte 0x48, 0xBB\n"              // MOV RBX, imm64 opcode
        ".byte 0x48, 0x89, 0xF0\n"        // MOV RAX, RSI (3 bytes)
        ".byte 0xFF, 0xD0\n"              // Hidden: CALL *RAX
        ".byte 0x90, 0x90, 0x90\n"  // Padding
        
        "call hello_world\n"
        :
        : "r" (hello_world), "r" (bad_world)
        : "rax", "rbx", "rcx", "rdx", "rsi"
    );
    
    return 0;
}

Ghidra’s disassembler is not able to uncover the hidden instructions. After the test instruction it shows a possible jump to fake_branch, then a value loaded into rbx, and finally a call to hello_world().

MOV RBX, -0x6f6f6f2f000f76b8
CALL hello_world

Hello World disasm
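The odd negative constant in the MOV RBX line is just the hidden code read as a number. A short sketch (the function name is invented for illustration) decodes the eight bytes that follow the 48 BB opcode as a little-endian signed immediate:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads 8 bytes as a little-endian 64-bit immediate and returns it as a
   signed value, which is how the disassembly prints it. */
int64_t imm64_le(const uint8_t b[8])
{
    assert(b != NULL);
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    /* Reinterpret as two's complement; this wraps on mainstream compilers. */
    return (int64_t)v;
}
```

For the bytes 48 89 F0 FF D0 90 90 90 (MOV RAX, RSI; CALL *RAX; three NOPs) the unsigned value is 0x909090D0FFF08948, which as a signed number is -0x6f6f6f2f000f76b8, exactly the immediate Ghidra displays.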

The decompiler, instead, successfully realizes that a scary call to bad_world() will occur:

Hello World decompiled

Moreover, Ghidra is smart enough to see that instruction overlapping is present, and it issues a warning to the analyst.

In this article we assume that the warning is irrelevant for the reversing phase, even though it would surely be enough to alert an expert reverse engineer.

The goal of this blog article is to give a simple PoC that does something different from what reverse engineering tools can see, visualize or decode. A real-world undetectable shellcode would certainly need more engineering to avoid those warnings.

Let’s go ahead and craft a more advanced program named bad_world.c, trying to make sure that the decompiler, at least, cannot see the call to the malicious routine:

bad_world.c

#include <stdio.h>
#include <stdlib.h>

void hello_world() {
    printf("Hello World!\n");
}

void bad_world() {
    printf("Bad world!\n");
}


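int main(int argc, char *argv[]) {
    int offset = 0;
    int compare = 0;

    // Both values are required; bail out instead of reading a missing argv entry
    if (argc < 3)
        return 1;

    offset = atoi(argv[1]);
    compare = atoi(argv[2]);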

    __asm__ __volatile__(
        // bad_world function pointer into RSI
        "movq %1, %%rsi\n"
        // offset into EAX (zero-extended into RAX)
        "mov %2, %%eax\n"
        // compare into EBX
        "mov %3, %%ebx\n"

        // address of overlap, RIP-relative, into RDI
        "lea overlap(%%rip), %%rdi\n"
        // add the offset at run time, so the decompiler cannot rebuild the
        // code flow through constant propagation
        "add %%rax, %%rdi\n"
        // compare is also a run-time value, so the decompiler cannot
        // resolve the predicate
        "cmp $4, %%rbx\n"
        // With compare == 4 the hidden path is taken
        "jne hello_world_fn\n"
        // jump to overlap + offset. With offset == 4 the call hidden in the
        // middle of the NOP is executed
        "jmp *%%rdi\n"

        // An innocuous-looking 8-byte NOP
        "overlap:\n"
        ".byte 0x0F, 0x1F, 0x84, 0x00\n"   // NOP opcode, ModRM, SIB (4 bytes)
        ".byte 0xFF, 0xD6\n"               // CALL *RSI (hidden at bytes 4-5)
        "jmp hello_world_fn+5\n"           // Skips call hello_world; as a 2-byte short jump it completes the NOP

        "hello_world_fn:\n"
        "call hello_world\n"

        :
        : "r" (hello_world), "r" (bad_world), "r" (offset), "r" (compare)
        // RDI is written by the lea above, so it is listed as clobbered too
        : "rax", "rbx", "rcx", "rsi", "rdi", "cc", "memory"
    );
    
    return 0;
}

Some values must be given at run time to enter the tricky paths. I chose to pass them as command-line arguments for simplicity; they could also be packed into a configuration routine. The important thing is to fool Ghidra’s constant propagation.

$ ./bad_world 4 1 
Hello World!

$ ./bad_world 4 4
Bad world!
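The two runs above can be summarized with a small model of the asm block. This is a sketch of the intended control flow, not something extracted from the binary: the names are invented, and it assumes that the assembler emits jmp hello_world_fn+5 as a 2-byte short jump, so that the NOP is exactly 8 bytes long. Offsets are relative to the overlap label.

```c
typedef enum { PATH_HELLO, PATH_BAD, PATH_NONE, PATH_UNMODELED } toy_path;

/* Layout assumed after the overlap label:
   bytes 0-3 NOP head, 4-5 CALL *RSI, 6-7 JMP rel8, 8-12 CALL hello_world. */
toy_path bad_world_path(int offset, int compare)
{
    if (compare != 4)
        return PATH_HELLO;       /* jne hello_world_fn is taken */
    switch (offset) {
    case 0: return PATH_HELLO;   /* the whole NOP runs, then call hello_world */
    case 4: return PATH_BAD;     /* CALL *RSI, then the JMP skips hello_world */
    case 6: return PATH_NONE;    /* only the JMP runs: nothing is printed */
    case 8: return PATH_HELLO;   /* lands directly on call hello_world */
    default: return PATH_UNMODELED;  /* lands mid-instruction, likely a crash */
    }
}
```

So ./bad_world 4 1 maps to PATH_HELLO and ./bad_world 4 4 to PATH_BAD, in line with the output shown above. Any offset the model does not list decodes garbage from the middle of an instruction.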

Code decompilation:

Bad world decompiled

In the decompiled code only a call to hello_world() is reported. Because the offset and the comparison value are only known at run time, Ghidra cannot resolve the target of the jump.
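The difference between the two samples can be pictured with a toy constant-propagation model. To be clear, this is not Ghidra’s algorithm, just a minimal sketch with invented names: a register is either a known constant or unknown, and a branch or an indirect jump can only be resolved when everything it depends on is known.

```c
#include <stdbool.h>
#include <stdint.h>

/* Abstract value of a register: a known constant, or unknown. */
typedef struct { bool known; int64_t value; } toy_val;

typedef enum { ZF_CLEAR, ZF_SET, ZF_UNKNOWN } toy_zf;

/* xor r, r always yields 0, whatever r held before. */
toy_val toy_xor_self(void) { return (toy_val){ true, 0 }; }

/* A constant visible in the code, e.g. the address of a label. */
toy_val toy_const(int64_t v) { return (toy_val){ true, v }; }

/* A value that only exists at run time (argv, a config routine...). */
toy_val toy_runtime_input(void) { return (toy_val){ false, 0 }; }

/* test r, r sets ZF exactly when r == 0. */
toy_zf toy_test_self(toy_val r)
{
    if (!r.known) return ZF_UNKNOWN;
    return r.value == 0 ? ZF_SET : ZF_CLEAR;
}

/* cmp $imm, r sets ZF exactly when r == imm. */
toy_zf toy_cmp_imm(toy_val r, int64_t imm)
{
    if (!r.known) return ZF_UNKNOWN;
    return r.value == imm ? ZF_SET : ZF_CLEAR;
}

/* add: the result is known only if both operands are known. */
toy_val toy_add(toy_val a, toy_val b)
{
    if (!a.known || !b.known) return (toy_val){ false, 0 };
    return (toy_val){ true, (int64_t)((uint64_t)a.value + (uint64_t)b.value) };
}
```

In hello_world.c the chain is toy_test_self(toy_xor_self()), which is ZF_SET: the predicate folds to a constant and the hidden path can be followed. In bad_world.c the predicate is toy_cmp_imm(toy_runtime_input(), 4) and the jump target is toy_add(toy_const(overlap), toy_runtime_input()): both stay unknown, so neither the branch nor the target of jmp *%rdi can be resolved statically.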

The function call graph only shows calls to hello_world() and atoi().

Bad World Graph

The function graph does not show any call to bad_world(). This is great. However, a JMP RDI is there, and it calls for more attention.

Bad World Tree

The Disassembler View cannot show that jump. Some views that are able to follow control flow can, to some extent, reveal the code hidden under the instruction.

Disassembly obfuscation is extremely powerful against the disassembler, but the Ghidra decompiler is more resistant in this case.

Anyway we made a nice shot: no calls to bad_world() are reported.

If the analyst doesn’t pay close attention, or has tons of lines of code to analyze, they could miss this malicious routine.

Reverse engineering tools cannot properly show the bad_world program hidden in the shadows of hello_world’s overlapping instructions.

Conclusion and Further steps

With symbols pulverized, variables governing the flow encrypted or packed in some configuration routines, and shellcode sneakily intertwined under overlapping instructions we can make the dark side of the code appear when nobody expects it.

Anti-debugging, anti-reversing, and VM discovery logic could potentially be broken into pieces and injected inside multi-byte nops, call, lea, cmovc… This will be extremely hard to reverse.

At the same time, studying anti-decompilation techniques is necessary to make the hidden program even harder to unmask.

In this article the topic is explored experimentally with a playful approach. For more academic insights, have a look at references listed below.

Have a look also at some nice tools that play with assembly syntax to perform other kinds of assembly obfuscation: