Deep Dive into Linux Heap Memory Management: From Basics to Core Exploitation

If you work in systems programming, C/C++ performance, or low-level security, heap internals will eventually matter. Stack overflows have been largely tamed by canaries and NX, but heap bugs — use-after-free, double free, heap overflow — remain reliably exploitable. Modern mitigations (ASLR, PIE, RELRO) just make the attacker work harder; they don’t eliminate the bugs.

The heap allocator is both a memory manager and a caching layer between your program and the kernel. This article covers ptmalloc2 (glibc’s allocator), from the syscall interface up through arenas, chunks, bins, tcache, and the exploit techniques that target each layer.


I. System calls behind malloc

Before discussing allocator internals, understand how the allocator itself gets memory from the kernel. ptmalloc2 uses two syscalls, chosen based on allocation size.

brk / sbrk

When the kernel loads an ELF binary, the process gets a program break pointer at the end of its data segment. sbrk moves this pointer up (allocate) or down (free). It’s fast — a simple pointer bump — and the allocated memory is contiguous.

The catch: you can only free memory at the end of the heap. If you allocate A, B, C and then free B, you cannot brk back — C is still in the way. That freed B is stuck, and the kernel can’t reclaim the pages until everything past them is freed. This is why brk is used mainly for small allocations where the allocator expects to reuse the space internally.

mmap / munmap

For large allocations (default threshold: 128KB, controlled by MMAP_THRESHOLD), glibc switches to mmap. It creates an anonymous mapping in the process’s address space, independent of the main heap. When you free it, munmap returns the pages to the kernel immediately — no fragmentation issue.

The tradeoff: mmap is more expensive than brk (page table manipulation, TLB shootdown on munmap). But for large allocations, the fragmentation cost of brk is worse. A practical gotcha: if your application allocates and frees many large buffers in a loop, you’ll see high sys time from the repeated mmap/munmap calls. Tuning M_MMAP_THRESHOLD via mallopt() can help, but I’ve also seen it backfire badly when people set it too high and the brk heap grows uncontrollably.

What you see in /proc/pid/maps

08048000-08049000 r-xp 00000000 08:01 1234    /bin/a.out
08049000-0804a000 rw-p 00001000 08:01 1234    /bin/a.out
0804a000-0806b000 rw-p 00000000 00:00 0       [heap]      ← brk-managed
b7e00000-b7f00000 rw-p 00000000 00:00 0                   ← mmap'd

The [heap] segment grows via brk. The anonymous region without a label is an mmap allocation (either a thread arena or a large direct mapping).


II. Arena mechanism and concurrency

Why arenas exist

Early malloc implementations had a single global lock. On multi-threaded workloads, threads would spend most of their time waiting for the allocator. ptmalloc2 solved this with arenas — independent memory pools, each with its own lock.

  • Main arena: tied to the brk heap. Created at process start.
  • Thread arenas: backed by mmap segments. Created as needed.

Arena limits

Creating an arena per thread wastes address space. glibc caps the number:

  • 32-bit: 2 * CPU cores + 1
  • 64-bit: 8 * CPU cores + 1

On a 4-core 64-bit machine, that’s 33 arenas max. Beyond that, threads share existing arenas via mutex_trylock. If no arena is available, the thread blocks until one is released. I’ve debugged production services where the bottleneck wasn’t application logic but the allocator — on a 128-core machine, 1025 arenas sounds like a lot, but with thousands of short-lived threads, contention was still measurable.

Observing arena allocation

The following program demonstrates how thread arenas appear in the memory map:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

void* threadFunc(void* arg) {
    printf("Thread %ld before malloc\n", (long)arg);
    getchar();
    char* addr = (char*) malloc(1000);
    printf("Thread %ld after malloc\n", (long)arg);
    getchar();
    free(addr);
    printf("Thread %ld after free\n", (long)arg);
    getchar();
    return NULL;
}

int main() {
    pthread_t t1;
    printf("Main PID: %d\n", getpid());
    printf("Before any malloc\n");
    getchar();

    char* addr = (char*) malloc(1000);
    printf("After main-thread malloc\n");
    getchar();

    free(addr);
    printf("After main-thread free\n");
    getchar();

    pthread_create(&t1, NULL, threadFunc, (void*)1);
    pthread_join(t1, NULL);
    return 0;
}

Run it and inspect /proc/<pid>/maps at each pause:

Before malloc — no heap segment:

0804a000-0804b000 rw-p 00001000 08:01 539624  /home/user/arena_test

After main-thread malloc[heap] appears, allocated via brk:

0804b000-0804c000 rw-p 00000000 00:00 0       [heap]

Even 1000 bytes requested triggered a 132KB allocation (the default arena size).

After thread malloc — thread arena, allocated via mmap (not adjacent to data segment):

b7500000-b7520000 rw-p 00000000 00:00 0       (thread heap)

After free — the memory is not returned to the OS. glibc keeps it in its internal free lists. This is a common source of confusion: free() does not mean “memory goes back to the system.” It means “memory goes back to glibc’s pool.” RSS stays the same.


III. Allocator data structures

heap_info

Thread arenas can consist of multiple mmap’d segments. glibc tracks them with heap_info:

typedef struct _heap_info {
    mstate ar_ptr;           /* the arena that owns this segment */
    struct _heap_info *prev; /* previous heap segment */
    size_t size;             /* total size of this segment */
    size_t mprotect_size;
    char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;

This links segments back to their arena. Not something you touch directly, but useful to know when reading crash dumps — corruption in this structure can cause arena lookup to dereference garbage.

malloc_state (arena header)

Each arena’s state lives in malloc_state. It holds the lock, bin lists, top chunk, and fastbin tracking. This structure is a common target for heap attacks — corrupt it and you control future allocations.

struct malloc_state {
    mutex_t mutex;
    int flags;
    mfastbinptr fastbinsY[NFASTBINS];
    mchunkptr top;
    mchunkptr last_remainder;
    mchunkptr bins[NBINS * 2 - 2];
    unsigned int binmap[BINMAPSIZE];
    struct malloc_state *next;
    INTERNAL_SIZE_T system_mem;
    INTERNAL_SIZE_T max_system_mem;
};

malloc_chunk

Every allocation and free block is a malloc_chunk. The structure is the same for allocated and freed chunks — what changes is which fields are active:

struct malloc_chunk {
    INTERNAL_SIZE_T prev_size;  /* size of previous chunk (only if previous is free) */
    INTERNAL_SIZE_T size;       /* this chunk's size + flag bits */
    struct malloc_chunk* fd;    /* forward link — only if free */
    struct malloc_chunk* bk;    /* backward link — only if free */
    struct malloc_chunk* fd_nextsize; /* large bin only */
    struct malloc_chunk* bk_nextsize; /* large bin only */
};

The critical trick: when a chunk is allocated, the fd and bk fields are reused as user data. The pointer malloc() returns actually points to where fd would be — two machine words past the start of the header. When the chunk is freed, the allocator writes fd and bk at those same locations.

This means: when you have a use-after-free bug, the data you read from a freed pointer is actually allocator metadata (bin pointers). And when you write to a freed pointer, you’re corrupting those pointers. This is the root cause of most heap exploitation.

Chunk format evolution

The current chunk format went through four stages:

Stage 1: Implicit free list Earliest allocators used just a size field. The low alignment bits (always 0 on 8-byte aligned sizes) were repurposed as a PREV_INUSE flag. Finding the next chunk: add size to the current address. Coalescing required scanning.

Stage 2: Knuth boundary tags Knuth added a footer (size copy) at the end of each chunk. This made backward coalescing possible — find the previous chunk’s footer by subtracting from the current address. Cost: every chunk has both header and footer overhead.

Stage 3: Optimized boundary tags Since only free chunks need footers, glibc stores the PREV_INUSE bit in the current chunk’s size field to indicate whether the previous chunk is free. If the previous chunk is allocated, the memory that would be prev_size is part of its payload. This is the format used today.

Stage 4: Multi-thread flags Two more flag bits were added:

  • IS_MMAPPED (bit 1) — chunk was allocated via mmap
  • NON_MAIN_ARENA (bit 2) — belongs to a thread arena

Three flags in the low 3 bits of size, possible because all chunk sizes are multiples of 8 (or 16 on 64-bit):

| size | bit 0: PREV_INUSE (P)
|      | bit 1: IS_MMAPPED (M)
|      | bit 2: NON_MAIN_ARENA (N)

Top chunk

The last chunk at the high end of an arena. When no bin has a suitable free chunk, the allocator carves from the top chunk. If the top chunk is too small, the arena is extended (sbrk for main, mmap for thread).

Why attackers love it: corrupt the top chunk’s size (set it to -1), and the House of Force attack becomes possible. Request an enormous allocation, the top pointer wraps around, and the next allocation lands at an attacker-chosen address. This attack is harder with modern ASLR, but I’ve seen it work in CTFs and on older embedded systems.

Last remainder chunk

When the allocator splits a chunk from the unsorted bin for a small request, the leftover goes into the unsorted bin as the “last remainder.” Subsequent small allocations split from it, keeping allocations adjacent — good cache locality. This is purely an optimization; you’ll never notice it unless you’re tracing allocator behavior.


IV. Bins: how free chunks are organized

The allocator doesn’t scan the heap linearly. Free chunks are organized into bins keyed by size. Two arrays in malloc_state:

mfastbinptr fastbinsY[NFASTBINS];  // fast bins (singly linked)
mchunkptr bins[NBINS * 2 - 2];     // normal bins (doubly linked)

Where NBINS = 128:

  • bin 1: Unsorted bin
  • bins 2–63: Small bins (62 bins)
  • bins 64–126: Large bins (63 bins)

Fast bins

Size range: up to 80 bytes on 32-bit, up to 160 bytes on 64-bit (incrementing by 8/16).

Singly linked, LIFO. No coalescing — freed fast chunks keep PREV_INUSE set, so they never merge with adjacent free chunks. This is fast but causes fragmentation.

Security: Because fast bins are singly linked, early glibc had no integrity checks on the forward pointer. The classic Fastbin Double Free attack: free(A); free(B); free(A) creates a cycle A → B → A. Next malloc returns A — still in the list. Write a fake fd pointer, and the following malloc returns an arbitrary address.

Unsorted bin

A staging area. Freed chunks (non-fast, non-mmap) land here first. When malloc runs, it scans the unsorted bin:

  • Exact match → return it.
  • Too large → split, remainder stays.
  • Otherwise → move to the correct small/large bin.

This gives recently freed chunks a chance to be reused without the cost of sorting.

Small bins

62 bins, each holding chunks of one fixed size (no splitting needed). FIFO (allocate from rear, free to front). Adjacent chunks are coalesced on free.

Large bins

63 bins, each covering a range of sizes with increasing step sizes:

Range Step Count
512–576 64 bytes 32
1024–… 512 bytes 16
4096 B 8
32768 B 4
262144 B 2
Rest 1

Within each bin, chunks are sorted by size (decreasing). fd_nextsize/bk_nextsize pointers skip directly to the best-fit chunk.

A bitmap (binmap) tracks which bins are non-empty — the allocator skips empty bins in O(1).

Allocation priority flow

When malloc(size) is called:

  1. Fast bin — size fits? Try the fast bin.
  2. Small bin — small request? Try exact small bin.
  3. Unsorted bin — scan once. Exact match returns. Sort remaining chunks.
  4. Large bin — search via binmap for best fit. Split if needed.
  5. Top chunk — carve from top.
  6. Extend arenasbrk (main) or mmap (thread).

V. Heap exploitation: how metadata becomes attack surface

All heap attacks follow the same pattern: corrupt allocator metadata so that a future malloc returns memory at an attacker-controlled address.

Use-after-free (UAF)

Call free(p), then read or write through p. The freed chunk is in a bin; its fd/bk pointers are allocator metadata. Reading gives you a heap/libc leak (useful for bypassing ASLR). Writing lets you corrupt the bin list.

Common pattern: allocate object A, free it, the fd pointer now points to another free chunk (or is null). If you can write through the dangling pointer, you can redirect the bin list. Next malloc of the same size returns A (from the bin). Next malloc returns whatever address you wrote into fd.

Double free

Free the same pointer twice without an intervening allocation. In fast bins: free(A); free(B); free(A) creates A → B → A. Allocate twice to get A, overwrite its fd with a target address, then the third allocation returns that target.

Modern glibc detects same-pointer double-free by checking the fastbin head. The bypass is to free a different chunk in between.

The unlink() macro removes a chunk from a doubly linked bin:

#define unlink(P, BK, FD) { \
    FD = P->fd;             \
    BK = P->bk;             \
    FD->bk = BK;            \
    BK->fd = FD;            \
}

Early glibc had no integrity checks. If you controlled the fd and bk of a freed chunk (via heap overflow), you could write an arbitrary value to an arbitrary address: FD->bk = BK writes the value of BK to FD + offsetof(bk).

Safe-unlink (glibc 2.27+): checks that FD->bk == P && BK->fd == P. The attacker can no longer use arbitrary addresses — they must find a location that already points to P (e.g., a global variable in the .bss section). This shifted exploitation toward targeting known pointers rather than crafting fake ones.

House of series

These are named exploitation patterns targeting specific structures:

  • House of Force: corrupt the top chunk size → force it to wrap around → allocate at an arbitrary address.
  • House of Spirit: free a fake chunk on the stack or BSS → next malloc returns a stack address → overwrite return address.
  • House of Lore: corrupt small/large bin bk pointer → malloc returns an arbitrary address.

Most of these are mitigated by modern glibc, but they still appear in CTFs and legacy systems.


VI. Modern mitigations

Tcache (glibc 2.26+)

Per-thread cache for frequently used chunk sizes. Reduces lock contention: freed chunks go to tcache first (no lock needed), and allocations from tcache skip the arena lock.

Early tcache was trivially exploitable: no integrity checks, no double-free detection. Overwrite the forward pointer and the next allocation returns your target. This made tcache poisoning the go-to attack for several glibc versions.

Safe-linking (glibc 2.32+)

XOR-encodes forward pointers in singly linked lists (fastbin, tcache):

#define PROTECT_PTR(pos, ptr) \
    ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))

The stored pointer is (pos >> 12) ^ real_ptr. The attacker needs a heap address leak to forge a valid pointer. This raised the bar significantly — a simple UAF is no longer enough; you also need to know where the chunk lives.

Key takeaway

The cat-and-mouse game between heap exploitation and mitigation is ongoing. Each glibc release closes one technique and the community finds another. The constant is this: the bugs (UAF, double free, overflow) are still there — only the exploitation path changes.


VII. Debugging heap issues

Valgrind

valgrind --leak-check=full ./program. Simulates every instruction, tracks shadow state for every byte. Catches off-by-one writes (the redzones between allocations). Slow (10-50x slowdown) but extremely thorough. I use it for initial triage, rarely for ongoing debugging.

AddressSanitizer

-fsanitize=address -g. Compile-time instrumentation: redzones around allocations, checks on every memory access. Catches use-after-free, heap overflow, stack buffer overflow, global buffer overflow. Runtime overhead ~2x (not 10-50x like Valgrind). This is my go-to for heap bugs.

glibc internal monitoring

  • malloc_stats(): prints arena stats to stderr.
  • mallinfo2(): returns allocator state (allocated space, mmap usage, fastbin count).

For long-running services, periodic mallinfo2() samples can reveal heap growth patterns that simple RSS monitoring misses. I’ve used this to find allocator-induced memory growth where the application itself had no leak — glibc just wasn’t returning pages to the OS.

MALLOC_CHECK_ and glibc debugging

Setting MALLOC_CHECK_=3 (or glibc.malloc.check=3 via tunables) enables extra heap integrity checks. On double-free or corruption, glibc aborts with a diagnostic rather than silently continuing. Not suitable for production (slows down all allocator operations) but invaluable for debugging.


VIII. Summary

glibc’s ptmalloc2 balances speed, fragmentation, concurrency, and backward compatibility. Arenas reduce lock contention. Chunk metadata enables coalescing. Bins organize free chunks for fast allocation. The same structures make heap exploitation possible — every metadata field is a potential corruption target.

If you understand arenas, chunks, bins, tcache, and the major mitigations, you can debug allocator behavior, investigate crashes, and read exploitation writeups without treating them as magic. The bugs are still there; only the exploitation path changes.

(End of Document)