Buddy System

Motivation

Buddy system 運用分配給 kernel 的 physical memory 是連續的這個特性來解決 external fragmentation 的問題

運作原理

分配（Allocation）

請求大小先向上取整至最近的 $2^{k}$ （例如 21 KB → 32 KB）。
在對應 order $k$ 的 free list 裡找空閒塊。
若找不到，往上一層（order $k + 1$ ）取一塊，對半分裂成兩個 buddy，左邊拿去用，右邊放入 order $k$ 的 free list。
遞迴重複，直到大小吻合。

例子： 256 KB 可用，kernel 請求 21 KB：

256 KB
├── 128 KB  A_L  ← 繼續分裂
│   ├── 64 KB  B_L  ← 繼續分裂
│   │   ├── 32 KB  C_L  ✅ 分配給請求（21 KB 向上取整 = 32 KB）
│   │   └── 32 KB  C_R  → 放入 free list (order 5)
│   └── 64 KB  B_R  → 放入 free list (order 6)
└── 128 KB  A_R  → 放入 free list (order 7)

graph TD
    A["256 KB"] --> AL["128 KB (A_L)"]
    A --> AR["128 KB (A_R) - free"]
    AL --> BL["64 KB (B_L)"]
    AL --> BR["64 KB (B_R) - free"]
    BL --> CL["32 KB (C_L) allocated"]
    BL --> CR["32 KB (C_R) - free"]

釋放與合併（Deallocation & Coalescing）

這是 Buddy System 最優雅的地方。釋放一個塊時，系統需要知道它的 buddy 在哪。因為分配是對半切的，buddy 的位址可以用一個簡單的 bitwise operation 算出來：

$buddy_addr = addr \oplus 2^{k}$

也就是把第 $k$ 個 bit flip 掉。不需要搜尋整個記憶體—— $O (1)$ 就能定位 buddy。若 buddy 也是空閒的，立刻合併成 $2^{k + 1}$ 大小的塊，再往上遞迴嘗試合併。

優點

Buddy System 的最大優點是快速 coalescing：合併空閒塊的時間複雜度為 $O (lo g N)$ ，其中 $N$ 是最大塊的 order。它有效抑制了 external fragmentation，讓大塊連續實體記憶體得以在使用後復原。

缺點

1. Internal Fragmentation

Internal Fragmentation 因為只能分配 2 的次方大小，65 KB 的請求會拿到 128 KB 的塊，白白浪費 63 KB（接近 50%）。請求大小愈接近 $2^{k} + 1$ ，浪費愈嚴重

2. Splitting/Coalescing Overhead

小請求頻繁到來時，系統要不停切割大塊；釋放時又要不停嘗試合併。這些操作雖然快，但在高頻場景下仍消耗可觀的 CPU 時間

3. Performance Bottleneck (Locking)

在多核心環境下，每次修改 free list 都要持有 spinlock。多個 CPU 核心同時分配記憶體時，這把鎖就成了 bottleneck，限制了

4. Inefficiency for Small Allocations

Kernel 內部充滿各種固定大小的小結構（task_struct、inode、dentry……），用 Buddy System 直接分配它們是極大的浪費。這正是 slab allocator 存在的理由——它建在 Buddy System 之上，專門高效處理小物件的反覆分配與釋放。（把 Buddy System 想成「批發商」，Slab Allocator 是「零售商」。）

5. Limited Physical Contiguity Over Time

即使設計上避免外部碎片，系統長時間運行後仍可能難以湊出高 order 的連續塊（high-order allocation failure）

Chilfox

目錄

D-OS-Ch10ga-Buddy_System