The Linux kernel implements bad block management through the Memory Technology Device (MTD) subsystem and the Unsorted Block Images (UBI) layer, with the following key features:
- Bad Block Table Initialization
- Bad Block Skipping
- All read, write, and erase operations automatically consult the in-memory Bad Block Table before accessing physical blocks.
- The UBI layer maintains the Physical Erase Block (PEB) organization using three Red-Black trees:
- Free Tree: Contains erased, good PEBs available for allocation, sorted by erase count for wear-leveling.
- Used Tree: Contains PEBs currently storing volume data.
- Erroneous Tree: Contains PEBs marked as bad or unreliable.
- When the UBI layer needs to allocate a block for writing:
- A PEB is selected from the free tree. A PEB is never selected from the erroneous tree.
- The bad block status is verified by querying the BBT through the MTD layer.
- If the block is marked bad, it is removed from the free tree, added to the erroneous tree, and a different PEB is selected.
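The allocation flow above can be sketched in a few lines. This is a simplified model, not the kernel code: a min-heap keyed on erase count stands in for the red-black free tree, plain sets stand in for the used and erroneous trees, and `PebAllocator` and the `is_bad` callback (a stand-in for the BBT query through MTD) are hypothetical names. It also assumes the lowest-erase-count PEB is always tried first.

```python
import heapq


class PebAllocator:
    """Toy model of the PEB selection described above (not kernel code)."""

    def __init__(self, erase_counts, is_bad):
        # Free "tree": a min-heap of (erase_count, peb) stands in for the
        # erase-count-sorted red-black tree.
        self.free = [(ec, peb) for peb, ec in erase_counts.items()]
        heapq.heapify(self.free)
        self.used = set()
        self.erroneous = set()
        # Hypothetical callback standing in for the BBT query through MTD.
        self.is_bad = is_bad

    def allocate(self):
        """Return a good PEB, or None if the free pool is exhausted."""
        while self.free:
            _, peb = heapq.heappop(self.free)
            if self.is_bad(peb):
                # Bad block: move it to the erroneous set and try the next one.
                self.erroneous.add(peb)
                continue
            self.used.add(peb)
            return peb
        return None


bad_pebs = {1}
alloc = PebAllocator({0: 5, 1: 2, 2: 9, 3: 2}, is_bad=lambda peb: peb in bad_pebs)
first = alloc.allocate()  # PEB 1 is tried first, found bad, and skipped
```

Because PEB 1 is bad, it ends up in `alloc.erroneous` and is never offered again, while the next least-worn candidate is returned instead.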
- The BBT lookup uses efficient bit manipulation for fast access:
- Byte offset in BBT array = PEB number ÷ 4 (since each byte holds four 2-bit entries)
- Bit position within byte = (PEB number mod 4) × 2
- The 2-bit status code is then extracted as: (BBT_byte >> bit_position) & 0x03
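The lookup arithmetic above is easy to check by hand. The following sketch applies the three formulas to a small in-memory table; the function name `bbt_status` and the sample table contents are illustrative assumptions, and the only status value taken from the text is 01b for a block marked bad at runtime.

```python
def bbt_status(bbt: bytes, peb: int) -> int:
    """Return the 2-bit status code for a PEB from a packed BBT array."""
    byte_offset = peb // 4          # four 2-bit entries per byte
    bit_position = (peb % 4) * 2    # 0, 2, 4, or 6
    return (bbt[byte_offset] >> bit_position) & 0x03


# Sample table (made up for illustration):
#   byte 0 = 0b11_00_01_00 -> PEB3 = 11b, PEB2 = 00b, PEB1 = 01b, PEB0 = 00b
#   byte 1 = 0b00_00_00_01 -> PEB4 = 01b, PEB5..PEB7 = 00b
sample_bbt = bytes([0b11000100, 0b00000001])

status_peb1 = bbt_status(sample_bbt, 1)
status_peb4 = bbt_status(sample_bbt, 4)
```

For PEB 4, the byte offset is 4 ÷ 4 = 1 and the bit position is (4 mod 4) × 2 = 0, so the status is the low two bits of the second byte.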
- When a bad block is encountered during logical-to-physical address translation, the operation automatically skips to the next good block, making bad block avoidance transparent to upper layers.
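One common way to make bad blocks transparent during address translation is the skip-block scheme: logical block N maps to the N-th good physical block. The sketch below illustrates that idea only; it is an assumption about how the skipping could work, not the kernel's translation code, and `logical_to_physical` is a hypothetical helper.

```python
def logical_to_physical(logical_block, bad_blocks, total_blocks):
    """Map a logical block index to the physical block that holds it,
    skipping every physical block listed in bad_blocks."""
    good_seen = -1
    for phys in range(total_blocks):
        if phys in bad_blocks:
            continue  # skip to the next good block
        good_seen += 1
        if good_seen == logical_block:
            return phys
    raise ValueError("logical block beyond the available good blocks")


# Physical blocks 1 and 2 are bad, so logical block 1 lands on physical block 3.
mapped = logical_to_physical(1, bad_blocks={1, 2}, total_blocks=6)
```

Upper layers only ever see logical indices, so they need no knowledge of which physical blocks were skipped.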
- Runtime Bad Block Marking
- Program, erase, and read operations are continuously monitored for failure conditions through multiple mechanisms:
- NAND controller status register checking after program and erase operations.
- Error Correction Code (ECC) engine monitoring during read operations.
- Wear-leveling algorithm tracking of marginal block behavior.
- When one of the following failures is detected, the affected block is immediately marked as bad:
- Program Failure: The P-FAIL bit of the NAND controller status register is checked after each program operation. P-FAIL = 1 indicates that the program operation timed out or failed, which triggers bad block marking.
- Erase Failure: The E-FAIL bit of the NAND controller status register is checked after each erase operation. E-FAIL = 1 indicates that the erase operation failed, which triggers bad block marking.
- ECC Uncorrectable Error: When the number of bit errors in a page exceeds the ECC correction capability, that is, more than 8 bits for BCH-8 or 16 bits for BCH-16, the block is marked as failing.
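The three failure checks reduce to two bit tests and one threshold comparison. In the sketch below, the bit positions chosen for `P_FAIL` and `E_FAIL` are hypothetical (the actual layout depends on the NAND controller), and the function names are illustrative; only the BCH-8 and BCH-16 correction limits come from the text.

```python
# Hypothetical status-register bit masks; real positions are controller-specific.
P_FAIL = 0x01
E_FAIL = 0x02

# Correctable bit errors per ECC scheme, as stated above.
ECC_STRENGTH = {"BCH-8": 8, "BCH-16": 16}


def program_failed(status: int) -> bool:
    """True if the P-FAIL bit is set after a program operation."""
    return bool(status & P_FAIL)


def erase_failed(status: int) -> bool:
    """True if the E-FAIL bit is set after an erase operation."""
    return bool(status & E_FAIL)


def ecc_uncorrectable(bit_errors: int, scheme: str) -> bool:
    """True if the error count exceeds what the ECC scheme can correct."""
    return bit_errors > ECC_STRENGTH[scheme]
```

Note that an error count exactly equal to the ECC strength is still correctable; only counts above it trigger marking in this model.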
- The bad block marking procedure executes the following steps:
- The UBI layer updates its data structures:
- The bad PEB is removed from the free tree or used tree, depending on where it was located.
- The bad PEB is added to the erroneous tree to prevent future allocation.
- If the bad PEB was storing volume data, the LEB-to-PEB mapping is updated to invalidate affected logical erase blocks.
- Subsequent operations automatically avoid the newly marked block, as it now resides in the erroneous tree and has status 01b in the BBT.
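The bookkeeping for marking a block bad can be sketched as follows. This is a minimal model under stated assumptions: sets stand in for the free, used, and erroneous trees, a dictionary stands in for the LEB-to-PEB mapping, the BBT is a packed `bytearray` with four 2-bit entries per byte, and `mark_peb_bad` is a hypothetical helper rather than a kernel function.

```python
def mark_peb_bad(peb, free, used, erroneous, leb_map, bbt):
    """Apply the bad-block marking steps to in-memory structures."""
    # Remove the PEB from whichever pool currently holds it.
    free.discard(peb)
    used.discard(peb)
    # Park it in the erroneous set so it is never allocated again.
    erroneous.add(peb)
    # Invalidate any logical erase blocks that pointed at this PEB.
    for leb in [l for l, p in leb_map.items() if p == peb]:
        del leb_map[leb]
    # Record status 01b in the packed BBT entry for this PEB.
    byte_offset = peb // 4
    bit_position = (peb % 4) * 2
    cleared = bbt[byte_offset] & ~(0x03 << bit_position) & 0xFF
    bbt[byte_offset] = cleared | (0x01 << bit_position)


free, used, erroneous = {0, 1}, {2, 3}, set()
leb_map = {10: 2, 11: 3}   # LEB number -> PEB number
bbt = bytearray(1)         # four entries, all 00b initially
mark_peb_bad(2, free, used, erroneous, leb_map, bbt)
```

After the call, PEB 2 is gone from the used set, LEB 10 no longer has a mapping, and the BBT entry for PEB 2 reads 01b.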