Power-safe file systems for embedded devices rescue hard drive designs
The ability to store and manage large amounts of data has become a critical requirement for automotive infotainment systems, medical devices, industrial controllers, portable media players and a variety of other embedded systems.
In response, many embedded system developers are opting to use hard drives in their designs. Hard drives may be less robust and less energy efficient than solid-state NAND flash devices, but they offer two compelling benefits for storage-hungry applications: greater capacity and lower price per bit.
The problem is that embedded systems often operate in environments where power failures and other unexpected shutdowns can occur. Traditional block-based file systems for hard drives and solid-state drives (SSDs) were never designed to ensure file integrity in the event of such failures.
Figure 1: The file system before data is modified.
Unfortunately, embedded developers can rarely adopt the approaches to safeguarding hard-drive data used in the corporate IT world, such as replicating data in multiple locations, making frequent backups, or using an uninterruptible power supply (UPS). Nor can they use “server grade” file systems that provide a high level of protection, but consume too many system resources for embedded use.
It’s critical, then, that disk file systems for embedded systems prevent data corruption from occurring in the first place. They must also eliminate the time-consuming integrity checks typically required after a power failure since most embedded systems must be fully operational immediately after rebooting.
Figure 2: The file system after data has been modified.
Many disk file systems are reliable, but they can still lose data when a power failure occurs. For instance, when a hard drive loses power, it retracts the heads to prevent them from crashing into the disk surface. If the file system driver is writing to the hard drive when this retraction occurs, the write operation will be incomplete. The error-correction code (ECC) for the sector being written will no longer match the sector's contents, and the data in that sector will be lost.
When files or directories become corrupted, the traditional solution is to use a check-and-repair utility. Most such utilities have several limitations: they check only the file system structure and the metadata, not the file data; they are time consuming; and they can be used only when the file system isn’t in service, typically just after boot time.
To avoid these problems, a power-safe file system for embedded devices can use copy-on-write technology to protect both metadata and user data. In this approach, the file system never overwrites live data. Rather, it constructs a new view of the file system in unused blocks on the disk. The new view becomes “live” only after all the necessary updates have been safely written to disk.
Whenever user data is modified, a power-safe file system can follow these steps:
1. Write the new data to one or more unused blocks, leaving the original data unchanged.
2. Copy the existing list of indirect block pointers, then modify the copy to refer to the newly used blocks.
3. Copy the inode, which stores basic information about files and directories, then update the copy to refer to the new indirect block pointers.
When the operation is complete, the original data and the pointers to that data remain intact, but a new inode, set of blocks, and indirect pointers for the modified data now exist. Figure 1 shows the file system before a write operation and Figure 2 shows the file system after the operation is complete (updated data displayed in red).
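The three steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation of any particular file system: the in-memory `Disk` class, the dictionary-based inode, and the names `cow_write` and `read_file` are all assumptions made for the example.

```python
class Disk:
    """Toy in-memory disk (hypothetical): block number -> contents."""

    def __init__(self):
        self.blocks = {}
        self.next_free = 0

    def alloc(self, contents):
        """Write contents into a previously unused block and return its number."""
        num = self.next_free
        self.next_free += 1
        self.blocks[num] = contents
        return num


def cow_write(disk, inode_num, index, new_data):
    """Return the number of a new inode in which data block `index` holds new_data."""
    old_inode = disk.blocks[inode_num]
    old_pointers = disk.blocks[old_inode["indirect"]]

    # Step 1: write the new data to an unused block; the original stays untouched.
    data_num = disk.alloc(new_data)

    # Step 2: copy the indirect pointer list, then repoint the copy.
    new_pointers = list(old_pointers)
    new_pointers[index] = data_num
    pointers_num = disk.alloc(new_pointers)

    # Step 3: copy the inode, then repoint the copy at the new pointer list.
    new_inode = dict(old_inode)
    new_inode["indirect"] = pointers_num
    return disk.alloc(new_inode)


def read_file(disk, inode_num):
    """Follow inode -> indirect pointers -> data blocks."""
    inode = disk.blocks[inode_num]
    return [disk.blocks[n] for n in disk.blocks[inode["indirect"]]]


# Build a two-block file, then modify its second block.
disk = Disk()
a = disk.alloc(b"old-A")
b = disk.alloc(b"old-B")
ptrs = disk.alloc([a, b])
old_inode = disk.alloc({"name": "log.dat", "indirect": ptrs})
new_inode = cow_write(disk, old_inode, 1, b"new-B")
```

After the call, reading through `old_inode` still returns the original contents, while reading through `new_inode` returns the modified file. Unmodified data blocks are shared between the two views, so only the changed block and the pointers leading to it are duplicated.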
To maintain high performance, the file system can group multiple write operations together, eliminating the need to perform the above procedure every time a file is modified. That way, when a “commit to disk” operation occurs, the file system sends many changes to the disk at once. Ideally, the file system will allow the developer to fine-tune how often updates occur.
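One way to picture this grouping is a small batcher that holds changes in memory and commits them together once a count or age threshold is reached. The sketch below is an assumption-laden illustration: the class name `CommitBatcher`, the `max_pending` and `max_delay_s` tuning knobs, and the caller-supplied timestamps are hypothetical and do not correspond to any specific file system's interface.

```python
class CommitBatcher:
    """Collects changes and hands them to commit_fn in groups (hypothetical sketch)."""

    def __init__(self, commit_fn, max_pending=64, max_delay_s=2.0):
        self.commit_fn = commit_fn        # called with a list of changes
        self.max_pending = max_pending    # commit after this many changes...
        self.max_delay_s = max_delay_s    # ...or once the oldest is this old
        self.pending = []
        self.oldest = None

    def write(self, change, now):
        """Queue a change; `now` is a timestamp in seconds supplied by the caller."""
        if not self.pending:
            self.oldest = now
        self.pending.append(change)
        too_many = len(self.pending) >= self.max_pending
        too_old = now - self.oldest >= self.max_delay_s
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Commit everything queued so far as one group."""
        if self.pending:
            self.commit_fn(list(self.pending))
            self.pending.clear()
            self.oldest = None


commits = []
batcher = CommitBatcher(commits.append, max_pending=3, max_delay_s=2.0)
for t, change in [(0.0, "a"), (0.1, "b"), (0.2, "c"), (1.0, "d"), (3.5, "e")]:
    batcher.write(change, now=t)
```

Here the first three changes are committed together because the count limit is reached, and the last two are committed together because the older of them has waited longer than the delay limit. A shorter delay narrows the window of uncommitted work at the cost of more disk activity.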
A power-safe file system can use the concept of “superblocks”: global root blocks that contain the inodes for the system's bitmap and inode files. Specifically, it can maintain a stable superblock that reflects the original version of all the blocks and a working superblock that reflects the modified data. If a power failure occurs, the system can restore the last stable file-system state by reading the superblocks from disk, validating their signatures and CRCs, and picking the valid superblock with the highest sequence number.
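The recovery rule described above (validate, then take the highest sequence number) can be sketched as follows. The on-disk layout here is invented for illustration: the `PSFS` signature, the field order, and the use of CRC-32 from Python's `zlib` module are assumptions, not the format of any real file system.

```python
import struct
import zlib

MAGIC = b"PSFS"                    # hypothetical signature
BODY = struct.Struct("<4sQI")      # signature, sequence number, root block
CRC = struct.Struct("<I")          # CRC-32 of the body


def pack_superblock(sequence, root_block):
    """Serialize a superblock and append a CRC over its body."""
    body = BODY.pack(MAGIC, sequence, root_block)
    return body + CRC.pack(zlib.crc32(body))


def parse_superblock(raw):
    """Return (sequence, root_block), or None if the copy fails validation."""
    if len(raw) != BODY.size + CRC.size:
        return None
    body = raw[:BODY.size]
    (stored_crc,) = CRC.unpack(raw[BODY.size:])
    magic, sequence, root_block = BODY.unpack(body)
    if magic != MAGIC or zlib.crc32(body) != stored_crc:
        return None
    return sequence, root_block


def pick_superblock(copies):
    """Among the valid copies, return the one with the highest sequence number."""
    valid = [sb for sb in map(parse_superblock, copies) if sb is not None]
    return max(valid, default=None)


stable = pack_superblock(41, 100)
working = pack_superblock(42, 200)
# Simulate a write torn by power loss: the last byte never reached the disk intact.
torn = working[:-1] + bytes([working[-1] ^ 0xFF])
```

With these values, `pick_superblock([stable, working])` selects the newer superblock, while `pick_superblock([stable, torn])` falls back to the stable one because the torn copy fails its CRC check. No scan of the rest of the disk is involved in either case.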
This approach eliminates the need for a time-consuming integrity check, which conventional file systems must perform after an unexpected shutdown. The time it takes to mount the file system is simply the time it takes to read a couple of blocks.
The biggest challenge to maintaining file system integrity on hard drives is avoiding the data corruption caused by power failures. By preventing this corruption, a power-safe file system can make hard drives, with their large capacities and low price per bit, a viable option for many embedded systems.
Paul Leroux is with QNX Software Systems of Ottawa.