Phoney's Space: Redesigning chkdsk and the new NTFS health model

The previous chkdsk and NTFS health model

While exceedingly rare, there are a variety of unique causes for disk corruption today. Whether they are caused by media errors from the hard disk or transient memory errors, corruptions can happen in file system metadata (the information used to map physical blocks to that vacation photo you took last year). To maintain access to your data, Windows must isolate and correct these errors, and the way to do this is by running the chkdsk utility.

In past versions, NTFS implemented a simpler health model, where the file system volume was either healthy or not. In that model, the volume was taken offline for as long as necessary to fix the file system corruptions and bring the volume back to a healthy state.

Downtime was directly proportional to the number of files in the volume. Reliable telemetry data from systems all over the world have shown us that, although corruptions are quite rare, when chkdsk is needed, it can take between a few seconds to a few hours to run, depending on the number of files in the drive–and even longer for larger storage servers.

In Windows Vista and Windows 7, we made significant optimizations to the speed of chkdsk but, as hard disk capacities have continued to double every 18 months and the number of files per volume is increasing at an equal rate, chkdsk has taken longer and longer to complete (even with speed improvements).

So in Windows 8, we’ve changed the way we approach the health model of NTFS and changed the way we fix corruptions so as to minimize the downtime due to chkdsk. We’ve also introduced a new file system for the future, ReFS, which does not require an offline chkdsk to repair corruptions.

Source: Toms Hardware | MSDN Blogs | Storage Spaces