So the question remains: what can we do to improve the performance of the checkpointer?

To reduce the latency, we shouldn't copy the bytes around, even in RAM, with write() system calls. Instead, we should exploit the handy copy-on-write semantics of fork(). The parent process can immediately resume execution, while the child process slowly writes its consistent snapshot of the parent's memory image to disk.
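
As a rough illustration of the fork() idea, here is a minimal sketch in C. The function name checkpoint_async() and its arguments are hypothetical, not part of the checkpointer discussed here, and a real checkpointer would save the whole address space rather than one buffer. The child sees the memory as it was at the moment of the fork(), so the parent is free to keep modifying it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write buf[0..len) to path from a forked child.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t checkpoint_async(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    pid_t pid = fork();
    if (pid != 0)
        return pid;              /* parent (or fork failure): resume at once */

    /* Child: its pages are a copy-on-write snapshot taken at fork(). */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        _exit(1);

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;            /* interrupted before writing: retry */
        if (n <= 0)
            _exit(1);
        p += n;
        len -= (size_t)n;
    }
    /* Force the data to stable storage before reporting success. */
    if (fsync(fd) != 0 || close(fd) != 0)
        _exit(1);
    _exit(0);
}
```

The parent still has to call waitpid() on the returned pid at some point, both to reap the child and to learn from its exit status whether the snapshot actually reached the disk.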

Some processes, however, will need to know that at the end of an establishCheckpoint() call, the data is actually committed to nonvolatile storage. In this case, the only way around the disk bandwidth problem is

  1. to buy faster disks, or
  2. to reduce the volume of data written to disk.
To achieve the latter, we can use incremental checkpointing. Starting with the second checkpoint, we only write out memory pages that have changed since the previous checkpoint. We can discover those pages by write-protecting memory with the user-level virtual memory primitive mprotect() and catching the resulting faults in a SIGSEGV signal handler.
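
The dirty-page discovery could look roughly like the following C sketch. It assumes Linux, a single thread, and one mmap()ed region owned by the tracker; the names tracker_init(), tracker_arm(), dirty[] and MAX_PAGES are made up for this example. After each checkpoint the region is made read-only, and the first write to any page faults into the handler, which records the page and makes it writable again.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 1024                 /* arbitrary limit for this sketch */

static char *region;                   /* memory being checkpointed */
static size_t page_size, n_pages;
static volatile unsigned char dirty[MAX_PAGES];

/* SIGSEGV handler: mark the faulting page dirty and unprotect it. */
static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)region;

    if (addr < base || addr >= base + n_pages * page_size) {
        /* Not our region: a real crash. Restore the default action;
         * returning re-runs the faulting instruction, which then kills us. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    size_t page = (addr - base) / page_size;
    dirty[page] = 1;
    /* mprotect() is not on POSIX's async-signal-safe list, but calling it
     * here is the conventional way to implement this technique. */
    mprotect(region + page * page_size, page_size, PROT_READ | PROT_WRITE);
}

/* Allocate the tracked region and install the handler. 0 on success. */
int tracker_init(size_t pages)
{
    assert(pages > 0);
    if (pages > MAX_PAGES)
        return -1;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    n_pages = pages;
    region = mmap(NULL, pages * page_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    struct sigaction sa = {0};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Call right after a checkpoint: forget old dirty bits, write-protect all. */
int tracker_arm(void)
{
    for (size_t i = 0; i < n_pages; i++)
        dirty[i] = 0;
    return mprotect(region, n_pages * page_size, PROT_READ);
}
```

At the next checkpoint, only the pages whose dirty[] entry is set need to be written out, after which tracker_arm() starts a new interval. The price is one fault per page per interval, paid on the first write to that page.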