So the question remains: what can we do to improve the performance of
the checkpointer?
To reduce the latency, we shouldn't copy the bytes around, even in RAM, with write() system calls. Instead, we should exploit the handy copy-on-write semantics of fork(). The parent process can immediately resume execution, while the child process slowly writes its consistent snapshop of the parent's memory image to disk. Some processes, however, will need to know that at the end of an establishCheckpoint() call, the data is actually committed to nonvolatile storage. In this case, the only way around the disk bandwidth problem is
|