Checkpointing is a neat technique that can be used in several
interesting ways:
- Distributed checkpointing manages a set of independent process
checkpointers to create consistent sets of checkpoints for a large
distributed system. The problem is well represented in the literature.
- Process migration, in its simplest form, is checkpointing: Take
a checkpoint of a process on machine A, kill the process, move the
checkpoint image to machine B, and restore the checkpoint image into
a process on that machine.
- Systems like Smalltalk and Self use what amounts to a checkpoint
to provide orthogonal persistence in a rich programming environment.
Here the checkpoint is not used for fault tolerance, but as an
explicit persistence mechanism.
- Imagine if your debugger had a "step back one instruction" command,
so that you could figure out what course of events led the program
to a given confused state. That could be implemented by keeping track
of how many instructions have been executed (call it N), then
re-executing the program from the start for N-1 instructions.
Checkpointing makes that practical by bounding how much has to be
re-executed: the debugger restores the most recent checkpoint and
replays only from there, so the user experiences minimal latency.
- Programs like TeX and Emacs read and process a large
body of standard system libraries at startup, causing an
annoying delay with each invocation. This delay is reduced
using a variation on checkpointing. At installation time,
the program starts up, does the initial processing, then takes
a checkpoint. That checkpoint is turned into an executable file
that begins by recovering the checkpoint. Users run that executable
instead of the original program, and so avoid the long startup delay.
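
The "step back one instruction" idea from the debugger item above can
be sketched in a few lines. This is only an illustration under strong
assumptions: the program is a deterministic toy machine, a checkpoint
is a deep copy of its state, and a checkpoint is taken every fixed
number of instructions. The names ToyMachine, ReversibleDebugger, and
interval are hypothetical and do not come from any real debugger.

```python
import copy


class ToyMachine:
    """A deterministic toy 'program' whose whole state is two integers."""

    def __init__(self):
        self.pc = 0    # number of instructions executed so far
        self.acc = 1   # the only piece of program state

    def step(self):
        # One deterministic "instruction".
        self.acc = (self.acc * 3 + self.pc) % 1009
        self.pc += 1


class ReversibleDebugger:
    """Hypothetical debugger that checkpoints every `interval` instructions."""

    def __init__(self, interval=100):
        self.interval = interval
        self.machine = ToyMachine()
        # Checkpoints keyed by instruction count; the initial state is one too.
        self.checkpoints = {0: copy.deepcopy(self.machine)}
        self.replayed = 0  # instructions re-executed by the last step_back

    def step(self):
        self.machine.step()
        if self.machine.pc % self.interval == 0:
            self.checkpoints[self.machine.pc] = copy.deepcopy(self.machine)

    def step_back(self):
        target = self.machine.pc - 1
        if target < 0:
            raise ValueError("already at the start of the program")
        # Restore the latest checkpoint at or before the target ...
        base = (target // self.interval) * self.interval
        self.machine = copy.deepcopy(self.checkpoints[base])
        # ... then re-execute forward; this is always fewer than
        # `interval` instructions, however long the program has run.
        self.replayed = target - base
        for _ in range(self.replayed):
            self.machine.step()


if __name__ == "__main__":
    dbg = ReversibleDebugger(interval=100)
    for _ in range(250):
        dbg.step()
    dbg.step_back()
    print(dbg.machine.pc, dbg.replayed)
```

Without checkpoints, stepping back from instruction N means replaying
N-1 instructions from the start. With them, the replay is limited to
less than one checkpoint interval, at the cost of the memory needed
to hold the saved states.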