So how does our checkpointer perform? There are two measures of interest: the overhead it adds during normal process execution, and the latency it introduces when establishing a checkpoint. Runtime overhead is negligible. Between checkpoints, the only work the checkpointer does is interpose on a couple of infrequently used system calls.

At checkpoint time, Icee saves some kernel state, opens the checkpoint image file, writes the process memory map to it, and then fsyncs the file to disk before closing it. The establishCheckpoint() call blocks all threads and does not return until the checkpoint is committed to disk. The latencies in the graph below therefore cover this entire sequence.
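The commit sequence can be sketched in Python using the POSIX wrappers in the standard `os` module. This is a minimal sketch, not Icee's code. The function name `write_checkpoint_image` and its arguments are hypothetical, as is the representation of kernel state and memory regions as plain byte strings. The sketch also does not model suspending the process's threads. What it does show is why the call cannot return early: `os.fsync` blocks until the kernel has flushed the written data to the storage device.

```python
import os
import tempfile


def write_checkpoint_image(path, kernel_state, memory_segments):
    """Hypothetical sketch: write a checkpoint image and return only once it is durable.

    kernel_state: bytes standing in for the saved kernel state (assumption).
    memory_segments: iterable of bytes standing in for mapped memory regions (assumption).
    Returns the total number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    written = 0
    try:
        for chunk in [kernel_state, *memory_segments]:
            view = memoryview(chunk)
            while view:  # os.write may write fewer bytes than requested
                n = os.write(fd, view)
                view = view[n:]
                written += n
        os.fsync(fd)  # block until the image has been flushed to the disk
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint.img")
        print(write_checkpoint_image(path, b"hdr", [b"a" * 4096, b"b" * 1024]))
```

Because the fsync sits on the critical path, the time this function takes grows with the size of the image, which is the behavior the measurements reflect.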

Not surprisingly, latency is dominated by disk bandwidth, since large image files must be written out. (The smallest JVM process image is about five megabytes.) On our 6.6MB/sec IDE disks, the latency curve had a slope corresponding to a throughput of 4.6MB/sec, roughly 70% of the raw disk bandwidth.
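A throughput figure like this can be read off the slope of a straight-line fit of latency against image size. The slope is in seconds per megabyte, and its reciprocal is the effective throughput. The sketch below assumes a simple linear model, latency = fixed cost + size / throughput. The function names are hypothetical, and this is an illustration of the arithmetic, not the analysis used to produce the numbers quoted here.

```python
def fit_throughput(sizes_mb, latencies_s):
    """Least-squares fit of latency = intercept + size / throughput.

    Returns (throughput_mb_per_s, intercept_s). The fitted slope is in
    seconds per MB, so the throughput is its reciprocal.
    """
    n = len(sizes_mb)
    if n < 2 or n != len(latencies_s):
        raise ValueError("need at least two matched (size, latency) points")
    mean_x = sum(sizes_mb) / n
    mean_y = sum(latencies_s) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_mb)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes_mb, latencies_s))
    if sxx == 0:
        raise ValueError("image sizes must not all be equal")
    slope = sxy / sxx  # seconds per MB
    if slope <= 0:
        raise ValueError("latency does not grow with image size")
    return 1.0 / slope, mean_y - slope * mean_x


def predicted_latency_s(image_mb, throughput_mb_per_s, intercept_s=0.0):
    """Predicted checkpoint latency under the assumed linear model."""
    return intercept_s + image_mb / throughput_mb_per_s
```

With the 4.6MB/sec slope and no fixed cost, `predicted_latency_s(5, 4.6)` gives about 1.09 seconds for a 5MB image, so even the smallest JVM image costs on the order of a second to checkpoint on these disks.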