Checkpointing

Checkpointing adds restart and rollback capabilities to ASE scripts. It stores the current state of the simulation (and its history) in an ase.db database. Many ASE scripts contain something like the following:

import os
import ase.io
import ase.optimize

if os.path.exists('atoms_after_relax.traj'):
    a = ase.io.read('atoms_after_relax.traj')
else:
    ase.optimize.FIRE(a).run(fmax=0.01)
    ase.io.write('atoms_after_relax.traj', a)

The idea behind checkpointing is to replace this manual checkpointing capability with a unified infrastructure.

Manual checkpointing

The class Checkpoint takes care of storing and retrieving information from the database. This information always includes an Atoms object, and it can include attached information on the internal state of the script.

class ase.calculators.checkpoint.Checkpoint(db='checkpoints.db', logfile=None)
load(atoms=None)

Retrieve checkpoint data from file. If an atoms object is specified, the calculator attached to that object is copied to all returned Atoms objects.

Returns the values in the order they were passed to flush or save when the checkpoint was written.

flush(*args, **kwargs)

Store data to a checkpoint without increasing the checkpoint id. This is useful to continuously update the checkpoint state in an iterative loop.

save(*args, **kwargs)

Store data to a checkpoint and increase the checkpoint id. This closes the checkpoint.

In order to use checkpointing, first create a Checkpoint object:

from ase.calculators.checkpoint import Checkpoint, NoCheckpoint
CP = Checkpoint()

You can optionally pass a database filename; the default is checkpoints.db.

Code blocks are wrapped into checkpointed regions:

try:
    a = CP.load()
except NoCheckpoint:
    ase.optimize.FIRE(a).run(fmax=0.01)
    CP.save(a)

The code block in the except statement is executed only if it has not yet been executed in a previous run of the script. The save() statement stores all of its parameters to the database.

This is not yet much shorter than the above example. The checkpointing object can, however, store arbitrary information alongside the Atoms object. Imagine we have computed elastic constants and don’t want to recompute them. We can then use:

try:
    a, C = CP.load()
except NoCheckpoint:
    C = fit_elastic_constants(a)
    CP.save(a, C)

Note that one argument to save() needs to be an Atoms object; the others can be arbitrary. The load() statement returns these values in the order in which they were passed to save(). In the above example, the elastic constants are stored attached to the atomic configuration. If the script is executed again after the elastic constants have already been computed, it will skip that computation and just use the stored value.

If the checkpointed region contains a single statement, such as the above, there is a shorthand notation available:

C = CP(fit_elastic_constants)(a)

Sometimes it is necessary to checkpoint an iterative loop. If the script terminates within that loop, it is useful to resume calculation from the same loop position:

try:
    a, converged, tip_x, tip_y = CP.load()
except NoCheckpoint:
    converged = False
    tip_x = tip_x0
    tip_y = tip_y0
while not converged:
    # ... do something to find better crack tip position ...
    converged = ...
    CP.flush(a, converged, tip_x, tip_y)

The above code block is an example of an iterative search for a crack tip position. Note that the convergence flag needs to be stored to the database so that the loop is not executed again once convergence has been reached. The flush() statement overwrites the last value stored to the database.

As a rule, save() has to be used inside an except NoCheckpoint statement and flush() outside.
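The resume logic of such a loop can be sketched without ASE. The example below is not the Checkpoint implementation: it uses a plain JSON file as a stand-in for the database, and flush_state, search and the halving iteration are hypothetical names and choices made only to show why the convergence flag has to be part of the stored state.

```python
import json
import os
import tempfile


def flush_state(path, state):
    """Overwrite the single stored state (plays the role of flush())."""
    tmp_path = path + '.tmp'
    with open(tmp_path, 'w') as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # swap in the new state in one step


def search(path, max_steps=None):
    """Halve x until it drops below 1e-3; returns (state, steps_this_run)."""
    try:
        with open(path) as f:
            state = json.load(f)          # resume from the stored state
    except FileNotFoundError:             # plays the role of NoCheckpoint
        state = {'converged': False, 'x': 1.0, 'steps': 0}

    steps_this_run = 0
    while not state['converged']:
        if max_steps is not None and steps_this_run >= max_steps:
            break                         # simulate an interrupted run
        state['x'] *= 0.5
        state['steps'] += 1
        state['converged'] = state['x'] < 1e-3
        flush_state(path, state)          # overwrite after every iteration
        steps_this_run += 1
    return state, steps_this_run


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'state.json')
    _, n_first = search(path, max_steps=3)  # interrupted after 3 iterations
    final, n_second = search(path)          # resumes with the 4th iteration
    _, n_third = search(path)               # already converged, loop skipped
```

Because the flag is stored with the rest of the state, the third call loads a converged state and never enters the loop.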

Automatic checkpointing with the checkpoint calculator

The CheckpointCalculator is a shorthand for wrapping every single energy/force evaluation in a checkpointed region. It wraps the actual calculator.

class ase.calculators.checkpoint.CheckpointCalculator(calculator, db='checkpoints.db', logfile=None)

This wraps any calculator object to checkpoint whenever a calculation is performed.

This is particularly useful for expensive calculators, e.g. DFT, and allows the use of complex workflows.

Example usage:

calc = ...
cp_calc = CheckpointCalculator(calc)
atoms.calc = cp_calc
e = atoms.get_potential_energy()

The first call to get_potential_energy() performs the actual calculation; a rerun of the script loads energies and forces from the database. This is useful for calculations where each energy evaluation is slow (e.g. DFT), but it is not recommended for molecular dynamics with classical potentials: every single time step would be written to the database, which generates huge files.