Checkpointing#

Checkpointing adds restart and rollback capabilities to ASE scripts. It stores the current state of the simulation (and its history) in an ase.db database. A pattern like the following appears in many ASE scripts:

import os
import ase.io
import ase.optimize

if os.path.exists('atoms_after_relax.traj'):
    a = ase.io.read('atoms_after_relax.traj')
else:
    ase.optimize.FIRE(a).run(fmax=0.01)
    ase.io.write('atoms_after_relax.traj', a)

The idea behind checkpointing is to replace this manual checkpointing capability with a unified infrastructure.

Manual checkpointing#

The class Checkpoint takes care of storing and retrieving information from the database. This information always includes an Atoms object, and it can include attached information on the internal state of the script.

class ase.calculators.checkpoint.Checkpoint(db='checkpoints.db', logfile=None)#

load(atoms=None)#

Retrieve checkpoint data from file. If an atoms object is specified, the calculator attached to that object is copied to all returned atoms objects.

Returns a tuple of the values that were passed to flush or save when the checkpoint was written.

flush(*args, **kwargs)#: Store data to a checkpoint without increasing the checkpoint id. This is useful to continuously update the checkpoint state in an iterative loop.

save(*args, **kwargs)#: Store data to a checkpoint and increase the checkpoint id. This closes the checkpoint.

In order to use checkpointing, first create a Checkpoint object:

from ase.calculators.checkpoint import Checkpoint, NoCheckpoint
CP = Checkpoint()

You can optionally choose a database filename. The default is checkpoints.db.

Code blocks are wrapped into checkpointed regions:

try:
    a = CP.load()
except NoCheckpoint:
    ase.optimize.FIRE(a).run(fmax=0.01)
    CP.save(a)

The code block in the except clause is executed only if it has not yet been executed in a previous run of the script. The save() statement stores all of its arguments in the database.
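The control flow of a checkpointed region can be illustrated without ASE. The following is a minimal sketch, not ASE's implementation: ToyCheckpoint and its dictionary store are hypothetical stand-ins for Checkpoint and the database file, run_script is a hypothetical helper that mimics one run of a script, and a list of floats stands in for the Atoms object. It assumes that regions are identified by a counter that advances in script order, and it ignores nested regions.

```python
class NoCheckpoint(Exception):
    """Raised when no stored data exists for the current region."""


class ToyCheckpoint:
    """Hypothetical in-memory stand-in for Checkpoint (not the ASE class)."""

    def __init__(self, store):
        self.store = store    # plays the role of the database file
        self.region_id = 0    # counts checkpointed regions in script order

    def load(self):
        # Enter the next region and look for data from an earlier run.
        self.region_id += 1
        if self.region_id not in self.store:
            raise NoCheckpoint
        return self.store[self.region_id]

    def save(self, value):
        # Store the result of the region that load() just entered.
        self.store[self.region_id] = value


def run_script(store, log):
    """One 'run' of a script that contains a single checkpointed region."""
    cp = ToyCheckpoint(store)
    a = [0.3, 1.7]                     # stand-in for an Atoms object
    try:
        a = cp.load()
    except NoCheckpoint:
        a = [round(x) for x in a]      # stand-in for the relaxation
        log.append('relaxed')
        cp.save(a)
    return a


store, log = {}, []
first = run_script(store, log)     # region executes and its result is saved
second = run_script(store, log)    # region is skipped, the result is loaded
print(first, second, log)
```

Both runs return the same data, but the stand-in for the relaxation is executed only during the first one.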

This is not yet much shorter than the above example. The checkpointing object can, however, store arbitrary information along with the Atoms object. Imagine we have computed elastic constants and don’t want to recompute them. We can then use:

try:
    a, C = CP.load()
except NoCheckpoint:
    C = fit_elastic_constants(a)
    CP.save(a, C)

Note that one argument to save() needs to be an Atoms object; the others can be arbitrary. The load() statement returns these values in the order in which they were passed to save(). In the above example, the elastic constants are stored attached to the atomic configuration. If the script is executed again after the elastic constants have already been computed, it will skip that computation and just use the stored value.

If the checkpointed region contains a single statement, such as the above, there is a shorthand notation available:

C = CP(fit_elastic_constants)(a)
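One way to read the shorthand is as a wrapper that performs the load/except/save sequence around a single function call. The sketch below shows that idea under stated assumptions: ToyCheckpoint, its dictionary store, and fit_constants are hypothetical stand-ins, not the ASE classes or a real fitting routine, and the actual Checkpoint may differ in detail (for example in how it handles Atoms objects and tuple return values).

```python
import functools


class NoCheckpoint(Exception):
    """Raised when no stored data exists for the current region."""


class ToyCheckpoint:
    """Hypothetical in-memory stand-in for Checkpoint (not the ASE class)."""

    def __init__(self, store):
        self.store = store
        self.region_id = 0

    def load(self):
        self.region_id += 1
        if self.region_id not in self.store:
            raise NoCheckpoint
        return self.store[self.region_id]

    def save(self, value):
        self.store[self.region_id] = value

    def __call__(self, func):
        # Wrap func so that a single call becomes one checkpointed region.
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            try:
                return self.load()
            except NoCheckpoint:
                result = func(*args, **kwargs)
                self.save(result)
                return result
        return wrapped


calls = []


def fit_constants(values):
    """Hypothetical stand-in for an expensive fit: returns the mean."""
    calls.append(1)
    return sum(values) / len(values)


store = {}
# Each new ToyCheckpoint mimics a fresh run of the script.
c1 = ToyCheckpoint(store)(fit_constants)([1.0, 2.0, 3.0])   # computed
c2 = ToyCheckpoint(store)(fit_constants)([1.0, 2.0, 3.0])   # loaded
print(c1, c2, len(calls))
```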

Sometimes it is necessary to checkpoint an iterative loop. If the script terminates within that loop, it is useful to resume calculation from the same loop position:

try:
    a, converged, tip_x, tip_y = CP.load()
except NoCheckpoint:
    converged = False
    tip_x = tip_x0
    tip_y = tip_y0
while not converged:
    # ... do something to find better crack tip position ...
    converged = ...
    CP.flush(a, converged, tip_x, tip_y)

The above code block is an example of an iterative search for a crack tip position. Note that the convergence flag needs to be stored in the database so that the loop is not executed again once convergence has been reached. The flush() statement overwrites the last value stored in the database.

As a rule, save() has to be used inside an except NoCheckpoint clause and flush() outside of it.
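The resume behaviour of a flushed loop can be traced with a toy model. This is a minimal sketch under stated assumptions, not ASE's implementation: ToyCheckpoint and its dictionary store are hypothetical stand-ins for Checkpoint and the database, the halving iteration stands in for the crack tip search, and the max_steps argument exists only to mimic a run that is interrupted inside the loop.

```python
class NoCheckpoint(Exception):
    """Raised when no stored data exists for the current region."""


class ToyCheckpoint:
    """Hypothetical in-memory stand-in for Checkpoint (not the ASE class)."""

    def __init__(self, store):
        self.store = store
        self.region_id = 0

    def load(self):
        self.region_id += 1
        if self.region_id not in self.store:
            raise NoCheckpoint
        return self.store[self.region_id]

    def flush(self, *values):
        # Overwrite the entry of the current region; the id is unchanged.
        self.store[self.region_id] = values


def search(store, max_steps):
    """Halve x until x < 1; return the number of iterations performed."""
    cp = ToyCheckpoint(store)
    try:
        converged, x = cp.load()
    except NoCheckpoint:
        converged, x = False, 16.0
    steps = 0
    while not converged and steps < max_steps:
        x = x / 2
        converged = x < 1.0
        steps += 1
        cp.flush(converged, x)     # outside the except clause
    return steps


store = {}
steps_first = search(store, max_steps=2)      # interrupted: 16 -> 8 -> 4
steps_second = search(store, max_steps=100)   # resumes: 4 -> 2 -> 1 -> 0.5
steps_third = search(store, max_steps=100)    # converged flag loaded, loop skipped
print(steps_first, steps_second, steps_third)
```

Because the convergence flag is flushed together with the loop state, the third run neither restarts the search nor repeats any iteration.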

Automatic checkpointing with the checkpoint calculator#

The CheckpointCalculator is a shorthand for wrapping every single energy/force evaluation in a checkpointed region. It wraps the actual calculator.

class ase.calculators.checkpoint.CheckpointCalculator(calculator, db='checkpoints.db', logfile=None)#

This wraps any calculator object to checkpoint whenever a calculation is performed.

This is particularly useful for expensive calculators (e.g. DFT) and allows the use of complex workflows.

Example usage:

calc = ...
cp_calc = CheckpointCalculator(calc)
atoms.calc = cp_calc
e = atoms.get_potential_energy()
# 1st time, does calc, writes to checkfile
# subsequent runs, reads from checkpoint file

Basic calculator implementation.

restart: str: Prefix for restart file. May contain a directory. Default is None: don’t restart.
ignore_bad_restart_file: bool: Ignore broken or missing restart file. By default, it is an error if the restart file is missing or broken. Deprecated, please do not use: passing more than one positional argument to Calculator() is deprecated and will stop working in the future.
directory: str or PurePath: Working directory in which to read and write files and perform calculations.
label: str: Name used for all files. Not supported by all calculators. May contain a directory, but please use the directory parameter for that instead.
atoms: Atoms object: Optional Atoms object to which the calculator will be attached. When restarting, atoms will get its positions and unit-cell updated from file.

implemented_properties: List[str] = ['energy', 'forces', 'stress', 'stresses', 'dipole', 'charges', 'magmom', 'magmoms', 'free_energy', 'energies', 'dielectric_tensor', 'born_effective_charges', 'polarization']#: Properties calculator can handle (energy, forces, …)

default_parameters: Dict[str, Any] = {}#: Default parameters

calculate(atoms, properties, system_changes)#

Do the calculation.

properties: list of str: List of what needs to be calculated. Can be any combination of ‘energy’, ‘forces’, ‘stress’, ‘dipole’, ‘charges’, ‘magmom’ and ‘magmoms’.
system_changes: list of str: List of what has changed since last calculation. Can be any combination of these six: ‘positions’, ‘numbers’, ‘cell’, ‘pbc’, ‘initial_charges’ and ‘initial_magmoms’.

Subclasses need to implement this, but can ignore properties and system_changes if they want. Calculated properties should be inserted into the results dictionary, as shown in this dummy example:

self.results = {'energy': 0.0,
                'forces': np.zeros((len(atoms), 3)),
                'stress': np.zeros(6),
                'dipole': np.zeros(3),
                'charges': np.zeros(len(atoms)),
                'magmom': 0.0,
                'magmoms': np.zeros(len(atoms))}

The subclass implementation should first call this implementation to set the atoms attribute and create any missing directories.

Example usage:

calc = ...
cp_calc = CheckpointCalculator(calc)
atoms.calc = cp_calc
e = atoms.get_potential_energy()

The first call to get_potential_energy() performs the actual calculation; a rerun of the script loads the energies and forces from the database. Note that this is useful for calculations where each energy evaluation is slow (e.g. DFT), but it is not recommended for molecular dynamics with classical potentials, since every single time step is written to the database and the resulting files become huge.
