Genetic algorithm search for bulk crystal structures
Here we will use the GA to predict a crystal structure of a given chemical composition.
The implementation is based on:
M. Van den Bossche, H. Grönbeck, and B. HammerJ. Chem. Theory Comput. 2018, 14, 2797−2807
and has much the same functionality as e.g. the USPEX program from the Oganov group and the XtalOpt code by the Zurek group.
What sets crystal structure searches apart from other global optimization problems, is that typically the cell vectors need to be treated as additional degrees of freedom (in addition to the atomic coordinates). This results in the following ‘bulk GA’-specific functionality:
Generating random structures now also involves randomized unit cell vectors.
Cut-and-splice pairing needs to be done using scaled coordinates and includes a random combination of the two parent unit cells.
A ‘strain’ mutation applies a random deformation of the parent unit cell, similar to how the ‘rattle’ mutation acts on the atomic coordinates.
In a ‘permustrain’ mutation, atoms from different elements are swapped in addition to the ‘strain’ mutation.
A ‘soft’ mutation displaces the atoms along a low-frequency vibrational mode obtained via e.g. a simple harmonic bond model.
As an example, we will search for the most energetically stable Ag_{2}_{4} polymorphs using an EMT potential. Note that, in general, the number of atoms per unit cell should be chosen carefully, as one will only find crystal structures where the stoichiometry of the primitive cell is a divisor of the chosen stoichiometry.
Initial population
The following script (ga_bulk_start.py
) creates a gadb.db
database containing 20 randomly generated initial structures.
from ase import Atoms
from ase.data import atomic_numbers
from ase.ga.data import PrepareDB
from ase.ga.startgenerator import StartGenerator
from ase.ga.utilities import CellBounds, closest_distances_generator
# Number of random initial structures to generate
N = 20
# Target cell volume for the initial structures, in angstrom^3
volume = 240.
# Specify the 'building blocks' from which the initial structures
# will be constructed. Here we take single Ag atoms as building
# blocks, 24 in total.
blocks = [('Ag', 24)]
# We may also write:
blocks = ['Ag'] * 24
# Generate a dictionary with the closest allowed interatomic distances
Z = atomic_numbers['Ag']
blmin = closest_distances_generator(atom_numbers=[Z],
ratio_of_covalent_radii=0.5)
# Specify reasonable bounds on the minimal and maximal
# cell vector lengths (in angstrom) and angles (in degrees)
cellbounds = CellBounds(bounds={'phi': [35, 145], 'chi': [35, 145],
'psi': [35, 145], 'a': [3, 50],
'b': [3, 50], 'c': [3, 50]})
# Choose an (optional) 'cell splitting' scheme which basically
# controls the level of translational symmetry (within the unit
# cell) of the randomly generated structures. Here a 1:1 ratio
# of splitting factors 2 and 1 is used:
splits = {(2,): 1, (1,): 1}
# There will hence be a 50% probability that a candidate
# is constructed by repeating an randomly generated Ag12
# structure along a randomly chosen axis. In the other 50%
# of cases, no cell cell splitting will be applied.
# The 'slab' object in the GA serves as a template
# in the creation of new structures, which inherit
# the slab's atomic positions (if any), cell vectors
# (if specified), and periodic boundary conditions.
# Here only the last property is relevant:
slab = Atoms('', pbc=True)
# Initialize the random structure generator
sg = StartGenerator(slab, blocks, blmin, box_volume=volume,
number_of_variable_cell_vectors=3,
cellbounds=cellbounds, splits=splits)
# Create the database
da = PrepareDB(db_file_name='gadb.db',
stoichiometry=[Z] * 24)
# Generate N random structures
# and add them to the database
for i in range(N):
a = sg.get_new_candidate()
da.add_unrelaxed_candidate(a)
Run the GA search
Now we can start the actual search (ga_bulk_run.py
),
which should only take a few minutes to complete.
The relaxation function, which performs the variable-cell
local optimization, is imported from ga_bulk_relax.py
.
This script will try to use the EMT calculator implemented
in the ASAP code.
If ASAP is not available, ASE’s native (but slower) EMT
calculator will be used instead.
from ga_bulk_relax import relax
from ase.ga import get_raw_score
from ase.ga.cutandsplicepairing import CutAndSplicePairing
from ase.ga.data import DataConnection
from ase.ga.offspring_creator import OperationSelector
from ase.ga.ofp_comparator import OFPComparator
from ase.ga.population import Population
from ase.ga.soft_mutation import SoftMutation
from ase.ga.standardmutations import StrainMutation
from ase.ga.utilities import CellBounds, closest_distances_generator
from ase.io import write
# Connect to the database and retrieve some information
da = DataConnection('gadb.db')
slab = da.get_slab()
atom_numbers_to_optimize = da.get_atom_numbers_to_optimize()
n_top = len(atom_numbers_to_optimize)
# Use Oganov's fingerprint functions to decide whether
# two structures are identical or not
comp = OFPComparator(n_top=n_top, dE=1.0,
cos_dist_max=1e-3, rcut=10., binwidth=0.05,
pbc=[True, True, True], sigma=0.05, nsigma=4,
recalculate=False)
# Define the cell and interatomic distance bounds
# that the candidates must obey
blmin = closest_distances_generator(atom_numbers_to_optimize, 0.5)
cellbounds = CellBounds(bounds={'phi': [20, 160], 'chi': [20, 160],
'psi': [20, 160], 'a': [2, 60],
'b': [2, 60], 'c': [2, 60]})
# Define a pairing operator with 100% (0%) chance that the first
# (second) parent will be randomly translated, and with each parent
# contributing to at least 15% of the child's scaled coordinates
pairing = CutAndSplicePairing(slab, n_top, blmin, p1=1., p2=0., minfrac=0.15,
number_of_variable_cell_vectors=3,
cellbounds=cellbounds, use_tags=False)
# Define a strain mutation with a typical standard deviation of 0.7
# for the strain matrix elements (drawn from a normal distribution)
strainmut = StrainMutation(blmin, stddev=0.7, cellbounds=cellbounds,
number_of_variable_cell_vectors=3,
use_tags=False)
# Define a soft mutation; we need to provide a dictionary with
# (typically rather short) minimal interatomic distances which
# is used to determine when to stop displacing the atoms along
# the chosen mode. The minimal and maximal single-atom displacement
# distances (in Angstrom) for a valid mutation are provided via
# the 'bounds' keyword argument.
blmin_soft = closest_distances_generator(atom_numbers_to_optimize, 0.1)
softmut = SoftMutation(blmin_soft, bounds=[2., 5.], use_tags=False)
# By default, the operator will update a "used_modes.json" file
# after every mutation, listing which modes have been used so far
# for each structure in the database. The mode indices start at 3
# as the three lowest frequency modes are translational modes.
# Set up the relative probabilities for the different operators
operators = OperationSelector([4., 3., 3.],
[pairing, softmut, strainmut])
# Relax the initial candidates
while da.get_number_of_unrelaxed_candidates() > 0:
a = da.get_an_unrelaxed_candidate()
relax(a, cellbounds=cellbounds)
da.add_relaxed_step(a)
cell = a.get_cell()
if not cellbounds.is_within_bounds(cell):
da.kill_candidate(a.info['confid'])
# Initialize the population
population_size = 20
population = Population(data_connection=da,
population_size=population_size,
comparator=comp,
logfile='log.txt',
use_extinct=True)
# Update the scaling volume used in some operators
# based on a number of the best candidates
current_pop = population.get_current_population()
strainmut.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)
pairing.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)
# Test n_to_test new candidates; in this example we need
# only few GA iterations as the global minimum (FCC Ag)
# is very easily found (typically already after relaxation
# of the initial random structures).
n_to_test = 50
for step in range(n_to_test):
print(f'Now starting configuration number {step}')
# Create a new candidate
a3 = None
while a3 is None:
a1, a2 = population.get_two_candidates()
a3, desc = operators.get_new_individual([a1, a2])
# Save the unrelaxed candidate
da.add_unrelaxed_candidate(a3, description=desc)
# Relax the new candidate and save it
relax(a3, cellbounds=cellbounds)
da.add_relaxed_step(a3)
# If the relaxation has changed the cell parameters
# beyond the bounds we disregard it in the population
cell = a3.get_cell()
if not cellbounds.is_within_bounds(cell):
da.kill_candidate(a3.info['confid'])
# Update the population
population.update()
if step % 10 == 0:
# Update the scaling volumes of the strain mutation
# and the pairing operator based on the current
# best structures contained in the population
current_pop = population.get_current_population()
strainmut.update_scaling_volume(current_pop, w_adapt=0.5,
n_adapt=4)
pairing.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)
write('current_population.traj', current_pop)
print('GA finished after step %d' % step)
hiscore = get_raw_score(current_pop[0])
print('Highest raw score = %8.4f eV' % hiscore)
all_candidates = da.get_all_relaxed_candidates()
write('all_candidates.traj', all_candidates)
current_pop = population.get_current_population()
write('current_population.traj', current_pop)
Typical bulk GA operators
- class ase.ga.cutandsplicepairing.CutAndSplicePairing(slab, n_top, blmin, number_of_variable_cell_vectors=0, p1=1, p2=0.05, minfrac=None, cellbounds=None, test_dist_to_slab=True, use_tags=False, rng=<module 'numpy.random' from '/scratch/jensj/ase-docs/venv/lib/python3.11/site-packages/numpy/random/__init__.py'>, verbose=False)[source]
The Cut and Splice operator by Deaven and Ho.
Creates offspring from two parent structures using a randomly generated cutting plane.
The parents may have different unit cells, in which case the offspring unit cell will be a random combination of the parent cells.
The basic implementation (for fixed unit cells) is described in:
doi: L.B. Vilhelmsen and B. Hammer, PRL, 108, 126101 (2012) <10.1103/PhysRevLett.108.126101>
The extension to variable unit cells is similar to:
The operator can furthermore preserve molecular identity if desired (see the use_tags kwarg). Atoms with the same tag will then be considered as belonging to the same molecule, and their internal geometry will not be changed by the operator.
If use_tags is enabled, the operator will also conserve the number of molecules of each kind (in addition to conserving the overall stoichiometry). Currently, molecules are considered to be of the same kind if their chemical symbol strings are identical. In rare cases where this may not be sufficient (i.e. when desiring to keep the same ratio of isomers), the different isomers can be differentiated by giving them different elemental orderings (e.g. ‘XY2’ and ‘Y2X’).
Parameters:
- slab: Atoms object
Specifies the cell vectors and periodic boundary conditions to be applied to the randomly generated structures. Any included atoms (e.g. representing an underlying slab) are copied to these new structures.
- n_top: int
The number of atoms to optimize
- blmin: dict
Dictionary with minimal interatomic distances. Note: when preserving molecular identity (see use_tags), the blmin dict will (naturally) only be applied to intermolecular distances (not the intramolecular ones).
- number_of_variable_cell_vectors: int (default 0)
The number of variable cell vectors (0, 1, 2 or 3). To keep things simple, it is the ‘first’ vectors which will be treated as variable, i.e. the ‘a’ vector in the univariate case, the ‘a’ and ‘b’ vectors in the bivariate case, etc.
- p1: float or int between 0 and 1
Probability that a parent is shifted over a random distance along the normal of the cutting plane (only operative if number_of_variable_cell_vectors > 0).
- p2: float or int between 0 and 1
Same as p1, but for shifting along the directions in the cutting plane (only operative if number_of_variable_cell_vectors > 0).
- minfrac: float between 0 and 1, or None (default)
Minimal fraction of atoms a parent must contribute to the child. If None, each parent must contribute at least one atom.
- cellbounds: ase.ga.utilities.CellBounds instance
Describing limits on the cell shape, see
CellBounds
. Note that it only make sense to impose conditions regarding cell vectors which have been marked as variable (see number_of_variable_cell_vectors).- use_tags: bool
Whether to use the atomic tags to preserve molecular identity.
- test_dist_to_slab: bool (default True)
Whether to make sure that the distances between the atoms and the slab satisfy the blmin.
- rng: Random number generator
By default numpy.random.
- class ase.ga.standardmutations.StrainMutation(blmin, cellbounds=None, stddev=0.7, number_of_variable_cell_vectors=3, use_tags=False, rng=<module 'numpy.random' from '/scratch/jensj/ase-docs/venv/lib/python3.11/site-packages/numpy/random/__init__.py'>, verbose=False)[source]
Mutates a candidate by applying a randomly generated strain.
For more information, see also:
After initialization of the mutation, a scaling volume (to which each mutated structure is scaled before checking the constraints) is typically generated from the population, which is then also occasionally updated in the course of the GA run.
Parameters:
- blmin: dict
The closest allowed interatomic distances on the form: {(Z, Z*): dist, …}, where Z and Z* are atomic numbers.
- cellbounds: ase.ga.utilities.CellBounds instance
Describes limits on the cell shape, see
CellBounds
.- stddev: float
Standard deviation used in the generation of the strain matrix elements.
- number_of_variable_cell_vectors: int (default 3)
The number of variable cell vectors (1, 2 or 3). To keep things simple, it is the ‘first’ vectors which will be treated as variable, i.e. the ‘a’ vector in the univariate case, the ‘a’ and ‘b’ vectors in the bivariate case, etc.
- use_tags: boolean
Whether to use the atomic tags to preserve molecular identity.
- rng: Random number generator
By default numpy.random.
- class ase.ga.standardmutations.PermuStrainMutation(permutationmutation, strainmutation, verbose=False)[source]
Combination of PermutationMutation and StrainMutation.
For more information, see also:
Parameters:
- permutationmutation: OffspringCreator instance
A mutation that permutes atom types.
- strainmutation: OffspringCreator instance
A mutation that mutates by straining.
- class ase.ga.standardmutations.RotationalMutation(blmin, n_top=None, fraction=0.33, tags=None, min_angle=1.57, test_dist_to_slab=True, rng=<module 'numpy.random' from '/scratch/jensj/ase-docs/venv/lib/python3.11/site-packages/numpy/random/__init__.py'>, verbose=False)[source]
Mutates a candidate by applying random rotations to multi-atom moieties in the structure (atoms with the same tag are considered part of one such moiety).
Only performs whole-molecule rotations, no internal rotations.
For more information, see also:
Parameters:
- blmin: dict
The closest allowed interatomic distances on the form: {(Z, Z*): dist, …}, where Z and Z* are atomic numbers.
- n_top: int or None
The number of atoms to optimize (None = include all).
- fraction: float
Fraction of the moieties to be rotated.
- tags: None or list of integers
Specifies, respectively, whether all moieties or only those with matching tags are eligible for rotation.
- min_angle: float
Minimal angle (in radians) for each rotation; should lie in the interval [0, pi].
- test_dist_to_slab: boolean
Whether also the distances to the slab should be checked to satisfy the blmin.
- rng: Random number generator
By default numpy.random.
- class ase.ga.standardmutations.RattleRotationalMutation(rattlemutation, rotationalmutation, verbose=False)[source]
Combination of RattleMutation and RotationalMutation.
Parameters:
- rattlemutation: OffspringCreator instance
A mutation that rattles atoms.
- rotationalmutation: OffspringCreator instance
A mutation that rotates moieties.
- class ase.ga.soft_mutation.SoftMutation(blmin, bounds=[0.5, 2.0], calculator=<class 'ase.ga.soft_mutation.BondElectroNegativityModel'>, rcut=10.0, used_modes_file='used_modes.json', use_tags=False, verbose=False)[source]
Mutates the structure by displacing it along the lowest (nonzero) frequency modes found by vibrational analysis, as in:
Lyakhov, Oganov, Valle, Comp. Phys. Comm. 181 (2010) 1623-1632
As in the reference above, the next-lowest mode is used if the structure has already been softmutated along the current-lowest mode. This mutation hence acts in a deterministic way, in contrast to most other genetic operators.
If you find this implementation useful in your work, please consider citing:
Van den Bossche, Gronbeck, Hammer, J. Chem. Theory Comput. 14 (2018)
in addition to the paper mentioned above.
Parameters:
- blmin: dict
The closest allowed interatomic distances on the form: {(Z, Z*): dist, …}, where Z and Z* are atomic numbers.
- bounds: list
Lower and upper limits (in Angstrom) for the largest atomic displacement in the structure. For a given mode, the algorithm starts at zero amplitude and increases it until either blmin is violated or the largest displacement exceeds the provided upper bound). If the largest displacement in the resulting structure is lower than the provided lower bound, the mutant is considered too similar to the parent and None is returned.
- calculator: ASE calculator object
The calculator to be used in the vibrational analysis. The default is to use a calculator based on pairwise harmonic potentials with force constants from the “bond electronegativity” model described in the reference above. Any calculator with a working
get_forces()
method will work.- rcut: float
Cutoff radius in Angstrom for the pairwise harmonic potentials.
- used_modes_file: str or None
Name of json dump file where previously used modes will be stored (and read). If None, no such file will be used. Default is to use the filename ‘used_modes.json’.
- use_tags: boolean
Whether to use the atomic tags to preserve molecular identity.
Useful helper functions
- class ase.ga.utilities.CellBounds(bounds={})[source]
Class for defining as well as checking limits on cell vector lengths and angles.
Parameters:
- bounds: dict
Any of the following keywords can be used, in conjunction with a [low, high] list determining the lower and upper bounds:
- a, b, c:
Minimal and maximal lengths (in Angstrom) for the 1st, 2nd and 3rd lattice vectors.
- alpha, beta, gamma:
Minimal and maximal values (in degrees) for the angles between the lattice vectors.
- phi, chi, psi:
Minimal and maximal values (in degrees) for the angles between each lattice vector and the plane defined by the other two vectors.
Example:
>>> from ase.ga.utilities import CellBounds >>> CellBounds(bounds={'phi': [20, 160], ... 'chi': [60, 120], ... 'psi': [20, 160], ... 'a': [2, 20], 'b': [2, 20], 'c': [2, 20]})
- class ase.ga.startgenerator.StartGenerator(slab, blocks, blmin, number_of_variable_cell_vectors=0, box_to_place_in=None, box_volume=None, splits=None, cellbounds=None, test_dist_to_slab=True, test_too_far=True, rng=<module 'numpy.random' from '/scratch/jensj/ase-docs/venv/lib/python3.11/site-packages/numpy/random/__init__.py'>)[source]
Class for generating random starting candidates.
Its basic task consists of randomly placing atoms or molecules within a predescribed box, while respecting certain minimal interatomic distances.
Depending on the problem at hand, certain box vectors may not be known or chosen beforehand, and hence also need to be generated at random. Common cases include bulk crystals, films and chains, with respectively 3, 2 and 1 unknown cell vectors.
Parameters:
- slab: Atoms object
Specifies the cell vectors and periodic boundary conditions to be applied to the randomly generated structures. Any included atoms (e.g. representing an underlying slab) are copied to these new structures. Variable cell vectors (see number_of_variable_cell_vectors) will be ignored because these will be generated at random.
- blocks: list
List of building units for the structure. Each item can be:
an integer: representing a single atom by its atomic number,
a string: for a single atom (a chemical symbol) or a molecule (name recognized by ase.build.molecule),
an Atoms object,
an (A, B) tuple or list where A is any of the above and B is the number of A units to include.
A few examples:
>>> blocks = ['Ti'] * 4 + ['O'] * 8 >>> blocks = [('Ti', 4), ('O', 8)] >>> blocks = [('CO2', 3)] # 3 CO2 molecules >>> co = Atoms('CO', positions=[[0, 0, 0], [1.4, 0, 0]]) >>> blocks = [(co, 3)]
Each individual block (single atom or molecule) in the randomly generated candidates is given a unique integer tag. These can be used to preserve the molecular identity of these subunits.
- blmin: dict or float
Dictionary with minimal interatomic distances. If a number is provided instead, the dictionary will be generated with this ratio of covalent bond radii. Note: when preserving molecular identity (see use_tags), the blmin dict will (naturally) only be applied to intermolecular distances (not the intramolecular ones).
- number_of_variable_cell_vectors: int (default 0)
The number of variable cell vectors (0, 1, 2 or 3). To keep things simple, it is the ‘first’ vectors which will be treated as variable, i.e. the ‘a’ vector in the univariate case, the ‘a’ and ‘b’ vectors in the bivariate case, etc.
- box_to_place_in: [list, list of lists] (default None)
The box in which the atoms can be placed. The default (None) means the box is equal to the entire unit cell of the ‘slab’ object. In many cases, however, smaller boxes are desired (e.g. for adsorbates on a slab surface or for isolated clusters). Then, box_to_place_in can be set as [p0, [v1, v2, v3]] with positions being generated as p0 + r1 * v1 + r2 * v2 + r3 + v3. In case of one or more variable cell vectors, the corresponding items in p0/v1/v2/v3 will be ignored.
- box_volume: int or float or None (default)
Initial guess for the box volume in cubic Angstrom (used in generating the variable cell vectors). Typical values in the solid state are 8-12 A^3 per atom. If there are no variable cell vectors, the default None is required (box volume equal to the box_to_place_in volume).
- splits: dict or None
Splitting scheme for increasing the translational symmetry in the random candidates, based on:
This should be a dict specifying the relative probabilities for each split, written as tuples. For example,
>>> splits = {(2,): 3, (1,): 1}
This means that, for each structure, either a splitting factor of 2 is applied to one randomly chosen axis, or a splitting factor of 1 is applied (i.e., no splitting). The probability ratio of the two scenararios will be 3:1, i.e. 75% chance for the former and 25% chance for the latter splitting scheme. Only the directions in which the ‘slab’ object is periodic are eligible for splitting.
To e.g. always apply splitting factors of 2 and 3 along two randomly chosen axes:
>>> splits = {(2, 3): 1}
By default, no splitting is applied (splits = None = {(1,): 1}).
- cellbounds: ase.ga.utilities.CellBounds instance
Describing limits on the cell shape, see
CellBounds
. Note that it only make sense to impose conditions regarding cell vectors which have been marked as variable (see number_of_variable_cell_vectors).- test_dist_to_slab: bool (default True)
Whether to make sure that the distances between the atoms and the slab satisfy the blmin.
- test_too_far: bool (default True)
Whether to also make sure that there are no isolated atoms or molecules with nearest-neighbour bond lengths larger than 2x the value in the blmin dict.
- rng: Random number generator
By default numpy.random.
- class ase.ga.ofp_comparator.OFPComparator(n_top=None, dE=1.0, cos_dist_max=0.005, rcut=20.0, binwidth=0.05, sigma=0.02, nsigma=4, pbc=True, maxdims=None, recalculate=False)[source]
Implementation of comparison using Oganov’s fingerprint (OFP) functions, based on:
Parameters:
- n_top: int or None
The number of atoms to optimize (None = include all).
- dE: float
Energy difference above which two structures are automatically considered to be different. (Default 1 eV)
- cos_dist_max: float
Maximal cosine distance between two structures in order to be still considered the same structure. Default 5e-3
- rcut: float
Cutoff radius in Angstrom for the fingerprints. (Default 20 Angstrom)
- binwidth: float
Width in Angstrom of the bins over which the fingerprints are discretized. (Default 0.05 Angstrom)
- pbc: list of three booleans or None
Specifies whether to apply periodic boundary conditions along each of the three unit cell vectors when calculating the fingerprint. The default (None) is to apply PBCs in all 3 directions.
Note: for isolated systems (pbc = [False, False, False]), the pair correlation function itself is always short-ranged (decays to zero beyond a certain radius), so unity is not subtracted for calculating the fingerprint. Also the volume normalization disappears.
- maxdims: list of three floats or None
If PBCs in only 1 or 2 dimensions are specified, the maximal thicknesses along the non-periodic directions can be specified here (the values given for the periodic directions will not be used). If set to None (the default), the length of the cell vector along the non-periodic direction is used.
Note: in this implementation, the cell vectors are assumed to be orthogonal.
- sigma: float
Standard deviation of the gaussian smearing to be applied in the calculation of the fingerprints (in Angstrom). Default 0.02 Angstrom.
- nsigma: int
Distance (as the number of standard deviations sigma) at which the gaussian smearing is cut off (i.e. no smearing beyond that distance). (Default 4)
- recalculate: boolean
If True, ignores the fingerprints stored in atoms.info and recalculates them. (Default False)