Genetic algorithm search for molecular crystal structures#

Similar to the Genetic algorithm search for bulk crystal structures tutorial, this one illustrates how to search for stable crystal structures using the GA, but this time for a molecular crystal (solid dinitrogen, N₂). For such materials, the global optimization problem can be strongly simplified by preserving molecular identity. By limiting the search to crystals consisting of N₂ molecules, compared to individual N atoms (which may or may not form N₂ molecules), the number of degrees of freedom is considerably reduced.

Note

This assumption, of course, also needs to be checked – at high pressures, for example, nitrogen is known to polymerize and so will no longer consist of individual N₂ molecules!

The same approach can also be used e.g. for compounds with polyatomic ions or for adsorbates on surfaces. The main differences with the ‘atomic’ GA therefore lie in the identity preservation during the various stages of the GA:

The random initial structures need to be constructed from the appropriate building blocks (here: N₂ molecules with a N-N bond length as in the gas phase).
The genetic operators must leave the internal geometry of the polyatomic building blocks intact. In the present implementation, this is achieved by looking at the tags attribute of the parent Atoms objects: atoms belong to the same molecule if they have the same tag. These tags are either initialized by the initial structure generator, or can be added manually using the Atoms.set_tags() method.
The local optimization procedure may, if desired, change these internal geometries (i.e. the N-N bond lengths in our example). However, care may need to be taken that molecular identity is preserved also here (e.g. by applying certain constraints, see ase.constraints).

Initial population#

As usual, a GA run starts with generating a series of random initial structures. The following script (ga_molecular_crystal_start.py) creates a gadb.db database containing 10 structures, each consisting of 8 N₂ molecules.

from ase import Atoms
from ase.data import atomic_numbers
from ase.ga.data import PrepareDB
from ase.ga.startgenerator import StartGenerator
from ase.ga.utilities import CellBounds, closest_distances_generator

# Number of randomly generated structures
N = 10

# The building blocks
blocks = [('N2', 8)]
# By writing 'N2', the generator will automatically
# get the N2 geometry using ase.build.molecule.

# A guess for the cell volume in Angstrom^3
box_volume = 30.0 * 8

# The cell splitting scheme:
splits = {(2,): 1, (1,): 1}

# The minimal interatomic distances which the
# initial structures must satisfy. We can take these
# a bit larger than usual because these minimal
# distances will only be applied intermolecularly
# (and not intramolecularly):
Z = atomic_numbers['N']
blmin = closest_distances_generator(
    atom_numbers=[Z], ratio_of_covalent_radii=1.3
)

# The bounds for the randomly generated unit cells:
cellbounds = CellBounds(
    bounds={
        'phi': [30, 150],
        'chi': [30, 150],
        'psi': [30, 150],
        'a': [3, 50],
        'b': [3, 50],
        'c': [3, 50],
    }
)

# The familiar 'slab' object, here only providing
# the PBC as there are no atoms or cell vectors
# that need to be applied.
slab = Atoms('', pbc=True)

# create the starting population
sg = StartGenerator(
    slab,
    blocks,
    blmin,
    box_volume=box_volume,
    cellbounds=cellbounds,
    splits=splits,
    number_of_variable_cell_vectors=3,
    test_too_far=False,
)

# Initialize the database
da = PrepareDB(
    db_file_name='gadb.db', simulation_cell=slab, stoichiometry=[Z] * 16
)

# Generate the new structures
# and add them to the database
for i in range(N):
    a = sg.get_new_candidate()
    da.add_unrelaxed_candidate(a)

Run the GA search#

Now the actual search can begin (ga_molecular_crystal_run.py), which should only take about five minutes to complete.

The relax function, which performs the variable-cell local optimization, is imported from ga_molecular_crystal_relax.py. For the purpose of this tutorial, a simple interatomic potential is used where the intramolecular interaction is described with a harmonic potential and the intermolecular interactions with a Lennard-Jones potential.

Note

Solid N₂ takes on a variety of crystal structures. This short and simplified GA example is not intended as a thorough investigation of this variety. Typical structures produced by this example are e.g. cubic or hexagonal close packed with intermolecular distances on the order of 3.7 Å.

import numpy as np
from ga_molecular_crystal_relax import relax

from ase.ga import get_raw_score
from ase.ga.cutandsplicepairing import CutAndSplicePairing
from ase.ga.data import DataConnection
from ase.ga.offspring_creator import OperationSelector
from ase.ga.ofp_comparator import OFPComparator
from ase.ga.population import Population
from ase.ga.soft_mutation import SoftMutation
from ase.ga.standardmutations import (
    RattleMutation,
    RattleRotationalMutation,
    RotationalMutation,
    StrainMutation,
)
from ase.ga.utilities import CellBounds, closest_distances_generator
from ase.io import write

da = DataConnection('gadb.db')

# Various items needed for initializing the genetic operators
slab = da.get_slab()
atom_numbers_to_optimize = da.get_atom_numbers_to_optimize()
n_top = len(atom_numbers_to_optimize)
blmin = closest_distances_generator(atom_numbers_to_optimize, 1.0)
cellbounds = CellBounds(
    bounds={'phi': [30, 150], 'chi': [30, 150], 'psi': [30, 150]}
)

# Note the "use_tags" keyword argument being used
# to signal that we want to preserve molecular identity
# via the tags
pairing = CutAndSplicePairing(
    slab,
    n_top,
    blmin,
    p1=1.0,
    p2=0.0,
    minfrac=0.15,
    cellbounds=cellbounds,
    number_of_variable_cell_vectors=3,
    use_tags=True,
)

rattlemut = RattleMutation(
    blmin, n_top, rattle_prop=0.3, rattle_strength=0.5, use_tags=True
)

strainmut = StrainMutation(
    blmin, stddev=0.7, cellbounds=cellbounds, use_tags=True
)

rotmut = RotationalMutation(blmin, fraction=0.3, min_angle=0.5 * np.pi)

rattlerotmut = RattleRotationalMutation(rattlemut, rotmut)

blmin_soft = closest_distances_generator(atom_numbers_to_optimize, 0.8)
softmut = SoftMutation(blmin_soft, bounds=[2.0, 5.0], use_tags=True)

operators = OperationSelector(
    [5, 1, 1, 1, 1, 1],
    [pairing, rattlemut, strainmut, rotmut, rattlerotmut, softmut],
)

# Relaxing the initial candidates
while da.get_number_of_unrelaxed_candidates() > 0:
    a = da.get_an_unrelaxed_candidate()
    relax(a)
    da.add_relaxed_step(a)

# The structure comparator for the population
comp = OFPComparator(
    n_top=n_top,
    dE=1.0,
    cos_dist_max=5e-3,
    rcut=10.0,
    binwidth=0.05,
    pbc=[True, True, True],
    sigma=0.05,
    nsigma=4,
    recalculate=False,
)

# The population
population = Population(
    data_connection=da, population_size=10, comparator=comp, logfile='log.txt'
)

current_pop = population.get_current_population()
strainmut.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)
pairing.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)

# Test a few new candidates
n_to_test = 10

for step in range(n_to_test):
    print(f'Now starting configuration number {step}', flush=True)

    # Generate a new candidate
    a3 = None
    while a3 is None:
        a1, a2 = population.get_two_candidates()
        a3, desc = operators.get_new_individual([a1, a2])

    # Relax it and add to database
    da.add_unrelaxed_candidate(a3, description=desc)
    relax(a3)
    da.add_relaxed_step(a3)

    # Update the population
    population.update()
    current_pop = population.get_current_population()
    write('current_population.traj', current_pop)

    # Update the strain mutation and pairing operators
    if step % 10 == 0:
        strainmut.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)
        pairing.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)

    # Print out information for easier follow-up/analysis/plotting:
    print(
        'Step %d %s %.3f %.3f %.3f'
        % (step, desc, get_raw_score(a1), get_raw_score(a2), get_raw_score(a3))
    )

    print(
        'Step %d highest raw score in pop: %.3f'
        % (step, get_raw_score(current_pop[0]))
    )

print('GA finished after step %d' % step)
hiscore = get_raw_score(current_pop[0])
print('Highest raw score = %8.4f eV' % hiscore)
write('all_candidates.traj', da.get_all_relaxed_candidates())
write('current_population.traj', current_pop)

Genetic algorithm search for molecular crystal structures

Contents

Genetic algorithm search for molecular crystal structures#

Initial population#

Run the GA search#