# Genetic algorithm search for molecular crystal structures¶

Similar to the Genetic algorithm search for bulk crystal structures tutorial, this one illustrates how
to search for stable crystal structures using the GA, but this time for a
molecular crystal (solid dinitrogen, N_{2}). For such materials, the
global optimization problem can be strongly simplified by preserving
**molecular identity**. By limiting the search to crystals consisting of
N_{2} molecules, compared to individual N atoms (which may
or may not form N_{2} molecules), the number of degrees of freedom
is considerably reduced.

Note

This assumption, of course, also needs to be checked –
at high pressures, for example, nitrogen is
known to polymerize and so
will no longer consist of individual N_{2} molecules!

The same approach can also be used e.g. for compounds with polyatomic ions or for adsorbates on surfaces. The main differences with the ‘atomic’ GA therefore lie in the identity preservation during the various stages of the GA:

The random initial structures need to be constructed from the appropriate building blocks (here: N

_{2}molecules with a N-N bond length as in the gas phase).The genetic operators must leave the internal geometry of the polyatomic building blocks intact. In the present implementation, this is achieved by looking at the

*tags*attribute of the parent Atoms objects: atoms belong to the same molecule if they have the same tag. These tags are either initialized by the initial structure generator, or can be added manually using the`Atoms.set_tags()`

method.The local optimization procedure may, if desired, change these internal geometries (i.e. the N-N bond lengths in our example). However, care may need to be taken that molecular identity is preserved also here (e.g. by applying certain constraints, see

`ase.constraints`

).

See also

The Constrained minima hopping (global optimization) tutorial addresses molecular identity in a different global optimization approach.

## Initial population¶

As usual, a GA run starts with generating a series of random initial
structures. The following script (`ga_molecular_crystal_start.py`

)
creates a `gadb.db`

database containing 10 structures, each
consisting of 8 N_{2} molecules.

```
from ase import Atoms
from ase.data import atomic_numbers
from ase.ga.data import PrepareDB
from ase.ga.startgenerator import StartGenerator
from ase.ga.utilities import CellBounds, closest_distances_generator
# Number of randomly generated structures
N = 10
# The building blocks
blocks = [('N2', 8)]
# By writing 'N2', the generator will automatically
# get the N2 geometry using ase.build.molecule.
# A guess for the cell volume in Angstrom^3
box_volume = 30. * 8
# The cell splitting scheme:
splits = {(2,): 1, (1,): 1}
# The minimal interatomic distances which the
# initial structures must satisfy. We can take these
# a bit larger than usual because these minimal
# distances will only be applied intermolecularly
# (and not intramolecularly):
Z = atomic_numbers['N']
blmin = closest_distances_generator(atom_numbers=[Z],
ratio_of_covalent_radii=1.3)
# The bounds for the randomly generated unit cells:
cellbounds = CellBounds(bounds={'phi': [30, 150], 'chi': [30, 150],
'psi': [30, 150], 'a': [3, 50],
'b': [3, 50], 'c': [3, 50]})
# The familiar 'slab' object, here only providing
# the PBC as there are no atoms or cell vectors
# that need to be applied.
slab = Atoms('', pbc=True)
# create the starting population
sg = StartGenerator(slab, blocks, blmin, box_volume=box_volume,
cellbounds=cellbounds, splits=splits,
number_of_variable_cell_vectors=3,
test_too_far=False)
# Initialize the database
da = PrepareDB(db_file_name='gadb.db', simulation_cell=slab,
stoichiometry=[Z] * 16)
# Generate the new structures
# and add them to the database
for i in range(N):
a = sg.get_new_candidate()
da.add_unrelaxed_candidate(a)
```

## Run the GA search¶

Now the actual search can begin (`ga_molecular_crystal_run.py`

),
which should only take about five minutes to complete.

The `relax`

function, which performs the variable-cell local optimization,
is imported from `ga_molecular_crystal_relax.py`

. For the purpose
of this tutorial, a simple interatomic potential is used where the
intramolecular interaction is described with a harmonic potential and the
intermolecular interactions with a Lennard-Jones potential.

Note

Solid N_{2} takes on a variety of crystal structures.
This short and simplified GA example is not intended as a
thorough investigation of this variety. Typical structures produced
by this example are e.g. cubic or hexagonal close packed with
intermolecular distances on the order of 3.7 Å.

```
import numpy as np
from ga_molecular_crystal_relax import relax
from ase.ga import get_raw_score
from ase.ga.cutandsplicepairing import CutAndSplicePairing
from ase.ga.data import DataConnection
from ase.ga.offspring_creator import OperationSelector
from ase.ga.ofp_comparator import OFPComparator
from ase.ga.population import Population
from ase.ga.soft_mutation import SoftMutation
from ase.ga.standardmutations import (
RattleMutation,
RattleRotationalMutation,
RotationalMutation,
StrainMutation,
)
from ase.ga.utilities import CellBounds, closest_distances_generator
from ase.io import write
da = DataConnection('gadb.db')
# Various items needed for initializing the genetic operators
slab = da.get_slab()
atom_numbers_to_optimize = da.get_atom_numbers_to_optimize()
n_top = len(atom_numbers_to_optimize)
blmin = closest_distances_generator(atom_numbers_to_optimize, 1.0)
cellbounds = CellBounds(bounds={'phi': [30, 150], 'chi': [30, 150],
'psi': [30, 150]})
# Note the "use_tags" keyword argument being used
# to signal that we want to preserve molecular identity
# via the tags
pairing = CutAndSplicePairing(slab, n_top, blmin, p1=1., p2=0.,
minfrac=0.15, cellbounds=cellbounds,
number_of_variable_cell_vectors=3,
use_tags=True)
rattlemut = RattleMutation(blmin, n_top, rattle_prop=0.3, rattle_strength=0.5,
use_tags=True)
strainmut = StrainMutation(blmin, stddev=0.7, cellbounds=cellbounds,
use_tags=True)
rotmut = RotationalMutation(blmin, fraction=0.3, min_angle=0.5 * np.pi)
rattlerotmut = RattleRotationalMutation(rattlemut, rotmut)
blmin_soft = closest_distances_generator(atom_numbers_to_optimize, 0.8)
softmut = SoftMutation(blmin_soft, bounds=[2., 5.], use_tags=True)
operators = OperationSelector([5, 1, 1, 1, 1, 1], [
pairing, rattlemut, strainmut, rotmut, rattlerotmut, softmut])
# Relaxing the initial candidates
while da.get_number_of_unrelaxed_candidates() > 0:
a = da.get_an_unrelaxed_candidate()
relax(a)
da.add_relaxed_step(a)
# The structure comparator for the population
comp = OFPComparator(n_top=n_top, dE=1.0, cos_dist_max=5e-3, rcut=10.,
binwidth=0.05, pbc=[True, True, True], sigma=0.05,
nsigma=4, recalculate=False)
# The population
population = Population(data_connection=da,
population_size=10,
comparator=comp,
logfile='log.txt')
current_pop = population.get_current_population()
strainmut.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)
pairing.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)
# Test a few new candidates
n_to_test = 10
for step in range(n_to_test):
print(f'Now starting configuration number {step}', flush=True)
# Generate a new candidate
a3 = None
while a3 is None:
a1, a2 = population.get_two_candidates()
a3, desc = operators.get_new_individual([a1, a2])
# Relax it and add to database
da.add_unrelaxed_candidate(a3, description=desc)
relax(a3)
da.add_relaxed_step(a3)
# Update the population
population.update()
current_pop = population.get_current_population()
write('current_population.traj', current_pop)
# Update the strain mutation and pairing operators
if step % 10 == 0:
strainmut.update_scaling_volume(current_pop, w_adapt=0.5,
n_adapt=4)
pairing.update_scaling_volume(current_pop, w_adapt=0.5, n_adapt=4)
# Print out information for easier follow-up/analysis/plotting:
print('Step %d %s %.3f %.3f %.3f' % (
step, desc, get_raw_score(a1), get_raw_score(a2), get_raw_score(a3)))
print('Step %d highest raw score in pop: %.3f' %
(step, get_raw_score(current_pop[0])))
print('GA finished after step %d' % step)
hiscore = get_raw_score(current_pop[0])
print('Highest raw score = %8.4f eV' % hiscore)
write('all_candidates.traj', da.get_all_relaxed_candidates())
write('current_population.traj', current_pop)
```