3. Basic Usage

3.1. Active learning

abICS was originally developed to combine first-principles calculations directly with replica-exchange Monte Carlo methods for configurational sampling. However, the large computational cost of first-principles calculations limits both the size of the models and the number of steps that can be calculated. Ver. 2 therefore implements an active learning method that constructs a neural network model to rapidly predict the energy after structural optimization, dramatically improving the sampling speed [Kasamatsu et al., 2022].

The general flow of the active learning method implemented in abICS is as follows.

  1. Perform ab initio calculations on a large number of randomly generated atomic configurations and prepare training data (correspondence between configurations and energies).

  2. Build a neural network model that predicts energy from atomic configurations using the prepared training data.

  3. Perform statistical thermodynamic sampling of atomic configurations using a replica exchange Monte Carlo method with the neural network model.

  4. Evaluate the accuracy of the neural network model by sampling the atomic configurations that appear in the Monte Carlo calculations and performing ab initio calculations on each of them.

  5. If the accuracy is not sufficient, add the results calculated in 4. to the training data and repeat from 2.
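The five steps above can be sketched as a small, self-contained loop. This is only an illustration under strong assumptions: reference_energy is a toy function standing in for an ab initio calculation, train builds a lookup-table surrogate instead of a neural network, and sample uses single-temperature Metropolis moves instead of replica exchange. None of these names belong to abICS.

```python
import math
import random


def reference_energy(config):
    """Toy stand-in for an ab initio calculation (assumption, not a real solver)."""
    unlike = sum(a != b for a, b in zip(config, config[1:]))
    return 0.5 * unlike - 0.2 * sum(config)


def features(config):
    """Coarse descriptor: (number of occupied sites, number of unlike neighbours)."""
    return (sum(config), sum(a != b for a, b in zip(config, config[1:])))


def train(data):
    """Toy surrogate: mean energy per descriptor, global mean for unseen ones."""
    sums = {}
    for config, energy in data.items():
        total, count = sums.get(features(config), (0.0, 0))
        sums[features(config)] = (total + energy, count + 1)
    table = {key: total / count for key, (total, count) in sums.items()}
    fallback = sum(data.values()) / len(data)
    return lambda config: table.get(features(config), fallback)


def sample(model, start, nsteps, kT, rng, sample_frequency=4):
    """Single-temperature Metropolis sampling on the surrogate energy."""
    config, energy = start, model(start)
    visited = []
    for step in range(1, nsteps + 1):
        i = rng.randrange(len(config))
        trial = config[:i] + (1 - config[i],) + config[i + 1:]
        trial_energy = model(trial)
        delta = trial_energy - energy
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            config, energy = trial, trial_energy
        if step % sample_frequency == 0:
            visited.append(config)
    return visited


def active_learning(nsites=6, ninitial=5, tolerance=1e-6, max_iterations=20, seed=0):
    rng = random.Random(seed)
    # Step 1: reference energies for randomly generated configurations.
    data = {}
    for _ in range(ninitial):
        config = tuple(rng.randint(0, 1) for _ in range(nsites))
        data[config] = reference_energy(config)
    for iteration in range(1, max_iterations + 1):
        model = train(data)                                        # step 2
        visited = sample(model, next(iter(data)), 200, 1.0, rng)   # step 3
        checks = {c: reference_energy(c) for c in set(visited)}    # step 4
        rmse = math.sqrt(
            sum((model(c) - e) ** 2 for c, e in checks.items()) / len(checks)
        )
        if rmse <= tolerance:
            break
        data.update(checks)                                        # step 5
    return model, rmse, iteration


model, rmse, iterations = active_learning()
print(f"stopped after {iterations} iteration(s), RMSE = {rmse:.3g}")
```

The loop stops either when the surrogate reproduces the reference energies of the sampled configurations within the tolerance or when max_iterations is reached; in a real workflow each pass through step 4 corresponds to a batch of first-principles calculations.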

Figure: Schematic of the active learning procedure using abICS

3.2. Preparing an abICS control file

First, prepare an input file that controls the entire abICS framework. The abICS input file consists of the following five sections:

  1. The [sampling] section specifies the parameters of the replica exchange Monte Carlo part, such as the number of replicas, the temperature range, and the number of Monte Carlo steps. In addition, the [sampling.solver] subsection specifies the parameters for the solver used during sampling, including the type of solver (VASP, QE, …), the path to the solver, and the directory containing immutable input files.

  2. The [mlref] section specifies options for extracting only atomic configurations from the sampling results in order to evaluate the accuracy of the neural network model and to expand the training data. In addition, for generating training data, the [mlref.solver] subsection specifies the parameters for the first-principles solver, including the type of solver (VASP, QE, …), the path to the solver, and the directory containing immutable input files. This section is used by abics_mlref.

  3. The [train] section specifies options for the trainer that learns a configuration energy prediction model from the training data. This section is used by abics_train.

  4. The [observer] section specifies the type of physical quantity to be calculated.

  5. The [config] section specifies the configuration of the alloy, etc.

For details, see Input Files Format. The following is an example of an input file that selects aenet as the solver for sampling.

[sampling]
nreplicas = 8
nprocs_per_replica = 1 
kTstart = 600.0
kTend = 2000.0
nsteps = 6400 # Number of steps for sampling
RXtrial_frequency = 4
sample_frequency = 16
print_frequency = 1
reload = false

[sampling.solver]
type = 'aenet'
path = 'predict.x-2.0.4-ifort_serial'
base_input_dir = './baseinput'
perturb = 0.0
run_scheme = 'subprocess' #'mpi_spawn_ready'
ignore_species = ["O"]

[mlref]
nreplicas = 8
ndata = 5

[mlref.solver]
type = 'qe'
base_input_dir = './baseinput_ref'
perturb = 0.05
ignore_species = []

[train]
type = 'aenet'
base_input_dir = './aenet_train_input'
exe_command = ['generate.x-2.0.4-ifort_serial', 'srun train.x-2.0.4-ifort_intelmpi']
ignore_species = ["O"]
vac_map = []
restart = false

[config]
unitcell = [[8.1135997772, 0.0000000000, 0.0000000000],
            [0.0000000000, 8.1135997772, 0.0000000000],
            [0.0000000000, 0.0000000000, 8.1135997772]]
supercell = [1,1,1]

[[config.base_structure]]
type = "O"
coords = [
     [0.237399980, 0.237399980, 0.237399980],
     [0.762599945, 0.762599945, 0.762599945],
     [0.512599945, 0.012600004, 0.737399936],
     [0.487399966, 0.987399936, 0.262599975],
     [0.012600004, 0.737399936, 0.512599945],
     [0.987399936, 0.262599975, 0.487399966],
     [0.737399936, 0.512599945, 0.012600004],
     [0.262599975, 0.487399966, 0.987399936],
     [0.987399936, 0.487399966, 0.262599975],
     [0.012600004, 0.512599945, 0.737399936],
     [0.487399966, 0.262599975, 0.987399936],
     [0.512599945, 0.737399936, 0.012600004],
     [0.262599975, 0.987399936, 0.487399966],
     [0.737399936, 0.012600004, 0.512599945],
     [0.237399980, 0.737399936, 0.737399936],
     [0.762599945, 0.262599975, 0.262599975],
     [0.512599945, 0.512599945, 0.237399980],
     [0.487399966, 0.487399966, 0.762599945],
     [0.012600004, 0.237399980, 0.012600004],
     [0.987399936, 0.762599945, 0.987399936],
     [0.987399936, 0.987399936, 0.762599945],
     [0.012600004, 0.012600004, 0.237399980],
     [0.487399966, 0.762599945, 0.487399966],
     [0.512599945, 0.237399980, 0.512599945],
     [0.737399936, 0.237399980, 0.737399936],
     [0.262599975, 0.762599945, 0.262599975],
     [0.237399980, 0.512599945, 0.512599945],
     [0.762599945, 0.487399966, 0.487399966],
     [0.762599945, 0.987399936, 0.987399936],
     [0.237399980, 0.012600004, 0.012600004],
     [0.737399936, 0.737399936, 0.237399980],
     [0.262599975, 0.262599975, 0.762599945],
     ]

[[config.defect_structure]]
coords = [
     [0.000000000, 0.000000000, 0.000000000],
     [0.749999940, 0.249999985, 0.499999970],
     [0.249999985, 0.749999940, 0.499999970],
     [0.249999985, 0.499999970, 0.749999940],
     [0.749999940, 0.499999970, 0.249999985],
     [0.499999970, 0.749999940, 0.249999985],
     [0.499999970, 0.249999985, 0.749999940],
     [0.000000000, 0.499999970, 0.499999970],
     [0.749999940, 0.749999940, 0.000000000],
     [0.249999985, 0.249999985, 0.000000000],
     [0.249999985, 0.000000000, 0.249999985],
     [0.749999940, 0.000000000, 0.749999940],
     [0.499999970, 0.000000000, 0.499999970],
     [0.000000000, 0.749999940, 0.749999940],
     [0.000000000, 0.249999985, 0.249999985],
     [0.499999970, 0.499999970, 0.000000000],
     [0.374999970, 0.374999970, 0.374999970],
     [0.624999940, 0.624999940, 0.624999940],
     [0.374999970, 0.874999940, 0.874999940],
     [0.624999940, 0.124999993, 0.124999993],
     [0.874999940, 0.874999940, 0.374999970],
     [0.124999993, 0.124999993, 0.624999940],
     [0.874999940, 0.374999970, 0.874999940],
     [0.124999993, 0.624999940, 0.124999993],
     ]
[[config.defect_structure.groups]]
name = 'Al'
# species = ['Al']    # default
# coords = [[[0,0,0]]]  # default
num = 16 #432 #16000
[[config.defect_structure.groups]]
name = 'Mg'
# species = ['Mg']    # default
# coords = [[[0,0,0]]]  # default
num = 8 #216 #8000


[observer]
reference_structure = "MgAl2O4.vasp"
ignored_species = ["Al", "O"]
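As a quick sanity check on such a control file, the following sketch parses a shortened fragment and derives a few numbers from it. It assumes that sample_frequency and RXtrial_frequency are intervals measured in Monte Carlo steps; the summarize helper and the three-site defect lattice are hypothetical and exist only to keep the example small. In the full example above, the 24 defect sites are matched by 16 Al and 8 Mg groups, and the sketch checks the same kind of agreement.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed

# Shortened, hypothetical fragment in the same layout as the example above.
FRAGMENT = """
[sampling]
nreplicas = 8
nsteps = 6400
RXtrial_frequency = 4
sample_frequency = 16

[[config.defect_structure]]
coords = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.5, 0.0, 0.5]]
[[config.defect_structure.groups]]
name = 'Al'
num = 2
[[config.defect_structure.groups]]
name = 'Mg'
num = 1
"""


def summarize(params):
    """Derive step counts and check that group counts fill the defect sites."""
    sampling = params["sampling"]
    summary = {
        "samples_per_replica": sampling["nsteps"] // sampling["sample_frequency"],
        "exchange_trials": sampling["nsteps"] // sampling["RXtrial_frequency"],
        "sites_match_groups": True,
    }
    for defect in params["config"]["defect_structure"]:
        nsites = len(defect["coords"])
        ngroups = sum(group["num"] for group in defect["groups"])
        if nsites != ngroups:
            summary["sites_match_groups"] = False
    return summary


summary = summarize(tomllib.loads(FRAGMENT))
print(summary)
```

With nsteps = 6400, this gives 400 samples per replica and 1600 exchange trials under the stated interpretation of the two frequency parameters.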

3.3. Preparing a reference file for first-principles solvers

The user must prepare reference input file(s) for the first-principles solver used to generate the training data, following the input format of that solver. The path of the reference file(s) is specified by base_input_dir in the [mlref.solver] section of the abICS input file (see above). The coordinate information should not be written here; it will be written by abICS. The following is an example of a QE reference file.

&CONTROL
  calculation = 'relax'
  tstress = .false.
  tprnfor = .false.
  pseudo_dir = './pseudo'
  disk_io = 'low'
  wf_collect = .false.
/
&SYSTEM
  ecutwfc      =  60.0
  occupations  = "smearing"
  smearing     = "gauss"
  degauss      = 0.01
/
&electrons
  mixing_beta = 0.7
  conv_thr = 1.0d-8
  electron_maxstep = 100
/
&ions
/
ATOMIC_SPECIES
Al 26.981 Al.pbe-nl-kjpaw_psl.1.0.0.UPF
Mg 24.305 Mg.pbe-spnl-kjpaw_psl.1.0.0.UPF
O  16.000 O.pbe-n-kjpaw_psl.1.0.0.UPF
ATOMIC_POSITIONS crystal

K_POINTS gamma

3.3.1. Specific notes for first-principles solvers

Every setting other than those pertaining to the atomic coordinates should be set in the solver reference input file. However, the specification of atoms to be relaxed can be controlled by abICS. To enable structural optimization, turn it on in the solver reference file and also specify the related parameters, such as the total number of relaxation steps. Furthermore, to interoperate with abICS, each solver has a few rules regarding the file name and contents of the reference file. These rules are explained below.

VASP

  • URL : https://www.vasp.at

  • Reference file rules

    • Please prepare INCAR, POTCAR, and KPOINTS files.

      • In the POTCAR file, please arrange the atoms in alphabetical order.

      • The POSCAR file is usually not needed, but may be needed depending on the version of pymatgen. In that case, please prepare a suitable file.
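The VASP rules above lend themselves to a small pre-flight check. The sketch below is a hypothetical helper, not part of abICS: it reports which of the three required files are missing from a reference directory and returns the alphabetical species order in which the POTCAR entries would be concatenated.

```python
from pathlib import Path

REQUIRED_VASP_FILES = ("INCAR", "POTCAR", "KPOINTS")


def missing_vasp_files(base_input_dir):
    """Return the required VASP reference files that are not present."""
    base = Path(base_input_dir)
    return [name for name in REQUIRED_VASP_FILES if not (base / name).is_file()]


def potcar_species_order(species):
    """Return the unique species symbols in alphabetical order."""
    return sorted(set(species))


print(potcar_species_order(["O", "Mg", "Al"]))
```

The check does not inspect the contents of POTCAR; confirming that the pseudopotentials inside it actually follow the alphabetical order is left to the user.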

Quantum Espresso

  • URL : https://www.quantum-espresso.org

  • Available version: 6.2 or higher

    • “Old XML” format is not available

  • Reference file rules

    • Please name the reference file scf.in.

    • The calculation option must be scf or relax.

    • If the calculation uses only the \(\Gamma\) point, specifying gamma in K_POINTS makes the calculation faster.
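The two hard rules for the Quantum Espresso reference file (file name and calculation type) can be verified before starting a run. The following is a minimal sketch with a hypothetical helper name; it uses a simple regular expression rather than a full namelist parser and relies on the fact that pw.x defaults to calculation = 'scf' when the keyword is absent.

```python
import re
from pathlib import Path

ALLOWED_CALCULATIONS = {"scf", "relax"}


def check_qe_reference(path):
    """Return a list of rule violations for a QE reference input file."""
    path = Path(path)
    problems = []
    if path.name != "scf.in":
        problems.append(f"file name is {path.name!r}, expected 'scf.in'")
    match = re.search(
        r"^\s*calculation\s*=\s*['\"]([^'\"]+)['\"]",
        path.read_text(),
        re.MULTILINE | re.IGNORECASE,
    )
    # pw.x falls back to 'scf' when the keyword is not given.
    value = match.group(1).lower() if match else "scf"
    if value not in ALLOWED_CALCULATIONS:
        problems.append(f"calculation is {value!r}, expected 'scf' or 'relax'")
    return problems
```

An empty list means both rules are satisfied; for example, the reference file shown in section 3.3, saved as scf.in, would pass because it sets calculation = 'relax'.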

OpenMX

3.4. Preparing a reference file for training and evaluating the machine learning model

The user must prepare a reference file according to the input format of the machine learning model trainer and calculator to be used (only aenet is supported at the moment). The path of the reference file is specified by base_input_dir in the [train] section of the abICS input file (see above). The coordinate information should not be written here because it changes in the course of the simulation. The lattice sites are specified in the [config] section of the abICS input file (see above), and abICS takes care of generating the coordinates section at each sampling step.

3.4.1. Machine learning trainer/calculator-specific notes

aenet

  • URL : http://ann.atomistic.net

  • Checked with version 2.0.4.

  • Reference file rules

    • Place the input files for aenet in the generate, train, and predict directories under the directory specified by base_input_dir in the [train] section.

    • aenet compiles the atomic configuration and energy data for training into an intermediate binary format that is converted to atomic environment descriptor-energy relationships before training. Place the input files for this conversion, which is performed by generate.x, in the generate directory.

    • Place an input file for train.x in the train directory; train.x reads the training data generated by generate.x and trains the model. The file should be named train.in.

    • Place the input file predict.in for predict.x in the predict directory to evaluate the energy for the input coordinates using the trained potential model.

  • abICS control file

    • In the [sampling.solver] section, set type, perturb, and run_scheme as follows when using the active learning scheme.

    type = "aenet"
    perturb = 0.0
    run_scheme = 'subprocess'
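The directory layout described in the reference file rules can be checked with a few lines of code. This sketch uses a hypothetical helper and verifies only what the rules above state explicitly: the three subdirectories, train/train.in, and predict/predict.in. The names of the input files for generate.x are not checked because they are not specified here.

```python
from pathlib import Path


def check_aenet_inputs(base_input_dir):
    """Return a list of missing directories and files in the aenet input tree."""
    base = Path(base_input_dir)
    problems = [
        f"missing directory: {name}"
        for name in ("generate", "train", "predict")
        if not (base / name).is_dir()
    ]
    for relative in ("train/train.in", "predict/predict.in"):
        if not (base / relative).is_file():
            problems.append(f"missing file: {relative}")
    return problems
```

For the control file example in section 3.2, the function would be called with './aenet_train_input', the value of base_input_dir in the [train] section.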
    

3.5. Creating a set of training data

  1. Generate a set of input files for the first-principles calculations using abics_mlref.

  2. Perform the first-principles calculations with these inputs. (In the tutorial, GNU parallel is used for the high-throughput calculations.)

3.6. Creating a neural network

  1. Run abics_mlref again to convert the results of the first-principles calculations into a common format that abics_train can read.

  2. Execute abics_train to create a neural network. When the calculation completes successfully, the trained neural network is written to the baseinput directory.
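The order of operations in sections 3.5 and 3.6 can be summarized as a command plan. The sketch below only builds the plan and does not execute anything. It assumes that each abICS tool takes the control file path as its single argument and that the control file is named input.toml; both are assumptions for illustration. The first-principles step is left as None because how it is launched depends on the solver and the machine.

```python
def training_cycle_commands(input_file="input.toml"):
    """Return (description, argv) pairs for one pass through 3.5 and 3.6.

    argv is None for the step that the user runs with their own solver setup.
    """
    return [
        ("generate first-principles inputs", ["abics_mlref", input_file]),
        ("run the first-principles solver on the generated inputs", None),
        ("convert solver results for the trainer", ["abics_mlref", input_file]),
        ("train the neural network", ["abics_train", input_file]),
    ]


for description, argv in training_cycle_commands():
    print(f"{description}: {' '.join(argv) if argv else '(user-specific)'}")
```

Note that abics_mlref appears twice: once before and once after the first-principles calculations.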

3.7. Monte Carlo sampling

abics_sampling performs Monte Carlo sampling using the trained neural network. (The number of MPI processes must be at least the number of replicas.) Running the program creates directories named after the replica numbers under the current directory, and each replica runs the solver in its own directory.
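For readers unfamiliar with replica exchange, the following sketch shows the textbook acceptance rule for swapping the configurations of two replicas held at different temperatures. It is a generic illustration, not the abICS implementation, and it assumes that the energies and kT values are expressed in the same energy unit.

```python
import math
import random


def exchange_probability(energy_i, energy_j, kT_i, kT_j):
    """Textbook acceptance probability for swapping replicas i and j."""
    delta = (1.0 / kT_i - 1.0 / kT_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def try_exchange(energy_i, energy_j, kT_i, kT_j, rng=random):
    """Return True if the swap is accepted by the Metropolis criterion."""
    return rng.random() < exchange_probability(energy_i, energy_j, kT_i, kT_j)
```

A swap that moves the lower-energy configuration to the colder replica is always accepted; the opposite swap is accepted with a probability that decays exponentially with the energy difference and the difference in inverse temperature.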

abICS can also call the aenet library via the LAMMPS interface (aenetPyLammps). This is faster than calling aenet directly because it requires no file I/O. To use aenetPyLammps, you need to install aenet-lammps and LAMMPS. For details, please refer to "Predict energy of aenet model via LAMMPS interface".