5.3. [mlref] section

This section sets options for extracting only the atomic configurations from the results of replica exchange Monte Carlo (RXMC) calculations. The extracted configurations are used, for example, to evaluate the accuracy of neural network models and to extend the training data. The section has the following format.

[mlref]
nreplicas = 3
ndata = 50

5.3.1. Input Format

Each parameter is specified in the form keyword = value. Comments can be added with #; everything after # on a line is ignored.

5.3.2. Keywords

  • About replica

    • nreplicas

      Format : int (natural number)

      Description : The number of replicas.

    • ndata

      Format : int (natural number)

      Description : The number of configurations to be sampled.

    • sampler

      Format : string (default: “linspace”)

      Description : The method to extract \(N_\text{data}\) samples from \(N\) samples generated by Monte Carlo method.

      • “linspace”

        Extract equally spaced samples by numpy.linspace(0, N-1, num=ndata, dtype=int)

      • “random”

        Random sampling by numpy.random.choice(range(N), size=ndata, replace=False)
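The two sampler options can be compared with a short Python sketch. The helper name select_indices and its arguments are hypothetical and not part of the program; only the two NumPy calls are taken from the descriptions above.

```python
import numpy as np

def select_indices(n_samples, ndata, sampler="linspace"):
    """Return ndata indices chosen from range(n_samples).

    Hypothetical helper that mirrors the two documented sampler options.
    """
    if sampler == "linspace":
        # Equally spaced indices from the first to the last sample.
        return np.linspace(0, n_samples - 1, num=ndata, dtype=int)
    if sampler == "random":
        # Random indices without repetition.
        return np.random.choice(range(n_samples), size=ndata, replace=False)
    raise ValueError(f"unknown sampler: {sampler}")

print(select_indices(10, 4, "linspace"))  # [0 3 6 9]
print(select_indices(10, 4, "random"))    # four distinct indices, order varies
```

With "linspace" the first and last Monte Carlo samples are always included, whereas "random" gives a different selection on each run unless the NumPy random seed is fixed.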

5.4. [mlref.solver] section

This section configures the solver used to calculate the training data (configuration energies). It specifies solver parameters such as the solver type (VASP, QE, …), the path to the solver, and the directory containing the solver-specific input file(s). It is basically the same as the [sampling.solver] section and has the following format.

[mlref.solver]
type = 'vasp'
base_input_dir = './baseinput'
perturb = 0.1

5.4.1. Input Format

Each parameter is specified in the form keyword = value. Comments can be added with #; everything after # on a line is ignored.

5.4.2. Keywords

  • type

    Format : str

    Description : The solver type (OpenMX, QE, VASP, aenet).

  • base_input_dir

    Format : str or list of str

    Description : The path to the directory containing the base input file(s). If a list of directories is given, one calculation is performed for each entry, in order. From the second calculation onward, the final structure of the previous calculation is used as the initial coordinates, and the energy of the last calculation is taken as the result. For example, the first input can run a fast structural optimization at the expense of accuracy, and the second and later inputs can repeat the optimization with higher-accuracy settings. Alternatively, when relaxing the lattice vectors, the same input can be run several times so that the computational mesh is reset according to the specified plane-wave cutoff.

  • perturb

    Format : float

    Description : If a highly symmetric structure is used as input, structure optimization tends to stop at a saddle point. To avoid this, the initial structure is generated by randomly displacing each atom in proportion to this parameter. Setting it to 0.0 or false disables the displacement. Default value = 0.0.

  • ignore_species

    Format : list

    Description : Specify atomic species to “ignore” in neural network models such as aenet. For species that always have an occupancy of 1, it is computationally more efficient to ignore their presence when training and evaluating neural network models.
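The effect of the perturb keyword described above can be pictured with the following Python sketch. It is not the program's implementation: the function name perturb_positions is hypothetical, and the sketch assumes that every atom is moved in a random direction by a distance equal to perturb, in the same units as the coordinates. The actual distribution and units are not stated here.

```python
import numpy as np

def perturb_positions(positions, perturb, rng=None):
    """Return a copy of positions with each atom randomly displaced.

    Assumption: each displacement has a random direction and a length
    equal to `perturb`. A value of 0.0 or False leaves the atoms in place.
    """
    positions = np.asarray(positions, dtype=float)
    if not perturb:
        return positions.copy()
    rng = np.random.default_rng() if rng is None else rng
    # Draw random directions and normalize each one to unit length.
    directions = rng.normal(size=positions.shape)
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return positions + perturb * directions

# Two atoms on high-symmetry sites of a cubic cell (illustrative values).
ideal = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
moved = perturb_positions(ideal, 0.1, rng=np.random.default_rng(0))
print(np.linalg.norm(moved - np.asarray(ideal), axis=1))  # each close to 0.1
```

Breaking the symmetry in this way gives the optimizer nonzero forces to follow, which is why a small nonzero value can help a relaxation leave a saddle point.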