Replica exchange Monte Carlo exchange
¶
exchange
explores the parameter space by using the replica exchange Monte Carlo (RXMC) method.
Preparation¶
mpi4py should be installed if MPI parallelization is used.
$ python3 -m pip install mpi4py
Input parameters¶
This has two subsections algorithm.param
and algorithm.exchange
.
[algorithm.param
]¶
This defines a space to be explored.
When mesh_path
key is defined the discrete space is used.
Otherwise, continuous space is used.
Continuous space
initial_list
Format: List of float. Length should be equal to
dimension
.Description: Initial value of parameters. If not defined, these will be initialize randomly.
min_list
Format: List of float. Length should be equal to
dimension
.Description: Minimum value of each parameter. When a parameter falls below this value during the Monte Carlo search, the solver is not evaluated and the value is considered infinite.
max_list
Format: List of float. Length should be equal to
dimension
.Description: Maximum value of each parameter. When a parameter exceeds this value during the Monte Carlo search, the solver is not evaluated and the value is considered infinite.
step_list
Format: List of float. Length should be equal to
dimension
.Description: The step length in one Monte Carlo update (deviation of Gaussian distribution).
Discrete space
mesh_path
Format: string
Description: Path to the mesh definition file.
neighborlist_path
Format: string
Description: Path to the neighborhood-list file.
[algorithm.exchange
]¶
numsteps
Format: Integer
Description: The number of Monte Carlo steps.
numsteps_exchange
Format: Integer
Description: The number of interval Monte Carlo steps between replica exchange.
Tmin
Format: Float
Description: The minimum value of the “temperature” (\(T\)).
Tmax
Format: Float
Description: The maximum value of the “temperature” (\(T\)).
bmin
Format: Float
Description: The minimum value of the “inverse temperature” (\(\beta = 1/T\)). One of the “temperature” and “inverse temperature” should be defined.
bmax
Format: Float
Description: The maximum value of the “inverse temperature” (\(\beta = 1/T\)). One of the “temperature” and “inverse temperature” should be defined.
Tlogspace
Format: Boolean (default: true)
Description: Whether to assign “temperature” to replicas equally spaced in the logarithmic space or not.
nreplica_per_proc
Format: Integer (default: 1)
Description: The number of replicas in a MPI process.
Reference file¶
Mesh definition file¶
Define the grid space to be explored in this file. The first column is the index of the mesh, and the second and subsequent columns are the values of variables. Note that the index of the mesh will be ignored for this “algorithm”.
Below, a sample file is shown.
1 6.000000 6.000000
2 6.000000 5.750000
3 6.000000 5.500000
4 6.000000 5.250000
5 6.000000 5.000000
6 6.000000 4.750000
7 6.000000 4.500000
8 6.000000 4.250000
9 6.000000 4.000000
...
Neighborhood-list file¶
Before searching in the discrete space by Markov Chain Monte Carlo method, we should define “neighborhoods” for each point \(i\), which are points that a walker can move from \(i\) A neighborhood-list file defines the list of neighborhoods. In this file, the index of an initial point \(i\) is specified by the first column, and the indices of final points \(j\) are specified by the second and successive columns.
An utility tool, odatse_neighborlist
is available for generating a neighborhood-list file from a mesh file. For details, please see Related Tools.
0 1 2 3
1 0 2 3 4
2 0 1 3 4 5
3 0 1 2 4 5 6 7
4 1 2 3 5 6 7 8
5 2 3 4 7 8 9
...
Output files¶
RANK/trial.txt
¶
This file stores the suggested parameters and the corresponding value returned from the solver for each replica. The first column is the index of the MC step. The second column is the index of the walker in the process. The third column is the temperature of the replica. The fourth column is the value of the solver. The remaining columns are the coordinates.
Example:
# step walker T fx z1 z2
0 0 0.004999999999999999 0.07830821484593968 3.682008067401509 3.9502750191292586
1 0 0.004999999999999999 0.0758494287185766 2.811346329442423 3.691101784194861
2 0 0.004999999999999999 0.08566823949124412 3.606664760390988 3.2093903670436497
3 0 0.004999999999999999 0.06273922648753057 4.330900869594549 4.311333132184154
RANK/result.txt
¶
This file stores the sampled parameters and the corresponding value returned from the solver for each replica.
This has the same format as trial.txt
.
# step walker T fx z1 z2
0 0 0.004999999999999999 0.07830821484593968 3.682008067401509 3.9502750191292586
1 0 0.004999999999999999 0.07830821484593968 3.682008067401509 3.9502750191292586
2 0 0.004999999999999999 0.07830821484593968 3.682008067401509 3.9502750191292586
3 0 0.004999999999999999 0.06273922648753057 4.330900869594549 4.311333132184154
best_result.txt
¶
The optimal value of the solver and the corresponding parameter among the all samples.
nprocs = 4
rank = 2
step = 65
fx = 0.008233957976993406
z1 = 4.221129370933539
z2 = 5.139591716517661
result_T#.txt
¶
This file stores samples for each temperature ( #
is replaced with the index of temperature ).
The first column is the index of the MC step.
The second column is the index of the walker.
The third column is the value of the solver.
The remaining columns are the coordinates.
# T = 1.0
0 15 28.70157662892569 3.3139009347685118 -4.20946994566609
1 15 28.70157662892569 3.3139009347685118 -4.20946994566609
2 15 28.70157662892569 3.3139009347685118 -4.20946994566609
3 15 28.98676409223712 3.7442621319489637 -3.868754990884034
Restart¶
The execution mode is specified by the run_mode
parameter to the constructor.
The operation of each mode is described as follows.
The parameter values correspond to --init
, --resume
, and --cont
options of odatse
command, respectively.
"initial"
(default)The program is started from the initialized state. If the checkpointing is enabled, the intermediate states will be stored at the folloing occasions:
the specified number of steps has been done, or the specified period of time has passed.
at the end of the execution.
"resume"
The program execution is resumed from the latest checkpoint. The conditions such as the number of MPI processes should be kept the same.
"continue"
The program execution is continued from the previous run. The value of
numsteps
should be increased. The step counter is taken over.For example: in the first run, the calculation is carried out for 1000 steps with
numsteps = 1000
. In the next run, the calculation is continued withnumsteps = 2000
, where the calculations from 1001st step to 2000th step are carried out.
Algorithm¶
Markov chain Monte Carlo¶
The Markov chain Monte Carlo (MCMC) sampling explores the parameter space by moving walkers \(\vec{x}\) stochastically according to the weight function \(W(\vec{x})\). For the weight function, the Boltzmann factor \(W(\vec{x}) = e^{-f(\vec{x})/T}\) is generally adopted, where \(T>0\) is the “temperature.” It is impossible in the many cases, unfortunately, to sample walkers according to \(W\) directly. Insteadly, the MCMC method moves walkers slightly and generates a time series \(\{\vec{x}_t\}\) such that the distribution of the walkers obeys \(W\) . Let us call the transision probability from \(\vec{x}\) to \(\vec{x}'\) as \(p(\vec{x}' | \vec{x})\). When \(p\) is determined by the following condition (“the balance condition”)
the distribution of the generated time series \(\{\vec{x}_t\}\) will converges to \(W(\vec{x})\) [1]. Practically, the stronger condition (“the detailed balance condition”)
is usually imposed. The detailed balance condition returns to the balance condition by taking the summation of \(\vec{x}\).
ODAT-SE adopts the Metropolis-Hasting (MH) method for solving the detailed balance condition. The MH method splits the transition process into the suggestion process and the acceptance process.
Generate a candidate \(\vec{x}\) with the suggestion probability \(P(\vec{x} | \vec{x}_t)\).
As \(P\), use a simple distribution such as the normal distribution with centered at x.
Accept the candidate \(\vec{x}\) with the acceptance probability \(Q(\vec{x} | \vec{x}_t)\).
If accepted, let \(\vec{x}_{t+1}\) be vec{x}.
Otherwise, let \(\vec{x}_{t+1}\) be vec{x}_t.
The whole transision probability is the product of these two ones, \(p(\vec{x} | \vec{x_t}) = P(\vec{x} | \vec{x}_t) Q(\vec{x} | \vec{x}_t)\). The acceptance probability \(Q(\vec{x} | \vec{x}_t)\) is defined as
It is easy to verify that the detailed balance condition is satisfied by substituting it into the detailed balance condition equation.
When adopting the Boltzmann factor for the weight and a symmetry distribution \(P(\vec{x} | \vec{x}_t) = P(\vec{x}_t | \vec{x})\) for the suggestion probability, the acceptance probability \(Q\) will be the following simple form:
By saying \(\Delta f = f(\vec{x}) - f(\vec{x}_t)\) and using the fact \(Q=1\) for \(\Delta f \le 0\), the procedure of MCMC with the MH algorithm is the following:
Choose a candidate from near the current position and calculate \(f\) and \(\Delta f\).
If \(\Delta f \le 0\), that is, the walker is descending, accept it.
Otherwise, accept it with the probability \(Q=e^{-\Delta f/T}\).
Repeat 1-3.
The solution is given as the point giving the minimum value of \(f(\vec{x})\). The third process of the above procedure endures that walkers can climb over the hill with a height of \(\Delta f \sim T\), the MCMC sampling can escape from local minima.
Replica exchange Monte Carlo¶
The “temperature” \(T\) is one of the most important hyper parameters in the MCMC sampling. The MCMC sampling can climb over the hill with a height of \(T\) but cannot easily escape from the deeper valley than \(T\). It is why we should increase the temperature in order to avoid stuck to local minima. On the other hand, since walkers cannot see the smaller valleys than \(T\), the precision of the obtained result \(\min f(\vec{x})\) becomes about \(T\), and it is necessary to decrease the temperature in order to achieve more precise result. This dilemma leads us that we should tune the temperature carefully.
One of the ways to overcome this problem is to update temperature too. For example, simulated annealing decreases temperature as the iteration goes. Another algorithm, simulated tempering, treats temperature as another parameter to be sampled, not a fixed hyper parameter, and update temperature after some iterations according to the (detailed) balance condition. Simulated tempering studies the details of a valley by cooling and escapes from a valley by heating. Replica exchange Monte Carlo (RXMC), also known as parallel tempering, is a parallelized version of the simulated tempering. In this algorithm, several copies of a system with different temperature, called as replicas, will be simulated in parallel. Then, with some interval of steps, each replica exchanges temperature with another one according to the (detailed) balance condition. As the simulated tempering does, RXMC can observe the details of a valley and escape from it by cooling and heating. Moreover, because each temperature is assigned to just one replica, the temperature distribution will not be biased. Using more replicas narrows the temperature interval, and increases the acceptance ratio of the temperature exchange. This is why this algorithm suits for the massively parallel calculation.
It is recommended that users perform minsearch
optimization starting from the result of exchange
, because the RXMC result has uncertainty due to temperature.
footnote