Replica exchange Monte Carlo (exchange)

exchange explores the parameter space by using the replica exchange Monte Carlo (RXMC) method.

Preparation

mpi4py should be installed.

python3 -m pip install mpi4py

Input parameters

This algorithm has two input subsections, algorithm.param and algorithm.exchange.

[algorithm.param]

  • initial_list

    Format: List of float. Length should be equal to dimension.

    Description: Initial values of the parameters. If not defined, the parameters will be initialized randomly.

  • unit_list

    Format: List of float. Length should be equal to dimension.

    Description: Unit length of each parameter. The algorithm renders the parameters dimensionless and normalized by dividing them by unit_list. If not defined, each component will be 1.0.

  • min_list

    Format: List of float. Length should be equal to dimension.

    Description: Minimum value of each parameter.

  • max_list

    Format: List of float. Length should be equal to dimension.

    Description: Maximum value of each parameter.

[algorithm.exchange]

  • numsteps

    Format: Integer

    Description: The number of Monte Carlo steps.

  • numsteps_exchange

    Format: Integer

    Description: The number of Monte Carlo steps between replica exchange trials.

  • Tmin

    Format: Float (default: 0.1)

    Description: The minimum value of the “temperature”.

  • Tmax

    Format: Float (default: 10.0)

    Description: The maximum value of the “temperature”.

  • Tlogspace

    Format: Boolean (default: true)

    Description: Whether the “temperatures” are assigned to the replicas with equal spacing in logarithmic space (true) or in linear space (false).
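
The following is a minimal example of these two subsections, assuming a two-dimensional search space. All values are illustrative only:

[algorithm.param]
min_list = [0.0, 0.0]
max_list = [6.0, 6.0]
unit_list = [1.0, 1.0]

[algorithm.exchange]
numsteps = 1000
numsteps_exchange = 20
Tmin = 0.005
Tmax = 0.1
Tlogspace = true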

Output files

RANK/trial.txt

This file stores the suggested parameters and the corresponding value returned from the solver for each replica. The first column is the index of the MC step. The second column is the temperature of the replica. The third column is the value returned from the solver. The remaining columns are the coordinates of the parameters.

Example:

# step T fx z1 z2
0 0.004999999999999999 0.07830821484593968 3.682008067401509 3.9502750191292586
1 0.004999999999999999 0.0758494287185766 2.811346329442423 3.691101784194861
2 0.004999999999999999 0.08566823949124412 3.606664760390988 3.2093903670436497
3 0.004999999999999999 0.06273922648753057 4.330900869594549 4.311333132184154

RANK/result.txt

This file stores the sampled parameters and the corresponding value returned from the solver for each replica. This has the same format as trial.txt.

# step T fx z1 z2
0 0.004999999999999999 0.07830821484593968 3.682008067401509 3.9502750191292586
1 0.004999999999999999 0.07830821484593968 3.682008067401509 3.9502750191292586
2 0.004999999999999999 0.07830821484593968 3.682008067401509 3.9502750191292586
3 0.004999999999999999 0.06273922648753057 4.330900869594549 4.311333132184154
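
Since these files are whitespace-separated text with a commented header, they can be loaded with standard tools. The following is a minimal sketch using numpy (assumed to be installed); the rank directory 0 is an example:

import numpy as np

# Load RANK/result.txt; lines starting with "#" are skipped automatically.
data = np.loadtxt("0/result.txt")
step, T, fx = data[:, 0], data[:, 1], data[:, 2]
coords = data[:, 3:]

# The best sample of this replica is the one minimizing fx.
ibest = fx.argmin()
print("best fx =", fx[ibest], "at", coords[ibest])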

best_result.txt

The optimal value of the solver and the corresponding parameters among all the samples.

nprocs = 4
rank = 2
step = 65
fx = 0.008233957976993406
z1 = 4.221129370933539
z2 = 5.139591716517661

Algorithm

Markov chain Monte Carlo

The Markov chain Monte Carlo (MCMC) sampling explores the parameter space by moving walkers \(\vec{x}\) stochastically according to the weight function \(W(\vec{x})\). For the weight function, the Boltzmann factor \(W(\vec{x}) = e^{-f(\vec{x})/T}\) is generally adopted, where \(T>0\) is the “temperature.” Unfortunately, in many cases it is impossible to sample walkers according to \(W\) directly. Instead, the MCMC method moves walkers slightly and generates a time series \(\{\vec{x}_t\}\) such that the distribution of the walkers converges to \(W\). Let us denote the transition probability from \(\vec{x}\) to \(\vec{x}'\) by \(p(\vec{x}' | \vec{x})\). When \(p\) satisfies the following condition (“the balance condition”)

\[W(\vec{x}') = \sum_{\vec{x}} p(\vec{x}' | \vec{x}) W(\vec{x}),\]

the distribution of the generated time series \(\{\vec{x}_t\}\) converges to \(W(\vec{x})\) 1. In practice, the stronger condition (“the detailed balance condition”)

\[p(\vec{x} | \vec{x}') W(\vec{x}') = W(\vec{x})p(\vec{x}' | \vec{x})\]

is usually imposed. The detailed balance condition reduces to the balance condition by taking the summation over \(\vec{x}\).
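
Indeed, summing both sides of the detailed balance condition over \(\vec{x}\) and using the normalization \(\sum_{\vec{x}} p(\vec{x} | \vec{x}') = 1\) gives

\[W(\vec{x}') = \sum_{\vec{x}} p(\vec{x} | \vec{x}') W(\vec{x}') = \sum_{\vec{x}} W(\vec{x}) p(\vec{x}' | \vec{x}),\]

which is exactly the balance condition.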

2DMAT adopts the Metropolis-Hastings (MH) method for solving the detailed balance condition. The MH method splits the transition process into a suggestion process and an acceptance process.

  1. Generate a candidate \(\vec{x}\) with the suggestion probability \(P(\vec{x} | \vec{x}_t)\).

    • As \(P\), a simple distribution such as the normal distribution centered at \(\vec{x}_t\) is used.

  2. Accept the candidate \(\vec{x}\) with the acceptance probability \(Q(\vec{x} | \vec{x}_t)\).

    • If accepted, let \(\vec{x}_{t+1}\) be \(\vec{x}\).

    • Otherwise, let \(\vec{x}_{t+1}\) be \(\vec{x}_t\).

The whole transition probability is the product of these two, \(p(\vec{x} | \vec{x}_t) = P(\vec{x} | \vec{x}_t) Q(\vec{x} | \vec{x}_t)\). The acceptance probability \(Q(\vec{x} | \vec{x}_t)\) is defined as

\[Q(\vec{x} | \vec{x}_t) = \min\left[1, \frac{W(\vec{x})P(\vec{x}_t | \vec{x}) }{W(\vec{x}_t) P(\vec{x} | \vec{x}_t)} \right].\]

It is easy to verify that this \(Q\) satisfies the detailed balance condition by substituting it into the detailed balance equation.
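
Indeed, substituting \(p = PQ\) into the left-hand side of the detailed balance condition gives

\[p(\vec{x} | \vec{x}_t) W(\vec{x}_t) = P(\vec{x} | \vec{x}_t) Q(\vec{x} | \vec{x}_t) W(\vec{x}_t) = \min\left[ W(\vec{x}_t) P(\vec{x} | \vec{x}_t),\ W(\vec{x}) P(\vec{x}_t | \vec{x}) \right],\]

which is symmetric under the exchange \(\vec{x} \leftrightarrow \vec{x}_t\) and therefore equals \(p(\vec{x}_t | \vec{x}) W(\vec{x})\).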

When adopting the Boltzmann factor for the weight and a symmetric distribution \(P(\vec{x} | \vec{x}_t) = P(\vec{x}_t | \vec{x})\) for the suggestion probability, the acceptance probability \(Q\) takes the following simple form:

\[Q(\vec{x} | \vec{x}_t) = \min\left[1, \frac{W(\vec{x})}{W(\vec{x}_t)} \right] = \min\left[1, \exp\left(-\frac{f(\vec{x}) - f(\vec{x}_t)}{T}\right) \right].\]

Writing \(\Delta f = f(\vec{x}) - f(\vec{x}_t)\) and using the fact that \(Q=1\) for \(\Delta f \le 0\), the MCMC procedure with the MH algorithm is as follows:

  1. Choose a candidate \(\vec{x}\) near the current position and calculate \(f(\vec{x})\) and \(\Delta f\).

  2. If \(\Delta f \le 0\), that is, the walker is descending, accept it.

  3. Otherwise, accept it with the probability \(Q=e^{-\Delta f/T}\).

  4. Repeat 1-3.

The solution is given as the point giving the minimum value of \(f(\vec{x})\). The third step of the above procedure ensures that walkers can climb over hills with a height of \(\Delta f \sim T\), so that the MCMC sampling can escape from local minima.
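
As an illustration, the following is a minimal Python sketch of this procedure for a single walker. It is not the actual implementation in 2DMAT; the objective function f, the step size, and the temperature are illustrative assumptions.

import numpy as np

def mcmc_mh(f, x0, T, numsteps, step_size=0.1, seed=None):
    """Metropolis-Hastings sampling with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    samples = []
    for _ in range(numsteps):
        # 1. Suggest a candidate near the current position (symmetric P).
        x_new = x + step_size * rng.standard_normal(x.size)
        df = f(x_new) - fx
        # 2.-3. Accept if descending; otherwise with probability exp(-df/T).
        if df <= 0 or rng.random() < np.exp(-df / T):
            x, fx = x_new, fx + df
        samples.append(x.copy())
    return np.array(samples)

# Illustrative objective with its minimum at (1, 2).
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2
samples = mcmc_mh(f, x0=[0.0, 0.0], T=0.5, numsteps=1000)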

Replica exchange Monte Carlo

The “temperature” \(T\) is one of the most important hyperparameters in MCMC sampling. The sampling can climb over hills with a height of about \(T\) but cannot easily escape from valleys deeper than \(T\). This is why the temperature should be increased in order to avoid getting stuck in local minima. On the other hand, since walkers cannot resolve valleys shallower than \(T\), the precision of the obtained result \(\min f(\vec{x})\) is about \(T\), and the temperature should be decreased in order to achieve a more precise result. This dilemma means that the temperature must be tuned carefully.

One way to overcome this problem is to update the temperature as well. For example, simulated annealing decreases the temperature as the iteration proceeds. Another algorithm, simulated tempering, treats the temperature as another parameter to be sampled rather than as a fixed hyperparameter, and updates it after some iterations according to the (detailed) balance condition. Simulated tempering studies the details of a valley by cooling and escapes from it by heating.

Replica exchange Monte Carlo (RXMC), also known as parallel tempering, is a parallelized version of simulated tempering. In this algorithm, several copies of the system with different temperatures, called replicas, are simulated in parallel. Then, at some interval of steps, each replica exchanges its temperature with another one according to the (detailed) balance condition. As simulated tempering does, RXMC can observe the details of a valley and escape from it by cooling and heating. Moreover, because each temperature is assigned to exactly one replica, the temperature distribution is not biased. Using more replicas narrows the temperature intervals and increases the acceptance ratio of the temperature exchange, which is why this algorithm is well suited to massively parallel calculation.
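
As a sketch, the temperature-exchange step between two replicas \(i\) and \(j\) can be written as follows. The acceptance probability used here follows from imposing the detailed balance condition on the joint weight \(e^{-f(\vec{x}_i)/T_i - f(\vec{x}_j)/T_j}\); the function and variable names are illustrative, not 2DMAT's actual implementation.

import numpy as np

def try_exchange(T, fx, i, j, rng):
    """Attempt to swap the temperatures of replicas i and j.

    Detailed balance for the joint weight exp(-fx[i]/T[i] - fx[j]/T[j])
    gives the acceptance probability
    min[1, exp((1/T[i] - 1/T[j]) * (fx[i] - fx[j]))].
    """
    logp = (1.0 / T[i] - 1.0 / T[j]) * (fx[i] - fx[j])
    if logp >= 0 or rng.random() < np.exp(logp):
        # Swapping temperatures is equivalent to swapping configurations.
        T[i], T[j] = T[j], T[i]

In an actual run, such exchanges are attempted every numsteps_exchange Monte Carlo steps, typically between replicas with neighboring temperatures so that the acceptance ratio stays high.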

It is recommended that users perform minsearch optimization starting from the result of exchange, because the RXMC result has an uncertainty of the order of the temperature.

Footnote

1

To be precise, aperiodicity and ergodicity are necessary for convergence.