plt_model_evidence.py¶

NAME¶

Calculate model evidence

SYNOPSIS¶

python3 plt_model_evidence.py [OPTION]... -n NDATA FILEs

DESCRIPTION¶

Extracts beta and partition function values from PAMC output files (FILE) and calculates the model evidence. Results are written to standard output and a plot of the results is saved to a file.

When multiple FILEs are specified, the average and variance of their model evidence values are calculated and a plot with error bars is generated.

Note

Python 3.6 or higher is required (due to the use of f-strings).
All calculations are performed on a logarithmic scale for numerical stability.
The x-axis (beta) in plots is always displayed on a logarithmic scale.

FILE: PAMC output filename(s) (fx.txt). Multiple files can be specified.
-n NDATA, --ndata NDATA: Specifies the number of data points for each dataset as comma-separated integers. This is a required parameter. Examples: “100” (one dataset with 100 points), “50,100,75” (three datasets with 50, 100, and 75 points respectively)
-w WEIGHT, --weight WEIGHT: Specifies the relative weights between datasets as comma-separated values. Weights are automatically normalized to sum to 1.0. The number of weight values must match the number of data points.
-V VOLUME, --Volume VOLUME: Specifies the normalization of the prior probability distribution (volume of the domain $V_\Omega$). Default is 1.0.
-f RESULT, --result RESULT: Specifies the filename for outputting model evidence values. Default is model_evidence.txt.
-o OUTPUT, --output OUTPUT: Specifies the filename for the model evidence plot. The output format is determined by the file extension, and any format supported by matplotlib can be specified. Default is model_evidence.png.
-h, --help: Displays help message and exits the program.

USAGE¶

Basic usage (one data file and one dataset)
```
$ python3 plt_model_evidence.py -n 100 fx.txt
```
Calculates the model evidence for a dataset with 100 data points, and outputs model_evidence.txt and model_evidence.png.
When there are multiple datasets
```
$ python3 plt_model_evidence.py -n 50,100,75 -w 0.2,0.5,0.3 fx.txt
```
Calculates the model evidence for three spots (with 50, 100, and 75 data points respectively, and relative weights of 0.2, 0.5, and 0.3).
When using multiple data files
```
$ python3 plt_model_evidence.py -n 100 -o evidence_plot.pdf -f evidence_data.txt fx_1.txt fx_2.txt fx_3.txt
```
Calculates the model evidence from three data files and determines the mean and variance. Outputs the results to evidence_data.txt and generates a plot with error bars in evidence_plot.pdf.

NOTES¶

Calculation of Model Evidence¶

The R-factor is defined as follows:

\[R(X;D)^2 = \sum_\mu w_\mu \sum_i \left( I_\mu(\theta_i) - I^{\text{(cal)}}_\mu(\theta_i;X) \right)^2\]

where $I_\mu(\theta_i)$ represents the measured data points in dataset $\mu$, and $I^{\text{(cal)}}_\mu(\theta_i;X)$ is the theoretical calculated value under parameter $X$. $w_\mu$ is the relative weight of each dataset, normalized so that their sum equals 1.

The model evidence $P(D|\beta)$ is calculated using the following formula:

\[\log P(D|\beta) = \log Z(D;\beta) - \log V_\Omega + \frac{n}{2} \log \beta + \sum_{\mu} \frac{n_{\mu}}{2} \log w_{\mu} - \frac{n}{2} \log \pi\]

where $Z(D;\beta)$ is the partition function:

\[Z(D;\beta) = \int_\Omega \exp\left(-\beta\,R(X;D)^2\right) dX\]

and:

$V_\Omega$: Normalization factor of the prior probability distribution

$n_\mu$: Number of data points in each dataset

$n$: Total number of data points (sum of all datasets)

$\beta$: Inverse temperature

Input File Format¶

The input file (PAMC output file) is expected to have the following format:

# Comment line (optional)
beta_value  fx_mean  fx_var  nreplica  logz_value  acceptance
...

The script reads the following values from each line:

Column 1 (index 0): beta value (inverse temperature)
Column 5 (index 4): logz value (logarithm of the partition function)

Output File Format¶

The output file (model_evidence.txt) has the following format:

# max log_P(D;beta) = {maximum_value} at Tstep = {index}, beta = {corresponding_beta_value}
# $1: Tstep
# $2: beta
# $3: model_evidence
0  beta0  model_evidence0
1  beta1  model_evidence1
...

When processing multiple input files, a variance column is added:

# max log_P(D;beta) = {maximum_value} at Tstep = {index}, beta = {corresponding_beta_value}
# $1: Tstep
# $2: beta
# $3: average model_evidence
# $4: variance
0  beta0  avg_model_evidence0  variance0
1  beta1  avg_model_evidence1  variance1
...

Processing Mechanism¶

This script processes data in the following steps:

Reads beta values and logz values from input files
Obtains the number of data points and weights for each dataset
Calculates the logarithm of the model evidence
Calculates the mean and variance for multiple files
Outputs the results to a file
Plots the model evidence as a function of beta

Plot Characteristics¶

X-axis (beta) is always displayed on a logarithmic scale
For a single file, only points are displayed; for multiple files, error bars are included
Markers are displayed as red “x”
Grid lines are displayed to make it easier to identify data positions

Error Handling¶

If the input file does not exist: A file open error occurs
If the data format is invalid: An error occurs in numpy.loadtxt
If the lengths of NDATA and WEIGHT do not match: An AssertionError occurs

In particular, the number of data points list and their weights must always match.