plt_model_evidence.py
NAME
Calculate model evidence
SYNOPSIS
python3 plt_model_evidence.py [OPTION]... -n NDATA FILEs
DESCRIPTION
Extracts beta and partition function values from PAMC output files (FILE) and calculates the model evidence. Results are written to standard output and a plot of the results is saved to a file.
When multiple FILEs are specified, the average and variance of their model evidence values are calculated and a plot with error bars is generated.
Note
Python 3.6 or higher is required (due to the use of f-strings).
All calculations are performed on a logarithmic scale for numerical stability.
The x-axis (beta) in plots is always displayed on a logarithmic scale.
- FILE
PAMC output filename(s), e.g. fx.txt. Multiple files can be specified.
- -n NDATA, --ndata NDATA
Specifies the number of data points for each dataset as comma-separated integers. This is a required parameter. Examples: “100” (one dataset with 100 points), “50,100,75” (three datasets with 50, 100, and 75 points respectively)
- -w WEIGHT, --weight WEIGHT
Specifies the relative weights between datasets as comma-separated values. Weights are automatically normalized to sum to 1.0. The number of weight values must match the number of datasets given in NDATA.
- -V VOLUME, --Volume VOLUME
Specifies the normalization of the prior probability distribution (volume of the domain \(V_\Omega\)). Default is 1.0.
- -f RESULT, --result RESULT
Specifies the filename for outputting model evidence values. Default is model_evidence.txt.
- -o OUTPUT, --output OUTPUT
Specifies the filename for the model evidence plot. The output format is determined by the file extension, and any format supported by matplotlib can be specified. Default is model_evidence.png.
- -h, --help
Displays a help message and exits.
USAGE
Basic usage (one data file and one dataset)
$ python3 plt_model_evidence.py -n 100 fx.txt
Calculates the model evidence for a dataset with 100 data points, and outputs model_evidence.txt and model_evidence.png.
When there are multiple datasets
$ python3 plt_model_evidence.py -n 50,100,75 -w 0.2,0.5,0.3 fx.txt
Calculates the model evidence for three datasets (with 50, 100, and 75 data points respectively, and relative weights of 0.2, 0.5, and 0.3).
When using multiple data files
$ python3 plt_model_evidence.py -n 100 -o evidence_plot.pdf -f evidence_data.txt fx_1.txt fx_2.txt fx_3.txt
Calculates the model evidence from three data files and determines the mean and variance. Outputs the results to evidence_data.txt and generates a plot with error bars in evidence_plot.pdf.
NOTES
Calculation of Model Evidence
The R-factor is defined as follows:
where \(I_\mu(\theta_i)\) represents the measured data points in dataset \(\mu\), and \(I^{\text{(cal)}}_\mu(\theta_i;X)\) is the corresponding value calculated from theory for parameter \(X\). \(w_\mu\) is the relative weight of each dataset, normalized so that the weights sum to 1.
The model evidence \(P(D|\beta)\) is calculated using the following formula:
where \(Z(D;\beta)\) is the partition function:
and:
\(V_\Omega\): Normalization factor of the prior probability distribution
\(n_\mu\): Number of data points in each dataset
\(n\): Total number of data points (sum of all datasets)
\(\beta\): Inverse temperature
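For orientation, one set of expressions consistent with the symbols defined above is sketched below. It assumes independent Gaussian noise on each data point, an R-factor given by the weighted sum of squared residuals, and a uniform prior over a domain of volume \(V_\Omega\). These are assumptions, and the exact normalization used by the script may differ.

```latex
R(X) = \sum_\mu w_\mu \sum_i \left| I_\mu(\theta_i) - I^{\text{(cal)}}_\mu(\theta_i;X) \right|^2

P(D|\beta) = \frac{Z(D;\beta)}{V_\Omega} \left(\frac{\beta}{\pi}\right)^{n/2} \prod_\mu w_\mu^{n_\mu/2}

Z(D;\beta) = \int_\Omega \exp\left(-\beta R(X)\right) \, dX
```

Under these assumptions the prefactor follows from \(\prod_\mu (\beta w_\mu/\pi)^{n_\mu/2}\) together with \(n = \sum_\mu n_\mu\).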
Input File Format
The input file (PAMC output file) is expected to have the following format:
# Comment line (optional)
beta_value fx_mean fx_var nreplica logz_value acceptance
...
The script reads the following values from each line:
Column 1 (index 0): beta value (inverse temperature)
Column 5 (index 4): logz value (logarithm of the partition function)
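As a rough illustration of this column layout, the sketch below extracts beta and logz from such a file using only the standard library. The function name `read_beta_logz` is hypothetical and the sample numbers are invented for the demonstration. According to the Error Handling notes, the script itself relies on numpy.loadtxt for this step.

```python
import io

def read_beta_logz(stream):
    """Return (betas, logzs) read from columns 1 and 5 of an fx.txt-style stream."""
    betas, logzs = [], []
    for line in stream:
        line = line.strip()
        # Skip blank lines and comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        cols = line.split()
        betas.append(float(cols[0]))  # column 1 (index 0): beta
        logzs.append(float(cols[4]))  # column 5 (index 4): logz
    return betas, logzs

# Made-up sample content, for illustration only.
sample = io.StringIO(
    "# beta fx_mean fx_var nreplica logz acceptance\n"
    "0.1 2.5 0.30 100 -0.25 0.9\n"
    "1.0 1.2 0.10 100 -1.50 0.7\n"
)
betas, logzs = read_beta_logz(sample)
print(betas, logzs)
```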
Output File Format
The output file (model_evidence.txt) has the following format:
# max log_P(D;beta) = {maximum_value} at Tstep = {index}, beta = {corresponding_beta_value}
# $1: Tstep
# $2: beta
# $3: model_evidence
0 beta0 model_evidence0
1 beta1 model_evidence1
...
When processing multiple input files, a variance column is added:
# max log_P(D;beta) = {maximum_value} at Tstep = {index}, beta = {corresponding_beta_value}
# $1: Tstep
# $2: beta
# $3: average model_evidence
# $4: variance
0 beta0 avg_model_evidence0 variance0
1 beta1 avg_model_evidence1 variance1
...
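The layout above can be reproduced with a few lines of Python. This is only a sketch: the helper name `format_result` is hypothetical, the number formatting is a guess (the script's actual precision is not documented here), and the values are taken to be log model evidence, as the header line suggests.

```python
def format_result(betas, log_evidence, variance=None):
    """Build text in the layout described above (number formatting is a guess)."""
    # Index of the largest log model evidence.
    imax = max(range(len(log_evidence)), key=lambda i: log_evidence[i])
    lines = [
        f"# max log_P(D;beta) = {log_evidence[imax]} "
        f"at Tstep = {imax}, beta = {betas[imax]}",
        "# $1: Tstep",
        "# $2: beta",
    ]
    if variance is None:
        # Single input file: three columns.
        lines.append("# $3: model_evidence")
        for i, (b, v) in enumerate(zip(betas, log_evidence)):
            lines.append(f"{i} {b} {v}")
    else:
        # Multiple input files: an extra variance column.
        lines += ["# $3: average model_evidence", "# $4: variance"]
        for i, (b, v, s) in enumerate(zip(betas, log_evidence, variance)):
            lines.append(f"{i} {b} {v} {s}")
    return "\n".join(lines) + "\n"

# Invented values, for illustration only.
text = format_result([0.1, 1.0, 10.0], [-5.0, -2.0, -3.5])
print(text)
```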
Processing Mechanism
This script processes data in the following steps:
Reads beta values and logz values from input files
Obtains the number of data points and weights for each dataset
Calculates the logarithm of the model evidence
When multiple files are given, calculates the mean and variance of the model evidence across files
Outputs the results to a file
Plots the model evidence as a function of beta
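The two calculation steps in the list above (log model evidence, then mean and variance across files) might look roughly like the sketch below. Both function names are hypothetical. The evidence formula is an ASSUMED Gaussian-noise form, log P = log Z - log V + (n/2) log(beta/pi) + sum over datasets of (n_mu/2) log w_mu, and the variance is taken as the population variance of the log values. The script's actual choices may differ.

```python
import math

def log_model_evidence(beta, logz, ndata, weight, volume=1.0):
    """ASSUMED form: log Z - log V + (n/2) log(beta/pi) + sum_mu (n_mu/2) log w_mu."""
    total = sum(weight)
    w = [x / total for x in weight]  # normalize weights to sum to 1
    n = sum(ndata)                   # total number of data points
    return (
        logz
        - math.log(volume)
        + 0.5 * n * math.log(beta / math.pi)  # requires beta > 0
        + sum(0.5 * n_mu * math.log(w_mu) for n_mu, w_mu in zip(ndata, w))
    )

def mean_and_variance(rows):
    """Per-beta mean and population variance; rows holds one list of values per file."""
    nfiles = len(rows)
    means, variances = [], []
    for values in zip(*rows):  # iterate over beta steps
        m = sum(values) / nfiles
        means.append(m)
        variances.append(sum((v - m) ** 2 for v in values) / nfiles)
    return means, variances

# Invented numbers: two files, two beta steps each.
means, variances = mean_and_variance([[1.0, 2.0], [3.0, 6.0]])
print(means, variances)
```

Working with log values throughout avoids overflow and underflow when the partition function spans many orders of magnitude.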
Plot Characteristics
X-axis (beta) is always displayed on a logarithmic scale
For a single file, only points are displayed; for multiple files, error bars are included
Markers are displayed as red “x”
Grid lines are displayed to make it easier to identify data positions
Error Handling
If the input file does not exist: A file open error occurs
If the data format is invalid: An error occurs in numpy.loadtxt
If the lengths of NDATA and WEIGHT do not match: An AssertionError occurs
In particular, the NDATA list and the WEIGHT list must always contain the same number of entries.
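The length check and the weight normalization can be illustrated as follows. The helper name `parse_ndata_and_weight` is hypothetical, and the use of equal weights when no weight string is given is an assumption, since the default for -w is not stated here.

```python
def parse_ndata_and_weight(ndata_arg, weight_arg=None):
    """Parse comma-separated NDATA and WEIGHT strings and normalize the weights."""
    ndata = [int(s) for s in ndata_arg.split(",")]
    if weight_arg is None:
        # ASSUMPTION: equal weights when no weights are supplied.
        weight = [1.0] * len(ndata)
    else:
        weight = [float(s) for s in weight_arg.split(",")]
    # A mismatch in lengths raises AssertionError, as described above.
    assert len(ndata) == len(weight), "NDATA and WEIGHT must have the same length"
    total = sum(weight)
    weight = [w / total for w in weight]  # normalize to sum to 1.0
    return ndata, weight

ndata, weight = parse_ndata_and_weight("50,100,75", "2,5,3")
print(ndata, weight)
```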