plt_1D_histogram.py
NAME
Create 1D Marginalized Histograms
SYNOPSIS
python3 plt_1D_histogram.py [OPTION]... [FILE]...
DESCRIPTION
Creates 1D marginalized histograms from data files specified in FILE.
The data files should be in text format, containing numerical data in multiple columns. In the standard format, each line contains space-separated values for beta, fx, x1, …, xN, weight. Beta is the inverse temperature, fx is the function value at that point, x1, …, xN are the parameter values (N is the number of parameters), and weight is the weight value. Field names can be specified with the field_list option, or taken from the parameter labels (label_list) in the input file used for the PAMC calculation.
If FILE is not specified, the script reads files named result_*_summarized.txt from the directory specified by the data_dir option.
The axes for which histograms are created can be specified with the columns option, as a comma-separated list of field names. If not specified, all axes x1, …, xN are used. For example, --columns x1,x3 creates histograms marginalized along the x1 and x3 axes.
The histogram range can be specified with the range option. In that case, the same range will be used for all displayed axes. To specify ranges for each axis individually, provide a list of [xmin, xmax] pairs in the config file, or use the min_list and max_list from the input parameter file.
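The two ways of giving a range (one shared pair, or one pair per axis) can be illustrated with a small sketch. The function name resolve_ranges and its arguments are hypothetical and are not taken from the script. The sketch also assumes that a list of pairs is given in the same order as the axes.

```python
def resolve_ranges(range_opt, axes):
    """Map each axis name to an (xmin, xmax) pair, or None for an automatic range.

    range_opt may be None, a single [xmin, xmax] pair shared by all axes,
    or a list of [xmin, xmax] pairs given in the same order as axes.
    """
    if range_opt is None:
        return {name: None for name in axes}
    # A single pair: the first element is a number, not a nested list.
    if not isinstance(range_opt[0], (list, tuple)):
        xmin, xmax = range_opt
        return {name: (float(xmin), float(xmax)) for name in axes}
    if len(range_opt) != len(axes):
        raise ValueError("expected one [xmin, xmax] pair per axis")
    return {name: (float(lo), float(hi)) for name, (lo, hi) in zip(axes, range_opt)}


axes = ["x1", "x2", "x3"]
shared = resolve_ranges([3.0, 6.0], axes)  # same range for every axis
per_axis = resolve_ranges([[3.0, 6.0], [-3.0, 3.0], [0.0, 3.0]], axes)
print(shared["x2"], per_axis["x2"])
```

An axis mapped to None would be left to automatic range selection.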
Note
Python 3.6 or higher is required (due to the use of f-strings).
The tqdm library is required for progress bar display. If not installed, regular messages will be displayed.
Be mindful of memory usage when processing large datasets.
The following command line options are available. These options can also be provided collectively in a config file. The config file uses TOML format, with options specified in the format option_name = value.
- -b BINS, --bins BINS
Specifies the number of bins. Default value is 60.
- -c COLUMNS, --columns COLUMNS
Specifies the field names for which to create histograms. Multiple field names can be specified as a comma-separated list. If omitted, all axes will be used.
- -d DATA_DIR, --data_dir DATA_DIR
Specifies the directory from which to retrieve data files (when FILE is not specified). If not specified, the current directory is used.
- -f FORMAT, --format FORMAT
Specifies the format of the output histogram files. Any format supported by matplotlib can be specified. Multiple formats can be specified as a comma-separated list. Default value is png.
- -o OUTPUT_DIR, --output_dir OUTPUT_DIR
Specifies the directory to which histogram files are output. If not specified, files are written to the current directory. If the directory does not exist, it is automatically created.
- -r RANGE, --range RANGE
Specifies the histogram range in the format xmin,xmax. If specified via the range command line option, it applies to all axes. To vary by axis, specify in the parameter file or config file. If not specified in any of these, the range is automatically set for each axis.
- -w WEIGHT_COLUMN, --weight_column WEIGHT_COLUMN
Specifies the column number (0-based) of the weight value. Default value is -1 (last column).
- --config CONFIG
Specifies a config file. The config file is in TOML format and specifies options equivalent to command line options. Option priority is: parameter file < config file < command line options.
- --params PARAMS
Specifies the input parameter file used when running PAMC. Range information (min_list, max_list) and field_list information (label_list) are obtained from the parameter file.
- --field_list FIELD_LIST
Specifies field names. If not specified, the standard format is assumed: beta, fx, x1, …, xN, weight (where N is the parameter dimension). If obtained from a parameter file, the values from label_list are used for x1, …, xN. These field names are the ones referred to by the columns option.
- --progress
Displays a progress bar during execution. The tqdm library is required for display. If tqdm is not installed, messages about the processing status of each file are displayed instead.
- --xlabel XLABEL
Specifies the label string for the x-axis.
- -h, --help
Displays a help message and exits the program.
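The option priority described for --config (parameter file < config file < command line options) amounts to a layered merge in which later sources override earlier ones. The following is a minimal sketch of that idea, not the script's actual code. The function name merge_options is hypothetical, and it assumes that an option left unspecified is represented as None.

```python
def merge_options(params_opts, config_opts, cli_opts):
    """Merge option dicts so that later sources override earlier ones.

    Entries whose value is None are treated as "not specified" and skipped.
    """
    merged = {}
    for source in (params_opts, config_opts, cli_opts):
        for key, value in source.items():
            if value is not None:
                merged[key] = value
    return merged


# Hypothetical values standing in for the three option sources.
params_opts = {"range": [[0.0, 1.0], [0.0, 2.0]], "field_list": None}
config_opts = {"bins": 120, "range": [[3.0, 6.0], [-3.0, 3.0]]}
cli_opts = {"bins": None, "output_dir": "1dhist"}

options = merge_options(params_opts, config_opts, cli_opts)
print(options)
```

Here the config file's range replaces the one from the parameter file, and bins keeps the config file value because it was not given on the command line.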
USAGE
Run with a specified input data file file.txt. Output to the 1dhist directory.
$ python3 plt_1D_histogram.py -o 1dhist file.txt
1dhist/1Dhistogram_file.png is output.
Read the input data files result_T0_summarized.txt to result_T10_summarized.txt from the data directory, and output to the 1dhist directory.
$ python3 plt_1D_histogram.py -d data -o 1dhist
1Dhistogram_result_T0_beta_NNNN.png to 1Dhistogram_result_T10_beta_MMMM.png are output to the 1dhist directory. In the filename, summarized is replaced with beta_{beta}.
Create histograms for the x1 and x3 fields from the input data file.txt, and output in png and pdf formats.
$ python3 plt_1D_histogram.py -c x1,x3 -o 1dhist -f png,pdf file.txt
1dhist/1Dhistogram_file.png and 1dhist/1Dhistogram_file.pdf are output.
Set the value range to 3.0-6.0. The same range is set for all axes.
$ python3 plt_1D_histogram.py -r 3.0,6.0 -o 1dhist file.txt
Use a config file to describe the options. Prepare conf.toml as follows:
field_list = ["beta", "fx", "z1", "z2", "z3", "weight"]
columns = ["z1", "z2"]
bins = 120
range = [[3.0, 6.0], [-3.0, 3.0], [0.0, 3.0]]
data_dir = "./summarized"
output_dir = "1dhist"
The axis labels are z1, z2, and z3, and their value ranges are 3.0 to 6.0, -3.0 to 3.0, and 0.0 to 3.0, respectively. Histograms are drawn for z1 and z2.
Run with the config file specified.
$ python3 plt_1D_histogram.py --config conf.toml
Histograms are created for each result_T*_summarized.txt in the summarized/ directory and output to the 1dhist directory as 1Dhistogram_result_T*_beta_{beta}.png, with summarized replaced by beta_{beta} in each filename.
NOTES
Data File Format
Data files must be in the following format:
# Comment line (optional)
beta_value fx_value x1_value x2_value ... xN_value weight_value
beta_value fx_value x1_value x2_value ... xN_value weight_value
...
Each line consists of numerical data separated by whitespace. In the standard format, each column has the following meaning:
Column 1: beta value (inverse temperature)
Column 2: fx value (function value)
Columns 3 to (N+2): Parameter values x1, x2, …, xN
Last column: weight
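A file in this layout can be read with NumPy, as in the sketch below. It assumes the standard column order and builds default field names from the column count. The sample data and the variable names are made up for illustration, and the script's own loading code may differ.

```python
import io

import numpy as np

# Made-up sample in the standard layout: beta fx x1 x2 weight
sample = io.StringIO(
    "# beta fx x1 x2 weight\n"
    "1.0 0.50 3.1 -0.2 0.25\n"
    "1.0 0.20 3.4  0.1 0.75\n"
)

# np.loadtxt skips lines starting with '#'; ndmin=2 keeps a 2D array
# even when the file holds a single data row.
data = np.loadtxt(sample, ndmin=2)

# Every column except beta, fx and weight is a parameter column.
n_params = data.shape[1] - 3
field_list = ["beta", "fx"] + [f"x{i + 1}" for i in range(n_params)] + ["weight"]

# Look up columns by field name.
columns = {name: data[:, i] for i, name in enumerate(field_list)}
print(field_list)
print(columns["x1"], columns["weight"])
```

np.loadtxt raises ValueError for non-numeric entries or rows with differing column counts, which corresponds to the invalid-format case.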
Histogram Creation Mechanism
This script creates histograms using the following procedure:
1. Load data from input files
2. Normalize weights (so that they sum to 1)
3. Create a 1D histogram for each specified variable (column)
4. Save each histogram in the specified format
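The weight normalization and histogram steps of the procedure above can be sketched with numpy.histogram. This is an illustration under assumed inputs, not the script's implementation. The array values, the bin count, and the range are made up, and plotting and saving are omitted.

```python
import numpy as np

# Made-up values for one parameter column and its weights.
x = np.array([3.1, 3.4, 3.6, 5.2])
weight = np.array([1.0, 2.0, 1.0, 4.0])

# Normalize the weights so that they sum to 1.
w = weight / weight.sum()

# Weighted 1D histogram over a fixed range. Each bin holds the total
# normalized weight of the samples that fall into it.
hist, edges = np.histogram(x, bins=3, range=(3.0, 6.0), weights=w)
print(hist)
print(edges)
```

Because the weights are normalized first and all samples lie inside the range, the bin heights sum to 1. Samples outside the chosen range would be dropped, so the sum would then be smaller.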
Output file naming convention:
Normal files:
1Dhistogram_{input_filename}.{format}
Files containing _summarized.txt (output from summarize_each_T.py):
1Dhistogram_{input_filename_with_summarized_replaced_by_beta_{beta_value}}.{format}
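The naming convention can be written as a small helper. The function name output_filename is hypothetical. How the script formats the beta value in the filename is not stated here, so the sketch simply inserts the value as given.

```python
from pathlib import Path


def output_filename(input_path, fmt, beta=None):
    """Build an output name following the convention described above.

    Assumption: the beta value is inserted as-is; the script's actual
    number formatting may differ.
    """
    stem = Path(input_path).stem  # file name without directory and extension
    if stem.endswith("_summarized") and beta is not None:
        stem = stem.replace("summarized", f"beta_{beta}")
    return f"1Dhistogram_{stem}.{fmt}"


print(output_filename("file.txt", "png"))
print(output_filename("data/result_T0_summarized.txt", "pdf", beta=0.5))
```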
Performance
When processing large data files, the required memory is roughly proportional to the file size.
Processing speed is relatively fast due to the use of NumPy.
When processing many files, progress can be monitored with the --progress option.
Error Handling and Limitations
If a data file is not found: An error message is displayed
If the data format is invalid (non-numeric, mismatched column count): That file is skipped and an error message is displayed
If a field name does not exist: A key error occurs
If the output directory cannot be written to: A permission error is displayed
If an error occurs during processing, that file is skipped and processing continues with the next file. A summary of successes and failures is displayed at the end.
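The skip-and-continue behaviour can be sketched as a loop that catches per-file errors and reports a summary at the end. The names process_all and fake_process are hypothetical. The set of exceptions caught is an assumption chosen to match the cases listed above (file not found or not writable, invalid data, missing field name).

```python
def process_all(paths, process_one):
    """Run process_one on each path, skipping files that raise an error."""
    succeeded, failed = [], []
    for path in paths:
        try:
            process_one(path)
        except (OSError, ValueError, KeyError) as exc:
            # Report the problem and continue with the next file.
            print(f"ERROR: {path}: {exc!r}")
            failed.append(path)
        else:
            succeeded.append(path)
    print(f"{len(succeeded)} succeeded, {len(failed)} failed")
    return succeeded, failed


def fake_process(path):
    # Stand-in for the real per-file work; fails for one name on purpose.
    if path == "missing.txt":
        raise FileNotFoundError(path)


ok, bad = process_all(["a.txt", "missing.txt", "b.txt"], fake_process)
```

FileNotFoundError and PermissionError are both subclasses of OSError, so a single except clause covers the missing-file and unwritable-directory cases.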