separateT.py¶
NAME¶
Separate MCMC log files by temperature points
SYNOPSIS¶
python3 separateT.py [OPTION]... [FILE]...
DESCRIPTION¶
Separates MCMC log files (result.txt, trial.txt) into individual files for each temperature point.
Files are created in the same directory as the input, and filenames follow the format of the original filename with _T{index}
appended. {index}
is the index of different temperature points, assigned from 0 in the order they appear in the log file.
Note
Python 3.6 or higher is required (due to the use of type hints and f-strings).
MCMC log files are expected to have temperature values in the third column (index 2).
Comment lines (lines starting with #) in the input file are preserved in all output files.
The tqdm library is required for progress bar display. If not installed, regular messages will be displayed.
The following command line options are available.
If FILE is specified, that file will be processed. If no file is explicitly specified, DATA_DIR/*/FILE_TYPE
will be processed.
- FILE
Specifies the MCMC log file. Multiple files can be specified.
- -d DATA_DIR, --data_dir DATA_DIR
Specifies the directory from which to retrieve data files (when
FILE
is not specified).- -t FILE_TYPE, --file_type FILE_TYPE
Specifies the target filename when running with a directory specified. Default is result.txt.
- --progress
Displays a progress bar during execution. The tqdm library is required for display. If tqdm is not installed, the name of the file being processed will be displayed as a message instead.
- -h, --help
Displays help message and exits the program.
USAGE¶
Run with a specified filename
python3 separateT.py output/0/result.txt
output/0/result_T0.txt, output/0/result_T1.txt, … are created.
Split files under a specified directory
python3 separateT.py -d output
output/0/result.txt, output/1/result.txt, … will be processed.
Split files other than result.txt under a specified directory
python3 separateT.py -t trial.txt -d output
Splits output/0/trial.txt, output/1/trial.txt, …
Process multiple files at once and display progress
python3 separateT.py --progress file1.txt file2.txt file3.txt
Each file is split by temperature points, and progress is displayed with a progress bar.
NOTES¶
File Format¶
The input file (MCMC log file) is expected to have the following format:
# Comment line (optional)
step replica_id T fx x1 ... xN ...
step replica_id T fx x1 ... xN ...
...
Each line contains space-separated data, with the third column (index 2) being the temperature value T. Consecutive lines with the same temperature value are grouped into a single file.
Processing Mechanism¶
This script processes data in the following steps:
Reads the input file line by line
Records comment lines (lines starting with #) as headers
Obtains the temperature value from the third column (index 2) of each data line
Whenever the temperature value changes, writes the accumulated data to a separate file
Data for each temperature value is saved to a file with the original filename plus “_T{index}”
Output File Format¶
The output files have the following format:
Filename: Original filename with “_T{index}” added (e.g., result.txt → result_T0.txt, result_T1.txt, …)
File content: Headers (comment lines) from the input file, followed by data lines with the same temperature value
Performance¶
Files are processed line by line, so memory usage is kept low even for very large files
Data for each temperature point is buffered in memory, so memory usage may increase if there is a large amount of data for a single temperature point
Processing time increases with the size of the input file, but is relatively fast due to line-by-line processing
When processing multiple files, you can use the --progress option to monitor progress
Error Handling¶
If the input file is not found: A file open error occurs and a message is displayed
If the output file cannot be written: A permission error or similar occurs and a message is displayed
If the data line has insufficient columns: An index error may occur (if the third column does not exist)