extract_combined.py¶
NAME¶
Extract specific items from log files in combined format
SYNOPSIS¶
python3 extract_combined.py [OPTION]... -t tag [FILE]...
DESCRIPTION¶
Extracts lines tagged with a specific tag from MCMC log files in combined format.
In the combined format, multiple data types are stored with a tag at the beginning of each line in the format <tag>
. This script extracts lines with a specific tag
and outputs them to a file. The output file will be named tag
and placed in the same directory as the input file.
Note
Python 3.5 or higher is required (due to the use of type hints).
The tag format must be
<tag>
followed by a space.
The following command line options are available. If FILE is specified, that file will be processed. If no file is explicitly specified, DATA_DIR/*/combined.txt will be processed.
- FILE
Specifies the MCMC log file(s) (combined.txt). Multiple files can be specified.
- -t TAG, --tag TAG
Specifies the tag to extract. This is a required parameter. The tag is a string.
- -d DATA_DIR, --data_dir DATA_DIR
Specifies the directory to get data files from (when
FILE
is not specified).- --progress
Displays a progress bar during execution. Requires the tqdm library. If tqdm is not installed, the name of the file being processed will be displayed instead.
- -h, --help
Displays help message and exits the program.
USAGE¶
Basic usage examples are shown below.
# Extract lines with the "energy" tag from a specific file
python3 extract_combined.py -t energy path/to/combined.txt
# Extract lines with the "acc" tag from all combined.txt files in a specific directory
python3 extract_combined.py -t acc -d ./mcmc_results/
# Process multiple files with a progress bar
python3 extract_combined.py -t energy --progress file1.txt file2.txt file3.txt
NOTES¶
Usage Notes¶
Exact Tag Matching: Tags must match exactly. If you specify the “energy” tag, only lines starting with “
<energy>
” will be extracted. Lines with “<Energy>
” or “<energy_value>
” will not be matched. A space must follow the tag.Output File Overwriting: If a file with the same name already exists, it will be overwritten without warning. It is recommended to back up important files beforehand.
Processing Large Files: The script processes files line by line, so memory consumption is kept low even for very large files. When processing large files, you can use the
--progress
option to monitor progress.Handling Filenames and Paths: The output filename is simply
tag
. If the tag contains invalid characters (characters that cannot be used in filenames), an error may occur when creating the file.Empty File Processing: If the input file is empty or if no lines with the specified tag are found, an empty file will be generated.
Combined Format File Structure¶
Files in combined format have the following structure:
<tag1> value1 value2 ...
<tag2> value3 value4 ...
<tag1> value5 value6 ...
...
Each line begins with a tag in the format <tag>
, and extract_combined.py
searches for and extracts lines with matching tags.
The tag portion is removed from the output file.
General Workflow¶
This script is used when analyzing MCMC simulation results:
When MCMC simulation is run with the export_combined_files option enabled,
combined.txt
files are generated.Use
extract_combined.py
to extract necessary files.Analyze or plot the extracted data files with other tools.
When processing multiple simulation results at once, combining the -d
option with the --progress
option is efficient.
- Examples of commonly used tags in MCMC simulations:
trial.txt
: Monte Carlo trial step logsresult.txt
: Monte Carlo step logsweight.txt
: Weight values in PAMC calculationstime.txt
: Calculation times
Error Handling and Output¶
If the specified tag is not found: An empty file is generated
If the input file cannot be read: An error message is displayed on standard error output
If the output file cannot be written: A permission error is displayed on standard error output
The script displays progress information on standard output. If the --progress
option is specified and the tqdm library is installed, a progress bar will be displayed. Otherwise, the name of the file being processed will be displayed.