2. Tutorial
This tutorial describes how to use the database query tool getcif to search for and obtain crystallographic information from materials science databases.
The procedure consists of getting an API key, preparing an input parameter file, and running the getcif program.
The steps are explained using an example that searches for and obtains information on ABO3-type materials; the example is provided in the docs/tutorial/getcif directory.
2.1. Getting an API key
To access the Materials Project database via its API, users need to register with the Materials Project and obtain an API key. Visit the Materials Project website https://next-gen.materialsproject.org, create an account, and log in. An API key is generated automatically on registration and is shown in the user dashboard. Keep the API key safe and do not share it with others.
The API key is made available to getcif in one of the following ways:

- Store it in the pymatgen configuration file by typing:

  $ pmg config --add PMG_MAPI_KEY <API_KEY>

  or by editing the file ~/.config/.pmgrc to include the following:

  PMG_MAPI_KEY: <API_KEY>

- Set it in an environment variable:

  $ MP_API_KEY="<API_KEY>"
  $ export MP_API_KEY

- Store it in a file located in the directory where getcif is run. The default file name is materials_project.key. Otherwise, the file name is given in the input parameter file. The file name must end with .key.

  database:
    api_key_file: materials_project.key

Comment: it is recommended to exclude files with the .key suffix from the version control system (e.g. for Git, add *.key to the .gitignore file).
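The environment-variable and key-file options can be pictured with a small Python sketch. It is not getcif's actual implementation: the function name find_api_key and the lookup order (environment variable first, then key file) are assumptions made for illustration. Only the names MP_API_KEY and materials_project.key and the .key suffix rule come from the text above.

```python
import os
from pathlib import Path


def find_api_key(key_file="materials_project.key", directory="."):
    """Return an API key from MP_API_KEY or from a *.key file, else None.

    Hypothetical helper: the lookup order is an assumption, not the
    documented behaviour of getcif.
    """
    if not key_file.endswith(".key"):
        raise ValueError("the key file name must end with .key")

    # 1. Environment variable, as set with `export MP_API_KEY`.
    key = os.environ.get("MP_API_KEY", "").strip()
    if key:
        return key

    # 2. Key file located in the directory where the program is run.
    path = Path(directory) / key_file
    if path.is_file():
        return path.read_text().strip() or None

    return None
```

Returning None instead of raising an error leaves the caller free to fall back to another source, such as the pymatgen configuration file.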
2.2. Preparing an input parameter file
An input parameter file describes search conditions and data items to retrieve from databases.
An example is presented below. It is a text file in YAML format that contains the information for accessing the database, the search conditions, and the types of data to obtain. See the file format section for details of the specification.
In YAML format, parameters are given in dictionary form as keyword: value, where value is a scalar such as a number or a string, a set of values enclosed in [ ] or listed in itemized form, or a nested dictionary.
For the search conditions and data fields, a list may also be given as space-separated items without brackets, as a special notation.
database:
  target: materials project
option:
  output_dir: result
  # dry_run: false
properties:
  band_gap: < 1.0
  is_stable: true
  is_metal: false
  formula: "**O3"
  spacegroup_symbol: Pm-3m
fields: |
  structure
  band_gap
  symmetry
The input parameter file consists of the database, option, properties, and fields sections.
The database section describes the settings for connecting to databases.
In the example, target is set to Materials Project, although this value is currently ignored. api_key can be used to set the API key. The key may also be set in the pymatgen configuration file or in an environment variable; the tutorial assumes the environment variable.
The option section describes optional settings for command execution.
output_dir specifies the directory in which the obtained data are placed. The default is the current directory. If dry_run is set to true, getcif does not connect to the database; instead, it just prints the search conditions and exits. dry_run may also be specified as a command-line option.
The properties section describes the search conditions. They are given in the form keyword: value and are combined as AND conditions.
In the example, the conditions select materials that have a band gap less than or equal to 1.0, are stable insulators, have a composition formula of ABO3 (where A and B are arbitrary species), and belong to the space group Pm-3m (perovskite).
band_gap accepts either a pair of values for the lower and upper limits or an expression such as < 1.0.
The available terms for specifying search conditions are listed in the Appendix.
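How an expression such as < 1.0 might be turned into a pair of limits can be sketched in Python as follows. The dry-run output in section 2.3 shows that the example condition becomes (None, 1.0). Everything else here is an assumption for illustration: the function name parse_range, the set of accepted operators, and the "lower upper" string form are hypothetical, and the sketch does not distinguish < from <=.

```python
import re


def parse_range(value):
    """Convert a condition into a (lower, upper) tuple; None means unbounded.

    Hypothetical helper; the accepted syntax is an assumption and may
    differ from what getcif actually supports.
    """
    # A pair of numbers (list or tuple) is taken as lower and upper limits.
    if isinstance(value, (list, tuple)):
        lower, upper = value
        return (lower, upper)

    text = str(value).strip()
    match = re.fullmatch(r"(<=|>=|<|>)\s*(\S+)", text)
    if match:
        op, number = match.group(1), float(match.group(2))
        # "<" and "<=" give an upper limit; ">" and ">=" give a lower limit.
        return (None, number) if op.startswith("<") else (number, None)

    # Otherwise expect two whitespace-separated numbers: "lower upper".
    lower, upper = (float(x) for x in text.split())
    return (lower, upper)


print(parse_range("< 1.0"))  # (None, 1.0)
```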
The fields section describes the data items to obtain. It is given as a YAML list or as a space-separated list.
structure specifies the crystal structure data, which are stored in CIF format.
band_gap specifies the value of the band gap, and symmetry specifies the symmetry information. material_id, the index of the material data in the Materials Project, and formula_pretty, the composition formula, are obtained automatically.
The available items are listed in the Appendix and can also be found in the help message of the getcif command.
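The two accepted notations for fields, together with the automatically added items, can be illustrated with a short Python sketch. The function name normalize_fields is hypothetical and the code is not taken from getcif; it only mirrors the behaviour described above, where a YAML list and a space-separated string yield the same list of fields.

```python
def normalize_fields(value):
    """Return the requested fields as a list, with the automatic ones appended.

    Hypothetical helper written for illustration only.
    """
    # A YAML list arrives as a Python list; a block scalar arrives as a string.
    if isinstance(value, str):
        fields = value.split()
    else:
        fields = [str(item) for item in value]

    # material_id and formula_pretty are always obtained.
    for name in ("material_id", "formula_pretty"):
        if name not in fields:
            fields.append(name)
    return fields


# Both notations give the same result.
print(normalize_fields("structure\nband_gap\nsymmetry\n"))
print(normalize_fields(["structure", "band_gap", "symmetry"]))
```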
2.3. Obtaining data
The getcif program is executed with the input parameter file (input.yaml) as follows.
$ getcif input.yaml
Then it connects to the Materials Project database, and obtains the data that match the specified conditions. The summary including the material IDs, the composition formulas, and other data items is printed to the standard output as follows.
material_id formula band_gap symmetry formula_pretty
mp-861502 AcFeO3 0.9887999999999995 crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2' AcFeO3
mp-977455 PaAgO3 0.915 crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2' PaAgO3
mp-11775 RbUO3 0.45420000000000016 crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2' RbUO3
mp-3163 BaSnO3 0.37239999999999984 crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2' BaSnO3
mp-4126 KUO3 0.44540000000000024 crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2' KUO3
mp-865322 UTlO3 0.27360000000000007 crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2' UTlO3
mp-753781 EuHfO3 0.4795999999999996 crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2' EuHfO3
The obtained data are placed in the directory specified by output_dir, in a subdirectory named after the material ID for each material.
In this example, seven subdirectories with names from mp-3163 to mp-977455 are created within the result directory, and each subdirectory contains the following files:
- band_gap
  the value of the band gap
- formula
  the composition formula (corresponding to the field formula_pretty)
- structure.cif
  the crystal structure data in CIF format
- symmetry
  the information about the symmetry
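The directory layout described above can be read back with a few lines of Python. This is a minimal sketch under the assumption that every file holds plain text; the function name load_results is hypothetical and is not part of getcif.

```python
from pathlib import Path


def load_results(output_dir="result"):
    """Map each material ID to a dict of {file name: file content}.

    Hypothetical helper; assumes all files in the subdirectories are text.
    """
    results = {}
    for subdir in sorted(Path(output_dir).iterdir()):
        if not subdir.is_dir():
            continue  # skip anything that is not a material subdirectory
        # One entry per file, e.g. band_gap, formula, structure.cif, symmetry.
        results[subdir.name] = {
            path.name: path.read_text().strip()
            for path in sorted(subdir.iterdir())
            if path.is_file()
        }
    return results
```

With the example output, load_results("result")["mp-3163"]["formula"] would hold the content of the file result/mp-3163/formula.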
If --dry-run is given as a command-line option to getcif, the program prints the search conditions as follows and exits.
This is useful for checking the search parameters.
$ getcif --dry-run input.yaml
{'band_gap': (None, 1.0), 'is_stable': True, 'is_metal': False, 'formula': '**O3', 'spacegroup_symbol': 'Pm-3m', 'fields': ['structure', 'band_gap', 'symmetry', 'material_id', 'formula_pretty']}
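The printed conditions have the form of a Python dictionary literal, so they can be checked programmatically. The sketch below assumes that the dry-run output is always a valid Python literal, which the documentation does not guarantee; it uses only ast.literal_eval from the standard library.

```python
import ast

# Output copied from the dry run above.
dry_run_output = (
    "{'band_gap': (None, 1.0), 'is_stable': True, 'is_metal': False, "
    "'formula': '**O3', 'spacegroup_symbol': 'Pm-3m', "
    "'fields': ['structure', 'band_gap', 'symmetry', "
    "'material_id', 'formula_pretty']}"
)

# literal_eval only accepts literals, so no code in the string is executed.
conditions = ast.literal_eval(dry_run_output)

# Separate the requested fields from the search conditions.
fields = conditions.pop("fields")

print(conditions["band_gap"])  # (None, 1.0)
print(fields)
```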