2. Tutorial

This tutorial describes how to use the database query tool getcif to search for and obtain crystallographic information from materials science databases. The procedure consists of getting an API key, preparing an input parameter file, and running the getcif program. The steps are explained using the example provided in the docs/tutorial/getcif directory, which searches for and obtains information on ABO3-type materials.

2.1. Getting an API key

To access the Materials Project database via its API, users need to register with the Materials Project and obtain an API key. Visit the Materials Project website https://next-gen.materialsproject.org, create an account, and log in. An API key is generated automatically on registration and is shown in the user dashboard. Keep the API key safe and do not share it with others.

The API key is made available to getcif in one of the following ways:

  1. storing it in the pymatgen configuration file by typing the following:

    $ pmg config --add PMG_MAPI_KEY <API_KEY>
    

    or editing the file ~/.config/.pmgrc.yaml to include the following:

    PMG_MAPI_KEY: <API_KEY>
    
  2. setting it in an environment variable:

    $ MP_API_KEY="<API_KEY>"
    $ export MP_API_KEY
    
  3. storing the API key in a file located in the directory where getcif is run. The default file name is materials_project.key. A different file name can be given in the input parameter file as follows. The file name must end with .key.

    database:
      api_key_file: materials_project.key
    

    Comment: it is recommended to exclude files with the .key suffix from version control (e.g. for Git, add *.key to the .gitignore file).
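Before running getcif, it can be handy to check which of the three locations above currently holds a key. The following is a minimal sketch, not part of getcif: the function name available_key_sources and its parameters are hypothetical, the location of the pymatgen configuration file is an assumption that can be overridden, and the script only reports presence. It does not reproduce the order in which getcif itself looks up the key.

```python
import os
from pathlib import Path


def available_key_sources(workdir=".", env=None, pmgrc=None,
                          key_file="materials_project.key"):
    """Return the names of the locations where an API key appears to be set.

    Hypothetical helper for illustration; it only reports presence and makes
    no claim about the lookup order used by getcif.
    """
    env = os.environ if env is None else env
    # Assumed location of the pymatgen configuration file; pass pmgrc to override.
    if pmgrc is None:
        pmgrc = Path.home() / ".config" / ".pmgrc.yaml"
    pmgrc = Path(pmgrc)

    found = []
    # 1. pymatgen configuration file: look for a non-empty PMG_MAPI_KEY entry.
    if pmgrc.is_file():
        for line in pmgrc.read_text().splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip() == "PMG_MAPI_KEY" and value.strip():
                found.append("pymatgen config")
                break
    # 2. Environment variable MP_API_KEY.
    if env.get("MP_API_KEY"):
        found.append("environment variable")
    # 3. Key file in the directory where getcif is run.
    if (Path(workdir) / key_file).is_file():
        found.append("key file")
    return found


if __name__ == "__main__":
    print(available_key_sources())
```

An empty list means that none of the three locations provides a key, so getcif would presumably be unable to authenticate.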

2.2. Preparing an input parameter file

An input parameter file describes search conditions and data items to retrieve from databases.

An example is presented below. It is a text file in YAML format that contains information for accessing the database, search conditions, and types of data to obtain. See the file format section for the detailed specification.

In YAML format, parameters are given in dictionary form as keyword: value, where value is a scalar such as a number or a string, a set of values enclosed in [ ] or listed in itemized form, or a nested dictionary. For the search conditions and data fields, a list may also be given as space-separated items without brackets, as a special notation.

database:
  target: materials project

option:
  output_dir: result
  # dry_run: false

properties:
  band_gap: < 1.0
  is_stable: true
  is_metal: false
  formula: "**O3"
  spacegroup_symbol: Pm-3m

fields: |
  structure
  band_gap
  symmetry

The input parameter file consists of the database, option, properties, and fields sections. The database section describes settings for connecting to databases. In the example, target is set to Materials Project, although this value is not used at present. api_key can be used to set the API key. The key may also be set in the pymatgen configuration file or in an environment variable. This tutorial assumes the environment variable.

The option section describes optional settings for command execution. output_dir specifies the directory in which to place the obtained data. The default is the current directory. If dry_run is set to true, getcif does not connect to the database; instead, it just prints the search conditions and exits. A dry run may also be requested with the command-line option --dry-run.

The properties section describes search conditions. They are given in the form keyword: value and are combined as AND conditions. In the example, the conditions select materials that have a band gap of at most 1.0, are stable, are not metallic, have a composition formula of the form ABO3 (where A and B are arbitrary species), and belong to the space group Pm-3m (perovskite). band_gap accepts either a pair of values for the lower and upper limits or an expression such as < 1.0. The available terms for specifying search conditions are listed in the Appendix.
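The way a range condition maps onto a pair of limits can be illustrated with a short sketch. The dry-run output later in this tutorial shows that < 1.0 is reported as (None, 1.0); the function below reproduces that mapping. The function name parse_range and the set of accepted spellings (an operator followed by a number, or two numbers separated by a space) are assumptions made for illustration and are not the getcif grammar.

```python
def parse_range(text):
    """Turn a condition such as "< 1.0" into a (lower, upper) pair.

    Illustrative only: the accepted spellings are assumptions, and a
    missing limit is represented by None.
    """
    tokens = str(text).split()
    if len(tokens) != 2:
        raise ValueError(f"cannot interpret range: {text!r}")
    first, second = tokens
    if first in ("<", "<="):
        # Only an upper limit is given.
        return (None, float(second))
    if first in (">", ">="):
        # Only a lower limit is given.
        return (float(second), None)
    # Two numbers: lower and upper limits.
    return (float(first), float(second))


print(parse_range("< 1.0"))    # (None, 1.0)
print(parse_range("0.5 1.0"))  # (0.5, 1.0)
```

Note that in this sketch < and <= produce the same pair, so the distinction between a strict and an inclusive limit is not preserved.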

The fields section describes the data items to obtain. It is given as a YAML list or as a space-separated list. structure specifies the crystal structure data, which will be stored in CIF format. band_gap specifies the value of the band gap, and symmetry specifies the symmetry information. material_id, which is the identifier of the material in the Materials Project, and formula_pretty, which is the composition formula, are obtained automatically. The available items are listed in the Appendix and can also be found in the help message of the getcif command.
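Because fields may arrive either as a YAML list or as a whitespace-separated string (the block scalar in the example yields one string with line breaks), a reader of the input file has to normalize both forms. The following sketch shows one way to do this; the helper name as_list is hypothetical and is not claimed to match the getcif implementation.

```python
def as_list(value):
    """Accept a YAML list or a whitespace-separated string; return a list."""
    if isinstance(value, str):
        # str.split() with no argument splits on spaces and line breaks alike.
        return value.split()
    return list(value)


# The value that the block scalar "fields: |" in the example would yield.
fields_block = "structure\nband_gap\nsymmetry\n"

print(as_list(fields_block))                         # ['structure', 'band_gap', 'symmetry']
print(as_list("structure band_gap symmetry"))        # ['structure', 'band_gap', 'symmetry']
print(as_list(["structure", "band_gap", "symmetry"]))  # ['structure', 'band_gap', 'symmetry']
```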

2.3. Obtaining data

Run the getcif program with the input parameter file (input.yaml) as follows.

$ getcif input.yaml

getcif then connects to the Materials Project database and obtains the data that match the specified conditions. A summary including the material IDs, the composition formulas, and the other data items is printed to the standard output as follows.

material_id  formula  band_gap  symmetry  formula_pretty
mp-861502  AcFeO3  0.9887999999999995  crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2'  AcFeO3
mp-977455  PaAgO3  0.915  crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2'  PaAgO3
mp-11775  RbUO3  0.45420000000000016  crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2'  RbUO3
mp-3163  BaSnO3  0.37239999999999984  crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2'  BaSnO3
mp-4126  KUO3  0.44540000000000024  crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2'  KUO3
mp-865322  UTlO3  0.27360000000000007  crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2'  UTlO3
mp-753781  EuHfO3  0.4795999999999996  crystal_system=<CrystalSystem.cubic: 'Cubic'> symbol='Pm-3m' number=221 point_group='m-3m' symprec=0.1 version='2.0.2'  EuHfO3

The obtained data are placed in the directory specified by output_dir, with one subdirectory per material named after its material ID. In this example, seven subdirectories with names from mp-3163 to mp-977455 are created within the result directory, and each subdirectory contains the following files:

  • band_gap

    the value of band gap

  • formula

    the composition formula (that corresponds to the field formula_pretty)

  • structure.cif

    the crystal structure data in CIF format

  • symmetry

    the information about symmetry
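The directory layout described above can be traversed with a few lines of Python to gather the results in one place. This is a minimal sketch under stated assumptions: the function name collect_results is hypothetical, the formula and band_gap files are read as plain text without interpreting their contents, and the output directory is taken to be result as in the example.

```python
from pathlib import Path


def collect_results(output_dir="result"):
    """Map each material ID to the text of its formula and band_gap files.

    Hypothetical helper: file contents are returned as stripped text, and
    the presence of structure.cif is recorded as a boolean.
    """
    results = {}
    for subdir in sorted(Path(output_dir).iterdir()):
        if not subdir.is_dir():
            continue
        entry = {}
        for name in ("formula", "band_gap"):
            path = subdir / name
            if path.is_file():
                entry[name] = path.read_text().strip()
        entry["has_cif"] = (subdir / "structure.cif").is_file()
        results[subdir.name] = entry
    return results


if __name__ == "__main__":
    if Path("result").is_dir():
        for material_id, entry in collect_results("result").items():
            print(material_id, entry)
```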

If --dry-run is given as a command-line option to getcif, the program prints the search conditions as follows and exits. This is useful for checking the search parameters.

$ getcif --dry-run input.yaml
{'band_gap': (None, 1.0), 'is_stable': True, 'is_metal': False, 'formula': '**O3', 'spacegroup_symbol': 'Pm-3m', 'fields': ['structure', 'band_gap', 'symmetry', 'material_id', 'formula_pretty']}
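The line printed by the dry run above has the form of a Python dictionary literal, so it can be read back in a script to inspect individual conditions. The sketch below assumes that the output always takes this form, which the tutorial shows only for this one example; the function name parse_dry_run is hypothetical.

```python
import ast

# The dry-run output from the example above, copied verbatim.
line = (
    "{'band_gap': (None, 1.0), 'is_stable': True, 'is_metal': False, "
    "'formula': '**O3', 'spacegroup_symbol': 'Pm-3m', "
    "'fields': ['structure', 'band_gap', 'symmetry', 'material_id', 'formula_pretty']}"
)


def parse_dry_run(text):
    """Parse a dry-run line into a dictionary without executing any code.

    ast.literal_eval accepts only literals (dicts, lists, tuples, strings,
    numbers, True, False, None), so arbitrary expressions are rejected.
    """
    return ast.literal_eval(text.strip())


query = parse_dry_run(line)
print(query["band_gap"])  # (None, 1.0)
print(query["fields"])    # ['structure', 'band_gap', 'symmetry', 'material_id', 'formula_pretty']
```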