kliff.descriptors

class kliff.descriptors.Descriptor(cut_dists, cut_name, hyperparams, normalize=True, dtype=numpy.float32)

Base class of atomic environment descriptors.

Process a dataset to generate fingerprints. This is the base class for all descriptors and should not be used directly. Instead, use a descriptor built on top of it, such as SymmetryFunction or Bispectrum, to transform atomic environment information into fingerprints.

Parameters
  • cut_dists (Dict[str, float]) – Cutoff distances, with keys of the form A-B, where A and B are species strings, and float values. Example: cut_dists = {'C-C': 5.0}

  • cut_name (str) – Name of the cutoff function, such as cos, P3, and P7.

  • hyperparams (Union[Dict, str]) – A dictionary of the hyperparams of the descriptor or a string to select the predefined hyperparams.

  • normalize (bool) – If True, the fingerprints are centered and normalized: zeta = (zeta - mean(zeta)) / stdev(zeta)

  • dtype (np.dtype) – Data type of the generated fingerprints, such as np.float32 and np.float64.
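The normalization formula can be pictured with a minimal NumPy sketch. It assumes (this reference does not say so) that mean and stdev are computed per descriptor component over all atoms of all configurations, using the population standard deviation; `normalize_fingerprints` is a hypothetical helper, not a KLIFF function.

```python
import numpy as np


def normalize_fingerprints(zeta, mean, stdev):
    """Apply zeta = (zeta - mean) / stdev component-wise.

    Hypothetical helper (not part of KLIFF).
    zeta:  array of shape (num_atoms, num_descriptors)
    mean:  per-component means, length num_descriptors
    stdev: per-component standard deviations, length num_descriptors
    """
    zeta = np.asarray(zeta)
    mean = np.asarray(mean, dtype=zeta.dtype)
    stdev = np.asarray(stdev, dtype=zeta.dtype)
    # Broadcasting centers and scales each column by its own statistic.
    return (zeta - mean) / stdev


# Made-up fingerprints of three atoms with two descriptor components each.
all_zeta = np.array([[1.0, 10.0], [3.0, 14.0], [5.0, 12.0]], dtype=np.float32)
mean = all_zeta.mean(axis=0)   # [3.0, 12.0]
stdev = all_zeta.std(axis=0)   # population stdev (ddof=0), an assumption
normed = normalize_fingerprints(all_zeta, mean, stdev)
```

After normalization every component has zero mean and unit standard deviation, and the float32 dtype is preserved.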

size (int) – Length of the fingerprint vector.

mean (list) – Mean of the fingerprints.

stdev (list) – Standard deviation of the fingerprints.

generate_fingerprints(configs, fit_forces=False, fit_stress=False, fingerprints_filename='fingerprints.pkl', fingerprints_mean_stdev_filename=None, use_welford_method=False, nprocs=1)

Convert all configurations to their fingerprints.

Parameters
  • configs (List[Configuration]) – Dataset configurations

  • fit_forces (bool) – Whether to compute the gradient of fingerprints w.r.t. atomic coordinates so as to compute forces.

  • fit_stress (bool) – Whether to compute the gradient of fingerprints w.r.t. atomic coordinates so as to compute stress.

  • use_welford_method (bool) – Whether to compute mean and standard deviation using the Welford method, which is memory efficient. See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

  • fingerprints_filename (Union[Path, str]) – Path to dump fingerprints to a pickle file.

  • fingerprints_mean_stdev_filename (Union[str, Path, None]) – Path to dump the mean and standard deviation of the fingerprints as a pickle file. If normalize=False for the descriptor, this is ignored.

  • nprocs (int) – Number of processes used to generate the fingerprints. If 1, run in serial mode, otherwise nprocs processes will be forked via multiprocessing to do the work.
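The use_welford_method option refers to Welford's streaming algorithm for the mean and variance. Below is a minimal sketch of that algorithm applied per descriptor component while visiting one configuration's fingerprints at a time, so the full fingerprint matrix never has to be held in memory. `welford_mean_stdev` is a hypothetical name, and using the population (ddof=0) standard deviation is an assumption of this sketch.

```python
import numpy as np


def welford_mean_stdev(fingerprint_batches):
    """Streaming per-component mean and population stdev (Welford's algorithm).

    fingerprint_batches: iterable of (num_atoms, num_descriptors) arrays,
    e.g. one per configuration; only one batch is needed at a time.
    """
    count = 0
    mean = None
    m2 = None  # running sum of squared deviations from the current mean
    for zeta in fingerprint_batches:
        for row in np.asarray(zeta, dtype=np.float64):
            if mean is None:
                mean = np.zeros_like(row)
                m2 = np.zeros_like(row)
            count += 1
            delta = row - mean
            mean += delta / count
            # Uses the updated mean, as in Welford's update rule.
            m2 += delta * (row - mean)
    if count == 0:
        raise ValueError("no fingerprints provided")
    return mean, np.sqrt(m2 / count)


# Three "configurations" with different numbers of atoms and 4 components.
rng = np.random.default_rng(0)
batches = [rng.normal(size=(n, 4)) for n in (5, 8, 3)]
mean, stdev = welford_mean_stdev(batches)
```

The result matches a two-pass computation over the concatenated rows, while only ever storing three length-num_descriptors vectors.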

transform(conf, fit_forces=False, fit_stress=False)

Transform atomic coords to atomic environment descriptor values.

Parameters
  • conf (Configuration) – atomic configuration

  • fit_forces (bool) – Whether to fit forces, so as to compute gradients of fingerprints w.r.t. coords

  • fit_stress (bool) – Whether to fit stress, so as to compute gradients of fingerprints w.r.t. coords

Returns

  • zeta – Descriptor values. 2D array with shape (num_atoms, num_descriptors), where num_atoms is the number of atoms in the configuration, and num_descriptors is the size of the descriptor vector (depending on the choice of the hyperparameters).

  • dzeta_dr – Gradient of the descriptor w.r.t. atomic coordinates. 4D array if fit_forces is True, otherwise None. Shape: (num_atoms, num_descriptors, num_atoms, 3), where num_atoms and num_descriptors have the same meanings as in zeta, and 3 denotes the three Cartesian coordinates.

  • dzeta_ds – Gradient of the descriptor w.r.t. the virial stress components. 3D array of shape (num_atoms, num_descriptors, 6) if fit_stress is True, otherwise None, where num_atoms and num_descriptors have the same meanings as in zeta, and 6 denotes the virial stress components in Voigt notation, see https://en.wikipedia.org/wiki/Voigt_notation

Return type

Tuple of (zeta, dzeta_dr, dzeta_ds)
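Given dzeta_dr in the 4D layout described for transform, forces follow from the chain rule, F[j, d] = -sum_{i,k} dE/dzeta[i, k] * dzeta[i, k]/dr[j, d]. The sketch below assumes a model that supplies dE_dzeta, the gradient of the total energy with respect to each atom's unnormalized fingerprint; that array and the helper name are hypothetical. If the model sees normalized fingerprints, its gradient must first be divided by stdev per component to obtain dE_dzeta.

```python
import numpy as np


def forces_from_descriptor_gradient(dE_dzeta, dzeta_dr):
    """Chain rule: F[j, d] = -sum_{i,k} dE_dzeta[i, k] * dzeta_dr[i, k, j, d].

    dE_dzeta: (num_atoms, num_descriptors), hypothetical model gradient
    dzeta_dr: (num_atoms, num_descriptors, num_atoms, 3)
    Returns forces of shape (num_atoms, 3).
    """
    dE_dzeta = np.asarray(dE_dzeta)
    dzeta_dr = np.asarray(dzeta_dr)
    num_atoms, num_desc = dE_dzeta.shape
    if dzeta_dr.shape != (num_atoms, num_desc, num_atoms, 3):
        raise ValueError(f"unexpected dzeta_dr shape {dzeta_dr.shape}")
    # Contract over the center atom i and descriptor component k.
    return -np.einsum("ik,ikjd->jd", dE_dzeta, dzeta_dr)


# Random stand-ins for the model gradient and the descriptor gradient.
rng = np.random.default_rng(1)
num_atoms, num_desc = 4, 3
dE_dzeta = rng.normal(size=(num_atoms, num_desc))
dzeta_dr = rng.normal(size=(num_atoms, num_desc, num_atoms, 3))
forces = forces_from_descriptor_gradient(dE_dzeta, dzeta_dr)
```

A single einsum keeps the contraction explicit and avoids Python loops over atoms and descriptor components.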

write_kim_params(path, fname='descriptor.params')

Write the descriptor information for a KIM model.

Parameters
  • path (Union[Path, str]) – Directory in which to write the file.

  • fname (str) – Name of the file.

get_size()

Return the size of the descriptor vector.

get_mean()

Return a list of the mean of the fingerprints.

get_stdev()

Return a list of the standard deviation of the fingerprints.

get_dtype()

Return the data type of the fingerprints.

get_cutoff()

Return the name of the cutoff function and the cutoff distances.

get_hyperparams()

Return the hyperparameters of the descriptor.

state_dict()

Return the state dict of the descriptor.

Return type

Dict[str, Any]

load_state_dict(data)

Load state dict of a descriptor.

Parameters

data (Dict[str, Any]) – state dict to load.

class kliff.descriptors.SymmetryFunction(cut_dists, cut_name, hyperparams, normalize=True, dtype=numpy.float32)

Atom-centered symmetry function descriptor, as discussed in [Behler2011].

Parameters
  • cut_dists (dict) – Cutoff distances, with keys of the form A-B, where A and B are atomic species strings, and float values.

  • cut_name (str) – Name of the cutoff function.

  • hyperparams (dict or str) –

    A dictionary of the hyperparameters that define the descriptor. Two predefined sets are provided and can be selected by setting hyperparams='set51' or hyperparams='set30'; they are taken from [Artrith2012] and [Artrith2013], respectively. To see what they are, one can do:

    >>> from kliff.descriptors import SymmetryFunction
    >>> cut_name = 'cos'  # just for init purpose
    >>> cut_dists = {'C-C': 5.}  # just for init purpose
    >>> hyperparams = 'set51'
    >>> desc = SymmetryFunction(cut_dists, cut_name, hyperparams)
    >>> desc.get_hyperparams()
    

  • normalize (bool (optional)) – If True, the fingerprints are centered and normalized according to: zeta = (zeta - mean(zeta)) / stdev(zeta)

  • dtype (np.dtype (optional)) – Data type for the generated fingerprints, such as np.float32 and np.float64.

Example

If set51 or set30 hyperparams are used, the cutoff distances should be given in Angstrom.

>>> cut_name = 'cos'
>>> cut_dists = {'C-C': 5., 'C-H': 4.5, 'H-H': 4.0}
>>> hyperparams = 'set51'
>>> desc = SymmetryFunction(cut_dists, cut_name, hyperparams)
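The cut_dists keys encode species pairs as 'A-B' strings. The hypothetical helpers below (not KLIFF functions) show one way to read such a dictionary: collecting the species it mentions, taking the largest cutoff (for instance as a neighbor-list radius), and looking up a pair in either order. Treating 'C-H' and 'H-C' as the same pair is an assumption of this sketch, not documented KLIFF behavior.

```python
def species_in(cut_dists):
    """Sorted list of the species appearing in 'A-B' keys."""
    return sorted({s for key in cut_dists for s in key.split("-")})


def max_cutoff(cut_dists):
    """Largest pair cutoff, e.g. usable as a neighbor-list radius."""
    return max(cut_dists.values())


def pair_cutoff(cut_dists, a, b):
    """Cutoff for species pair (a, b); this sketch accepts 'a-b' or 'b-a'."""
    for key in (f"{a}-{b}", f"{b}-{a}"):
        if key in cut_dists:
            return cut_dists[key]
    raise KeyError(f"no cutoff given for species pair {a}-{b}")


cut_dists = {"C-C": 5.0, "C-H": 4.5, "H-H": 4.0}
print(species_in(cut_dists))             # ['C', 'H']
print(max_cutoff(cut_dists))             # 5.0
print(pair_cutoff(cut_dists, "H", "C"))  # 4.5
```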

You can provide your own hyperparams as a dictionary:

>>> cut_name = 'cos'
>>> cut_dists = {'C-C': 5., 'C-H': 4.5, 'H-H': 4.0}
>>> hyperparams = {'g1': None,
...                'g2': [{'eta':0.1, 'Rs':0.2}, {'eta':0.3, 'Rs':0.4}],
...                'g3': [{'kappa':0.1}, {'kappa':0.2}, {'kappa':0.3}]}
>>> desc = SymmetryFunction(cut_dists, cut_name, hyperparams)
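For orientation, the g1, g2 and g3 entries correspond to the radial symmetry functions of [Behler2011]: G1 = sum_j fc(R_ij), G2 = sum_j exp(-eta (R_ij - Rs)^2) fc(R_ij) and G3 = sum_j cos(kappa R_ij) fc(R_ij), evaluated here with the cosine cutoff fc(R) = 0.5 (cos(pi R / Rc) + 1) for R < Rc and 0 otherwise. The NumPy sketch below evaluates them for one center atom and a single neighbor species. It is not KLIFF's implementation (which resolves species pairs and also provides angular functions), and treating 'g1': None as one parameter-free feature is an assumption.

```python
import numpy as np


def cutoff_cos(r, rcut):
    """Cosine cutoff: 0.5 * (cos(pi r / rcut) + 1) for r < rcut, else 0."""
    r = np.asarray(r, dtype=float)
    return np.where(r < rcut, 0.5 * (np.cos(np.pi * r / rcut) + 1.0), 0.0)


def g1(rij, rcut):
    """G1 = sum_j fc(r_ij)."""
    return float(np.sum(cutoff_cos(rij, rcut)))


def g2(rij, rcut, eta, Rs):
    """G2 = sum_j exp(-eta (r_ij - Rs)^2) fc(r_ij)."""
    rij = np.asarray(rij, dtype=float)
    return float(np.sum(np.exp(-eta * (rij - Rs) ** 2) * cutoff_cos(rij, rcut)))


def g3(rij, rcut, kappa):
    """G3 = sum_j cos(kappa r_ij) fc(r_ij)."""
    rij = np.asarray(rij, dtype=float)
    return float(np.sum(np.cos(kappa * rij) * cutoff_cos(rij, rcut)))


hyperparams = {'g1': None,
               'g2': [{'eta': 0.1, 'Rs': 0.2}, {'eta': 0.3, 'Rs': 0.4}],
               'g3': [{'kappa': 0.1}, {'kappa': 0.2}, {'kappa': 0.3}]}
rcut = 5.0                   # the C-C cutoff used in the examples above
rij = [1.4, 2.5, 4.9, 6.0]   # made-up C-C distances; 6.0 lies beyond the cutoff

# One feature per parameter set: 1 (g1) + 2 (g2) + 3 (g3) = 6 values.
features = [g1(rij, rcut)]
features += [g2(rij, rcut, p['eta'], p['Rs']) for p in hyperparams['g2']]
features += [g3(rij, rcut, p['kappa']) for p in hyperparams['g3']]
```

Each parameter set in the dictionary yields one fingerprint component, and neighbors beyond the cutoff contribute nothing.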

References

Behler2011

J. Behler, “Atom-centered symmetry functions for constructing high-dimensional neural network potentials,” J. Chem. Phys. 134, 074106 (2011).

Artrith2012

N. Artrith and J. Behler. “High-dimensional neural network potentials for metal surfaces: A prototype study for copper.” Physical Review B 85, no. 4 (2012): 045439.

Artrith2013

N. Artrith, B. Hiller, and J. Behler. “Neural network potentials for metals and oxides–First applications to copper clusters at zinc oxide.” physica status solidi (b) 250, no. 6 (2013): 1191-1203.

transform(conf, fit_forces=False, fit_stress=False)

Transform atomic coords to atomic environment descriptor values.

Parameters
  • conf (Configuration) – A configuration of atoms.

  • fit_forces (bool (optional)) – Whether to compute the gradient of descriptor values w.r.t. atomic coordinates so as to compute forces.

  • fit_stress (bool (optional)) – Whether to compute the gradient of descriptor values w.r.t. atomic coordinates so as to compute stress.

Returns

  • zeta (2D array) – Descriptor values, one row per atom. zeta has shape (num_atoms, num_descriptors), where num_atoms is the number of atoms in the configuration, and num_descriptors is the size of the descriptor vector (depending on the choice of hyperparameters).

  • dzetadr_forces (3D array if fit_forces is True, otherwise None) – Gradient of descriptor values w.r.t. atomic coordinates for forces computation. dzetadr_forces has shape (num_atoms, num_descriptors, num_atoms*DIM), where num_atoms and num_descriptors have the same meanings as described in zeta. DIM = 3 denotes the three Cartesian coordinates.

  • dzetadr_stress (3D array if fit_stress is True, otherwise None) – Gradient of descriptor values w.r.t. atomic coordinates for stress computation. dzetadr_stress has shape (num_atoms, num_descriptors, 6), where num_atoms and num_descriptors have the same meanings as described in zeta. The last dimension holds the 6 components associated with the virial stress, in the order 11, 22, 33, 23, 31, 12.
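The flattened last axis of dzetadr_forces and the Voigt ordering of dzetadr_stress can be unpacked with a few NumPy lines. This sketch assumes the num_atoms*DIM axis is atom-major (x, y, z of atom 0, then of atom 1, and so on), which this reference does not state; the helper names are hypothetical.

```python
import numpy as np

# Voigt order listed above: 11, 22, 33, 23, 31, 12 (as 0-based index pairs).
VOIGT_PAIRS = [(0, 0), (1, 1), (2, 2), (1, 2), (2, 0), (0, 1)]


def to_voigt(tensor):
    """Symmetric 3x3 tensor -> 6-vector in the order above."""
    t = np.asarray(tensor)
    return np.array([t[i, j] for i, j in VOIGT_PAIRS])


def from_voigt(vec):
    """6-vector in the order above -> symmetric 3x3 tensor."""
    t = np.zeros((3, 3))
    for value, (i, j) in zip(vec, VOIGT_PAIRS):
        t[i, j] = t[j, i] = value
    return t


def unflatten_forces_grad(dzetadr_forces):
    """(num_atoms, num_desc, num_atoms*3) -> (num_atoms, num_desc, num_atoms, 3).

    Assumes an atom-major layout of the last axis.
    """
    n_atoms, n_desc, n_flat = dzetadr_forces.shape
    return dzetadr_forces.reshape(n_atoms, n_desc, n_flat // 3, 3)


sigma = np.array([[1.0, 6.0, 5.0],
                  [6.0, 2.0, 4.0],
                  [5.0, 4.0, 3.0]])
print(to_voigt(sigma))  # [1. 2. 3. 4. 5. 6.]
```

Under the atom-major assumption, the reshape produces the same 4D layout that the base class documents for dzeta_dr.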

write_kim_params(path, fname='descriptor.params')

Write the descriptor information for a KIM model.

Parameters
  • path – Directory in which to write the file.

  • fname – Name of the file.

get_size()

Return the size of the descriptor vector.

get_hyperparams()

Return the hyperparameters of the descriptor.

generate_fingerprints(configs, fit_forces=False, fit_stress=False, fingerprints_filename='fingerprints.pkl', fingerprints_mean_stdev_filename=None, use_welford_method=False, nprocs=1)

Convert all configurations to their fingerprints.

Parameters
  • configs (List[Configuration]) – Dataset configurations

  • fit_forces (bool) – Whether to compute the gradient of fingerprints w.r.t. atomic coordinates so as to compute forces.

  • fit_stress (bool) – Whether to compute the gradient of fingerprints w.r.t. atomic coordinates so as to compute stress.

  • use_welford_method (bool) – Whether to compute mean and standard deviation using the Welford method, which is memory efficient. See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

  • fingerprints_filename (Union[Path, str]) – Path to dump fingerprints to a pickle file.

  • fingerprints_mean_stdev_filename (Union[str, Path, None]) – Path to dump the mean and standard deviation of the fingerprints as a pickle file. If normalize=False for the descriptor, this is ignored.

  • nprocs (int) – Number of processes used to generate the fingerprints. If 1, run in serial mode, otherwise nprocs processes will be forked via multiprocessing to do the work.

get_cutoff()

Return the name of the cutoff function and the cutoff distances.

get_dtype()

Return the data type of the fingerprints.

get_mean()

Return a list of the mean of the fingerprints.

get_stdev()

Return a list of the standard deviation of the fingerprints.

load_state_dict(data)

Load state dict of a descriptor.

Parameters

data (Dict[str, Any]) – state dict to load.

state_dict()

Return the state dict of the descriptor.

Return type

Dict[str, Any]

class kliff.descriptors.Bispectrum(cut_dists, cut_name=None, hyperparams=None, normalize=True, dtype=numpy.float32)

Bispectrum descriptor.

Process dataset to generate fingerprints using the Bispectrum descriptor as discussed in [Bartok2010] and [Thompson2015].

Parameters
  • cut_dists (dict) – Cutoff distances, with key of the form A-B where A and B are atomic species string, and value should be a float.

  • cut_name (str) – Name of the cutoff function.

  • hyperparams (dict) – A dictionary of the hyperparams of the descriptor.

  • normalize (bool (optional)) – If True, the fingerprints are centered and normalized according to: zeta = (zeta - mean(zeta)) / stdev(zeta)

  • dtype (np.dtype) – Data type for the generated fingerprints, such as np.float32 and np.float64.

Example

>>> from kliff.descriptors import Bispectrum
>>> cut_name = 'cos'
>>> cut_dists = {'C-C': 5.0, 'C-H': 4.5, 'H-H': 4.0}
>>> hyperparams = {'jmax': 4, 'weight': {'C':1.0, 'H':1.0}}
>>> desc = Bispectrum(cut_dists, cut_name, hyperparams)

References

Bartok2010

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. “Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons.” Physical Review Letters 104, no. 13 (2010): 136403.

Thompson2015

Thompson, Aidan P., Laura P. Swiler, Christian R. Trott, Stephen M. Foiles, and Garritt J. Tucker. “Spectral neighbor analysis method for automated generation of quantum-accurate interatomic potentials.” Journal of Computational Physics 285 (2015): 316-330.

transform(conf, grad=False)

Transform atomic coords to atomic environment descriptor values.

Parameters
  • conf – atomic configuration

  • grad – Whether to compute the gradient of the fingerprints w.r.t. atomic coordinates

Returns

  • zeta – Descriptor values. 2D array with shape (num_atoms, num_descriptors), where num_atoms is the number of atoms in the configuration, and num_descriptors is the size of the descriptor vector (depending on the choice of the hyperparameters).

  • dzeta_dr – Gradient of the descriptor w.r.t. atomic coordinates. 4D array if grad is True, otherwise None. Shape: (num_atoms, num_descriptors, num_atoms, 3), where num_atoms and num_descriptors have the same meanings as in zeta, and 3 denotes the three Cartesian coordinates.

  • dzeta_ds – Gradient of the descriptor w.r.t. the virial stress components. 3D array of shape (num_atoms, num_descriptors, 6), where num_atoms and num_descriptors have the same meanings as in zeta, and 6 denotes the virial stress components in Voigt notation, see https://en.wikipedia.org/wiki/Voigt_notation

Return type

Tuple of (zeta, dzeta_dr, dzeta_ds)

update_hyperparams(params)

Update the hyperparameters based on the input at initialization.

get_size()

Return the size of the descriptor vector.

generate_fingerprints(configs, fit_forces=False, fit_stress=False, fingerprints_filename='fingerprints.pkl', fingerprints_mean_stdev_filename=None, use_welford_method=False, nprocs=1)

Convert all configurations to their fingerprints.

Parameters
  • configs (List[Configuration]) – Dataset configurations

  • fit_forces (bool) – Whether to compute the gradient of fingerprints w.r.t. atomic coordinates so as to compute forces.

  • fit_stress (bool) – Whether to compute the gradient of fingerprints w.r.t. atomic coordinates so as to compute stress.

  • use_welford_method (bool) – Whether to compute mean and standard deviation using the Welford method, which is memory efficient. See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

  • fingerprints_filename (Union[Path, str]) – Path to dump fingerprints to a pickle file.

  • fingerprints_mean_stdev_filename (Union[str, Path, None]) – Path to dump the mean and standard deviation of the fingerprints as a pickle file. If normalize=False for the descriptor, this is ignored.

  • nprocs (int) – Number of processes used to generate the fingerprints. If 1, run in serial mode, otherwise nprocs processes will be forked via multiprocessing to do the work.

get_cutoff()

Return the name of the cutoff function and the cutoff distances.

get_dtype()

Return the data type of the fingerprints.

get_hyperparams()

Return the hyperparameters of the descriptor.

get_mean()

Return a list of the mean of the fingerprints.

get_stdev()

Return a list of the standard deviation of the fingerprints.

load_state_dict(data)

Load state dict of a descriptor.

Parameters

data (Dict[str, Any]) – state dict to load.

state_dict()

Return the state dict of the descriptor.

Return type

Dict[str, Any]

write_kim_params(path, fname='descriptor.params')

Write the descriptor information for a KIM model.

Parameters
  • path (Union[Path, str]) – Directory in which to write the file.

  • fname (str) – Name of the file.