Theory

A parametric potential typically takes the form

$\mathcal{V} = \mathcal{V}(\bm r_1,\dots,\bm r_{N_a}, Z_1,\dots,Z_{N_a}; \bm\theta)$

where $\bm r_1,\dots,\bm r_{N_a}$ and $Z_1,\dots,Z_{N_a}$ are, respectively, the coordinates and species of a system of $N_a$ atoms, and $\bm\theta$ denotes a set of fitting parameters. For notational simplicity, we assume in the following that the species information is carried implicitly by the coordinates, so that $Z$ can be dropped from the functional form, and we use $\bm R$ to denote the coordinates of all atoms in a configuration. Then we have

$\mathcal{V} = \mathcal{V}(\bm R; \bm\theta).$
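As a concrete illustration of $\mathcal{V}(\bm R; \bm\theta)$, the sketch below uses a single-species Lennard-Jones pair potential with $\bm\theta = (\epsilon, \sigma)$. This potential, and the function name `lj_energy`, are hypothetical choices made only for this example. Because there is only one species, $Z$ drops out exactly as assumed above, and the energy depends only on the coordinates `R` and the parameters `theta`.

```python
import numpy as np


def lj_energy(R, theta):
    """Total energy V(R; theta) of a single-species Lennard-Jones cluster.

    R     : (N_a, 3) array of atomic coordinates
    theta : (epsilon, sigma), the fitting parameters of this toy potential
    """
    epsilon, sigma = theta
    R = np.asarray(R, dtype=float)
    energy = 0.0
    n_atoms = len(R)
    # Sum the pair energy 4*eps*[(sigma/r)^12 - (sigma/r)^6] over unique pairs i < j.
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            r = np.linalg.norm(R[i] - R[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


# A dimer placed at the Lennard-Jones minimum r = 2^(1/6) * sigma.
R = np.array([[0.0, 0.0, 0.0], [2.0 ** (1 / 6), 0.0, 0.0]])
print(lj_energy(R, (1.0, 1.0)))  # approximately -1.0, i.e. -epsilon
```

Changing `theta` while keeping `R` fixed changes the energy, which is exactly the dependence that the fitting procedure exploits.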

A potential parameterization process is typically formulated as a weighted least-squares minimization problem, where we adjust the potential parameters $\bm\theta$ so as to reproduce a training set of reference data obtained from experiments and/or first-principles computations. Mathematically, we hope to minimize a loss function

$\mathcal{L}(\bm\theta) = \frac{1}{2} \sum_{i=1}^{N_p} \|w_i (\bm p_i(\mathcal{V}(\bm R_i; \bm\theta)) - \bm q_i) \| ^2$

with respect to $\bm\theta$, where $\{\bm q_1,\dots, \bm q_{N_p}\}$ is a training set of $N_p$ reference data points, $\bm p_i$ is the corresponding prediction for $\bm q_i$ computed from the potential (as indicated by its argument), $\bm R_i$ is the atomic configuration associated with $\bm q_i$, $\|\cdot\|$ denotes the $L_2$ norm, and $w_i$ is the weight for the $i$-th data point. We call

$\bm u = \bm p(\mathcal{V}(\bm R; \bm\theta)) - \bm q$

the residual function that characterizes the difference between the potential predictions and the reference data for a set of properties.
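The loss and residual definitions above translate directly into code. The following is a minimal sketch, assuming each prediction $\bm p_i$ and reference $\bm q_i$ is already available as a 1D array and each weight $w_i$ is a scalar (a per-component weight array of the same length would also work through numpy broadcasting). The names `residual` and `loss` are hypothetical helpers, not part of any library.

```python
import numpy as np


def residual(p, q):
    """Residual u = p - q for a single data point."""
    return np.asarray(p, dtype=float) - np.asarray(q, dtype=float)


def loss(predictions, references, weights):
    """L(theta) = 1/2 * sum_i || w_i * (p_i - q_i) ||^2 with the L2 norm."""
    total = 0.0
    for p_i, q_i, w_i in zip(predictions, references, weights):
        u_i = w_i * residual(p_i, q_i)      # weighted residual of data point i
        total += float(np.dot(u_i, u_i))    # squared L2 norm
    return 0.5 * total


# Two toy data points of different sizes (e.g. different numbers of atoms).
preds = [np.array([1.0, 2.0]), np.array([0.5])]
refs = [np.array([1.5, 2.0]), np.array([0.0])]
weights = [2.0, 1.0]
print(loss(preds, refs, weights))  # 0.5 * ((2 * -0.5)^2 + 0.5^2) = 0.625
```

Keeping the data points in a list rather than one big array reflects the fact that, in general, different $\bm q_i$ can have different lengths.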

Generally speaking, $\bm q$ can be a collection of any material properties considered important for a given application, such as the cohesive energy, equilibrium lattice constant, and elastic constants of a given crystal phase. These properties can be obtained from experiments and/or first-principles calculations. Nowadays, however, most potentials are trained using the force-matching scheme, in which the potential is fitted to a large set of atomic forces (and/or energies and stresses) obtained from first-principles calculations for a set of atomic configurations. This is especially true for machine learning potentials, which require a large training set; collecting a sufficient number of material properties for such a set is impractical.

The reference $\bm q$ and the prediction $\bm p$ are typically represented as vectors such that $q[m]$ is the $m$-th reference property and $p[m]$ is the corresponding $m$-th prediction obtained from the potential. Suppose we want to fit a potential to energy and forces. Then $\bm q$ is a vector of size $1+3N_a$, where $N_a$ is the number of atoms in the configuration, with

$\begin{aligned} q[0] &= E_\text{ref}, \\ q[1] &= f_\text{ref}^{0, x}, \quad q[2] = f_\text{ref}^{0, y}, \quad q[3] = f_\text{ref}^{0, z}, \\ q[4] &= f_\text{ref}^{1, x}, \quad q[5] = f_\text{ref}^{1, y}, \quad q[6] = f_\text{ref}^{1, z}, \\ &\;\;\vdots \\ q[3N_a-2] &= f_\text{ref}^{N_a-1, x}, \quad q[3N_a-1] = f_\text{ref}^{N_a-1, y}, \quad q[3N_a] = f_\text{ref}^{N_a-1, z}, \end{aligned}$

where $E_\text{ref}$ is the reference energy, and $f_\text{ref}^{i, x}$, $f_\text{ref}^{i, y}$, and $f_\text{ref}^{i, z}$ denote the $x$-, $y$-, and $z$-components of the reference force on atom $i$ (atoms are indexed from 0). In other words, the energy is the 0th component of $\bm q$, the force on the first atom (atom 0) occupies components 1 to 3, the force on the second atom occupies the next three components, and so on until the forces on all atoms are placed in $\bm q$; in general, the $x$-, $y$-, and $z$-components of the force on atom $i$ are $q[3i+1]$, $q[3i+2]$, and $q[3i+3]$. The prediction vector $\bm p$ is constructed in the same fashion, and the residual vector is then $\bm u = \bm p - \bm q$.
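The layout of $\bm q$ described above can be produced by flattening an $(N_a, 3)$ force array in row-major order and prepending the energy. This is a minimal sketch under that assumption; `pack_energy_forces` and `force_index` are hypothetical helper names, and the numbers are made-up values for a two-atom configuration. The same function applied to the predicted energy and forces gives $\bm p$.

```python
import numpy as np


def pack_energy_forces(energy, forces):
    """Build q = [E, f^{0,x}, f^{0,y}, f^{0,z}, f^{1,x}, ...] of size 1 + 3*N_a."""
    forces = np.asarray(forces, dtype=float)  # shape (N_a, 3)
    # reshape(-1) flattens row by row, i.e. atom by atom, x/y/z within each atom.
    return np.concatenate(([energy], forces.reshape(-1)))


def force_index(i, component):
    """Index in q of the force on atom i; component 0, 1, 2 means x, y, z."""
    return 1 + 3 * i + component


E_ref = -3.2
f_ref = np.array([[0.1, 0.2, 0.3],
                  [0.4, 0.5, 0.6]])  # made-up forces for N_a = 2
q = pack_energy_forces(E_ref, f_ref)
print(q.shape)                 # (7,), i.e. 1 + 3 * 2
print(q[force_index(1, 2)])    # 0.6, stored at q[3 * N_a] = q[6]
```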

Note

We use boldface with a subscript to denote a data point (e.g. $\bm q_i$ is the $i$-th data point in the training set), and normal (non-bold) type with square brackets to denote a component of a data point (e.g. $q[m]$ is the $m$-th component of a general data point $\bm q$).

If stress is also used in the fitting, $q[3N_a+1]$ to $q[3N_a+6]$ store the reference Voigt stress components $\sigma_{xx}, \sigma_{yy}, \sigma_{zz}, \sigma_{yz}, \sigma_{xy}, \sigma_{xz}$ (so that $\bm q$ has size $7+3N_a$), and $p[3N_a+1]$ to $p[3N_a+6]$ are the corresponding predictions computed from the potential.
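Since the $z$-component of the force on the last atom already occupies $q[3N_a]$, the six stress components follow at indices $3N_a+1$ through $3N_a+6$. A minimal sketch of this extended packing is shown below; the helper name `pack_energy_forces_stress` is hypothetical, and the stress array is assumed to already be in the component order listed above.

```python
import numpy as np


def pack_energy_forces_stress(energy, forces, stress):
    """Build q = [E, forces of all atoms (3*N_a), 6 stress components]."""
    forces = np.asarray(forces, dtype=float).reshape(-1)  # 3*N_a force components
    stress = np.asarray(stress, dtype=float)               # assumed already in the listed order
    if stress.shape != (6,):
        raise ValueError("stress must have exactly 6 components")
    return np.concatenate(([energy], forces, stress))


n_atoms = 2
q = pack_energy_forces_stress(
    -3.2,
    np.zeros((n_atoms, 3)),
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # made-up stress values
)
first = 3 * n_atoms + 1            # index of the first stress component
print(q.shape)                     # (13,), i.e. 7 + 3 * 2
print(q[first:first + 6])          # the six stress components
```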

The objective of the parameterization process is thus to find the set of potential parameters $\bm\theta$ that reproduces the reference data as well as possible, that is, the minimizer of the loss function $\mathcal{L}(\bm\theta)$.
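To make the minimization concrete, here is a toy end-to-end sketch, not a general fitting method. It assumes a single-species Lennard-Jones potential rewritten as $E = \sum_{i<j} (A r_{ij}^{-12} - B r_{ij}^{-6})$ with $A = 4\epsilon\sigma^{12}$ and $B = 4\epsilon\sigma^{6}$. In this hypothetical parameterization $\bm\theta = (A, B)$, both the energy and the forces are linear in $\bm\theta$, so minimizing the weighted loss reduces to a weighted linear least-squares problem that `numpy.linalg.lstsq` solves directly. Realistic potentials are nonlinear in $\bm\theta$ and need an iterative optimizer (for example SciPy's `scipy.optimize.least_squares`). The reference vectors here are synthetic, generated from known parameters purely to check that the fit recovers them. The names `design_matrix` and `fit` are illustrative.

```python
import numpy as np


def design_matrix(R):
    """Rows follow q = [E, f^{0,x}, f^{0,y}, f^{0,z}, ...]; columns are dp/dA, dp/dB."""
    R = np.asarray(R, dtype=float)
    n_atoms = len(R)
    X = np.zeros((1 + 3 * n_atoms, 2))
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            rij = R[i] - R[j]
            r = np.linalg.norm(rij)
            X[0] += [r ** -12, -(r ** -6)]           # energy row
            # Force on atom i: F_i = -dE/dR_i = (12 A r^-14 - 6 B r^-8) r_ij
            fa = 12.0 * r ** -14 * rij
            fb = -6.0 * r ** -8 * rij
            X[1 + 3 * i: 4 + 3 * i, 0] += fa
            X[1 + 3 * i: 4 + 3 * i, 1] += fb
            # Newton's third law: atom j receives the opposite contribution.
            X[1 + 3 * j: 4 + 3 * j, 0] -= fa
            X[1 + 3 * j: 4 + 3 * j, 1] -= fb
    return X


def fit(configs, references, weights):
    """Minimize 1/2 sum_i ||w_i (X_i theta - q_i)||^2 by weighted linear least squares."""
    rows = [w * design_matrix(R) for R, w in zip(configs, weights)]
    rhs = [w * np.asarray(q, dtype=float) for q, w in zip(references, weights)]
    theta, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    return theta


theta_true = np.array([4.0, 4.0])  # A and B for epsilon = sigma = 1
configs = [
    np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]]),
    np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 1.3, 0.0]]),
]
references = [design_matrix(R) @ theta_true for R in configs]  # synthetic q_i
weights = [1.0, 0.5]

A, B = fit(configs, references, weights)
sigma = (A / B) ** (1 / 6)      # since A / B = sigma^6
epsilon = B ** 2 / (4 * A)      # since B^2 / A = 4 epsilon
print(sigma, epsilon)           # both close to 1
```

Because the synthetic references contain no noise, the minimum loss is zero and the fit recovers $\epsilon \approx \sigma \approx 1$ up to floating-point error.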