.. _theory:

======
Theory
======

A parametric potential typically takes the form

.. math::
    \mathcal{V} = \mathcal{V}(\bm r_1,\dots,\bm r_{N_a}, Z_1,\dots,Z_{N_a}; \bm\theta)

where :math:`\bm r_1,\dots,\bm r_{N_a}` and :math:`Z_1,\dots,Z_{N_a}` are the
coordinates and species of a system of :math:`N_a` atoms, respectively, and
:math:`\bm\theta` denotes a set of fitting parameters.
For notational simplicity, we assume in the following discussion that the atomic
species information is carried implicitly by the coordinates, so that :math:`Z` can be
dropped from the functional form, and we use :math:`\bm R` to denote the
coordinates of all atoms in the configuration. Then we have

.. math::
    \mathcal{V} = \mathcal{V}(\bm R; \bm\theta).
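
As a concrete illustration, the sketch below implements one simple instance of
:math:`\mathcal{V}(\bm R; \bm\theta)`: a single-species Lennard-Jones pair potential
with :math:`\bm\theta = (\epsilon, \sigma)`. The function name ``lj_energy`` is
hypothetical, and the absence of a cutoff and of periodic boundary conditions are
simplifying assumptions; it is not the implementation used by any particular package.

.. code-block:: python

    import numpy as np


    def lj_energy(R, theta):
        """Total energy of a toy Lennard-Jones system (illustrative only).

        R     : (N_a, 3) array of atomic coordinates. Single species, no
                periodic boundaries, and no cutoff are assumed.
        theta : (epsilon, sigma), the fitting parameters of the potential.
        """
        epsilon, sigma = theta
        R = np.asarray(R, dtype=float)
        energy = 0.0
        n_atoms = len(R)
        # Sum the pair interaction over all unique pairs i < j.
        for i in range(n_atoms):
            for j in range(i + 1, n_atoms):
                r = np.linalg.norm(R[i] - R[j])
                sr6 = (sigma / r) ** 6
                energy += 4.0 * epsilon * (sr6 ** 2 - sr6)
        return energy

For a dimer at the equilibrium separation :math:`r = 2^{1/6}\sigma`, this function
returns :math:`-\epsilon`.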

A potential parameterization process is typically formulated as a weighted
least-squares minimization problem, in which the potential parameters
:math:`\bm\theta` are adjusted to reproduce a training set of reference data obtained from
experiments and/or first-principles computations. Mathematically, we seek to minimize the loss function

.. math::
    \mathcal{L}(\bm\theta) = \frac{1}{2} \sum_{i=1}^{N_p}
    \|w_i (\bm p_i(\mathcal{V}(\bm R_i; \bm\theta)) - \bm q_i) \| ^2

with respect to :math:`\bm\theta`, where :math:`\{\bm q_1,\dots, \bm q_{N_p}\}` is
a training set of :math:`N_p` reference data points, :math:`\bm p_i` is the corresponding
prediction for :math:`\bm q_i` computed from the potential (as indicated by its
argument), :math:`\|\cdot\|` denotes the :math:`L_2` norm, and :math:`w_i` is the
weight for the :math:`i`-th data point. We call

.. math::
    \bm u = \bm p(\mathcal{V}(\bm R; \bm\theta)) - \bm q

the residual function that characterizes the difference between the potential
predictions and the reference data for a set of properties.
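
Once the predictions :math:`\bm p_i` are available, the residual and the loss
take only a few lines to evaluate. In this minimal NumPy sketch, the helper names
``residual`` and ``loss`` are hypothetical, each :math:`w_i` is assumed to be a
scalar, and the predictions are passed in precomputed rather than evaluated from a
potential.

.. code-block:: python

    import numpy as np


    def residual(p, q):
        """Residual u = p - q for one data point (prediction minus reference)."""
        return np.asarray(p, dtype=float) - np.asarray(q, dtype=float)


    def loss(predictions, references, weights):
        """Weighted least-squares loss L = 1/2 * sum_i ||w_i * (p_i - q_i)||^2.

        predictions, references : sequences of 1D arrays, one per data point
        weights                 : sequence of weights w_i, one per data point
        """
        total = 0.0
        for p_i, q_i, w_i in zip(predictions, references, weights):
            u_i = residual(p_i, q_i)
            # Squared L2 norm of the weighted residual of data point i.
            total += np.sum((w_i * u_i) ** 2)
        return 0.5 * total

Because NumPy broadcasts ``w_i * u_i``, an array of per-component weights can be
passed in place of a scalar :math:`w_i` without changing the code.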

Generally speaking, :math:`\bm q` can be a collection of any material properties
considered important for a given application, such as the cohesive energy,
equilibrium lattice constant, and elastic constants of a given crystal phase.
These material properties can be obtained from experiments and/or
first-principles calculations.
Nowadays, however, most potentials are trained using the *force-matching*
scheme, in which the potential is fitted to a large set of forces on atoms
(and/or energies and stresses) obtained from first-principles calculations for a
set of atomic configurations. This is especially true for machine learning
potentials, which require large amounts of training data; collecting a
sufficient number of material properties for such a training set is impractical.

The reference :math:`\bm q` and the prediction :math:`\bm p` are typically
represented as vectors such that
:math:`q[m]` is the :math:`m`-th reference property and :math:`p[m]` is the
corresponding :math:`m`-th prediction obtained from the potential.
Suppose we want to fit a potential to energy and forces. Then :math:`\bm q`
is a vector of size :math:`1+3N_a`, where :math:`N_a` is the number
of atoms in the configuration, with

.. math::
    q[0] &= E_\text{ref}, \\
    q[1] &= f_\text{ref}^{0, x}, \quad
    q[2] = f_\text{ref}^{0, y}, \quad
    q[3] = f_\text{ref}^{0, z}, \\
    q[4] &= f_\text{ref}^{1, x}, \quad
    q[5] = f_\text{ref}^{1, y}, \quad
    q[6] = f_\text{ref}^{1, z}, \\
    &\ \ \vdots \\
    q[3N_a-2] &= f_\text{ref}^{N_a-1, x}, \quad
    q[3N_a-1] = f_\text{ref}^{N_a-1, y}, \quad
    q[3N_a] = f_\text{ref}^{N_a-1, z},

where :math:`E_\text{ref}` is the reference energy, and :math:`f_\text{ref}^{i, x}`,
:math:`f_\text{ref}^{i, y}`, and :math:`f_\text{ref}^{i, z}` denote the
:math:`x`-, :math:`y`-, and :math:`z`-components of the reference force on atom
:math:`i`, respectively.
In other words, the energy is placed in the 0th component of :math:`\bm q`,
the force on the first atom (atom 0) in components 1 to 3, the force on the
second atom in components 4 to 6, and so on until the forces on all atoms
have been placed in :math:`\bm q`; in general, the force on atom :math:`i` occupies
components :math:`3i+1` to :math:`3i+3`.
The prediction vector :math:`\bm p` is constructed in the same fashion, and the
residual vector is then :math:`\bm u = \bm p - \bm q`.
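
The layout described above corresponds to a row-major flattening of an
:math:`N_a \times 3` force array placed after the energy. The sketch below assumes
NumPy and uses a hypothetical function name; it builds a vector of size
:math:`1+3N_a` in which the force on atom :math:`i` lands at indices :math:`3i+1`,
:math:`3i+2`, and :math:`3i+3`.

.. code-block:: python

    import numpy as np


    def energy_force_vector(energy, forces):
        """Pack an energy and (N_a, 3) forces into a vector of size 1 + 3*N_a.

        Index 0 holds the energy; the force on atom i occupies indices
        3*i + 1, 3*i + 2, 3*i + 3 (x, y, z components).
        """
        forces = np.asarray(forces, dtype=float)
        # Row-major reshape gives f0x, f0y, f0z, f1x, f1y, f1z, ...
        return np.concatenate(([energy], forces.reshape(-1)))

Applying the same function to the predicted energy and forces gives :math:`\bm p`,
and the residual is then simply ``p - q``.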

.. note::
    We use boldface with a subscript to denote a data point (e.g., :math:`\bm q_i`
    is the :math:`i`-th data point in the training set), and normal text with
    square brackets to denote a component of a data point (e.g., :math:`q[m]`
    is the :math:`m`-th component of a general data point :math:`\bm q`).

If stress is also used in the fitting, :math:`q[3N_a+1]` to :math:`q[3N_a+6]` store
the six reference stress components in Voigt notation,
:math:`\sigma_{xx}, \sigma_{yy}, \sigma_{zz}, \sigma_{yz}, \sigma_{xz}, \sigma_{xy}`,
so that :math:`\bm q` has size :math:`7+3N_a`; correspondingly, :math:`p[3N_a+1]`
to :math:`p[3N_a+6]` are the predictions computed from the potential.
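
Including stress only requires appending the six stress components after the last
force component. As before, the function name is hypothetical, and the component
order is assumed to be whatever order the reference data use.

.. code-block:: python

    import numpy as np


    def energy_force_stress_vector(energy, forces, stress):
        """Pack energy, (N_a, 3) forces, and 6 stress components into one vector.

        The result has size 7 + 3*N_a: the energy first, then the forces
        atom by atom (x, y, z), then the six Voigt stress components in the
        order given (assumed to match the reference data).
        """
        flat_forces = np.asarray(forces, dtype=float).reshape(-1)
        stress = np.asarray(stress, dtype=float).reshape(-1)
        if stress.size != 6:
            raise ValueError("expected 6 Voigt stress components")
        return np.concatenate(([energy], flat_forces, stress))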

The objective of the parameterization process is to find a set of parameters
:math:`\bm\theta` for which the potential reproduces the reference data as closely as
possible, i.e., a minimizer of the loss function :math:`\mathcal{L}(\bm\theta)`.
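
For many potentials, :math:`\mathcal{V}` depends nonlinearly on :math:`\bm\theta`,
so the minimization is carried out with an iterative optimizer. As a toy
illustration only, the Lennard-Jones dimer energy can be rewritten as
:math:`E(r) = A r^{-12} - B r^{-6}` with :math:`A = 4\epsilon\sigma^{12}` and
:math:`B = 4\epsilon\sigma^6`, which is linear in :math:`(A, B)`; the weighted
least-squares problem for energies alone then reduces to a single linear solve, and
:math:`\sigma = (A/B)^{1/6}`, :math:`\epsilon = B^2/(4A)` are recovered afterwards.
The function name and the energy-only training set are assumptions of this sketch,
not part of any specific fitting code.

.. code-block:: python

    import numpy as np


    def fit_lj_dimer(r, e_ref, w):
        """Weighted least-squares fit of a toy LJ dimer to reference energies.

        Minimizes 1/2 * sum_i (w_i * (E(r_i) - e_ref_i))^2 with
        E(r) = A r^-12 - B r^-6, which is linear in (A, B).
        Returns the fitted (epsilon, sigma).
        """
        r = np.asarray(r, dtype=float)
        w = np.asarray(w, dtype=float)
        e_ref = np.asarray(e_ref, dtype=float)
        # Design matrix: column 0 multiplies A, column 1 multiplies B.
        X = np.column_stack((r ** -12, -(r ** -6)))
        # Scaling each row by w_i makes lstsq minimize the weighted residual.
        coef, *_ = np.linalg.lstsq(w[:, None] * X, w * e_ref, rcond=None)
        A, B = coef
        sigma = (A / B) ** (1.0 / 6.0)
        epsilon = B ** 2 / (4.0 * A)
        return epsilon, sigma

With noise-free synthetic energies generated from known :math:`(\epsilon, \sigma)`,
this fit recovers those parameters up to floating-point error.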