.. _doc.loss:

====
Loss
====

As discussed in :ref:`theory`, we solve a minimization problem to fit the potential.
For physics-motivated potentials, the geodesic Levenberg-Marquardt (``geodesicLM``)
minimization method [transtrum2012geodesicLM]_ can be used, which has been
shown to perform well for potentials in [wen2016potfit]_. KLIFF also interacts
with SciPy_ to utilize the zoo of optimization methods there.
For machine learning potentials, KLIFF wraps the optimization methods in PyTorch_.

KLIFF provides a uniform interface to use all the optimization methods.
To carry out optimization, first create a loss object:

.. code-block:: python

    from kliff.legacy.loss import Loss

    calculator = ...  # create a calculator
    Loss(calculator, nprocs=1, residual_fn=None, residual_data=None)

``calculator`` (discussed in :ref:`doc.calculators`) provides predictions
calculated using a potential and the corresponding reference data via
:meth:`~kliff.calculator.Calculator.get_prediction()` and
:meth:`~kliff.calculator.Calculator.get_reference()`, respectively, which the
optimizer can be used to construct the objective function.

``nprocs`` informs KLIFF the number of cores that KLIFF can use to parallelize
over the dataset to evaluate the objective function.

``residual_data`` is a dictionary that will be used by ``residual_fn`` to compute
the residual. ``residual_data`` is optional, and its default is:

.. code-block:: python

    residual_data = {'normalize_by_natoms': True}

The meaning of this value is made clear in the below discussion.


``residual_fn`` is a function used to compute the residual.
As discussed in :ref:`theory`, the objective function is a sum of the square
of the norm of the residual of each individual configuration, i.e.

.. math::
    \mathcal{L(\bm\theta)} = \frac{1}{2} \sum_{i=1}^{N_p}
    \|w_i \bm u_i\|^2

with the residual

.. math::
    \bm u_i = \bm p_i - \bm q_i ,

in which :math:`\bm p_i` is a vector of predictions computed using the potential
for the :math:`i`-th configuration, and :math:`\bm q_i` is a vector of the
corresponding reference data.
The residual is computed using the ``residual_fn``, which should be of the form

.. code-block:: python

    def residual_fn(identifier, natoms, weight, prediction, reference, data):
        """A function to compute the residual for a configuration."""

        # ... compute u based on p (prediction) and q (reference)
        # and it should be a vector
        return u

In the above residual function,

- ``identifier`` is a (unique) ``str`` associated with the configuration, which
  is specified in :class:`~kliff.dataset.Configuration`. If it is not provided
  there, ``identifier`` is default to the path to the file that storing the
  configuration, e.g. ``Si_training_set/NVT_runs/T300_step100.xyz``.
- ``natoms`` is an ``int`` denoting the number of atoms in the configuration.
- ``weight`` is a :class:`~kliff.dataset.weight.Weight` instance that generates the
  weights from the configuration (see :ref:`doc.dataset.weight`).
- ``prediction`` is a vector of the prediction :math:`\bm p` computed from the
  potential.
- ``reference`` is a vector of the corresponding reference data :math:`\bm q`.
- ``data`` is ``residual_data`` provided at the initialization of ``Loss``.
  ``residual_data`` is a dictionary, with which the user can provide extra
  information to ``residual_fn``.

``residual_fn`` is also optional, and it defaults to :func:`~kliff.loss.energy_forces_residual`
discussed below.


Built-in residual function
==========================
KLIFF provides a number of residual functions readily to be plugged into ``Loss``
and let the wheel spin. For example, the :func:`~kliff.loss.energy_forces_residual`
that constructs the residual using energy and forces is defined as (in a nutshell):

.. code-block:: python

    def energy_forces_residual(identifier, natoms, weight, prediction, reference, data):

        # extract up the weight information
        config_weight = weight.config_weight
        energy_weight = weight.energy_weight
        forces_weight = weight.forces_weight

        # obtain residual and properly normalize it
        residual = config_weight * (prediction - reference)
        residual[0] *= energy_weight
        residual[1:] *= forces_weight

        if data["normalize_by_natoms"]:
            residual /= natoms

        return residual


This residual function retrieves the weights for energy and forces f``weight``
instance and enables the normalization of the residual based on the number of atoms.
Normalization by the number of atoms makes each individual configuration in the training
set contributes equally to the loss function; otherwise, configurations with more atoms
can dominate the loss, which (most of the times) is not what we prefer.


One can provide a ``residual_data`` instead of using the default one to tune the loss,
for example, if one wants to ignore the normalization by the number of atoms.

.. code-block:: python

    from kliff.loss import Loss
    from kliff.loss import energy_forces_residual

    calculator = ...  # create a calculator

    # provide my data
    residual_data = {'normalize_by_natoms': False}
    Loss(calculator, nprocs=1, residual_fn=energy_forces_residual, residual_data=residual_data)


.. warning::
    Even though ``residual_fn`` and ``residual_data`` is optional, we strongly
    recommend the users to explicitly provide them to reminder themselves what
    they are doing as done above.

.. seealso::
    See :mod:`kliff.loss` for other built-in residual functions.


.. _doc.loss.use_your_own_residual_function:

Use your own residual function
==============================

The built-in residual function treats each configuration in the training set, and
each atom in a configuration equally important. Sometimes, this may not be what
you want. In these cases, you can define and use your own ``residual_fn``.

For example, if you are creating a potential that is going to be used
to investigate fracture properties, and your training set include both
configurations with cracks and configurations without cracks, then you may want to
weigh more for the configurations with cracks.

.. code-block:: python

    from kliff.loss import Loss

    # define my own residual function
    def residual_fn(identifier, natoms, weight, prediction, reference, data):

        # extract the weight information
        config_weight = weight.config_weight
            energy_weight = weight.energy_weight
            forces_weight = weight.forces_weight

        # larger weight for configuration with cracks
            if 'with_cracks' in identifer:
                config_weight *= 10

        normalize = data["normalize_by_natoms"]
        if normalize:
            energy_weight /= natoms
                forces_weight /= natoms

            # obtain residual and properly normalize it
            residual = config_weight * (prediction - reference)
            residual[0] *= energy_weight
            residual[1:] *= forces_weight

            return residual


    calculator = ...  # create a calculator
    loss = Loss(
        calculator,
        nprocs=1,
        residual_fn=residual_fn,
        residual_data={"normalize_by_natoms": True}
    )


The above code takes advantage of ``identifier`` to distinguish configurations with
cracks and without cracks, and then weigh more for configurations with cracks.

For configurations with cracks, you may even want to weigh more for the atoms near
the creak tip. Then you need to identify which atoms are near the crack tip
and manipulate the corresponding components of ``residual``.


.. note::
    If you are using your own ``residual_fn``, its ``data`` argument can be completely
    ignored since it can be directly provided in your own ``residual_fn``.   

.. seealso::
    See :ref:`doc.dataset.weight.define_your_weight_class` for an alternative
    implementation of this example.

.. note::
    Handling the weight is preferably done using the weight class (see :ref:`doc.dataset.weight`)
    instead of in the residual function.


.. _PyTorch: https://pytorch.org
.. _SciPy: https://scipy.org
.. _scipy.optimize.minimize: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
.. _scipy.optimize.least_squares: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares

.. [wen2016potfit] Wen, M., Li, J., Brommer, P., Elliott, R.S., Sethna, J.P. and
   Tadmor, E.B., 2016. A KIM-compliant potfit for fitting sloppy interatomic
   potentials: application to the EDIP model for silicon. Modelling and Simulation in
   Materials Science and Engineering, 25(1), p.014001.
.. [transtrum2012geodesicLM] 
   Transtrum, M.K., Sethna, J.P., 2012. Improvements to the Levenberg-Marquardt
   algorithm for nonlinear least-squares minimization. arXiv:1201.5885 [physics].