Train a Stillinger-Weber potential
In this tutorial, we train a Stillinger-Weber (SW) potential for silicon that is archived on OpenKIM.
Before training the SW model, let’s first make sure it is installed.
If you haven’t already, follow the installation instructions to install
kim-api, kimpy, and openkim-models.
Then run $ kim-api-collections-management list and make sure
SW_StillingerWeber_1985_Si__MO_405512056662_006 is listed in one of
the collections.
Tip
If you see SW_StillingerWeber_1985_Si__MO_405512056662_005 instead (note the last
three digits), change the model name in model = KIMModel(model_name="SW_StillingerWeber_1985_Si__MO_405512056662_006")
to the corresponding model name in your installation.
We are going to create potentials for diamond silicon, and fit the
potentials to a training set of energies and forces consisting of
compressed and stretched diamond silicon structures, as well as
configurations drawn from molecular dynamics trajectories at different
temperatures. Download the training set Si_training_set.tar.gz.
(It will be downloaded automatically if not present.) The data is stored
in extended xyz format; see doc.dataset for more
information about this format.
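To give a feel for what an extended xyz file holds, here is a minimal sketch that parses a hand-written two-atom sample using only the standard library. The key names (Lattice, PBC, Energy, Properties), the column layout, and the helper parse_extxyz are assumptions made for illustration and are not taken from Si_training_set or from KLIFF; check doc.dataset for the exact keys KLIFF expects.

```python
import shlex

# Hand-written two-atom sample for illustration; NOT taken from Si_training_set.
# Key names and column order are assumptions based on common extended xyz usage.
SAMPLE = """2
Lattice="5.43 0 0 0 5.43 0 0 0 5.43" PBC="1 1 1" Energy=-8.6 Properties=species:S:1:pos:R:3:force:R:3
Si 0.0 0.0 0.0 0.01 0.0 -0.02
Si 1.3575 1.3575 1.3575 -0.01 0.0 0.02
"""


def parse_extxyz(text):
    """Hypothetical helper: parse one configuration into a plain dict."""
    lines = text.strip().splitlines()
    natoms = int(lines[0])
    # shlex keeps quoted values such as Lattice="..." together as one token.
    info = dict(token.split("=", 1) for token in shlex.split(lines[1]))
    species, coords, forces = [], [], []
    for line in lines[2 : 2 + natoms]:
        fields = line.split()
        species.append(fields[0])
        coords.append([float(v) for v in fields[1:4]])
        forces.append([float(v) for v in fields[4:7]])
    return {
        "lattice": [float(v) for v in info["Lattice"].split()],
        "energy": float(info["Energy"]),
        "species": species,
        "coords": coords,
        "forces": forces,
    }


config = parse_extxyz(SAMPLE)
print(config["species"], config["energy"])
```

In practice, KLIFF's Dataset reads these files for you, so a parser like this is only useful for inspecting the data by hand.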
Warning
The Si_training_set is just a toy data set intended to demonstrate how to
use KLIFF to train potentials. It should not be used to train any potential for real
simulations.
Warning
The regression calculator and loss modules are now part of the legacy module.
Let’s first import the modules that will be used in this example.
from kliff.legacy.calculators import Calculator
from kliff.dataset import Dataset
from kliff.dataset.weight import Weight
from kliff.legacy.loss import Loss
from kliff.models import KIMModel
from kliff.utils import download_dataset
Model
We first create a KIM model for the SW potential and print out all the
available parameters that can be optimized (we call these the
model parameters). Continuing in our Python script, we write
model = KIMModel(model_name="SW_StillingerWeber_1985_Si__MO_405512056662_006")
model.echo_model_params()
#================================================================================
# Available parameters to optimize (In MODEL SPACE).
# Model: SW_StillingerWeber_1985_Si__MO_405512056662_006
#================================================================================
name: A
value: [15.28484792]
size: 1
name: B
value: [0.60222456]
size: 1
name: p
value: [4.]
size: 1
name: q
value: [0.]
size: 1
name: sigma
value: [2.0951]
size: 1
name: gamma
value: [2.51412]
size: 1
name: cutoff
value: [3.77118]
size: 1
name: lambda
value: [45.5322]
size: 1
name: costheta0
value: [-0.33333333]
size: 1
#================================================================================
# Following parameters have transformation objects attached,
# Parameter value in PARAM SPACE:
#================================================================================
The output is generated by the last line, and it tells us the name,
value, and size of each parameter.
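To connect these parameters to an energy, the sketch below evaluates a two-body SW term using the values printed above. It assumes the commonly cited form A * (B * (sigma/r)**p - (sigma/r)**q) * exp(sigma / (r - cutoff)), with cutoff playing the role of a*sigma in the original paper; this form and the helpers sw_two_body and scan_minimum are illustrative assumptions and are not taken from the KIM model source. The three-body term (lambda, gamma, costheta0) is omitted.

```python
import math

# Parameter values copied from the echo_model_params() output above.
A = 15.28484792
B = 0.60222456
p = 4.0
q = 0.0
sigma = 2.0951
cutoff = 3.77118


def sw_two_body(r):
    """Two-body SW energy at pair distance r (assumed functional form)."""
    if r >= cutoff:
        return 0.0  # the interaction vanishes at and beyond the cutoff
    x = sigma / r
    return A * (B * x**p - x**q) * math.exp(sigma / (r - cutoff))


def scan_minimum(r_start=2.0, r_stop=3.7, step=0.001):
    """Locate the minimum of the pair energy on a uniform grid."""
    n = int(round((r_stop - r_start) / step))
    grid = [r_start + i * step for i in range(n + 1)]
    r_min = min(grid, key=sw_two_body)
    return r_min, sw_two_body(r_min)


r_min, e_min = scan_minimum()
print(f"pair minimum at r = {r_min:.3f}, energy = {e_min:.4f}")
```

Changing A or B in this sketch shifts the depth and position of the pair minimum, which is the kind of freedom the fit exploits.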
Tip
- You can provide a path argument to the method echo_model_params(path) to write the available parameters information to a file indicated by path.
Warning
- The available parameters information can also be obtained using the kliff
command line tool:
$ kliff model --echo-params SW_StillingerWeber_1985_Si__MO_405512056662_006
Now that we know what parameters are available for fitting, we can optimize all or a subset of them to reproduce the training set.
model.set_opt_params(
A=[[5.0, 1.0, 20]], B=[["default"]], sigma=[[2.0951, "fix"]], gamma=[[1.5]]
)
model.echo_opt_params()
Parameter:A : [5.]
Parameter:B : [0.60222456]
Parameter:gamma : [1.5]
Here, we select four parameters of the SW model: A, B, sigma, and
gamma. Three of them (A, B, and gamma) are optimized, while sigma is
held fixed at 2.0951, which is why it does not appear in the output of
model.echo_opt_params(). The information for each parameter
should be provided as a list of lists, where the size of the outer list
should equal the size of the parameter reported by
model.echo_model_params(). Each inner list can contain
one, two, or three items:
- One item. You can use a numerical value (e.g. gamma) to provide an initial guess of the parameter. Alternatively, the string 'default' can be provided to use the default value in the model (e.g. B).
- Two items. The first item should be a numerical value and the second item should be the string 'fix' (e.g. sigma), which tells KLIFF to use the value for the parameter but not optimize it.
- Three items. The first item can be a numerical value or the string 'default', with the same meaning as in the one-item case. The second and third items are the lower and upper bounds for the parameter, respectively. A bound can be a numerical value or None; the latter indicates that no bound is applied.
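The three cases can be summarized in a small sketch. The function interpret_entry below is hypothetical and is not part of KLIFF; it only illustrates how an inner list could be mapped to an initial value, a fixed flag, and bounds, using the default values printed by model.echo_model_params().

```python
def interpret_entry(entry, default):
    """Hypothetical helper: map one inner list to (value, fixed, lower, upper)."""
    if not 1 <= len(entry) <= 3:
        raise ValueError("an inner list must have one, two, or three items")

    # First item: a number, or the string 'default' to reuse the model value.
    value = default if entry[0] == "default" else float(entry[0])

    if len(entry) == 1:
        return value, False, None, None
    if len(entry) == 2:
        if entry[1] != "fix":
            raise ValueError("the second item of a two-item list must be 'fix'")
        return value, True, None, None

    # Three items: lower and upper bounds; None means no bound on that side.
    lower, upper = entry[1], entry[2]
    return value, False, lower, upper


# The same settings as in the set_opt_params() call above.
spec_a = interpret_entry([5.0, 1.0, 20], default=15.28484792)
spec_b = interpret_entry(["default"], default=0.60222456)
spec_sigma = interpret_entry([2.0951, "fix"], default=2.0951)
spec_gamma = interpret_entry([1.5], default=2.51412)
print(spec_a, spec_b, spec_sigma, spec_gamma)
```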
The call to model.echo_opt_params() prints the name and current value
of each fitting parameter that we ask KLIFF to optimize.
Tip
- Parameters that are not included as fitting parameters are fixed at their default
values in the model during the optimization.
Training set
KLIFF provides a Dataset class to handle the
training data (and possibly test data). Additionally, we define the
energy_weight and forces_weight corresponding to each
configuration using Weight. In this
example, we set energy_weight to 1.0 and forces_weight to
0.1. For the silicon training set, we can read and process the files
by:
dataset_path = download_dataset(dataset_name="Si_training_set")
weight = Weight(energy_weight=1.0, forces_weight=0.1)
tset = Dataset.from_path(dataset_path, weight)
configs = tset.get_configs()
The configs in the last line is a list of
Configuration objects. Each configuration is an
internal representation of a processed extended xyz file, holding
the species, coordinates, energy, forces, and other related information
of a system of atoms.
Calculator
Calculator is the central agent that
exchanges information and orchestrates the operation of the fitting
process. It calls the model to compute the energy and forces and provides
this information to the Loss function (discussed
below) to compute the loss. It also grabs the
parameters from the optimizer and updates the parameters stored in the
model so that the up-to-date parameters are used the next time the model
is evaluated to compute the energy and forces. The calculator can be
created by:
calc = Calculator(model)
_ = calc.create(configs)
2025-05-16 21:17:01.932 | INFO | kliff.legacy.calculators.calculator:create:107 - Create calculator for 1000 configurations.
where calc.create(configs) performs some initialization for each
configuration in the training set, such as creating the neighbor list.
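As a rough illustration of what a neighbor list contains, the sketch below finds all atom pairs closer than the model cutoff by brute force. It is not how KLIFF or the kim-api build neighbor lists: periodic images are ignored, and the function neighbor_pairs, the three-atom fragment, and the lattice constant of about 5.431 are assumptions chosen for the example.

```python
import itertools
import math

CUTOFF = 3.77118  # cutoff value printed by model.echo_model_params()
LATTICE_CONSTANT = 5.431  # approximate diamond-Si lattice constant (assumed)


def neighbor_pairs(positions, cutoff):
    """Return index pairs (i, j), i < j, whose distance is below cutoff."""
    pairs = []
    for (i, ri), (j, rj) in itertools.combinations(enumerate(positions), 2):
        if math.dist(ri, rj) < cutoff:
            pairs.append((i, j))
    return pairs


a = LATTICE_CONSTANT
# Three atoms from a diamond-Si cell; periodic images are deliberately ignored.
positions = [
    (0.0, 0.0, 0.0),
    (a / 4, a / 4, a / 4),
    (a / 2, a / 2, 0.0),
]

pairs = neighbor_pairs(positions, CUTOFF)
print(pairs)
```

Only nearest neighbors fall inside the cutoff here; the second-neighbor pair is slightly too far apart, so the pair list stays short even for large configurations.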
Loss function
KLIFF uses a loss function to quantify the difference between the
training set data and potential predictions and uses minimization
algorithms to reduce the loss as much as possible. KLIFF provides a
large number of minimization algorithms by interacting with
SciPy. For physics-motivated potentials, any
algorithm listed under scipy.optimize.minimize and scipy.optimize.least_squares
can be used. In the following code snippet, we create a loss of energy
and forces and use 2 processors to calculate the loss. The
L-BFGS-B minimization algorithm is applied to minimize the loss, and
the minimization is allowed to run for a maximum of 100 iterations.
steps = 100
loss = Loss(calc, nprocs=2)
# loss.minimize(method="L-BFGS-B", options={"disp": True, "maxiter": steps})
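To make the idea of a weighted energy-and-forces loss concrete, here is a toy sketch with a one-parameter harmonic model instead of the SW potential. The loss formula (half the weighted sum of squared energy and force residuals) is an assumption for illustration and may differ from KLIFF's exact definition, for example in normalization. To stay within the standard library, a simple bounded ternary search stands in for L-BFGS-B; all names below are hypothetical.

```python
ENERGY_WEIGHT = 1.0  # same weights as in the tutorial
FORCES_WEIGHT = 0.1
TRUE_K = 2.0  # spring constant used to generate the toy reference data


def model(k, x):
    """Toy model: harmonic energy and force at displacement x."""
    return 0.5 * k * x * x, -k * x


DISPLACEMENTS = [-0.3, -0.1, 0.2, 0.4]
REFERENCE = [model(TRUE_K, x) for x in DISPLACEMENTS]


def loss(k):
    """Half the weighted sum of squared energy and force residuals."""
    total = 0.0
    for x, (e_ref, f_ref) in zip(DISPLACEMENTS, REFERENCE):
        e, f = model(k, x)
        total += ENERGY_WEIGHT * (e - e_ref) ** 2
        total += FORCES_WEIGHT * (f - f_ref) ** 2
    return 0.5 * total


def minimize_1d(func, lower, upper, max_iter=100):
    """Bounded ternary search; valid because this loss is convex in k."""
    for _ in range(max_iter):
        left = lower + (upper - lower) / 3.0
        right = upper - (upper - lower) / 3.0
        if func(left) < func(right):
            upper = right
        else:
            lower = left
    return 0.5 * (lower + upper)


k_fit = minimize_1d(loss, 0.1, 10.0)
print(f"fitted k = {k_fit:.6f}, loss = {loss(k_fit):.3e}")
```

The bounds passed to minimize_1d play the same role as the lower and upper bounds given for A in set_opt_params, and max_iter mirrors the maxiter option.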
With the loss.minimize call enabled, the minimization stops after 27 steps. After the minimization, it is a good idea to save the model, which can be loaded later for retraining or for function evaluations. If you are satisfied with the fitted model, you can also write it as a KIM model that can be used with LAMMPS, GULP, ASE, etc. via the kim-api.
model.echo_opt_params()
model.save("kliff_model.yaml")
model.write_kim_model()
# model.load("kliff_model.yaml")
2025-05-16 21:17:01.991 | INFO | kliff.models.kim:write_kim_model:657 - KLIFF trained model write to /home/amit/Projects/COLABFIT/kliff/kliff/docs/source/tutorials/SW_StillingerWeber_1985_Si__MO_405512056662_006_kliff_trained
Parameter:A : [5.]
Parameter:B : [0.60222456]
Parameter:gamma : [1.5]
The first line of the above code generates the output. Note that the output
shown here still lists the initial guesses (A = 5.0, gamma = 1.5), because the
loss.minimize call is commented out in the snippet. After an actual
minimization, a comparison with the original parameters shows that
we recover the original parameters quite reasonably. The second line
saves the fitted model to a file named kliff_model.yaml on the disk,
and the third line writes out a KIM potential named
SW_StillingerWeber_1985_Si__MO_405512056662_006_kliff_trained.
For information about how to load a saved model, see Save and load a model.