Train a Stillinger-Weber potential#
In this tutorial, we train a Stillinger-Weber (SW) potential for silicon that is archived on OpenKIM_.
Before getting started to train the SW model, let’s first make sure it is installed.
If you haven’t already, follow installation
to install kim-api
and
kimpy
, and openkim-models
.
Then do $ kim-api-collections-management list
, and make sure
SW_StillingerWeber_1985_Si__MO_405512056662_006
is listed in one of the
collections.
Note
If you see ``SW_StillingerWeber_1985_Si__MO_405512056662_005`` (note the last three digits), you need to change ``model = KIMModel(model_name="SW_StillingerWeber_1985_Si__MO_405512056662_006")`` to the corresponding model name in your installation.
We are going to create potentials for diamond silicon, and fit the potentials to a
training set of energies and forces consisting of compressed and stretched diamond
silicon structures, as well as configurations drawn from molecular dynamics trajectories
at different temperatures.
Download the training set :download:Si_training_set.tar.gz <https://raw.githubusercontent.com/openkim/kliff/master/examples/Si_training_set.tar.gz>
.
(It will be automatically downloaded if not present.)
The data is stored in # extended xyz format, and see doc.dataset
for more
information of this format.
Warning
The ``Si_training_set`` is just a toy data set for the purpose to demonstrate how to use KLIFF to train potentials. It should not be used to train any potential for real simulations.
Let’s first import the modules that will be used in this example.
from kliff.calculators import Calculator
from kliff.dataset import Dataset
from kliff.dataset.weight import Weight
from kliff.loss import Loss
from kliff.models import KIMModel
from kliff.utils import download_dataset
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 1
----> 1 from kliff.calculators import Calculator
2 from kliff.dataset import Dataset
3 from kliff.dataset.weight import Weight
ModuleNotFoundError: No module named 'kliff'
Model#
We first create a KIM model for the SW potential, and print out all the available
parameters that can be optimized (we call this model parameters
).
model = KIMModel(model_name="SW_StillingerWeber_1985_Si__MO_405512056662_006")
model.echo_model_params()
#================================================================================
# Available parameters to optimize.
# Parameters in `original` space.
# Model: SW_StillingerWeber_1985_Si__MO_405512056662_006
#================================================================================
name: A
value: [15.28484792]
size: 1
name: B
value: [0.60222456]
size: 1
name: p
value: [4.]
size: 1
name: q
value: [0.]
size: 1
name: sigma
value: [2.0951]
size: 1
name: gamma
value: [2.51412]
size: 1
name: cutoff
value: [3.77118]
size: 1
name: lambda
value: [45.5322]
size: 1
name: costheta0
value: [-0.33333333]
size: 1
'#================================================================================\n# Available parameters to optimize.\n# Parameters in `original` space.\n# Model: SW_StillingerWeber_1985_Si__MO_405512056662_006\n#================================================================================\n\nname: A\nvalue: [15.28484792]\nsize: 1\n\nname: B\nvalue: [0.60222456]\nsize: 1\n\nname: p\nvalue: [4.]\nsize: 1\n\nname: q\nvalue: [0.]\nsize: 1\n\nname: sigma\nvalue: [2.0951]\nsize: 1\n\nname: gamma\nvalue: [2.51412]\nsize: 1\n\nname: cutoff\nvalue: [3.77118]\nsize: 1\n\nname: lambda\nvalue: [45.5322]\nsize: 1\n\nname: costheta0\nvalue: [-0.33333333]\nsize: 1\n\n'
The output is generated by the last line, and it tells us the name
, value
,
size
, data type
and a description
of each parameter.
Note
You can provide a ``path`` argument to the method ``echo_model_params(path)`` to write the available parameters information to a file indicated by ``path``.
Note
The available parameters information can also by obtained using the **kliff** `cmdlntool`: ``$ kliff model --echo-params SW_StillingerWeber_1985_Si__MO_405512056662_006``
Now that we know what parameters are available for fitting, we can optimize all or a subset of them to reproduce the training set.
model.set_opt_params(
A=[[5.0, 1.0, 20]], B=[["default"]], sigma=[[2.0951, "fix"]], gamma=[[1.5]]
)
model.echo_opt_params()
#================================================================================
# Model parameters that are optimized.
# Note that the parameters are in the transformed space if
# `params_transform` is provided when instantiating the model.
#================================================================================
A 1
5.0000000000000000e+00 1.0000000000000000e+00 2.0000000000000000e+01
B 1
6.0222455840000000e-01
sigma 1
2.0951000000000000e+00 fix
gamma 1
1.5000000000000000e+00
'#================================================================================\n# Model parameters that are optimized.\n# Note that the parameters are in the transformed space if \n# `params_transform` is provided when instantiating the model.\n#================================================================================\n\nA 1\n 5.0000000000000000e+00 1.0000000000000000e+00 2.0000000000000000e+01 \n\nB 1\n 6.0222455840000000e-01 \n\nsigma 1\n 2.0951000000000000e+00 fix \n\ngamma 1\n 1.5000000000000000e+00 \n\n'
Here, we tell KLIFF to fit four parameters B
, gamma
, sigma
, and A
of the
SW model. The information for each fitting parameter should be provided as a list of
list, where the size of the outer list should be equal to the size
of the parameter
given by model.echo_model_params()
. For each inner list, you can provide either one,
two, or three items.
One item. You can use a numerical value (e.g.
gamma
) to provide an initial guess of the parameter. Alternatively, the string'default'
can be provided to use the default value in the model (e.g.B
).Two items. The first item should be a numerical value and the second item should be the string
'fix'
(e.g.sigma
), which tells KLIFF to use the value for the parameter, but do not optimize it.Three items. The first item can be a numerical value or the string
'default'
, having the same meanings as the one item case. In the second and third items, you can list the lower and upper bounds for the parameters, respectively. A bound could be provided as a numerical values orNone
. The latter indicates no bound is applied.
The call of model.echo_opt_params()
prints out the fitting parameters that we
require KLIFF to optimize. The number 1
after the name of each parameter indicates
the size of the parameter.
Note
The parameters that are not included as a fitting parameter are fixed to the default values in the model during the optimization.
Training set#
KLIFF has a :class:~kliff.dataset.Dataset
to deal with the training data (and possibly
test data). Additionally, we define the energy_weight
and forces_weight
corresponding to each configuration using :class:~kliff.dataset.weight.Weight
. In
this example, we set energy_weight
to 1.0
and forces_weight
to 0.1
.
For the silicon training set, we can read and process the files by:
dataset_path = download_dataset(dataset_name="Si_training_set")
weight = Weight(energy_weight=1.0, forces_weight=0.1)
tset = Dataset(dataset_path, weight)
configs = tset.get_configs()
2023-08-01 22:27:08.799 | INFO | kliff.dataset.dataset:_read:398 - 1000 configurations read from /Users/mjwen.admin/Packages/kliff/docs/source/tutorials/Si_training_set
The configs
in the last line is a list of :class:~kliff.dataset.Configuration
.
Each configuration is an internal representation of a processed extended xyz file,
hosting the species, coordinates, energy, forces, and other related information of a
system of atoms.
Calculator#
:class:~kliff.calculator.Calculator
is the central agent that exchanges information
and orchestrate the operation of the fitting process. It calls the model to compute the
energy and forces and provide this information to the Loss function
_ (discussed below)
to compute the loss. It also grabs the parameters from the optimizer and update the
parameters stored in the model so that the up-to-date parameters are used the next time
the model is evaluated to compute the energy and forces. The calculator can be created
by:
calc = Calculator(model)
_ = calc.create(configs)
2023-08-01 22:27:09.207 | INFO | kliff.calculators.calculator:create:107 - Create calculator for 1000 configurations.
where calc.create(configs)
does some initializations for each
configuration in the training set, such as creating the neighbor list.
Loss function#
KLIFF uses a loss function to quantify the difference between the training set data and
potential predictions and uses minimization algorithms to reduce the loss as much as
possible. KLIFF provides a large number of minimization algorithms by interacting with
SciPy_. For physics-motivated potentials, any algorithm listed on
scipy.optimize.minimize
_ and scipy.optimize.least_squares
_ can be used. In the
following code snippet, we create a loss of energy and forces and use 2
processors
to calculate the loss. The L-BFGS-B
minimization algorithm is applied to minimize
the loss, and the minimization is allowed to run for a max number of 100 iterations.
steps = 100
loss = Loss(calc, nprocs=2)
loss.minimize(method="L-BFGS-B", options={"disp": True, "maxiter": steps})
2023-08-01 22:27:09.210 | INFO | kliff.loss:minimize:310 - Start minimization using method: L-BFGS-B.
2023-08-01 22:27:09.212 | INFO | kliff.loss:_scipy_optimize:429 - Running in multiprocessing mode with 2 processes.
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 3 M = 10
At X0 0 variables are exactly at the bounds
At iterate 0 f= 4.47164D+03 |proj g|= 4.47898D+03
At iterate 1 f= 1.20212D+03 |proj g|= 2.13266D+03
At iterate 1 f= 1.20212D+03 |proj g|= 2.13266D+03
At iterate 2 f= 2.16532D+02 |proj g|= 1.90519D+02
At iterate 2 f= 2.16532D+02 |proj g|= 1.90519D+02
At iterate 3 f= 2.07552D+02 |proj g|= 1.06071D+02
At iterate 3 f= 2.07552D+02 |proj g|= 1.06071D+02
At iterate 4 f= 1.70033D+02 |proj g|= 3.48082D+02
At iterate 5 f= 1.64800D+02 |proj g|= 3.74180D+02
At iterate 5 f= 1.64800D+02 |proj g|= 3.74180D+02
At iterate 6 f= 1.38087D+02 |proj g|= 1.31340D+02
At iterate 6 f= 1.38087D+02 |proj g|= 1.31340D+02
At iterate 7 f= 1.34855D+02 |proj g|= 1.45391D+01
At iterate 7 f= 1.34855D+02 |proj g|= 1.45391D+01
At iterate 8 f= 1.34599D+02 |proj g|= 1.58968D+01
At iterate 9 f= 1.32261D+02 |proj g|= 8.46707D+01
At iterate 10 f= 1.26954D+02 |proj g|= 2.36049D+02
At iterate 10 f= 1.26954D+02 |proj g|= 2.36049D+02
At iterate 11 f= 1.20788D+02 |proj g|= 2.42511D+02
At iterate 11 f= 1.20788D+02 |proj g|= 2.42511D+02
At iterate 12 f= 9.84653D+01 |proj g|= 2.90333D+02
At iterate 12 f= 9.84653D+01 |proj g|= 2.90333D+02
At iterate 13 f= 7.92970D+01 |proj g|= 1.27395D+02
At iterate 13 f= 7.92970D+01 |proj g|= 1.27395D+02
At iterate 14 f= 6.33426D+01 |proj g|= 1.12669D+02
At iterate 14 f= 6.33426D+01 |proj g|= 1.12669D+02
At iterate 15 f= 5.95658D+01 |proj g|= 2.50284D+02
At iterate 15 f= 5.95658D+01 |proj g|= 2.50284D+02
At iterate 16 f= 5.19898D+01 |proj g|= 2.97639D+02
At iterate 16 f= 5.19898D+01 |proj g|= 2.97639D+02
At iterate 17 f= 3.31620D+01 |proj g|= 2.39904D+02
At iterate 18 f= 2.00817D+01 |proj g|= 2.43105D+01
At iterate 19 f= 1.58825D+01 |proj g|= 1.94992D+02
At iterate 20 f= 1.00645D+01 |proj g|= 3.25943D+02
At iterate 20 f= 1.00645D+01 |proj g|= 3.25943D+02
At iterate 21 f= 4.82724D+00 |proj g|= 2.33796D+01
At iterate 21 f= 4.82724D+00 |proj g|= 2.33796D+01
At iterate 22 f= 3.26863D+00 |proj g|= 7.48010D+01
At iterate 22 f= 3.26863D+00 |proj g|= 7.48010D+01
At iterate 23 f= 2.81339D+00 |proj g|= 2.37520D+01
At iterate 23 f= 2.81339D+00 |proj g|= 2.37520D+01
At iterate 24 f= 2.53369D+00 |proj g|= 2.24782D+01
At iterate 25 f= 2.31427D+00 |proj g|= 4.19973D+01
At iterate 25 f= 2.31427D+00 |proj g|= 4.19973D+01
At iterate 26 f= 1.82162D+00 |proj g|= 5.03854D+01
At iterate 26 f= 1.82162D+00 |proj g|= 5.03854D+01
At iterate 27 f= 1.04312D+00 |proj g|= 2.46183D+01
At iterate 28 f= 7.95851D-01 |proj g|= 1.50873D+01
At iterate 28 f= 7.95851D-01 |proj g|= 1.50873D+01
At iterate 29 f= 7.40878D-01 |proj g|= 1.52873D+00
At iterate 29 f= 7.40878D-01 |proj g|= 1.52873D+00
At iterate 30 f= 7.05900D-01 |proj g|= 1.50051D+01
At iterate 30 f= 7.05900D-01 |proj g|= 1.50051D+01
At iterate 31 f= 6.95221D-01 |proj g|= 4.45629D+00
At iterate 31 f= 6.95221D-01 |proj g|= 4.45629D+00
At iterate 32 f= 6.94089D-01 |proj g|= 1.64352D-01
At iterate 32 f= 6.94089D-01 |proj g|= 1.64352D-01
At iterate 33 f= 6.94079D-01 |proj g|= 2.10362D-02
At iterate 33 f= 6.94079D-01 |proj g|= 2.10362D-02
At iterate 34 f= 6.94078D-01 |proj g|= 8.86005D-03
At iterate 35 f= 6.94078D-01 |proj g|= 8.83015D-03
At iterate 35 f= 6.94078D-01 |proj g|= 8.83015D-03
2023-08-01 22:27:43.444 | INFO | kliff.loss:minimize:312 - Finish minimization using method: L-BFGS-B.
At iterate 36 f= 6.94078D-01 |proj g|= 5.10514D-04
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
3 36 44 37 0 0 5.105D-04 6.941D-01
F = 0.69407801330347585
CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
fun: 0.6940780133034758
hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
jac: array([ 2.62567724e-05, -5.10513851e-04, 1.01474385e-05])
message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
nfev: 176
nit: 36
njev: 44
status: 0
success: True
x: array([14.93863362, 0.58740265, 2.20146126])
The minimization stops after running for 27 steps. After the minimization, we’d better save the model, which can be loaded later for the purpose to do a retraining or evaluations. If satisfied with the fitted model, you can also write it as a KIM model that can be used with LAMMPS_, GULP_, ASE_, etc. via the kim-api_.
model.echo_opt_params()
model.save("kliff_model.yaml")
model.write_kim_model()
# model.load("kliff_model.yaml")
2023-08-01 22:27:43.455 | INFO | kliff.models.kim:write_kim_model:692 - KLIFF trained model write to `/Users/mjwen.admin/Packages/kliff/docs/source/tutorials/SW_StillingerWeber_1985_Si__MO_405512056662_006_kliff_trained`
#================================================================================
# Model parameters that are optimized.
# Note that the parameters are in the transformed space if
# `params_transform` is provided when instantiating the model.
#================================================================================
A 1
1.4938633615724747e+01 1.0000000000000000e+00 2.0000000000000000e+01
B 1
5.8740264694219135e-01
sigma 1
2.0951000000000000e+00 fix
gamma 1
2.2014612645628717e+00
The first line of the above code generates the output. A comparison with the original
parameters before carrying out the minimization shows that we recover the original
parameters quite reasonably. The second line saves the fitted model to a file named
kliff_model.pkl
on the disk, and the third line writes out a KIM potential named
SW_StillingerWeber_1985_Si__MO_405512056662_006_kliff_trained
.
.. seealso::
For information about how to load a saved model, see doc.modules
.