Train a linear regression potential

In this tutorial, we train a linear regression model on descriptors computed with symmetry functions.
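For readers unfamiliar with symmetry functions, the following is a minimal sketch of a single radial symmetry function with a cosine cutoff, written with NumPy. It is not KLIFF's implementation: the function names `cos_cutoff` and `radial_symmetry_function`, the neighbor distances, and the values of `eta` and `rs` are illustrative assumptions and do not correspond to the "set30" hyperparameters used in this tutorial.

```python
import numpy as np


def cos_cutoff(r, rc):
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = rc."""
    r = np.asarray(r, dtype=float)
    return np.where(r < rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)


def radial_symmetry_function(distances, rc, eta, rs=0.0):
    """Sum of Gaussians of the neighbor distances, damped by the cutoff."""
    r = np.asarray(distances, dtype=float)
    return float(np.sum(np.exp(-eta * (r - rs) ** 2) * cos_cutoff(r, rc)))


# Hypothetical neighbor distances of one atom (illustrative values only).
neighbor_distances = [2.35, 2.35, 3.84, 6.0]

# The neighbor at 6.0 lies beyond the cutoff of 5.0 and contributes nothing.
g2 = radial_symmetry_function(neighbor_distances, rc=5.0, eta=0.5)
```

A descriptor is typically a vector of many such values per atom, each computed with different hyperparameters, so that the model sees a fixed-length description of each atomic environment.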

from kliff.calculators import CalculatorTorch
from kliff.dataset import Dataset
from kliff.descriptors import SymmetryFunction
from kliff.models import LinearRegression
from kliff.utils import download_dataset

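# Symmetry-function descriptor with a cosine cutoff of 5.0 for Si-Si pairs,
# using the predefined "set30" hyperparameters. With normalize=True the
# fingerprints are normalized using their mean and standard deviation.
descriptor = SymmetryFunction(
    cut_name="cos", cut_dists={"Si-Si": 5.0}, hyperparams="set30", normalize=True
)

# linear regression model built on top of the descriptor
model = LinearRegression(descriptor)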

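# training set: download the Si dataset and use its "varying_alat" subset
dataset_path = download_dataset(dataset_name="Si_training_set")
dataset_path = dataset_path.joinpath("varying_alat")
tset = Dataset(dataset_path)
configs = tset.get_configs()

# calculator: computes the fingerprints of all configurations;
# reuse=False regenerates them rather than reusing previously saved ones
calc = CalculatorTorch(model)
calc.create(configs, reuse=False)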
2023-08-01 21:59:01.754 | INFO     | kliff.dataset.dataset:_read:398 - 400 configurations read from /Users/mjwen.admin/Packages/kliff/docs/source/tutorials/Si_training_set/varying_alat
2023-08-01 21:59:01.755 | INFO     | kliff.calculators.calculator_torch:_get_device:592 - Training on cpu
2023-08-01 21:59:01.756 | INFO     | kliff.descriptors.descriptor:generate_fingerprints:103 - Start computing mean and stdev of fingerprints.
2023-08-01 21:59:11.127 | INFO     | kliff.descriptors.descriptor:generate_fingerprints:120 - Finish computing mean and stdev of fingerprints.
2023-08-01 21:59:11.129 | INFO     | kliff.descriptors.descriptor:generate_fingerprints:128 - Fingerprints mean and stdev saved to `fingerprints_mean_and_stdev.pkl`.
2023-08-01 21:59:11.129 | INFO     | kliff.descriptors.descriptor:_dump_fingerprints:163 - Pickling fingerprints to `fingerprints.pkl`
2023-08-01 21:59:11.131 | INFO     | kliff.descriptors.descriptor:_dump_fingerprints:175 - Processing configuration: 0.
2023-08-01 21:59:11.199 | INFO     | kliff.descriptors.descriptor:_dump_fingerprints:175 - Processing configuration: 100.
2023-08-01 21:59:11.261 | INFO     | kliff.descriptors.descriptor:_dump_fingerprints:175 - Processing configuration: 200.
2023-08-01 21:59:11.325 | INFO     | kliff.descriptors.descriptor:_dump_fingerprints:175 - Processing configuration: 300.
2023-08-01 21:59:11.386 | INFO     | kliff.descriptors.descriptor:_dump_fingerprints:218 - Pickle 400 configurations finished.

We could train the linear regression model by minimizing a loss function, as discussed in tut_nn. However, a linear regression model has an analytic (closed-form) solution, so it can be fitted directly without iterative optimization. This is done by calling the fit() function of its calculator.
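To illustrate what an analytic solution means here, the sketch below solves a generic linear least-squares problem with NumPy. It is only an illustration under stated assumptions, not the code that KLIFF runs inside fit(): the helper `fit_linear_least_squares` and the synthetic data are hypothetical, and the feature matrix `X` stands in for descriptor values.

```python
import numpy as np


def fit_linear_least_squares(X, y):
    """Return weights w and bias b that minimize ||X w + b - y||^2."""
    # Append a column of ones so the bias is learned as an extra weight.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    # lstsq computes the least-squares solution with an SVD-based solver,
    # which is more stable than forming the normal equations explicitly.
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs[:-1], coeffs[-1]


# Synthetic, noise-free data (hypothetical; stands in for descriptor values).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.5, -2.0, 0.5])
true_b = 0.7
y = X @ true_w + true_b

# A single linear solve recovers the parameters; no training loop is needed.
w, b = fit_linear_least_squares(X, y)
```

Because the targets here are an exact linear function of the features, the recovered `w` and `b` match `true_w` and `true_b` to numerical precision. With real data the same solve returns the best fit in the least-squares sense.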

# fit the model
calc.fit()


# save model
model.save("linear_model.pkl")
2023-08-01 21:59:11.626 | INFO     | kliff.models.linear_regression:fit:42 - Finished fitting model "LinearRegression"
Finished fitting model "LinearRegression"