Train a linear regression potential

In this tutorial, we train a linear regression model on descriptors computed with symmetry functions.

from kliff.calculators import CalculatorTorch
from kliff.dataset import Dataset
from kliff.descriptors import SymmetryFunction
from kliff.models import LinearRegression
from kliff.utils import download_dataset

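# descriptor: symmetry functions with a cosine cutoff and the "set30" hyperparameter set;
# normalize=True normalizes the fingerprints using their mean and stdev
descriptor = SymmetryFunction(
    cut_name="cos", cut_dists={"Si-Si": 5.0}, hyperparams="set30", normalize=True
)


# model: linear regression on top of the descriptor
model = LinearRegression(descriptor)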

# training set
dataset_path = download_dataset(dataset_name="Si_training_set")
dataset_path = dataset_path.joinpath("varying_alat")
tset = Dataset(dataset_path)
configs = tset.get_configs()

# calculator
calc = CalculatorTorch(model)
calc.create(configs, reuse=False)

Out:

2022-04-28 10:49:35.846 | INFO     | kliff.dataset.dataset:_read:397 - 400 configurations read from /Users/mjwen/Applications/kliff/examples/Si_training_set/varying_alat
2022-04-28 10:49:35.848 | INFO     | kliff.calculators.calculator_torch:_get_device:417 - Training on cpu
2022-04-28 10:49:35.849 | INFO     | kliff.descriptors.descriptor:generate_fingerprints:104 - Start computing mean and stdev of fingerprints.
2022-04-28 10:49:56.708 | INFO     | kliff.descriptors.descriptor:generate_fingerprints:121 - Finish computing mean and stdev of fingerprints.
2022-04-28 10:49:56.712 | INFO     | kliff.descriptors.descriptor:generate_fingerprints:129 - Fingerprints mean and stdev saved to `fingerprints_mean_and_stdev.pkl`.
2022-04-28 10:49:56.712 | INFO     | kliff.descriptors.descriptor:_dump_fingerprints:164 - Pickling fingerprints to `fingerprints.pkl`
2022-04-28 10:49:56.734 | INFO     | kliff.descriptors.descriptor:_dump_fingerprints:176 - Processing configuration: 0.
2022-04-28 10:49:57.294 | INFO     | kliff.descriptors.descriptor:_dump_fingerprints:176 - Processing configuration: 100.
2022-04-28 10:49:57.906 | INFO     | kliff.descriptors.descriptor:_dump_fingerprints:176 - Processing configuration: 200.
2022-04-28 10:49:58.640 | INFO     | kliff.descriptors.descriptor:_dump_fingerprints:176 - Processing configuration: 300.
2022-04-28 10:49:59.157 | INFO     | kliff.descriptors.descriptor:_dump_fingerprints:219 - Pickle 400 configurations finished.

We can train a linear regression model by minimizing a loss function, as discussed in Train a neural network potential. However, a linear regression model has an analytic (closed-form) least-squares solution, so we can fit it directly without iterative optimization. To do so, call the fit() function of its calculator.

# fit the model
calc.fit()


# save model
model.save("linear_model.pkl")

Out:

2022-04-28 10:49:59.693 | INFO     | kliff.models.linear_regression:fit:39 - fit model "LinearRegression" finished.
fit model "LinearRegression" finished.
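
To illustrate what an analytic solution means here, the following is a minimal NumPy sketch of closed-form least-squares fitting. It is not KLIFF's implementation: the helper name fit_linear_least_squares and the synthetic data are hypothetical stand-ins for descriptor values and reference targets, chosen only to show that the parameters of a linear model can be obtained in one step.

```python
import numpy as np


def fit_linear_least_squares(features, targets):
    """Return (weights, bias) minimizing ||features @ weights + bias - targets||^2."""
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Append a column of ones so the bias is fitted together with the weights.
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    # lstsq solves the least-squares problem directly; no iterative optimizer is needed.
    solution, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return solution[:-1], solution[-1]


# Synthetic stand-in data (hypothetical): 50 samples with 3 descriptor components each.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.5, -2.0, 0.5])
true_b = 0.3
y = X @ true_w + true_b  # noise-free targets generated by a known linear model

w, b = fit_linear_least_squares(X, y)
# Check that the fit recovers the parameters that generated the data.
print(np.allclose(w, true_w), np.isclose(b, true_b))
```

Because the targets in this sketch are generated without noise, the fitted weights and bias should match the generating ones to numerical precision. With real reference data the same call would return the parameters that minimize the squared error.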

Total running time of the script: ( 0 minutes 25.892 seconds)
