{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n\n# MCMC sampling\n\nIn this example, we demonstrate how to perform uncertainty quantification (UQ) using\nparallel tempered MCMC (PTMCMC). We use a Stillinger-Weber (SW) potential for silicon\nthat is archived in OpenKIM_.\n\nFor simplicity, we only set the energy-scaling parameters, i.e., ``A`` and ``lambda`` as\nthe tunable parameters. Furthermore, these parameters are physically constrained to be\npositive, thus we will work in log parameterization, i.e. ``log(A)`` and ``log(lambda)``.\nThese parameters will be calibrated to energies and forces of a small dataset,\nconsisting of 4 compressed and stretched configurations of diamond silicon structure.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To start, let's first install the SW model::\n\n $ kim-api-collections-management install user SW_StillingerWeber_1985_Si__MO_405512056662_006\n\n.. seealso::\n This installs the model and its driver into the ``User Collection``. See\n `install_model` for more information about installing KIM models.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from multiprocessing import Pool\n\nimport numpy as np\nfrom corner import corner\n\nfrom kliff.calculators import Calculator\nfrom kliff.dataset import Dataset\nfrom kliff.dataset.weight import MagnitudeInverseWeight\nfrom kliff.loss import Loss\nfrom kliff.models import KIMModel\nfrom kliff.models.parameter_transform import LogParameterTransform\nfrom kliff.uq import MCMC, autocorr, mser, rhat\nfrom kliff.utils import download_dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before running MCMC, we need to define a loss function and train the model. More detail\ninformation about this step can be found in `tut_kim_sw` and\n`tut_params_transform`.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Instantiate a transformation class to do the log parameter transform\nparam_names = [\"A\", \"lambda\"]\nparams_transform = LogParameterTransform(param_names)\n\n# Create the model\nmodel = KIMModel(\n model_name=\"SW_StillingerWeber_1985_Si__MO_405512056662_006\",\n params_transform=params_transform,\n)\n\n# Set the tunable parameters and the initial guess\nopt_params = {\n \"A\": [[\"default\", -8.0, 8.0]],\n \"lambda\": [[\"default\", -8.0, 8.0]],\n}\n\nmodel.set_opt_params(**opt_params)\nmodel.echo_opt_params()\n\n# Get the dataset and set the weights\ndataset_path = download_dataset(dataset_name=\"Si_training_set_4_configs\")\n# Instantiate the weight class\nweight = MagnitudeInverseWeight(\n weight_params={\n \"energy_weight_params\": [0.0, 0.1],\n \"forces_weight_params\": [0.0, 0.1],\n }\n)\n# Read the dataset and compute the weight\ntset = Dataset(dataset_path, weight=weight)\nconfigs = tset.get_configs()\n\n# Create calculator\ncalc = Calculator(model)\nca = calc.create(configs)\n\n# Instantiate the loss function\nresidual_data = {\"normalize_by_natoms\": False}\nloss = Loss(calc, residual_data=residual_data)\n\n# Train the model\nloss.minimize(method=\"L-BFGS-B\", options={\"disp\": True})\nmodel.echo_opt_params()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To perform MCMC simulation, we use :class:`~kliff.uq.MCMC`.This class interfaces with\nptemcee_ Python package to run PTMCMC, which utilizes the affine invariance property\nof MCMC sampling. We simulate MCMC sampling at several different temperatures to\nexplore the effect of the scale of bias and overall error bars.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Define some variables that correspond to the dimensionality of the problem\nntemps = 4 # Number of temperatures to simulate\nndim = calc.get_num_opt_params() # Number of parameters\nnwalkers = 2 * ndim # Number of parallel walkers to simulate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start by instantiating :class:`~kliff.uq.MCMC`. This requires :class:`~kliff.loss.Loss`\ninstance to construct the likelihood function. Additionally, we can specify the prior\n(or log-prior to be more precise) via the ``logprior_fn`` argument, with the default\noption be a uniform prior that is bounded over a finite range that we specify via the\n``logprior_args`` argument.\n\n
When user uses the default uniform prior but doesn't specify the bounds, then the\n sampler will retrieve the bounds from the model\n (see :meth:`~kliff.models.KIMModel.set_opt_params`). Note that an error will be\n raised when the uniform prior extends to infinity in any parameter direction.
It has been shown that including temperatures higher than $T_0$ helps the\n convergence of walkers sampled at $T_0$.
As a default, the algorithm will set the number of walkers for each sampling\n temperature to be twice the number of parameters, but we can also specify it via\n the ``nwalkers`` argument.
The initial states $p_0$ need to be an array with shape ``(K, L, N,)``, where\n ``K``, ``L``, and ``N`` are the number of temperatures, walkers, and parameters,\n respectively.
:func:`~kliff.uq.mcmc_utils.mser` only compute the estimation of the burn-in time for\n one single temperature, walker, and parameter. Thus, we need to calculate the burn-in\n time for each temperature, walker, and parameter separately.
:func:`~kliff.uq.mcmc_utils.acorr` is a wrapper for emcee.autocorr.integrated_time_,\n As such, the shape of the input array for this function needs to be ``(L, M, N,)``,\n where ``L``, ``M``, and ``N`` are the number of walkers, steps, and parameters,\n respectively. This also implies that we need to perform the calculation for each\n temperature separately.
:func:`~kliff.uq.mcmc_utils.rhat` only computes the PSRF for one temperature, so that\n the calculation needs to be carried on for each temperature separately.
As an alternative, KLIFF also provides a wrapper to emcee_. This can be accessed by\n setting ``sampler=\"emcee\"`` when instantiating :class:`~kliff.uq.MCMC`. For further\n documentation, see :class:`~kliff.uq.EmceeSampler`.