Transforms ========== :py:mod:`~kliff.transforms` is a collection of commonly used functions, used to change or transform, the datasets/parameters. Transforms module is divided as, - Coordinate transforms: Mapping the coordinates of a configuration to invariant representations, which can be used in ML models. - Descriptors - Radial Graphs - Properties: Transform properties associated with the configurations. Often it takes input as a complete dataset, and aggregate statistics of property of entire dataset before transformations like normalization - Parameters ( only available for the physic based models for now ) Transform the parameter space for enabling better sampling/training. Configuration Transforms ------------------------ Descriptor ~~~~~~~~~~ The ``Descriptors`` module bridges the `libdescriptor `__ library with KLIFF’s data structures (i.e., ``Configuration``, ``NeighborList``). It provides: - ``show_available_descriptors()``: A helper function that prints all descriptor names. - ``Descriptor``: - Takes a ``cutoff``, ``species``, ``descriptor name``, and ``hyperparameters``. - Computes descriptors (``forward``) and their derivatives w.r.t. atomic coordinates (``backward``). - Can store results directly in the ``Configuration`` object’s fingerprint. - ``default_hyperparams``: Module containing collection of sane defaults for different descriptors .. tip:: This module relies on the optional dependency ``libdescriptor``. .. code-block:: bash conda install -c ipcamit libdescriptor .. code-block:: python from kliff.transforms.configuration_transforms.descriptors import show_available_descriptors show_available_descriptors() .. parsed-literal:: -------------------------------------------------------------------------------- Descriptors below are currently available, select them by `descriptor: str` attribute: -------------------------------------------------------------------------------- SymmetryFunctions Bispectrum SOAP .. code-block:: python from kliff.transforms.configuration_transforms.descriptors import Descriptor from kliff.transforms.configuration_transforms.default_hyperparams import symmetry_functions_set30 desc = Descriptor(cutoff=3.77, species=["Si"], descriptor="SymmetryFunctions", hyperparameters=symmetry_functions_set30()) This ``Descriptor`` module is designed to work as a thin wrapper over ``libdescriptor`` library, and provides ``forward`` and ``backward`` function for computing the descriptors, and their vector-Jacobian products for gradient. Given below is a brief overview of how typical ML potential evaluates forces, and how it is achieved in KLIFF. Theory of ML with descriptors ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Descriptors (:math:`\zeta`) are used in machine learning to transform raw input features (:math:`\mathbf{\zeta}`) into a higher-dimensional representation that captures more complex patterns and relationships. This transformation is particularly useful in various applications, including molecular dynamics, material science, and geometric deep learning. Forward Pass ^^^^^^^^^^^^ 1. Descriptor Calculation - The input features :math:`x` (e.g., atomic coordinates, molecular structures) are mapped to a higher-dimensional space using a function :math:`F`. - The output of this mapping is the descriptor :math:`\mathbf{\zeta}`: .. math:: \mathbf{\zeta} = F(\mathbf{x}) 2. Model Prediction: - The descriptor :math:`\zeta` is then used as input to a machine learning model (e.g., neural network) to make predictions: .. math:: y = \text{ML Model}(\mathbf{\zeta}) Backward Pass ''''''''''''' 1. Loss Calculation: - A loss function measures the difference between the model’s predictions and the ground truth: .. math:: \mathcal{L} = \text{Loss}(y, \text{ground truth}) 2. Derivative of Loss with Respect to Descriptors: - During backpropagation, the first step is to compute the derivative of the loss with respect to the descriptors: .. math:: \frac{\partial \mathcal{L}}{\partial \mathbf{\zeta}} = \nabla_\mathbf{\zeta} \mathcal{L} 3. Vector-Jacobian Product: - The next step is to compute the derivative of the descriptors with respect to the input coordinates :math:`\mathbf{x}`. This is represented by the Jacobian matrix: .. math:: J = \frac{\partial \mathbf{\zeta}}{\partial \mathbf{x}} = \nabla_x F(x) - To efficiently compute the gradient of the loss with respect to the input :math:`\mathbf{x}`, we use the vector-Jacobian product: .. math:: \frac{\partial \mathcal{L}}{\partial \mathbf{x}} = J \cdot \frac{\partial \mathcal{L}}{\partial \mathbf{\zeta}} 4. Gradient Flow: - The gradients are then used to update the model parameters during optimization (e.g., gradient descent): .. math:: \text{Parameters} \leftarrow \text{Parameters} - \eta \frac{\partial \mathcal{L}}{\partial x} where :math:`\eta` is the learning rate. Forces ^^^^^^ Forces for an ML model can be evaluated similary .. math:: \mathbf{\mathcal{F}} = - \frac{\partial E}{\partial \mathbf{\zeta}} \cdot \frac{\partial \mathbf{\zeta}}{\partial \mathbf{x}} See example below. KLIFF Descriptor ``backward`` and ``forward`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: python # generate Si configuration from ase.build import bulk from kliff.dataset import Configuration import numpy as np Si_diamond = bulk("Si", a=5.44) Si_config = Configuration.from_ase_atoms(Si_diamond) # FORWARD: generating the descriptor $\zeta$ zeta = desc.forward(Si_config) # BACKWARD: vector-jacobian product against arbitrary vector (\partial L/\partial \zeta) dE_dZeta = np.random.random(zeta.shape) forces = - desc.backward(Si_config, dE_dZeta=dE_dZeta) print(forces) .. parsed-literal:: [[-0. -0. -0.] [-0. -0. -0.]] Radial Graphs ~~~~~~~~~~~~~ Similarly users can also generate radial graphs for graph neural networks. .. code-block:: python from kliff.transforms.configuration_transforms.graphs import RadialGraph graph_generator = RadialGraph(species=["Si"], cutoff=3.77, n_layers=1) # dummy energy, needed for eval Si_config._energy = 0.0 Si_config._forces = np.zeros_like(Si_config.coords) print(graph_generator.forward(Si_config)) .. parsed-literal:: PyGGraph(energy=0.0, forces=[2, 3], n_layers=1, coords=[54, 3], images=[54], species=[54], z=[54], cell=[9], contributions=[54], num_nodes=54, idx=-1, edge_index0=[2, 14]) Due to overloaded ``__call__`` method, you can also use the the RadialGraph module as a "function call". .. code-block:: python graph = graph_generator(Si_config) print(graph.keys()) .. parsed-literal:: ['cell', 'coords', 'energy', 'contributions', 'images', 'z', 'species', 'idx', 'forces', 'num_nodes', 'edge_index0', 'n_layers', 'shifts'] The meaning of these keys are defined below: +---------------------------------------+------------------------------+ | Parameter | Description | +=======================================+==============================+ | ``cell`` | The simulation cell | | | dimensions, typically a 3×3 | | | tensor representing the | | | periodic boundary conditions | | | (PBC). | +---------------------------------------+------------------------------+ | ``coords`` | Cartesian coordinates of the | | | atomic positions in the | | | structure. | +---------------------------------------+------------------------------+ | ``energy`` | Total energy of the system, | | | used as a target property in | | | training. | +---------------------------------------+------------------------------+ | ``contributions`` | Energy contributions from | | | individual atoms or | | | interactions (optional, | | | depending on model), | | | equivalent to batch index | +---------------------------------------+------------------------------+ | ``images`` | mapping from ghost atom | | | number to actual atom index | | | (for summing up forces) | +---------------------------------------+------------------------------+ | ``z`` | Atomic numbers of the | | | elements in the structure, | | | serving as node features. | +---------------------------------------+------------------------------+ | ``species`` | unique indexes for each | | | species of atom present | | | (from 0 to total number of | | | species present, i.e. for | | | H2O, ``species`` go from 0 | | | to 1, with H mapped to 0 and | | | O mapped to 1). | +---------------------------------------+------------------------------+ | ``idx`` | Internal index of the | | | configuration or dataset, | | | set to -1 as default. | +---------------------------------------+------------------------------+ | ``forces`` | Forces acting on each atom, | | | often used as labels in | | | force-predicting models (for | | | contributing atoms). | +---------------------------------------+------------------------------+ | ``num_nodes`` | Number of nodes (atoms) in | | | the graph representation of | | | the structure (including | | | contributing and | | | non-contributing atoms). | +---------------------------------------+------------------------------+ | ``edge_index{0 - n}`` | Connectivity information | | | (edges) in COO like format, | | | defining which atoms are | | | connected in the graph (2 x | | | N matrix). The storage | | | format is “staged graph” | | | where graph needed for each | | | convolution step | | | (``n = n_layers - 1``) gets | | | a corresponding edge index. | +---------------------------------------+------------------------------+ | ``n_layers`` | Number of layers in the | | | generated staged graph. | +---------------------------------------+------------------------------+ | ``shifts`` | vectors to add in the | | | position vectors of the | | | destination edge atom to get | | | correct vector in minimum | | | image convention like PBC. | | | When ``mic=False`` this | | | defaults to al zeros. | +---------------------------------------+------------------------------+ Minimum image convention vs staged graphs ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ``RadialGraph`` module provides functionality to both generate the staged graphs for parallel convolution [see theory section] and conventional edge graphs for more efficient training. It be accessed via ``mic`` keyword argument in the module (default value ``False``). Major difference between the two is that staged graphs explicitly maps the ghost atoms, whereas the mic graph indexes the periodic images using a ``shifts`` vector. Example below highlights the difference between the two approach .. code-block::python from kliff.transforms.configuration_transforms.graphs import RadialGraph from ase.build import bulk from kliff.dataset import Configuration from ase.calculators.singlepoint import SinglePointCalculator import torch # RadialGraph is native torch module import numpy as np # Let us create a simple bulk Si configuration atoms = bulk("Si") calc = SinglePointCalculator(atoms, energy=0.0, forces=np.zeros_like(atoms.get_global_number_of_atoms())) atoms.calc = calc config = Configuration.from_ase_atoms(atoms) staged_graph_module = RadialGraph(species=["Si"], cutoff=3.77, n_layers=2) # to enable MIC, give mic=True, and omit the n_layers argument mic_graph_module = RadialGraph(species=["Si"], cutoff=3.77, mic=True) staged_graph = staged_graph_module(config) mic_graph = mic_graph_module(config) The major difference between the two graphs is now that staged graphs always uses non-periodic atoms, and contains one graph per convolution. Therefore it contains unique edge-pairs for each edge. MIC graph only uses the original periodic atom image and hence contains one graph duplicate edges, but unique shift vectors. .. code-block::python print(staged_graph.edge_index0) print(staged_graph.edge_index1) print(staged_graph.shifts[0:3,:]) .. parsed-literal:: tensor([[ 0, 0, 0, 0, 1, 1, 1, 1, 77, 117, 125, 126, 134, 174], [ 1, 77, 117, 125, 0, 126, 134, 174, 0, 0, 0, 1, 1, 1]]) tensor([[ 0, 0, 0, 0, 1, 1, 1, 1, 76, 77, 77, 77, 77, 78, 79, 86, 87, 116, 117, 117, 117, 117, 118, 119, 124, 125, 125, 125, 125, 126, 126, 126, 126, 127, 132, 133, 134, 134, 134, 134, 135, 164, 165, 172, 173, 174, 174, 174, 174, 175], [ 1, 77, 117, 125, 0, 126, 134, 174, 77, 0, 76, 78, 86, 77, 126, 77, 134, 117, 0, 116, 118, 164, 117, 126, 125, 0, 124, 132, 172, 1, 79, 119, 127, 126, 125, 134, 1, 87, 133, 135, 134, 117, 174, 125, 174, 1, 165, 173, 175, 174]]) tensor([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]], dtype=torch.float64) .. code-block:: python print(mic_graph.edge_index0) print(mic_graph.edge_index1) print(mic_graph.shifts[0:3,:]) .. parsed-literal:: tensor([[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]]) KeyError: 'edge_index1' tensor([[ 0.0000, 0.0000, 0.0000], [ 0.0000, -2.7150, -2.7150], [-2.7150, 0.0000, -2.7150]], dtype=torch.float64) However these differences are more cosmetic, and both graphs will yield equivalent results if used correctly: .. code-block:: python pos_vectors_ghost = staged_graph.coords[staged_graph.edge_index0[1]] - staged_graph.coords[staged_graph.edge_index0[0]] pos_vectors_mic_incorrect = mic_graph.coords[mic_graph.edge_index0[1]] - mic_graph.coords[mic_graph.edge_index0[0]] print("Naive vectors:", torch.allclose(pos_vectors_ghost, pos_vectors_mic_incorrect) print("Shift compensated vectors:", torch.allclose(pos_vectors_ghost, pos_vectors_mic_incorrect + mic_graph.shifts)) # add shift vectors to get correct result .. parsed-literal:: Naive vectors: False Shift compensated vectors: True .. tip:: Rule of thumb is that MIC graphs are better at training for their efficiency, staged graphs are better at large scale simulations due to their parallelism and domain decomposition.