kliff.trainer.base_trainer

class kliff.trainer.base_trainer.Trainer(training_manifest, model=None)

Base class for all trainers.

It provides the basic structure for training a model; derived classes implement the required methods. The base class handles the common setup: creating the work directory, saving the configuration, and setting up the indices for the training and validation datasets. It also saves hashes of the configuration fingerprints and of the training configuration to the work directory, which supports reproducibility of the training process and makes restarting easy.

The core trainer class provides the following functionality:

- Set up the work directory
- Set up the dataset
- Set up the test/train split

Model, parameter transform, and optimizer setup are left for the derived classes to implement.

Environment variables: KLIFF_LMDB_MAP_SIZE – LMDB mmap size, defaults to 1e12
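A minimal sketch of how such a variable might be read, assuming the value is parsed as a number and the documented default of 1e12 applies when it is unset. The helper name `lmdb_map_size` is hypothetical and not part of KLIFF.

```python
import os

DEFAULT_MAP_SIZE = int(1e12)  # default value stated in the documentation


def lmdb_map_size(environ=os.environ):
    """Return the map size from KLIFF_LMDB_MAP_SIZE, or the default.

    Hypothetical helper for illustration; KLIFF's own parsing may differ.
    """
    raw = environ.get("KLIFF_LMDB_MAP_SIZE")
    if raw is None:
        return DEFAULT_MAP_SIZE
    # Accept plain integers ("1000000") and scientific notation ("1e9").
    return int(float(raw))
```

Setting the variable before starting the training process (for example `export KLIFF_LMDB_MAP_SIZE=1e10` in a shell) would then change the value that is picked up.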

parse_manifest(manifest)

Process the raw manifest dictionary into the formatted manifest. This includes mapping string fields to enums and setting sane defaults for missing fields.

Parameters: manifest (dict) – raw incoming configuration
Returns: Processed manifest
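The pattern described here (fill in defaults, then map strings to enums) can be sketched as follows. The enum, the field names, and the default values are all hypothetical; the real manifest schema is not shown in this section.

```python
from copy import deepcopy
from enum import Enum


class ModelType(Enum):  # hypothetical enum, for illustration only
    KIM = "kim"
    TAR = "tar"


# Hypothetical defaults; not taken from KLIFF.
DEFAULTS = {"seed": 12345, "model_type": "kim"}


def parse_manifest_sketch(manifest):
    """Fill in missing fields, then convert string fields to enum members."""
    parsed = {**DEFAULTS, **deepcopy(manifest)}
    # Enum lookup by value raises ValueError for unknown strings.
    parsed["model_type"] = ModelType(str(parsed["model_type"]).lower())
    return parsed
```

Converting to enums early means that a misspelled value fails at parse time rather than deep inside a setup method.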

config_to_dict(): Convert the configuration to a dictionary.

classmethod from_file(filename)

Load the manifest from a YAML file.

Parameters: filename (Path) – name of the YAML file
Returns: Trainer instance

get_trainer_hash(): Get the hash of the current configuration, computed from the string form of the configuration dictionary. The hash is used to create a unique directory for the current run.
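As an illustration of hashing a configuration dictionary through its string form, here is a minimal sketch. The hash function (SHA-256) and the JSON serialization are assumptions; the algorithm KLIFF actually uses is not stated here.

```python
import hashlib
import json


def config_hash(config):
    """Hash a configuration dict via a canonical string representation.

    Assumption: JSON with sorted keys as the string form, SHA-256 as the hash.
    """
    # sort_keys makes the string independent of key insertion order;
    # default=str covers values such as Path objects.
    text = json.dumps(config, sort_keys=True, default=str)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Because identical configurations map to the same string, they map to the same hash and therefore to the same run directory, which is what makes resuming possible.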

initialize(): Initialize the trainer. Assigns the configuration objects and calls the setup methods.

seed_all(): Seed all the random number generators.
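A sketch of what seeding "all" generators typically involves, assuming the standard-library and NumPy generators are the ones in use. The function name is hypothetical and this is not the KLIFF implementation.

```python
import random

try:
    import numpy as np
except ImportError:  # NumPy is treated as optional in this sketch
    np = None


def seed_all_sketch(seed):
    """Seed the generators that are available in this environment."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    # A torch-based trainer would typically also call torch.manual_seed(seed).
```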

setup_workspace(): Check all existing runs in the root directory to see whether the run has finished. If it has finished, a new run is started. If it has not finished, training is resumed, unless resuming was not requested, in which case a new run is started.
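The decision logic can be sketched as below. The `.finished` marker file, the use of the lexicographically last directory as the "previous run", and the function name are all assumptions made for illustration; how KLIFF records run completion is not described here.

```python
from pathlib import Path


def pick_run_dir(root, run_name, resume):
    """Return (run_dir, resumed) following the resume rules described above."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    existing = sorted(p for p in root.iterdir() if p.is_dir())
    if resume and existing:
        last = existing[-1]
        # ".finished" is a hypothetical marker written at the end of a run.
        if not (last / ".finished").exists():
            return last, True
    # Either nothing to resume, the last run finished, or resume is off.
    new_dir = root / run_name
    new_dir.mkdir(exist_ok=True)
    return new_dir, False
```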

setup_dataset(): Set up the dataset based on the provided information. If per-atom prediction logging is requested, it also assigns a sequential index to each configuration for logging. TODO: ColabFit integration for extreme-scale datasets.

save_config(): Hash and save the configuration to the current run directory.

setup_dataset_transforms(): Set up the dataset transforms based on the provided information. If no transforms are provided, it raises an error. If the transform is of type ASE, it is loaded from the ASE library. If the transform is of type KLIFF, it is loaded from the KLIFF library. Left for the derived classes to implement.

setup_model(): Set up the model based on the provided information. If the model is not provided, it is loaded from model_path. If model_path is not provided, it raises an error. If the model_type is KIM, the model is loaded from the KIM model repository. If a KIM-type model is installed in the CWD, it is loaded from there, and model_path is set to the KIM CWD. If the model is of type TAR, it is untarred and model_path is set to the untarred directory. Left for the derived classes to implement.

setup_parameter_transforms(): Set up the transformed parameter space for models. It can in principle be used for any model type, but because models differ significantly in how they handle their parameters, it is left for the subclass to implement. To keep the initialize function consistent, this method does not raise NotImplementedError; it quietly passes instead. So be aware.
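The consequence of a hook that quietly passes can be sketched with a toy class hierarchy. The class names are hypothetical, and raising NotImplementedError from the other hook is an assumption made here for contrast; this is not the KLIFF source.

```python
class SketchBaseTrainer:
    """Toy base class illustrating a mandatory and an optional hook."""

    def initialize(self):
        # The same call sequence runs for every subclass.
        self.setup_model()
        self.setup_parameter_transforms()

    def setup_model(self):
        # Assumed behavior for this sketch: mandatory hooks fail loudly.
        raise NotImplementedError("setup_model must be implemented")

    def setup_parameter_transforms(self):
        # Deliberately a no-op, so subclasses without transforms
        # can still go through initialize() unchanged.
        pass


class SketchTrainer(SketchBaseTrainer):
    def setup_model(self):
        self.model = "toy-model"  # placeholder object
```

A subclass that forgets to override `setup_parameter_transforms` therefore trains in the untransformed parameter space without any warning, which is the caveat the description points out.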

setup_optimizer(): Set up the optimizer based on the provided information. If the optimizer is not provided, it is loaded by optimizer_name. If optimizer_name is not provided, it raises an error. If the optimizer_provider is scipy, the optimizer is loaded from scipy.optimize. If the optimizer_provider is torch, it is loaded from torch.optim. Left for the derived classes to implement.

setup_dataset_split()

Simple test/train split for now; more options, such as stratification, will be added in the future.
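A simple index-based split of this kind might look like the following sketch. The function name, the `train_fraction` parameter, and the use of shuffling are assumptions for illustration, not KLIFF's options.

```python
import random


def split_indices(n_configs, train_fraction=0.8, seed=0):
    """Shuffle indices 0..n_configs-1 and split them into two sorted lists."""
    indices = list(range(n_configs))
    # A dedicated Random instance keeps the split reproducible
    # without touching the global generator state.
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_configs))
    return sorted(indices[:n_train]), sorted(indices[n_train:])
```

Storing the resulting index lists alongside the run, as the base class description mentions for training and validation indices, lets a restarted run reuse exactly the same split.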

log_per_atom_outputs(epoch, idx, predictions)

Log the per-atom outputs to the database. It saves a dictionary of predictions and n_atoms for each configuration. The key for predictions is pred_{n}, where n is the index of the prediction. For more than one prediction, it saves pred_0, pred_1, pred_2, etc. The key for the indices is idx.

Parameters:

epoch (int) – Current epoch
idx (Union[List[int], ndarray]) – List of indices of the configurations
predictions (List[ndarray]) – List of predictions for the configurations
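The key layout described above can be illustrated with a small sketch. The exact record structure is an assumption: here n_atoms is taken to be the length of each prediction array, and plain lists stand in for ndarrays. The function name is hypothetical.

```python
def build_log_record(idx, predictions):
    """Build a dict with keys "idx", "n_atoms", "pred_0", "pred_1", ...

    Assumption: each prediction has one entry per atom, so its length
    gives n_atoms for the corresponding configuration.
    """
    record = {"idx": list(idx)}
    record["n_atoms"] = [len(p) for p in predictions]
    for n, pred in enumerate(predictions):
        record[f"pred_{n}"] = pred
    return record
```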

loss(*args, **kwargs)

checkpoint(*args, **kwargs)

train_step(*args, **kwargs)

validation_step(*args, **kwargs)

get_optimizer(*args, **kwargs)

train(*args, **kwargs)

save_kim_model(*args, **kwargs)

write_training_env_edn(path): Generate the training_env.edn file for the KIM API. This file is used to accurately determine the training environment. It is saved in the current run directory and contains the hash of the configuration and the list of all Python dependencies from pip freeze.
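A rough sketch of collecting the two pieces of information and rendering them as an EDN map. The key names `config-hash` and `python-dependencies` and both function names are hypothetical; the actual layout of training_env.edn is not specified in this section.

```python
import json
import subprocess
import sys


def pip_freeze():
    """Return installed packages as reported by `pip freeze`."""
    out = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def format_training_env(config_hash, dependencies):
    """Render an EDN map with a hash string and a vector of strings.

    The key names are hypothetical, chosen only for this sketch.
    """
    # json.dumps yields double-quoted, escaped strings, which are also
    # valid EDN string literals for ordinary package specifiers.
    deps = " ".join(json.dumps(d) for d in dependencies)
    quoted_hash = json.dumps(config_hash)
    return f'{{"config-hash" {quoted_hash}\n "python-dependencies" [{deps}]}}\n'
```

Calling `format_training_env(some_hash, pip_freeze())` and writing the result to a file in the run directory would give a record of the environment in which a model was trained.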

exception kliff.trainer.base_trainer.TrainerError(message): Exceptions to be raised in Trainer and associated classes.