kliff.dataset#

class kliff.dataset.Configuration(cell, species, coords, PBC, energy=None, forces=None, stress=None, weight=None, identifier=None)[source]#

An atomic configuration. This class stores the information of an atomic configuration, e.g. supercell, species, coords, energy, and forces.

Parameters
  • cell (ndarray) – A 3x3 matrix of the lattice vectors. The first, second, and third rows are a_1, a_2, and a_3, respectively.

  • species (List[str]) – A list of N strings giving the species of the atoms, where N is the number of atoms.

  • coords (ndarray) – An Nx3 matrix of the coordinates of the atoms, where N is the number of atoms.

  • PBC (List[bool]) – A list with 3 components indicating whether periodic boundary conditions are used along the directions of the first, second, and third lattice vectors.

  • energy (Optional[float]) – energy of the configuration.

  • forces (Optional[ndarray]) – An Nx3 matrix of the forces on the atoms, where N is the number of atoms.

  • stress (Optional[List[float]]) – A list with 6 components in Voigt notation, i.e. \sigma=[\sigma_{xx},\sigma_{yy},\sigma_{zz},\sigma_{yz},\sigma_{xz},\sigma_{xy}]. See: https://en.wikipedia.org/wiki/Voigt_notation

  • weight (Optional[Weight]) – an instance that computes the weight of the configuration in the loss function.

  • identifier (Union[str, Path, None]) – a (unique) identifier of the configuration

classmethod from_file(filename, weight=None, file_format='xyz')[source]#

Read configuration from file.

Parameters
  • filename (Path) – Path to the file that stores the configuration.

  • file_format (str) – Format of the file that stores the configuration (e.g. xyz).

to_file(filename, file_format='xyz')[source]#

Write the configuration to file.

Parameters
  • filename (Path) – Path to the file that stores the configuration.

  • file_format (str) – Format of the file that stores the configuration (e.g. xyz).

property cell: ndarray#

3x3 matrix of the lattice vectors of the configuration.

Return type

ndarray

property PBC: List[bool]#

A list with 3 components indicating whether periodic boundary conditions are used along the directions of the first, second, and third lattice vectors.

Return type

List[bool]

property species: List[str]#

Species strings of all atoms.

Return type

List[str]

property coords: ndarray#

An Nx3 matrix of the Cartesian coordinates of all atoms.

Return type

ndarray

property energy: Optional[float]#

Potential energy of the configuration.

Return type

Optional[float]

property forces: ndarray#

An Nx3 matrix of the forces on the atoms.

Return type

ndarray

property stress: List[float]#

Stress of the configuration. The stress is given in Voigt notation, i.e. \sigma=[\sigma_{xx},\sigma_{yy},\sigma_{zz},\sigma_{yz},\sigma_{xz},\sigma_{xy}].

Return type

List[float]
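The Voigt ordering above can be illustrated with a minimal sketch that flattens a symmetric 3x3 stress tensor into the six-component list. The helper name `stress_tensor_to_voigt` is hypothetical and not part of KLIFF; it only demonstrates the component order stated on this page.

```python
from typing import List, Sequence


def stress_tensor_to_voigt(tensor: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a symmetric 3x3 tensor into [xx, yy, zz, yz, xz, xy].

    Hypothetical helper for illustration; not a KLIFF function.
    """
    # (row, column) pairs in the Voigt order used on this page
    voigt_indices = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
    return [float(tensor[i][j]) for i, j in voigt_indices]


# Made-up symmetric tensor, used only to show the ordering
sigma = [
    [1.0, 0.6, 0.5],
    [0.6, 2.0, 0.4],
    [0.5, 0.4, 3.0],
]
voigt = stress_tensor_to_voigt(sigma)
print(voigt)  # [1.0, 2.0, 3.0, 0.4, 0.5, 0.6]
```

Note that the off-diagonal components appear in the order yz, xz, xy, not xy, xz, yz.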

property weight#

Get the weight class of the loss function.

property identifier: str#

Return identifier of the configuration.

Return type

str

property path: Optional[Path]#

Return the path of the file containing the configuration. If the configuration is not read from a file, return None.

Return type

Optional[Path]

get_num_atoms()[source]#

Return the total number of atoms in the configuration.

Return type

int

get_num_atoms_by_species()[source]#

Return a dictionary of the number of atoms of each species.

Return type

Dict[str, int]

get_volume()[source]#

Return volume of the configuration.

Return type

float
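Assuming the volume refers to the supercell spanned by the three lattice vectors (the rows of `cell`), it equals the absolute value of the scalar triple product a_1 · (a_2 × a_3). The following is a plain-Python sketch of that formula with a hypothetical helper name; it is not KLIFF's implementation.

```python
from typing import Sequence


def cell_volume(cell: Sequence[Sequence[float]]) -> float:
    """Volume of the parallelepiped spanned by the rows of a 3x3 cell.

    Hypothetical helper for illustration; not a KLIFF function.
    """
    a1, a2, a3 = cell
    # Cross product a2 x a3
    cross = (
        a2[1] * a3[2] - a2[2] * a3[1],
        a2[2] * a3[0] - a2[0] * a3[2],
        a2[0] * a3[1] - a2[1] * a3[0],
    )
    # Scalar triple product a1 . (a2 x a3); abs() handles left-handed cells
    return abs(sum(x * y for x, y in zip(a1, cross)))


orthorhombic = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]]
print(cell_volume(orthorhombic))  # 24.0
```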

count_atoms_by_species(symbols=None)[source]#

Count the number of atoms by species.

Parameters

symbols (Optional[List[str]]) – Species whose occurrences are counted. If None, all species present in the configuration are used.

Returns

A dictionary with the species string as key and the number of atoms of that species as value.

Return type

Dict[str, int]
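The counting behaviour described here can be sketched with `collections.Counter`. The function below is a hypothetical stand-in that operates on a plain species list. Reporting 0 for requested symbols that are absent is an assumption of this sketch, not documented KLIFF behaviour.

```python
from collections import Counter
from typing import Dict, List, Optional


def count_by_species(
    species: List[str], symbols: Optional[List[str]] = None
) -> Dict[str, int]:
    """Count atoms per species (hypothetical helper, not a KLIFF function)."""
    counts = Counter(species)
    if symbols is None:
        # Use every species present, in order of first appearance
        symbols = list(dict.fromkeys(species))
    # Counter returns 0 for missing keys (assumed behaviour for absent symbols)
    return {s: counts[s] for s in symbols}


atoms = ["Si", "C", "Si"]
print(count_by_species(atoms))              # {'Si': 2, 'C': 1}
print(count_by_species(atoms, ["C", "O"]))  # {'C': 1, 'O': 0}
```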

order_by_species()[source]#

Order the atoms according to the species such that atoms with the same species have contiguous indices.
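One way to obtain contiguous indices per species is a stable sort on the species label, applied as a single permutation to every per-atom array. The sketch below uses a hypothetical helper and groups species by order of first appearance. Which species comes first in KLIFF is not stated on this page, so that choice is an assumption.

```python
from typing import List, Sequence, Tuple


def group_by_species(
    species: Sequence[str], coords: Sequence[Sequence[float]]
) -> Tuple[List[str], List[Sequence[float]]]:
    """Reorder atoms so equal species are contiguous (hypothetical helper)."""
    # Rank each species by its first appearance (assumed ordering)
    rank = {s: i for i, s in enumerate(dict.fromkeys(species))}
    # sorted() is stable, so atoms of one species keep their relative order
    order = sorted(range(len(species)), key=lambda i: rank[species[i]])
    return [species[i] for i in order], [coords[i] for i in order]


species = ["Si", "C", "Si", "C"]
coords = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_species, new_coords = group_by_species(species, coords)
print(new_species)  # ['Si', 'Si', 'C', 'C']
print(new_coords)   # [[0, 0, 0], [2, 2, 2], [1, 1, 1], [3, 3, 3]]
```

Any other per-atom data, such as forces, would need the same permutation to stay aligned with the reordered atoms.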

class kliff.dataset.Dataset(path=None, weight=None, file_format='xyz')[source]#

A dataset of multiple configurations (Configuration).

Parameters
  • path (Optional[Path]) – Path to a file storing a configuration or to a directory containing multiple files. If given a directory, all the files in this directory and its subdirectories with the extension corresponding to the specified file_format will be read.

  • weight (Optional[Weight]) – an instance that computes the weight of the configuration in the loss function.

  • file_format (str) – Format of the file that stores the configuration, e.g. xyz.
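The directory handling described for path (a single file, or a recursive search for files with a matching extension) can be sketched with `pathlib`. The helper name and the mapping from the xyz format to the `.xyz` extension are assumptions made for illustration; this is not KLIFF's implementation.

```python
import tempfile
from pathlib import Path
from typing import List


def find_config_files(path: Path, extension: str = ".xyz") -> List[Path]:
    """Collect configuration files under path (hypothetical helper)."""
    path = Path(path)
    if path.is_dir():
        # rglob descends into subdirectories as well
        return sorted(p for p in path.rglob(f"*{extension}") if p.is_file())
    # A single file is returned as a one-element list
    return [path]


# Build a throwaway directory tree to demonstrate the recursive search
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ("a.xyz", "sub/b.xyz", "notes.txt"):
        (root / name).write_text("")
    found = [p.relative_to(root).as_posix() for p in find_config_files(root)]

print(found)  # ['a.xyz', 'sub/b.xyz']
```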

add_configs(path, weight=None)[source]#

Read configurations from path and add them to the existing set of configurations. This is a convenience function for reading configurations from different directories on disk.

Parameters
  • path (Path) – Path to the directory (or file) storing the configurations.

  • weight (Optional[Weight]) – an instance that computes the weight of the configuration in the loss function.

get_configs()[source]#

Get the configurations.

Return type

List[Configuration]

get_num_configs()[source]#

Return the number of configurations in the dataset.

Return type

int

kliff.dataset.read_extxyz(filename)[source]#

Read an atomic configuration stored in the extended xyz format.

Parameters

filename (Path) – Path to the extended xyz file.

Returns

  • cell – 3x3 array, supercell lattice vectors

  • species – species of atoms

  • coords – Nx3 array, coordinates of atoms

  • PBC – periodic boundary conditions

  • energy – potential energy of the configuration; None if not provided in file

  • forces – Nx3 array, forces on atoms; None if not provided in file

  • stress – 1D array of size 6, stress on the cell in Voigt notation; None if not provided in file
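To make the returned quantities concrete, here is a minimal sketch that parses a key=value comment line of the kind commonly found in extended xyz files. The key names `Lattice`, `PBC`, and `Energy`, the `1`/`0` encoding of the periodicity flags, and the helper name are assumptions of this sketch and are not confirmed by this page. It does not reproduce `read_extxyz`.

```python
import shlex
from typing import Dict


def parse_comment_line(line: str) -> Dict[str, str]:
    """Split a key=value comment line into strings (hypothetical helper)."""
    pairs = {}
    # shlex keeps quoted values such as Lattice="..." together as one token
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        pairs[key] = value
    return pairs


# Made-up header line; the key names are assumed, not taken from this page
header = 'Lattice="4.0 0 0 0 4.0 0 0 0 4.0" PBC="1 1 0" Energy=-3.5'
info = parse_comment_line(header)

values = [float(x) for x in info["Lattice"].split()]
cell = [values[0:3], values[3:6], values[6:9]]        # rows are a_1, a_2, a_3
pbc = [flag == "1" for flag in info["PBC"].split()]   # assumed 1/0 encoding
energy = float(info["Energy"]) if "Energy" in info else None  # None if absent

print(cell)    # [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
print(pbc)     # [True, True, False]
print(energy)  # -3.5
```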

kliff.dataset.write_extxyz(filename, cell, species, coords, PBC, energy=None, forces=None, stress=None)[source]#

Write configuration information to a file in the extended xyz format.

Parameters
  • filename (Path) – Path to the extended xyz file.

  • cell (ndarray) – 3x3 array, supercell lattice vectors

  • species (List[str]) – species of atoms

  • coords (ndarray) – Nx3 array, coordinates of atoms

  • PBC (List[bool]) – periodic boundary conditions

  • energy (Optional[float]) – potential energy of the configuration; if None, it is not written to the file

  • forces (Optional[ndarray]) – Nx3 array, forces on atoms; if None, they are not written to the file

  • stress (Optional[List[float]]) – 1D array of size 6, stress on the cell in Voigt notation; if None, it is not written to the file