RDKit Tools#

Tools for using RDKit with AtomArray objects.

atomworks.io.tools.rdkit.BIOTITE_BOND_TYPE_TO_RDKIT: Final[dict[BondType, tuple[BondType, bool]]] = {BondType.ANY: (rdkit.Chem.rdchem.BondType.UNSPECIFIED, False), BondType.SINGLE: (rdkit.Chem.rdchem.BondType.SINGLE, False), BondType.DOUBLE: (rdkit.Chem.rdchem.BondType.DOUBLE, False), BondType.TRIPLE: (rdkit.Chem.rdchem.BondType.TRIPLE, False), BondType.QUADRUPLE: (rdkit.Chem.rdchem.BondType.QUADRUPLE, False), BondType.AROMATIC_SINGLE: (rdkit.Chem.rdchem.BondType.SINGLE, True), BondType.AROMATIC_DOUBLE: (rdkit.Chem.rdchem.BondType.DOUBLE, True), BondType.AROMATIC_TRIPLE: (rdkit.Chem.rdchem.BondType.TRIPLE, True)}#

Mapping from Biotite bond types to RDKit bond types.

Maps (biotite bond type) -> (rdkit bond type, is_aromatic)

atomworks.io.tools.rdkit.CONVERTIBLE_RDKIT_BOND_TYPES: Final[frozenset[BondType]] = frozenset({rdkit.Chem.rdchem.BondType.UNSPECIFIED, rdkit.Chem.rdchem.BondType.SINGLE, rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.QUADRUPLE})#

Set of RDKit Bond types that can be converted to Biotite bond types.

class atomworks.io.tools.rdkit.ChEMBLNormalizer[source]#

Bases: object

Normalize an RDKit molecule like the ChEMBL structure pipeline does. This is useful for rescuing molecules that failed to be sanitized by RDKit alone.

References

normalize_in_place(mol: Mol) Mol[source]#
atomworks.io.tools.rdkit.RDKIT_BOND_TYPE_TO_BIOTITE: Final[dict[tuple[BondType, bool], BondType]] = {(rdkit.Chem.rdchem.BondType.UNSPECIFIED, False): BondType.ANY, (rdkit.Chem.rdchem.BondType.SINGLE, False): BondType.SINGLE, (rdkit.Chem.rdchem.BondType.SINGLE, True): BondType.AROMATIC_SINGLE, (rdkit.Chem.rdchem.BondType.DOUBLE, False): BondType.DOUBLE, (rdkit.Chem.rdchem.BondType.DOUBLE, True): BondType.AROMATIC_DOUBLE, (rdkit.Chem.rdchem.BondType.TRIPLE, False): BondType.TRIPLE, (rdkit.Chem.rdchem.BondType.TRIPLE, True): BondType.AROMATIC_TRIPLE, (rdkit.Chem.rdchem.BondType.QUADRUPLE, False): BondType.QUADRUPLE}#

Mapping from RDKit bond types to Biotite bond types. Maps (rdkit bond type, is_aromatic) -> biotite bond type

Unspecified bonds are mapped to ANY bond type.

atomworks.io.tools.rdkit.RDKIT_HYBRIDIZATION_TO_INT: Final[dict[HybridizationType, int]] = {rdkit.Chem.rdchem.HybridizationType.UNSPECIFIED: -1, rdkit.Chem.rdchem.HybridizationType.S: 0, rdkit.Chem.rdchem.HybridizationType.SP: 1, rdkit.Chem.rdchem.HybridizationType.SP2: 2, rdkit.Chem.rdchem.HybridizationType.SP3: 4, rdkit.Chem.rdchem.HybridizationType.SP2D: 3, rdkit.Chem.rdchem.HybridizationType.SP3D: 5, rdkit.Chem.rdchem.HybridizationType.SP3D2: 6, rdkit.Chem.rdchem.HybridizationType.OTHER: 7}#

Mapping from RDKit hybridization types to integers.

Reference: https://www.rdkit.org/docs/cppapi/classRDKit_1_1Atom.html#a58e40e30db6b42826243163175cac976

atomworks.io.tools.rdkit.add_hydrogens(mol: Mol, add_coords: bool = True) Mol[source]#

Add explicit hydrogens to an RDKit molecule.

atomworks.io.tools.rdkit.atom_array_from_rdkit(mol: Mol, *, set_coord_if_available: bool = True, conformer_id: int | None = None, remove_hydrogens: bool = True, remove_inferred_atoms: bool = False) AtomArray[source]#

Convert an RDKit molecule to a Biotite AtomArray object.

This function takes an RDKit Mol object and converts it into a Biotite AtomArray object, matching and preserving the (optional) atom-level annotations in mol._annotations and bond information.

Parameters:
  • mol (-) – The RDKit molecule to convert.

  • set_coord_if_available (-) – Whether to set the coordinates from the RDKit molecule if a conformer is available.

  • conformer_id (-) – The conformer ID to use for coordinates. If None, the first conformer is used.

  • remove_hydrogens (-) – Whether to remove any explicit hydrogen atoms.

  • remove_inferred_atoms (-) – Whether to remove any atoms that do not carry the rdkit_atom_id annotation.

Returns:

  • An AtomArray containing the atoms and bonds from the input Mol object.

Example

>>> mol = Chem.MolFromSmiles("CCO")
>>> atom_array = atom_array_from_rdkit(mol)
>>> print(atom_array)
AtomArray([Atom(element='6', coord=array([0.0, 0.0, 0.0]), ...)])
atomworks.io.tools.rdkit.atom_array_to_rdkit(atom_array: AtomArray, *, set_coord: bool | None = None, hydrogen_policy: Literal['infer', 'remove', 'keep'] = 'keep', annotations_to_keep: list[str] = ('chain_id', 'res_id', 'res_name', 'atom_name', 'hetero', 'element'), sanitize: bool = True, attempt_fixing_corrupted_molecules: bool = True, assume_metal_bonds_are_dative: bool = False) Mol[source]#

Generate an RDKit molecule from a Biotite AtomArray object.

Parameters:
  • atom_array (-) – The Biotite AtomArray to convert.

  • set_coord (-) – Whether to set atomic coordinates in the RDKit molecule. If None, coordinates are only set if they are not NaN.

  • hydrogen_policy (-) – Whether to infer hydrogens in the RDKit molecule.

  • annotations_to_keep (-) – List of atom annotations to preserve from the AtomArray.

  • sanitize (-) – Whether to sanitize the molecule during conversion. Default is True.

  • attempt_fixing_corrupted_molecules (-) – Whether to attempt fixing corrupted molecules during conversion. Default is True.

  • assume_metal_bonds_are_dative (-) – Whether to assume that all bonds with metals are dative bonds. Default is False. WARNING: This messes up RDKit conformer generation.

Returns:

RDKit Molecule generated from the AtomArray.

Return type:

  • rdkit.Chem.Mol

Note

Aromaticity, hybridization states, and other properties are automatically perceived by RDKit’s SanitizeMol during the conversion process.

atomworks.io.tools.rdkit.ccd_code_to_rdkit(ccd_code: str, *, ccd_mirror_path: PathLike | None = None, hydrogen_policy: Literal['infer', 'remove', 'keep'] = 'keep', **atom_array_to_rdkit_kwargs) Mol[source]#

Convert a CCD residue name to an RDKit molecule.

This function retrieves an RDKit molecule corresponding to a given CCD residue name. If ccd_dir is not provided, Biotite’s internal CCD is used. Otherwise, the specified local CCD directory is used.

By default, the function returns the ‘ideal’ conformer from the CCD entry.

Parameters:
  • ccd_code (str) – The CCD code to convert. I.e, ‘ALA’, ‘GLY’, ‘9RH’, etc.

  • ccd_mirror_path (PathLike) – Path to the local CCD directory. If None, Biotite’s internal CCD is used.

  • hydrogen_policy (Literal["infer", "remove", "keep"]) – Whether to infer hydrogens in the RDKit molecule.

  • **atom_array_to_rdkit_kwargs – Additional keyword arguments passed to the atom_array_to_rdkit function.

Returns:

The RDKit molecule corresponding to the given residue name.

Return type:

Mol

atomworks.io.tools.rdkit.change_metal_bonds_to_dative(mol: Mol, *, qualifying_bond_types: set[BondType] = {rdkit.Chem.rdchem.BondType.SINGLE}) Mol[source]#

Change all qualifying bonds to a metal to be dative bonds (coordination bonds). This is useful since most bonds between metals and organic atoms are dative.

Parameters:
  • mol (-) – The input RDKit molecule

  • qualifying_bond_types (-) – The bond types to qualify as dative bonds. By default, only single bonds qualify for conversion.

Returns:

The molecule with metal bonds converted to dative bonds

Return type:

Mol

atomworks.io.tools.rdkit.element_to_atomic_number(element: str) int[source]#

Convert an element string or atomic number to an atomic number.

Parameters:

element (-) – The element symbol (e.g., ‘C’) or atomic number.

Returns:

The atomic number of the element.

Return type:

  • int

Examples

>>> element_to_atomic_number("C")
6
>>> element_to_atomic_number("8")
8
>>> element_to_atomic_number(1)
1
atomworks.io.tools.rdkit.fix_charge_based_on_valence(mol: Mol) Mol[source]#

Attempt to fix the formal charge of an RDKit molecule by making it compatible with its valence state.

atomworks.io.tools.rdkit.fix_mol(mol: Mol, *, attempt_fix_by_calculating_charge_from_valence: bool = True, attempt_fix_by_normalizing_like_chembl: bool = True, attempt_fix_by_normalizing_like_rdkit: bool = True, in_place: bool = True, raise_on_failure: bool = True) Mol[source]#

Fix an RDKit molecule (in-place).

This function attempts to infer aromaticity, valences, implicit hydrogens, and formal charges to result in a molecule that can be successfully sanitized. It does not change the heavy atoms or bonds in the molecule.

# TODO: # - Add sanitization for aromatic systems with incorrect formal charges (that cannot be kekulized) (datamol-io/datamol#231) # - This may be done via Hueckel’s rule (https://en.wikipedia.org/wiki/H%C3%BCckel%27s_rule): Find all aromatic systems, compute Hueckel’s rule, # if the number of pi electrons is not equal to 4*n+2, where n is an integer, then the aromatic system is not valid with the given formal charges. # Try to adjust the formal charges on non-carbon atoms to make the aromatic system valid. # - Add sanifix4 style sanitization (datamol-io/datamol) # - Add attempt to fix valences by changing the bond orders (c.f. datamol-io/datamol) # - Add further ChEMBL style sanitization: chembl/ChEMBL_Structure_Pipeline

References

atomworks.io.tools.rdkit.get_chembl_normalizer() Normalizer[source]#

Cached ChEMBL normalizer.

atomworks.io.tools.rdkit.get_morgan_fingerprint_from_rdkit_mol(mol: Mol, *, radius: int = 2, n_bits: int = 2048) ExplicitBitVect[source]#

Generates the Morgan fingerprint for an RDKit molecule. Useful for calculating Tanimoto similarity between molecules, e.g. for similarity searches.

Default parameters are based on the AF-3 supplementary material:

> We measure ligand Tanimoto similarity using RDKit v.2023_03_3 Morgan fingerprints (radius 2, 2048 bits)

Parameters:
  • mol (-) – The RDKit molecule to generate a fingerprint for.

  • radius (-) – The radius of the fingerprint. Default is 2.

  • n_bits (-) – The number of bits in the fingerprint. Default is 2048.

Returns:

The Morgan fingerprint for the input molecule.

Return type:

  • ExplicitBitVect

References

atomworks.io.tools.rdkit.get_valence_checker() RDKitValidation[source]#

Cached RDKit valence checker.

atomworks.io.tools.rdkit.preserve_annotations(func: Callable[[Mol, ...], Mol]) Callable[[Mol, ...], Mol][source]#

Decorator to copy annotations from an RDKit molecule to a new molecule.

This decorator ensures that any custom annotations stored in the _annotations attribute of an RDKit molecule are preserved when the molecule is modified or converted.

Parameters:

func (-) – The function to be decorated. Must accept an RDKit molecule as positional argument or keyword argument with keyword ‘mol’.

Returns:

The decorated function that preserves annotations.

Return type:

callable

atomworks.io.tools.rdkit.remove_hydrogens(mol: Mol) Mol[source]#

Remove explicit hydrogens from an RDKit molecule.

atomworks.io.tools.rdkit.sdf_to_rdkit(sdf_path_or_buffer: StringIO | PathLike, *, sanitize: bool = True) Mol[source]#

Generate an RDKit molecule from an SDF file or buffer.

Parameters:
  • sdf_path_or_buffer (-) – Either a path to an SDF file or a StringIO buffer containing SDF-formatted molecule data

  • sanitize (-) – Whether to sanitize the molecule during parsing. Default is True.

Returns:

The RDKit molecule generated from the SDF data

Return type:

  • Mol

Raises:
  • - ValueError – If no valid molecule could be parsed from the input

  • - TypeError – If the input is neither a StringIO buffer nor a valid path

atomworks.io.tools.rdkit.smiles_to_rdkit(smiles: str, *, sanitize: bool = True, timeout: int = 5, generate_conformers: bool = True) Mol[source]#

Generate an RDKit molecule from a SMILES string.

This function creates an RDKit molecule object from a SMILES string, performs sanitization, and perceives aromaticity.

Parameters:
  • smiles (-) – The SMILES string representing the molecule.

  • sanitize (-) – Whether to sanitize the molecule.

  • timeout (-) – The timeout for the conformer generation.

  • generate_conformers (-) – Whether to generate and minimize conformers. If False, returns the molecule immediately after SMILES parsing.

Returns:

The RDKit molecule generated from the SMILES string.

Return type:

  • rdkit.Chem.Mol

Note

The returned molecule is sanitized and has aromaticity perceived. If generate_conformers=True, the molecule will also have (implicit) hydrogens added and conformers generated with UFF minimization.