RDKit Tools#
Tools for using RDKit with AtomArray objects.
- atomworks.io.tools.rdkit.BIOTITE_BOND_TYPE_TO_RDKIT: Final[dict[BondType, tuple[BondType, bool]]] = {BondType.ANY: (rdkit.Chem.rdchem.BondType.UNSPECIFIED, False), BondType.SINGLE: (rdkit.Chem.rdchem.BondType.SINGLE, False), BondType.DOUBLE: (rdkit.Chem.rdchem.BondType.DOUBLE, False), BondType.TRIPLE: (rdkit.Chem.rdchem.BondType.TRIPLE, False), BondType.QUADRUPLE: (rdkit.Chem.rdchem.BondType.QUADRUPLE, False), BondType.AROMATIC_SINGLE: (rdkit.Chem.rdchem.BondType.SINGLE, True), BondType.AROMATIC_DOUBLE: (rdkit.Chem.rdchem.BondType.DOUBLE, True), BondType.AROMATIC_TRIPLE: (rdkit.Chem.rdchem.BondType.TRIPLE, True)}#
Mapping from Biotite bond types to RDKit bond types.
Maps (biotite bond type) -> (rdkit bond type, is_aromatic)
- atomworks.io.tools.rdkit.CONVERTIBLE_RDKIT_BOND_TYPES: Final[frozenset[BondType]] = frozenset({rdkit.Chem.rdchem.BondType.UNSPECIFIED, rdkit.Chem.rdchem.BondType.SINGLE, rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.QUADRUPLE})#
Set of RDKit Bond types that can be converted to Biotite bond types.
- class atomworks.io.tools.rdkit.ChEMBLNormalizer[source]#
Bases:
object
Normalize an RDKit molecule like the ChEMBL structure pipeline does. This is useful for rescuing molecules that failed to be sanitized by RDKit alone.
References
- atomworks.io.tools.rdkit.RDKIT_BOND_TYPE_TO_BIOTITE: Final[dict[tuple[BondType, bool], BondType]] = {(rdkit.Chem.rdchem.BondType.UNSPECIFIED, False): BondType.ANY, (rdkit.Chem.rdchem.BondType.SINGLE, False): BondType.SINGLE, (rdkit.Chem.rdchem.BondType.SINGLE, True): BondType.AROMATIC_SINGLE, (rdkit.Chem.rdchem.BondType.DOUBLE, False): BondType.DOUBLE, (rdkit.Chem.rdchem.BondType.DOUBLE, True): BondType.AROMATIC_DOUBLE, (rdkit.Chem.rdchem.BondType.TRIPLE, False): BondType.TRIPLE, (rdkit.Chem.rdchem.BondType.TRIPLE, True): BondType.AROMATIC_TRIPLE, (rdkit.Chem.rdchem.BondType.QUADRUPLE, False): BondType.QUADRUPLE}#
Mapping from RDKit bond types to Biotite bond types. Maps (rdkit bond type, is_aromatic) -> biotite bond type
Unspecified bonds are mapped to ANY bond type.
- atomworks.io.tools.rdkit.RDKIT_HYBRIDIZATION_TO_INT: Final[dict[HybridizationType, int]] = {rdkit.Chem.rdchem.HybridizationType.UNSPECIFIED: -1, rdkit.Chem.rdchem.HybridizationType.S: 0, rdkit.Chem.rdchem.HybridizationType.SP: 1, rdkit.Chem.rdchem.HybridizationType.SP2: 2, rdkit.Chem.rdchem.HybridizationType.SP3: 4, rdkit.Chem.rdchem.HybridizationType.SP2D: 3, rdkit.Chem.rdchem.HybridizationType.SP3D: 5, rdkit.Chem.rdchem.HybridizationType.SP3D2: 6, rdkit.Chem.rdchem.HybridizationType.OTHER: 7}#
Mapping from RDKit hybridization types to integers.
Reference: https://www.rdkit.org/docs/cppapi/classRDKit_1_1Atom.html#a58e40e30db6b42826243163175cac976
- atomworks.io.tools.rdkit.add_hydrogens(mol: Mol, add_coords: bool = True) Mol [source]#
Add explicit hydrogens to an RDKit molecule.
- atomworks.io.tools.rdkit.atom_array_from_rdkit(mol: Mol, *, set_coord_if_available: bool = True, conformer_id: int | None = None, remove_hydrogens: bool = True, remove_inferred_atoms: bool = False) AtomArray [source]#
Convert an RDKit molecule to a Biotite AtomArray object.
This function takes an RDKit Mol object and converts it into a Biotite AtomArray object, matching and preserving the (optional) atom-level annotations in mol._annotations and bond information.
- Parameters:
mol (-) – The RDKit molecule to convert.
set_coord_if_available (-) – Whether to set the coordinates from the RDKit molecule if a conformer is available.
conformer_id (-) – The conformer ID to use for coordinates. If None, the first conformer is used.
remove_hydrogens (-) – Whether to remove any explicit hydrogen atoms.
remove_inferred_atoms (-) – Whether to remove any atoms that do not carry the rdkit_atom_id annotation.
- Returns:
An AtomArray containing the atoms and bonds from the input Mol object.
Example
>>> mol = Chem.MolFromSmiles("CCO") >>> atom_array = atom_array_from_rdkit(mol) >>> print(atom_array) AtomArray([Atom(element='6', coord=array([0.0, 0.0, 0.0]), ...)])
- atomworks.io.tools.rdkit.atom_array_to_rdkit(atom_array: AtomArray, *, set_coord: bool | None = None, hydrogen_policy: Literal['infer', 'remove', 'keep'] = 'keep', annotations_to_keep: list[str] = ('chain_id', 'res_id', 'res_name', 'atom_name', 'hetero', 'element'), sanitize: bool = True, attempt_fixing_corrupted_molecules: bool = True, assume_metal_bonds_are_dative: bool = False) Mol [source]#
Generate an RDKit molecule from a Biotite AtomArray object.
- Parameters:
atom_array (-) – The Biotite AtomArray to convert.
set_coord (-) – Whether to set atomic coordinates in the RDKit molecule. If None, coordinates are only set if they are not NaN.
hydrogen_policy (-) – Whether to infer hydrogens in the RDKit molecule.
annotations_to_keep (-) – List of atom annotations to preserve from the AtomArray.
sanitize (-) – Whether to sanitize the molecule during conversion. Default is True.
attempt_fixing_corrupted_molecules (-) – Whether to attempt fixing corrupted molecules during conversion. Default is True.
assume_metal_bonds_are_dative (-) – Whether to assume that all bonds with metals are dative bonds. Default is False. WARNING: This messes up RDKit conformer generation.
- Returns:
RDKit Molecule generated from the AtomArray.
- Return type:
rdkit.Chem.Mol
Note
Aromaticity, hybridization states, and other properties are automatically perceived by RDKit’s SanitizeMol during the conversion process.
- atomworks.io.tools.rdkit.ccd_code_to_rdkit(ccd_code: str, *, ccd_mirror_path: PathLike | None = None, hydrogen_policy: Literal['infer', 'remove', 'keep'] = 'keep', **atom_array_to_rdkit_kwargs) Mol [source]#
Convert a CCD residue name to an RDKit molecule.
This function retrieves an RDKit molecule corresponding to a given CCD residue name. If ccd_dir is not provided, Biotite’s internal CCD is used. Otherwise, the specified local CCD directory is used.
By default, the function returns the ‘ideal’ conformer from the CCD entry.
- Parameters:
ccd_code (str) – The CCD code to convert. I.e, ‘ALA’, ‘GLY’, ‘9RH’, etc.
ccd_mirror_path (PathLike) – Path to the local CCD directory. If None, Biotite’s internal CCD is used.
hydrogen_policy (Literal["infer", "remove", "keep"]) – Whether to infer hydrogens in the RDKit molecule.
**atom_array_to_rdkit_kwargs – Additional keyword arguments passed to the atom_array_to_rdkit function.
- Returns:
The RDKit molecule corresponding to the given residue name.
- Return type:
Mol
- atomworks.io.tools.rdkit.change_metal_bonds_to_dative(mol: Mol, *, qualifying_bond_types: set[BondType] = {rdkit.Chem.rdchem.BondType.SINGLE}) Mol [source]#
Change all qualifying bonds to a metal to be dative bonds (coordination bonds). This is useful since most bonds between metals and organic atoms are dative.
- Parameters:
mol (-) – The input RDKit molecule
qualifying_bond_types (-) – The bond types to qualify as dative bonds. By default, only single bonds qualify for conversion.
- Returns:
The molecule with metal bonds converted to dative bonds
- Return type:
Mol
- atomworks.io.tools.rdkit.element_to_atomic_number(element: str) int [source]#
Convert an element string or atomic number to an atomic number.
- Parameters:
element (-) – The element symbol (e.g., ‘C’) or atomic number.
- Returns:
The atomic number of the element.
- Return type:
int
Examples
>>> element_to_atomic_number("C") 6 >>> element_to_atomic_number("8") 8 >>> element_to_atomic_number(1) 1
- atomworks.io.tools.rdkit.fix_charge_based_on_valence(mol: Mol) Mol [source]#
Attempt to fix the formal charge of an RDKit molecule by making it compatible with its valence state.
- atomworks.io.tools.rdkit.fix_mol(mol: Mol, *, attempt_fix_by_calculating_charge_from_valence: bool = True, attempt_fix_by_normalizing_like_chembl: bool = True, attempt_fix_by_normalizing_like_rdkit: bool = True, in_place: bool = True, raise_on_failure: bool = True) Mol [source]#
Fix an RDKit molecule (in-place).
This function attempts to infer aromaticity, valences, implicit hydrogens, and formal charges to result in a molecule that can be successfully sanitized. It does not change the heavy atoms or bonds in the molecule.
# TODO: # - Add sanitization for aromatic systems with incorrect formal charges (that cannot be kekulized) (datamol-io/datamol#231) # - This may be done via Hueckel’s rule (https://en.wikipedia.org/wiki/H%C3%BCckel%27s_rule): Find all aromatic systems, compute Hueckel’s rule, # if the number of pi electrons is not equal to 4*n+2, where n is an integer, then the aromatic system is not valid with the given formal charges. # Try to adjust the formal charges on non-carbon atoms to make the aromatic system valid. # - Add sanifix4 style sanitization (datamol-io/datamol) # - Add attempt to fix valences by changing the bond orders (c.f. datamol-io/datamol) # - Add further ChEMBL style sanitization: chembl/ChEMBL_Structure_Pipeline
References
- atomworks.io.tools.rdkit.get_morgan_fingerprint_from_rdkit_mol(mol: Mol, *, radius: int = 2, n_bits: int = 2048) ExplicitBitVect [source]#
Generates the Morgan fingerprint for an RDKit molecule. Useful for calculating Tanimoto similarity between molecules, e.g. for similarity searches.
- Default parameters are based on the AF-3 supplementary material:
> We measure ligand Tanimoto similarity using RDKit v.2023_03_3 Morgan fingerprints (radius 2, 2048 bits)
- Parameters:
mol (-) – The RDKit molecule to generate a fingerprint for.
radius (-) – The radius of the fingerprint. Default is 2.
n_bits (-) – The number of bits in the fingerprint. Default is 2048.
- Returns:
The Morgan fingerprint for the input molecule.
- Return type:
ExplicitBitVect
References
- atomworks.io.tools.rdkit.get_valence_checker() RDKitValidation [source]#
Cached RDKit valence checker.
- atomworks.io.tools.rdkit.preserve_annotations(func: Callable[[Mol, ...], Mol]) Callable[[Mol, ...], Mol] [source]#
Decorator to copy annotations from an RDKit molecule to a new molecule.
This decorator ensures that any custom annotations stored in the _annotations attribute of an RDKit molecule are preserved when the molecule is modified or converted.
- Parameters:
func (-) – The function to be decorated. Must accept an RDKit molecule as positional argument or keyword argument with keyword ‘mol’.
- Returns:
The decorated function that preserves annotations.
- Return type:
callable
- atomworks.io.tools.rdkit.remove_hydrogens(mol: Mol) Mol [source]#
Remove explicit hydrogens from an RDKit molecule.
- atomworks.io.tools.rdkit.sdf_to_rdkit(sdf_path_or_buffer: StringIO | PathLike, *, sanitize: bool = True) Mol [source]#
Generate an RDKit molecule from an SDF file or buffer.
- Parameters:
sdf_path_or_buffer (-) – Either a path to an SDF file or a StringIO buffer containing SDF-formatted molecule data
sanitize (-) – Whether to sanitize the molecule during parsing. Default is True.
- Returns:
The RDKit molecule generated from the SDF data
- Return type:
Mol
- Raises:
- ValueError – If no valid molecule could be parsed from the input
- TypeError – If the input is neither a StringIO buffer nor a valid path
- atomworks.io.tools.rdkit.smiles_to_rdkit(smiles: str, *, sanitize: bool = True, timeout: int = 5, generate_conformers: bool = True) Mol [source]#
Generate an RDKit molecule from a SMILES string.
This function creates an RDKit molecule object from a SMILES string, performs sanitization, and perceives aromaticity.
- Parameters:
smiles (-) – The SMILES string representing the molecule.
sanitize (-) – Whether to sanitize the molecule.
timeout (-) – The timeout for the conformer generation.
generate_conformers (-) – Whether to generate and minimize conformers. If False, returns the molecule immediately after SMILES parsing.
- Returns:
The RDKit molecule generated from the SMILES string.
- Return type:
rdkit.Chem.Mol
Note
The returned molecule is sanitized and has aromaticity perceived. If generate_conformers=True, the molecule will also have (implicit) hydrogens added and conformers generated with UFF minimization.