CCD Utilities#

atomworks.io.utils.ccd.aa_chem_comps() → frozenset[str][source]#: Set of amino acid chemical components. E.g. {‘ALA’, ‘ARG’, …}

atomworks.io.utils.ccd.atom_array_from_ccd_code(ccd_code: str, ccd_mirror_path: PathLike = None, **parse_ccd_cif_kwargs) → AtomArray[source]#

Retrieves and parses a component from the Chemical Component Dictionary.

First attempts to retrieve the component from a local mirror if provided and the code exists there. Falls back to Biotite’s built-in CCD if the code is not found in the local mirror or no mirror path is provided.

Parameters:

ccd_code – The three-letter code of the chemical component.
ccd_mirror_path – Path to the root of the CCD mirror directory.
**parse_ccd_cif_kwargs – Additional keyword arguments passed to parse_ccd_cif():
- coords: Type of coordinates to use (“model”, “ideal_pdbx”, “ideal_rdkit”, or None). Defaults to “ideal_pdbx”.
- add_properties: Whether to include RDKit-computed properties. Defaults to True.
- add_mapping: Whether to include external resource mappings. Defaults to False.

Returns:

The parsed atomic structure of the requested component.

Return type:

struc.AtomArray

Raises:

ValueError – If the CCD code is not found in the local mirror or Biotite’s built-in CCD.

Example

>>> from atomworks.io.utils.ccd import atom_array_from_ccd_code
>>> atom_array = atom_array_from_ccd_code("ALA")
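
The lookup order described for this function (local mirror first, then Biotite's built-in CCD, then ValueError) can be sketched in plain Python. This is only an illustration of the control flow, not the atomworks implementation: load_component, mirror and builtin are hypothetical names, and plain dictionaries stand in for the two CCD sources.

```python
from __future__ import annotations

from collections.abc import Mapping


def load_component(
    ccd_code: str,
    builtin: Mapping[str, str],
    mirror: Mapping[str, str] | None = None,
) -> str:
    """Return the entry for ``ccd_code``, preferring the mirror (hypothetical helper)."""
    # 1. Use the local mirror if one was provided and it contains the code.
    if mirror is not None and ccd_code in mirror:
        return mirror[ccd_code]
    # 2. Otherwise fall back to the built-in dictionary.
    if ccd_code in builtin:
        return builtin[ccd_code]
    # 3. Neither source knows the code.
    raise ValueError(f"CCD code {ccd_code!r} not found in mirror or built-in CCD")


# Stand-in data (assumption): real entries would be parsed structures.
builtin = {"ALA": "ALA from built-in", "GLY": "GLY from built-in"}
mirror = {"ALA": "ALA from mirror"}
```

With these stand-ins, "ALA" resolves from the mirror, "GLY" falls back to the built-in source, and an unknown code raises ValueError.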

atomworks.io.utils.ccd.check_ccd_codes_are_available(ccd_codes: Iterable[str], ccd_mirror_path: PathLike = None, mode: Literal['warn', 'raise'] = 'warn') → bool[source]#: Checks if the provided CCD codes are available in the local mirror.
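
The mode argument suggests a warn-or-raise pattern. The sketch below shows one way such a check could behave. It is not the library's code: check_codes and its available argument are hypothetical, and both the return value (True only when every code is present) and the exception type are assumptions.

```python
from __future__ import annotations

import warnings
from collections.abc import Iterable
from typing import Literal


def check_codes(
    ccd_codes: Iterable[str],
    available: frozenset[str],
    mode: Literal["warn", "raise"] = "warn",
) -> bool:
    """Return True if every code is available; otherwise warn or raise (hypothetical helper)."""
    missing = sorted(set(ccd_codes) - available)
    if not missing:
        return True
    message = f"CCD codes not available: {missing}"
    if mode == "raise":
        # Assumption: a ValueError is raised in strict mode.
        raise ValueError(message)
    # Lenient mode: report the problem but let the caller continue.
    warnings.warn(message, stacklevel=2)
    return False
```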

atomworks.io.utils.ccd.chem_comp_to_one_letter() → dict[str, str][source]#

Dictionary mapping the chemical components to their 1-letter code.

NOTE: Chemical component codes were historically three letters long, but longer codes now exist.
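
A mapping of this shape is typically used to turn a list of component codes into a one-letter sequence. The sketch below assumes a small hand-written subset of the mapping and a hypothetical helper to_one_letter_sequence. The "X" placeholder for unmapped codes is also an assumption, and "A1XYZ" is a made-up five-character code used only to show that lookups should not assume three letters.

```python
from collections.abc import Iterable, Mapping

# Hand-written subset for illustration; the real mapping covers far more codes.
ONE_LETTER_SUBSET = {"ALA": "A", "GLY": "G", "SER": "S"}


def to_one_letter_sequence(
    ccd_codes: Iterable[str],
    mapping: Mapping[str, str],
    unknown: str = "X",
) -> str:
    """Join one-letter codes, using ``unknown`` for codes missing from ``mapping``."""
    return "".join(mapping.get(code, unknown) for code in ccd_codes)


sequence = to_one_letter_sequence(["ALA", "GLY", "A1XYZ", "SER"], ONE_LETTER_SUBSET)
```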

atomworks.io.utils.ccd.dna_chem_comps() → frozenset[str][source]#: Set of DNA chemical components. E.g. {‘DA’, ‘DC’, …}

atomworks.io.utils.ccd.get_available_ccd_codes(ccd_mirror_path: PathLike | None = None) → frozenset[str][source]#

Returns a frozenset of all CCD codes available.

If a mirror path is provided, the local mirror is checked first. Otherwise, Biotite’s built-in CCD is used.

atomworks.io.utils.ccd.get_available_ccd_codes_in_biotite() → frozenset[str][source]#: Set of all CCD codes available in Biotite’s built-in Chemical Component Dictionary.

atomworks.io.utils.ccd.get_available_ccd_codes_in_mirror(ccd_mirror_path: PathLike = None) → frozenset[str][source]#

Set of all CCD codes available in the local mirror.

Only counts codes whose files adhere to the CCD mirror layout (e.g. …/H/HEM/HEM.cif).
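
The layout rule can be made concrete with a short sketch. It assumes, based only on the HEM example above, that an entry lives at <root>/<first character>/<code>/<code>.cif. mirror_path_for and codes_in_mirror are hypothetical helpers, not atomworks functions.

```python
from __future__ import annotations

from pathlib import Path


def mirror_path_for(ccd_code: str, mirror_root: Path) -> Path:
    """Expected file location, assuming <root>/<first char>/<code>/<code>.cif."""
    return mirror_root / ccd_code[0] / ccd_code / f"{ccd_code}.cif"


def codes_in_mirror(mirror_root: Path) -> frozenset[str]:
    """Collect codes whose files follow the assumed layout; ignore everything else."""
    codes = set()
    for cif_path in mirror_root.glob("*/*/*.cif"):
        code = cif_path.stem
        # Keep the file only if it sits exactly where the layout says it should.
        if cif_path == mirror_path_for(code, mirror_root):
            codes.add(code)
    return frozenset(codes)
```

Files that are misplaced (for example ALA.cif under an X/ directory) or misnamed are skipped rather than reported.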

atomworks.io.utils.ccd.get_ccd_component_from_biotite(ccd_code: str, **parse_ccd_cif_kwargs) → AtomArray[source]#

Retrieves a component from the Chemical Component Dictionary using Biotite’s built-in functionality.

Parameters:

ccd_code – The three-letter code of the chemical component to retrieve.

Returns:

The atomic structure of the requested component.

Return type:

AtomArray

atomworks.io.utils.ccd.get_ccd_component_from_mirror(ccd_code: str, ccd_mirror_path: PathLike = None, **parse_ccd_cif_kwargs) → AtomArray[source]#

Retrieves and parses a component from a local mirror of the Chemical Component Dictionary.

Parameters:

ccd_code – The three-letter code of the chemical component.
ccd_mirror_path – Path to the root of the CCD mirror directory.
**parse_ccd_cif_kwargs – Additional keyword arguments passed to parse_ccd_cif():
- coords (Literal[“model”, “ideal_pdbx”, “ideal_rdkit”] | None): Type of coordinates to use. Defaults to “ideal_pdbx”.
- add_properties (bool): Whether to include RDKit-computed properties. Defaults to True.
- add_mapping (bool): Whether to include external resource mappings, such as the ChEMBL ID. Defaults to False.

Returns:

The parsed atomic structure of the requested component.

Return type:

AtomArray

Example

>>> from atomworks.io.utils.ccd import get_ccd_component_from_mirror
>>> atom_array = get_ccd_component_from_mirror("ALA", coords="ideal_pdbx")

atomworks.io.utils.ccd.get_chain_type_from_ccd_code(ccd_code: str) → ChainType[source]#: Get the ChainType enum corresponding to a CCD code.

atomworks.io.utils.ccd.get_chain_type_from_chem_comp_type(chem_comp_type: str) → ChainType[source]#: Get the ChainType enum corresponding to a chemical component type.

atomworks.io.utils.ccd.get_chem_comp_leaving_atom_names(ccd_code: str, ccd_mirror_path: PathLike = None, mode: Literal['warn', 'raise'] = 'warn') → dict[str, tuple[str, ...]][source]#

Computes the canonical leaving groups for a given CCD entry based on the PDB’s annotation of leaving atoms.

The returned dictionary maps the name of the atom to the names of the atoms that would become disconnected if the atom were removed.

Example

>>> from atomworks.io.utils.ccd import get_chem_comp_leaving_atom_names
>>> get_chem_comp_leaving_atom_names("ALA")
{'N': ('H2',), 'C': ('OXT', 'HXT'), 'OXT': ('HXT',)}
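
One plausible use of this mapping is deciding which atoms to drop when a component forms an inter-residue bond. The sketch below reuses the ALA dictionary shown above as literal data. atoms_to_drop is a hypothetical helper, and treating the mapping this way is an assumption about intended usage.

```python
from collections.abc import Iterable, Mapping

# Literal copy of the ALA example output above.
ALA_LEAVING = {"N": ("H2",), "C": ("OXT", "HXT"), "OXT": ("HXT",)}


def atoms_to_drop(
    leaving_map: Mapping[str, tuple[str, ...]],
    bonded_atom_names: Iterable[str],
) -> tuple[str, ...]:
    """Union of leaving atoms for every atom that forms a new bond, in first-seen order."""
    dropped: list[str] = []
    for name in bonded_atom_names:
        for leaving in leaving_map.get(name, ()):
            if leaving not in dropped:
                dropped.append(leaving)
    return tuple(dropped)


# An internal residue bonded at both N and C would lose these atoms.
internal = atoms_to_drop(ALA_LEAVING, ["N", "C"])
```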

atomworks.io.utils.ccd.get_chem_comp_type(ccd_code: str, mode: Literal['warn', 'raise'] = 'warn') → str[source]#

Get the chemical component type for a CCD code from the Chemical Component Dictionary (CCD).

Can be combined with CHEM_TYPES from atomworks.io_biotite.constants to determine if a component is a protein, nucleic acid, or carbohydrate.

Parameters:

ccd_code (str) – The CCD code for the component. E.g. ALA for alanine, NAG for N-acetyl-D-glucosamine.
mode (Literal["warn", "raise"]) – How to handle unknown chemical component types.

Example

>>> from atomworks.io.utils.ccd import get_chem_comp_type
>>> get_chem_comp_type("ALA")
'L-PEPTIDE LINKING'
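
To illustrate how a type string such as 'L-PEPTIDE LINKING' can be mapped to a broad class, here is a minimal sketch. It does not use the CHEM_TYPES constants mentioned above. classify_chem_comp_type is a hypothetical helper, and its substring rules are simplified assumptions about CCD type strings.

```python
def classify_chem_comp_type(chem_comp_type: str) -> str:
    """Map a CCD type string to a coarse class using simple substring rules (assumed)."""
    normalized = chem_comp_type.upper()
    if "DNA" in normalized or "RNA" in normalized:
        return "nucleic acid"
    if "SACCHARIDE" in normalized:
        return "carbohydrate"
    # "PEPTIDE-LIKE" components are deliberately not treated as protein residues here.
    if "PEPTIDE" in normalized and "LIKE" not in normalized:
        return "protein"
    return "other"


kind = classify_chem_comp_type("L-PEPTIDE LINKING")
```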

atomworks.io.utils.ccd.get_std_to_alt_atom_name_map(ccd_code: str, ccd_mirror_path: PathLike = None) → dict[str, str][source]#: Get a map from standard atom names to alternative atom names.

atomworks.io.utils.ccd.get_unknown_ccd_code_for_chem_comp_type(chem_comp_type: str) → str[source]#: Get the CCD code for an unknown chemical component type.

atomworks.io.utils.ccd.na_chem_comps() → frozenset[str][source]#: Set of nucleic acid chemical components. E.g. {‘DA’, ‘DC’, …}

atomworks.io.utils.ccd.parse_ccd_cif(cif: CIFFile, coords: Literal['model', 'ideal_pdbx', 'ideal_rdkit'] | None | tuple[str, ...] = ('ideal_pdbx', 'model', 'ideal_rdkit'), add_properties: bool = False, add_mapping: bool = False) → AtomArray[source]#

Parses a Chemical Component Dictionary CIF file into a Biotite AtomArray structure.

Parameters:

cif – The CIF file containing the component data.
coords – Type of coordinates to use. Can be a single coordinate type or a tuple of fallback preferences. Defaults to (“ideal_pdbx”, “model”, “ideal_rdkit”).
- “model”: Use the coordinates found in a random (but fixed) PDB file.
- “ideal_pdbx”: Use the idealized coordinates computed by the RCSB PDB (sometimes not available).
- “ideal_rdkit”: Use the idealized coordinates computed by RDKit (sometimes unrealistic).
add_properties – Whether to include RDKit-computed properties. Defaults to False. Properties are available under the properties attribute of the returned AtomArray.
add_mapping – Whether to include external resource mappings, such as the ChEMBL ID. Defaults to False. Mappings are available under the mapping attribute of the returned AtomArray.

Returns:

The parsed atomic structure with requested annotations and properties.

Return type:

AtomArray

Example

>>> import biotite.structure.io.pdbx as pdbx
>>> from atomworks.io.utils.ccd import parse_ccd_cif
>>> cif = pdbx.CIFFile.read("path/to/ALA.cif")
>>> atom_array = parse_ccd_cif(cif, coords="ideal_pdbx")
>>> # With fallback preferences:
>>> atom_array = parse_ccd_cif(cif, coords=("ideal_pdbx", "model", "ideal_rdkit"))
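
The coords argument accepts either one coordinate type or a tuple of fallback preferences. The sketch below shows one way such a preference could be resolved. pick_coords and its available argument are hypothetical, and returning None when nothing matches is an assumption rather than documented behaviour.

```python
from __future__ import annotations

from collections.abc import Collection


def pick_coords(
    preference: str | tuple[str, ...] | None,
    available: Collection[str],
) -> str | None:
    """Return the first preferred coordinate type that is available (hypothetical helper)."""
    if preference is None:
        return None
    # Normalise a single name to a one-element tuple so both forms share one code path.
    candidates = (preference,) if isinstance(preference, str) else tuple(preference)
    for candidate in candidates:
        if candidate in available:
            return candidate
    # Assumption: signal "nothing usable" with None instead of raising.
    return None


# Idealized PDBx coordinates are missing here, so the next preference is used.
chosen = pick_coords(("ideal_pdbx", "model", "ideal_rdkit"), {"model", "ideal_rdkit"})
```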

atomworks.io.utils.ccd.rna_chem_comps() → frozenset[str][source]#: Set of RNA chemical components. E.g. {‘A’, ‘C’, …}
