Non-RCSB Utilities
Utility functions for handling non-RCSB CIF files. Such files do not follow the standard CIF format and thus may require special handling.
- atomworks.io.utils.non_rcsb.get_identity_assembly_gen_category(chain_ids: list[str]) → CIFCategory
- atomworks.io.utils.non_rcsb.initialize_chain_info_from_atom_array(atom_array: AtomArray, infer_chain_type: bool = True, infer_chain_sequences: bool = True, use_chain_iids: bool = False) → dict
Infer specified information (chain type and sequence information) from an AtomArray.
Required when the chain type or polymer sequence information is not explicitly provided in the CIF file. This can happen when the CIF file does not come from the RCSB PDB database (e.g., predicted structures used for distillation).
WARNING: Use this function for computationally predicted structures or for inference; do not use for files from the RCSB PDB database.
- In particular, this function adds the following information to the chain_info_dict:
The RCSB entity ID for each chain (e.g., 1, 2, 3, etc.), if present in the AtomArray (under the entity_id atom site label)
The unprocessed one-letter entity canonical and non-canonical sequences.
(Optionally) A boolean flag indicating whether the chain is a polymer.
(Optionally) The chain type as an IntEnum (e.g., polypeptide(L), non-polymer, etc.)
(Optionally) The residue IDs and residue names, inferred from the AtomArray.
- Parameters:
atom_array (AtomArray) – The AtomArray object to infer chain information from.
infer_chain_type (bool, optional) – Whether to infer the chain type from the AtomArray. Defaults to True.
infer_chain_sequences (bool, optional) – Whether to infer the chain sequences from the AtomArray. Defaults to True.
use_chain_iids (bool, optional) – Whether to use the chain_iid annotation rather than the chain_id. Defaults to False.
- Returns:
A dictionary containing the specified chain information as keys
- Return type:
dict
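To make the kind of inference described above concrete, here is a minimal, library-free sketch of deriving per-chain residue IDs, residue names, a one-letter sequence, and a polymer flag from per-atom records. It is an illustration only and not the atomworks implementation: the function name `infer_chain_info`, the dictionary keys, the small `THREE_TO_ONE` table, and the polymer heuristic are all assumptions made for this example, and the real function operates on an AtomArray rather than on tuples.

```python
# Hypothetical, simplified illustration; not the atomworks implementation.
# Subset of the standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "GLY": "G", "SER": "S", "LEU": "L", "LYS": "K",
    "VAL": "V", "THR": "T", "GLU": "E", "ASP": "D", "PRO": "P",
}


def infer_chain_info(atoms):
    """Infer per-chain information from per-atom records.

    `atoms` is an iterable of (chain_id, res_id, res_name) tuples, one
    per atom, in file order. All key names below are assumptions.
    """
    chain_info = {}
    for chain_id, res_id, res_name in atoms:
        info = chain_info.setdefault(chain_id, {"res_id": [], "res_name": []})
        # Start a new residue whenever the residue ID changes within a chain.
        if not info["res_id"] or info["res_id"][-1] != res_id:
            info["res_id"].append(res_id)
            info["res_name"].append(res_name)

    for info in chain_info.values():
        names = info["res_name"]
        # Unknown residue names map to "X" in the one-letter sequence.
        info["sequence"] = "".join(THREE_TO_ONE.get(n, "X") for n in names)
        # Crude heuristic (assumption): a polymer has more than one residue,
        # and every residue is a known amino acid.
        info["is_polymer"] = len(names) > 1 and all(n in THREE_TO_ONE for n in names)
    return chain_info


example_atoms = [
    ("A", 1, "ALA"), ("A", 1, "ALA"),  # two atoms of the same residue
    ("A", 2, "GLY"),
    ("A", 3, "SER"),
    ("B", 1, "HEM"),                   # single-residue ligand chain
]
info = infer_chain_info(example_atoms)
print(info["A"]["sequence"], info["A"]["is_polymer"], info["B"]["is_polymer"])
```

In this sketch the result is keyed by chain ID, which is one plausible layout for such a dictionary. Check the structure returned by the real function before relying on it.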