Category Transforms
Transforms operating on Biotite’s CIFBlock and CIFCategory objects.
These transforms are used to extract information from the CIFBlock and return a dictionary containing processed information.
- atomworks.io.transforms.categories.category_to_df(cif_block: CIFBlock, category: str) → DataFrame | None
Convert a category of a CIF block to a pandas DataFrame.
- atomworks.io.transforms.categories.category_to_dict(cif_block: CIFBlock, category: str) → dict[str, ndarray]
Convert a category of a CIF block to a dictionary.
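As a rough illustration of what such a conversion involves, the sketch below flattens one category into a dictionary of column values. It is not the atomworks implementation: `category_to_dict_sketch` and `FakeColumn` are hypothetical names, the block is modelled as a plain mapping, and returning an empty dictionary for a missing category is an assumption. Biotite's real columns expose `as_array()` returning a NumPy array; the stand-in returns a list so the example needs only the standard library.

```python
from collections.abc import Mapping


class FakeColumn:
    """Hypothetical stand-in for a CIF column; only as_array() is mimicked."""

    def __init__(self, values):
        self._values = list(values)

    def as_array(self):
        # Biotite returns a NumPy array here; a list keeps the sketch stdlib-only.
        return list(self._values)


def category_to_dict_sketch(cif_block: Mapping, category: str) -> dict:
    """Return {column name: values} for one category.

    Assumption: a missing category yields an empty dict.
    """
    if category not in cif_block:
        return {}
    return {name: column.as_array() for name, column in cif_block[category].items()}


# A toy block with a single "entity" category.
block = {
    "entity": {
        "id": FakeColumn(["1", "2"]),
        "type": FakeColumn(["polymer", "non-polymer"]),
    }
}
entity = category_to_dict_sketch(block, "entity")
```

A DataFrame variant would pass the same dictionary to `pandas.DataFrame`.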
- atomworks.io.transforms.categories.extract_crystallization_details(crystal_dict: dict) → dict[str, list[float] | None]
Extracts crystallization details from the crystallization dictionary.
- Parameters:
crystal_dict – Dictionary for the exptl_crystal_grow CIF category.
- Returns:
A dictionary with crystallization details. Currently includes:
“pH”: A list of two floats [min_pH, max_pH], or None if unavailable.
- Return type:
dict[str, list[float] | None]
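The [min_pH, max_pH] shape of the result can be sketched as follows. This is a minimal illustration, not the library's code: `ph_range_sketch` is a hypothetical name, and it assumes the dictionary carries a "pH" entry holding string values, with CIF placeholders such as "?" and "." treated as missing.

```python
def ph_range_sketch(crystal_dict: dict) -> dict:
    """Collapse the pH values of a crystallization dictionary into [min, max].

    Assumption: values live under the key "pH" and arrive as strings.
    """
    values = []
    for item in crystal_dict.get("pH", []):
        try:
            values.append(float(item))
        except (TypeError, ValueError):
            # Skips CIF placeholders such as "?" and "." (and None).
            continue
    return {"pH": [min(values), max(values)] if values else None}


details = ph_range_sketch({"pH": ["7.5", "?", "6.0"]})
no_details = ph_range_sketch({})
```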
- atomworks.io.transforms.categories.get_ligand_of_interest_info(cif_block: CIFBlock) → dict
Extract ligand of interest information from a CIF block.
- atomworks.io.transforms.categories.get_metadata_from_category(cif_block: CIFBlock, fallback_id: str | None = None) → dict
Extract metadata from the CIF block. If the entry.id field is not present in the CIF block, the fallback_id is used instead (e.g., the filename of the CIF).
- From RCSB CIF files, this function extracts:
ID (e.g., PDB ID)
Method (e.g., X-ray, NMR, etc.)
Deposition date (initial)
Release date (earliest revision date)
Resolution (e.g., 5.0, 3.0, etc.)
- For custom CIF files (e.g., distillation), this function extracts:
Extra metadata (all other categories)
- Parameters:
cif_block (CIFBlock) – The CIF block to extract metadata from.
fallback_id (str | None) – A fallback ID to use if the entry.id field is not present in the CIF block.
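Two of the rules described above, the ID fallback and the release date taken as the earliest revision date, can be sketched in isolation. The helper names `resolve_id` and `earliest_revision_date` are hypothetical, and ISO-formatted date strings (YYYY-MM-DD) are assumed; the actual function may handle its inputs differently.

```python
from datetime import date


def resolve_id(entry_id, fallback_id=None):
    """Prefer the ID stored in the CIF block; otherwise use the fallback."""
    return entry_id if entry_id else fallback_id


def earliest_revision_date(revision_dates):
    """Return the earliest of several ISO date strings, or None if there are none."""
    parsed = [date.fromisoformat(text) for text in revision_dates]
    return min(parsed) if parsed else None


# A block without an entry ID falls back to, for example, the file name.
structure_id = resolve_id(None, fallback_id="my_custom_file")
release = earliest_revision_date(["2011-07-13", "2009-03-31", "2023-09-06"])
```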
- atomworks.io.transforms.categories.initialize_chain_info_from_category(cif_block: CIFBlock, atom_array: AtomArray) → dict
Extracts chain entity-level information from the CIF block.
Requires the categories ‘entity’ and ‘entity_poly’ to be present in the CIF block.
- In particular, this function adds the following information to the chain_info_dict:
The RCSB entity ID for each chain (e.g., 1, 2, 3, etc.)
The chain type as an IntEnum (e.g., polypeptide(L), non-polymer, etc.)
The unprocessed one-letter entity canonical and non-canonical sequences.
A boolean flag indicating whether the chain is a polymer.
The EC numbers for the chain.
Note that three-letter sequence information is added to the chain_info_dict in a later step.
- Parameters:
cif_block (CIFBlock) – Parsed CIF block.
atom_array (AtomArray) – Atom array containing the chain information.
- Returns:
Dictionary containing the sequence details of each chain.
- Return type:
dict
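To make the "chain type as an IntEnum" and polymer-flag items concrete, here is a small sketch. Everything in it is hypothetical: the enum name `ChainTypeSketch`, its members and integer values, and the helper names are chosen for illustration and are not atomworks's actual definitions. Only the string labels follow the mmCIF vocabulary.

```python
from enum import IntEnum


class ChainTypeSketch(IntEnum):
    """Hypothetical chain-type enum; members and values are illustrative only."""

    NON_POLYMER = 0
    POLYPEPTIDE_L = 1
    POLYRIBONUCLEOTIDE = 2


# Maps mmCIF-style type labels to the hypothetical enum members.
_LABEL_TO_TYPE = {
    "non-polymer": ChainTypeSketch.NON_POLYMER,
    "polypeptide(L)": ChainTypeSketch.POLYPEPTIDE_L,
    "polyribonucleotide": ChainTypeSketch.POLYRIBONUCLEOTIDE,
}


def chain_type_from_label(label: str) -> ChainTypeSketch:
    """Look up the enum member for a type label; unknown labels raise KeyError."""
    return _LABEL_TO_TYPE[label]


def is_polymer(chain_type: ChainTypeSketch) -> bool:
    """In this sketch, every type except NON_POLYMER counts as a polymer."""
    return chain_type != ChainTypeSketch.NON_POLYMER


protein = chain_type_from_label("polypeptide(L)")
ligand = chain_type_from_label("non-polymer")
```

Storing the type as an IntEnum keeps it compact and comparable as an integer while remaining readable by name.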
- atomworks.io.transforms.categories.load_monomer_sequence_information_from_category(cif_block: CIFBlock, chain_info_dict: dict, atom_array: AtomArray, ccd_mirror_path: PathLike = None) → dict
Load monomer sequence information into a chain_info_dict.
- Uses:
The CIFCategory ‘entity_poly_seq’ as the sequence ground-truth for polymers.
The AtomArray as the ground-truth for non-polymers.
We must rely on the CIFCategory ‘entity_poly_seq’ for polymers, as the AtomArray may not contain the full sequence information (e.g., unresolved residues).
For non-polymers, there’s no standard equivalent to ‘entity_poly_seq’, so we must use the AtomArray to get the sequence information.
When loading both polymer and non-polymer sequences, we also filter out unknown or otherwise ignored residues.
- Parameters:
cif_block (CIFBlock) – The CIF block containing the monomer sequence information.
chain_info_dict (dict) – The dictionary where the monomer sequence information will be stored.
atom_array (AtomArray) – The atom array used to get the sequence for non-polymers.
- Returns:
The updated chain_info_dict with monomer sequence information. Adds the following keys:
‘res_name’: The CCD residue names for each chain.
‘res_id’: The residue IDs for each chain (does not perform re-indexing).
‘processed_entity_non_canonical_sequence’: The processed non-canonical sequence for each chain.
‘processed_entity_canonical_sequence’: The processed canonical sequence for each chain.
‘has_sequence_heterogeneity’: A boolean flag indicating whether the chain has sequence heterogeneity.
- Return type:
dict
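The choice of ground truth described for this function (the declared ‘entity_poly_seq’ sequence for polymers, the observed AtomArray residues for non-polymers, with unknown residues filtered out) can be sketched with plain lists. This is an illustration under stated assumptions, not the library's logic: `residue_names_for_chain` and `IGNORED_RESIDUES` are hypothetical names, the ignore list is a placeholder, and the observed residues are assumed to be given once per residue rather than once per atom.

```python
# Hypothetical ignore list; the residues atomworks actually filters may differ.
IGNORED_RESIDUES = frozenset({"UNX", "UNL"})


def residue_names_for_chain(is_polymer, declared_names, observed_names,
                            ignored=IGNORED_RESIDUES):
    """Pick the sequence source for one chain and drop ignored residues.

    declared_names: residue names from the declared polymer sequence.
    observed_names: residue names actually present in the coordinates.
    """
    source = declared_names if is_polymer else observed_names
    return [name for name in source if name not in ignored]


# Polymer: the first residue is unresolved, so it is missing from the coordinates,
# but the declared sequence still contains it.
polymer_names = residue_names_for_chain(
    True, declared_names=["MET", "ALA", "GLY"], observed_names=["ALA", "GLY"]
)

# Non-polymer: there is no declared sequence, so the observed residues are used.
ligand_names = residue_names_for_chain(
    False, declared_names=[], observed_names=["HEM", "UNL"]
)
```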