Bond Utilities
Utility functions for detecting and creating bonds in a structure.
- atomworks.io.utils.bonds.generate_inter_level_bond_hash(atom_array: AtomArray, lower_level_id: str, lower_level_entity: str | None = None) -> str
Generates a hash string representing the inter-level bonds within an AtomArray. When computing entity IDs, we must consider inter-level bonds at the atom and residue level to avoid ambiguity.
- Parameters:
atom_array (AtomArray) – The array of atoms containing bond and annotation information.
lower_level_id (str) – The level at which to find and hash the inter-level bonds. For example, when computing molecule entities, we would consider the inter-PN Unit bonds.
lower_level_entity (str | None) – An additional entity annotation to use when computing the hash. Optional; if None, only residue ID, residue name, and atom name are used.
- Returns:
A hash string representing the inter-level bonds.
- Return type:
str
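To make the idea of an inter-level bond hash concrete, here is a minimal, standard-library-only sketch. It is not the atomworks implementation: atoms are represented as plain dictionaries rather than an AtomArray, and the helper name `inter_level_bond_hash_sketch`, the use of SHA-256, and the omission of the optional `lower_level_entity` annotation are all assumptions made for illustration.

```python
import hashlib


def inter_level_bond_hash_sketch(atoms, bonds, lower_level_id):
    """Hash the bonds that cross between different `lower_level_id` values.

    atoms: list of dicts with res_id, res_name, atom_name and the level annotation.
    bonds: list of (i, j) atom-index pairs.
    (Hypothetical helper; an illustration only.)
    """
    descriptors = []
    for i, j in bonds:
        a, b = atoms[i], atoms[j]
        if a[lower_level_id] == b[lower_level_id]:
            continue  # bond stays inside one lower-level unit, so skip it
        # Describe each end by residue ID, residue name, and atom name,
        # sorted so that the direction of the bond does not matter.
        ends = sorted((x["res_id"], x["res_name"], x["atom_name"]) for x in (a, b))
        descriptors.append(tuple(ends))
    descriptors.sort()  # the order in which bonds are listed does not matter
    return hashlib.sha256(repr(descriptors).encode()).hexdigest()


atoms = [
    {"pn_unit_id": "A", "res_id": 10, "res_name": "ASN", "atom_name": "CG"},
    {"pn_unit_id": "A", "res_id": 10, "res_name": "ASN", "atom_name": "ND2"},
    {"pn_unit_id": "B", "res_id": 1, "res_name": "NAG", "atom_name": "C1"},
]
hash_forward = inter_level_bond_hash_sketch(atoms, [(0, 1), (1, 2)], "pn_unit_id")
hash_reversed = inter_level_bond_hash_sketch(atoms, [(2, 1), (1, 0)], "pn_unit_id")
hash_no_link = inter_level_bond_hash_sketch(atoms, [(0, 1)], "pn_unit_id")
```

In this sketch, two structures that differ only in where one unit attaches to another receive different hashes, which is the ambiguity the description above refers to.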
- atomworks.io.utils.bonds.get_coarse_graph_as_nodes_and_edges(atom_array: AtomArray, annotations: str | tuple[str]) -> tuple[ndarray, ndarray]
Returns the coarse-grained nodes and edges at the given annotation level based on the atom array’s bond connectivity.
- Parameters:
atom_array (AtomArray) – The atom array containing atomic information and bonds.
annotations (str | tuple[str]) – A single annotation or a tuple of annotations to be used for node identification.
- Returns:
nodes (np.ndarray): An array of unique nodes, each represented by a combination of annotations.
edges (np.ndarray): An array of edges, where each edge is a tuple of node indices representing a bond between two nodes.
- Return type:
tuple[ndarray, ndarray]
Example
>>> atom_array = cached_parse("5ocm")["atom_array"]
>>> nodes, edges = get_coarse_graph_as_nodes_and_edges(atom_array, ["chain_id", "transformation_id"])
>>> print(nodes)
array([('A', '1'), ('F', '1'), ('G', '1'), ('H', '1'), ('I', '1'), ('W', '1'), ('X', '1'), ('Y', '1')], dtype=[('chain_id', '<U4'), ('transformation_id', '<U1')])
>>> print(edges)
array([[0, 0], [1, 1], [2, 2], [3, 3], [5, 5], [6, 6]])
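The coarse-graining step can be illustrated without any structure file. The following is a rough sketch under stated assumptions, not the library's code: atoms are plain dictionaries, bonds are index pairs, and `coarse_graph_sketch` is a hypothetical name. It collapses atoms that share the same annotation values into one node and maps every atom-level bond to an edge between node indices, so bonds within a node appear as self-edges such as `(0, 0)`.

```python
def coarse_graph_sketch(atoms, bonds, annotations):
    """Collapse atoms into nodes keyed by `annotations`; map bonds to node edges.

    (Hypothetical helper; an illustration only.)
    """
    if isinstance(annotations, str):
        annotations = (annotations,)  # accept a single annotation name
    # One key per atom, built from the selected annotation values.
    keys = [tuple(atom[name] for name in annotations) for atom in atoms]
    nodes = sorted(set(keys))
    index = {key: idx for idx, key in enumerate(nodes)}
    # Translate atom-level bonds into unique, undirected node-level edges.
    edges = sorted(
        {tuple(sorted((index[keys[i]], index[keys[j]]))) for i, j in bonds}
    )
    return nodes, edges


atoms = [
    {"chain_id": "A", "transformation_id": "1"},
    {"chain_id": "A", "transformation_id": "1"},
    {"chain_id": "B", "transformation_id": "1"},
    {"chain_id": "C", "transformation_id": "1"},
]
bonds = [(0, 1), (1, 2)]  # one bond inside chain A, one between chains A and B
nodes, edges = coarse_graph_sketch(atoms, bonds, ("chain_id", "transformation_id"))
# nodes == [("A", "1"), ("B", "1"), ("C", "1")]
# edges == [(0, 0), (0, 1)]
```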
- atomworks.io.utils.bonds.get_connected_nodes(nodes: ndarray, edges: ndarray) -> list[list[Any]]
Returns connected nodes as a mapped list given corresponding arrays of nodes and edges.
Example
>>> nodes = np.array([("A", "1"), ("B", "1"), ("C", "1"), ("D", "1")])
>>> edges = np.array([[0, 1], [0, 2], [1, 2]])
>>> connected_nodes = get_connected_nodes(nodes, edges)
>>> print(connected_nodes)
[[("A", "1"), ("B", "1"), ("C", "1")], [("D", "1")]]
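Grouping nodes by connectivity is a connected-components problem. The sketch below shows one way to compute it with a depth-first traversal over plain Python lists; it is an illustration only, the name `connected_nodes_sketch` is hypothetical, and the library may use a different approach internally.

```python
from collections import defaultdict


def connected_nodes_sketch(nodes, edges):
    """Group nodes into connected components with a depth-first traversal.

    (Hypothetical helper; an illustration only.)
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in range(len(nodes)):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        members = []
        while stack:
            current = stack.pop()
            members.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        # Report the nodes themselves, in their original order.
        components.append([nodes[i] for i in sorted(members)])
    return components


nodes = [("A", "1"), ("B", "1"), ("C", "1"), ("D", "1")]
edges = [(0, 1), (0, 2), (1, 2)]
connected = connected_nodes_sketch(nodes, edges)
# connected == [[("A", "1"), ("B", "1"), ("C", "1")], [("D", "1")]]
```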
- atomworks.io.utils.bonds.get_inferred_polymer_bonds(atom_array: AtomArray) -> tuple[list[tuple[int, int, BondType]], ndarray]
Infers and returns polymer bonds between consecutive residues in an atom array based on chemical component types and chain types.
The function identifies bonds by looking at consecutive residues within the same chain and determining the appropriate bonding atoms based on either the chain type (as a fallback) or more detailed chemical component types. It also tracks leaving atoms that are displaced during bond formation. Leaving groups are inferred from the CCD entries for the chemical components. If a CCD code is missing from your local CCD mirror, leaving groups will not be inferred.
- Parameters:
atom_array (AtomArray) – The atom array containing the structure information. Must include annotations for chain_id, res_id, res_name, and atom_name. May optionally include a chain_type annotation.
- Returns:
polymer_bonds (list[tuple[int, int, BondType]]): List of tuples containing (atom1_idx, atom2_idx, bond_type) for each inferred polymer bond.
leaving_atom_idxs (np.ndarray): Array of atom indices that represent leaving groups displaced during bond formation.
- Return type:
tuple[list[tuple[int, int, BondType]], ndarray]
Example
>>> # Create an atom array with two consecutive peptide residues
>>> atom_array = AtomArray(length=10)
>>> atom_array.chain_id = ["A"] * 10
>>> atom_array.res_id = [1] * 5 + [2] * 5
>>> atom_array.res_name = ["ALA"] * 5 + ["GLY"] * 5
>>> atom_array.atom_name = ["N", "CA", "C", "OXT", "CB"] + ["N", "CA", "C", "O", "H2"]
>>> # Get the polymer bonds
>>> bonds, leaving = get_inferred_polymer_bonds(atom_array)
>>> print(bonds)  # Shows C-N peptide bond between residues
[(2, 5, <BondType.SINGLE>)]  # C of ALA to N of GLY
>>> print(leaving)  # Shows leaving OXT from C and H2 from N (other hydrogen atom names not shown for simplicity)
[array([3]), array([9])]
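The peptide case can be reduced to a few lines of plain Python. This is a deliberately simplified sketch and not the library's logic: it assumes every residue is a peptide, hard-codes the bonding atoms (`C` and `N`) and the leaving atoms (`OXT` and `H2`) instead of looking them up in the CCD, and treats residues as consecutive purely by their order in the input. The name `peptide_bonds_sketch` is hypothetical.

```python
def peptide_bonds_sketch(atoms):
    """Infer C-N bonds between neighbouring residues of the same chain.

    atoms: list of dicts with chain_id, res_id, atom_name, in file order.
    (Hypothetical helper; bonding and leaving atoms are hard-coded assumptions.)
    """
    # Map (chain_id, res_id) -> {atom_name: atom index}, keeping input order.
    residues = {}
    for idx, atom in enumerate(atoms):
        key = (atom["chain_id"], atom["res_id"])
        residues.setdefault(key, {})[atom["atom_name"]] = idx

    keys = list(residues)
    bonds, leaving = [], []
    for key_a, key_b in zip(keys, keys[1:]):
        if key_a[0] != key_b[0]:
            continue  # never bond across a chain boundary
        first, second = residues[key_a], residues[key_b]
        if "C" in first and "N" in second:
            bonds.append((first["C"], second["N"]))
            # Atoms displaced when the bond forms (assumed names).
            if "OXT" in first:
                leaving.append(first["OXT"])
            if "H2" in second:
                leaving.append(second["H2"])
    return bonds, leaving


names = ["N", "CA", "C", "OXT", "CB"] + ["N", "CA", "C", "O", "H2"]
res_ids = [1] * 5 + [2] * 5
atoms = [
    {"chain_id": "A", "res_id": r, "atom_name": n} for r, n in zip(res_ids, names)
]
bonds, leaving = peptide_bonds_sketch(atoms)
# bonds == [(2, 5)], leaving == [3, 9]
```

The indices agree with the documented example: atom 2 (C of residue 1) bonds to atom 5 (N of residue 2), and atoms 3 and 9 are the leaving atoms.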
- atomworks.io.utils.bonds.get_struct_conn_bonds(atom_array: AtomArray, struct_conn_dict: dict[str, ndarray], add_bond_types: list[str] = ['covale'], raise_on_failure: bool = False) -> tuple[ndarray, ndarray]
Collects the bonds listed in the ‘struct_conn’ category of a CIF block so that they can be added to an atom array. By default, only covalent bonds (“covale”) are considered; see add_bond_types.
- Parameters:
atom_array (AtomArray) – The atom array used to get atom indices.
struct_conn_dict (dict[str, np.ndarray]) –
The struct_conn category of a CIF block as a dictionary. For example (only the mandatory fields, as defined by the RCSB, are shown):
```
{
    'conn_type_id': array(['disulf', ...]),
    'ptnr1_label_asym_id': array(['A', ...]),
    'ptnr1_label_comp_id': array(['CYS', ...]),
    'ptnr1_label_seq_id': array(['6', ...]),
    'ptnr1_label_atom_id': array(['SG', ...]),
    'ptnr1_symmetry': array(['1_555', ...]),
    'ptnr2_label_asym_id': array(['A', ...]),
    'ptnr2_label_comp_id': array(['CYS', ...]),
    'ptnr2_label_seq_id': array(['127', ...]),
    'ptnr2_label_atom_id': array(['SG', ...]),
    'ptnr2_symmetry': array(['1_555', ...]),
}
```
However, this function requires only the following fields:
conn_type_id (e.g., “covale”)
ptnr1_label_asym_id (chain_id or chain_iid, e.g., “A” or “A_1”)
ptnr1_label_comp_id (residue name in the CCD, e.g., “CYS”)
ptnr1_label_seq_id (residue ID, e.g., “6”)
ptnr1_label_atom_id (atom name, e.g., “SG”)
ptnr2_label_asym_id
ptnr2_label_comp_id
ptnr2_label_seq_id
ptnr2_label_atom_id
add_bond_types (list[str]) – A list of bond types that should be added. Valid bond types are: [“covale”, “disulf”, “metalc”, “hydrog”]. Defaults to [“covale”], which fits the structure-prediction use case, where covalent bonds (other than disulfides) are known a priori.
raise_on_failure (bool) – If True, raise an error if specified bonds cannot be made (e.g., if the atoms are missing). Defaults to False.
NOTE – chain_iid annotations are allowed, but a given bond is expected to use only one annotation type: either both partners use chain_id or both use chain_iid.
- Returns:
bonds (np.ndarray): The bonds to be added to the atom array, each given as [atom1_idx, atom2_idx, bond_type].
leaving (np.ndarray): An array of indices of atoms that are leaving groups, kept for bookkeeping.
- Return type:
tuple[ndarray, ndarray]
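The interplay of `add_bond_types` and `raise_on_failure` can be sketched with plain lists and dictionaries. The code below is an illustration under assumptions, not the library's implementation: `struct_conn_bonds_sketch` is a hypothetical helper, atoms are dictionaries rather than an AtomArray, bond types and leaving groups are ignored, and partners are matched on chain ID, residue name, residue ID, and atom name only.

```python
def struct_conn_bonds_sketch(
    atoms, struct_conn, add_bond_types=("covale",), raise_on_failure=False
):
    """Resolve struct_conn rows of the requested types to atom-index pairs.

    (Hypothetical helper; an illustration only.)
    """
    # Index atoms by (chain, residue name, residue ID as string, atom name).
    lookup = {
        (a["chain_id"], a["res_name"], str(a["res_id"]), a["atom_name"]): i
        for i, a in enumerate(atoms)
    }
    fields = ("asym_id", "comp_id", "seq_id", "atom_id")
    bonds = []
    for row, conn_type in enumerate(struct_conn["conn_type_id"]):
        if conn_type not in add_bond_types:
            continue  # e.g. skip "disulf" when only "covale" is requested
        partners = []
        for prefix in ("ptnr1", "ptnr2"):
            key = tuple(struct_conn[f"{prefix}_label_{f}"][row] for f in fields)
            partners.append(lookup.get(key))
        if None in partners:
            if raise_on_failure:
                raise ValueError(f"struct_conn row {row}: atom not found")
            continue  # silently drop bonds whose atoms are missing
        bonds.append(tuple(partners))
    return bonds


atoms = [
    {"chain_id": "A", "res_name": "CYS", "res_id": 6, "atom_name": "SG"},
    {"chain_id": "A", "res_name": "CYS", "res_id": 127, "atom_name": "SG"},
    {"chain_id": "A", "res_name": "ASN", "res_id": 10, "atom_name": "ND2"},
    {"chain_id": "B", "res_name": "NAG", "res_id": 1, "atom_name": "C1"},
]
struct_conn = {
    "conn_type_id": ["disulf", "covale", "covale"],
    "ptnr1_label_asym_id": ["A", "A", "A"],
    "ptnr1_label_comp_id": ["CYS", "ASN", "SER"],
    "ptnr1_label_seq_id": ["6", "10", "50"],
    "ptnr1_label_atom_id": ["SG", "ND2", "OG"],
    "ptnr2_label_asym_id": ["A", "B", "B"],
    "ptnr2_label_comp_id": ["CYS", "NAG", "NAG"],
    "ptnr2_label_seq_id": ["127", "1", "2"],
    "ptnr2_label_atom_id": ["SG", "C1", "C1"],
}
covalent_only = struct_conn_bonds_sketch(atoms, struct_conn)
with_disulfides = struct_conn_bonds_sketch(
    atoms, struct_conn, add_bond_types=("covale", "disulf")
)
# covalent_only == [(2, 3)]; with_disulfides == [(0, 1), (2, 3)]
```

The third row refers to atoms that are absent from `atoms`, so it is skipped by default and would raise a `ValueError` in this sketch when `raise_on_failure=True`.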
- atomworks.io.utils.bonds.hash_atom_array(atom_array: AtomArray, annotations: tuple[str] = ('element', 'atom_name'), bond_order: bool = True, cast_aromatic_bonds_to_same_type: bool = False, use_md5: bool = False, md5_length: int | None = None) -> str
Computes a hash for an AtomArray based on the bond connectivity and the selected node annotations.
- Parameters:
atom_array (AtomArray) – The array of atoms to hash.
annotations (tuple[str]) – The node annotations to include in the hash.
bond_order (bool) – Whether to include bond order in the hash.
cast_aromatic_bonds_to_same_type (bool) – Whether to treat all aromatic bonds as the same type.
use_md5 (bool) – Whether to use MD5 hashing on the output.
md5_length (int | None) – If using MD5, the number of characters to keep from the hash. If None, returns full hash.
- Returns:
The computed hash
- Return type:
str
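The `use_md5` and `md5_length` options describe an optional post-processing step on the computed hash. A minimal sketch of such a step follows. It assumes the hash is available as a string and is not taken from the library; `shorten_hash_sketch` is a hypothetical name.

```python
import hashlib


def shorten_hash_sketch(raw_hash, use_md5=False, md5_length=None):
    """Optionally re-hash a string with MD5 and keep a prefix of the digest.

    (Hypothetical helper; an illustration only.)
    """
    if not use_md5:
        return raw_hash  # leave the original hash untouched
    digest = hashlib.md5(raw_hash.encode()).hexdigest()  # 32 hex characters
    # A shorter prefix is more compact but more likely to collide.
    return digest if md5_length is None else digest[:md5_length]


full_digest = shorten_hash_sketch("example-hash", use_md5=True)
short_digest = shorten_hash_sketch("example-hash", use_md5=True, md5_length=8)
unchanged = shorten_hash_sketch("example-hash")
```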
- atomworks.io.utils.bonds.hash_graph(graph: Graph, node_attr: str | None = None, edge_attr: str | None = None, iterations: int = 3, digest_size: int = 16) -> str
Computes a hash for a given graph using the Weisfeiler-Lehman (WL) graph hashing algorithm and additionally adds a node and edge attribute hash, if specified, to deal with common edge cases where WL fails (e.g. disconnected graphs).
- Parameters:
graph (Graph) – The input graph to be hashed.
node_attr (str | None) – The node attribute to be used for hashing. If None, node attributes are ignored.
edge_attr (str | None) – The edge attribute to be used for hashing. If None, edge attributes are ignored.
iterations (int) – The number of iterations for the WL algorithm. Default is 3.
digest_size (int) – The size of the hash digest for WL. Default is 16.
- Returns:
The computed hash of the graph.
- Return type:
str
Example
>>> import networkx as nx
>>> G = nx.gnm_random_graph(10, 15)
>>> hash_graph(G)
'504894f49dd84b17c391b163af69624b'
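To show why a plain WL hash can fail on disconnected graphs, here is a simplified, standard-library-only sketch of WL-style hashing. It is neither the atomworks nor the networkx implementation: the function `wl_hash_sketch`, the adjacency-dict graph format, and the use of node degree as the initial label are assumptions for illustration. It gives a single six-membered ring and two separate triangles the same hash, even though they are not isomorphic.

```python
import hashlib
from collections import Counter


def wl_hash_sketch(adjacency, iterations=3, digest_size=16):
    """Simplified Weisfeiler-Lehman-style hash of an undirected graph.

    adjacency: dict mapping each node to a list of its neighbours.
    (Hypothetical helper; an illustration only.)
    """
    # Start from the node degree as the only label.
    labels = {node: str(len(neigh)) for node, neigh in adjacency.items()}
    history = []
    for _ in range(iterations):
        new_labels = {}
        for node, neigh in adjacency.items():
            # Combine a node's label with the sorted labels of its neighbours.
            signature = labels[node] + "|" + ",".join(sorted(labels[n] for n in neigh))
            new_labels[node] = hashlib.blake2b(
                signature.encode(), digest_size=digest_size
            ).hexdigest()
        labels = new_labels
        # Record how often each label occurs; node identities are discarded.
        history.append(sorted(Counter(labels.values()).items()))
    return hashlib.blake2b(str(history).encode(), digest_size=digest_size).hexdigest()


hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

same_hash = wl_hash_sketch(hexagon) == wl_hash_sketch(two_triangles)
differs_from_path = wl_hash_sketch(hexagon) != wl_hash_sketch(path)
```

Every node in both the ring and the two triangles has degree 2 with degree-2 neighbours, so the refinement never separates them. Collisions of this kind are the sort of edge case the description above refers to.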