Bond Utilities

Utility functions for detecting and creating bonds in a structure.

atomworks.io.utils.bonds.generate_inter_level_bond_hash(atom_array: AtomArray, lower_level_id: str, lower_level_entity: str | None = None) -> str

Generates a hash string representing the inter-level bonds within an AtomArray. When computing entity IDs, we must consider inter-level bonds at the atom and residue levels to avoid ambiguity.

Parameters:
  • atom_array (AtomArray) – The array of atoms containing bond and annotation information.

  • lower_level_id (str) – The level at which to find and hash the inter-level bonds. For example, when computing molecule entities, we’d consider the inter-PN Unit bonds.

  • lower_level_entity (str | None) – An additional entity annotation to use when computing the hash. Optional; if None, only residue ID, residue name, and atom name are used.

Returns:

A hash string representing the inter-level bonds.

Return type:

str

atomworks.io.utils.bonds.get_coarse_graph_as_nodes_and_edges(atom_array: AtomArray, annotations: str | tuple[str]) -> tuple[ndarray, ndarray]

Returns the coarse-grained nodes and edges at the given annotation level based on the atom array’s bond connectivity.

Parameters:
  • atom_array (AtomArray) – The atom array containing atomic information and bonds.

  • annotations (str | tuple[str]) – A single annotation or a tuple of annotations to be used for node identification.

Returns:
  • nodes (np.ndarray): An array of unique nodes, each represented by a combination of annotations.

  • edges (np.ndarray): An array of edges, where each edge is a tuple of node indices representing a bond between two nodes.

Return type:

tuple[ndarray, ndarray]

Example

>>> atom_array = cached_parse("5ocm")["atom_array"]
>>> nodes, edges = get_coarse_graph_as_nodes_and_edges(atom_array, ["chain_id", "transformation_id"])
>>> print(nodes)
array([('A', '1'), ('F', '1'), ('G', '1'), ('H', '1'), ('I', '1'),
       ('W', '1'), ('X', '1'), ('Y', '1')],
      dtype=[('chain_id', '<U4'), ('transformation_id', '<U1')])
>>> print(edges)
array([[0, 0],
       [1, 1],
       [2, 2],
       [3, 3],
       [5, 5],
       [6, 6]])
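
To make the node and edge convention concrete, here is a minimal pure-Python sketch of coarse-graining. It is not the atomworks implementation: it assumes atoms are given as a plain list of annotation tuples and bonds as pairs of atom indices, and `coarse_grain` is a hypothetical helper name. As in the example above, bonds between atoms of the same node appear as self-edges such as (0, 0).

```python
def coarse_grain(atom_labels, atom_bonds):
    """Collapse atom-level bonds onto unique node labels (hypothetical helper)."""
    # Unique node labels in sorted order, plus a label -> node index lookup.
    nodes = sorted(set(atom_labels))
    index_of = {label: i for i, label in enumerate(nodes)}
    # Map each atom-level bond to a pair of node indices. Sorting each pair
    # makes (i, j) and (j, i) identical, and the set removes duplicates.
    edges = sorted({
        tuple(sorted((index_of[atom_labels[a]], index_of[atom_labels[b]])))
        for a, b in atom_bonds
    })
    return nodes, edges


# Five atoms annotated with (chain_id, transformation_id); three bonds.
atom_labels = [("A", "1"), ("A", "1"), ("B", "1"), ("B", "1"), ("C", "1")]
atom_bonds = [(0, 1), (1, 2), (2, 3)]
nodes, edges = coarse_grain(atom_labels, atom_bonds)
print(nodes)  # [('A', '1'), ('B', '1'), ('C', '1')]
print(edges)  # [(0, 0), (0, 1), (1, 1)]
```

Node ('C', '1') has no bonds, so it appears in the node list but in no edge.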

atomworks.io.utils.bonds.get_connected_nodes(nodes: ndarray, edges: ndarray) -> list[list[Any]]

Returns connected nodes as a mapped list given corresponding arrays of nodes and edges.

Example

>>> nodes = np.array([("A", "1"), ("B", "1"), ("C", "1"), ("D", "1")])
>>> edges = np.array([[0, 1], [0, 2], [1, 2]])
>>> connected_nodes = get_connected_nodes(nodes, edges)
>>> print(connected_nodes)
[[("A", "1"), ("B", "1"), ("C", "1")], [("D", "1")]]
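
Grouping nodes this way amounts to finding the connected components of the graph. The sketch below shows one common way to do that, union-find, using plain Python lists. It is an illustration only; how `get_connected_nodes` is implemented internally is not stated here, and `connected_components` is a hypothetical name.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components with union-find (illustrative)."""
    parent = list(range(len(nodes)))

    def find(i):
        # Walk up to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of the two endpoints of every edge.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Collect nodes under their root, preserving input order.
    groups = {}
    for i, node in enumerate(nodes):
        groups.setdefault(find(i), []).append(node)
    return list(groups.values())


nodes = [("A", "1"), ("B", "1"), ("C", "1"), ("D", "1")]
edges = [(0, 1), (0, 2), (1, 2)]
components = connected_components(nodes, edges)
print(components)
# [[('A', '1'), ('B', '1'), ('C', '1')], [('D', '1')]]
```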

atomworks.io.utils.bonds.get_inferred_polymer_bonds(atom_array: AtomArray) -> tuple[list[tuple[int, int, BondType]], ndarray]

Infers and returns polymer bonds between consecutive residues in an atom array based on chemical component types and chain types.

The function identifies bonds by looking at consecutive residues within the same chain and determining the appropriate bonding atoms based on either the chain type (as a fallback) or more detailed chemical component types. It also tracks leaving atoms that are displaced during bond formation. Leaving groups are inferred from the CCD entries for the chemical components. If a CCD code is missing from your local CCD mirror, leaving groups will not be inferred.

Parameters:

atom_array (AtomArray) – The atom array containing the structure information. Must include annotations for chain_id, res_id, res_name, and atom_name. Optionally includes a chain_type annotation.

Returns:
  • polymer_bonds (list[tuple[int, int, BondType]]): List of tuples containing (atom1_idx, atom2_idx, bond_type) for each inferred polymer bond.

  • leaving_atom_idxs (np.ndarray): Array of atom indices that represent leaving groups displaced during bond formation.

Return type:

tuple[list[tuple[int, int, BondType]], ndarray]

Example

>>> # Create an atom array with two consecutive peptide residues
>>> atom_array = AtomArray(length=10)
>>> atom_array.chain_id = ["A"] * 10
>>> atom_array.res_id = [1] * 5 + [2] * 5
>>> atom_array.res_name = ["ALA"] * 5 + ["GLY"] * 5
>>> atom_array.atom_name = ["N", "CA", "C", "OXT", "CB"] + ["N", "CA", "C", "O", "H2"]
>>> # Get the polymer bonds
>>> bonds, leaving = get_inferred_polymer_bonds(atom_array)
>>> print(bonds)  # Shows C-N peptide bond between residues
[(2, 5, <BondType.SINGLE>)]  # C of ALA to N of GLY
>>> print(leaving)  # Shows leaving OXT from C and H2 from N (other hydrogen atom names not shown for simplicity)
[array([3]), array([9])]
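
The core idea, linking consecutive residues within a chain, can be sketched in plain Python for the peptide case alone. This is a deliberately simplified illustration, not the atomworks logic: it ignores chain types and CCD lookups, treats residues as consecutive by order of appearance, and assumes that only an OXT atom counts as a leaving atom. `infer_peptide_bonds` is a hypothetical name.

```python
def infer_peptide_bonds(chain_id, res_id, atom_name):
    """Link C of each residue to N of the next residue in the same chain."""
    # Index atoms by (chain, residue, atom name).
    lookup = {key: i for i, key in enumerate(zip(chain_id, res_id, atom_name))}
    # Unique residues in order of first appearance.
    residues = list(dict.fromkeys(zip(chain_id, res_id)))
    bonds, leaving = [], []
    for (chain_a, res_a), (chain_b, res_b) in zip(residues, residues[1:]):
        if chain_a != chain_b:
            continue  # never bond across chains
        c_idx = lookup.get((chain_a, res_a, "C"))
        n_idx = lookup.get((chain_b, res_b, "N"))
        if c_idx is None or n_idx is None:
            continue  # a required backbone atom is missing
        bonds.append((c_idx, n_idx))
        # Assumption: only OXT on the C side is recorded as a leaving atom.
        oxt_idx = lookup.get((chain_a, res_a, "OXT"))
        if oxt_idx is not None:
            leaving.append(oxt_idx)
    return bonds, leaving


chain_id = ["A"] * 10
res_id = [1] * 5 + [2] * 5
atom_name = ["N", "CA", "C", "OXT", "CB", "N", "CA", "C", "O", "H2"]
bonds, leaving = infer_peptide_bonds(chain_id, res_id, atom_name)
print(bonds)    # [(2, 5)]
print(leaving)  # [3]
```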

atomworks.io.utils.bonds.get_struct_conn_bonds(atom_array: AtomArray, struct_conn_dict: dict[str, ndarray], add_bond_types: list[str] = ['covale'], raise_on_failure: bool = False) -> tuple[ndarray, ndarray]

Extracts bonds from the ‘struct_conn’ category of a CIF block so that they can be added to an atom array. By default, only covalent bonds (“covale”) are considered.

Parameters:
  • atom_array (AtomArray) – The atom array used to get atom indices.

  • struct_conn_dict (dict[str, np.ndarray]) –

    The struct_conn category of a CIF block as a dictionary. E.g. (only mandatory fields, as defined by the RCSB, are shown):

    ```
    {
        'conn_type_id': array(['disulf', ...]),
        'ptnr1_label_asym_id': array(['A', ...]),
        'ptnr1_label_comp_id': array(['CYS', ...]),
        'ptnr1_label_seq_id': array(['6', ...]),
        'ptnr1_label_atom_id': array(['SG', ...]),
        'ptnr1_symmetry': array(['1_555', ...]),
        'ptnr2_label_asym_id': array(['A', ...]),
        'ptnr2_label_comp_id': array(['CYS', ...]),
        'ptnr2_label_seq_id': array(['127', ...]),
        'ptnr2_label_atom_id': array(['SG', ...]),
        'ptnr2_symmetry': array(['1_555', ...]),
    }
    ```

    However, this function requires only the following fields:

    • conn_type_id (e.g., “covale”)

    • ptnr1_label_asym_id (chain_id or chain_iid, e.g., “A” or “A_1”)

    • ptnr1_label_comp_id (residue name in the CCD, e.g., “CYS”)

    • ptnr1_label_seq_id (residue ID, e.g., “6”)

    • ptnr1_label_atom_id (atom name, e.g., “SG”)

    • ptnr2_label_asym_id

    • ptnr2_label_comp_id

    • ptnr2_label_seq_id

    • ptnr2_label_atom_id

  • add_bond_types (list[str]) – A list of bond types that should be added. Valid bond types are: [“covale”, “disulf”, “metalc”, “hydrog”]. Defaults to [“covale”], which matches the structure-prediction use case, where covalent bonds (except for disulfides) are known a priori.

  • raise_on_failure (bool) – If True, raise an error if specified bonds cannot be made (e.g., if the atoms are missing). Defaults to False.

Note: While chain_iid annotations are allowed, both partners of a given bond are expected to use the same annotation type, i.e., both chain_id or both chain_iid.

Returns:
  • bonds (np.ndarray): An array of bonds to be added to the atom array, each row of the form [int, int, struc.BondType].

  • leaving (np.ndarray): An array of indices of atoms that are leaving groups, kept for bookkeeping.

Return type:

tuple[ndarray, ndarray]
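
The role of `add_bond_types` and the required fields can be illustrated with a small pure-Python sketch that only selects and pairs up the partner descriptions. It does not resolve atom indices or leaving groups, uses plain lists instead of NumPy arrays, and the helper name `select_bond_partners` and the second example row are made up for illustration.

```python
# The four per-partner fields required by the function described above.
FIELDS = ("label_asym_id", "label_comp_id", "label_seq_id", "label_atom_id")


def select_bond_partners(struct_conn, add_bond_types=("covale",)):
    """Return (partner_1, partner_2) descriptions for the requested bond types."""
    wanted = set(add_bond_types)
    pairs = []
    for row, kind in enumerate(struct_conn["conn_type_id"]):
        if kind not in wanted:
            continue  # e.g. skip 'disulf' rows when only 'covale' is requested
        partner_1 = tuple(struct_conn[f"ptnr1_{field}"][row] for field in FIELDS)
        partner_2 = tuple(struct_conn[f"ptnr2_{field}"][row] for field in FIELDS)
        pairs.append((partner_1, partner_2))
    return pairs


# Illustrative data: one disulfide row and one made-up covalent row.
struct_conn = {
    "conn_type_id": ["disulf", "covale"],
    "ptnr1_label_asym_id": ["A", "A"],
    "ptnr1_label_comp_id": ["CYS", "ASN"],
    "ptnr1_label_seq_id": ["6", "40"],
    "ptnr1_label_atom_id": ["SG", "ND2"],
    "ptnr2_label_asym_id": ["A", "B"],
    "ptnr2_label_comp_id": ["CYS", "NAG"],
    "ptnr2_label_seq_id": ["127", "1"],
    "ptnr2_label_atom_id": ["SG", "C1"],
}
covalent_pairs = select_bond_partners(struct_conn)
print(covalent_pairs)
# [(('A', 'ASN', '40', 'ND2'), ('B', 'NAG', '1', 'C1'))]
```

Passing `add_bond_types=("covale", "disulf")` to this sketch would return both rows.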

atomworks.io.utils.bonds.hash_atom_array(atom_array: AtomArray, annotations: tuple[str] = ('element', 'atom_name'), bond_order: bool = True, cast_aromatic_bonds_to_same_type: bool = False, use_md5: bool = False, md5_length: int | None = None) -> str

Computes a hash for an AtomArray based on the bond connectivity and the selected node annotations.

Parameters:
  • atom_array (AtomArray) – The array of atoms to hash

  • annotations (tuple[str]) – The node annotations to include in the hash

  • bond_order (bool) – Whether to include bond order in the hash

  • cast_aromatic_bonds_to_same_type (bool) – Whether to treat all aromatic bonds as the same type

  • use_md5 (bool) – Whether to use MD5 hashing on the output

  • md5_length (int | None) – If using MD5, the number of characters to keep from the hash. If None, returns full hash.

Returns:

The computed hash

Return type:

str
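
The effect of `use_md5` and `md5_length` can be pictured with the standard library alone. The sketch below is an assumption about the general pattern, not the atomworks code: it takes an arbitrary input string, applies MD5, and keeps the leading characters of the hex digest. `shorten_hash` is a hypothetical helper.

```python
import hashlib


def shorten_hash(raw_hash, md5_length=None):
    """MD5 a string and optionally keep only the first md5_length hex characters."""
    digest = hashlib.md5(raw_hash.encode("utf-8")).hexdigest()
    # Assumption: truncation keeps the leading characters of the digest.
    return digest if md5_length is None else digest[:md5_length]


full = shorten_hash("some-graph-hash")
short = shorten_hash("some-graph-hash", md5_length=8)
print(len(full), len(short))  # 32 8
```

Shorter digests are more compact but collide more easily, so the length is a trade-off between readability and uniqueness.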

atomworks.io.utils.bonds.hash_graph(graph: Graph, node_attr: str | None = None, edge_attr: str | None = None, iterations: int = 3, digest_size: int = 16) -> str

Computes a hash for a given graph using the Weisfeiler-Lehman (WL) graph hashing algorithm and additionally adds a node and edge attribute hash, if specified, to deal with common edge cases where WL fails (e.g. disconnected graphs).

Parameters:
  • graph (Graph) – The input graph to be hashed.

  • node_attr (str | None) – The node attribute to be used for hashing. If None, node attributes are ignored.

  • edge_attr (str | None) – The edge attribute to be used for hashing. If None, edge attributes are ignored.

  • iterations (int) – The number of iterations for the WL algorithm. Default is 3.

  • digest_size (int) – The size of the hash digest for WL. Default is 16.

Returns:

The computed hash of the graph.

Return type:

str

Example

>>> import networkx as nx
>>> G = nx.gnm_random_graph(10, 15)
>>> hash_graph(G)
'504894f49dd84b17c391b163af69624b'
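
To see why plain WL hashing can need extra help on disconnected graphs, the following self-contained sketch implements a minimal WL-style hash over an adjacency dict, without node or edge attributes. It is a teaching sketch written for this page, not the atomworks or networkx implementation; `wl_hash` and the three example graphs are illustrative. A hexagon and two separate triangles are both 2-regular on six nodes, so every refinement round gives all nodes the same label and the two hashes coincide.

```python
import hashlib
from collections import Counter


def wl_hash(adjacency, iterations=3, digest_size=16):
    """Minimal Weisfeiler-Lehman hash over an adjacency dict (no attributes)."""

    def digest(text):
        return hashlib.blake2b(text.encode("utf-8"), digest_size=digest_size).hexdigest()

    # Initial label of each node: its degree.
    labels = {node: str(len(neigh)) for node, neigh in adjacency.items()}
    history = []
    for _ in range(iterations):
        # New label: hash of the node's own label plus its sorted neighbor labels.
        labels = {
            node: digest(labels[node] + "".join(sorted(labels[n] for n in neigh)))
            for node, neigh in adjacency.items()
        }
        # Record the label histogram of this round, independent of node order.
        history.append(sorted(Counter(labels.values()).items()))
    return digest(str(history))


hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

print(wl_hash(hexagon) == wl_hash(two_triangles))  # True: WL cannot tell them apart
print(wl_hash(hexagon) == wl_hash(path))           # False: degrees differ
```

With `digest_size=16` the result is a 32-character hex string, the same length as the hash in the example above.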