Atom Array Transforms#

Transforms operating predominantly on Biotite’s AtomArray objects. These operations should take as input, and return, AtomArray objects.

atomworks.io.transforms.atom_array.add_atomic_number_annotation(atom_array: AtomArray | AtomArrayStack) AtomArray | AtomArrayStack[source]#

Adds the atomic number (atomic_number) annotation to the AtomArray.

atomworks.io.transforms.atom_array.add_chain_iid_annotation(atom_array_stack: AtomArrayStack) AtomArrayStack[source]#

Adds the chain instance ID (chain_iid) annotation to the AtomArrayStack

atomworks.io.transforms.atom_array.add_chain_type_annotation(atom_array: AtomArray | AtomArrayStack, chain_info_dict: dict) AtomArray | AtomArrayStack[source]#

Adds a chain_type annotation to the AtomArray.

Parameters:
  • atom_array (-) – The full atom array.

  • chain_info_dict (-) – A dictionary mapping chain IDs to chain information.

Returns:

The AtomArray with the chain_type annotation added as an integer.

Return type:

  • AtomArray | AtomArrayStack

atomworks.io.transforms.atom_array.add_charge_from_ccd_codes(atom_array: AtomArray | AtomArrayStack) AtomArray | AtomArrayStack[source]#

Adds charge annotations to an atom array based on the Chemical Component Dictionary (CCD) codes.

Retrieves charge information from the CCD for each residue and assigns it to matching atoms. If a residue or atom is not found in the CCD, a charge of 0 is assigned. If a charge annotation is already present, it is overwritten.

Parameters:

atom_array – The atom array to which charge annotations will be added. Can be either an AtomArray or AtomArrayStack.

Returns:

The input atom array with added charge annotations.

WARNING: This function will assume that each residue in the atom array is exactly as in the CCD.

Therefore the charges will be incorrect if it is in a different protonation state, or has been ionized or else in the original structure. Use this function with caution!

NOTE: If you want to add charges to canonical amino acids based on the pH, take a look at the

‘hydride.estimate_amino_acid_charges’ function instead.

Example

>>> atom_array = load_any("6lyz.cif", model=1)
>>> atom_array_with_charges = add_charge_from_ccd_codes(atom_array)
atomworks.io.transforms.atom_array.add_hydrogen_atom_positions(atom_array: AtomArray | AtomArrayStack, residue_level_annots_to_copy_to_hydrogens: list[str] = []) AtomArray | AtomArrayStack[source]#

Add hydrogens using biotite supported hydride library.

Removes any existing hydrogens first, then adds new hydrogens using the hydride library.

Parameters:
  • atom_array – The atom array to which hydrogens will be added.

  • residue_level_annots_to_copy_to_hydrogens (list[str]) – A list of residue-level annotations that will be copied over to the newly-added hydrogens for each residue.

Returns:

The updated atom array with hydrogens added, preserving the input type.

atomworks.io.transforms.atom_array.add_id_and_entity_annotations(atom_array: AtomArray) AtomArray[source]#

Adds all 6 (‘chain’, ‘pn_unit’, ‘molecule’) x (‘id’, ‘entity’) annotations to the AtomArray.

atomworks.io.transforms.atom_array.add_iid_annotations_to_assemblies(assemblies_dict: dict[str | int, AtomArray | AtomArrayStack]) dict[str | int, AtomArray | AtomArrayStack][source]#

Adds chain, PN unit, and molecule IIDs to assembly AtomArrayStacks.

atomworks.io.transforms.atom_array.add_molecule_id_annotation(atom_array: AtomArray | AtomArrayStack) AtomArray | AtomArrayStack[source]#

Adds the molecule ID (molecule_id) annotation to the AtomArray.

atomworks.io.transforms.atom_array.add_molecule_iid_annotation(atom_array_stack: AtomArrayStack) AtomArrayStack[source]#

Adds the molecule instance ID (molecule_iid) annotation to the AtomArrayStack

atomworks.io.transforms.atom_array.add_pn_unit_id_annotation(atom_array: AtomArray | AtomArrayStack) AtomArray | AtomArrayStack[source]#

Adds the polymer/non-polymer unit ID (pn_unit_id) annotation to the AtomArray. Two covalently bonded ligands are considered one PN unit, but a ligand bonded to a protein is considered two PN units. See the README glossary for more details on how we define chains, pn_units, and molecules within this codebase.

Parameters:

atom_array (AtomArray) – The AtomArray to process.

Returns:

The AtomArray including the pn_unit_id annotation.

Return type:

atom_array (AtomArray)

atomworks.io.transforms.atom_array.add_pn_unit_iid_annotation(atom_array: AtomArray | AtomArrayStack) AtomArray | AtomArrayStack[source]#

Adds the polymer/non-polymer unit instance ID (pn_unit_iid) annotation to the AtomArray or AtomArrayStack.

atomworks.io.transforms.atom_array.add_polymer_annotation(atom_array: AtomArray | AtomArrayStack, chain_info_dict: dict) AtomArray | AtomArrayStack[source]#

Adds an annotation to the atom array to indicate whether a chain is a polymer.

Parameters:
  • atom_array (AtomArray) – The atom array containing the chain information.

  • chain_info_dict (dict) – Dictionary containing the sequence details of each chain.

Returns:

The updated atom array with the polymer annotation added.

Return type:

AtomArray

atomworks.io.transforms.atom_array.annotate_entities(atom_array: AtomArray, level: str, lower_level_id: str | list[str], lower_level_entity: str, add_inter_level_bond_hash: bool = True) tuple[AtomArray, dict][source]#

Annotates entities in an AtomArray at a given id level, based on the connectivity and annotations at the lower level.

The intended use is, for example:
  • For the molecule level, molecule_entities are generated for each molecule_id based on the connectivty

    at the pn_unit level.

  • For the pn_unit level, pn_unit_entities are generated for each pn_unit_id based on the connectivty

    at the chain level.

  • For the chain level, chain_entities are generated for each chain_id based on the connectivty at the residue

    level.

Parameters:
  • atom_array (-) – The AtomArray to process.

  • level (-) – The level at which to annotate entities (e.g., “chain”, “pn_unit”, “entity”)

  • lower_level_id (-) – A list of annotations to consider for determining segment boundaries at a lower level. E.g. “pn_unit_id”, “chain_id” or “res_id”.

  • lower_level_entity (-) – The annotation to use for identifying entities at the lower level. E.g. “pn_unit_entity”, “chain_entity” or “res_name”.

  • add_inter_level_bond_hash (-) – Whether to add a hash of the inter-level bonds to the entity hash. For some cases, this may be necessary to distinguish entities (e.g., when determining molecule-level entities). In others (e.g., for polymers), this may be overkill.

Returns:

A tuple containing:
  • atom_array (AtomArray): The updated AtomArray with the entity annotation.

  • entities_info (dict): A dictionary mapping entity IDs to lists of instance IDs.

Return type:

  • Tuple[AtomArray, dict]

Example

>>> atom_array = AtomArray(...)
>>> entities_at_level, entities_info = annotate_entities(
...     atom_array, level="chain", lower_level_id="res_id", lower_level_entity="res_name"
... )
>>> print(entities_at_level)
[0, 0, 1, 1, 2, 2]
>>> print(entities_info)
{0: [0, 1], 1: [2, 3], 2: [4, 5]}
atomworks.io.transforms.atom_array.ensure_atom_array_stack(atom_array_or_stack: AtomArray | AtomArrayStack) AtomArrayStack[source]#

Ensures that the input is an AtomArrayStack. If it is an AtomArray, it is converted to a stack.

atomworks.io.transforms.atom_array.is_any_coord_nan(atom_array: AtomArray | AtomArrayStack) ndarray[source]#

Returns a boolean mask of shape [n_atoms] indicating whether any coordinate is NaN for each atom in the AtomArray or AtomArrayStack.

atomworks.io.transforms.atom_array.keep_last_residue(atom_array: AtomArray | AtomArrayStack) AtomArray | AtomArrayStack[source]#

Removes duplicate residues in the atom array, keeping only the last occurrence.

Parameters:

atom_array (AtomArray) – The atom array containing the chain information.

Returns:

The atom array with duplicate residues removed.

Return type:

AtomArray

atomworks.io.transforms.atom_array.maybe_fix_non_polymer_at_symmetry_center(atom_array_stack: AtomArrayStack, clash_distance: float = 1.0, clash_ratio: float = 0.5) AtomArrayStack[source]#

In some PDB entries, non-polymer molecules are placed at the symmetry center and clash with themselves when transformed via symmetry operations. We should remove the duplicates in these cases, keeping the identity copy.

We consider a non-polymer to be clashing with itself if at least clash_ratio of its atoms clash with the symmetric copy.

Examples: — PDB ID 7mub has a potassium ion at the symmetry center that when reflected with the symmetry operation clashes with itself. — PDB ID 1xan has a ligand at a symmetry center that similarly when refelcted clashes with itself.

Parameters:
  • atom_array (AtomArray) – The atom array to be patched.

  • clash_distance (float) – The distance threshold for two atoms to be considered clashing.

  • clash_ratio (float) – The percentage of atoms that must clash for the molecule to be considered clashing.

Returns:

The patched atom array.

Return type:

AtomArray

atomworks.io.transforms.atom_array.mse_to_met(atom_array: AtomArray) AtomArray[source]#

Convert MSE residues (selenomethionine) to MET (methionine).

atomworks.io.transforms.atom_array.remove_ccd_components(atom_array: AtomArray | AtomArrayStack, ccd_codes_to_remove: list[str]) AtomArray | AtomArrayStack[source]#

Remove atoms from the AtomArray or AtomArrayStack that have CCD codes in the ccd_codes_to_remove list.

Parameters:
  • atom_array (AtomArray) – The array of atoms.

  • ccd_codes_to_remove (list) – A list of CCD codes to be removed from the atom array.

Returns:

The filtered atom array.

Return type:

AtomArray

atomworks.io.transforms.atom_array.remove_hydrogens(atom_array: AtomArray | AtomArrayStack) AtomArray | AtomArrayStack[source]#

Removes hydrogens from the AtomArray or AtomArrayStack.

atomworks.io.transforms.atom_array.remove_nan_coords(atom_array: AtomArray | AtomArrayStack) AtomArray | AtomArrayStack[source]#

Returns a copy of the AtomArray or AtomArrayStack with rows where any coordinate is NaN removed.

atomworks.io.transforms.atom_array.remove_waters(atom_array: AtomArray | AtomArrayStack) AtomArray | AtomArrayStack[source]#

Removes waters from the AtomArray or AtomArrayStack.

atomworks.io.transforms.atom_array.replace_negative_res_ids_with_auth_seq_id(atom_array: AtomArray) AtomArray[source]#

Replaces res_id values of -1 with the corresponding auth_seq_id values.

When loading from the PDB, this step is generally not needed; however, some AF-3 predictions have negative res_ids without labeling chains as non-polymeric via the entity_id field.

Parameters:

atom_array (AtomArray) – The atom array to fix.

Returns:

The updated atom array with negative res_ids replaced by auth_seq_ids.

Return type:

AtomArray

atomworks.io.transforms.atom_array.resolve_arginine_naming_ambiguity(atom_array: AtomArray, raise_on_error: bool = True) AtomArray[source]#

Arginine naming ambiguities are fixed (ensuring NH1 is always closer to CD than NH2)

atomworks.io.transforms.atom_array.subset_atom_array(atom_array: AtomArray | AtomArrayStack, keep: ndarray) AtomArray | AtomArrayStack[source]#

Subsets an AtomArray or AtomArrayStack by a boolean mask.

atomworks.io.transforms.atom_array.update_nonpoly_seq_ids(atom_array: AtomArray, chain_info_dict: dict) AtomArray[source]#

Updates the sequence IDs of non-polymeric chains in the atom array to the author sequence IDs.

Parameters:
  • atom_array (AtomArray) – The atom array containing the chain information.

  • chain_info_dict (dict) – Dictionary containing the sequence details of each chain.

Returns:

The updated atom array with the sequence IDs updated for non-polymeric chains.

Return type:

AtomArray