parser

Entrypoint for parsing atomic-level structure files (e.g., PDB, CIF) into Biotite-compatible data structures.

atomworks.io.parser.parse(filename: PathLike | StringIO | BytesIO, *, file_type: Literal['cif', 'pdb'] | None = None, ccd_mirror_path: PathLike | None = None, cache_dir: PathLike | None = None, save_to_cache: bool = False, load_from_cache: bool = False, add_missing_atoms: bool = True, add_id_and_entity_annotations: bool = True, add_bond_types_from_struct_conn: list[str] = ['covale'], remove_ccds: list[str] | None = None, remove_waters: bool = True, fix_ligands_at_symmetry_centers: bool = True, fix_arginines: bool = True, fix_formal_charges: bool = True, fix_bond_types: bool = True, convert_mse_to_met: bool = False, hydrogen_policy: Literal['keep', 'remove', 'infer'] = 'keep', model: int | None = None, build_assembly: Literal['first', 'all'] | list[str] | tuple[str] | None = 'all', extra_fields: list[str] | Literal['all'] | None = None, keep_cif_block: bool = False) -> dict[str, Any]

Entrypoint for general parsing of atomic-level structure files.

Can either:
  • Directly load structure from file, using the specified keyword arguments;

  • Load the structure from a cached directory, re-building bioassemblies on-the-fly if necessary; or

  • Perform analogous cleaning/processing steps on an existing AtomArray or AtomArrayStack.

We categorize arguments into two groups:
  • Wrapper arguments: Arguments that are used within the wrapping parse method (e.g., caching)

  • CIF parsing arguments: Arguments that control structure parsing and are ultimately passed to the _parse_from_atom_array method (regardless of file type, we convert to an AtomArray before parsing)
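To make the two groups concrete, here is a minimal sketch of a call that keeps them in separate dictionaries. The dictionary names and the `parse_structure` helper are hypothetical; only the keyword names and default values are taken from the signature above. The import is deferred so the snippet can be read without atomworks installed.

```python
from pathlib import Path

# Wrapper arguments: handled by parse() itself (here, only the caching options).
WRAPPER_KWARGS = {
    "cache_dir": None,
    "save_to_cache": False,
    "load_from_cache": False,
}

# CIF parsing arguments: a small subset, set to their documented defaults.
CIF_PARSING_KWARGS = {
    "add_missing_atoms": True,
    "remove_waters": True,
    "hydrogen_policy": "keep",
    "build_assembly": "all",
}


def parse_structure(path):
    """Hypothetical helper: call parse() with both argument groups merged."""
    # Imported lazily; this line requires atomworks to be installed.
    from atomworks.io.parser import parse

    return parse(Path(path), **WRAPPER_KWARGS, **CIF_PARSING_KWARGS)
```

Keeping the groups apart is only a readability choice; `parse` receives a single flat set of keyword arguments either way.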

Parameters:
  • filename (PathLike | io.StringIO | io.BytesIO) – Either a Path or buffer to the file. This may be any format of atomic-level structure (e.g. .cif, .bcif, .cif.gz, .pdb), although .cif files are strongly recommended.


  • file_type (Literal["cif", "pdb"] | None, optional) – The file type of the structure file. If not provided, the file type will be inferred automatically.

  • load_from_cache (bool, optional) – Whether to load pre-compiled results from cache. Defaults to False.

  • cache_dir (PathLike, optional) – Directory path to save pre-compiled results. Defaults to None.

  • save_to_cache (bool, optional) – Whether to save the results to cache when building the structure. Defaults to False.


  • ccd_mirror_path (PathLike, optional) – Path to the local mirror of the Chemical Component Dictionary (recommended). If not provided, Biotite’s built-in CCD will be used.

  • add_missing_atoms (bool, optional) – Whether to add missing atoms to the structure (from entirely or partially unresolved residues). Defaults to True.

  • add_id_and_entity_annotations (bool, optional) – Whether to add identifier and entity annotations to the structure. Defaults to True.

  • add_bond_types_from_struct_conn (list, optional) – A list of bond types to add to the structure from the struct_conn category. Defaults to [“covale”]. This means that we will only add covalent bonds to the structure (excluding metal coordination and disulfide bonds).

  • remove_ccds (list, optional) – A list of CCD codes (e.g. ALA, HEM, …) to remove from the structure. Defaults to crystallization aids (the signature default is None). NOTE: Exclusion of polymer residues and common multi-chain ligands must be done with care to avoid sequence gaps.

  • remove_waters (bool, optional) – Whether to remove water molecules from the structure. Defaults to True.

  • fix_ligands_at_symmetry_centers (bool, optional) – Whether to patch non-polymer residues at symmetry centers that clash with themselves when transformed. Defaults to True.

  • fix_arginines (bool, optional) – Whether to fix arginine naming ambiguity, see the AF-3 supplement for details. Defaults to True.

  • fix_formal_charges (bool, optional) – Whether to fix formal charges on atoms involved in inter-residue bonds. Defaults to True.

  • fix_bond_types (bool, optional) – Whether to correct for nucleophilic additions on atoms involved in inter-residue bonds. Defaults to True.

  • convert_mse_to_met (bool, optional) – Whether to convert selenomethionine (MSE) residues to methionine (MET) residues. Defaults to False.

  • hydrogen_policy (Literal, optional) – How to handle hydrogens. Options: “keep” (leave hydrogens as they are), “remove” (remove all hydrogens), or “infer” (remove existing hydrogens and infer fresh ones using biotite-hydride). Defaults to “keep”.

  • model (int, optional) – The model number to parse for files with multiple models (e.g., NMR). Defaults to all models (None).

  • build_assembly (string, list, or tuple, optional) – Specifies which assembly to build, if any. Options are None (i.e., only the asymmetric unit), “first”, “all”, or a list or tuple of assembly IDs. Defaults to “all”.

  • extra_fields (list or “all”, optional) – A list of extra fields to include in the AtomArrayStack, or “all” to include all fields. Only supported for CIF files. Defaults to None.

  • keep_cif_block (bool, optional) – Whether to keep the CIF block in the result. Defaults to False.
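Several of these parameters accept only a fixed set of values. The sketch below checks a few of them against the options documented above before they are passed on. `check_parse_options` is a hypothetical helper, not part of atomworks, and `parse` may perform its own, different validation.

```python
HYDROGEN_POLICIES = ("keep", "remove", "infer")
ASSEMBLY_KEYWORDS = ("first", "all")


def check_parse_options(*, hydrogen_policy="keep", model=None, build_assembly="all"):
    """Hypothetical pre-flight check against the documented option values."""
    if hydrogen_policy not in HYDROGEN_POLICIES:
        raise ValueError(f"hydrogen_policy must be one of {HYDROGEN_POLICIES}")

    # bool is a subclass of int, so it is rejected explicitly.
    if model is not None and (isinstance(model, bool) or not isinstance(model, int)):
        raise TypeError("model must be an int or None")

    if isinstance(build_assembly, (list, tuple)):
        # A list or tuple is interpreted as explicit assembly IDs.
        if not all(isinstance(a, str) for a in build_assembly):
            raise TypeError("assembly IDs must be strings")
    elif build_assembly is not None and build_assembly not in ASSEMBLY_KEYWORDS:
        raise ValueError(
            f"build_assembly must be None, one of {ASSEMBLY_KEYWORDS}, "
            "or a list/tuple of assembly IDs"
        )

    return {
        "hydrogen_policy": hydrogen_policy,
        "model": model,
        "build_assembly": build_assembly,
    }
```

The returned dictionary can be unpacked into the call, for example `parse(path, **check_parse_options(hydrogen_policy="infer"))`.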

Returns:

A dictionary containing the following keys:
  • chain_info: A dictionary mapping chain ID to sequence, type (as an IntEnum), RCSB entity, EC number, and other information.

  • ligand_info: A dictionary containing ligand of interest information.

  • asym_unit: An AtomArrayStack instance representing the asymmetric unit.

  • assemblies: A dictionary mapping assembly IDs to AtomArrayStack instances.

  • metadata: A dictionary containing metadata about the structure (e.g., resolution, deposition date, etc.).

  • extra_info: A dictionary with information for cross-compatibility and caching. Should typically not be used directly.

Return type:

dict
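As a rough illustration of working with the returned dictionary, the sketch below relies only on the top-level keys listed under Returns. `summarize_result` is a hypothetical helper. It assumes that `chain_info` and `assemblies` are keyed by ID as described, and it does not inspect the nested values, whose exact layout is not specified here.

```python
EXPECTED_KEYS = (
    "chain_info",
    "ligand_info",
    "asym_unit",
    "assemblies",
    "metadata",
    "extra_info",
)


def summarize_result(result):
    """Hypothetical helper: list chain and assembly IDs found in a parse() result."""
    return {
        # Keys documented above that are absent from this particular result.
        "missing_keys": [key for key in EXPECTED_KEYS if key not in result],
        "chain_ids": sorted(result.get("chain_info", {})),
        "assembly_ids": sorted(result.get("assemblies", {})),
    }
```

Reporting missing keys rather than raising keeps the helper usable if some keys are absent under certain argument combinations, which is not stated in the text above.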