parser
Entrypoint for parsing atomic-level structure files (e.g., PDB, CIF) into Biotite-compatible data structures.
- atomworks.io.parser.parse(filename: PathLike | StringIO | BytesIO, *, file_type: Literal['cif', 'pdb'] | None = None, ccd_mirror_path: PathLike | None = None, cache_dir: PathLike | None = None, save_to_cache: bool = False, load_from_cache: bool = False, add_missing_atoms: bool = True, add_id_and_entity_annotations: bool = True, add_bond_types_from_struct_conn: list[str] = ['covale'], remove_ccds: list[str] | None = None, remove_waters: bool = True, fix_ligands_at_symmetry_centers: bool = True, fix_arginines: bool = True, fix_formal_charges: bool = True, fix_bond_types: bool = True, convert_mse_to_met: bool = False, hydrogen_policy: Literal['keep', 'remove', 'infer'] = 'keep', model: int | None = None, build_assembly: Literal['first', 'all'] | list[str] | tuple[str] | None = 'all', extra_fields: list[str] | Literal['all'] | None = None, keep_cif_block: bool = False) -> dict[str, Any]
Entrypoint for general parsing of atomic-level structure files.
- Can either:
Directly load structure from file, using the specified keyword arguments;
Load the structure from a cached directory, re-building bioassemblies on-the-fly if necessary; or
Perform analogous cleaning/processing steps on an existing AtomArray or AtomArrayStack.
- We categorize arguments into two groups:
Wrapper arguments: Arguments used within the wrapping parse method (e.g., caching).
CIF parsing arguments: Arguments that control structure parsing and are ultimately passed to the _parse_from_atom_array method (regardless of file type, we convert to an AtomArray before parsing).
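The two groups can be illustrated with a small sketch. It assumes the wrapper group consists of file_type plus the three caching parameters, which is an inference from the parameter list rather than something the library states; split_parse_kwargs is a hypothetical helper and is not part of atomworks.

```python
# Assumed wrapper-level parameter names (file type detection and caching).
WRAPPER_ARGS = {"file_type", "load_from_cache", "cache_dir", "save_to_cache"}


def split_parse_kwargs(kwargs):
    """Hypothetical helper: separate wrapper arguments from CIF parsing arguments."""
    wrapper = {k: v for k, v in kwargs.items() if k in WRAPPER_ARGS}
    parsing = {k: v for k, v in kwargs.items() if k not in WRAPPER_ARGS}
    return wrapper, parsing


wrapper, parsing = split_parse_kwargs(
    {
        "cache_dir": "cache/",  # illustrative path
        "save_to_cache": True,
        "remove_waters": True,
        "hydrogen_policy": "remove",
    }
)
print("wrapper:", wrapper)
print("parsing:", parsing)
```

Under that assumption, only the second dictionary would influence how the structure itself is cleaned and assembled.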
- Parameters:
filename (PathLike | io.StringIO | io.BytesIO) – Either a Path or buffer to the file. This may be any format of atomic-level structure (e.g. .cif, .bcif, .cif.gz, .pdb), although .cif files are strongly recommended.
Wrapper arguments:
file_type (Literal["cif", "pdb"] | None, optional) – The file type of the structure file. If not provided, the file type will be inferred automatically.
load_from_cache (bool, optional) – Whether to load pre-compiled results from cache. Defaults to False.
cache_dir (PathLike, optional) – Directory path to save pre-compiled results. Defaults to None.
save_to_cache (bool, optional) – Whether to save the results to cache when building the structure. Defaults to False.
CIF parsing arguments:
ccd_mirror_path (str, optional) – Path to the local mirror of the Chemical Component Dictionary (recommended). If not provided, Biotite’s built-in CCD will be used.
add_missing_atoms (bool, optional) – Whether to add missing atoms to the structure (from entirely or partially unresolved residues). Defaults to True.
add_id_and_entity_annotations (bool, optional) – Whether to add identifier and entity annotations to the structure. Defaults to True.
add_bond_types_from_struct_conn (list, optional) – A list of bond types to add to the structure from the struct_conn category. Defaults to [“covale”]. This means that we will only add covalent bonds to the structure (excluding metal coordination and disulfide bonds).
remove_ccds (list, optional) – A list of CCD codes (e.g. ALA, HEM, …) to remove from the structure. Defaults to crystallization aids. NOTE: Exclusion of polymer residues and common multi-chain ligands must be done with care to avoid sequence gaps.
remove_waters (bool, optional) – Whether to remove water molecules from the structure. Defaults to True.
fix_ligands_at_symmetry_centers (bool, optional) – Whether to patch non-polymer residues at symmetry centers that clash with themselves when transformed. Defaults to True.
fix_arginines (bool, optional) – Whether to fix arginine naming ambiguity, see the AF-3 supplement for details. Defaults to True.
fix_formal_charges (bool, optional) – Whether to fix formal charges on atoms involved in inter-residue bonds. Defaults to True.
fix_bond_types (bool, optional) – Whether to correct for nucleophilic additions on atoms involved in inter-residue bonds. Defaults to True.
convert_mse_to_met (bool, optional) – Whether to convert selenomethionine (MSE) residues to methionine (MET) residues. Defaults to False.
hydrogen_policy (Literal, optional) – Whether to keep, remove, or infer hydrogens. Options: “keep”, “remove”, “infer”. With “infer”, existing hydrogens are removed and fresh ones are inferred using biotite-hydride. Defaults to “keep”.
model (int, optional) – The model number to parse for files with multiple models (e.g., NMR). Defaults to all models (None).
build_assembly (string, list, or tuple, optional) – Specifies which assembly to build, if any. Options are None (i.e., only the asymmetric unit), “first”, “all”, or a list or tuple of assembly IDs. Defaults to “all”.
extra_fields (list, optional) – A list of extra fields to include in the AtomArrayStack, or “all” to include all fields. Only supported for CIF files. Defaults to None.
keep_cif_block (bool, optional) – Whether to keep the CIF block in the result. Defaults to False.
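As a rough illustration of how the keyword arguments above combine, the following sketch wraps a call to parse. The parameter names come from the signature on this page; the chosen values, the load_structure helper, and the file path are assumptions made for the example. atomworks is imported lazily so the snippet can be read and checked without the package installed.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any

# Keyword arguments taken from the documented signature; values are illustrative.
PARSE_KWARGS: dict[str, Any] = {
    "file_type": "cif",
    "remove_waters": True,
    "hydrogen_policy": "remove",
    "build_assembly": "first",
    "convert_mse_to_met": True,
}


def load_structure(path: str | Path) -> dict[str, Any]:
    """Hypothetical wrapper: parse a structure file with the options above."""
    # Lazy import: requires atomworks to be installed when the function is called.
    from atomworks.io.parser import parse

    return parse(Path(path), **PARSE_KWARGS)


# Usage (hypothetical file path):
# result = load_structure("example.cif")
# print(sorted(result))
```

Every argument after filename is keyword-only, so passing the options as a dictionary keeps the call explicit.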
- Returns:
- A dictionary containing the following keys:
chain_info: A dictionary mapping chain ID to sequence, type (as an IntEnum), RCSB entity, EC number, and other information.
ligand_info: A dictionary containing ligand of interest information.
asym_unit: An AtomArrayStack instance representing the asymmetric unit.
assemblies: A dictionary mapping assembly IDs to AtomArrayStack instances.
metadata: A dictionary containing metadata about the structure (e.g., resolution, deposition date, etc.).
extra_info: A dictionary with information for cross-compatibility and caching. Should typically not be used directly.
- Return type:
dict
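The returned dictionary can be inspected by its top-level keys. The sketch below relies only on the keys listed above and uses a stand-in dictionary in place of a real parse result; summarize_parse_result is a hypothetical helper, and the inner structure of each value is not assumed beyond it being a dictionary where the description says so.

```python
from __future__ import annotations

from typing import Any


def summarize_parse_result(result: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: collect a few top-level facts from a parse() result."""
    return {
        "chain_ids": sorted(result["chain_info"]),
        "assembly_ids": sorted(result["assemblies"]),
        "has_ligand_info": bool(result.get("ligand_info")),
        "metadata_keys": sorted(result.get("metadata", {})),
    }


# Stand-in with the documented top-level keys. In a real result, asym_unit and
# the values of assemblies would be AtomArrayStack instances.
mock_result = {
    "chain_info": {"A": {}, "B": {}},
    "ligand_info": {},
    "asym_unit": None,
    "assemblies": {"1": None},
    "metadata": {"resolution": 2.0},
    "extra_info": {},
}

summary = summarize_parse_result(mock_result)
print(summary)
```

The helper deliberately skips extra_info, since that key is meant for cross-compatibility and caching rather than direct use.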