Parser

The parser is the core entry point for converting structural and sequence files (mmCIF, PDB, FASTA, SMILES, etc.) into Biotite’s AtomArray representation. Its options cover annotation, filtering, assembly building, and caching.

Example Usage

from atomworks.io.parser import parse

# Parse a gzipped mmCIF file and print the per-chain information.
result = parse(filename="/databases/rcsb/cif/ne/3nez.cif.gz")
print(result["chain_info"])

Returned Dictionary

  • chain_info: Mapping of chain IDs to sequence, type, and metadata

  • ligand_info: Information about ligands in the structure

  • asym_unit: AtomArrayStack of the asymmetric unit

  • assemblies: Mapping of assembly IDs to AtomArrayStacks

  • metadata: Structure-level metadata

  • extra_info: Internal-use information for caching and compatibility
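As a rough sketch of how the returned dictionary might be consumed, the helper below checks that the documented top-level keys are present and lists the chain and assembly IDs. The name `summarize_result` and the placeholder dictionary are illustrative and not part of atomworks. The inner layout of each value is not specified here, so the helper only relies on the top-level keys and on `chain_info` and `assemblies` being mappings keyed by ID.

```python
from collections.abc import Mapping

# Top-level keys documented for the dictionary returned by parse().
DOCUMENTED_KEYS = (
    "chain_info",
    "ligand_info",
    "asym_unit",
    "assemblies",
    "metadata",
    "extra_info",
)


def summarize_result(result: Mapping) -> dict:
    """Hypothetical helper: report missing keys plus chain and assembly IDs."""
    missing = [key for key in DOCUMENTED_KEYS if key not in result]
    return {
        "missing_keys": missing,
        # chain_info and assemblies are documented as mappings keyed by ID.
        "chain_ids": sorted(result.get("chain_info", {})),
        "assembly_ids": sorted(result.get("assemblies", {})),
    }


# Placeholder standing in for a real parse() result; all values are made up.
example = {
    "chain_info": {"A": {}, "B": {}},
    "ligand_info": {},
    "asym_unit": None,
    "assemblies": {"1": None},
    "metadata": {},
    "extra_info": {},
}
summary = summarize_result(example)
print(summary["chain_ids"], summary["assembly_ids"])
```

With a real result, the same call would be `summarize_result(parse(filename=...))`.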

Parsing Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| filename | PathLike / io.StringIO / io.BytesIO | | Path to the structural file. Supports .cif, .cif.gz, .pdb, etc. |
| add_missing_atoms | bool | True | Add missing atoms to the structure. Useful for unresolved residues. Also adds intra- and inter-residue bonds. |
| add_id_and_entity_annotations | bool | True | Add ID and entity annotations at the chain, pn-unit, and molecule level to the AtomArray. |
| add_bond_types_from_struct_conn | list[str] | ["covale"] | List of bond types to add from struct_conn. The default adds only covalent bonds. |
| remove_ccds | list[str] or None | crystallization aids | CCD codes to remove from the structure. |
| remove_waters | bool | True | Remove water molecules from the structure. |
| fix_ligands_at_symmetry_centers | bool | True | Patch non-polymer residues at symmetry centers that clash with themselves. |
| fix_arginines | bool | True | Fix arginine naming ambiguity. |
| fix_formal_charges | bool | True | Fix formal charges on atoms involved in inter-residue bonds. |
| convert_mse_to_met | bool | False | Convert selenomethionine (MSE) to methionine (MET). |
| remove_hydrogens | bool or None | None | Remove hydrogens from the structure. Deprecated; use hydrogen_policy instead. |
| hydrogen_policy | "keep" / "remove" / "infer" | "keep" | Whether to keep, remove, or infer hydrogens. |
| model | int or None | None | Model number for NMR entries. |
| build_assembly | "first" / "all" / list[str] / tuple[str] / None | "all" | Which assembly to build: None, "first", "all", or specific IDs. |
| extra_fields | list[str] / "all" / None | None | Extra fields to include in the AtomArrayStack. |
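Because remove_hydrogens is deprecated in favor of hydrogen_policy, older call sites may need a small translation step. The sketch below is one way to do that. `resolve_hydrogen_policy` is a hypothetical helper, not an atomworks function, and the mapping of `True` to "remove" and `False` to "keep" is an assumption rather than documented behavior. The keyword names in `parse_kwargs` are taken from the parsing arguments listed above.

```python
import warnings


def resolve_hydrogen_policy(remove_hydrogens=None, hydrogen_policy="keep"):
    """Hypothetical helper mapping the deprecated flag onto hydrogen_policy."""
    if hydrogen_policy not in ("keep", "remove", "infer"):
        raise ValueError(f"unknown hydrogen_policy: {hydrogen_policy!r}")
    if remove_hydrogens is None:
        # The deprecated flag was not used; pass the policy through unchanged.
        return hydrogen_policy
    warnings.warn(
        "remove_hydrogens is deprecated; use hydrogen_policy instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # Assumed mapping: True -> "remove", False -> "keep".
    return "remove" if remove_hydrogens else "keep"


# Keyword arguments for parse(), using only names from the documented arguments.
parse_kwargs = {
    "filename": "/databases/rcsb/cif/ne/3nez.cif.gz",
    "hydrogen_policy": resolve_hydrogen_policy(hydrogen_policy="remove"),
    "build_assembly": "first",
    "remove_waters": True,
}

# With atomworks installed, the call would then be:
# from atomworks.io.parser import parse
# result = parse(**parse_kwargs)
```

Collecting the options in a dictionary keeps non-default choices visible in one place and makes them easy to reuse across many files.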

Caching Arguments

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| load_from_cache | bool | False | Load pre-compiled results from cache. |
| save_to_cache | bool | False | Save parsed results to cache for faster future retrieval. |
| cache_dir | PathLike / None | None | Directory for cached results. Required if caching is enabled. |
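Since cache_dir is required whenever caching is enabled, it can help to check the combination before calling the parser. The following is a minimal sketch under that reading. `build_cache_kwargs` and the directory name `parse_cache` are hypothetical, and whether `parse` performs an equivalent check itself is not stated here.

```python
from pathlib import Path


def build_cache_kwargs(cache_dir=None, load_from_cache=False, save_to_cache=False):
    """Hypothetical helper: assemble caching kwargs and check cache_dir."""
    if (load_from_cache or save_to_cache) and cache_dir is None:
        raise ValueError("cache_dir is required when caching is enabled")
    return {
        "load_from_cache": load_from_cache,
        "save_to_cache": save_to_cache,
        "cache_dir": None if cache_dir is None else Path(cache_dir),
    }


cache_kwargs = build_cache_kwargs(
    cache_dir="parse_cache",  # hypothetical directory name
    load_from_cache=True,
    save_to_cache=True,
)

# With atomworks installed, these could be passed along with the file:
# from atomworks.io.parser import parse
# result = parse(filename="/databases/rcsb/cif/ne/3nez.cif.gz", **cache_kwargs)
```

Enabling both flags, as in this sketch, is intended to reuse a cached result when one exists and save a new one otherwise; the exact behavior when both are set should be confirmed against the library.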