Pipelines
This module contains pipeline implementations for different molecular structure prediction tasks.
AF3 Pipeline
- atomworks.ml.pipelines.af3.build_af3_transform_pipeline(*, is_inference: bool, protein_msa_dirs: list[dict], rna_msa_dirs: list[dict], n_recycles: int = 5, crop_size: int = 384, crop_center_cutoff_distance: float = 15.0, crop_contiguous_probability: float = 0.5, crop_spatial_probability: float = 0.5, max_atoms_in_crop: int | None = None, undesired_res_names: list[str] = ['144', '15P', '1PE', '2F2', '2JC', '3HR', '3SY', '7N5', '7PE', '9JE', 'AAE', 'ABA', 'ACE', 'ACN', 'ACT', 'ACY', 'AZI', 'BAM', 'BCN', 'BCT', 'BDN', 'BEN', 'BME', 'BO3', 'BTB', 'BTC', 'BU1', 'C8E', 'CAD', 'CAQ', 'CBM', 'CCN', 'CIT', 'CL', 'CLR', 'CM', 'CMO', 'CO3', 'CPT', 'CXS', 'D10', 'DEP', 'DIO', 'DMS', 'DN', 'DOD', 'DOX', 'EDO', 'EEE', 'EGL', 'EOH', 'EOX', 'EPE', 'ETF', 'FCY', 'FJO', 'FLC', 'FMT', 'FW5', 'GOL', 'GSH', 'GTT', 'GYF', 'HED', 'IHP', 'IHS', 'IMD', 'IOD', 'IPA', 'IPH', 'LDA', 'MB3', 'MEG', 'MES', 'MLA', 'MLI', 'MOH', 'MPD', 'MRD', 'MSE', 'MYR', 'N', 'NA', 'NH2', 'NH4', 'NHE', 'NO3', 'O4B', 'OHE', 'OLA', 'OLC', 'OMB', 'OME', 'OXA', 'P6G', 'PE3', 'PE4', 'PEG', 'PEO', 'PEP', 'PG0', 'PG4', 'PGE', 'PGR', 'PLM', 'PO4', 'POL', 'POP', 'PVO', 'SAR', 'SCN', 'SEO', 'SIN', 'SO4', 'SPD', 'SPM', 'SR', 'STE', 'STO', 'STU', 'TAR', 'TBU', 'TME', 'TRS', 'UNK', 'UNL', 'UNX', 'UPL', 'URE'], conformer_generation_timeout: float = 5.0, use_element_for_atom_names_of_atomized_tokens: bool = False, max_n_template: int = 20, n_template: int = 4, template_max_seq_similarity: float = 60.0, template_min_seq_similarity: float = 10.0, template_min_length: int = 10, template_allowed_chain_types: list[ChainType] = [ChainType.POLYPEPTIDE_L, ChainType.RNA], template_distogram_bins: Tensor = tensor([3.2500, 4.5338, 5.8176, 7.1014, 8.3851, 9.6689, 10.9527, 12.2365, 13.5203, 14.8041, 16.0878, 17.3716, 18.6554, 19.9392, 21.2230, 22.5068, 23.7905, 25.0743, 26.3581, 27.6419, 28.9257, 30.2095, 31.4932, 32.7770, 34.0608, 35.3446, 36.6284, 37.9122, 39.1959, 40.4797, 41.7635, 43.0473, 44.3311, 45.6149, 46.8986, 48.1824, 49.4662, 
50.7500]), template_default_token: str = '<G>', template_lookup_path: PathLike | None = None, template_base_dir: PathLike | None = None, max_msa_sequences: int = 10000, n_msa: int = 10000, dense_msa: bool = True, msa_cache_dir: PathLike | str | None = None, sigma_data: float = 16.0, diffusion_batch_size: int = 48, run_confidence_head: bool = False, return_atom_array: bool = True, pad_dna_p_skip: float = 0.0, b_factor_min: float | None = None, b_factor_max: float | None = None) -> Transform
Build the AF3 pipeline with specified parameters.
This function constructs a pipeline of transforms for processing protein structures in a manner similar to AlphaFold 3. The pipeline includes steps for removing hydrogens, adding annotations, atomizing residues, cropping, adding templates, encoding features, and generating reference molecule features.
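The default `template_distogram_bins` printed in the signature appear to be evenly spaced. As a quick sanity check, and not a statement about how the library constructs them, the pure-Python sketch below regenerates 38 values between 3.25 and 50.75 and compares a few against the printed defaults. The helper name `evenly_spaced_bins` is hypothetical.

```python
def evenly_spaced_bins(start: float, stop: float, n_bins: int) -> list[float]:
    """Return n_bins evenly spaced values from start to stop (inclusive)."""
    step = (stop - start) / (n_bins - 1)
    return [start + i * step for i in range(n_bins)]


# Assumption: 38 values spanning 3.25 to 50.75, as read off the signature.
bins = evenly_spaced_bins(3.25, 50.75, 38)

# Spot-check against values printed in the documented default.
assert round(bins[1], 4) == 4.5338
assert round(bins[4], 4) == 8.3851
assert round(bins[-2], 4) == 49.4662
```

If a different bin layout is needed, a tensor with other edges can be passed as `template_distogram_bins`.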
- Parameters:
crop_size (int, optional) – The size of the crop. Defaults to 384.
crop_center_cutoff_distance (float, optional) – The cutoff distance for spatial cropping. Defaults to 15.0.
crop_contiguous_probability (float, optional) – The probability of using contiguous cropping. Defaults to 0.5.
crop_spatial_probability (float, optional) – The probability of using spatial cropping. Defaults to 0.5.
conformer_generation_timeout (float, optional) – The timeout for conformer generation in seconds. Defaults to 5.0.
- Returns:
A composed pipeline of transforms.
- Return type:
Transform
- Raises:
AssertionError – If the crop probabilities do not sum to 1.0, if the crop size is not positive, or if the crop center cutoff distance is not positive.
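The conditions listed under Raises can be mirrored in a small standalone check, which may be handy for validating a configuration before building the pipeline. This is a sketch of the documented behavior, not the library's code; the function name `check_crop_args` and the tolerance `1e-6` are hypothetical.

```python
import math


def check_crop_args(
    crop_size: int = 384,
    crop_center_cutoff_distance: float = 15.0,
    crop_contiguous_probability: float = 0.5,
    crop_spatial_probability: float = 0.5,
) -> None:
    """Raise AssertionError for the argument combinations the docs describe."""
    total = crop_contiguous_probability + crop_spatial_probability
    # Tolerance is an assumption; the docs only say "sum to 1.0".
    assert math.isclose(total, 1.0, abs_tol=1e-6), "Crop probabilities must sum to 1.0"
    assert crop_size > 0, "Crop size must be positive"
    assert crop_center_cutoff_distance > 0, "Crop center cutoff distance must be positive"


check_crop_args()  # the documented defaults pass

try:
    # 0.7 + 0.5 != 1.0, so this combination is rejected.
    check_crop_args(crop_contiguous_probability=0.7, crop_spatial_probability=0.5)
except AssertionError as err:
    print(f"rejected: {err}")
```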
Note
The cropping method is chosen randomly based on the provided probabilities. The pipeline includes steps for processing the structure, adding annotations, and generating features required for AF3-like predictions.
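One way to picture the random choice described in the note is a weighted draw between the two cropping strategies. The sketch below uses the standard library's `random.Random.choices`; the function name `pick_crop_method` and the string labels are hypothetical, and the actual pipeline may implement the selection differently.

```python
from __future__ import annotations

import random


def pick_crop_method(
    contiguous_probability: float = 0.5,
    spatial_probability: float = 0.5,
    rng: random.Random | None = None,
) -> str:
    """Draw 'contiguous' or 'spatial' with the given probabilities."""
    if rng is None:
        rng = random.Random()
    return rng.choices(
        ["contiguous", "spatial"],
        weights=[contiguous_probability, spatial_probability],
        k=1,
    )[0]


rng = random.Random(0)  # seeded so repeated runs give the same draws
draws = [pick_crop_method(rng=rng) for _ in range(1000)]
# Fraction of draws that selected contiguous cropping.
print(draws.count("contiguous") / len(draws))
```

Setting one probability to 1.0 and the other to 0.0 makes the choice deterministic, which can be useful when debugging a single cropping strategy.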
References
AlphaFold 3 Supplementary Information. https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-024-07487-w/MediaObjects/41586_2024_7487_MOESM1_ESM.pdf