Pipelines

This module contains pipeline implementations for different molecular structure prediction tasks.

AF3 Pipeline

atomworks.ml.pipelines.af3.build_af3_transform_pipeline(*, is_inference: bool, protein_msa_dirs: list[dict], rna_msa_dirs: list[dict], n_recycles: int = 5, crop_size: int = 384, crop_center_cutoff_distance: float = 15.0, crop_contiguous_probability: float = 0.5, crop_spatial_probability: float = 0.5, max_atoms_in_crop: int | None = None, undesired_res_names: list[str] = ['144', '15P', '1PE', '2F2', '2JC', '3HR', '3SY', '7N5', '7PE', '9JE', 'AAE', 'ABA', 'ACE', 'ACN', 'ACT', 'ACY', 'AZI', 'BAM', 'BCN', 'BCT', 'BDN', 'BEN', 'BME', 'BO3', 'BTB', 'BTC', 'BU1', 'C8E', 'CAD', 'CAQ', 'CBM', 'CCN', 'CIT', 'CL', 'CLR', 'CM', 'CMO', 'CO3', 'CPT', 'CXS', 'D10', 'DEP', 'DIO', 'DMS', 'DN', 'DOD', 'DOX', 'EDO', 'EEE', 'EGL', 'EOH', 'EOX', 'EPE', 'ETF', 'FCY', 'FJO', 'FLC', 'FMT', 'FW5', 'GOL', 'GSH', 'GTT', 'GYF', 'HED', 'IHP', 'IHS', 'IMD', 'IOD', 'IPA', 'IPH', 'LDA', 'MB3', 'MEG', 'MES', 'MLA', 'MLI', 'MOH', 'MPD', 'MRD', 'MSE', 'MYR', 'N', 'NA', 'NH2', 'NH4', 'NHE', 'NO3', 'O4B', 'OHE', 'OLA', 'OLC', 'OMB', 'OME', 'OXA', 'P6G', 'PE3', 'PE4', 'PEG', 'PEO', 'PEP', 'PG0', 'PG4', 'PGE', 'PGR', 'PLM', 'PO4', 'POL', 'POP', 'PVO', 'SAR', 'SCN', 'SEO', 'SIN', 'SO4', 'SPD', 'SPM', 'SR', 'STE', 'STO', 'STU', 'TAR', 'TBU', 'TME', 'TRS', 'UNK', 'UNL', 'UNX', 'UPL', 'URE'], conformer_generation_timeout: float = 5.0, use_element_for_atom_names_of_atomized_tokens: bool = False, max_n_template: int = 20, n_template: int = 4, template_max_seq_similarity: float = 60.0, template_min_seq_similarity: float = 10.0, template_min_length: int = 10, template_allowed_chain_types: list[ChainType] = [ChainType.POLYPEPTIDE_L, ChainType.RNA], template_distogram_bins: Tensor = tensor([3.2500, 4.5338, 5.8176, 7.1014, 8.3851, 9.6689, 10.9527, 12.2365, 13.5203, 14.8041, 16.0878, 17.3716, 18.6554, 19.9392, 21.2230, 22.5068, 23.7905, 25.0743, 26.3581, 27.6419, 28.9257, 30.2095, 31.4932, 32.7770, 34.0608, 35.3446, 36.6284, 37.9122, 39.1959, 40.4797, 41.7635, 43.0473, 44.3311, 45.6149, 46.8986, 48.1824, 49.4662, 50.7500]), template_default_token: str = '<G>', template_lookup_path: PathLike | None = None, template_base_dir: PathLike | None = None, max_msa_sequences: int = 10000, n_msa: int = 10000, dense_msa: bool = True, msa_cache_dir: PathLike | str | None = None, sigma_data: float = 16.0, diffusion_batch_size: int = 48, run_confidence_head: bool = False, return_atom_array: bool = True, pad_dna_p_skip: float = 0.0, b_factor_min: float | None = None, b_factor_max: float | None = None) → Transform

Build the AF3 pipeline with specified parameters.

This function constructs a pipeline of transforms for processing protein structures in a manner similar to AlphaFold 3. The pipeline includes steps for removing hydrogens, adding annotations, atomizing residues, cropping, adding templates, encoding features, and generating reference molecule features.
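As a rough illustration of how the builder might be called, the sketch below assembles keyword arguments that mirror the signature and wraps the call in a helper. The helper name `build_training_pipeline` and the empty MSA directory lists are assumptions made for illustration; the schema of the dictionaries in `protein_msa_dirs` and `rna_msa_dirs` is not described in this section, so real usage will need to supply them.

```python
# Keyword arguments mirroring the documented signature; values are illustrative.
af3_kwargs = dict(
    is_inference=False,
    protein_msa_dirs=[],  # assumption: placeholder; the dict schema is not documented here
    rna_msa_dirs=[],      # assumption: placeholder; the dict schema is not documented here
    crop_size=384,
    crop_center_cutoff_distance=15.0,
    crop_contiguous_probability=0.5,
    crop_spatial_probability=0.5,
    conformer_generation_timeout=5.0,
)


def build_training_pipeline():
    """Hypothetical helper: import lazily so this sketch loads without atomworks."""
    from atomworks.ml.pipelines.af3 import build_af3_transform_pipeline

    # All parameters are keyword-only (note the leading `*` in the signature).
    return build_af3_transform_pipeline(**af3_kwargs)
```

Calling `build_training_pipeline()` requires atomworks to be installed and should return the composed `Transform`; whether empty MSA directory lists are accepted is not confirmed by this page.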

Parameters:
  • crop_size (int, optional) – The size of the crop. Defaults to 384.

  • crop_center_cutoff_distance (float, optional) – The cutoff distance for spatial cropping. Defaults to 15.0.

  • crop_contiguous_probability (float, optional) – The probability of using contiguous cropping. Defaults to 0.5.

  • crop_spatial_probability (float, optional) – The probability of using spatial cropping. Defaults to 0.5.

  • conformer_generation_timeout (float, optional) – The timeout for conformer generation in seconds. Defaults to 5.0.
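The default `template_distogram_bins` tensor in the signature is not described in the parameter list. Its printed values are consistent with 38 evenly spaced points between 3.25 and 50.75, which the following plain-Python sketch reconstructs. This is an observation about the printed default, not a statement of how the library builds it.

```python
# Reconstruct the printed default bin values as an evenly spaced grid.
start, stop, n_bins = 3.25, 50.75, 38
step = (stop - start) / (n_bins - 1)
bins = [start + i * step for i in range(n_bins)]

# Round to four decimals to compare against the values shown in the signature.
rounded_bins = [round(b, 4) for b in bins]
```

In PyTorch, the same grid would be `torch.linspace(3.25, 50.75, 38)`.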

Returns:

A composed pipeline of transforms.

Return type:

Transform

Raises:
  • AssertionError – If the crop probabilities do not sum to 1.0, if the crop size is not positive, or if the crop center cutoff distance is not positive.
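The three conditions listed above can be sketched as a standalone check. The function name `check_crop_args` is hypothetical, and the floating-point tolerance used for the probability sum is an assumption; the library's actual checks may be written differently.

```python
import math


def check_crop_args(
    crop_size: int,
    crop_center_cutoff_distance: float,
    crop_contiguous_probability: float,
    crop_spatial_probability: float,
) -> None:
    """Hypothetical re-creation of the documented preconditions."""
    total = crop_contiguous_probability + crop_spatial_probability
    # Assumption: compare with a tolerance rather than exact equality.
    if not math.isclose(total, 1.0):
        raise AssertionError(f"Crop probabilities must sum to 1.0, got {total}")
    if crop_size <= 0:
        raise AssertionError(f"crop_size must be positive, got {crop_size}")
    if crop_center_cutoff_distance <= 0:
        raise AssertionError(
            f"crop_center_cutoff_distance must be positive, got {crop_center_cutoff_distance}"
        )


# The documented defaults pass all three checks.
check_crop_args(384, 15.0, 0.5, 0.5)
```

Running a check like this before building the pipeline can surface a bad configuration early, for example when the two probabilities are overridden independently in a config file.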

Note

The cropping method (contiguous or spatial) is chosen at random, weighted by crop_contiguous_probability and crop_spatial_probability. The pipeline includes steps for processing the structure, adding annotations, and generating features required for AF3-like predictions.
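A minimal sketch of probability-weighted selection between the two cropping methods is shown below. The function name `choose_crop_method` and the string labels are hypothetical, and the standard-library `random` module stands in for whatever random source the pipeline actually uses.

```python
import random


def choose_crop_method(
    crop_contiguous_probability: float,
    crop_spatial_probability: float,
    rng: random.Random,
) -> str:
    """Pick a crop method label, weighted by the two probabilities (illustrative)."""
    return rng.choices(
        ["contiguous", "spatial"],
        weights=[crop_contiguous_probability, crop_spatial_probability],
        k=1,
    )[0]


# With the default 0.5 / 0.5 split, both methods are drawn about equally often.
rng = random.Random(0)
counts = {"contiguous": 0, "spatial": 0}
for _ in range(1000):
    counts[choose_crop_method(0.5, 0.5, rng)] += 1
```

Setting one probability to 1.0 and the other to 0.0 makes the choice deterministic, which can help when debugging a single cropping strategy.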

RF2AA Pipeline