FASTA Tools

Convenience utilities for working with (generalized) FASTA files, in which non-standard residues may be written as parenthesized CCD IDs.

atomworks.io.tools.fasta.one_letter_to_ccd_code(seq: list[str], chain_type: ChainType, ccd_mirror_path: PathLike = None, check_ccd_codes: bool = True) → list[str]

Convert a sequence of one-letter codes or parenthesized full CCD IDs to full CCD IDs.

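This function takes a list of one-letter residue codes and/or parenthesized CCD IDs and converts each entry to its full CCD (Chemical Component Dictionary) ID. One-letter codes are interpreted according to the chain type (e.g., amino acids for polypeptides, nucleotides for DNA or RNA), while parenthesized entries are used to specify non-standard chemical components directly.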

Parameters:
  • seq (list[str]) – A list of one-letter codes or parenthesized CCD IDs.

  • chain_type (ChainType) – The type of chain (e.g., POLYPEPTIDE_L, DNA, RNA) to determine the correct conversion for one-letter codes.

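  • ccd_mirror_path (PathLike, optional) – Path to the CCD mirror used to look up CCD IDs when check_ccd_codes is True. Defaults to None.

  • check_ccd_codes (bool) – If True, verify that each CCD ID is present in the CCD mirror. Defaults to True.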

Returns:

A list of full CCD IDs corresponding to the input sequence.

Return type:

  • list[str]

Raises:

  • ValueError – If a non-standard chemical component ID is not found in the processed CCD.

Example

>>> seq = ["A", "C", "(SEP)", "G", "H"]
>>> chain_type = ChainType.POLYPEPTIDE_L
>>> one_letter_to_ccd_code(seq, chain_type)
['ALA', 'CYS', 'SEP', 'GLY', 'HIS']
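To illustrate the conversion rule, here is a minimal, self-contained sketch. It is not the atomworks implementation: the helper name one_letter_to_ccd_code_sketch, the plain-string chain kinds ("protein", "dna", "rna") standing in for ChainType, the lookup tables, and the known_ccd_codes set standing in for the CCD mirror are all hypothetical choices made for illustration.

```python
from __future__ import annotations

# Hypothetical lookup tables mapping one-letter codes to standard CCD IDs.
# These stand in for whatever tables atomworks uses internally.
_ONE_LETTER_TO_CCD: dict[str, dict[str, str]] = {
    "protein": {
        "A": "ALA", "C": "CYS", "D": "ASP", "E": "GLU", "F": "PHE",
        "G": "GLY", "H": "HIS", "I": "ILE", "K": "LYS", "L": "LEU",
        "M": "MET", "N": "ASN", "P": "PRO", "Q": "GLN", "R": "ARG",
        "S": "SER", "T": "THR", "V": "VAL", "W": "TRP", "Y": "TYR",
    },
    "dna": {"A": "DA", "C": "DC", "G": "DG", "T": "DT"},
    "rna": {"A": "A", "C": "C", "G": "G", "U": "U"},
}


def one_letter_to_ccd_code_sketch(
    seq: list[str],
    chain_kind: str,
    known_ccd_codes: set[str] | None = None,
) -> list[str]:
    """Hypothetical helper: map one-letter codes and '(CCD)' tokens to CCD IDs."""
    table = _ONE_LETTER_TO_CCD.get(chain_kind)
    if table is None:
        raise ValueError(f"Unsupported chain kind {chain_kind!r}")

    ccd_codes: list[str] = []
    for token in seq:
        if token.startswith("(") and token.endswith(")"):
            # Parenthesized token: the CCD ID is given explicitly.
            code = token[1:-1]
            # Optional check against a set of known IDs (stand-in for the CCD mirror).
            if known_ccd_codes is not None and code not in known_ccd_codes:
                raise ValueError(f"CCD ID {code!r} not found in the known CCD IDs")
            ccd_codes.append(code)
        elif token in table:
            # One-letter code: interpret it according to the chain kind.
            ccd_codes.append(table[token])
        else:
            raise ValueError(f"Unknown one-letter code {token!r} for {chain_kind!r}")
    return ccd_codes


print(one_letter_to_ccd_code_sketch(["A", "C", "(SEP)", "G", "H"], "protein"))
# ['ALA', 'CYS', 'SEP', 'GLY', 'HIS']
```

The chain kind matters because the same letter maps to different components: "A" becomes ALA in a polypeptide, DA in DNA, and A in RNA.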

atomworks.io.tools.fasta.split_generalized_fasta_sequence(sequence: str) → list[str]

Split a sequence into individual characters, keeping parenthesized groups intact as single entries.

Parameters:

  • sequence (str) – The input sequence to be split.

Returns:

A list of individual characters and/or parenthesized groups (with the parentheses retained).

Return type:

  • list[str]

Example

>>> split_generalized_fasta_sequence("ABC(DEF)GH(IJ)K")
['A', 'B', 'C', '(DEF)', 'G', 'H', '(IJ)', 'K']
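The splitting rule can be sketched as a simple left-to-right scan. This is a hypothetical standalone helper (split_generalized_fasta_sequence_sketch), not the atomworks implementation; it assumes parenthesized groups are never nested and treats unbalanced parentheses as an error, neither of which is specified in the documentation above.

```python
def split_generalized_fasta_sequence_sketch(sequence: str) -> list[str]:
    """Hypothetical helper: split into characters, keeping '(...)' groups whole."""
    tokens: list[str] = []
    i = 0
    while i < len(sequence):
        char = sequence[i]
        if char == "(":
            # Find the closing ')' for this group; nested groups are assumed not to occur.
            end = sequence.find(")", i + 1)
            if end == -1 or "(" in sequence[i + 1 : end]:
                raise ValueError(f"Unbalanced '(' at position {i}")
            tokens.append(sequence[i : end + 1])  # keep the parentheses in the token
            i = end + 1
        elif char == ")":
            raise ValueError(f"Unmatched ')' at position {i}")
        else:
            # Any other character becomes its own token.
            tokens.append(char)
            i += 1
    return tokens


print(split_generalized_fasta_sequence_sketch("ABC(DEF)GH(IJ)K"))
# ['A', 'B', 'C', '(DEF)', 'G', 'H', '(IJ)', 'K']
```

A regular expression such as `re.findall(r"\([^()]*\)|[^()]", sequence)` yields the same tokens for well-formed input, but it silently skips a stray parenthesis, which is why the sketch uses an explicit scan that fails loudly instead.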