FASTA Tools
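Convenience utilities for working with (generalized) FASTA files, in which a sequence may contain parenthesized CCD IDs for non-standard residues (e.g. AC(SEP)GH).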
- atomworks.io.tools.fasta.one_letter_to_ccd_code(seq: list[str], chain_type: ChainType, ccd_mirror_path: PathLike = None, check_ccd_codes: bool = True) → list[str]
Convert a sequence of one-letter codes or parenthesized full CCD IDs to full CCD IDs.
This function takes a list whose entries are either one-letter residue codes or parenthesized CCD IDs and converts each entry to its full CCD (Chemical Component Dictionary) ID. One-letter codes are interpreted according to chain_type, so the function handles standard residues as well as non-standard chemical components written in parentheses, such as (SEP).
- Parameters:
seq (list[str]) – A list of one-letter codes or parenthesized CCD IDs.
chain_type (ChainType) – The type of chain (e.g., POLYPEPTIDE_L, DNA, RNA), used to select the correct mapping for one-letter codes.
ccd_mirror_path (PathLike, optional) – Path to the CCD mirror used by check_ccd_codes.
check_ccd_codes (bool) – If True, check that each CCD ID is present in the CCD mirror.
- Returns:
A list of full CCD IDs corresponding to the input sequence.
- Return type:
list[str]
- Raises:
- ValueError – If a non-standard chemical component ID is not found in the processed CCD.
Example
>>> seq = ["A", "C", "(SEP)", "G", "H"]
>>> chain_type = ChainType.POLYPEPTIDE_L
>>> one_letter_to_ccd_code(seq, chain_type)
['ALA', 'CYS', 'SEP', 'GLY', 'HIS']
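To make the conversion rules concrete, here is a minimal, self-contained sketch of the same idea. It is not the atomworks implementation: ToyChainType, ONE_LETTER_TO_CCD, to_ccd_codes and the known_ccd_codes set (a stand-in for the CCD-mirror check) are hypothetical, only three chain types are covered, and raising ValueError for an unmapped one-letter code is an assumption.

```python
from enum import Enum, auto
from typing import Optional


class ToyChainType(Enum):
    """Hypothetical stand-in for atomworks' ChainType (only three members)."""

    POLYPEPTIDE_L = auto()
    DNA = auto()
    RNA = auto()


# Standard one-letter -> CCD ID mappings for each supported chain type.
ONE_LETTER_TO_CCD = {
    ToyChainType.POLYPEPTIDE_L: {
        "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
        "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
        "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
        "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
    },
    ToyChainType.DNA: {"A": "DA", "C": "DC", "G": "DG", "T": "DT"},
    ToyChainType.RNA: {"A": "A", "C": "C", "G": "G", "U": "U"},
}


def to_ccd_codes(
    tokens: list[str],
    chain_type: ToyChainType,
    known_ccd_codes: Optional[set[str]] = None,
) -> list[str]:
    """Map one-letter codes and parenthesized CCD IDs to full CCD IDs.

    known_ccd_codes is a hypothetical stand-in for a CCD mirror: when it is
    given, every parenthesized ID must be a member of it.
    """
    table = ONE_LETTER_TO_CCD[chain_type]
    result = []
    for token in tokens:
        if token.startswith("(") and token.endswith(")"):
            # Parenthesized entry: strip the parentheses and keep the CCD ID as-is.
            ccd_id = token[1:-1]
            if known_ccd_codes is not None and ccd_id not in known_ccd_codes:
                raise ValueError(f"Unknown CCD ID: {ccd_id!r}")
            result.append(ccd_id)
        elif token in table:
            # One-letter entry: look it up in the table for this chain type.
            result.append(table[token])
        else:
            raise ValueError(
                f"No {chain_type.name} mapping for one-letter code {token!r}"
            )
    return result
```

With the inputs from the example above, to_ccd_codes(["A", "C", "(SEP)", "G", "H"], ToyChainType.POLYPEPTIDE_L) returns ['ALA', 'CYS', 'SEP', 'GLY', 'HIS'], while ["A", "C", "G", "T"] with ToyChainType.DNA gives ['DA', 'DC', 'DG', 'DT']. Keeping one table per chain type is what lets the same letter (e.g. "A") resolve to ALA, DA or A.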
- atomworks.io.tools.fasta.split_generalized_fasta_sequence(sequence: str) → list[str]
Split a sequence into single characters, keeping each parenthesized group intact as one element.
- Parameters:
sequence (str) – The input sequence to be split.
- Returns:
A list of individual letters and/or groups with parentheses.
- Return type:
list[str]
Example
>>> split_generalized_fasta_sequence("ABC(DEF)GH(IJ)K")
['A', 'B', 'C', '(DEF)', 'G', 'H', '(IJ)', 'K']
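The splitting rule can be sketched with a single regular expression: a token is either a parenthesized group containing no nested parentheses or any one non-parenthesis character. This is an illustrative sketch, not the atomworks implementation; the name split_sequence_sketch is hypothetical, and rejecting unbalanced or nested parentheses with ValueError is an assumption about how malformed input should be handled. Whitespace is not stripped and would come back as its own token.

```python
import re

# A token is "(...)" with no nested parentheses, or any single non-parenthesis character.
_TOKEN_RE = re.compile(r"\([^()]*\)|[^()]")


def split_sequence_sketch(sequence: str) -> list[str]:
    """Split a generalized FASTA sequence, keeping parenthesized groups whole."""
    tokens = _TOKEN_RE.findall(sequence)
    # re.findall silently skips characters that match neither alternative
    # (e.g. a stray "(" or ")"), so compare the re-joined tokens with the input.
    if "".join(tokens) != sequence:
        raise ValueError(f"Unbalanced or nested parentheses in {sequence!r}")
    return tokens
```

On the documented example, split_sequence_sketch("ABC(DEF)GH(IJ)K") returns ['A', 'B', 'C', '(DEF)', 'G', 'H', '(IJ)', 'K']. Re-joining the tokens and comparing against the input is a cheap way to detect malformed sequences such as "A(B" instead of silently dropping the stray parenthesis.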