This function assembles a protein specification from one or more FASTA files. It converts the amino acid sequence stored in each FASTA file to a molecular formula. If a protein has multiple chains, each chain must be introduced by its own header line, i.e., a line starting with >. One water molecule (H2O) is added to the formula for each chain, which accounts for the free N- and C-termini of that chain.
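The conversion described above can be sketched in base R. This is a minimal illustration under stated assumptions, not fragquaxi's implementation. The helpers parse_fasta_chains(), sequence_formula() and format_formula() are hypothetical. Only the 20 standard one-letter amino acid codes are handled, with no stop symbols or modifications. Each residue is represented by its free amino acid formula minus one H2O.

```r
# Hypothetical sketch, not fragquaxi's implementation.
# Residue formula = free amino acid formula minus one H2O (lost per peptide bond).
elements <- c("C", "H", "N", "O", "S")
residues <- rbind(
  A = c(3, 5, 1, 1, 0),  R = c(6, 12, 4, 1, 0), N = c(4, 6, 2, 2, 0),
  D = c(4, 5, 1, 3, 0),  C = c(3, 5, 1, 1, 1),  E = c(5, 7, 1, 3, 0),
  Q = c(5, 8, 2, 2, 0),  G = c(2, 3, 1, 1, 0),  H = c(6, 7, 3, 1, 0),
  I = c(6, 11, 1, 1, 0), L = c(6, 11, 1, 1, 0), K = c(6, 12, 2, 1, 0),
  M = c(5, 9, 1, 1, 1),  F = c(9, 9, 1, 1, 0),  P = c(5, 7, 1, 1, 0),
  S = c(3, 5, 1, 2, 0),  T = c(4, 7, 1, 2, 0),  W = c(11, 10, 2, 1, 0),
  Y = c(9, 9, 1, 2, 0),  V = c(5, 9, 1, 1, 0)
)
colnames(residues) <- elements
water <- c(C = 0, H = 2, N = 0, O = 1, S = 0)

# Split FASTA lines into chains: every header line (">...") starts a new chain,
# and the sequence lines following it are concatenated.
parse_fasta_chains <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  chain_id <- cumsum(is_header)
  seqs <- tapply(lines[!is_header], chain_id[!is_header], paste, collapse = "")
  unname(as.character(seqs))
}

# Sum the residue formulas of every chain and add one water per chain.
sequence_formula <- function(lines) {
  total <- setNames(numeric(length(elements)), elements)
  for (chain in parse_fasta_chains(lines)) {
    aa <- strsplit(toupper(chain), "")[[1]]
    unknown <- setdiff(aa, rownames(residues))
    if (length(unknown) > 0) {
      stop("unsupported residue(s): ", paste(unknown, collapse = ", "))
    }
    total <- total + colSums(residues[aa, , drop = FALSE]) + water
  }
  total
}

# Render element counts as "C4 H8 N2 O3", omitting absent elements.
format_formula <- function(f) {
  f <- f[f != 0]
  paste0(names(f), f, collapse = " ")
}
```

For example, format_formula(sequence_formula(c(">chain", "GG"))) returns "C4 H8 N2 O3", the formula of glycylglycine. A real file could be passed as sequence_formula(readLines(path)).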

define_proteins(..., .disulfides = 0L)

Arguments

...

Paths to FASTA files containing protein sequences, one file per protein. Argument names are used as protein names.

.disulfides

An integer vector giving the number of disulfide bridges in each protein, recycled to the number of FASTA files. For each disulfide bridge, two hydrogen atoms are subtracted from the molecular formula.
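The disulfide correction can be sketched as follows. This is a minimal illustration, not fragquaxi's code. It assumes each formula is a named vector of element counts, and the helper apply_disulfides() is hypothetical. Recycling here follows base R rep_len() rules; the package may apply stricter recycling.

```r
# Hypothetical sketch: a disulfide bridge joins two cysteine thiols
# (-SH + HS- -> -S-S-), so each bridge removes two hydrogen atoms.
apply_disulfides <- function(formulas, disulfides = 0L) {
  # Recycle the bridge counts to one value per protein.
  disulfides <- rep_len(disulfides, length(formulas))
  Map(function(f, n) {
    f["H"] <- f["H"] - 2 * n
    f
  }, formulas, disulfides)
}

# Element counts of the mAb from the examples, without disulfides.
mab <- c(C = 6464, H = 9982, N = 1706, O = 2014, S = 44)
apply_disulfides(list(mab = mab), 16)$mab[["H"]]
```

With 16 bridges, 32 hydrogen atoms are removed (9982 - 32 = 9950). This matches the H9950 shown in the second example.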

Value

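A data frame (tibble) that describes one protein per row and comprises three columns:

protein_name

Name of the protein, taken from the corresponding argument name. As the second example shows, an unnamed argument yields its position instead.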

protein_data

Nested column storing the FASTA file name (file) and the number of disulfide bridges (disulfides) used to calculate the protein formula.

protein_formula

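Molecular formula calculated from the protein sequence. It includes one water molecule per chain and subtracts two hydrogen atoms per disulfide bridge.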

Examples

mab_sequence <- system.file("extdata", "mab_sequence.fasta", package = "fragquaxi")
define_proteins(mab = mab_sequence)
#> # A tibble: 1 x 3
#>   protein_name protein_data     protein_formula
#>   <chr>        <list>           <mol>
#> 1 mab          <tibble [1 × 2]> C6464 H9982 N1706 O2014 S44
define_proteins(mab_sequence, .disulfides = 16)
#> # A tibble: 1 x 3
#>   protein_name protein_data     protein_formula
#>   <int>        <list>           <mol>
#> 1 1            <tibble [1 × 2]> C6464 H9950 N1706 O2014 S44