This function assembles a protein specification from one or more FASTA files.
It converts the amino acid sequence stored in each FASTA file to a
molecular formula. If a protein has multiple chains, each chain must be
specified with its own header line, i.e., a line starting with `>`. For
each chain, one water molecule is added to the formula, accounting for the
free N-terminal H and C-terminal OH of that chain.
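To illustrate the conversion described above, here is a minimal sketch in base R. It is not fragquaxi's implementation: the residue table, the helper names `read_chains()`, `chain_formula()` and `protein_formula()`, and the restriction to the 20 standard amino acids are all assumptions made for this example. Each chain is the sum of its residue formulas (free amino acid minus one H2O) plus one water molecule, and a multi-chain protein is the sum over its chains.

```r
# Hypothetical sketch (not fragquaxi's code): elemental composition of each
# amino acid *residue*, i.e. the free amino acid minus one H2O.
residues <- rbind(
  G = c(2, 3, 1, 1, 0),  A = c(3, 5, 1, 1, 0),  S = c(3, 5, 1, 2, 0),
  P = c(5, 7, 1, 1, 0),  V = c(5, 9, 1, 1, 0),  T = c(4, 7, 1, 2, 0),
  C = c(3, 5, 1, 1, 1),  L = c(6, 11, 1, 1, 0), I = c(6, 11, 1, 1, 0),
  N = c(4, 6, 2, 2, 0),  D = c(4, 5, 1, 3, 0),  Q = c(5, 8, 2, 2, 0),
  K = c(6, 12, 2, 1, 0), E = c(5, 7, 1, 3, 0),  M = c(5, 9, 1, 1, 1),
  H = c(6, 7, 3, 1, 0),  F = c(9, 9, 1, 1, 0),  R = c(6, 12, 4, 1, 0),
  Y = c(9, 9, 1, 2, 0),  W = c(11, 10, 2, 1, 0)
)
colnames(residues) <- c("C", "H", "N", "O", "S")
water <- c(C = 0, H = 2, N = 0, O = 1, S = 0)

# Split FASTA lines into chains: every ">" header line starts a new chain,
# and the sequence lines up to the next header are concatenated.
read_chains <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  chain_id <- cumsum(is_header)
  pieces <- split(lines[!is_header], chain_id[!is_header])
  unname(vapply(pieces, paste, character(1), collapse = ""))
}

# Sum the residue formulas of one chain and add one water molecule
# for the free N-terminal H and C-terminal OH.
chain_formula <- function(sequence) {
  aa <- strsplit(toupper(sequence), "")[[1]]
  stopifnot(all(aa %in% rownames(residues)))
  colSums(residues[aa, , drop = FALSE]) + water
}

# Molecular formula of a (possibly multi-chain) protein: the sum over its chains.
protein_formula <- function(lines) {
  Reduce(`+`, lapply(read_chains(lines), chain_formula))
}

protein_formula(c(">chain A", "GA", ">chain B", "CC"))
```

For the two-chain input above, the sketch returns C = 11, H = 22, N = 4, O = 6, S = 2: glycylalanine (C5H10N2O3) plus cysteinylcysteine (C6H12N2O3S2), each including its own water. The sketch rejects letters outside the 20 standard amino acids, which the real function may handle differently.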
define_proteins(..., .disulfides = 0L)
Argument | Description
---|---
... | FASTA files containing protein sequences. Argument names are used as protein names.
.disulfides | Number of disulfide bridges per protein, recycled to the number of FASTA files given. For each disulfide bridge, two hydrogen atoms are subtracted from the molecular formula.
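The disulfide correction is plain hydrogen bookkeeping: with the numbers from the example, H9982 and 16 bridges give 9982 - 2 × 16 = 9950. The sketch below shows that arithmetic together with recycling. The helper `apply_disulfides()` is hypothetical, and it recycles with base R `rep_len()`; the package's own recycling rules may be stricter (for instance, accepting only a single value or one value per file).

```r
# Hypothetical helper (not part of fragquaxi): subtract two hydrogen atoms
# per disulfide bridge. `hydrogens` holds one H count per protein, and
# base R rep_len() recycles `disulfides` to that length.
apply_disulfides <- function(hydrogens, disulfides = 0L) {
  disulfides <- rep_len(disulfides, length(hydrogens))
  hydrogens - 2 * disulfides
}

apply_disulfides(9982, disulfides = 16)               # one protein, 16 bridges
apply_disulfides(c(9982, 9982), disulfides = c(16, 0)) # one value per protein
```

The first call yields 9950, matching the hydrogen count in the second example; the second yields 9950 and 9982.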
A data frame that describes one protein per row and comprises three columns:
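protein_name
Protein name, taken from the argument name (an integer index if the argument is unnamed, as in the second example).
protein_data
Nested column storing the FASTA file name (`file`) and the number of disulfide bridges (`disulfides`) used for calculating the protein formula.
protein_formula
Molecular formula calculated from the protein sequence.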
library(fragquaxi)

mab_sequence <- system.file("extdata", "mab_sequence.fasta", package = "fragquaxi")

define_proteins(mab = mab_sequence)
#> # A tibble: 1 x 3
#>   protein_name protein_data     protein_formula
#>   <chr>        <list>           <mol>
#> 1 mab          <tibble [1 × 2]> C6464 H9982 N1706 O2014 S44

define_proteins(mab_sequence, .disulfides = 16)
#> # A tibble: 1 x 3
#>   protein_name protein_data     protein_formula
#>   <int>        <list>           <mol>
#> 1            1 <tibble [1 × 2]> C6464 H9950 N1706 O2014 S44