Laboratory of Computational Proteomics


Center for Health Informatics and Bioinformatics


Identification of proteins and characterization of their post-translational modifications.

Mass spectrometry-based protein identification has become an invaluable tool for elucidating protein function, and several methods have been developed for it, including searching protein sequence collections with the masses of peptides or their fragments, searching spectral libraries, and de novo sequencing (Fig. 1). The first step in protein identification is to find the peaks in the mass spectra that correspond to peptides and their fragments. It is important to find all the relevant peaks while minimizing the number of background peaks. This can be achieved by scanning the spectra for peaks of the expected width, selecting peaks above a signal-to-noise threshold, and then picking the monoisotopic peak of each isotope cluster. After peak picking, spectra whose information content is too low to produce meaningful results can be removed to speed up the subsequent analysis.
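
The peak-picking and filtering steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the laboratory's own implementation: the noise level is taken as the median intensity of the spectrum, peaks are simple local maxima with no explicit peak-width test, every isotope cluster is assumed to carry the same known charge (1 by default) with peaks spaced by the 13C-12C mass difference, the lowest-m/z peak of a cluster is treated as monoisotopic, and the function names and the min_peaks cutoff are hypothetical.

```python
import statistics
from dataclasses import dataclass

# Mass difference between 13C and 12C (Da); isotope peaks of a cluster are
# separated by this amount divided by the charge.
C13_SPACING = 1.00335


@dataclass
class Peak:
    mz: float
    intensity: float


def pick_peaks(mz, intensity, snr_threshold=3.0):
    """Return local maxima whose signal-to-noise ratio reaches snr_threshold.

    Assumption: the noise level is the median intensity of the spectrum,
    and no explicit peak-width test is applied.
    """
    noise = statistics.median(intensity) or 1e-12  # avoid division by zero
    peaks = []
    for i in range(1, len(mz) - 1):
        y = intensity[i]
        if y > intensity[i - 1] and y >= intensity[i + 1] and y / noise >= snr_threshold:
            peaks.append(Peak(mz[i], y))
    return peaks


def monoisotopic_peaks(peaks, charge=1, tol=0.01):
    """Group peaks into isotope clusters and keep the lowest-m/z peak of each.

    Assumption: all clusters share one known charge state.
    """
    spacing = C13_SPACING / charge
    peaks = sorted(peaks, key=lambda p: p.mz)
    claimed = set()  # indices already assigned to an earlier cluster
    mono = []
    for i, start in enumerate(peaks):
        if i in claimed:
            continue
        mono.append(start)
        last = start.mz
        # Follow the chain of isotope peaks at +spacing, +2*spacing, ...
        for j in range(i + 1, len(peaks)):
            gap = peaks[j].mz - last
            if abs(gap - spacing) <= tol:
                claimed.add(j)
                last = peaks[j].mz
            elif gap > spacing + tol:
                break
    return mono


def is_informative(peaks, min_peaks=10):
    """Hypothetical filter: keep spectra with at least min_peaks picked peaks."""
    return len(peaks) >= min_peaks


# Example: a tiny synthetic profile spectrum with two peaks above the noise.
picked = pick_peaks([500.0, 500.5, 501.0, 501.5, 502.0, 502.5, 503.0],
                    [1, 1, 50, 1, 1, 30, 1])
print([p.mz for p in picked])  # [501.0, 502.5]
```

Taking the lowest-m/z peak as monoisotopic is reasonable when that peak is observed; for larger peptides, where the monoisotopic peak can be weak, comparing each cluster with a theoretical isotope distribution is more robust.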

In all mass spectrometry-based identification methods, a score is calculated to quantify how well the observed mass spectrum matches the collection of candidate sequences. These scores depend strongly on the details of the algorithm used, and they are not always easy to interpret, because their meaning depends on properties of the data and of the search results. It is therefore desirable to convert the score to a measure that is easy to interpret, such as the probability that the result is random and false. This conversion requires the distribution of scores for random and false results. Estimates of this distribution can be generated by simulation, by collecting statistics during the search, or by direct calculation.
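
For the simulation route, a minimal sketch is an empirical tail probability: the fraction of scores from random, false matches that are at least as high as the observed score. The function names, the +1 correction (which keeps the estimate above zero) and the expectation value, obtained by multiplying the probability by a hypothetical number of candidates searched, are illustrative assumptions rather than the specific statistics used in any particular search engine.

```python
from bisect import bisect_left


def empirical_p_value(observed, null_scores):
    """Estimate P(score >= observed) for a random, false match.

    null_scores are scores of matches known or assumed to be random, e.g. from
    simulated spectra. The +1 terms keep the estimate above zero when no null
    score reaches the observed one.
    """
    ranked = sorted(null_scores)
    n_at_least = len(ranked) - bisect_left(ranked, observed)
    return (n_at_least + 1) / (len(ranked) + 1)


def expectation_value(p_value, n_candidates):
    """Expected number of random candidates scoring at least this high."""
    return p_value * n_candidates


# Example with a toy null distribution of 100 scores: 0, 1, ..., 99.
null = list(range(100))
p = empirical_p_value(95, null)  # 5 null scores are >= 95, so p = 6/101
print(round(p, 4), round(expectation_value(p, 1000), 1))  # 0.0594 59.4
```

The same function can be used when the null scores are collected during the search, for example by treating the scores of the lower-ranked candidates for a spectrum as random matches.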

Figure 1. Mass spectrometry-based workflows for protein identification: (a) searching a protein sequence collection with peptide mass information; (b) searching a protein sequence collection with peptide fragment mass information; (c) searching a spectrum library with peptide fragment mass information; (d) de novo sequencing.
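
Workflow (a) can be illustrated with an in silico tryptic digest followed by mass matching. The sketch assumes unmodified peptides, the common trypsin rule (cleavage after K or R unless the next residue is P), monoisotopic residue masses, singly protonated [M+H]+ ions and a fixed tolerance in daltons; the function names and defaults are hypothetical, and a real search would score and rank candidate proteins rather than just count matching masses.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # H2O added for the peptide termini
PROTON = 1.007276  # proton mass, for [M+H]+


def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K or R unless the next residue is P."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR" and (i + 1 == len(sequence) or sequence[i + 1] != "P"):
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    # Also join up to missed_cleavages adjacent fragments.
    peptides = []
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed_mh, protein, tol_da=0.1, missed_cleavages=1):
    """Number of observed [M+H]+ masses that match a theoretical tryptic peptide."""
    theoretical = [mh_plus(p) for p in tryptic_digest(protein, missed_cleavages)]
    return sum(any(abs(m - t) <= tol_da for t in theoretical) for m in observed_mh)


print(tryptic_digest("MKWVTFISLLR"))  # ['MK', 'WVTFISLLR']
```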

References

D. Fenyö, J. Eriksson, R.C. Beavis, "Mass spectrometric protein identification using the global proteome machine", Methods Mol Biol 673 (2010) 189-202.

D. Fenyö, B.S. Phinney, R.C. Beavis, "Determining the Overall Merit of Protein Identification Data Sets: rho-Diagrams and rho-Scores", J Proteome Res 6 (2007) 1997-2004.

R. Craig, J.C. Cortens, D. Fenyö, R.C. Beavis, "Using Annotated Peptide Mass Spectrum Libraries for Protein Identification", J Proteome Res 5 (2006) 1843-1849.

J. Eriksson, D. Fenyö, "Protein Identification in Complex Mixtures", J Proteome Res 4 (2005) 387-393.

D. Fenyö, R.C. Beavis, "A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes", Anal Chem 75 (2003) 768-774.

H.I. Field, D. Fenyö, R.C. Beavis, "RADARS, a bioinformatics solution that automates proteome mass spectral analysis, optimises protein identification, and archives data in a relational database", Proteomics 2 (2002) 36-47.

J. Eriksson, B.T. Chait, D. Fenyö, "A statistical basis for testing the significance of mass spectrometric protein identification results", Anal Chem 72 (2000) 999-1005.

D. Fenyö, J. Qin, B.T. Chait, "Protein Identification using Mass Spectrometric Information", Electrophoresis 19 (1998) 998-1005.

Quantitation of proteins and peptides

Mass spectrometry (MS)-based quantitative proteomics has been applied to a wide variety of biological problems, and several MS-based workflows have been developed for protein and peptide quantitation (Fig. 1). MS quantitation methods usually assume that the measured signal depends linearly on the amount of material in the sample over the entire range of amounts being studied. A prerequisite for accurate quantitation is that unwanted experimental variation in sample extraction, preparation, and analysis be minimized, so it is critical that each step in the workflow be optimized for reproducibility.
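
Under the linear-response assumption, a standard curve such as the one in the label-free workflow of Fig. 1(h) can be fitted by least squares and inverted to estimate an unknown amount. The sketch below is a minimal illustration with hypothetical function names; the coefficient of variation of replicate measurements is included as one simple, assumed way to quantify reproducibility.

```python
import statistics


def fit_standard_curve(amounts, signals):
    """Least-squares fit of signal = slope * amount + intercept."""
    mean_x = statistics.fmean(amounts)
    mean_y = statistics.fmean(signals)
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_amount(signal, slope, intercept):
    """Invert the linear standard curve for an unknown sample."""
    return (signal - intercept) / slope


def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean of replicate measurements."""
    return statistics.stdev(replicates) / statistics.fmean(replicates)


slope, intercept = fit_standard_curve([0, 1, 2, 4], [5, 15, 25, 45])
print(round(slope, 6), round(intercept, 6), round(estimate_amount(35, slope, intercept), 6))
# 10.0 5.0 3.0
```

The estimate is only meaningful within the range of amounts spanned by the standards, where the linear dependence has actually been verified.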

Figure 1. Workflows for mass spectrometry-based protein and peptide quantitation. (a) Metabolic labeling. (b) Protein labeling. (c) Chimeric recombinant protein labeling. (d) Peptide labeling. (e) Isobaric peptide labeling. (f) Synthetic peptide labeling. (g) Label-free quantitation (intensity of precursor ions). (h) Label-free quantitation (standard curve). (i) Label-free quantitation (intensity of fragment ions).
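
In the isotope-labeling workflows, such as metabolic labeling (a) or peptide labeling (d), the relative amount of a peptide in two samples is read from the ratio of the signals of its light and heavy forms. A minimal sketch, assuming the light/heavy intensity pairs have already been extracted and that the protein ratio is summarized as the median of the peptide log2 ratios (a common robust choice that the text above does not prescribe); the function names are hypothetical.

```python
import math
import statistics


def peptide_log2_ratios(pairs):
    """log2(light / heavy) for each peptide pair with two positive intensities."""
    return [math.log2(light / heavy) for light, heavy in pairs if light > 0 and heavy > 0]


def protein_ratio(pairs):
    """Protein light/heavy ratio as 2 ** (median peptide log2 ratio).

    The median limits the influence of a single peptide with a distorted ratio.
    """
    logs = peptide_log2_ratios(pairs)
    if not logs:
        raise ValueError("no quantifiable peptide pairs")
    return 2 ** statistics.median(logs)


# Three peptides agree on a 2:1 ratio; the fourth is an outlier (9:1).
print(protein_ratio([(200, 100), (400, 200), (100, 50), (900, 100)]))  # 2.0
```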

When proteins in complex samples are quantified from the intensities of peptide precursor and fragment ions, interfering signals can distort the measurements, so it is important to detect and correct for them. We used computer simulations to investigate the feasibility of correcting for interference in multiple reaction monitoring (MRM) analyses. In our simulations, the expected relative intensities of the transitions for a peptide were assumed to be known. Hypothetical interference was added to one or more transitions, and random noise was added to all transitions; the noise distribution was obtained from repeated measurements. Interference was detected by measuring how far the intensity ratios of the transitions deviated from the expected ratios and identifying outliers. The transitions with interference were then removed, and the peptide quantity was calculated from the remaining, interference-free transitions.
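
The detection and correction steps described above can be sketched as follows. This is a simplified illustration, not the code used in our simulations: each transition's observed intensity divided by its expected relative intensity is treated as an independent estimate of the peptide amount, the median of the log estimates serves as the consensus, transitions deviating from it by more than a threshold (in natural-log units) are flagged as outliers, and the noise is modeled as multiplicative Gaussian noise with a hypothetical coefficient of variation rather than a distribution obtained from repeated measurements. All names and default values are hypothetical.

```python
import math
import random
import statistics


def detect_interference(observed, expected, threshold=0.3):
    """Flag transitions whose observed/expected ratio is an outlier.

    Each transition gives its own estimate of the peptide amount,
    observed / expected; the median of the log estimates is the consensus,
    and deviations larger than threshold (natural-log units) are flagged.
    """
    logs = [math.log(o / e) for o, e in zip(observed, expected)]
    consensus = statistics.median(logs)
    return [abs(x - consensus) > threshold for x in logs]


def corrected_quantity(observed, expected, threshold=0.3):
    """Peptide amount computed only from transitions not flagged as interfered."""
    flags = detect_interference(observed, expected, threshold)
    kept = [(o, e) for o, e, bad in zip(observed, expected, flags) if not bad]
    if not kept:
        raise ValueError("all transitions flagged")
    return sum(o for o, _ in kept) / sum(e for _, e in kept)


def simulate(expected, amount, interference, noise_cv=0.05, seed=0):
    """Hypothetical simulation: expected intensities scaled by amount, extra
    signal added to selected transitions, then multiplicative Gaussian noise."""
    rng = random.Random(seed)
    observed = []
    for i, e in enumerate(expected):
        signal = amount * e + interference.get(i, 0.0)
        observed.append(max(signal * (1 + rng.gauss(0, noise_cv)), 1e-9))
    return observed


expected = [0.5, 0.3, 0.2]  # known relative transition intensities
observed = simulate(expected, 100.0, {1: 30.0}, noise_cv=0.02)
print(sum(observed) / sum(expected))            # uncorrected: inflated by interference
print(detect_interference(observed, expected))  # transition 1 should be flagged
print(corrected_quantity(observed, expected))   # close to the true amount, 100
```

A lower threshold removes interference more aggressively but also discards more transitions that deviate only because of noise, so the choice of detection threshold affects the corrected error.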

Figure 2. Correction for interference: the effect of using different interference detection thresholds. Panels A and B: the relative quantitation error after correction as a function of the relative error before correction. Panels C and D: the distribution of the corrected quantitation error for relative error ranges of 0.3-0.7 and 1.3-1.7, respectively.

References

G. Zhang, B.M. Ueberheide, S. Waldemarson, S. Myung, K. Molloy, J. Eriksson, B.T. Chait, T.A. Neubert, D. Fenyö, "Protein quantitation using mass spectrometry", Methods Mol Biol 673 (2010) 211-222.

G. Zhang, D. Fenyö, T.A. Neubert, "Evaluation of the Variation in Sample Preparation for Comparative Proteomics Using Stable Isotope Labeling by Amino Acids in Cell Culture", J Proteome Res 8 (2009) 1285-1292.