Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study

States, David J; Omenn, Gilbert S; Blackwell, Thomas W; Fermin, Damian; Eng, Jimmy; Speicher, David W; Hanash, Samir M

doi:10.1038/nbt1183

Analysis
Published: 08 March 2006

David J States¹,
Gilbert S Omenn¹,
Thomas W Blackwell¹,
Damian Fermin¹,
Jimmy Eng²,
David W Speicher³ &
Samir M Hanash¹,²

Nature Biotechnology volume 24, pages 333–338 (2006)

Abstract

The Human Proteome Organization (HUPO) recently completed the first large-scale collaborative study to characterize the human serum and plasma proteomes. The study was carried out in different locations and used diverse methods and instruments to compare and integrate tandem mass spectrometry (MS/MS) data on aliquots of pooled serum and plasma from healthy subjects. Liquid chromatography (LC)-MS/MS data sets from 18 laboratories were matched to the International Protein Index database, and an initial integration exercise resulted in 9,504 proteins identified with one or more peptides, and 3,020 proteins identified with two or more peptides. This article uses a rigorous statistical approach to take into account the length of coding regions in genes, and multiple hypothesis-testing techniques. On this basis, we now present a reduced set of 889 proteins identified with a confidence level of at least 95%. We also discuss the importance of such an integrated analysis in providing an accurate representation of a proteome as well as the value such data sets contain for the high-confidence identification of protein matches to novel exons, some of which may be localized in alternatively spliced forms of known plasma proteins and some in previously nonannotated gene sequences.
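
To make the idea of a length-aware, multiple-testing-corrected identification threshold concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not specify the model, so the sketch assumes (hypothetically) that spurious peptide matches land on database entries at random in proportion to sequence length, which gives a Poisson null, and it applies a simple Bonferroni correction. The function names (`poisson_sf`, `length_adjusted_pvalues`, `bonferroni_accept`), the parameter `n_random_hits`, and the toy numbers are all illustrative assumptions.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def length_adjusted_pvalues(proteins, n_random_hits):
    """Compute a per-protein p-value under a length-proportional null.

    proteins: dict mapping name -> (sequence_length, n_peptides_matched).
    n_random_hits: assumed total number of spurious peptide matches
        spread over the whole database (a hypothetical input).
    """
    total_len = sum(length for length, _ in proteins.values())
    pvals = {}
    for name, (length, n_pep) in proteins.items():
        # Expected number of chance matches grows with sequence length.
        lam = n_random_hits * length / total_len
        pvals[name] = poisson_sf(n_pep, lam)
    return pvals


def bonferroni_accept(pvals, alpha=0.05):
    """Names whose Bonferroni-adjusted p-value is at most alpha."""
    m = len(pvals)
    return sorted(name for name, p in pvals.items() if p * m <= alpha)


# Toy data (made up): name -> (length in residues, peptides matched).
toy = {
    "short_many": (500, 6),
    "short_one": (500, 1),
    "long_many": (20000, 6),
    "mid_none": (1000, 0),
}
pvals = length_adjusted_pvalues(toy, n_random_hits=10)
accepted = bonferroni_accept(pvals, alpha=0.05)
print(accepted)
```

Under these assumptions, six matched peptides are strong evidence for a short protein but not for a very long one, because a long sequence is expected to collect several chance matches. That is the intuition behind adjusting for coding-region length before counting an identification as high-confidence.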

**Figure 1: Distribution of protein identifications.**

**Figure 2: Number of peptides identified as a function of protein concentration.**

**Figure 3: Distribution of peptides identified for β-2-glycoprotein 1.**

**Figure 4: Bar plot of the distribution of ORF types by gene.**

**Figure 5: Novel ORFs in the *APOE* gene.**

Acknowledgements

The collaborative HUPO Plasma Protein study and the data analysis presented here have been supported by a trans-National Institutes of Health grant supplement 84982 administered by the National Cancer Institute, by pharmaceutical and technology company sponsors and by voluntary efforts of collaborating laboratories.

Author information

Authors and Affiliations

University of Michigan, 100 Washtenaw Rd., Palmer Commons 2035B, Ann Arbor, 48109, Michigan, USA
David J States, Gilbert S Omenn, Thomas W Blackwell, Damian Fermin & Samir M Hanash
Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., PO Box 19024, Seattle, 98109, Washington, USA
Jimmy Eng & Samir M Hanash
The Wistar Institute, 3601 Spruce St., Philadelphia, 19104, Pennsylvania, USA
David W Speicher

Corresponding author

Correspondence to Samir M Hanash.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

Accrual of identifications as a function of sampling. (PDF 20 kb)

Supplementary Fig. 2

Complement component 3 isoforms. (PDF 20 kb)

Supplementary Table 1

Numbers of protein identifications by specimen and by methodologies applied in individual laboratories. (PDF 90 kb)

Supplementary Table 2

List of high-confidence protein identifications. (PDF 116 kb)

Supplementary Table 3

Intragenic peptides not in an annotated exon. (PDF 15 kb)

Supplementary Notes (PDF 25 kb)

About this article

Cite this article

States, D., Omenn, G., Blackwell, T. et al. Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nat Biotechnol 24, 333–338 (2006). https://doi.org/10.1038/nbt1183

Issue Date: 01 March 2006
