Proteogenomics, the integration of proteomics with genomics, is an emerging approach that promises to advance clinical and translational research. By combining genomic and proteomic information, scientists are gaining a more complete and unified understanding of complex biological processes.
With the increased speed and reduced cost of genomic analyses, genomic data is more widely available than ever before. While this data is invaluable in providing an organism’s blueprint, not all DNA is transcribed, and the presence of RNA transcripts does not necessarily reflect the abundance of the various proteins present in a biological system. For this reason, proteomics and genomics are complementary approaches, each providing data that informs the other.
Many technologies must come together to move proteogenomics-based research forward. Genomics data is commonly derived from whole-genome, exome, and RNA sequencing, expressed sequence tags (ESTs), and microarrays. On the proteomics side, mass spectrometry (MS) is the key tool for the identification and quantification of proteins. After proteolytic digestion of the protein under study, the resulting peptides are analyzed using MS to determine their sequence as well as their post-translational modifications (PTMs). These data are then used to mine genomic databases to identify the source proteins. Bioinformatics software is then essential for integrating the data obtained from both ‘omics approaches.
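The digest-and-search step described above can be illustrated with a minimal Python sketch. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, and exact sequence matching, whereas real search engines score fragment spectra rather than finished sequences. The protein IDs, sequences, and function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def tryptic_digest(protein, min_len=6):
    """In silico digest: cleave after K or R unless the next residue is P.

    Simplifying assumptions: no missed cleavages and no modifications.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return [p for p in peptides if len(p) >= min_len]


def build_peptide_index(proteins, min_len=6):
    """Map each theoretical peptide to the set of proteins that can yield it."""
    index = defaultdict(set)
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_digest(sequence, min_len):
            index[peptide].add(protein_id)
    return index


def identify_proteins(observed_peptides, index):
    """Look up each observed peptide; an empty list means no database match."""
    return {pep: sorted(index.get(pep, ())) for pep in observed_peptides}


# Hypothetical toy database (made-up sequences, not real proteins).
TOY_PROTEINS = {
    "PROT_A": "MSTAGKVLEELRPDQWTTKAAGHLR",
    "PROT_B": "MEEPQSDPSVKAAGHLR",
}

index = build_peptide_index(TOY_PROTEINS)
matches = identify_proteins(["AAGHLR", "MSTAGK", "NOTFOUND"], index)
for peptide, sources in matches.items():
    print(peptide, sources)
```

In this toy database one peptide is shared by both entries, which hints at why assigning peptides back to a single source protein is not always unambiguous.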
Growing Area of Research Interest
Research has begun to highlight the value of proteogenomics in revealing biological function and broadening understanding of basic biology. Recognizing this trend, Mary F. Lopez, COO of Nuclea Biotechnologies, Inc., cites three examples in GEN that illustrate how proteogenomics could, in her words, “provide better prospects for revealing biomarkers, assessing disease states, and identifying the complex mechanisms behind biological function.”
In their 2012 study published in Aging, scientists at the Thermo Scientific Biomarker Research Initiatives in Mass Spectrometry (BRIMS) Center and the Buck Institute for Research on Aging combined genomics with proteomics to study the expression of histone-variant H2A and its impact on cellular aging. When histone expression in normal cells was compared with that in cells under genotoxic stress, the authors found that peptide sequences associated with H2A were depleted in the stressed cells. However, PCR revealed that genes associated with H2A production were highly expressed in the stressed cells. Multiplexed SRM assays developed on a triple quadrupole mass spectrometer were performed to quantify histone H2A proteins. SRM assay development software was used to predict candidate peptides, select fragment ions, and build the instrument method. The study confirmed that DNA damage alters the expression of H2A histones, leading to their depletion in senescent cells, which might contribute to cellular aging.
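The kind of transcript-versus-protein disagreement described in this study can be flagged with a simple comparison of fold changes. The sketch below is only an illustration: the gene names and abundance values are invented placeholders, not data from the study, and the function names and the threshold of one log2 unit are assumptions.

```python
import math


def log2_fold_change(stressed, control):
    """log2 ratio of stressed to control abundance (both must be > 0)."""
    return math.log2(stressed / control)


def flag_discordant(mrna, protein, threshold=1.0):
    """Return genes whose transcript goes up while the protein goes down.

    mrna and protein map gene -> (control, stressed) abundance.
    threshold is an assumed cutoff in log2 units.
    """
    flagged = []
    for gene in sorted(mrna.keys() & protein.keys()):
        m = log2_fold_change(mrna[gene][1], mrna[gene][0])
        p = log2_fold_change(protein[gene][1], protein[gene][0])
        if m >= threshold and p <= -threshold:
            flagged.append((gene, m, p))
    return flagged


# Invented placeholder values: (control, stressed).
toy_mrna = {"GENE_X": (10.0, 40.0), "GENE_Y": (20.0, 22.0)}
toy_protein = {"GENE_X": (8.0, 2.0), "GENE_Y": (5.0, 5.5)}

discordant = flag_discordant(toy_mrna, toy_protein)
print(discordant)
```

A gene flagged this way would be a candidate for follow-up with a targeted assay, since neither measurement alone reveals the disagreement.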
Proteogenomics and Oncology Research
Scientists are relying on proteogenomics to determine relationships between cancer subtypes and response to treatments, and to identify biomarkers useful in early disease detection and prognosis. Pavlou et al. integrated gene expression and proteomics data to find biomarkers that predict the risk of breast cancer recurrence in individuals with estrogen receptor (ER)-positive tumors. The authors analyzed publicly available gene expression data to find survival-associated genes and compared these with breast cancer proteome data to produce a list of candidate biomarkers. Mass spectrometry was used to identify and quantify proteins associated with ER-negative versus ER-positive tumors. Two proteins were found useful in discriminating between ER-positive patients at high and low risk of cancer recurrence: karyopherin alpha 2 (KPNA2) and cyclin-dependent kinase 1 (CDK1).
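The general idea of cross-referencing a gene list with proteome data can be sketched as a filtered intersection. This is not the authors' pipeline: the function name, the fold-change cutoff, and the placeholder gene names and values below are all assumptions made for illustration.

```python
def candidate_biomarkers(survival_genes, protein_log2fc, min_abs_log2fc=1.0):
    """Keep survival-associated genes that also change at the protein level.

    survival_genes: set of gene symbols from expression analysis.
    protein_log2fc: gene -> log2 fold change between two tumor groups.
    min_abs_log2fc: assumed cutoff; a real study would also use statistics.
    """
    candidates = [
        (gene, protein_log2fc[gene])
        for gene in survival_genes
        if gene in protein_log2fc
        and abs(protein_log2fc[gene]) >= min_abs_log2fc
    ]
    # Rank by magnitude of protein-level change, largest first.
    return sorted(candidates, key=lambda item: abs(item[1]), reverse=True)


# Placeholder inputs, not real measurements.
toy_survival_genes = {"GENE_A", "GENE_B", "GENE_C"}
toy_protein_log2fc = {
    "GENE_A": 1.8,
    "GENE_B": 0.2,
    "GENE_C": -1.2,
    "GENE_D": -2.5,
}

shortlist = candidate_biomarkers(toy_survival_genes, toy_protein_log2fc)
print(shortlist)
```

Genes that appear in only one data set, or that barely change at the protein level, drop out, leaving a shorter list for validation.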
Identifying and classifying cancer subtypes using a combination of genetic and proteomic profiling may also lead to better treatments and prognosis than “site-of-origin” based classifications. In her third example, Lopez points to research by Zhang et al. Using shotgun proteomics, the authors analyzed the proteomes of colon and rectal tumors previously characterized in The Cancer Genome Atlas (TCGA). They identified and validated 796 single amino acid variants (SAAVs), corresponding to 64 somatic tumor variants.
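To make the notion of a single amino acid variant concrete, the sketch below checks whether an observed peptide differs from a reference protein at exactly one residue. It is a toy illustration under stated assumptions (exact string comparison, no modifications, first hit wins), not the method used by Zhang et al.; the reference sequence and the function name are hypothetical.

```python
def find_saav(observed_peptide, reference_protein):
    """Return (position, ref_aa, observed_aa) for a single-residue difference.

    Position is 1-based within the reference protein. Returns None when the
    peptide matches the reference exactly or differs at more than one residue
    in every alignment window.
    """
    if observed_peptide in reference_protein:
        return None  # exact reference match, not a variant
    n = len(observed_peptide)
    for start in range(len(reference_protein) - n + 1):
        window = reference_protein[start:start + n]
        mismatches = [
            (start + i + 1, ref_aa, obs_aa)
            for i, (ref_aa, obs_aa) in enumerate(zip(window, observed_peptide))
            if ref_aa != obs_aa
        ]
        if len(mismatches) == 1:
            return mismatches[0]
    return None


# Made-up reference sequence for illustration only.
TOY_REFERENCE = "MSTAGKVLEELRPDQWTTKAAGHLR"

variant = find_saav("VLEDLRPDQWTTK", TOY_REFERENCE)
print(variant)
```

In practice a candidate found this way would still need supporting spectral evidence and a matching genomic variant before being reported.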
The Future of Proteogenomics
Proteogenomic research has the power to reveal insights into complex biological processes that neither genomics nor proteomics can provide alone. Looking ahead, scientists aim to integrate metabolomics data as well, to produce an even more complete picture of an organism and its biological state. Bringing these disciplines together will require collaboration among scientists with a variety of ‘omics expertise, along with further advances in bioinformatics tools able to integrate large amounts of quantitative data with known biological pathways.
For more useful information about proteogenomics, see the resources below.