Proteins, which are the workhorses of the cell, are made up of long, interconnected strings of amino acids that fold into a wide variety of 3D shapes. Understanding the precise shape of a protein facilitates efforts to figure out its function, its potential role in a disease, and even how to target it with therapies. To gain such understanding, researchers often try to predict a protein’s precise 3D chemical structure using basic principles of physics—including quantum mechanics. But while nature does this in real time zillions of times a day, computational approaches have not been able to do this—until now.
Of the roughly 170,000 proteins mapped so far, most have had their structures deciphered using powerful imaging techniques such as x-ray crystallography and cryo–electron microscopy (cryo-EM). But researchers estimate that there are at least 200 million proteins in nature, and, as amazing as these imaging techniques are, they are laborious, and it can take many months or years to solve the 3D structure of a single protein. So, a breakthrough certainly was needed.
In 2020, researchers with the company DeepMind, London, developed an artificial intelligence (AI) program that rapidly predicts most protein structures as accurately as x-ray crystallography and cryo-EM can map them. The AI program, called AlphaFold, predicts a protein’s structure by computationally modeling the amino acid interactions that govern its 3D shape.
Getting there wasn’t easy. While a complete de novo calculation of protein structure still seemed out of reach, investigators reasoned that they could kick-start the modeling if known structures were provided as a training set to the AI program. Using a computer network built around 128 machine learning processors, the AlphaFold system was created by first focusing on the 170,000 proteins with known structures in an iterative process called deep learning. The process, which is inspired by the way neural networks in the human brain process information, enables computers to look for patterns in large collections of data. In this case, AlphaFold learned to predict the underlying physical structure of a protein within a matter of days. This breakthrough has the potential to accelerate the fields of structural biology and protein research, fueling progress throughout the sciences.
On top of all that’s impacting protein synthesis at the genomics level, translated proteins undergo yet another set of regulations called post-translational modifications (PTMs). PTMs transform a newly formed protein into a fully “decorated” mature entity, dramatically changing its biological function in response to a cell’s needs. These mature proteins can then perform a myriad of tasks needed by the cell, such as moving about the cell or activating a signaling cascade.
Combining information gathered from genomics with transcriptomics and proteomics, a strategy we refer to as proteogenomics, allows researchers to take a more comprehensive look at each alteration of a tumor, including the identification, localization, and functional analysis of resultant proteins and their relationship to the larger tumor environment.
Proteogenomics works to enhance cancer genome biology by helping prioritize genomic alterations, subtyping tumors with proteomic features, illuminating alterations to PTMs responsible for the dysregulation of cancer signaling networks, and improving the understanding of drug response and resistance to therapies.
NCI’s CPTAC program does just that, gleaning powerful information on cancer development and progression. Through the study of the tumor microenvironment and immune landscape, CPTAC researchers have been able to proteogenomically characterize numerous cancer types using proteomics, phosphoproteomics, methylomics, acetylomics, and glycomics analyses in conjunction with well-established sequencing approaches. Researchers have discovered new tumor subtypes, tumor microenvironment variations, and new potential protein targets for drug therapy.