Human Genome Project for Proteins with Neil Kelleher, PhD
Millions of molecular proteins are swimming through our body's cells and many studies have discovered that these proteins are the main drivers of all human diseases. Scientists are now mapping proteins the way the Human Genome Project mapped genes. Neil Kelleher, PhD, director of Northwestern Proteomics, is at the forefront of the Human Proteoform Project and explains how it could lead to more targeted and effective diagnostics and treatments for diseases.
"Proteins are a critical connection between the genome and our diseases. All human diseases involve proteoform biology, so therefore, proteoform measurement is the key linkage between our genome and our blueprints of life and then actually living life."
- Director of Chemistry of Life Processes Institute
- Director of Northwestern Proteomics
- Professor of Weinberg College of Arts and Sciences
- Professor of Medicine in the Division of Hematology and Oncology and Biochemistry and Molecular Genetics
- Member of Northwestern University Clinical and Translational Sciences Institute
- Member of the Robert H. Lurie Comprehensive Cancer Center
- Member of the Simpson Querrey Institute for Epigenetics
A recent study published in Science Advances outlines the Human Proteoform Project. The large undertaking will characterize known proteoforms (specific protein molecules) and aims to systematically discover and analyze new ones in human tissues, cells and fluids.
For precision genomics, regenerative medicine and all goals to improve human health in the next century, Kelleher says we require more knowledge about proteins. Kelleher thinks scientists will begin mapping proteoforms at a more rapid pace over the next five to 10 years to better understand human disease and biology.
In the short term the team plans to work with the U.S. government and other public sectors to launch the project before, much like the Human Genome Project, they seek contributions from the private sector.
Kelleher details projects he is working on in collaboration with Feinberg scientists in the areas of neurology, neurodegeneration, cardiology and immunobiology. In his role as director of Chemistry of Life Processes Institute, he strives to help basic scientists on the Evanston campus understand unmet clinical needs at the medical school.
Additional Reading & Resources:
- "Proteoform: a single term describing protein complexity" in Nature Methods
- All publications from Kelleher's lab
- Video featuring Kelleher's lab on the Chicago campus
Recorded on Nov. 17, 2021.
Erin Spain, MS: This is Breakthroughs, a podcast from Northwestern University Feinberg School of Medicine. I'm Erin Spain, host of the show. Millions of molecular proteins are swimming through our body's cells. And many studies have discovered that these proteins are the main drivers of all human diseases. Scientists are now mapping proteins the way the Human Genome Project mapped genes. Northwestern's Neil Kelleher is at the forefront of the Human Proteoform Project and is here to explain how it could lead to more targeted and effective diagnostics and treatments for diseases and how proteoform research at Feinberg is expanding. Welcome to the show, Dr. Kelleher.
Neil Kelleher, PhD: Thanks, Erin. Great to be with you.
Erin Spain, MS: So explain exactly what a protein is, what a proteoform is, and the critical roles that proteins play in our bodies.
Neil Kelleher, PhD: So it turns out that the genes are the instructions for how to make proteins in our bodies. And if you just point to anywhere in your body, you'll be pointing to proteins, your hair, your eyeballs, your bones. I mean, we are built up of proteins, and therefore it shouldn't surprise anybody that they are central to when we get disease. They're mediated by proteins. They're, proteins are going wrong, especially in neurodegeneration. So like Alzheimer's, Parkinson's and ALS, or Lou Gehrig's disease, it's just, these are proteinopathies, people even call them at the molecular level. Proteins are strings of amino acids, and they can be extremely long. They can be 500 amino acids long, a thousand. Some of the biggest genes, like titin, is a gene that it's one of the biggest ones. But then what happens is they fold up and they get decorated. That's a protein molecule. And you know, a lot of people have heard about AlphaFold or Google's collaborative response to fold them up, but you cannot predict all the decorations. And that's the molecular decorations. That is what can turn an enzyme on or off, which an enzyme is a kind of protein. So these decorations are absolutely critical to understand how we work at the molecular level, at the protein level for human biology.
Erin Spain, MS: So you just published a paper in Science Advances with your coauthors defining the Human Proteoform Project. Tell me about that paper. Why is it so important?
Neil Kelleher, PhD: The Science Advances paper that came out on November 12. That was years in the making, and we at the Consortium for Top-Down Proteomics, which is anchored by Northwestern and our proteoform informatics team, is top shelf. We have a lot of great, you know, 60 people at Northwestern Proteomics, and we've for years anchored the consortium and brought more people in. And it is, there is a crescendo happening, Erin. You're absolutely right. People all across sectors, government, private sector, academics especially are now jumping on, and it's really exciting to see. And this paper is, was, designed as a watershed moment. It's like here would be the framework, and it outlines the case for, you know, the proteoforms are involved in all human disease. The proteome is comprised of proteoforms. Let's get back to basics, and proteins are a critical connection between the genome and our diseases. All human diseases involve proteoform biology, so therefore proteoform measurement is the key linkage between our genome and our blueprints of life and then actually living life. And this will connect up what is absolutely required to get more precise about all our biology, which means better therapeutics, better process of creating drugs to intervene, more precise ways to intervene, also informed by precision genomics and regenerative medicine. All the goals for our century require this as the next obvious step, and that's what the paper was meant to do. And we've sort of kept our powder dry for some years, but this is the decade. There's a question why now? And there's a lot of readiness across sectors like the private sector just put in almost three billion by some counts of investment for already next generation proteomics of single molecule, single cell proteomics.
These things are moving aggressively, and so there is a role for a government-funded project to fill that the reference proteome will be available, and we can all work in parallel to accomplish what took 20 years in the Genome Project, to do the project, and then disruptive genomics, next gen sequencing called NGS. All of that was after the project was done in 2002. So it was like the genome project was done, a dollar per base was reached in 1994. Then the project was done shake hands with Bill Clinton in 2000. 2002 to 2008, immense gains through the private sector. 300,000 jobs a year created. Just all these incredible things. And that's really what the proteom — the wild frontier. You know, we gotta domesticate it.
Erin Spain, MS: How do you analyze proteins, with proteomics? What specialized equipment is needed?
Neil Kelleher, PhD: So the proteoform is just that specific molecule, and how do we assert? How do we detect, analyze, characterize these decorations that are captured by this word, proteoform. The proteoform captures all the sources of variation of a protein molecule, and it does so precisely and how we measure those is currently with an approach called mass spectrometry. So we need to systematically discover and characterize proteoforms at about a hundred fold less expense than it currently costs. So that's one expression of this project, the Human Proteoform Project. And currently mass spectrometry is really the only way to assert and discover the proteoforms. What you do after that could unlock so many other possibilities that may or may not use mass spectrometry.
Erin Spain, MS: So you use something called top-down proteomics. Explain top-down proteomics versus bottom-up proteomics.
Neil Kelleher, PhD: I like a stamp collector's metaphor. So if each proteoform is a stamp and that's what it's created in our bodies from the 20,300 human genes and there's going to be 50, 100 million of them, is our estimate. OK. Then we have to collect stamps in different cell types in the blood. We have to really determine these molecules because that's our biology. That's what is the proteome, or the collection of all proteins is the proteome, like the collection of all genes is the genome. So the way that we do this is top-down proteomics. It's the direct, no inference required. And you first weigh the proteoform down to the last hydrogen atom. We can tell the molecular composition of that proteoform. And we do that. That's the top-down philosophy. So we first weigh it and then we controllably fragment it into pieces, and we weigh all those pieces. That's top down. And in a stamp analogy, it's equivalent to taking a picture of the stamp. And so, you know the whole thing, and then you can characterize each little pixel on the stamp. That's top down. Imagine now, here's what bottom-up proteomics is, which is a fantastic five billion, four or five billion dollar-a-year market. It's got all, it's how we know the low resolution draft of the proteome, and it's how we've discovered so much about all the protein machines that drive our biology in proteomics. So what we're saying is we need a new ecosystem on top of bottom-up. So it's a proteoform-resolved and proteoform-enabled world that we envisage. And the way to do it, or the difference is, to complete the stamp metaphor, what we do in bottom-up proteomics is we take a collection of, say, 10,000 stamps from the blood. We draw blood. There's 10,000 proteoforms in there. OK, what we do is we shred them into an average of 50 pieces and then we take a giant fan onto the table with all those pieces and we splay all the pieces out.
We collect maybe 10 percent, eight to 10 percent of the pieces of the stamps back, and we're trying to put them back together each. You know, and it's just its limitations are becoming clear. So the top-down is every stamp is kept in its lane. Every proteoform is kept in its lane properly for characterization.
Erin Spain, MS: Is this more difficult to accomplish the top-down proteomics?
Neil Kelleher, PhD: Yeah. And there was one guy in 1999, I suppose, or one person, that was me. I started my lab with this idea like weigh every protein. And some people really liked it. Some thought, OK, well, he's in trouble for tenure. He might not make it, but it's in the tradition of American science to throw out really ambitious goals. Or in Chicago too, you know, like Burnham's, "Make no small plans."
Erin Spain, MS: This is becoming a much more accepted way to do science when it comes to proteomics. Is this the method that will be used in the Human Proteoform Project, the top-down proteomics?
Neil Kelleher, PhD: Well, in the first phase of the project, yes. Anybody's crystal ball maybe goes three, four, five years. And like in the genome project, there was disruptive elements at work and that's what we want is we want cheaper, better, stronger to deliver value and understanding to all the stakeholders of the project, and the American people are some of those. So we want to imagine that we can do the project, and that would be first phase would be used top-down proteomics to assert proteoforms. Once they're discovered, though, and characterized, there's the atlas, the human proteoform atlas. And then all sorts of other technologies can use that information, including bottom-up. Now I can devise assays knowing the decorations on the proteoforms. So everything becomes proteoform enabled, and then next generation proteomics, just like next generation sequencing, we can now sequence a genome for $1,000. It's absolutely incredible. So that's what we need in this decade for the proteome.
Erin Spain, MS: Here at Feinberg, you're doing some work in medical research with proteomics. You have collaborators in organ transplantation and cardiology, and you're finding ways to integrate proteomics into that research and clinical care. Tell me about some of those projects happening right now at Feinberg and the advances you've made.
Neil Kelleher, PhD: Let me just make a point about the proteoform measurement impacting clinical care. Because that, in proteomics, if you go back to 2002, you know, cancer detection early through blood biomarkers, right? That has proven to be very difficult, and the last 20 years has borne that out. But there's one area where actually proteoform measurement, as the vehicle, has already led to a clinically deployed assay in over 3,000 hospitals worldwide. And it's called the Biotyper assay. And it's the cheapest way to determine the bacterial pathogen in someone's infection. So if someone has a bacterial infection, there's a clear way you look at a bunch of, what's called ribosomal proteoforms, and those actually tell you what bacteria. So that proves the business and science case that is often overlooked in our field. But other complex Mendelian disease or, you know, that's a fancy way, is to say most complex disease involves multiple genes. It's highly complicated biology mediated by proteoforms. And that point of looking at complex disease is what we do every day at Northwestern Proteomics. And we have four disease areas that we work on: neurology, you mentioned cardiology, immunobiology, which is the transplant center, has been a fantastic partner. The CTC at Northwestern is just absolutely spectacular, so many collaborations there. And then neurodegeneration is the fourth area. So we show the value of proteoform measurement in basic and in clinical research. So patient cohorts, we have a fantastic collaboration with Joshua Levitsky, and also Don Lloyd-Jones and John Wilkins in cardiology. And we have an initiative called the NPI, the Northwestern Proteomics Initiative, that's been supported by Eric Neilson, the dean and the upper administration. And that will expand the number of targets that we can look at to discover, and that the proteoforms, like I can give you one in oncology and KRAS. So KRAS is a gene.
It's an oncogene and it's mutated in so many cancers, but like 90 percent of pancreatic cancer. And we've mapped 37 proteoforms of KRAS and we're showing which ones matter in the dynamics of the disease. And then KRAS is also a drug target. The only way to monitor how much the drug is binding to the target is through proteoform measurement. And we've done that in collaboration with the National Cancer Institute, as well as collaborators here at Northwestern. We have at any given time like 120 labs that we collaborate with at Northwestern Proteomics. It's about 60 people.
Erin Spain, MS: You wear a lot of hats here at Northwestern. You mentioned Northwestern Proteomics. You're also the director of the Chemistry of Life Processes Institute. Tell me about the different roles that you're playing right here at Northwestern, how they all kind of work together.
Neil Kelleher, PhD: There's just a natural alignment there because both want to embrace Feinberg over the 10 or so miles that separate us geographically and do more with the investments that we have. Basic scientists on the Evanston campus need to understand unmet clinical need, and I'm among them. I absolutely, when I see a gap, I know what I have to build my tools for, whether they're diagnostic tools or chemical tools, you know, synthesis of new drug molecules. I need to be on point and on target to compete for resources out in the ecosystem funding for science. And that's what the function of the institute is. And though my personal passion is the diagnostic side using proteoforms as biomarkers, I have a whole set of chemists and engineers under the roof of CLP that have the same needs, so the interaction with the school of medicine is so critical.
Erin Spain, MS: So what can we expect over the next five years, 10 years from the Human Proteoform Project?
Neil Kelleher, PhD: Getting support for the project in the US federal government is an obvious thing we're trying to do. Other funders around the world were able to come in on the genome project, like the UK's Wellcome Trust, hooking these things up and getting the project on the rails. And I think the next five years will see the expansion of the proteoform discovery effort, dropping it into an atlas, putting that atlas in a form that people can absorb and start to use immediately. We don't have to wait 10 years, now. But also there's a clear script now that scientists run, which is much more collaborative, much more diverse and embracing inclusion. And that is through the consortium. We have an egalitarian way to make sure that progress is done aggressively, but in a process that listens to the community and all those things can happen in the first five years. You know, you go to the five to 10 year thing, that's when the private sector can really start to contribute. And unlike the Genome Project, you know, I'm trying to have that disruption, you know, trying to get the project going, but then try to embrace all those ideas that would really just help us get to the goal line. And there was, as you know, in the Genome Project, a public and a private effort, and it actually took the collaboration between them. It really did that. There's a whole story and the listeners, some of whom know the history there, they'll know exactly what I'm referring to. I think leveraging both investments is, you know, that balance point can be struck. And I think over the years of the project we'll deal with that and we'll deal with it in a really thoughtful way.
Erin Spain, MS: Thank you, Dr. Kelleher, for coming on the show, explaining this project to us, the Human Proteoform Project. We're excited to hear what happens next.
Neil Kelleher, PhD: Erin, thanks so much for the opportunity to share what's going on at Northwestern Proteomics, at Northwestern Medicine, Northwestern as a whole. And you know, we want to make Chicago a destination for the wild and wacky world of proteomics in the next few years.
Erin Spain, MS: Thanks for listening. And be sure to subscribe to the show on Apple Podcasts or wherever you listen to podcasts and rate and review us. Also for medical professionals, this episode of Breakthroughs is available for CME Credit. Go to our website, feinberg.northwestern.edu and search CME.
Continuing Medical Education Credit
Physicians who listen to this podcast may claim continuing medical education credit after listening to an episode of this program.
Academic/Research, Multiple specialties
At the conclusion of this activity, participants will be able to:
- Identify the research interests and initiatives of Feinberg faculty.
- Discuss new updates in clinical and translational research.
The Northwestern University Feinberg School of Medicine is accredited by the Accreditation Council for Continuing Medical Education (ACCME) to provide continuing medical education for physicians.
Credit Designation Statement
The Northwestern University Feinberg School of Medicine designates this Enduring Material for a maximum of 0.25 AMA PRA Category 1 Credit(s)™. Physicians should claim only the credit commensurate with the extent of their participation in the activity.
Neil Kelleher, PhD, has nothing to disclose. Course director, Robert Rosa, MD, has nothing to disclose. Planning committee member, Erin Spain, has nothing to disclose. Feinberg School of Medicine's CME Leadership and Staff have nothing to disclose: Clara J. Schroedl, MD, Medical Director of CME, Sheryl Corey, Manager of CME, Allison McCollum, Senior Program Coordinator, Katie Daley, Senior Program Coordinator, and Rhea Alexis Banks, Administrative Assistant 2.