"Bridging the gap between immunology and data science allowed me to make new breakthroughs that were not readily possible."
This story is part of an ABE series on data science in biotechnology.
As a popular saying goes, “necessity is the mother of invention.” And that saying appears to hold true for Marvin Gee: When he set out to get his PhD in immunology, he did not expect his work to immerse him in the field of data science.
“Data science wasn't necessarily a focus of my prior work but certainly now has been one of my work's major priorities,” Gee explains. “The development of this focus was born out of a necessity to understand the continuously growing datasets generated from novel technologies.”
Now, as co-founder of the biotech startup 3T Biosciences, Gee straddles the worlds of bioscience and data science. His company uses machine learning models to identify cancer targets, predict how patients might respond, and develop new therapies. “What is especially key in this setup is our ability to test these predictions and continue to feed our computational models to improve on accuracy and specificity,” Gee explains.
Gee’s work earned him a spot on the Forbes 30 Under 30 list in 2018. He is also an alumnus of the Amgen Scholars Program (at Caltech in 2011), a cousin program to the Amgen Biotech Experience that gives undergraduate students hands-on summer research experience.
When Gee first began his PhD research at Stanford University in the immunology program, he wanted to understand how T-cells (specialized cells that are part of the human immune system) respond to different proteins in cancer cells. He developed a novel technology that could query vast synthetic libraries of proteins from yeast to identify specific T-cell responses among patients. This generated a huge amount of data, as described in a 2018 Cell paper about the work.
“By necessity, computation, and particularly the advancements in machine learning, helped with the development of tools to analyze the data and predict targets of T-cell responses with high accuracy and specificity,” Gee says. “By taking the advancements of one field into another with a unique perspective, we were able to make significant advances in cancer target discovery.”
To bring this data science perspective into his work, Gee had to dive into new fields during his immunology PhD. He took multiple courses in computer science and bioinformatics. “I'm a strong believer in being a perpetual scholar,” he says. “What was really rewarding was directly taking those basic skills and applying them directly to my research. It not only opened up new ways of thinking about a problem, but it also gave me an opportunity to start conversations with others who are well versed in data science. Bridging the gap between immunology and data science allowed me to make new breakthroughs that were not readily possible.”
Now in his work at 3T Biosciences, Gee and his team continue to keep up with the latest developments and technologies in data science. And he sees this trend across the biotech industry. “Biotech companies are increasingly embracing novel technologies in the computational space for the application of drug discovery, therapeutic development, and predicting clinical responses,” he explains. “Companies that are using novel technologies in this space have an opportunity to cover new ground or cover ground more quickly than competitors.”
These technologies also present unique challenges. “Data quality is incredibly important in development of accurate models,” he says. That makes adopting novel methods difficult, because they require rigorous validation.
For students considering paths in biotechnology, Gee says it’s important to first have a fundamental understanding of biology and, particularly, of the experiments that are needed to generate data. “The more you understand about an experiment, the more you understand about the underlying assumptions that need to be made to analyze data, generate models, and make reasonable conclusions,” he says. Being able to speak the languages of both biology and data science puts scientists in the best position to interpret data and make key decisions.