AI After Drug Development

Abhishaike Mahajan

Coming up with drug candidates isn’t the bottleneck in finding new treatments. Can AI help with the things that are?

Clara Collier: I’m very excited to be talking to Abhishaike Mahajan. He works for Noetik, a startup that uses AI to develop cancer therapies, and he’s also one of my favorite bloggers. I’m interested in the intersection of biology and machine learning, and I’m also very, very confused by it. Abhi’s writing has been extraordinarily helpful for me in untangling this whole topic. So, to start: Where do you think the real applications of the technology are? What excites you — and why?

Abhishaike Mahajan: Maybe I can start with my own background. I spent two and a half years at Anthem, a U.S. health insurance company, on a research team applying machine learning to electronic health records to predict who is likely to develop chronic health conditions two years out — essentially, risk stratification models. I also spent some time there doing causal inference work, trying to answer questions like: If you’re on one antihyperglycemic medication and you’re, say, a Hispanic male in his 50s, what should the second-line therapy be? Right now, it’s very much based on “vibes.” All of this was done with a data set of 50 million U.S. clinical claims that the company had access to.

In my second job, I spent two and a half years at Dyno Therapeutics, developing better viruses for genetic therapy delivery. This involved a lot of mucking around with AlphaFold and other protein structure prediction and design models. 

And now, my current role revolves around learning better representations of tumor microenvironments, in the hopes that we can predict which patient populations are most likely to respond to a specific drug. 

My career trajectory encapsulates many of the individual fields that have popped up in biological machine learning. If you look at the life cycle of machine learning in applied biology, it can be preclinical, clinical stage, or postclinical. My work at the health insurance company was postclinical, dealing only with approved drugs. My current work at Noetik is clinical stage, where most of our interest is in investigational cancer drugs in Phases 1 through 3, although most of the current machine learning in biology that you’ll see is focused on the preclinical stage. The preclinical stage is a very machine learning-shaped problem, since imagination is a very high priority. There is no ground truth. You need to come up with a molecule, a protein, whatever.

My work at Dyno Therapeutics was closest to preclinical. There are viruses that exist in nature, coevolved alongside humans, that are typically used to deliver genetic therapies. The company was founded on a paper that applied saturation mutagenesis to one such virus — exploring the mutational landscape, like single amino acid substitutions — to try to figure out whether we can improve the virus’s ability to, say, infect the brain, where we may want to deliver its genetic payload. The answer is yes.

Now the next question is: how much can this be scaled? Dyno was started on that premise. From there on, you are into unobserved territory. You are in evolutionarily divergent territory. You can’t really depend on there being other viruses that closely relate to your existing ones, so you place a very high premium on your models being able to generate new things entirely. I worked with a lot of protein structure generation models — if you have a particular target in mind, can you hallucinate something that will bind to that target and then graft that onto the virus? Or can you use the model to modify larger sections of the virus?

Clara: Is this like BindCraft? 

Abhi: This is BindCraft, yes. But BindCraft exists on a continuum of many different binder generation models. It’s arguably the best one — perhaps the point where the field was kind of solved, though others have reasonable disagreements with that, especially given how good the most recent models are. That was mostly preclinical work, since we were primarily testing in monkeys or mice. 

Clara: There was an essay that went semi-viral recently by Claus Wilke: “We Still Can’t Predict Much of Anything in Biology.” He specifically talks about BindCraft there, arguing that we’ve had many generations of predictive computational tools claiming that we can do these things with proteins. BindCraft, in this tradition, is very good at what it does, but it has many gaps. You still need extensive testing and a lot of research taste to make use of it.

Abhi: I can see the argument that it’s perhaps not good at designing binders for intrinsically disordered proteins or proteins that have never been experimentally characterized before. Still, for the set of targets that are experimentally well characterized, where you just want to generate interesting binders, I’m inclined to think that it’s transformative.

Prior to BindCraft, the state-of-the-art method was generating hundreds of thousands of random peptides, folding them with your target of interest using AlphaFold, and then picking the ones with the best confidence metric. To some degree, before BindCraft, your only alternative was pure computational brute force. In that sense, all BindCraft is doing is compressing the workload; it’s not unlocking brand new capabilities. But I would argue that compressing workload, especially at that scale, is pretty transformative. 
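To make the contrast concrete, here is a minimal sketch of that brute-force loop: the shape of the workflow, not anyone’s production pipeline. The `fold_complex` stub is a hypothetical stand-in for a structure-prediction call (e.g., AlphaFold-Multimer) returning an interface confidence score such as ipTM; that call is the expensive step whose sheer volume BindCraft-style design avoids.

```python
# Illustrative sketch of pre-BindCraft brute-force binder screening:
# sample random peptides, co-fold each with the target, keep the most
# confident predictions for wet-lab testing.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_peptide(length: int = 15) -> str:
    """Sample a random peptide sequence."""
    return "".join(random.choice(AMINO_ACIDS) for _ in range(length))

def fold_complex(binder_seq: str, target_seq: str) -> float:
    """Hypothetical stand-in for a structure predictor that returns an
    interface confidence score (e.g., ipTM; higher is better). A real
    pipeline spends GPU-minutes per candidate on this call."""
    return random.random()  # placeholder so the sketch runs end to end

def brute_force_screen(target_seq: str, n_candidates: int = 100_000, top_k: int = 100):
    """Score n_candidates random peptides against the target; keep the top_k."""
    candidates = [random_peptide() for _ in range(n_candidates)]
    scored = sorted(((fold_complex(seq, target_seq), seq) for seq in candidates),
                    reverse=True)
    return scored[:top_k]
```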

Now, if we’re to claim that BindCraft results are, at the end of the day, no better than simply doing phage display, then, yes, I can see that. But some recent work from earlier this year suggests that these models can also design binders for receptors that are traditionally very hard to design against — like GPCRs, which phage display struggles with. None of these approaches is a silver bullet, but it does feel like we’re getting close!

Clara: I feel like there’s an archetype in this discourse: the crotchety computational biologist who has seen it all before and is very skeptical of the potential of these machine learning tools to make a difference in developing new, viable treatments, either because the technology itself isn’t good enough or because there are too many other bottlenecks. I know and respect a lot of these people! But I also know and respect many others who are incredibly excited. I’m trying to clarify what heuristics I should be paying attention to when evaluating how people talk about these technologies.

Abhi: Obviously, take my opinion with a grain of salt. The big reason I wanted to work at a company that is closer to clinical stage machine learning is that, ultimately, most of the problems with drug development relate to the clinical stage. It is very rare that we don’t have enough interesting ideas at the very beginning. Rather, the problem is that testing each one of those ideas is enormously expensive. If your goal is to make drug discovery better, you can do one of two things. You can either improve the shot itself — by creating new and unique binders, as BindCraft does — or you can improve your ability to shoot. I see clinical stage companies as very focused on the latter, which I think is of much higher importance.

So why isn't everyone trying to improve how well they can shoot? The answer is that no one really knows how to do it well; it’s a much harder problem. For example, Noetik, the company I work at now, has taken a bet on patient stratification being something that's deeply underappreciated in oncology. We have models that understand tumor microenvironments better than humans do, so we’re using those models to improve patient stratification. The current cancer drug failure rate is 95%; if we can drop that down to 70%, we’ve made it far more economical for pharmaceutical companies to throw far more shots. That’s success to me. 
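The arithmetic behind that claim is worth spelling out: the expected number of programs you must fund per approved drug scales with the inverse of the success rate, so the failure-rate drop described above is roughly a sixfold cost reduction per approval. The snippet below just computes that; the two rates come from the conversation, and nothing else is assumed.

```python
# Expected programs funded per approval = 1 / success_rate.
for failure_rate in (0.95, 0.70):
    success_rate = 1 - failure_rate
    print(f"failure rate {failure_rate:.0%}: "
          f"~{1 / success_rate:.1f} programs per approval")

# failure rate 95%: ~20.0 programs per approval
# failure rate 70%: ~3.3 programs per approval
# i.e., cutting failure from 95% to 70% cuts expected cost per approval ~6x.
```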

Clara: What are some concrete examples of companies doing clinical stage work, in addition to Noetik? What does that look like in practice?

Abhi: The most salient example I can think of is a startup called Unlearn.AI. They sell a purely computational platform that conducts synthetic control trials; essentially, they’re saying: “We’ve run many patients with Alzheimer’s through controlled trials before. Do we really need to run one again? Or can we use the hundreds of trials that have been run previously to create a synthetic version of the controlled trial population you would otherwise be creating?” They are selling that as a service. That’s an example of a clinical stage company.

Clara: How much data would you need for something like that to be feasible? 

Abhi: This is why most people don’t do it — you need a lot of data points. Most worryingly of all, you need human data points, which are the hardest data points to gather. I think the reason that there aren’t that many clinical stage machine learning biology companies is that it’s such a capital-intensive effort, and the people who have the legible pedigree necessary to raise for such a thesis are usually more curious about the preclinical stage stuff, because it’s more intellectually satisfying.

Clara: So, for companies that want to do clinical stage work, do they always need to collect all of the data themselves? If the whole draw is that an enormous number of Alzheimer’s trials have already happened, how easy is it to actually collect all that data from those trials and synthesize it? Do those trials even have all of the data points that you would need for a synthetic trial?

Abhi: Moving away from Unlearn.AI — because I don’t know their data collection strategy — the historical issue with collecting data from prior clinical trials is that it’s really hard. Maybe the people behind the trial did collect some raw biological material from your sample of interest — for cancer that’s a tumor resection, for inflammatory bowel disease that’s an intestinal lesion — but they usually care about one or two specific biomarkers, and that’s all they extracted. The sample itself may have since been discarded. Or, maybe, they stored the biological material, but it’s logistically complicated to access; there isn’t exactly a marketplace for this.

There is a company I mention in my recent post, Artera AI, which got FDA approval for a black box biomarker that can stratify patients based on whether they will or will not respond to an anti-androgen therapy for prostate cancer. They developed that model based entirely on retrospective specimens from previous Phase 3 prostate cancer trials run by other companies. I know very little about the logistical nightmare of getting those samples, but I imagine it was intense. The moat in creating models like this is being able to transact with many different companies at once, none of whom really want to share their data with you.

Clara: What’s the class of problems that you think these models are good for, that the current system is doing a bad job of addressing?

Abhi: I would vaguely gesture at human simulation — can we predict which patients will respond to a particular drug? It’s obviously a very narrow slice of human simulation. But for all problems where you need to drug a human and observe whatever happens to them, there is some model that claims to model exactly that behavior such that you don’t need the human at all. For example, Alzheimer’s clinical trials can take decades to reach their primary endpoint. But maybe there is some biomarker, which is not human legible, that could be gleaned from the patient data, say, a few months in. So there are companies popping up that claim that they can create prognostic biomarkers during the course of a clinical trial, which they will then sell to other companies.

Clara: So it's a way of making the process of figuring out which drugs you should test in humans more efficient. 

Abhi: Yes, I would argue that this is the claim of most clinical stage machine learning companies. 

Clara: This is central to what people who are worried about drug R&D efficiency are concerned about: that actual clinical trials are too slow and too expensive.

Abhi: They feel closely aligned. I know Jack [Scannell] is a historic skeptic of a lot of machine learning biology companies, and I would agree that what I've said so far is the story that has been sold. It feels very logical to me that, at some point, it will bear fruit. It is not obviously true that any of it has borne fruit yet. 

But you can look at the mountain of papers that are published after clinical trials. A cancer trial goes on, ends up failing, and then there's a paper that comes out afterward that tries to figure out why. Usually, some small fraction of patients have responded to the drug, and the paper is trying to understand this. It’s always a complicated answer — maybe a particular cytokine group was super upregulated in the responder group and all of them are white males in their 30s. Useful insights rarely come out of this work, but you can see the inklings of a computational system that could predict all of this in advance. None of it is truly outside the bounds of knowledge. It is compressible; we just don't know how to compress it. 
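As a toy version of the kind of post-hoc analysis described above, here is a simulated responder-versus-nonresponder comparison on a single marker. All numbers are invented for illustration; real analyses sweep thousands of markers plus demographics and need multiple-testing correction. The point is that the signal lives in tabular data a model could, in principle, learn to find prospectively.

```python
# Simulated post-hoc responder analysis: was some cytokine elevated in the
# small group of patients who responded? (Invented data, illustration only.)
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
responders = rng.lognormal(mean=1.5, sigma=0.4, size=30)     # cytokine levels
nonresponders = rng.lognormal(mean=1.0, sigma=0.4, size=170)

stat, p = mannwhitneyu(responders, nonresponders, alternative="greater")
print(f"responders elevated? p = {p:.1e}")
# A hit like this, found after a failed trial, is exactly the pattern a
# stratification model would need to find before the trial.
```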

Going back to the question of how you know whether you have enough data: I don’t think anyone knows, for any machine learning problem, how much data is enough. You just need to test it out and see what happens. That’s an unsatisfying answer, but the alternative is having no computational way to simulate a human at all, which is where we are today. I would argue that’s worse.

Clara: What are the data sets that you most wish you had? 

Abhi: The gold standard in my head is whole proteome spatial proteomics. That is a type of data set that I don't think is possible to create today — if it is, it would cost a massive amount of money. This is very specific to cancer, but it would be the ability to take a tumor biopsy and visualize every single human protein that exists within the spatial context of the system. It's almost like an X by Y by 20,000 image of what is going on inside a tumor. Maybe even more than 20,000 if we’re considering isoforms. If I were to add more, I would like multiple biopsies from the same tumor, because within a tumor there exist multitudes. 
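Taking the “X by Y by 20,000 image” literally gives a sense of the scale. The pixel grid below is an invented assumption, but even at modest resolution a single section runs to hundreds of gigabytes, which is part of why this data set does not exist yet.

```python
# One hypothetical biopsy section as a dense tensor: one channel per protein.
height, width = 2_048, 2_048   # assumed pixel grid for one tissue section
n_proteins = 20_000            # roughly the size of the human proteome
bytes_per_value = 2            # float16 abundance estimate per pixel

size_gb = height * width * n_proteins * bytes_per_value / 1e9
print(f"one section: ~{size_gb:.0f} GB")  # ~168 GB, before multiple biopsies
```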

Clara: Why is this impossible to get now? 

Abhi: My impression is that it is just incredibly expensive to generate the data; even if you could generate it, you could not do so at sufficient scale to actually make sense of it. To be clear, when I say spatial proteomics data would be useful for me, I am not saying it would be useful to me as a human with a human brain. It would be very useful for a model to ingest.

Clara: You’ve written a lot about why you think that cancer is an especially exciting field for machine learning applications: because there are so many different genes and interactions involved, it does not lend itself to a human manually figuring out a target.

Abhi: I'm not sure I would go so far as to say cancer as a field is uniquely suited to machine learning. I would say that I work in cancer because it is a field where the emotional impact is quite high. Cancer is a terrifying disease. The truth is that if you solve cancer, in absolute numbers you don't actually improve human lifespan by all that much — something else gets you in the end. But dying from cancer is such a uniquely horrifying experience that it feels like a worthy life goal to spend at least a few years working on it. 

As a field, cancer is nice because it has had so much money infused into it. And rightly so, because it is a disease that affects a lot of people. This has led to the realization that the disease is so complicated that you need very smart academic oncologists to look over individual patients and recommend treatments, creating a prebuilt culture of personalized medicine. Outside of oncology, and perhaps autoimmune disease, I don’t think any other disease is given the same level of respect with regards to how extremely complicated it can be to treat.

There is also a lot of unmet need in cancer, and people are willing to accept big swings. In Type 2 diabetes, there’s not a ton of unmet need — GLP-1s solve a significant fraction, as do insulin and metformin. But for pancreatic cancer, where everyone who gets the diagnosis has a high chance of dying in the next six months, there’s a strong argument to make that you can try crazier things. It feels like a natural fit for machine learning to play a role.

Clara: There’s more cultural willingness to experiment there.

Abhi: Yes, that’s a better way of encapsulating it.

Clara: Do you think the structure of the problem biologically lends itself better to computational methods, or do you think it's just a factor of "people really freaking want to cure cancer"? 

Abhi: I think the second point is definitely the highest order bit. When I wrote the article you mentioned, many people made very valid points that cancer is certainly not alone in being an extremely complicated disease with a lot of patient heterogeneity — there’s inflammatory bowel disease, endometriosis.  

The human body is complicated. Every disease is complicated. The flu is complicated. HIV is complicated. It would be hard to plant a stake in the ground and say this is more complicated than that. The only people who do that are people who want funding for their disease area. This is hand-wavy, but a better way of conceptualizing whether a disease is hard or easy to cure is: how many “knobs” does biology give you access to for controlling that condition?

It turns out that for bacterial infections, you get this very nice set of knobs: foreign invaders have a completely different set of biology that you can uniquely interact with. The way at least a few antibiotics work is by jamming up cellular machinery that only exists in bacteria. On the other end of the spectrum, cancer has no unique cellular machinery and is largely identical to you. There are aspects of cancer that are unique, such as its metabolism, but cancer also grows very fast, so it can quickly evolve beyond whatever unique aspect it has. There is no knob that it gives you access to that it is not willing to evolve away from.

HIV gives you a very nice knob in the sense that it's easy to jam up its ability to infect cells, but it doesn't give you access to the knob that allows it to sequester itself into cells for decades. That's why you can have chronic HIV — it doesn't do anything to you, but it just lives in your body because there's no lever you can pull to get it removed from every one of your cells. 

The thesis of another article I'm writing is: If you look at all the crazy drugs that have come out — PrEP, CRISPR, CAR-T — they are all fundamentally reliant on taking advantage of existing biology that was already there. CRISPR is based on a bacterial immune system. CAR-T is based on reconfiguring existing T-cell receptors. The vast majority of medicines rely on conserved evolutionary biology. I would argue that a disease is harder or easier to cure depending on how many of those knobs are made available to you. 

Clara: Where do you see this field in five years? Which things that people are doing now are going to yield real results? 

Abhi: I think those who are focusing on clinical stage stuff will be deemed particularly prescient. Right now, when I look out at the field, a lot of the companies starting up are clinical stage machine learning biology companies. People are realizing we don't need better preclinical research; we need to shift our aims to the clinical stage. 

This is low conviction, but I think the focus on human data will grow larger and larger. People will be more willing to let go of the perfect ability to perturb everything in their system in exchange for working with human data. 

There’s also something about temporality I’m curious about. I've written about Recursion Pharmaceuticals before. Since their founding, they have used this particular assay called "cell painting" to visualize things — applying dyes that bind to DNA (blue light) or cell membranes (red light). I wrote about a change they made recently where they dropped that entire assay. Now they are taking raw, “bright-field” images — literally shining a bright light on the cell and looking at it through a microscope. 

They’re doing this because they noticed it made no difference to their models. Models are getting really good; they don't need human legible color labels. More interestingly, the application of those dyes usually required killing the cell first (“fixing it”). If you're just imaging the cells with light, you gain access to a temporal dimension where you can see how the cells react from hour to hour, from day to day, from week to week. There's a multibillion-dollar company generating petabytes of this sort of data month after month. I don't know what they'll do with it, but it feels to me like there is unique biology there that has never been explored. 

Clara: If you’re excited about clinical stage machine learning and the bottleneck is data, what could people be working on to resolve that bottleneck?

Abhi: Starting a company that generates that same sort of data, but more cheaply and better than everyone else. I think there is a lot of value in tool or service companies that take existing assays and make them much easier to run at scale.

Have you heard of a company called Plasmidsaurus? I’ve written about them too. They are very famous in the biology world. All they do is sequence plasmids — genetic material that you can shove into new cells. Traditionally, creating plasmids is an error-prone process, so labs end up working with error-ridden plasmids, and the outcome of that research is pure garbage. This company popped up four or five years ago, and their claim is this: Send us your plasmids. We will sequence them for $15, make sure they’re absolutely correct, and send you back the information.

It's an astronomical success story. I think they started with $100k in seed funding, and now it's a $40 million annual recurring revenue business. They took an existing workflow and rapidly automated it. 

Clara: Wow. Why aren't there more companies like this? 

Abhi: It’s an interesting story. Plasmidsaurus primarily sequences via Nanopore sequencers, which are super cheap to run. Nanopore sequencing came out around 2010-2015. Oxford Nanopore, the company behind it, was filled with great scientific minds, but was terrible at teaching people how to use their machines. So, for a brief time, there was an arbitrage period where people who really understood how to use this machine could charge to run it for other people who had no idea how to work with it. The founder of Plasmidsaurus, Mark, was doing a postdoc at Caltech where he was running Nanopore sequencing all day. He knew better than almost anyone else on earth how to work with these sequencers and even worked with the company to establish good protocols. He just used his knowledge, performed that workflow for other people, and scaled it up.

Clara: Are there other areas like this, where a Mark could step in and make the whole process more efficient? 

Abhi: I think in an ideal world, more people would start up something in high-throughput proteomics. There’s Olink, there’s Nautilus Biotech. It’s just that you need a very specific skill set to build a company like that, and not many of those people are interested in starting a company. This is why I’m very happy that Focused Research Organization enthusiasts, like Convergent Research, have made a strong push into funding measurement companies. It’s a hard thing to make a business out of, but if they do their job well, it pays civilizational dividends.

Clara: Didn’t FROs get started because Adam Marblestone was really interested in connectomics? He wanted that data set.  

Abhi: And now there’s E11 Bio. I think Adam has incredible taste in the types of companies that are best suited to funding via the FRO mechanism. 

Clara: Why is it so difficult to build a data measurement company, then, if there's obviously a market? 

Abhi: There are distinct steps in the biology data life cycle. At the very beginning, there is raw academic research. You see that, for example, some particular biological interaction occurs when two peptides interlock together and one lights up. That's the true measurement signal. The second step is turning that into a company where you go out and say, “I can measure XYZ for you at scale.” That's where people get stuck, because for a lot of interesting data modalities, there's no clear market that exists in advance. 

For example, Nautilus Biotech has a bizarrely good biology podcast. In some episodes they ask, "What is the utility of our high-throughput proteomics platform?" And the answer they give is pretty honest: "We’re still figuring it out, but here are some ideas." There isn’t a clear use case for it today, but they argue that’s because this type of data has not existed before. If you give a scientist a huge amount of proteomics data, all the way to the isoform level, they won’t know what to make of it for at least the first year. It takes time for the field to catch up culturally.

Clara: So it’s a chicken and egg thing. The data could theoretically be useful, but it's hard to get the data because nobody supplies it, and nobody supplies it because there are no buyers, because people don't know what to do with it yet. 

Abhi: Exactly. I talked to someone at E11 Bio about this, with regards to connectomics. I asked, "Why do connectomics? What do you get from this?" They said, "We don't know. No one has measured this before. We know it was kind of useful for a fruit fly brain, but we don't know if it's going to be useful for anything larger. But no one else is going to do it." 

Clara: What is your opinion on Dario Amodei-style claims — that LLM-based architectures will be doing scientific ideation and this will have major impacts on the world? 

Abhi: I think it’s directionally true in verifiable domains where these models have already demonstrated superhuman reasoning. Any strong AI scientist claim almost implicitly says that humans are bad at building connections across the existing set of data. I think that is true. I don’t think it’s so true that we would naturally expect brand new fields to pop up overnight from an AI. I struggle to look at what Dario says and say, "That’s clearly incorrect," but I do think humans are actually really good at extracting useful insights from scientific data. We have hundreds of thousands of postdocs and PhD students poring over the literature. They come up with a lot of really crazy things.

To steelman this: I recently interviewed Ellen Zhong. Her research is about how to interpret cryo-EM particle images better. She developed a deep learning method to do this. Afterward, she went back to existing cryo-EM data that had been measured in the literature using classical methods, applied her method, and saw that she was able to extract proteins that they did not even think existed in the dataset. That is a case where an "in silico Ellen Zhong" would be able to measure things humans did not see. But cryo-EM in some sense is a verifiable field; there is a genuine ground truth. I think in biology generally, there are not a lot of verifiable ground truths. 

Abhishaike Mahajan has spent his career trying to use computation to improve human health. He blogs at owlposting.com and tweets at x.com/owl_posting.

Have something to say? Email us at letters@asteriskmag.com.