Life Sciences Research with Generative AI: Protein Design and Literature Reviews
For decades, drug discovery and protein engineering relied on slow, trial-and-error methods. Scientists would screen thousands of natural proteins, hoping one would fit a target like a key in a lock. But nature doesn’t optimize for human needs; it optimizes for survival. That’s why so many disease targets stayed undruggable. Today, generative AI is changing that. Instead of searching through nature’s leftovers, researchers are now building proteins from scratch, designed for function rather than evolution.
From Evolutionary Guesswork to Function-First Design
Before 2020, predicting how a protein folds was a nightmare. Even with supercomputers, modeling a single structure could take months. Then came AlphaFold2. It didn’t just predict folding; it showed that AI could learn the hidden rules of proteins. But prediction wasn’t enough. The real leap happened when researchers stopped asking, "What does this protein look like?" and started asking, "What do we need this protein to do?"
Now, generative AI doesn’t just analyze existing proteins. It invents new ones. Imagine telling an AI: "I need a protein that binds tightly to this cancer cell, survives in blood for 12 hours, and doesn’t trigger an immune response." The AI doesn’t search through databases. It generates a sequence from scratch, using the grammar of proteins it learned from billions of known examples.
Georgia Tech’s 2025 framework lets researchers input high-level goals: "bind this target," "stay stable at 40°C," or "catalyze this reaction." The system then generates dozens of candidate proteins. None of them exist in nature. Most wouldn’t survive in a lab. But a few? They work. Better than nature’s versions.
How Generative AI Designs Proteins
There are three main ways generative AI builds proteins today. Each has strengths, and each reveals a different piece of the puzzle.
- Protein Large Language Models (pLLMs) treat amino acid sequences like sentences. Just as ChatGPT learns word order from billions of texts, pLLMs learn protein "grammar" from over 500 million known protein sequences. Integra Therapeutics used this approach to analyze 13,000 new PiggyBac transposase variants (proteins used in gene editing). Their AI didn’t just find patterns; it created dozens of new versions that outperformed natural ones in human T cells. This is huge for cancer therapies, where precision matters.
- Diffusion Models work like a sculptor chiseling away noise. Starting from random noise, they gradually refine a structure based on physical constraints. RFdiffusion3, released by the Baker lab in September 2025, doesn’t just design the protein; it designs how the protein binds to small molecules, drugs, or DNA. Earlier tools often produced misfit binding pockets or wrong contact angles; this one designs entire complexes at atomic resolution.
- Unified Frameworks like MIT’s BoltzGen combine both approaches. Rather than only predicting structure or generating sequence, BoltzGen does both in one step, with built-in rules from wet-lab scientists. If a design violates basic chemistry, such as placing two charged atoms too close together, it is rejected before it is even simulated. That’s why BoltzGen is the first tool that can generate protein binders ready for drug pipelines.
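The "sculptor" intuition behind diffusion models can be illustrated with a toy denoising loop. This is a minimal sketch, not any real protein model: the "structure" is just a vector of numbers, and the guidance term (which a real model like RFdiffusion learns with a neural network) is replaced by a known target, so only the mechanics of iterative refinement carry over.

```python
import numpy as np

def toy_denoise(target, steps=50, seed=0):
    """Toy diffusion-style refinement: start from pure noise and
    gradually remove it, guided by a 'score' that points toward a
    constraint-satisfying structure (here, a fixed target vector).

    In a real model, the guidance comes from a learned network,
    not from knowing the answer in advance."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=target.shape)          # start: pure random noise
    for t in range(steps, 0, -1):
        noise_level = t / steps                # anneal noise toward zero
        score = target - x                     # stand-in for a learned score
        x = x + 0.2 * score                    # denoising step
        x = x + noise_level * 0.05 * rng.normal(size=x.shape)  # residual noise
    return x

# Usage: refine noise toward a toy 10-element "backbone".
target = np.linspace(0.0, 1.0, 10)
designed = toy_denoise(target)
print(np.max(np.abs(designed - target)) < 0.1)  # ends close to the constraint
```

The design choice to shrink the injected noise each step mirrors how diffusion samplers anneal from chaos to structure: early iterations explore, late iterations commit.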
The results speak for themselves. In a 2025 study published in Nature Biotechnology, AI-designed transposases showed activity matching lab-optimized natural proteins. One variant worked in human T cells, something no natural version had done efficiently. Meanwhile, the Graz team’s Riff-Diff tool created enzymes for chemical reactions that had never been catalyzed before. In tests, these AI-made enzymes produced more product, faster, than any previously designed enzyme.
Why This Matters for Drug Discovery
The human body has millions of proteins. But drug developers work with only a few thousand, because those are easy to find, isolate, and modify. The rest are too complex, unstable, or rare. That’s why an estimated 90% of disease targets remain untouched.
Generative AI flips that. It doesn’t care if a protein exists in nature. It only cares if it works. That’s why companies like Integra Therapeutics are now focused on "targeted, large-sequence integration", a fancy way of saying: "We’re building custom proteins to fix genetic diseases at the source."
For example, gene therapies need delivery systems-vectors-that carry editing tools into cells. Natural vectors often trigger immune reactions or break down too fast. AI-designed proteins? They can be engineered to slip past the immune system, last longer, and target specific tissues. One AI-generated transposase from Integra Therapeutics now powers a new gene-writing platform that’s entering early trials. No natural protein could do that.
It’s not just about cancer. Enzymes designed by AI are being tested to break down plastic waste. Others are being built to capture carbon. Vaccines are being designed to target rapidly mutating viruses like influenza or SARS-CoV-2 variants. The old way took years. The new way takes weeks.
The Literature Review Revolution
While protein design grabs headlines, generative AI is quietly transforming how scientists read and use research. In 2025, a single researcher could be expected to read 150 new papers a week. That’s impossible. Even with tools like PubMed, finding the right paper among 10,000 results is like finding a needle in a haystack.
Now, AI doesn’t just search. It understands. A researcher can ask: "What proteins bind to the N-terminal domain of p53 in triple-negative breast cancer?" The AI scans thousands of papers, extracts data from figures and tables, and builds a map of known interactions, even ones buried in supplemental data. It flags contradictions, highlights gaps, and suggests untested hypotheses.
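The ranking step behind such literature tools can be sketched with a toy relevance search. Real systems use neural embeddings and full-text extraction; here, a hypothetical `rank_papers` scores abstracts by simple word-overlap cosine similarity, just to show the shape of the pipeline.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_papers(query: str, abstracts: dict[str, str]) -> list[str]:
    """Rank paper abstracts by similarity to a question.
    Neural embeddings would go here; word counts stand in."""
    q = Counter(query.lower().split())
    scored = {pid: cosine(q, Counter(text.lower().split()))
              for pid, text in abstracts.items()}
    return sorted(scored, key=scored.get, reverse=True)

papers = {
    "paper_a": "p53 n-terminal domain binding partners in breast cancer",
    "paper_b": "carbon capture enzymes from marine bacteria",
}
print(rank_papers("proteins that bind the p53 n-terminal domain", papers))
# → ['paper_a', 'paper_b']
```

Swapping the `Counter` vectors for dense embeddings from a language model is what turns this keyword matcher into the semantic search the article describes.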
MIT’s team built a tool that does this for protein design literature. It doesn’t just summarize; it connects. If a 2023 paper describes a protein that stabilizes a certain structure, and a 2024 paper shows a mutation that improves binding, the AI links them. It then proposes: "Try combining these two features." It’s like having a co-author who has read every paper ever written on your topic.
One lab at the University of Toronto used this to cut their literature review time from 8 weeks to 4 days. They didn’t just find what was known-they found what was missing. And that’s where breakthroughs happen.
The Dark Side: Biosecurity and the "Protein Universe"
Not everyone is excited. As generative AI creates more proteins, it expands what scientists call the "protein universe": the total number of possible protein structures. Nature has explored only a tiny fraction. AI is exploring the rest. Fast.
Singularity Hub warned in October 2025: "Dangerous AI-Designed Proteins Could Evade Today’s Biosecurity Software." Why? Because biosecurity tools scan for known toxins or pathogens. They don’t know what a completely new protein looks like. An AI could generate a protein that binds to human cells, disrupts immune signaling, or hijacks metabolism-all without ever resembling a known virus.
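Why similarity-based screening misses novel designs can be shown with a toy k-mer screen. The database, threshold, and sequences below are all made up; the point is only that a sequence sharing nothing with known threats sails through, regardless of what it might actually do.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping k-mers (length-k subsequences) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity_flag(query: str, known_toxins: list[str],
                    threshold: float = 0.5) -> bool:
    """Flag a sequence if it shares enough k-mers with any known toxin.
    This crudely mimics database-matching screens: a functionally
    dangerous but sequence-novel protein can score below the
    threshold and pass unflagged."""
    q = kmers(query)
    for toxin in known_toxins:
        overlap = len(q & kmers(toxin)) / max(len(q), 1)
        if overlap >= threshold:
            return True
    return False

toxins = ["MKTLLVAGSGSG"]    # stand-in "known toxin" database
variant = "MKTLLVAGSGAG"     # near-copy of a known toxin: flagged
novel = "GDWEPRHQNYCF"       # novel design, zero overlap: passes the screen
print(similarity_flag(variant, toxins), similarity_flag(novel, toxins))
# → True False
```

Real screening tools are far more sophisticated than this, but they share the underlying limitation: they compare against what is already known.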
Georgia Tech’s team built "practical guardrails" into their system: if a design looks too toxic or unstable, it gets flagged. MIT’s BoltzGen includes physical constraints learned from wet-lab feedback. But not all platforms do. And open-source tools? They’re free to use. No filters. No oversight.
There’s no global registry for AI-designed proteins. No international rules. No way to track who built what. That’s a ticking time bomb.
Who’s Using This Now?
It’s not just big pharma. Academic labs worldwide are using these tools. Boltz-2, the open-source foundation for BoltzGen, is downloaded by hundreds of labs monthly. Small biotechs are licensing Integra Therapeutics’ platform. Even universities without protein labs are using AI to design enzymes for environmental cleanup.
But the learning curve is steep. pLLMs require understanding natural language processing. Diffusion models need knowledge of structural biology. You can’t just plug in a prompt and expect magic. You need to know what a hydrogen bond is. What a beta-sheet looks like. How binding affinity works.
That’s why Georgia Tech’s framework is modular. It lets biologists use drag-and-drop interfaces for high-level goals, while computational teams handle the deep code. It’s not about replacing scientists. It’s about giving them superpowers.
What’s Next?
The next leap? Tighter integration between AI design and lab testing. Right now, AI generates a protein. You synthesize it. You test it. It fails. You try again. That cycle takes months.
Soon, labs will have AI-guided robotic systems that design, build, and test proteins in hours. MIT and Georgia Tech are already building these. Imagine: you type a target. The AI generates 10 designs. A robot builds them. A machine tests binding. Within a day, you have your best candidate.
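That closed loop can be sketched as a generate-test-select cycle. Every function here is a hypothetical stand-in: `generate_designs` for the AI model, `score_binding` for a robotic assay; no real tool or API is implied.

```python
import random

def generate_designs(n: int, length: int = 20, seed: int = 0) -> list[str]:
    """Stand-in for an AI generator: random amino-acid sequences."""
    rng = random.Random(seed)
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    return ["".join(rng.choice(alphabet) for _ in range(length))
            for _ in range(n)]

def score_binding(seq: str) -> float:
    """Stand-in for a lab assay: a toy score rewarding hydrophobic
    residues (a real assay would measure binding affinity)."""
    hydrophobic = set("AVILMFWY")
    return sum(c in hydrophobic for c in seq) / len(seq)

def design_loop(rounds: int = 3, per_round: int = 10) -> tuple[str, float]:
    """Generate candidates, test them, keep the best across rounds."""
    best_seq, best_score = "", -1.0
    for r in range(rounds):
        for seq in generate_designs(per_round, seed=r):
            s = score_binding(seq)
            if s > best_score:
                best_seq, best_score = seq, s
    return best_seq, best_score

seq, score = design_loop()
print(len(seq), 0.0 <= score <= 1.0)  # → 20 True
```

In the automated labs the article anticipates, each iteration of this loop would be a physical round of synthesis and assay rather than a function call, but the control flow is the same.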
And the applications? They’re exploding. Better vaccines. Enzymes that eat plastic. Carbon-capturing proteins. Gene therapies that work without side effects. Proteins that repair damaged tissue. All built from scratch.
The grand challenge isn’t just making new proteins. It’s making them safe, controllable, and useful. And that’s where the real work begins.
Can generative AI design proteins that don’t exist in nature?
Yes. Generative AI doesn’t copy existing proteins. It learns the rules of protein structure and function from millions of known examples, then generates entirely new sequences that follow those rules. These proteins have never been seen in nature, but they fold correctly and perform specific functions, like binding to cancer cells or catalyzing chemical reactions.
How accurate are AI-designed proteins compared to natural ones?
In multiple 2025 studies, AI-designed proteins matched or exceeded natural ones in function. Integra Therapeutics’ transposases showed activity equal to lab-optimized natural versions. The Graz team’s enzymes worked faster than any previously designed. One AI protein even functioned in human T cells, where natural versions had failed. Accuracy depends on the model and constraints, but the best systems reportedly exceed 80% success in initial lab tests.
Is generative AI replacing scientists in protein research?
No. AI is a tool, not a replacement. Scientists still define the goals, interpret results, and validate designs in the lab. AI handles the overwhelming complexity of sequence space-generating thousands of candidates in seconds. But without human expertise to guide the process, validate outcomes, and ensure safety, the results would be useless or dangerous.
What are the biggest challenges in AI protein design?
The biggest challenge is precise functional control. While AI can generate stable, foldable proteins, getting them to bind a specific target or catalyze a particular reaction is still hard. Most systems require multiple rounds of testing and refinement. Biosecurity is another major concern: AI can create proteins that evade current detection systems. And not all tools are open or well-documented, making replication difficult.
Are AI-designed proteins being used in real treatments yet?
Yes. Integra Therapeutics is using AI-designed transposases in next-generation gene therapies that are entering early clinical trials. These proteins enable more precise "find, cut, and transfer" gene editing with fewer side effects. Other AI-designed enzymes are being tested in labs for environmental cleanup and industrial biomanufacturing. While no AI-designed protein has reached market yet, several are in preclinical development.
Susannah Greenwood
I'm a technical writer and AI content strategist based in Asheville, where I translate complex machine learning research into clear, useful stories for product teams and curious readers. I also consult on responsible AI guidelines and produce a weekly newsletter on practical AI workflows.
About
EHGA is the Education Hub for Generative AI, offering clear guides, tutorials, and curated resources for learners and professionals. Explore ethical frameworks, governance insights, and best practices for responsible AI development and deployment. Stay updated with research summaries, tool reviews, and project-based learning paths. Build practical skills in prompt engineering, model evaluation, and MLOps for generative AI.