Why Pandemic Preparedness Has No Evolutionary Intelligence Architecture

Prabal Chhibbar, Ph.D. Candidate
May 2026
(8 minute read)

Viruses Win by Changing. We Win by Finding What Cannot Change.
Viruses and humans are ancient adversaries. The human immune system has become increasingly capable of mounting tailored responses to pathogens, and in turn viruses weaponize rapid evolution to evade host defences. Successful elimination of a viral infection requires finding the constraints in a virus’s life cycle, the positions it cannot change without falling apart.

My research pinpoints these positions by studying evolutionary history from two sources: sequence comparisons across all known strains of a virus, and experiments that introduce every possible mutation at each protein position to measure which changes the virus can tolerate. To ensure a treatment holds across all evolving forms of a virus, the targeted positions must be genuinely immutable. Both approaches fall short. Sequence conservation reflects historical trends, not fundamental constraints. A stable position under past pressures may be free to change if those pressures shift. I encountered this directly when positions that looked conserved across thousands of sequences turned out to be well tolerated in single-mutation experiments. But the deeper problem emerged when studying how viruses interact with host proteins: I found distant positions buffering each other, creating hidden escape pathways that neither approach can detect.

Our Tools Are Blind to the Combinations That Actually Matter
These hidden pathways are combinatorial by nature, and the math makes them experimentally unreachable. A 500-residue viral protein has roughly 10,000 possible single mutations, all measurable with current technology. The space of two simultaneous mutations is roughly 50 million combinations. Three simultaneous mutations exceed 100 billion. Omicron carried more than 30 co-occurring mutations in its spike protein alone, whose joint effects are invisible to single-mutation experiments, not because the experiments haven’t been done, but because interactions between positions fundamentally cannot be recovered by measuring each position in isolation. No funding level or timeline makes this tractable. This is a mathematical wall, not a resource constraint.
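The arithmetic behind this wall takes a few lines to check. The sketch below assumes a 500-residue protein with 19 alternative amino acids per position and counts the single, double, and triple mutation spaces:

```python
from math import comb

LENGTH = 500  # residues in a typical viral protein
SUBS = 19     # alternative amino acids per position

singles = LENGTH * SUBS                # one position changed
doubles = comb(LENGTH, 2) * SUBS ** 2  # two positions changed at once
triples = comb(LENGTH, 3) * SUBS ** 3  # three positions changed at once

print(f"single mutants: {singles:,}")  # thousands: measurable
print(f"double mutants: {doubles:,}")  # tens of millions: out of reach
print(f"triple mutants: {triples:,}")  # hundreds of billions: hopeless
```

Each added simultaneous mutation multiplies the space by thousands; no plausible increase in experimental throughput closes a gap that grows this fast.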

Existing computational approaches for inferring how viruses will evolve require comparing the same protein across hundreds of related species to detect which positions tend to change together. For an emerging pathogen, this data does not exist. You cannot build that picture for a virus in week three of an outbreak. The very moment evolutionary intelligence is most urgently needed is when every traditional tool goes dark.

All of this points to a single explicit gap: we have no system capable of mapping the combinatorial evolutionary possibilities of an emerging pathogen in real time.

AI Does Not Speed Up the Old Approach: It Makes the Impossible Possible
Five years ago, this inference was not possible. What changed is the arrival of protein language models. AI systems trained on billions of protein sequences from across all of life encode the statistical logic of how proteins can change, across every organism ever sequenced, not just a single viral lineage.

This is not AI making an existing method faster. It is AI making tractable an inference problem that had no solution: recovering the combinatorial rules of viral evolution from sequence alone, without needing thousands of related species, without exhaustive experiments, and without waiting for a virus to show us its hand after the fact. By measuring how a change at one position ripples through a model’s predictions across every other position, we can map which positions are functionally linked — the hidden coordination network that single-mutation experiments cannot see. Structure-prediction models such as AlphaFold 3 add a complementary layer, encoding physical interaction constraints that sequence patterns alone cannot capture. The models exist now. The infrastructure to deploy them as a shared public resource does not.
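The core computation is simple to state: perturb one position, observe how the model's predictions shift at every other position, and read coupling strength off the size of those shifts. The sketch below illustrates the idea with a toy stand-in scoring function rather than a real protein language model (ESM-2 itself is a multi-gigabyte download); every name and number here is illustrative, not the actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

AA = "ACDEFGHIKLMNPQRSTVWY"
L = 8  # toy sequence length

# Stand-in for a protein language model: maps a sequence to one score
# per position. A real pipeline would use ESM-2 per-position logits.
W = rng.normal(size=(L, L))  # hidden pairwise interactions

def position_scores(seq):
    x = np.array([AA.index(a) / len(AA) for a in seq])
    return W @ x  # each position's score depends on every other position

def coupling_matrix(seq):
    """coupling[i, j] = mean shift in the score at j when position i is mutated."""
    base = position_scores(seq)
    C = np.zeros((L, L))
    for i in range(L):
        shifts = []
        for aa in AA:
            if aa == seq[i]:
                continue
            mutant = seq[:i] + aa + seq[i + 1:]
            shifts.append(np.abs(position_scores(mutant) - base))
        C[i] = np.mean(shifts, axis=0)
    np.fill_diagonal(C, 0.0)  # ignore a position's effect on itself
    return C

seq = "MKTAYIAK"
C = coupling_matrix(seq)

# Rank position pairs by symmetrized coupling strength
pairs = sorted(((C[i, j] + C[j, i], i, j)
                for i in range(L) for j in range(i + 1, L)), reverse=True)
print("most strongly coupled pair:", pairs[0][1:])
```

The output of this step, a ranked list of position pairs, is exactly the object that single-mutation experiments cannot produce, because each entry encodes how two positions respond to each other rather than how each behaves alone.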

Here Is How We Would Test It
The minimal experiment requires no organization, no funding, and no new data. Take the original strain of HIV. Run cross-Jacobians on ESM-2, a calculation that completes on a standard research computer in hours, to generate a ranked list of the pairs of positions that are most functionally coupled. Check whether those predicted combinations match the mutations that actually arose during HIV’s adaptation in humans, using existing published datasets. If the model recovers the known evolutionary trajectory above chance, the core hypothesis holds. If it does not, the premise is falsified before anyone has spent a dollar on infrastructure. This is a weeks-long experiment, not a years-long program, and it either works or it does not.
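"Above chance" can be made precise with a standard enrichment test: take the model's top-k ranked pairs, compute what fraction actually arose, and compare against a randomly ranked baseline. The sketch below uses synthetic toy data; the function names and the illustrative numbers are assumptions, not results.

```python
import random

def precision_at_k(ranked_pairs, observed_pairs, k):
    """Fraction of the model's top-k predicted pairs that actually arose."""
    return len(set(ranked_pairs[:k]) & observed_pairs) / k

def chance_baseline(all_pairs, observed_pairs, k, trials=1000, seed=0):
    """Expected precision@k for a model that ranks pairs at random."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = set(rng.sample(all_pairs, k))
        total += len(sample & observed_pairs) / k
    return total / trials

# Toy illustration: 100 candidate pairs, 10 of which were actually
# observed, and a hypothetical model that ranks the observed ones first.
all_pairs = [(i, j) for i in range(20) for j in range(i + 1, 20)][:100]
observed = set(all_pairs[:10])
ranked = all_pairs[:]

p = precision_at_k(ranked, observed, k=10)
b = chance_baseline(all_pairs, observed, k=10)
print(f"model precision@10 = {p:.2f}, chance baseline = {b:.2f}")
```

The falsification criterion is then a single comparison: if the model's precision@k does not clearly exceed the chance baseline, the premise fails.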

From that starting point, the same test runs on SARS-CoV-2 and influenza to check whether the signal generalizes across viral families. Predictions can then be spot-checked against targeted experiments on a small prioritized set of combinations, sidestepping the need to test the entire mutational landscape of any protein. At the infrastructure level, a pilot system across these three viral families tests whether a centralized pipeline generates predictions useful to independent labs that did not build it. Those labs receive ranked mutation combinations, test them, and report back whether the prioritization added value. Failure here falsifies the premise and redirects toward other approaches.

No Existing Institution Can Build or Maintain This
What prevents this system from existing is not a coordination failure. It is a structural gap between what the problem requires and what every existing institution is designed to provide.

Research grants fund time-limited projects with defined endpoints. A real-time evolutionary inference system has no natural endpoint: it needs to be running before an outbreak, updated continuously during one, and maintained between outbreaks. The moment funding lapses, the maps go stale. Industry has no incentive: there is no proprietary position to hold and no commercial return that justifies ongoing costs during quiet inter-pandemic periods. Academia cannot self-coordinate at outbreak speed. Individual labs publish and move on, and even well-intentioned collaborations collapse under data-sharing delays and misaligned incentives. The history of pandemic science is littered with coordination efforts that worked in retrospect and failed in real time.

The Protein Data Bank solved the same structural problem: a resource every research group benefits from but no single group could justify maintaining, which required a purpose-built organization independent of grant cycles. Real-time viral evolutionary inference needs the same solution.

What Needs to Be Built?
If the pilot validates the signal, it immediately exposes a problem the pilot itself cannot solve: who runs this continuously? A cross-Jacobian calculation on one protein in one lab is a proof of concept. A system that ingests global surveillance data nightly, updates constraint maps across hundreds of viral proteins, and pushes ranked predictions to partner labs before the next variant emerges is an operational commitment, and that commitment has no home in any existing institution. The pilot does not grow into the infrastructure on its own. It reveals why the infrastructure requires a dedicated organization to exist at all.

A dedicated organization with one mandate: maintain evolutionary constraint maps for high-priority viral families as a public resource, update them continuously, and connect predictions with experimental labs that validate and sharpen them. Viral sequences from global surveillance networks feed into a nightly inference pipeline; the outputs, ranked mutation sets and flagged high-risk combinations, are pushed to partner labs via open access. The pilot covers three viral families. The mandate covers the entire infectious disease landscape.
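The nightly loop itself is structurally simple: ingest, infer, rank, publish. The skeleton below sketches that loop under heavy assumptions; the family names, placeholder sequences, and the trivial variability-based "inference" step are all stand-ins for the real surveillance feeds and cross-Jacobian pipeline.

```python
# Illustrative skeleton of the nightly update loop. All data and the
# scoring step are placeholders, not the actual inference method.
SURVEILLANCE = {
    "coronaviruses": ["MKT", "MKA"],  # placeholder sequences per family
    "influenza": ["GGS", "GAS"],
    "retroviruses": ["PIV", "PLV"],
}

def update_constraint_map(seqs):
    # Stand-in for cross-Jacobian inference: score each position by how
    # many distinct residues the incoming sequences show there.
    length = len(seqs[0])
    return {i: len({s[i] for s in seqs}) for i in range(length)}

def nightly_run(run_date):
    reports = {}
    for family, seqs in SURVEILLANCE.items():
        cmap = update_constraint_map(seqs)
        # Least-variable (most constrained) positions ranked first
        ranked = sorted(cmap, key=cmap.get)
        reports[family] = {"date": run_date, "priority_positions": ranked}
    return reports

reports = nightly_run("2026-05-01")
print(reports["coronaviruses"]["priority_positions"])
```

What turns this sketch into infrastructure is not the code but the commitment around it: the feeds must stay connected, the maps must stay current, and the outputs must reach partner labs every night, indefinitely.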

In week one of a novel outbreak, the model draws on built-in protein knowledge to generate a first-pass constraint map with no prior information about the new virus. By week three, incoming surveillance sequences refine the estimates and flag positions where the virus is already diverging from predicted constraints. By month two, partner labs are running targeted experiments on the highest-priority combinations and feeding results back to sharpen the model. This is not a response to the outbreak. It is a prediction loop running alongside it.

Every Day This Does Not Exist, a Virus Can Evolve Faster Than We Can Think
It is worth noting that the combinatorial inference problem this infrastructure solves is not unique to pandemic surveillance. Cancer evolution, antibiotic resistance in bacteria, and protein engineering for therapeutics all face the same mathematical wall: sequence spaces too vast to measure exhaustively, with interactions between positions that single-measurement approaches cannot recover. The infrastructure proposed here is a general solution for any biological system where combinatorial sequence space exceeds experimental reach. Pandemic preparedness is the most urgent application, but the same pipeline applied to cancer neoantigen evolution or emerging antibiotic resistance patterns would generate equally actionable constraint maps. The priority is infectious disease. The scope is broader.

Pandemics persist not because viral evolution is unpredictable, but because our scientific infrastructure is not designed to predict it. The combinatorial space is too vast to measure. The evolutionary history of emerging pathogens is too sparse for traditional tools. And our measurement systems are oriented entirely toward the past. The tools to infer the rules of viral evolution now exist. What does not exist is the infrastructure to make that inference continuous, shared, and actionable. Every day that gap remains open is a day a virus can evolve faster than we can think.
