Deciphering the tracks left in the genome of SARS-CoV-2 & drafts of DEFUSE
Jan 26, 2024
Without any formal training, I learned how to track animals as a natural consequence of my love of the outdoors, my curiosity about the lives of animals, and my statistical mind. Having grown up catching lizards, I spent much of my childhood, adolescence, and adulthood roaming the woods, observing every curiosity I could find and wondering how each one along the path came to be.
Over the years, the combination of seeing things in nature & reading about what can be found in nature provides one with an expanded intuition about what’s out there, a sixth sense of how nature looks when it’s undisturbed, and what disturbances in the soil, snow, brush or scent in the air mean about the world around you.
I started considering myself an amateur tracker in college when my parents’ dog ran away one snowy morning on the family farm in New Mexico. We had a beloved, jolly, rotund black lab + rottweiler mix named Cici who I let outside to enjoy the snow while I cleaned the house. An hour later, I opened the door, called Cici, and waited long enough to worry that something might have gone wrong.
I looked down at the snow and saw Cici’s tracks as they ran playfully to meet with the neighbor’s dog, Red. Cici and Red’s tracks sniffed around a dense thicket of oaks before crossing the creek and journeying towards the road. When their tracks hit the dry road, they disappeared, forcing me to make an educated guess whether they went left or right. Based on the direction of travel 100m leading up to the road, I gave it an 80% chance that they went right, so I went right while firmly recording this decision in my brain so that, should I find no further tracks after a mile down the road (for dogs love to sniff the bushes & not walk in the dead center of the road), I could turn around and explore the 20%-weighted theory that the dogs went left.
About 500m down the road, I saw Cici’s fat, jolly front paw hit the snow on the opposite side of the road, and then, in an explosion of enthusiasm, both Cici and Red’s tracks went up the hill. I followed them through patches of snow and mud, charting courses across sandstone bluffs where subtle traces of muddy paws faded into rock yet provided a vector across the stone.
Following their noses, Cici and Red pursued the musk of cows heavy in the air, and their tracks entered a muddy, messy superhighway of cow tracks so wet & trodden that the dogs’ tracks became indecipherable in the mush. I paid close attention to the branching network of swampy, stinky, cow-shit trails, as each intersection raised my uncertainty about the bearings and location of the dogs by providing alternative hypotheses for where the dogs may have gone. Every 100m or so I would find what looked like the nails of a dog’s paw piercing the muck, adding subtle but not confirmatory weight to my hypothesis that I was still on the right track.
As I stared desperately at the mud, now miles from home and fearing Cici had gotten into trouble with cows or their ranchers, I saw a single fearless, recent print of Red’s intolerably muddy paw in the snow. Instinct compelled me to look up. I saw a wall of cows standing beside a chicken coop, a phalanx of bovines representing a herd of over 100 cows trampling several hectares of snow into muddy, indecipherable mush.
The cows weren’t too disturbed… the chicken coop was safe and sound… but the dogs were nowhere to be found. Recording Red’s paw print in my mind, I took a gamble that the dogs might be somewhere in this massive valley and that a nearby hillside might provide a vantage point from which I could call their names. I scrambled up the hill over 5 miles from the farm, looked out over the fields of cows and sage and desert grass below, and saw Cici and Red gallivanting onwards towards the mountains yonder. I was crying tears of joy as I shouted their names at the top of my lungs, and you could see the sound travelling through the air until my voice punched their stubborn, idiotic, beloved ears half a mile away. They turned around and sprinted towards me, eager to regale me with the tale of their adventures.
The Hunt for SARS-CoV-2 Origins
Tracking is now something I can’t not do. Every disturbance in nature has a story, every footprint in the trail has a journey, every smear of rubber on the sidewalk has an origin. Tracking has become something of a way of being, a special lens that converts the physical state of the world around me into a kaleidoscopic, multi-dimensional probabilistic story of space, time, and the creatures who came before me. With this way-of-being, I’ve developed a profound, spiritual love of hunting as a way to connect the food I eat with the story I live, as crossing paths with prey forces us to have lived a brief chapter of our lives together.
Elk hunting is without a doubt my favorite way of experiencing life. Bow-hunting in the fall is a delightful way to study the patterns of life of rutting elk, prowl through fall foliage, and get up-close and personal with the animals that feed you for the year ahead. However, there’s also something uniquely sublime about rifle hunting later in the season when the mountains are white, the inhospitable cold consumes the world, and the stories are frozen in snow.
Nothing records the recent memory of the mountains better than the snow. When I wear the lens to conjure the kaleidoscopic stories of the past from the white winter world, I pay special attention to subtle features in the snow. To project from disturbances on a white sheet to 3D creatures in time and space, I must know the animals in the area, the size & shape differences in the hooves of deer vs. elk, the various strides, and the history of the snow itself. If a track is well-defined in crusty snow after a night below freezing, the animal likely came the afternoon before when the snow was melting; if fine grains of powdery snow still glitter at the toes of a track on a windy day, the track is recent, as otherwise those fine grains would have blown or melted away. Sometimes you can even see fresh tracks from 100 yards away simply by the way the snow glitters.
The search for the origins of SARS-CoV-2 feels like a hunt. Once we have examined the geographic, genomic, epidemiological and other evidence and ruled out a natural origin for a virus, we must commit to a particular line of inquiry, like the right turn on the road while searching for Cici. The evidence available in 2022 was enough to shift my strategy from an examination of natural evolution to a hunt for the humans who made it. As with that right turn, I firmly recorded the decision in my mind so that I could turn back should we not find any figurative tracks farther down this road.
To track a research-related origin of SARS-CoV-2, we must pay close attention to subtle details and we must know our quarry by studying the research methods and research programs proposed before COVID. During elk hunting season in October 2022, as Montana’s mountains were coated in white, Valentin Bruttel, Tony VanDongen and I published a paper finding evidence of a particular method of assembly etched in the genome of SARS-CoV-2.
Our finding is subtle, subtle enough to lurk unnoticed in the genome for two years during a pandemic when the entire world was looking at this exact same genome. It took the trained eyes of bioengineers like Valentin & Tony to spot the track in the genome, and a biology-fluent, statistical mind like my own to estimate its significance. In our paper, we estimate that the tracks we found are very unlikely to occur in a natural virus and correspond neatly to prior art for how researchers modified viruses in the lab, the same researchers who proposed to insert a furin cleavage site in a bat SARS-related CoV in Wuhan’s inadequately biosecure labs.
Our hunt for the true origins of SARS-CoV-2 started off by carefully studying the research program we hunted. A research proposal called DEFUSE aimed to insert transmissibility-enhancing chunks of genetic material into the genomes of SARS-related coronaviruses (SARSr-CoVs) in Wuhan. SARS-CoV-2 is unique among its vast clade of relatives called the sarbecoviruses in that SARS-CoV-2 has that selfsame transmissibility-enhancing chunk of genetic material, a “furin cleavage site”, in the precise location where DEFUSE proposed to insert it. The furin cleavage site alone is significant evidence consistent with a laboratory origin, especially in light of DEFUSE.
When I joined the hunt in 2022, we knew that SARS-CoV-2 emerged in Wuhan, far from the hotspots of wild sarbecoviruses and at the doorstep of the biggest laboratory collection of wild sarbecoviruses in the world. The labs in question were well-known to modify bat SARS coronaviruses, and understanding their research is necessary to properly read these tracks. The closest publicly known relative to the virus at the time of emergence existed in the Wuhan Institute of Virology, unpublished since 2013, and revealed as part of a collaboration with Australian scientist Edward C. Holmes. The early outbreak in Wuhan left no traces of a larger animal trade outbreak, no tracks of infections concentrated in animal handlers like the civet handlers of SARS-CoV-1. The furin cleavage site in SARS-CoV-2 is highly unusual for a coronavirus in having particular codons – CGG-CGG – that are the rarest form of codon encoding arginine in sarbecoviruses, but the most common codons researchers would use to make a human-optimized chunk of genetic material.
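To make the codon claim concrete: arginine can be encoded by six synonymous codons (CGT, CGC, CGA, CGG, AGA, AGG), and calling CGG “the rarest” is a statement about how often it is used relative to the other five. The sketch below is a minimal, illustrative Python helper that tallies those codons in an in-frame coding sequence. It assumes you have already extracted the correct reading frame, the function name `arginine_codon_usage` is hypothetical, and it is not the pipeline behind any published estimate.

```python
from collections import Counter

# The six arginine codons of the standard genetic code.
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def arginine_codon_usage(cds: str) -> dict[str, float]:
    """Fraction of arginine codons that are each synonymous codon.

    Assumes `cds` is an in-frame coding sequence (length a multiple of 3).
    RNA input (U) is converted to DNA letters (T) first.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Split into non-overlapping triplets and keep only arginine codons.
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    counts = Counter(c for c in codons if c in ARG_CODONS)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in ARG_CODONS}
    return {c: counts[c] / total for c in ARG_CODONS}


# Hypothetical toy sequence: CGG CGG AGA CGT
usage = arginine_codon_usage("CGGCGGAGACGT")
print(usage["CGG"])  # 0.5
```

Applying this to real sarbecovirus genomes would need their annotated open reading frames, which are not reproduced here.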
Some researchers, including Eddie Holmes, who collaborated so closely with the Wuhan Institute of Virology and with People’s Liberation Army scientists, tried to point us the other way. They tried to tell us there were no tracks, that we should disbelieve our eyes and noses and instead fixate on the wet market, away from the labs. However, we have full confidence in our eyes & our noses, our methods and our minds. Their methods were flawed and their conclusions were unjustified. The wet market was a ruse: even Kristian Andersen knew a lab origin was “so friggin’ likely”, and Holmes’ undisclosed conflict of interest raised separate questions about possible alternative motivations in this matter. The strongest scent was down the lab-leak path; this we knew.
There was other evidence that floated around our minds, but this is the gist of it. When I returned from the Crusades of outbreak forecasting and public health policy debates & turned my attention to the origins of SARS-CoV-2, that was the lay of the land. This is where I picked up the tracks and became acquainted with the other trackers at the scene.
As I mingled with sleuths and forensic scientists on Twitter, I came across Valentin and Tony posting figures of a highly unusual track in the SARS-CoV-2 genome. Valentin & Tony had examined DEFUSE more closely and noted that DEFUSE proposed to modify coronaviruses using a very specific technology called “infectious clone technology” and “reverse genetics systems”.
To build reverse genetic systems, pre-COVID researchers would typically look at the genome of a coronavirus on a computer screen, move around particular molecular “cutting/pasting” sites to be more regularly-spaced, order overlapping blocks of DNA whose ends contain the cutting/pasting sites, cut at these sites, and then paste the overlapping segments of DNA together, slowly building the full-length DNA clone like a Lego tower. With a full-length DNA clone, researchers transcribe the DNA to RNA, insert the RNA in a cell, the cell translates the RNA, and *poof* a virus is born by immaculate conception of modern biotechnology.
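The cut-and-paste step can be pictured with a toy model. Enzymes like BsaI and BsmBI are Type IIS enzymes: they cut outside their recognition sequence and leave 4-nucleotide single-stranded overhangs, and a common assembly approach (often called Golden Gate assembly) relies on each fragment’s overhang pairing only with its intended neighbor, so the pieces snap together in one order like Lego bricks. The Python sketch below assumes unique overhangs and perfectly faithful pairing; the `Fragment` class and `assemble` function are hypothetical illustrations of ordering by overhang, not a lab protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A hypothetical DNA block after Type IIS digestion.

    `left` and `right` are 4-nt overhangs; `body` is the sequence between them.
    """
    name: str
    left: str
    body: str
    right: str


def assemble(fragments: list[Fragment], start_overhang: str) -> str:
    """Chain fragments by matching each right overhang to the next left overhang.

    Every left overhang must be unique so that the order is unambiguous.
    """
    by_left = {f.left: f for f in fragments}
    if len(by_left) != len(fragments):
        raise ValueError("left overhangs must be unique")
    seq, overhang, used = start_overhang, start_overhang, 0
    while overhang in by_left and used < len(fragments):
        frag = by_left[overhang]
        # The shared overhang appears once in the product, so append body + right.
        seq += frag.body + frag.right
        overhang = frag.right
        used += 1
    if used != len(fragments):
        raise ValueError("fragments do not form a single linear chain")
    return seq


# Fragments supplied out of order still assemble in the overhang-defined order.
frags = [Fragment("B", "TTAC", "GGG", "CATG"), Fragment("A", "AATG", "CCC", "TTAC")]
print(assemble(frags, "AATG"))  # AATGCCCTTACGGGCATG
```

Real assemblies also depend on overhang fidelity, vector backbones, and enzyme efficiency, none of which this toy models.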
Prior work. In 2016, the Wuhan Institute of Virology rescued a bat SARS coronavirus using infectious clone technology. They had a genome of the virus on their computers (A), modified BglI cutting/pasting sites (B), and used these modified sites to construct a full-length DNA clone of the virus docked inside a bacterial artificial chromosome or BAC (C). If this virus emerged to cause a pandemic, we would recognize it by the scars they left in the genome.
Valentin and Tony, meanwhile, observed a highly unusual spacing in the BsaI and BsmBI restriction map of SARS-CoV-2, and hypothesized that this could indicate a synthetic origin of SARS-CoV-2. Under this hypothesis, the pandemic virus may not have originated in the time-honored life cycle of RNA viruses in cells making RNA viruses in cells, but from a genome on a computer, converted by scientists to a full-length DNA clone, transcribed to RNA, and shoved inside an electrocuted (“electroporated”) cell forced to summon this Frankenstein virus into existence. SARS-CoV-2 may have been born in a lab.
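A restriction map of this kind can be computed from a genome sequence in a few lines. The sketch below assumes the standard recognition sequences GGTCTC for BsaI and CGTCTC for BsmBI, plus their reverse complements GAGACC and GAGACG for sites on the opposite strand. It treats each site’s start position as the segment boundary, ignoring the few-nucleotide offset where these enzymes actually cut. The function names are hypothetical, and this is an illustration rather than the code behind the published analysis.

```python
import re

# Recognition sequences (top strand) and their reverse complements.
SITES = {
    "BsaI": ("GGTCTC", "GAGACC"),
    "BsmBI": ("CGTCTC", "GAGACG"),
}


def site_positions(genome: str, enzymes=("BsaI", "BsmBI")) -> list[int]:
    """0-based start positions of recognition sites on either strand."""
    genome = genome.upper()
    positions = set()
    for enzyme in enzymes:
        for motif in SITES[enzyme]:
            # A zero-width lookahead reports overlapping matches too.
            for m in re.finditer(f"(?={motif})", genome):
                positions.add(m.start())
    return sorted(positions)


def fragment_lengths(genome_length: int, cut_positions: list[int]) -> list[int]:
    """Lengths of the pieces a linear genome is split into at the given positions."""
    bounds = [0, *sorted(cut_positions), genome_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy genome with one BsaI site and one (reverse-strand) BsmBI site.
toy = "A" * 10 + "GGTCTC" + "A" * 10 + "GAGACG" + "A" * 10
cuts = site_positions(toy)
print(cuts)                              # [10, 26]
print(fragment_lengths(len(toy), cuts))  # [10, 16, 16]
```

On a linear genome, k sites give k + 1 segments, so a six-segment map corresponds to five sites.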
As someone fluent in both biology and statistics, I saw the merit in Valentin & Tony’s observation and felt this track warranted closer study.
Valentin, Tony, and I studied the tracks that reverse genetic systems left in the genomes of infectious clones pre-COVID. We looked at the pre-COVID DNA clones of coronaviruses and found they all had unusually evenly spaced cutting/pasting sites, and all of those sites were dusted with the silent mutations that glowed above the tracks like the fresh, glittering snow. SARS-CoV-2 appears anomalous among wild coronaviruses in every way it is consistent with a reverse genetic system. For these two very common molecular scissors – BsaI and BsmBI – SARS-CoV-2 has a very even spacing of cutting/pasting sites cleaving the virus into 6 segments, and all the restriction sites moved around contain significantly higher concentrations of silent mutations than we would expect by chance. We used standard methods to estimate the odds of these patterns under a natural origin to be very low – from 1 in 350,000 to 1 in 20 million, depending on how we estimate the odds.
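One simple way to ask whether silent mutations are concentrated in restriction sites is a binomial tail probability. Suppose n silent mutations were scattered independently and uniformly, and the sites cover a fraction p of the compared sequence: how likely is it that k or more of those mutations land inside sites? The sketch below implements that deliberately simplified null model. It assumes independence and a uniform mutation rate, which real genomes violate, the name `binomial_tail` is hypothetical, and it is not necessarily the method used to produce the odds quoted above.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Null model (an assumption): each of n silent mutations lands inside a
    restriction site independently with probability p, the fraction of the
    compared sequence that the sites cover.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Sanity check: all 10 of 10 fair coin flips landing heads.
print(binomial_tail(10, 10, 0.5))  # 0.0009765625
```

The real inputs (n, k, and p) would come from an alignment of SARS-CoV-2 against a close relative; none are supplied here.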
The BsaI + BsmBI map of SARS-CoV-2 is anomalous in its even spacing, and more importantly in forming 6 segments where the longest segment is unusually short. The longest segment plays the largest role in determining whether or not researchers are able to assemble the CoV in a plasmid or bacterial artificial chromosome, so researchers following this path will shorten the longest fragments to reduce the odds of toxic elements or unfaithful assemblies.
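How short is “unusually short”? Under one simple null model, n cut sites fall independently and uniformly along a linear genome, and we ask how likely it is that the longest resulting fragment is at most a fraction x of the genome. There is a classical closed form for the spacings of uniform random points, and the sketch below computes it and checks it against a Monte Carlo simulation. The null model is an assumption: real restriction sites are not uniformly placed, and this treats the enzyme pair as fixed in advance. The function names are hypothetical, and this is not presented as the method we used in the paper.

```python
import random
from math import comb


def p_longest_at_most(n_sites: int, x: float) -> float:
    """Exact P(longest fragment <= x * genome length) for n_sites uniform cuts.

    Classical spacings result for n uniform points on [0, 1]:
        P(max spacing <= x) = sum_{j=0}^{n+1} (-1)^j C(n+1, j) max(0, 1 - j*x)^n
    """
    if n_sites < 1:
        raise ValueError("need at least one cut site")
    return sum(
        (-1) ** j * comb(n_sites + 1, j) * max(0.0, 1.0 - j * x) ** n_sites
        for j in range(n_sites + 2)
    )


def simulate(n_sites: int, x: float, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        cuts = sorted(rng.random() for _ in range(n_sites))
        bounds = [0.0, *cuts, 1.0]
        longest = max(b - a for a, b in zip(bounds, bounds[1:]))
        if longest <= x:
            hits += 1
    return hits / trials


# One cut: both pieces are <= 75% of the genome with probability 2*0.75 - 1.
print(p_longest_at_most(1, 0.75))  # 0.5
```

For a real genome, x would be the observed longest fragment divided by the genome length, and n_sites the number of BsaI and BsmBI sites.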
Recently, Freedom of Information Act (FOIA) requests by Emily Kopp at US Right to Know obtained earlier drafts of the DEFUSE grant. Emily Kopp has also been listening to the open-source discussions amongst us trackers, and she knew that DEFUSE may have been the research path that caused the pandemic, so she searched for digital tracks in prior drafts of DEFUSE. While we were looking forward in time from DEFUSE to the infectious clones it could have made, Kopp went backwards along the path in hopes there could be prior tracks giving a clearer picture of the direction DEFUSE collaborators were hoping to go.
In these early drafts of the grant in question, researchers provide more details about their planned reverse genetic methods: they propose to construct their SARS-related coronaviruses using 6 segments, just like the 6 segments we find for SARS-CoV-2, and in their budget they list the enzyme BsmBI, one of the two for which we find this pattern. These FOIAs corroborate our finding – if you had asked zoonotic origin proponents what we would find in more detailed DEFUSE methods, they wouldn’t have known. Under our theory, “6 segments” and “BsmBI” would be there.
There’s still room for uncertainty and there’s still value in obtaining more information. The hunt goes on, although those of us tracking this trail have high confidence that we’re on the right path, especially as more and more findings corroborate our theory. However, we must keep in the back of our minds that there are other plausible interpretations for these statements. Maybe the researchers were referring to another 6-segment SARSr-CoV assembly. The Wuhan Institute of Virology’s last pre-COVID published infectious clone, rWIV1, was intended to be assembled in 6 segments two years prior to DEFUSE, but instability in segment C forced them to split C into C1 and C2, making 7 segments, so the 6-segment assembly does not refer to rWIV1. Additionally, it’s possible the researchers were using BsmBI as a placeholder for cost estimates of these enzymes, as BsmBI is one of the most popular enzymes on the market. Finally, the draft doesn’t list BsaI, the other enzyme in our analysis, so we don’t have a complete picture.
However, sometimes when tracking we can arrive at the right conclusions even without the complete picture. You don’t have to see every track in Cici’s journey to deduce where they went. Evidence doesn’t always come in the form of “smoking guns”: sometimes dogs wander along mucky and indecipherable trails for a while, and criminals have been convicted even when the guns were cold, buried, or undiscovered at the bottom of a lake. We already have a very rich picture of the scene of the crime: the virus emerged in Wuhan without an animal-trade outbreak, carrying genomic features seen in no other wild SARS-related coronavirus, and those features are detailed in a grant written one year before the virus emerged.
We didn’t always have DEFUSE, and that in & of itself is noteworthy. The grant had to be pried from the unwilling hands of EcoHealth Alliance’s president Peter Daszak, who refused to disclose this work even as he appointed himself the US emissary to the WHO’s COVID origins investigation and accepted the appointment to lead The Lancet’s task force investigating COVID-19 origins. Even before DEFUSE, a lab origin was likely based on the geography, the unusual epidemiology lacking reservoirs or a broader animal trade outbreak, and the furin cleavage site showing up at the doors of the Wuhan Institute of Virology. Since 2020, the evidence for a laboratory origin of SARS-CoV-2 has accumulated subtly but continuously, corroborating the hypothesis that DEFUSE-related work generated SARS-CoV-2. The recent FOIAs combine with our paper to strengthen the connection between the DEFUSE grant and the origin of SARS-CoV-2. We may not know who held the pipette, but we are increasingly confident they were aware of DEFUSE in 2019.
Our forensic examination of the SARS-CoV-2 genome has assisted investigative journalists’ efforts to obtain documents and contextualized otherwise obscure scientific jargon in 1,400 pages of FOIA’d emails. Investigative journalists’ & congressional investigators’ dogged pursuit of digital fingerprints, emails, Slack messages, and more has provided critical information of scientific value for those of us studying the tracks of SARS-CoV-2 and its creators.
The hunt for the true origin of SARS-CoV-2 goes on yet the theory of a lab origin appears to be on the right track. Our adventure is not over as those who likely created SARS-CoV-2 have not yet been forced to sit down in a court of law, compelled via discovery to disclose everything they knew, and granted their constitutional right to due process and a trial by jury, nor have the victims of COVID-19 been provided the full answers for why their loved ones were sick, hospitalized, or died.
Titans in a Tempest
Those of us tracking SARS-CoV-2 are aware that we are hunting our prey as tempests gather. Months before our study was published, Russia invaded Ukraine. Since our study was published, China began intimidating Taiwan and increasing its territorial aggression in the South China Sea. Hamas slaughtered Israelis & took many hostages, Israel invaded Gaza, Palestinians fled their homes, Iran-backed Houthis began attacking ships in the Red Sea and Gulf of Aden, and there is a US presidential election this year. I wish this news could come during calmer weather when the temperatures of the world were lower, and none of us searching these tracks desires any further loss of life. Experiencing one of the four horsemen was bad enough; we do not need to release War or Famine in exchange for Pestilence.
While we are acutely aware of the weather, we can’t help but follow these tracks where they lead. Great nations are trapped in Nash Equilibria of deterrence in a multi-polar world, but the world is made of more than great nations: across them, small scientists and journalists are locked in Nash Equilibria of their own. Our work was insulted and ridiculed by many virologists with mainstream media connections and conflicts of interest on the topic of SARS-CoV-2 origins. Even Professor Edward C. Holmes was working with the Wuhan Institute of Virology to collect bat SARS-related CoVs at the time of the pandemic, and he may hold communications that reveal more evidence of the intentions or progress of researchers in Wuhan. Even Peter Hotez was subcontracting work to Zhou Yusen, a Chinese military scientist who collaborated with the Wuhan Institute of Virology and who is mysteriously dead, reportedly having “fallen” from a rooftop. As rain falls in Ukraine, storm clouds gather in the Middle East, and thunder booms in the South China Sea, those of us tracking SARS-CoV-2 origins feel this issue could so easily be forgotten should we pause our hunt and seek shelter from the storm: emails deleted under seven-year Capstone policies, tracks washed away in the rain.
There is a tragic vindication in the new FOIA’d drafts of the DEFUSE grant. Yes, our theory is corroborated in a manner one would only have guessed by seeing past the insults and attempts to discredit us when our paper first emerged. Only those who read DEFUSE and read our paper closely with open minds knew that researchers had likely kept along this path, and thus we are not surprised to find “6 segments” and “BsmBI” in these earlier drafts, because the odds of this pattern occurring by chance in nature were, by our estimates, very low.
As Nicholas Wade emphasizes, this is “The Story of the Decade.” It is a tragic story of unmanaged risks of research, of NIH and NIAID funders bypassing democratic processes to overturn moratoriums on risky research, of failures in cooperative threat reduction as China’s PLA fails to bolster the biosafety of its labs and to be transparent about the nature of work being conducted in Wuhan’s labs; it is a story of tempestuous geopolitical times as a pandemic emerged during a trade war between Trump and Xi Jinping, a story of immense global stakes and clashes between titans where, if you zoom in close, you will find a small story of little people with few resources studiously examining prior work and finding subtle traces of glittering genomic dust in the genome of SARS-CoV-2 revealing the possibility that <zoom out> 20 million people died due to a laboratory accident.
While Cici’s adventure ended beautifully, the story of the hunt for SARS-CoV-2 origins is not a happy story, at least not yet, not in this weather with our quarry still out there. There is no joy in vindication given the stakes and combustible possibilities of inflammatory justice during this geopolitical storm. As we look up from the tracks in the snow, we don’t see happy dogs or muddy cows. We see dark clouds, rockets, bombs, rising temperatures, and a whole world gone mad.
One hopes and prays that one’s science is, in the end, good for the world. That’s all I ever wanted as a scientist. At a time when we may hope and dream that our findings could be celebrated, our insight rewarded, our careers marginally advanced and our reputations bolstered for our independence, rigor, and creativity, my heart is heavy with worry that we uncovered tracks we were never intended to find. The path ahead is more perilous than we could have known from our positions in the public domain, where we heard from the ODNI that the US intelligence community is divided and uncertain on a question for which there is enough evidence to be more certain.
One is left with uncomfortable questions about why these tracks lurked in the public domain for two years, why we’re only learning more through FOIAs from investigative journalists, and what that means for attribution of non-natural biological agents, a cornerstone of deterrence. Why is Peter Daszak, the PI of DEFUSE, not facing trial given the strength of evidence before us? Why would NIAID not disclose the full details about their grant funding DEFUSE collaborators in 2019? Why would China react so aggressively against Australia once Australia called for investigations into the origins of COVID-19? Why would scientists so close to the titan of NIAID react so aggressively to small scientists like us with our finding of modest snow-dust in SARS-CoV-2? Why did Peter Daszak tell his colleagues to delete “China Genbank Sequences”, write a letter to The Lancet claiming lab origin theories are “conspiracy theories”, and position himself as the US emissary to the WHO and the head of The Lancet’s COVID origins investigation, all without disclosing DEFUSE and his until-recently undisclosed plans for 6-segment assembly of a bat SARSr-CoV in Wuhan and orders of BsmBI?
What else are we going to learn from FOIAs and compelled discovery? The greater the coverup, the greater the crime. Accidents happen, and had people acknowledged this accident as such in November 2019, we could have contained the outbreak, or the world could’ve had a full, honest account from researchers, funders, and all other parties involved to cooperate on further laboratory-related threat reduction. Instead, the first impulses to cover up the origins have trapped scientists, and possibly governments, in a Nash Equilibrium of blockading information, but the pressure of evidence builds and the dam will break. The preponderance of evidence points confidently towards a DEFUSE-related lab accident, but it’s not clear why it was covered up, how much more pressure has bottled up as a consequence of the coverup, or how we can regain trust in scientists & their institutions for funding & risk-management absent a global account. We need to sit the researchers and funders down in an international court with plea deals, not for retributive justice but for restorative truth, to share the full truth of what DEFUSE PIs and their funders did in Wuhan in 2019 and how we can prevent such an accident & coverup in the future. Only with such a global account and process for truth & reconciliation do I see hope for unbottling the truth without it blowing up. Accidents happen, and those taking risks must bear the costs of their accidents, but coverups from people in positions of public trust or people receiving public funds for their risky work are unforgivable, for they erode the trust that binds our society together. We need truth & reconciliation for trust & biosafety.
Our hunt continues, but I’m increasingly fearful that our quarry is larger than what we planned to hunt. While the tracks are clear, I’m not confident the meadows ahead contain only a tiny lab and small researchers who made a mistake. Our careful attention to detail may have led us to titans in a tempest. Only with a gentle titan on our side can we tiny researchers resolve this dispute and catch the quarry, hopefully in a way that increases cooperation, and not hostilities, amongst the titans.