Thirteen genetic sequences – isolated from people with COVID-19 infections in the early days of the pandemic in China – were mysteriously deleted from an online database last year but have now been restored.
Jesse Bloom, a computational biologist and viral evolution specialist at the Fred Hutchinson Cancer Research Center in Seattle, found that the sequences were removed from an online database at the request of scientists in Wuhan, China. But with a little internet sleuthing, he was able to recover copies of the data stored in Google Cloud.
The sequences don’t fundamentally change scientists’ understanding of the origins of COVID-19 – including the difficult question of whether the coronavirus spread naturally from animals to humans or escaped in a laboratory accident. However, their deletion has heightened concerns that the Chinese government’s secrecy has hampered international efforts to understand how COVID-19 came about.
Bloom’s results were published in a preprint paper that has not yet been peer-reviewed and was released on Tuesday. “I think it certainly coincides with trying to hide the sequences,” he told BuzzFeed News.
Bloom found out about the deleted data after reading a paper by a team led by Carlos Farkas of the University of Manitoba in Canada about some of the earliest genetic sequences of SARS-CoV-2. Farkas’ paper described sequences taken from hospital outpatients as part of a project by researchers in Wuhan who were developing diagnostic tests for the virus. However, when Bloom tried to download the sequences from the Sequence Read Archive (SRA), an online database maintained by the US National Institutes of Health, he received error messages stating that they had been removed.
Bloom realized that copies of SRA data are also maintained on Google servers and was able to work out the URLs at which the missing sequences could be found in the cloud. In this way, he obtained 13 genetic sequences that could help answer questions about the development and origin of the coronavirus.
Bloom noted that the deleted sequences, like others collected later outside the city, were more similar to bat coronaviruses – believed to be the ultimate ancestors of the virus that causes COVID-19 – than were sequences linked to the Huanan Seafood Market in Wuhan. This adds to earlier suggestions that the seafood market may have been an early victim of COVID-19 rather than the place where the coronavirus first jumped from animals to humans.
“This is a very interesting study by Dr. Bloom, and in my opinion the analysis is completely correct,” Farkas told BuzzFeed News via email. Scott Gottlieb, former head of the Food and Drug Administration, also praised the results on Twitter.
However, some scientists were less impressed. “It really doesn’t add anything to the origins debate,” Robert Garry of Tulane University in New Orleans told BuzzFeed News via email. Garry argued that the Huanan market or other markets in Wuhan could still be the source of COVID-19.
Bloom is one of 18 scientists who published a letter in May criticizing the joint WHO-China study on the origins of SARS-CoV-2. The scientists argued that the WHO-China report did not “balance” the competing ideas that the coronavirus spread naturally from animals to humans or escaped from a laboratory – a theory the report considered “extremely unlikely”. Following the publication of the WHO-China report, the US and 13 other governments complained that they “lacked access to complete original data and samples”.
The deleted virus sequences were first uploaded to the SRA in early March 2020, around the time that researchers led by Yan Li and Tiangang Liu from Wuhan University published a preprint describing their work using genetic sequencing to diagnose COVID-19. Just a few days earlier, China’s State Council had ordered that all papers related to COVID-19 be centrally approved.
The sequences were then withdrawn from the SRA in June, around the time the final version of the paper appeared in a scientific journal. According to the NIH, the authors requested the removal of the sequences. “The requester stated that the sequence information was updated, sent to a different database, and wanted the data removed from SRA to avoid version control issues,” NIH spokeswoman Amanda Fine told BuzzFeed News via email.
However, it is unclear whether the sequences have since been put online in another database.
“There is no plausible scientific reason for the deletion,” wrote Bloom in his preprint, arguing that the sequences were likely “deleted to disguise their existence”. That indicated, he wrote, “a not entirely serious effort to follow the early spread of the epidemic.”
Although the sequences were deleted, Garry pointed out that the key genetic mutations they contained were still published in a table in the Wuhan team’s paper. “Jesse Bloom has found nothing new that is not already part of the scientific literature,” Garry told BuzzFeed News, accusing Bloom of writing his preprint in an “inflammatory way that is unscientific and unnecessary.”
Bloom wrote to the Wuhan researchers asking why the sequences had been deleted, but received no response. Li and Liu also did not immediately respond to a request from BuzzFeed News.
This is not the first time scientists have raised concerns about the removal of data that could help answer questions about the origins of COVID-19. The main database of information on coronavirus sequences, maintained by the Wuhan Institute of Virology – which has been at the center of speculation about a possible “laboratory leak” of the virus – was taken offline in September 2019. When the WHO-China team investigating the origins of the pandemic visited the institute in February, they were told that the database, which allegedly contained data on 22,000 coronavirus samples and sequence records, had been removed after repeated hacking attempts.