DANS in the NRC newspaper
On 13 April, the NRC published a background article on ‘what goes wrong in data research’. For this article, Ingrid Dillo, Senior Advisor at DANS, was interviewed by Laura Wismans. In it, Dillo explains that available data are not always of good quality and that much work remains to be done on sharing and reusing data.
DANS has made the article available below by translating it into English. No changes have been made to the original content.
A study of herpes virus raises eyebrows and shows what goes wrong in data research
Source: NRC by Laura Wismans
Data research – It was an exciting discovery, but it turned out to be poorly executed. A study into the herpes virus reveals where things can go wrong in data research.
The cold sore virus can play a good game of hide and seek. Where this virus, herpes simplex (HSV), hides in the body has been clear for decades, but how exactly it hides remains mysterious. When scientists in the United States reported last year, based on a new technique with large amounts of data, that the virus hides in more places than previously known, herpes researchers worldwide were immediately interested.
Could this be the reason why it is so difficult to make vaccines? Should the conventional approach to making antivirals change?
“It was a very exciting discovery,” says Georges Verjans, principal investigator of Herpeslab NL at Erasmus MC in Rotterdam. “It just turned out to be untrue. The research was poorly conducted.” But before Verjans and his colleagues figured that out, there was first a tug-of-war over the data, and then extensive reanalysis was needed. After that, correcting the erroneous conclusions also proved to be a big hurdle.
According to them, the case exposes a bigger problem: the handling of scientific data is still far from in order. The control mechanism does not work properly for research with big data, even as such data are increasingly used and far-reaching conclusions drawn from them.
Ingrid Dillo, senior advisor at DANS, the institute of KNAW and NWO that helps researchers make their data available for reuse, also recognises this. “Progress is definitely being made, but internationally and across disciplines, there are large differences in how data is handled, and everywhere there are significant obstacles. For example, researchers are hardly rewarded for making their data accessible.”
Fighting is complicated
What exactly happened in the herpes study that caught the attention of the Rotterdam virologists?
First, a bit about HSV, one of the nine herpesviruses found in humans. It shows itself in a cold sore. But about 70 percent of people carry the virus for life, and a large part of them never get a cold sore. They still spread it, which makes combating it complicated.
“Since the 1980s, we have known that the dormant virus retreats into the nerve pathways,” says Werner Ouwendijk, virologist at Erasmus MC and colleague of Verjans. “But how HSV is kept under control there, and whether other cells play a role, we don’t know. That’s what we, and many other researchers, are looking into.”
The researchers in the US were also searching, in mice. They used a fairly new molecular technique: single-cell sequencing. This makes it possible to find out for individual cells whether they contain the virus or not. From the mice, a nerve ganglion was taken – “it easily contains 50,000 nerve cells and many other types of cells, in total maybe a few million” – and with the help of enzymes, all the cells were cut loose. This was followed by some filtering steps, and each cell was given a kind of barcode. Measurements were then taken for each cell. “It is a great method,” says Ouwendijk. “Because it generates a lot of data that can be viewed from many angles, it is useful for forming new ideas.”
The Americans saw something remarkable: the dormant virus was not only found in the nerve cells but was also seen in various types of immune cells. In January 2023, they published their findings in Science Advances.
“We were not immediately suspicious,” says Ouwendijk. “We found the fact that it was also present in other cells especially interesting; we were keen to investigate it further.” Still, as they read the article, eyebrows were raised in Rotterdam.
“If you make such an important discovery, you want to confirm your conclusions with one or two other independent methods,” says Verjans. With plain histology, for example. “In a slice of tissue, you then see whether the virus is also present in the immune cells there. But such confirmatory research was not done here at all. We found it strange that such an important journal did not request it.”
They brought together an international group of HSV researchers to review the American research. But when they requested the data from Science Advances, they did not get it right away. After asking several times, a link followed, but to the raw data only. All sorts of things appeared to be wrong with the data.
Cells stuck together
“Included in the dataset were a lot of dead cells and cells stuck together,” says Ouwendijk. “Those dead cells are a problem because the virus particles can easily join other cell material, and those stuck-together cells are problematic because it is no longer possible to distinguish to which cell a measurement exactly belongs.”
According to the researchers, this is a logical explanation for the conclusion that the dormant virus was also in immune cells. In reality, it need not have been in there at all. “In fact, they should have stopped as soon as they found out that there were too many dead cells in the sample,” says Verjans.
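The quality-control problem Ouwendijk describes is standard in single-cell analysis: dead cells are typically flagged by a high fraction of mitochondrial reads, and stuck-together cells (doublets) by an implausibly high total transcript count per barcode. The following is a minimal sketch of that idea in plain Python; the cell records, field names, and cutoff values are purely illustrative assumptions, and real pipelines use dedicated tools such as Scanpy or Seurat.

```python
# Minimal sketch of single-cell QC filtering (illustrative only).
# The thresholds and cell records below are hypothetical; real analyses
# use dedicated tools (e.g. Scanpy, Seurat) and data-driven cutoffs.

def qc_filter(cells, max_mito_frac=0.2, max_counts=20000):
    """Drop likely dead cells and likely doublets.

    Each cell is a dict with:
      'mito_frac'    - fraction of reads from mitochondrial genes
                       (a high value suggests a dying or dead cell)
      'total_counts' - total transcript count for the barcode
                       (a very high value suggests two cells stuck together)
    """
    kept = []
    for cell in cells:
        if cell["mito_frac"] > max_mito_frac:
            continue  # likely dead: leaky membrane inflates the mito fraction
        if cell["total_counts"] > max_counts:
            continue  # likely doublet: two cells sharing one barcode
        kept.append(cell)
    return kept

cells = [
    {"id": "A", "mito_frac": 0.05, "total_counts": 8000},   # healthy
    {"id": "B", "mito_frac": 0.45, "total_counts": 6000},   # likely dead
    {"id": "C", "mito_frac": 0.08, "total_counts": 35000},  # likely doublet
]
print([c["id"] for c in qc_filter(cells)])  # → ['A']
```

The point of the sketch is the one Verjans makes: cells like B and C should be removed before any biological conclusion is drawn, because measurements from them cannot be attributed to a single, living cell.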
In addition to criticising the researchers, Verjans and Ouwendijk are also critical of the role of Science Advances. It should have been stricter. “It’s not good that we had so much trouble getting the data,” says Verjans. “If there are no data attached to a study, a journal should send it back, not send it out for review.”
For such large-scale data research, substantive review of the data and the analysis steps would also be appropriate, according to the virologists. “You cannot expect every peer reviewer to have this expertise,” says Verjans. “That expertise should actually be added. Journals also have people who look at the ethical and statistical side of research, why not at the data?”
Much-discussed topic
Dealing with data is a hot topic throughout science. Thanks to new techniques and more computing power, countless disciplines are working much more with data these days. “Sharing findings and data is considered very important,” says Ingrid Dillo of DANS. “It allows scientists to check each other and build on each other’s work.”
Ten years ago, a catchy acronym was coined. Research data should be FAIR: findable, accessible, interoperable and reusable. “Not that data was not thought about before that,” says Dillo, who was present at the international meeting in Leiden where FAIR was introduced. “But the acronym really caught on; it was embraced by policymakers, funders and faculties. However, these are not rules but guidelines. That is an important difference, because they can be interpreted in many ways. What exactly is findable? That is not defined. When are data FAIR enough? We first published about it in 2016, and there is still debate about it today.”
Dillo knows that there are big differences worldwide in the attention given to research data. “For example, digital competence centres have now been set up at Dutch universities to assist researchers. Funding has been made available for this by NWO. This is not happening everywhere in Europe. We are working on a European Open Science Cloud, though. Similar developments can be seen in the US and Australia, but less so in Asia and Africa.” However, Dillo warns that the good public availability of data does not necessarily indicate quality. “A neat dataset may well emerge from scientific gibberish.”
Still, data failures are not reported on a regular basis. Some recent examples do stand out: The Lancet and The New England Journal of Medicine both had to retract a paper on Covid medications in 2020 because underlying data were kept secret, and later turned out to be flawed. And in 2023, a paper on superconductivity that had appeared in Nature in 2021 was withdrawn because the data behind it had been tampered with.
“I think a lot of bad data research goes unnoticed,” says Dillo. “When it does stand out, it is usually research with startling findings. Especially for research that turns a field upside down, an alarm bell should actually go off earlier at the journals, but of course, they are keen to publish.”
According to Dillo, researchers, their institutions and the journals all bear responsibility for ensuring that the data are in order. She is pragmatic about whether this will improve in the short term: “It requires such a large investment, a separate type of academic staff at universities, a battalion of specialised reviewers at journals, that it will be difficult to improve this quickly. People often think it’s mainly a technical question, that complex infrastructure needs to be put in place. But the most complicated thing about all this is people and culture change.”
The reanalysis that Verjans, Ouwendijk, and the group of virologists did ended up in a separate paper. That appeared in early March in the Journal of Virology. Why not in Science Advances? Verjans sighs. “We approached them first, but they didn’t accept the paper. They wanted us to send a letter to the editor. But we think it is important to get it corrected in a relevant place, and letters to the editor are rarely read. Besides, they are not findable through search engines. The Journal of Virology did accept the paper. Fortunately, all virologists read that journal too.”
The Dutch version of this article also appeared in the NRC newspaper of 13 April 2024.