“Academic research cannot be conducted without good data and research tools”
The recently published magazine Onderzoek addresses the challenges surrounding the exponentially growing terabytes of data. How do you store it safely and efficiently? And how do you make that data accessible and searchable? Journalist Nienke Beintema spoke with Henk Wals, director of DANS, among others.
‘Good data storage is crucial if you want to safeguard the quality of science,’ says Henk Wals, director of Data Archiving and Networked Services (DANS). This institute of NWO and KNAW is the national service centre for the storage and management of research data. ‘Fortunately, this issue is no longer the exclusive domain of the academic. We have experts dealing with this now, and increasingly the infrastructure is improving.’
Two hundred thousand datasets
DANS is one of the largest repositories for research data in the world and manages almost 200,000 datasets. In addition to DANS, there are several repositories in the Netherlands for different domains, such as 4TUdata for the engineering sciences. The National Plan for Open Science is creating a joint structure for these repositories. ‘As a researcher, you simply store your data on your laptop,’ says Wals. ‘In addition, the data is often in the institution’s repository. And then there are domain-oriented institutions where that data comes together.’
Coordination on major themes
All universities are establishing local Digital Competence Centres (DCCs), with data stewards who support researchers in creating datasets that are consistently structured, stored and searchable. Work is also underway to establish thematic DCCs: an infrastructure for national coordination on major themes, such as climate change or the coronavirus pandemic. These themes require data exchange and collaboration between different disciplines. 4TU and DANS are also closely involved in this. ‘And finally, there is a framework for international coordination,’ Wals adds.
‘It’s not just about the physical infrastructure,’ says Wals. ‘There are also many intangible aspects at play. For example, the standards with which the data must comply, such as the FAIR principles: findable, accessible, interoperable and reusable. Accessible does not mean that everyone should be able to access all the data just like that. Sometimes data is privacy-sensitive. The metadata (the labels attached to the data) indicate which restrictions apply and how access is regulated.’
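To make the idea of metadata carrying access rules concrete, here is a minimal Python sketch. It is an illustration only: the field names, the three access levels and the `may_download` helper are assumptions invented for this example, not the metadata schema used by DANS or any other repository.

```python
from dataclasses import dataclass
from enum import Enum


class AccessRights(Enum):
    """Hypothetical access levels a metadata record might declare."""
    OPEN = "open"              # anyone may download the data
    RESTRICTED = "restricted"  # download only after permission is granted
    CLOSED = "closed"          # data is archived but cannot be downloaded


@dataclass(frozen=True)
class DatasetMetadata:
    """Illustrative label attached to a dataset (all fields are assumed)."""
    identifier: str
    title: str
    access_rights: AccessRights
    contains_personal_data: bool = False


def may_download(meta: DatasetMetadata, has_permission: bool = False) -> bool:
    """Decide whether a user may download the data described by `meta`.

    The metadata itself stays visible in every case, so the dataset
    remains findable even when the data is not openly accessible.
    """
    if meta.access_rights is AccessRights.OPEN:
        return True
    if meta.access_rights is AccessRights.RESTRICTED:
        return has_permission
    return False


survey = DatasetMetadata(
    identifier="example-001",  # made-up identifier
    title="Patient interview transcripts",
    access_rights=AccessRights.RESTRICTED,
    contains_personal_data=True,
)
print(may_download(survey))                       # no permission yet
print(may_download(survey, has_permission=True))  # permission granted
```

In this sketch the record describing the dataset can always be read, while the decision to release the data depends on the access level stated in the metadata. That is the distinction Wals draws between data being accessible and data being open to everyone.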
What is produced where?
Much progress has been made in recent decades. Nevertheless, it’s estimated that less than a quarter of all data in the Netherlands is properly stored. ‘In fact, we don’t even know exactly what’s being produced or where, or who’s storing it. That’s why the data stewards are so important: they ensure that data management is done carefully from the outset.’ The interoperability between the different domains also leaves much to be desired. Currently, combining datasets often takes a good deal of tinkering. ‘Whereas what you want, for example in the case of the coronavirus pandemic, is to bring different data together quickly. Data management is an urgent social challenge.’
Read the article on the NWO website.