Linking data in a secure, FAIR way

The Personal Health Train (PHT) is based on FAIR: the principle that data must be Findable, Accessible, Interoperable and Reusable. The goal of the project is to securely link data about a patient that is stored by different parties at different locations (stations), so that scientists can, for example, conduct more extensive analyses. This pilot project aims to link data from participants in the Maastricht Study (on diabetes) to their data at Statistics Netherlands (CBS). Information from CBS about living environment, socio-economic conditions and more can uncover relevant knowledge about the risk factors for diabetes.

Everything revolves around privacy

Johan van Soest has a background in medical information science and is involved in the PHT as a UM researcher. “When linking databases that contain personal information, it’s all about privacy. Within the UM Community for Data-Driven Insights (CDDI), which this project is a part of, it is always about technology, science, and social and legal interests. The technology is often not the most complicated part. Most of the time is spent on the administrative, political and ethical discussions.”

Considerable attention to ethical and legal questions

If you can enrich the Maastricht Study database, which includes thousands of Limburgers with and without type 2 diabetes, with CBS data about the same population, then it is also technically possible to link, for example, a supermarket's bonus card to health data. The question is whether that is desirable and to what extent people should give their explicit permission for linking data in this way. That is why the project group also includes David Townend, Professor of Law and Legal Philosophy, who specialises in data security and privacy in medical research.

With distributed learning, software only becomes more reliable

The test phase of the project has been completed, and work with the real data can now begin. “We hope to have the connection working by the spring of 2020. After that, scientific questions can be answered using the data.” The goal is that the infrastructure and the analyses can be used by everyone. “That is the principle of distributed learning, from which software only becomes better and more reliable”, says Van Soest. “For some researchers, that currently feels a bit like surrendering freedom. Many people still have difficulty imagining what open science will mean for their research.”

FAIR builds on what already existed as a scientific challenge

As far as Van Soest is concerned, open science is about being as transparent as possible about what you have done in your research. “In essence, making data FAIR builds on what we have been working on for twenty years: the ability to work with data from colleagues. FAIR requires an extra investment, such as digitally describing what certain data mean. And researchers naturally want to complete their research as quickly as possible and are happy to do what they know.”

A tip for researchers who also want to pursue open science

As a researcher, if you want to know how you can apply open science and FAIR to your work, Van Soest recommends contacting colleagues who already have some experience with it. “And don’t be afraid to just give it a try. You can always go back to what is familiar.”

Femke Kools (text)