“Open” health data and GDPR
- De-identification is ineffective and expensive1
- There is no such thing as foolproof de-identification2
- Unimportant items of information can be combined to create important information3, 4
- Ubiquitous data collection means we have little to no control over our data
- All digital databases can and very likely will be breached5
- Remedy for harm from breach is not always calculable or enforceable6
- Consent is imperfect7 but is necessary even if it is implicit
- Sharing and privacy are two sides of the same coin8
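The point about combining unimportant items of information can be made concrete with a small sketch. The records, field names, and the `unique_fraction` helper below are hypothetical and chosen only for illustration; the idea is simply to count how many records become unique as more harmless-looking attributes are combined.

```python
from collections import Counter

# Hypothetical records: no names or IDs, only "unimportant" attributes.
records = [
    {"birth_year": 1980, "gender": "F", "postcode": "1011"},
    {"birth_year": 1980, "gender": "F", "postcode": "1011"},
    {"birth_year": 1980, "gender": "M", "postcode": "1011"},
    {"birth_year": 1975, "gender": "F", "postcode": "1012"},
    {"birth_year": 1975, "gender": "F", "postcode": "1013"},
]


def unique_fraction(rows, keys):
    """Return the fraction of rows whose combination of `keys` occurs only once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    unique = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return unique / len(rows)


# Each attribute alone singles out few people; together they single out most.
for keys in (["gender"], ["birth_year", "gender"], ["birth_year", "gender", "postcode"]):
    print(keys, unique_fraction(records, keys))
```

In this toy data set, gender alone singles out one record in five, while the three attributes together single out three in five. A record that is unique on such attributes can be linked to a person by anyone who knows those attributes from another source.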
Clinicians and Researchers Want
Non-de-identified data. Data from which the identity of the patient has been removed is not as useful as data that identifies the patient. Doctors are not interested in personal identity as such, but they want to know personal characteristics such as race, age, gender, place of birth, place of residence, and even income level. These characteristics affect health in important and specific ways, but they also identify the patient personally.
Specific, targeted data, not generic, voluminous data. For example, megabytes of Fitbit data are not useful, but focused blood pressure readings or heart rate data may be. One term used is “prescribed data”: the doctor prescribes specific data for the patient to collect and submit.
GDPR has specific definitions for personal data,9 genetic data,10 biometric data,11 and health data.12 Article 6 of GDPR13 allows lawful processing of personal data if at least one of the following conditions is met:
- the data subject has given consent;
- it is necessary for the performance of a contract to which the data subject is party;
- it is necessary for compliance with a legal obligation;
- it is necessary to protect the vital interests of the data subject or another natural person;
- it is necessary for the performance of a task carried out in the public interest;
- it is necessary for the purposes of the legitimate interests pursued by the controller or a third party.
However, GDPR prohibits processing of biometric, genetic, and health data unless an exception applies. Three of the exceptions are relevant here:
- The data subject has given “explicit consent.”
- “Processing is necessary for the purposes of preventive or occupational medicine, for the assessment of the working capacity of the employee, medical diagnosis, the provision of health or social care or treatment or the management of health or social care systems and services”
- “Processing is necessary for reasons of public interest in the area of public health, such as protecting against serious cross-border threats to health or ensuring high standards of quality and safety of health care and of medicinal products or medical devices”
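The two lists above can be read as a two-step check, sketched below. This is a simplified illustration, not a statement of how the law must be applied: it models only the conditions quoted in this text, the names `Article6Basis`, `SpecialCondition`, and `may_process` are hypothetical, and it assumes that biometric, genetic, and health data need both an Article 6 basis and one of the additional conditions.

```python
from enum import Enum


class Article6Basis(Enum):
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal obligation"
    VITAL_INTERESTS = "vital interests"
    PUBLIC_TASK = "public task"
    LEGITIMATE_INTERESTS = "legitimate interests"


class SpecialCondition(Enum):
    EXPLICIT_CONSENT = "explicit consent"
    MEDICAL_PURPOSES = "medicine, diagnosis, health or social care"
    PUBLIC_HEALTH = "public interest in the area of public health"


# Categories that this text says are prohibited by default.
SPECIAL_CATEGORIES = {"biometric", "genetic", "health"}


def may_process(category, bases, conditions=frozenset()):
    """Return True if the (simplified) conditions for processing are met.

    Assumption: special categories need an Article 6 basis AND at least
    one of the additional conditions; other personal data needs only
    an Article 6 basis.
    """
    if not bases:
        return False
    if category in SPECIAL_CATEGORIES:
        return bool(conditions)
    return True


# Ordinary consent is not enough for health data in this model.
print(may_process("health", {Article6Basis.CONSENT}))
print(may_process("health", {Article6Basis.CONSENT},
                  {SpecialCondition.EXPLICIT_CONSENT}))
print(may_process("contact", {Article6Basis.CONTRACT}))
```

The design choice worth noting is that the special conditions are an extra gate on top of the general one, which is why consent that is merely implicit does not unlock health data in this sketch.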
FAIR4Health wants data from publicly funded research to be findable, accessible, interoperable, and reusable (FAIR). But does that include clinical research? Clinical research, as opposed to basic research, is based on personally identifiable information and, as such, is required by law to be protected.14
Personal health data from non-medical, commercially available devices such as pedometers, blood pressure monitors, and sleep trackers are usually shared with the device providers under contractual agreements that people “sign” when they sign up for these services. These data are not from publicly funded research, but they can be valuable for medical purposes. My assumption is that FAIR4Health excludes such data. Electronic health records (EHRs) are likewise not a product of publicly funded research, so my assumption is that FAIR4Health excludes them as well.
Publicly funded research data may, however, also contain personally identifiable information. If so, that information has to be removed through a specified de-identification process. Even then, the data may still have to be restricted to a certain community rather than made available to everyone. Is the data still FAIR if it is restricted to a specified community?
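As a rough illustration of what a de-identification step might look like, the sketch below drops direct identifiers and coarsens two characteristics. The field names, the ten-year age bands, and the two-character postcode prefix are all assumptions made for this example and do not come from any specified procedure.

```python
# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email"}


def deidentify(record):
    """Drop direct identifiers and coarsen age and postcode."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        # Replace the exact age with a ten-year band, e.g. 47 -> "40-49".
        low = (out["age"] // 10) * 10
        out["age_band"] = f"{low}-{low + 9}"
        del out["age"]
    if "postcode" in out:
        # Keep only the first two characters of the postcode.
        out["postcode"] = out["postcode"][:2] + "**"
    return out


patient = {
    "name": "Jane Doe",
    "email": "jane@example.org",
    "age": 47,
    "postcode": "1011",
    "systolic_bp": 128,
}
print(deidentify(patient))
```

The output keeps the clinical value and the coarsened characteristics but loses exactly the detail that clinicians say they want, and the remaining attributes can still narrow down who the patient is, which is why access may need to stay restricted even after this step.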