Anonymization Primer: Participant Privacy Risks

Medical researchers in the pharmaceutical industry and academia are increasingly engaging in the secondary analysis of unstructured data or unstructured clinical documents obtained from clinical trials. However, there are practical challenges in sharing data for re-analysis or secondary analysis. One such challenge is the lack of transparency when sharing data at the patient level.

This lack of transparency exists for good reasons. Since health information is often associated with our most personal aspects (e.g., ability to work, dietary habits, sexual orientation, stigmatizing medical history), redacting certain variables allows patients to maintain their privacy. Redaction aligns with ethical principles such as respect for persons, justice, and non-maleficence. It also aligns with applicable legal requirements such as the General Data Protection Regulation (GDPR).

As regulatory authorities move away from qualitative redactions and require de-identification, it is worthwhile to discuss privacy, the consequences of a privacy breach, and the quantitative de-identification process.

Defining Privacy

Although there is no consensus on a specific definition of ‘privacy’, researchers often view it as ‘the ability to control the collection, use, and disclosure of one’s personal information.’ Another definition states that privacy means ‘whether others can access one’s information, regardless of whether it is the individual who is in control of her information’. In this article, we use privacy to mean whether others know information about a person and can draw inferences from it.

A survey study revealed that when participants were asked whether they were willing to have their records used for research without their knowledge or permission, a majority clearly stated ‘no’. But when researchers mentioned that the database would be anonymous or that access to the data would be under their control, a majority thought data-sharing was a good idea that helps advance science. Typically, sponsor organizations engage in practices that respect these sentiments. For instance, the use of controlled platforms, data sharing agreements, and third-party ethical oversight can reduce risks to individuals.

Consequences of a Privacy Breach

Privacy breaches can have drastic negative effects and may harm patients if the information is used by individuals with malicious intent. Misused health information may affect a person’s ability to get a specific job or keep their current one. It may also affect their ability to get insurance. Worst of all, patients may experience social stigma if information related to their gender, race, ethnicity, or disability status is revealed to the general public. In extreme scenarios, patients’ ability to maintain autonomy over their lives may be affected. Researchers need to de-identify variables to prevent such harm to patients. For sponsor organizations, privacy breaches may mean heavy legal penalties.

At present, Health Canada strongly recommends quantitative de-identification instead of qualitative, rule-based redactions. The regulations focus our attention on what is called the “Risk of Re-identification” or “ROR”: namely, that there can be negative repercussions if an adversary or intruder can determine, with absolute certainty or high confidence, the identity of a patient in a clinical trial. ‘Adversaries’ is a term used to describe people or entities that might try to identify research participants.
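
One common way to put a number on this risk is to group records that share the same indirectly identifying values and treat each record's risk as one divided by the size of its group. The Python sketch below illustrates that idea only. It is an assumption made for illustration, not a procedure taken from Health Canada's guidance, and the field names (`age_band`, `sex`, `region`) and sample records are hypothetical.

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, average_risk) across all records.

    Assumption: a record's risk is 1 / (number of records that share
    its quasi-identifier values). Field names are hypothetical.
    """
    if not records:
        raise ValueError("records must not be empty")
    # Build one key per record from its quasi-identifier values.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    # Count how many records fall into each group.
    class_sizes = Counter(keys)
    risks = [1 / class_sizes[k] for k in keys]
    return max(risks), sum(risks) / len(risks)

# Hypothetical sample: two groups of shared values and one unique patient.
records = [
    {"age_band": "30-39", "sex": "F", "region": "ON"},
    {"age_band": "30-39", "sex": "F", "region": "ON"},
    {"age_band": "40-49", "sex": "M", "region": "QC"},
    {"age_band": "40-49", "sex": "M", "region": "QC"},
    {"age_band": "40-49", "sex": "M", "region": "QC"},
    {"age_band": "70-79", "sex": "F", "region": "BC"},
]
max_risk, avg_risk = reidentification_risk(records, ["age_band", "sex", "region"])
```

In this sample the single patient in the "70-79" group is unique, so the maximum risk is 1, which is the situation an adversary could exploit most easily.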

Patient Identifiers

The Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule lists eighteen identifiers that require de-identification. Most other regulatory authorities have similar policies. These direct identifiers can be summarized as:

  1. Name
  2. Location. In certain cases, the first three digits of a ZIP code can be shared; state names can also be shared.
  3. Dates. Age brackets or the year of an event can be shared, but sharing specific dates such as a birth date or date of death creates a risk of re-identification.
  4. Contact information such as telephone numbers, fax numbers, and email addresses.
  5. Identifying numbers such as Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers and serial numbers (including license plate numbers), and device identifiers and serial numbers.
  6. Web identifiers such as Universal Resource Locators (URLs) and Internet Protocol (IP) addresses.
  7. Biometric identifiers, e.g., finger and voice prints, full-face photographs, and any comparable images.

Any other unique identifying number or patient characteristic is also covered under privacy acts by various regulatory authorities, including Health Canada. To de-identify personal information, data science experts can help find personal identifiers in structured or unstructured datasets.
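
The generalizations mentioned in the list above (three-digit ZIP codes, year-only dates, age brackets) can be sketched in a few lines of Python. This is a minimal illustration, not a compliant implementation: the function names, the ten-year bracket width, and the sample patient are all hypothetical, and real rules attach further conditions that the sketch does not model.

```python
from datetime import date

def generalize_zip(zip_code):
    """Keep only the first three digits of a ZIP code string."""
    return zip_code[:3]

def generalize_date(d):
    """Reduce a full date to its year."""
    return d.year

def age_bracket(age, width=10):
    """Map an age to a bracket such as '30-39' (width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Hypothetical patient record before generalization.
patient = {"zip": "90210", "birth_date": date(1985, 7, 14), "age": 38}

# The shared version keeps only the coarser values.
shared = {
    "zip3": generalize_zip(patient["zip"]),
    "birth_year": generalize_date(patient["birth_date"]),
    "age_bracket": age_bracket(patient["age"]),
}
```

Coarser values place more patients in the same group, which lowers the chance that any one of them can be singled out, at the cost of some analytical detail.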

While de-identifying these variables, sponsors need to meet statistical thresholds to mitigate the risk of re-identification. Researchers in the pharma industry need technology solutions to meet these regulatory requirements within a short timeframe.   
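
As a rough illustration of what meeting a statistical threshold can involve, the Python sketch below flags records whose estimated risk exceeds a chosen limit so they can be generalized further or suppressed. The risk model (one divided by the number of records sharing the same values), the threshold of 0.5, and the field names are assumptions made for this example, not values taken from any regulator.

```python
from collections import Counter

def split_by_threshold(records, quasi_identifiers, threshold):
    """Split records into (kept, flagged) by per-record risk.

    Assumption: risk = 1 / (number of records sharing the same
    quasi-identifier values); a record is flagged when its risk
    is strictly greater than `threshold`.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    kept, flagged = [], []
    for record, key in zip(records, keys):
        if 1 / sizes[key] > threshold:
            flagged.append(record)
        else:
            kept.append(record)
    return kept, flagged

# Hypothetical records; the threshold is illustrative only.
records = [
    {"age_band": "30-39", "region": "ON"},
    {"age_band": "30-39", "region": "ON"},
    {"age_band": "40-49", "region": "QC"},
    {"age_band": "40-49", "region": "QC"},
    {"age_band": "70-79", "region": "BC"},
]
kept, flagged = split_by_threshold(records, ["age_band", "region"], threshold=0.5)
```

Here the one patient with a unique combination is flagged, while the four patients who each share their values with someone else are kept.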

Real Life Sciences has launched the RLS Protect platform, which helps sponsors de-identify data and meet the statistical thresholds defined by Health Canada. The platform enables sponsors to share high-quality data that upholds the principles of openness and transparency and preserves data utility, yet protects patients from the risk of re-identification. Since each clinical trial has its own set of variables and study design, our team of expert data scientists works closely with sponsor teams to ensure optimal results. To learn more about RLS Protect and our data science services, contact us here.

