The “What” and “Why” of Health Data Anonymization and How Pharmaceutical Sponsors and Contract Research Organizations Need to Prepare

Do we really need to share health data? The answer varies when the question is discussed across health care organizations. Regardless of these varying opinions, however, global health regulators and leading pharmaceutical companies overwhelmingly agree that the answer is yes. From a research perspective, most would agree that sharing trial results and real-world data supports broader use cases, secondary research, better-informed principal investigators, a better-informed patient population and more. Ultimately, this benefits the greater good. Many pharmaceutical organizations believe in trial transparency and data sharing and have implemented policies to support them, while others think it’s a good idea but have not made it an organizational priority. Most, however, struggle to implement scalable systems and processes to support it effectively, even when there is broad organizational and strategic buy-in.

Anonymizing health data, also known as data anonymization or de-identification, puts patient privacy first while striving to retain as much data usefulness as possible. This usefulness is known as data utility.

Health data anonymization is undertaken to share data with a secondary audience, such as other institutions, research organizations, or individuals who have a use for data that was originally collected by other researchers. Data sharing enables health and life sciences research to progress at a faster rate.

For example, suppose a Pharmaceutical Sponsor specializes in drug development for a rare disease area and has conducted several clinical trials, some with successful outcomes and others that failed. Meanwhile, a leading researcher at a university is undertaking new research to further understand the same rare disease. The researcher would gain significant insights from the Pharmaceutical Sponsor’s research outcomes, potentially saving months or years of time and costly effort. According to industry advocates and government regulators, as long as the patient-identifiable information collected in the Pharmaceutical Sponsor’s trials remains anonymous, that trial data can be shared and utilized.

How is Data Anonymized to Protect the Patient?

Historically, personal and patient data was redacted and only the remaining information was shared. This significantly reduced the amount of information that could be reused in a research setting. New methodologies have since evolved that protect patient privacy while providing researchers with usable information that retains its integrity for research purposes. It’s like crowdsourcing data so research can be taken in a multitude of directions, though individual privacy remains paramount. Consider the following:

[Image: example of a redacted patient statement]

The limited information that is retained is not useful for a researcher. However, we have achieved the primary goal of patient privacy by redacting any health information that could be tied back to a specific individual. 

Health data anonymization has evolved by utilizing quantitative risk analysis to determine which data can be retained, anonymized or redacted. Think of this as a statistical analysis of health data to determine how to transform (anonymize) the data in a way that retains its validity for research while protecting the specific attributes/identity of an individual. 

[Image: example of the same kind of statement after anonymization]

Unlike the redaction example, the integrity and utility of the information are retained, and a researcher can use it to advance her own research.
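As a rough illustration of the difference between redaction and anonymization, the following Python sketch applies both approaches to a single made-up record. The field names, the `redact` and `generalize` helpers, the 10-year age band and the month-level date are all assumptions chosen for illustration. They are not taken from any regulator's guidance or from a specific anonymization product, and in practice the choice of transformation would be driven by the quantitative risk analysis described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A hypothetical, simplified patient record."""
    name: str
    age: int
    event_date: str      # ISO date string, e.g. "2019-03-14"
    adverse_event: str


REDACTED = "[REDACTED]"


def redact(record: Record) -> dict:
    """Traditional redaction: blank out every potentially identifying field."""
    return {
        "name": REDACTED,
        "age": REDACTED,
        "event_date": REDACTED,
        "adverse_event": record.adverse_event,
    }


def generalize(record: Record, band_width: int = 10) -> dict:
    """Illustrative anonymization: remove the direct identifier and
    coarsen the indirect ones instead of deleting them."""
    low = (record.age // band_width) * band_width
    return {
        "name": REDACTED,                          # direct identifier is removed
        "age": f"{low}-{low + band_width - 1}",    # exact age becomes an age band
        "event_date": record.event_date[:7],       # keep only year and month
        "adverse_event": record.adverse_event,
    }
```

For a made-up record such as `Record("Jane Doe", 47, "2019-03-14", "headache")`, `redact` leaves only the adverse event, while `generalize` still tells a researcher that the patient was in the 40-49 age band and that the event occurred in March 2019.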

Anonymization algorithms can be applied to research documents and to datasets such as trial results or real-world data. Patient records and patient narratives are often referenced in document form, whereas clinical trial data are often in quantitative tabular formats.

How Anonymous is Sufficiently Anonymous? How Much Anonymization is Enough?

In practice, global regulatory agencies such as Health Canada, EMA and the FDA do not require 100% anonymity all of the time. But anonymity must meet an acceptable level as defined by each regulatory body. As discussed above, to make data sharing valuable the data must retain some meaning, or utility. Regulators measure the risk of anonymization failure in terms of re-identification of the subject: how likely is it that a trial participant or patient can be identified from the shared information? If it were completely impossible to identify trial participants from shared health data, the statistical probability of re-identification would be close to 0%. An example of this is the redacted statement above. However, regulators understand that if the probability of re-identification is 0%, then data utility is also at 0% and the data is of little use to researchers. At that point, why share it?

Global health regulators are willing to accept some risk of re-identification to maintain adequate levels of data utility. For data to be shared across health organizations or individuals, the risk of patient or trial participant re-identification has to fall below certain statistical risk thresholds. These risk thresholds vary from case to case depending on the particulars of the dataset or trial. For example, rare disease areas with small patient populations make it more complex to protect patient identity.
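To make the idea of a statistical risk threshold more concrete, here is a minimal Python sketch of one common way to estimate re-identification risk. It groups records that share the same values for a set of indirect identifiers and treats each record's risk as one divided by the size of its group. This is an assumption made for illustration: the article does not specify which risk measure any regulator uses, the function names and field names are hypothetical, the threshold is a caller-supplied parameter rather than a regulatory value, and the estimate is based only on the records in the shared dataset rather than on the wider population.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (maximum risk, average risk) across all records.

    Records with identical values for every quasi-identifier form a group.
    Each record's risk is estimated as 1 / (size of its group), so a
    record that is unique on those fields has a risk of 1.0.
    """
    if not records:
        raise ValueError("records must not be empty")
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    risks = [1 / group_sizes[k] for k in keys]
    return max(risks), sum(risks) / len(risks)


def passes_threshold(records, quasi_identifiers, threshold):
    """Check whether the highest per-record risk is at or below the threshold."""
    max_risk, _ = reidentification_risk(records, quasi_identifiers)
    return max_risk <= threshold
```

With three records in the same age band and sex and one record that is unique on those fields, the maximum risk is 1.0 because of the unique record, so the dataset would fail any threshold below 1.0. This mirrors the rare disease case, where small patient populations tend to produce small groups and therefore higher risk, which is why further generalization or suppression may be needed before sharing.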

Health data anonymization requirements imposed by regulators have progressed significantly over the last five years. The competing needs for data sharing and for maintaining patient privacy are both growing rapidly. As technology evolves to help manage the risk side of the equation, so must the processes, tools and organizational readiness of Pharmaceutical Sponsors and Contract Research Organizations (CROs) for managing the process. All of these considerations are built on the foundation that sharing health data makes sense and offers distinct advantages. The recent global pandemic underlines the need for accelerated timelines in developing therapeutics, which can be achieved by sharing data. However, sharing data should not come at the cost of personal privacy.

Real Life Sciences, LLC works with clinical trial managers, R&D teams and researchers to provide the following services: Clinical Trial Redaction and Anonymization of data and documents, Qualitative and Quantitative Risk Assessments, Process and Workflow Automation, Expert Regulatory Knowledge, Strategy and Process Consulting.


