Significance of data anonymization
In the healthcare domain, regulatory authorities promote clinical trial transparency by issuing reporting guidelines. These guidelines and regulations require that patient privacy be protected, and they therefore set strict expectations for data anonymization. Anonymization is the process of removing or transforming personally identifiable information (PII), such as names and social security numbers, as well as quasi-identifiers, such as age and gender, that could identify a patient when combined with other data. Once anonymized, the data can be used for research and to further improve patient care.
Qualitative redaction approaches
To protect patient privacy, the most widely used qualitative method at present is redaction, in which patient data is blocked out entirely by overlaying it with opaque boxes. Redaction requires no specialized equipment or other significant resources, and it has therefore been a long-standing industry practice. However, the practice has two drawbacks: it is qualitative, so the privacy risk that remains is never measured, and it removes all of the usefulness of the redacted information. Both drawbacks have prompted regulatory authorities to encourage alternative methods. Health Canada and the European Medicines Agency (EMA) encourage data sharing within institutions and also encourage making data available to secondary researchers. Excessive qualitative redaction reduces data utility. Now that alternative solutions (i.e., quantitative risk assessments) are becoming more commonplace, these newer techniques, which offer greater data utility while still protecting patient privacy, should be favored.
Quantitative approaches to data anonymization for Health Canada Public Release of Clinical Information (PRCI)
Regulatory authorities such as Health Canada and the EMA are encouraging data anonymization instead of redaction.
While the EMA's process for publishing clinical data (Policy 0070) was suspended in December 2018 (with the later exception of medicines related to COVID-19), the date of application of the EU Clinical Trials Regulation No 536/2014 (CTR536) has been set as 31 January 2022. The EMA requests anonymization reports from applicants, in which companies are expected to describe their anonymization method and how it affects data utility.
Health Canada favors a quantitative approach, i.e., a statistical anonymization methodology. Under this approach, some data that would previously have been redacted are instead transformed using techniques such as pseudonymization, date shifting, and generalization. Health Canada is increasingly scrutinizing submissions that take a qualitative redaction approach and labeling them as non-compliant. It has asked sponsors to justify their redactions and to submit an annotated or readable version of the documents. Some sponsors have received notices of non-conformance with Health Canada guidance when entire sections were redacted. Health Canada's guidance states that it may reject proposed redactions "when the proposed redaction pertains to information already in the public domain".
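The three transformation techniques named above can be illustrated with a minimal Python sketch. It is not Health Canada's prescribed method or any vendor's implementation; the record layout, the secret key, the 10-year age bands, and the fixed date offset are all hypothetical choices made for illustration. Pseudonymization here uses a keyed hash (HMAC-SHA256), so the same subject always maps to the same pseudonym, but the mapping cannot be recomputed by anyone who lacks the key.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret key held by the sponsor; it is never released with the data.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a truncated keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:12]


def shift_date(d: date, offset_days: int) -> date:
    """Shift a date by a per-patient offset so intervals between events are kept."""
    return d + timedelta(days=offset_days)


def generalize_age(age: int, band: int = 10) -> str:
    """Replace an exact age with a band such as '40-49'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"


record = {"patient_id": "SUBJ-0042", "age": 47, "visit": date(2019, 3, 14)}
# Hypothetical per-patient offset; in practice it would be drawn at random and kept secret.
offset = -23
anonymized = {
    "patient_id": pseudonymize(record["patient_id"]),
    "age": generalize_age(record["age"]),
    "visit": shift_date(record["visit"], offset),
}
print(anonymized)
```

Applying one offset to all of a patient's dates keeps the intervals between that patient's events (for example, the time from dosing to an adverse event) intact, which preserves much of the analytic value that redacting the dates would destroy.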
As sponsors adopt anonymization approaches, they need to comply with the guidelines provided by Health Canada. Anonymizing variables makes re-identification of participants difficult, and Health Canada has suggested a threshold of 0.09 for the probability of re-identification, a very low level of risk. Implementing anonymization therefore requires expertise in data science and technology. Anonymizing unstructured data, such as clinical study reports, poses further challenges compared with structured data and calls for a high level of expertise. In addition, keeping up with the evolving regulatory landscape is difficult for smaller companies. These considerations complicate data anonymization for sponsors, and pharmaceutical companies and clinical research organizations (CROs) are looking for efficient, accurate, and reliable technology solutions to meet the new regulatory standards for anonymization.
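To make the 0.09 threshold concrete, the following Python sketch estimates re-identification risk with a simple equivalence-class model: records that share the same quasi-identifier values form a class, and each record's risk is assumed to be 1 divided by the size of its class. This model is an assumption for illustration; Health Canada's guidance may define the measurement differently and consider further factors, such as the context in which the data are released, so it should not be read as the official calculation. The quasi-identifiers and toy records are hypothetical.

```python
from collections import Counter

# Re-identification probability threshold suggested by Health Canada.
THRESHOLD = 0.09


def reidentification_risk(records, quasi_identifiers):
    """Return (maximum, average) risk under a simple equivalence-class model.

    Records sharing identical quasi-identifier values form one equivalence
    class; each record's risk is assumed to be 1 / (size of its class).
    """
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    max_risk = 1 / min(classes.values())
    # Summing 1/size over every record gives 1 per class, so the mean is classes / records.
    avg_risk = len(classes) / len(records)
    return max_risk, avg_risk


# Hypothetical toy data: ages already generalized into 10-year bands.
records = [{"age": "40-49", "sex": "F"}] * 12 + [{"age": "50-59", "sex": "M"}] * 12
max_risk, avg_risk = reidentification_risk(records, ["age", "sex"])
print(f"max={max_risk:.3f} avg={avg_risk:.3f} meets threshold: {max_risk <= THRESHOLD}")
```

Under this simplified maximum-risk measure, every class must contain at least 12 records, because 1/11 ≈ 0.091 exceeds 0.09 while 1/12 ≈ 0.083 does not. Generalizing quasi-identifiers, for example replacing exact ages with bands, is what merges small classes into larger ones.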
RLS Protect: a solution to data anonymization challenges
Data anonymization challenges can be overwhelming for sponsors and CROs in this changing environment. RLS Protect offers data anonymization and risk assessment services to meet regulatory requirements and support voluntary data-sharing initiatives. Broadly, data anonymization with the RLS Protect platform proceeds in three steps: project initiation, risk modeling, and anonymization. During project initiation, the initial data is reviewed and the variables to be anonymized are identified. Next, during risk modeling, transformation rules are discussed with the client and then applied. Finally, during the anonymization step, documents are automatically searched and redacted. To confirm that the process has achieved its goals, various reports are generated and manual quality control is performed. Equipped with the RLS Protect platform, our team of data scientists has deep expertise in quantitative risk modeling and high-scale automated anonymization of PDFs. RLS Protect is well suited to most healthcare organizations and their patient data.
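The automated search step over unstructured documents can be sketched, in very simplified form, as pattern matching. The Python sketch below is not RLS Protect's implementation; the subject-ID and date formats are hypothetical, and a production tool would need far broader detection. It replaces each match with a labeled placeholder and counts replacements by type, the kind of tally that could feed the reports used during quality control.

```python
import re
from collections import Counter

# Hypothetical identifier formats; each study would define its own.
PATTERNS = {
    "SUBJECT_ID": re.compile(r"\bSUBJ-\d{4}\b"),
    "DATE": re.compile(r"\b\d{2}-(?:JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-\d{4}\b"),
}


def redact(text: str) -> tuple[str, Counter]:
    """Replace every match with a [LABEL] placeholder and count matches per label."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] += n
    return text, counts


narrative = "Subject SUBJ-0042 was hospitalized on 14-MAR-2019 and discharged on 20-MAR-2019."
redacted, counts = redact(narrative)
print(redacted)
print(dict(counts))
```

Pattern matching alone misses identifiers written in unexpected formats, which is one reason manual quality control remains part of the process.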