Prior to Health Canada’s anonymization guidelines, issued in 2019, data utility was maintained largely through subjective input from researchers, and the process was conducted primarily at the clinical level. Today, preserving data utility during anonymization must involve quantitative measurements at the document and data level. It must also include a well-defined, precise implementation of the selected rules to prevent over-redaction or over-anonymization. Health Canada issued these guidelines to maximize the release of analytically valuable information.
Sponsors and Contract Research Organizations (CROs) often consult experts in data transformation and anonymization, such as Real Life Sciences (RLS), to conduct this process. At RLS, we follow a quantitative, metrics-driven process to preserve data utility in mandatory disclosures. This process is facilitated by the RLS PROTECT solution, which is designed to balance protection of patient identity against data utility. The solution supports and expedites regulatory submissions and also facilitates voluntary data-sharing projects.
Using this solution, RLS data anonymization specialists preserve data utility by taking a methodical approach that has six distinct steps.
First, they determine the set of possible transformations across all identifiers that can meet the 0.09 re-identification risk threshold set out in the Health Canada (HC) and European Medicines Agency (EMA) guidance on anonymization techniques. This allows the data to retain its clinical or research value after transformation and aligns with the EMA’s 2018 guidelines. Regulatory authorities such as Health Canada prefer anonymization of data over redaction, and RLS demonstrated its thought leadership by adopting this process early on. Overall, the preference is for transformations that retain a higher level of granularity rather than outright redaction. This step yields multiple possible transformation scenarios across the indirect identifiers.
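To make the first step concrete, here is a minimal Python sketch of how transformation scenarios can be enumerated. It assumes each indirect identifier has a generalization hierarchy (an ordered list of progressively coarser transformations, with suppression as the last resort) and that a scenario is one choice of level per identifier. The identifiers (`age`, `country`, `sex`), the bands, the `REGION` mapping and the function names are hypothetical illustrations, not RLS PROTECT internals.

```python
from itertools import product

# Hypothetical mapping used by the "region" level of the country hierarchy.
REGION = {"Canada": "North America", "USA": "North America", "France": "Europe"}

# Hypothetical generalization hierarchies: index 0 keeps the original value,
# each higher index is coarser, and "*" (suppression) is the last resort.
HIERARCHIES = {
    "age": [
        lambda a: str(a),                                # exact age
        lambda a: f"{a // 5 * 5}-{a // 5 * 5 + 4}",      # 5-year band
        lambda a: f"{a // 10 * 10}-{a // 10 * 10 + 9}",  # 10-year band
        lambda a: "*",                                   # suppressed
    ],
    "country": [
        lambda c: c,                                     # exact country
        lambda c: REGION.get(c, "Other"),                # region
        lambda c: "*",                                   # suppressed
    ],
    "sex": [
        lambda s: s,                                     # as recorded
        lambda s: "*",                                   # suppressed
    ],
}


def enumerate_scenarios(hierarchies):
    """Yield every combination of generalization levels, one level per identifier."""
    names = sorted(hierarchies)
    for levels in product(*(range(len(hierarchies[n])) for n in names)):
        yield dict(zip(names, levels))


def apply_scenario(record, scenario, hierarchies):
    """Transform a record's quasi-identifiers; other fields pass through unchanged."""
    out = dict(record)
    for name, level in scenario.items():
        out[name] = hierarchies[name][level](record[name])
    return out
```

With only three identifiers this already yields 4 × 3 × 2 = 24 scenarios, and the count grows multiplicatively with each added identifier, which is why the later steps filter and rank scenarios by metrics rather than by inspection.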
The next step is to measure the ‘Risk of Re-identification’ (ROR) for every possible transformation scenario. The ROR metric is used to filter out scenarios that exceed the risk threshold, so only those that meet it move forward. In this step, several transformation options are considered for each variable or identifier. (Health Canada requires a 0.09 threshold, but the threshold can be adjusted to match those recommended by other regulatory authorities as necessary.)
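One simple way to estimate ROR for a transformed dataset is to group records into equivalence classes (records that share the same quasi-identifier values) and treat 1 / class size as each record's risk. The Python sketch below assumes that model; neither guidance document is quoted here as prescribing this exact formula, and production assessments may use population-based estimates instead. The function names and the choice of worst-case (`max_risk`) as the default measure are illustrative assumptions.

```python
from collections import Counter

RISK_THRESHOLD = 0.09  # threshold cited in the text; adjustable for other regulators


def class_sizes(records, quasi_identifiers):
    """Count the records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def max_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest equivalence class."""
    return 1 / min(class_sizes(records, quasi_identifiers).values())


def average_risk(records, quasi_identifiers):
    """Mean of 1 / class size over all records, which equals #classes / #records."""
    return len(class_sizes(records, quasi_identifiers)) / len(records)


def filter_scenarios(transformed, quasi_identifiers,
                     threshold=RISK_THRESHOLD, measure=max_risk):
    """Keep only scenarios whose transformed records meet the risk threshold.

    `transformed` maps a scenario label to that scenario's transformed records.
    """
    return {label: recs for label, recs in transformed.items()
            if measure(recs, quasi_identifiers) <= threshold}
```

A single unique record (a class of size 1) pushes worst-case risk to 1.0, so one rare combination, such as an unusual age band in a small region, is enough to disqualify an otherwise coarse scenario.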
In the third step, we prioritize the remaining transformation scenarios by the ‘Information Loss’ (IL) metric, ranking them in terms of data utility versus data loss. This metric helps identify the anonymization solution with the least loss of data quality and guides the subsequent optimization and transformation steps.
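The article does not specify how IL is computed, so the following Python sketch uses one simple, commonly used stand-in: the normalized generalization height, averaged across identifiers. A scenario is assumed to map each quasi-identifier to its generalization level (0 = original value), and `HEIGHTS` holds the hypothetical maximum level for each identifier.

```python
# Hypothetical maximum generalization level (hierarchy height) per quasi-identifier.
HEIGHTS = {"age": 3, "country": 2, "sex": 1}


def information_loss(scenario, heights):
    """Mean normalized generalization level: 0.0 = original data, 1.0 = all suppressed."""
    return sum(scenario[q] / heights[q] for q in heights) / len(heights)


def rank_by_information_loss(scenarios, heights):
    """Order risk-compliant scenarios from least to most information loss."""
    return sorted(scenarios, key=lambda s: information_loss(s, heights))
```

Only scenarios that already passed the ROR filter would be ranked this way. Other IL definitions (for example, entropy-based measures or counts of suppressed cells) can be substituted without changing the ranking logic.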
For the fourth step, input from the client team plays a critical role. We optimize transformation options in consultation with clinical scientists, who independently prioritize quasi-identifiers and evaluate Clinical Utility (CU) in the context of the drug or condition under study. At this point, the client team’s clinical scientists also review the risk associated with medical events found in the documents, including minor adverse events.
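One hypothetical way to turn the clinical scientists' prioritization into a number is a priority-weighted share of retained granularity. In the Python sketch below, the `priorities` weights, the example study and the `clinical_utility` formula are all assumptions for illustration; the article does not define how CU is scored.

```python
# Hypothetical maximum generalization level for each quasi-identifier.
HEIGHTS = {"age": 3, "country": 2, "sex": 1}


def clinical_utility(scenario, heights, priorities):
    """Priority-weighted share of granularity retained (1.0 = nothing generalized).

    `scenario` maps each quasi-identifier to its generalization level (0 = original);
    `priorities` are relative weights supplied by the clinical scientists.
    """
    total = sum(priorities.values())
    retained = sum(w * (1 - scenario[q] / heights[q]) for q, w in priorities.items())
    return retained / total


# Hypothetical study in which exact age matters far more than geography or sex.
priorities = {"age": 3, "country": 1, "sex": 1}
keep_age = {"age": 0, "country": 2, "sex": 1}    # coarsen country and sex instead
keep_other = {"age": 3, "country": 0, "sex": 0}  # suppress age instead
```

Here `keep_age` scores 0.6 and `keep_other` scores 0.4. An unweighted loss measure would favor `keep_other`, since it fully generalizes only one identifier, so clinical weighting can reverse the ranking; that is the context the clinical scientists contribute.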
Next, we select the optimal transformation scenario based on the best trade-off among ROR, IL, and CU. This selection requires a thoughtful approach, as overly stringent criteria will suppress data for too many patients. If previously missed information comes to light, the process can be updated to incorporate new measurements.
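The trade-off can be sketched as a constrained choice: treat the risk threshold and a cap on suppressed patients as hard limits, then pick the feasible scenario with the best utility score. In this Python sketch, the `Candidate` fields, the 5% suppression cap and the `cu - il_weight * il` score are hypothetical choices for illustration, not the selection rule RLS uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    ror: float         # re-identification risk (step 2)
    il: float          # information loss, 0 = none, 1 = total (step 3)
    cu: float          # clinical utility, 0 = none, 1 = full (step 4)
    suppressed: float  # fraction of patients whose records are suppressed


def select_scenario(candidates, risk_threshold=0.09,
                    max_suppressed=0.05, il_weight=0.5):
    """Return the feasible candidate with the best CU/IL trade-off, or None."""
    feasible = [c for c in candidates
                if c.ror <= risk_threshold and c.suppressed <= max_suppressed]
    if not feasible:
        # Criteria too stringent: revisit transformations or thresholds.
        return None
    return max(feasible, key=lambda c: c.cu - il_weight * c.il)
```

Returning `None` instead of relaxing a limit silently makes the "too stringent" case explicit, so the team can decide whether to add transformation options or renegotiate the suppression cap.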
The last step is to measure implementation risk after transformation, i.e., to check the precision of the implementation and confirm that over-anonymization has not occurred. Because anonymization is iterative, fine-tuning may be needed after manual review. Our data scientists and the client team work together to ensure that regulatory requirements are adequately met. This step-by-step approach helps ensure that anonymization does not harm data utility and that any undesirable effects on the data are caught in time.
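Implementation precision can be checked by comparing what the selected rules required against what was actually transformed in the documents. The Python sketch below treats both as sets of item identifiers; the `"page:field"` labels and the `implementation_metrics` name are hypothetical. Items transformed without being required point to over-anonymization, and required items left untouched point to residual risk.

```python
def implementation_metrics(required, applied):
    """Compare required transformations with those actually applied.

    `required`: items the selected rules call for; `applied`: items actually transformed.
    """
    true_pos = required & applied
    return {
        # Share of applied transformations that the rules actually required.
        "precision": len(true_pos) / len(applied) if applied else 1.0,
        # Share of required transformations that were actually applied.
        "recall": len(true_pos) / len(required) if required else 1.0,
        # Over-anonymized items: candidates for restoring detail.
        "over_anonymized": sorted(applied - required),
        # Missed items: residual risk to fix before release.
        "missed": sorted(required - applied),
    }
```

For example, if the rules require `p12:age`, `p12:site` and `p40:visit_date`, but `p12:age`, `p12:site` and `p13:dose` were transformed, precision and recall are both 2/3, with `p13:dose` flagged as over-anonymized and `p40:visit_date` as missed.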
Each clinical trial or program addresses a specific clinical problem, so no two trials are alike, and a cookie-cutter approach to data transformation is often not the best choice. RLS data scientists work closely with the client team to tailor data solutions to the questions at hand while meeting quantitative regulatory requirements. This approach supports timely approvals from regulatory authorities and helps keep clinical programs on schedule. Our specialists are available to discuss data-related challenges and propose solutions.