European Medicines Agency's (EMA's) Clinical Trial Regulation (CTR) & Clinical Trial Information System (CTIS): Trial Sponsor and CRO Frequently Asked Questions & Regulatory Announcements

What may be considered commercially confidential information (CCI) when redacting clinical trial application (CTA) documents for submission to CTIS, and how should I go about redacting this information?

Some examples of confidential information may be names of manufacturers, details of the manufacturing process, future development plans, innovative analytics methods and equipment used at the clinical sites.

Once you confirm which information is confidential with your IP Legal, Regulatory, Clinical and non-Clinical teams, limit the redactions in the CTA document to the specific information that is deemed confidential. This means redacting specific content or syntax within a sentence, table or figure rather than full paragraphs or pages.

Prior to finalizing your CCI decisions, remember to double check applicable public sources such as global trial registries, publications and abstracts to confirm the information indeed remains confidential and has not already been made public.
Link to EMA Guidance document on how to approach the protection of personal data and commercially confidential information in documents uploaded and published in the Clinical Trial Information System (CTIS).

How can I streamline my internal review and approval process of redacted CTIS CTA documents?

To optimize and streamline your internal approvals, consider using a log to capture each proposed redaction and the associated justification. This information will assist reviewers and approvers and help mitigate potential questions and delays. Further, by building a list, you can repurpose the information for other documents and studies, as applicable.

An example log may include a column for each of the following:

  1. Document type, e.g., Investigator Brochure
  2. Document filename, e.g., IB_9301_redacted_draft
  3. Document version, e.g., v2.1
  4. Page reference, e.g., page 7
  5. Non-redacted syntax, e.g., 3 mg
  6. Proposed redacted syntax, e.g., 3
  7. Justification category, e.g., proprietary manufacturing technique or method
  8. Justification, e.g., text justification of why this information is proposed to be redacted
  9. Transparency Redactor, e.g., Jane Smith
  10. Redaction date, e.g., 15 Oct 2023
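As a rough illustration, the example log could be kept as structured records and exported for reviewers. The sketch below is one possible layout, not a prescribed format: the class name, field names and helper functions are hypothetical, and the sample values are the ones from the list above.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class RedactionEntry:
    """One proposed redaction; fields mirror the ten example columns (names are hypothetical)."""
    document_type: str
    document_filename: str
    document_version: str
    page_reference: str
    non_redacted_text: str
    proposed_redacted_text: str
    justification_category: str
    justification: str
    transparency_redactor: str
    redaction_date: str


def log_to_csv(entries):
    """Serialize the log to CSV text so reviewers can open it in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(RedactionEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


def entries_missing_justification(entries):
    """Return entries with no written justification, which an approver would likely query."""
    return [e for e in entries if not e.justification.strip()]


example = RedactionEntry(
    document_type="Investigator Brochure",
    document_filename="IB_9301_redacted_draft",
    document_version="v2.1",
    page_reference="page 7",
    non_redacted_text="3 mg",
    proposed_redacted_text="3",
    justification_category="proprietary manufacturing technique or method",
    justification="",  # left blank on purpose to show the completeness check
    transparency_redactor="Jane Smith",
    redaction_date="15 Oct 2023",
)

incomplete = entries_missing_justification([example])
csv_text = log_to_csv([example])
```

Keeping the log in a structured form like this makes it straightforward to flag incomplete entries before review and to reuse the same justifications across other documents and studies.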

Which studies are required to transition to the CTR and what is the deadline to complete this?

Since the launch of the CTR in 2022, EMA has communicated the timescales and expectations for transitioning studies from EudraCT. Authorizing transition studies in CTIS requires advance planning to prepare and submit the necessary documents prior to the deadline of 30 January 2025.

The EMA is estimating up to 6,000 studies under the Clinical Trials Directive (CTD) will require transition and approval in CTIS prior to the deadline of 30 January 2025. To date, approximately 320 studies have completed the transition.

Trials authorized under the CTD with at least one active site in the EU must be transitioned by 30 January 2025. Trials that have ended locally in all EU member states but have active sites in other regions of the world do not need to transition. Key to achieving the deadline of 30 January 2025 is not just submitting the necessary information and documents into CTIS but receiving the authorization itself.
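For portfolio planning, the rule in the paragraph above can be written down as a simple check. This is a deliberately simplified sketch: the `Trial` fields and function names are hypothetical, and the EMA transition guidance remains the authority for any individual study.

```python
from dataclasses import dataclass
from datetime import date

# Deadline stated in the text for authorization of transition studies in CTIS.
TRANSITION_DEADLINE = date(2025, 1, 30)


@dataclass
class Trial:
    """Minimal, hypothetical view of a study for transition planning."""
    authorized_under_ctd: bool
    active_eu_sites: int
    active_non_eu_sites: int = 0


def must_transition(trial):
    """A CTD-authorized trial with at least one active EU site must transition."""
    return trial.authorized_under_ctd and trial.active_eu_sites >= 1


def days_until_deadline(today):
    """Days left to receive authorization, not merely to submit."""
    return (TRANSITION_DEADLINE - today).days


ongoing_in_eu = Trial(authorized_under_ctd=True, active_eu_sites=1)
ended_in_eu = Trial(authorized_under_ctd=True, active_eu_sites=0, active_non_eu_sites=5)
```

Here `must_transition(ongoing_in_eu)` is true, while `must_transition(ended_in_eu)` is false because the trial has ended in all EU member states even though sites elsewhere remain active.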

There are specific considerations and deliverables for each transition study as outlined in the following documents that are made available by EMA.

Guidance for the transition of clinical trials

CTCG’s best practice guide and cover letter template

New EMA announcement 06 Oct 2023 explains planned changes to Transparency Rules for 2024. Revisions to EMA’s CTIS Transparency rules will impact all trial sponsors

Following a public consultation period in May and June 2023, on 06 October 2023, the EMA announced upcoming changes to CTIS Transparency Rules that will become effective in 2024.

To be implemented in 2024, changes include removal of the deferral mechanism for all trials, an abbreviated list of ‘for publication’ documents and simplification of required data fields in CTIS.

With these changes, EMA aims to simplify and streamline the transparency of information for patients and healthcare professionals while balancing protection of commercial confidential information and personal data for trial sponsors.

Trial sponsors will need to continue to manage their CTIS projects under the current rules while anticipating the changes for mid-year 2024.

EMA Announcement

Revised CTIS Transparency Rules

What happens if I receive an RFI regarding redacted version ‘for publication’ documents submitted to CTIS and how can I avoid this?

Avoiding a request for information (RFI) is critical to your study timelines. RFIs require a short turnaround time. Awareness of the CTR and its guidance is critical to mitigating the risk of an RFI. Read on to learn more about avoiding RFIs due to CCI redactions and examples of acceptable CCI.

RFIs related to ‘for publication’ documents have recently been noted by several study sponsors. In these cases, the Member State has noted dense redactions in these documents and the RFI has indicated the sponsor needs to resubmit a revised version of the document that adheres to Section 1.3 Legal Framework in EMA’s CCI Guidance document (including references to Article 81.4 of the CTR).

Avoiding an RFI is critical to your study timelines. RFIs require a short turnaround time, so the operational demands of turning around a revised document in an expedited fashion are often challenging.

Awareness of the CTR and its guidance is critical to mitigating the risk of an RFI. Per the CTR guidance, when deferring publication of documents, redactions should be minimal. Further, sponsors should be redacting ‘for publication’ documents based on what is anticipated to be confidential at the time of publication (not at the time of upload to CTIS).

Examples of information that may be deemed acceptable to redact as CCI are listed below (note: this is not an exhaustive list). It is recommended you check the guidance to confirm your suggested CCI data for your study meet the necessary criteria.

Link to more information: EMA CCI CTIS Guidance

Are Member States reviewing public version documents in CTIS and could my ‘for publication’ document redactions result in an RFI?

The guidance from EMA has been consistent: if you are deferring, ‘for publication’ version documents should contain minimal if any redactions upon upload to CTIS. Further, the guidance states that Sponsors should not defer and redact.

At the start of the CTIS launch in 2022 it became clear that Member States were not reviewing ‘for publication’ documents to ensure conformance to this expectation - although it was expected that study sponsors adhere to the guidance at all times. However, more recently, multiple study sponsors have begun to receive RFIs as a result of heavy redaction in ‘for publication’ documents. It is important for all study sponsors to remember that when deferring, only commercially sensitive information at the time of publication (following the deferral period, if applicable) should be redacted. EMA is expecting all ‘for publication’ documents to be meaningful for the public.

Link to Sponsor Handbook with guidance regarding deferral and ‘for publication’ document redaction.
Link to EMA CTR Questions & Answers Version 6.5, see Question 6.5, paragraphs 259-261.

My organization considers the dosing information for my CTIS project as Confidential (CCI). However, CTIS requires metadata fields be populated with dosing information. How can a Sponsor approach the protection of this information?

Depending on the phase of the trial, Sponsors can protect this information from becoming public despite certain CTIS fields requiring an input. Below is EMA’s guidance from section 3.3 “Q&A on the protection of Commercially Confidential Information and Personal Data while using CTIS”:

How can dose details be protected from disclosure from CTIS for certain trials category falling within category 2?
The main characteristics of medicinal products used in clinical trials falling under category 2 are subject to publication rules after a decision on the clinical trial application is issued by the first Member State concerned.

These main characteristics also include structured data fields on daily dose allowed and maximum dose allowed for the medicinal product under investigation. In some instances, depending on the trial development phase, dose details may be considered to be CCI. In such instances, sponsors can include ‘dummy data’ (e.g. 00 digits) in the related structured data field(s) of CTIS.

The full information on the posology should, however, be provided to the Member States for assessment in the document version ‘not for publication’ and can be redacted in the corresponding documents to be published.

This approach would be acceptable only on justified grounds, i.e. when the sponsor proves that the specific information on the posology is not in the public domain and constitutes patentable matter, the disclosure of which before a patent application is filed (typically, after the completion of the trial and during the trial readout) would jeopardize its protection.

This might be applicable for example to integrated phase I/phase II trials that are to be marked in CTIS as category 2 trials. The grounds for considering dose details as CCI should be clearly documented in the cover letter of the application.
Link to additional Guidance

Transparency: An integral part of the clinical trial process.

With the primary aim of fostering research innovation and improving transparency of the clinical trial process in Europe, the Clinical Trials Regulation (CTR) was implemented in January of 2022. To facilitate the implementation and management of the CTR, the EMA simultaneously launched a digital portal known as Clinical Trials Information System (CTIS). Beginning January 31, 2023, all new clinical trial applications must be submitted via the CTIS.

These changes are leading a cultural shift in clinical trial transparency [See: A Cultural Shift Is Happening], and the operational impact of this change needs to be discussed in further detail.

Why adopt transparency as a critical component in the clinical trial lifecycle?   

The EMA has elevated transparency and disclosure of in-process and approved trials to another level. With the CTR, transparency has become an integral part of the clinical trial process in Europe. Using the CTIS, a publicly accessible database, anyone interested in the clinical trials conducted in Europe can search to obtain useful information. Interested parties may be current patients being treated for a disease or chronic illness, participants of a clinical trial seeking detailed information regarding their trial, researchers who are seeking details about past and in-process trials and their results, and everyday citizens seeking knowledge of a disease or treatment.

Increased transparency will help to mitigate redundancy or duplication of clinical trial initiatives. If a clinical trial had negative or inconclusive results, the CTIS will make this trial information available to researchers and individuals. For pharmaceutical companies, this visibility will help in avoiding the repetition of unsuccessful study designs. Increased transparency also helps in improving the quality of data. In other words, the structure and workflow that the CTIS requires provide clarity and consistency in what stakeholders can expect when querying clinical trials in Europe.

In a nutshell, although the increased transparency requirements brought about by the CTR will trigger changes to pharmaceutical companies’ current workflows, embracing transparency as a high-value component in clinical trials will help to expedite research in a given therapy area while building trust and confidence with the public who seek to learn about and understand the trial itself.

Operationally, based on the information that is required in CTIS about a particular clinical trial, study sponsors need to assume all trial documents and information will become public, with few exceptions. All functional teams must understand that disclosure is now an integral part of the clinical process and prepare submission documents accordingly.

How can trial sponsors manage this change?

Over the next several years, as the industry embraces this change, the resulting impact on clinical teams and the risks brought about by the disclosure of clinical documents and Commercially Confidential Information (CCI) must be managed carefully. Study sponsors may need to conduct a review of existing processes such as medical writing practices and awareness training regarding transparency of confidential information. A gap analysis of roles, responsibilities, tasks, internal policies and procedures may be required. Study sponsors may choose to start with foundational aspects such as how end-to-end CTR information flow will be managed in addition to the preparation of public version documents. Further, a risk-based approach to anonymizing personal data will be a necessary step in the process before sharing trial results and patient narratives.

As of January 31, 2023, the use of CTIS for new trial applications is mandatory. Study sponsors need to assess their internal capabilities, map their updated business processes, prioritize action items and seek support from vendors who specialize in identification of personal data and can support the redaction and anonymization of this information in a timely manner.

Is there any support available to navigate these changes?

Transparency requirements have indeed disrupted the historical workflows previously implemented by study sponsors and CROs. Study sponsors are looking for more guidance during these times of change. The EMA has provided guidance documentation for study sponsors which is a good place to start. Additionally, to cope with the increased workload related to standardized document templates and document redactions, some companies are choosing to collaborate with external software and service providers such as Real Life Sciences (RLS).

Redactions can be less stressful when purpose-built digital tools are used to complete the document redaction and anonymization tasks. At RLS, solutions such as RLS Protect can help to automate the identification of personal data and confidential information and prepare public version documents. However, minimizing the need for redactions is one strategy that study sponsors can adopt. In the next blog of this series, let’s delve deeper into “lean” authoring...

Clinical Trial Anonymization: Maintaining Data Utility

Prior to Health Canada’s anonymization guidelines issued in 2019, subjective input from researchers helped in maintaining data utility, and the process was primarily conducted at the clinical level. At present, preserving data utility during the anonymization process must involve quantitative measurements at the document/data level. Similarly, it must include a well-defined and precise implementation of the selected rules to prevent over-redaction or over-anonymization. Health Canada provided these guidelines to maximize the release of analytically-valuable information.

Sponsors and Clinical Research Organizations (CROs) often consult experts at data transformation or anonymization such as Real Life Sciences (RLS) to conduct this process. At RLS, we follow a quantitative and metrics-driven process for preserving data utility for mandatory disclosures. This process is facilitated by the RLS PROTECT solution that is designed to balance the protection of patient identity and data utility. The solution supports and expedites regulatory submissions and further facilitates voluntary data-sharing projects.

Using this solution, RLS data anonymization specialists preserve data utility by taking a methodical approach that has six distinct steps. 

First, they determine the set of possible transformations across all identifiers to meet the 0.09 risk threshold in line with the EMA/HC guidance on anonymization techniques. This enables the data to preserve its clinical or research value after transformation, and it aligns with the guidelines issued by the European Medicines Agency (EMA) in 2018. Regulatory authorities such as Health Canada prefer anonymization of data over redaction. RLS has demonstrated its thought leadership by adopting this process early on. Overall, the preference is to employ transformations that allow for a higher level of granularity over outright redaction of data. This process yields multiple possible transformation scenarios across indirect identifiers.

The next step is to measure the ‘Risk of Re-identification’ (ROR) across all possible transformation scenarios. This ROR metric is used to filter transformation scenarios that meet the risk threshold. In this step, several options for a number of variables or identifiers are considered. (Health Canada requires a 0.09 threshold but it can be customized to meet thresholds recommended by other regulatory authorities as necessary.)
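To make this filtering step concrete, here is a minimal sketch. The article does not state how ROR is calculated, so the code assumes one common definition, maximum risk, computed as 1 divided by the size of the smallest group of records that share the same quasi-identifier values. The function names and the sample data are hypothetical.

```python
from collections import Counter

RISK_THRESHOLD = 0.09  # threshold cited in the text for Health Canada


def risk_of_reidentification(records, quasi_identifiers):
    """Maximum risk: 1 / size of the smallest equivalence class.

    An equivalence class is the group of records sharing the same values
    on every quasi-identifier. This definition is an assumption made for
    illustration, not a formula stated in the article.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return 1.0 / min(class_sizes.values())


def scenarios_meeting_threshold(scenarios, quasi_identifiers, threshold=RISK_THRESHOLD):
    """Keep only the transformation scenarios whose risk is at or below the threshold."""
    return {
        name: records
        for name, records in scenarios.items()
        if risk_of_reidentification(records, quasi_identifiers) <= threshold
    }


# Hypothetical data: 24 participants under two different transformations.
# Scenario 1 keeps exact ages, so every record is unique.
exact_age = [{"age": 30 + i, "sex": "F"} for i in range(24)]
# Scenario 2 generalizes age to one band, leaving two classes of 12 records.
age_band = [{"age": "30-59", "sex": "F" if i < 12 else "M"} for i in range(24)]

scenarios = {"exact_age": exact_age, "age_band": age_band}
passing = scenarios_meeting_threshold(scenarios, ["age", "sex"])
```

Under this assumed definition, a 0.09 threshold requires every equivalence class to contain at least 12 records, since 1/11 is slightly above 0.09. Only the `age_band` scenario passes here.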

In the third step, we prioritize transformation scenarios by the ‘Information Loss’ (IL) metric. This metric is used to identify the anonymization solution with minimal loss of data quality, and it guides the next steps in the process such as optimization and transformation. The IL metric is also used to rank the different transformation scenarios in terms of data utility/data loss.

For the fourth step, input from the client team plays a critical role. In this step, we optimize transformation options in consultation with clinical scientists, who independently prioritize quasi-identifiers and evaluate the Clinical Utility (CU) based on the context of the drug or condition under study. At this point, clinical scientists from the client team also further review the risk associated with medical events, including minor adverse events, found in the documents.

Then we focus on selecting the optimal transformation scenario using optimal ROR, IL, and CU trade-offs. This selection requires a thoughtful approach as too stringent criteria will suppress data from too many patients. If new information is found that was previously missed, the process may be updated to incorporate new measurements if necessary.
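A selection step of this kind can be sketched as a filter followed by a ranking. The scoring rule below (a weighted difference between clinical utility and information loss) is purely an illustrative assumption, as are the class, the function name, the weight and the sample scores. The article does not specify how the three measures are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical summary of one transformation scenario."""
    name: str
    ror: float               # risk of re-identification, lower is safer
    information_loss: float  # 0 (nothing lost) to 1 (everything lost)
    clinical_utility: float  # 0 to 1, as scored by clinical scientists


def select_scenario(scenarios, risk_threshold=0.09, utility_weight=0.5):
    """Pick the eligible scenario with the best utility-versus-loss score.

    Scenarios above the risk threshold are excluded first. The remaining
    ones are ranked by an assumed weighted score, not a regulatory formula.
    """
    eligible = [s for s in scenarios if s.ror <= risk_threshold]
    if not eligible:
        return None  # nothing meets the threshold; revisit the transformations

    def score(s):
        return utility_weight * s.clinical_utility - (1 - utility_weight) * s.information_loss

    return max(eligible, key=score)


candidates = [
    Scenario("A", ror=0.20, information_loss=0.05, clinical_utility=0.95),  # too risky
    Scenario("B", ror=0.08, information_loss=0.30, clinical_utility=0.80),
    Scenario("C", ror=0.05, information_loss=0.60, clinical_utility=0.70),  # very stringent
]

chosen = select_scenario(candidates)
```

Scenario C is the safest but loses the most information, which mirrors the point above about overly stringent criteria. With these made-up scores, scenario B is chosen.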

The last step is to measure implementation risks post-transformation, i.e. the precision of implementation, to ensure over-anonymization has not occurred. Anonymization is an iterative process, which means fine-tuning may be necessary upon manual review. Our data scientists and client team work together to ensure regulatory requirements are adequately met. This step-by-step approach ensures that the anonymization process does not have a detrimental effect on data utility and that any undesirable impacts on data can be caught in a timely manner.

Each clinical trial or program tries to solve a specific clinical problem, and as such, no two clinical trials are alike. A cookie-cutter approach to data transformation is oftentimes not the best way to go. RLS data scientists work closely with the client team and customize data solutions for queries at hand to meet quantitative regulatory requirements. This approach ensures quick approvals from the regulatory authorities and helps complete clinical trials within a given timeframe. Our specialists are available to further discuss data-related challenges and propose solutions.

A Cultural Shift is Happening Before our Eyes - The Impact of the European Medicines Agency's (EMA) CTR & CTIS on Clinical Trial Transparency

The rollout of the Clinical Trial Information System (CTIS) has resulted in a cultural shift in the ways in which study sponsor leaders and operational teams think about clinical trial transparency and how this impacts their planning and operations. A primary aim of the Clinical Trial Regulation (CTR) EU No 536/2014, improving clinical trial transparency, will help to enhance trust and confidence with the public and provide material benefit in aiding and improving research. However, the operational impacts of CTIS and of managing clinical trials in conformance with the Regulation in Europe are igniting changes that are far-reaching within the organization. These changes will be adopted through well-defined SOPs and a clear understanding of roles and responsibilities for clinical teams and their counterparts. Study sponsors that choose to embrace the change holistically by building a transparency-minded culture throughout the organization will wind up on top.

What Does History Tell Us?

Transparency requirements have evolved and progressed for years. In 2017, the European Medicines Agency (EMA) published external guidance for Policy 0070, and in 2019, Health Canada published a Public Release of Clinical Information (HC PRCI) guidance. The pharma industry has gradually become accustomed to disclosing trial results. Some have progressed to sharing beyond what is mandated by regional health authorities. The documents for sharing results may include trial synopsis, layperson summaries, and Clinical Study Reports (CSRs), for example. 

In 2022, the EMA launched the CTIS, and in 2023, its use will become mandatory for new clinical trials. The transparency requirements implemented as a result of the CTR have brought several questions and issues to the forefront.

What are the Aims of the Clinical Trial Regulation?

Few regulations have impacted pharmaceutical manufacturers more than Regulation (EU) No 536/2014, otherwise known as the Clinical Trials Regulation or CTR. The CTR aims to centralize the regulatory submission and review process for all trials conducted in the twenty-seven (27) European Union member states. The intent is to position Europe as a favorable region to conduct clinical trials while increasing the transparency of clinical trials information to the public at large.

The CTIS is the secure online portal that is used for the implementation and operation of the CTR. This portal facilitates interactions between study sponsors, researchers, regulatory bodies and ethics committees throughout the lifecycle of a clinical trial. The public can also access a subset of trial information from the portal. The effort to centralize and streamline the clinical trial process in Europe using the CTIS has had a widespread impact on the operational teams that coordinate and manage clinical trials. To provide further clarity, leading up to the launch of CTIS, the EMA published guidance on using the CTIS, which has completely changed current conversations around clinical trial transparency.

Why is the cultural shift happening? 

Apart from the use of CTIS, the transparency requirements alone have caused manufacturers to rethink their approach to clinical transparency and the ways their cross-functional teams think about and manage the regulatory process. For example, CTR’s transparency requirements have brought changes to what trial information and documentation is disclosed to the public and when. Five important needs draw our attention now: 

This combination of requirements has triggered study sponsors to rethink how their teams work together to achieve this new set of expectations. As a result, a common understanding of what clinical transparency means, how it’s implemented and supported, and ultimately how it can be embraced must be built not only through roles, procedures, and processes but also through how it is woven into the cultural fabric of the organization.

How do we embrace this cultural shift? 

Pharma leaders and regulatory affairs teams are looking for solutions that will help them adapt to the current evolving landscape and seamlessly integrate CTIS into their internal day-to-day workflow. These solutions include digital tools, streamlined processes, and expert teams that can help manage the transition. Real Life Sciences (RLS) has been at the forefront of developing and implementing solutions for current clinical transparency challenges.

This 6-part blog series will focus on the high-impact areas being realized by pharmaceutical manufacturers today. Our experts will share their perspectives and observations on how study sponsors can implement changes to improve operational efficiency and employee satisfaction while embracing the new world of clinical transparency we find ourselves in. Together, these focus areas make up the challenges and opportunities that are upon us in clinical transparency. 

Live Webinar: Planning for Publication of Trial Documents and Plain Language Summaries under the CTR

Today's Disclosure and Transparency teams are faced with new pressure points resulting from recent changes in the regulatory space and beyond. The CTR has had an immediate impact on authoring and redaction processes and how clinical teams work together. This session will highlight common pressure points and offer practical tools and solutions in support of preparing ‘for publication’ documents and Plain Language Summaries.

Register Here:

Date: Thursday, December 8th

Time: 10:00am EST

Are we ready for the Clinical Trial Information System (CTIS)?

Industry expectations from a regulatory perspective are evolving as the new Clinical Trial Information System (CTIS) for clinical trials in the European Union (EU) goes live. The launch of the CTIS on 31 January 2022 enabled sponsors to meet the requirements of the Clinical Trials Regulation (CTR). Beginning 31 January 2023, study sponsors can apply only under the CTR instead of the prior legislation (the Clinical Trials Directive).

But what’s the difference between CTR and CTIS? 

The CTR (EU) No 536/2014 is the new regulation that replaces and expands the EU Clinical Trials Directive 2001/20/EC. The new regulation focuses on three main aims - fostering a favorable environment for conducting clinical trials in the EU, ensuring the highest standards of safety for the study participants, and increasing transparency of clinical trial information i.e. data sharing. The next step for CTR is the Accelerating Clinical Trials in the EU (ACT EU) initiative that focuses on increasing transparency in data sharing among various study sponsors.

The CTIS is a digital portal and database built especially to facilitate the implementation of the CTR. Study sponsors can submit the applications using the portal and also send documents to regulatory authorities throughout the life cycle of their clinical trial. The member states of the EU will use this portal for conducting their daily business processes. The CTIS digital portal will streamline communication between study sponsors and member states of the EU.   

Are we ready for the CTIS?

Most study sponsors are dealing with a time-sensitive question - are we ready for the CTIS?

As of July 2022, 195 clinical trial applications had been submitted through the CTIS. By comparison, approximately 4,000 clinical trials were approved each year in earlier years. Clearly, study sponsors are still seeking more information and reorganizing their internal processes to make submissions via the CTIS.

The CTR has introduced new standards of transparency and disclosure of in-process and completed trials. Study sponsors now have four major disclosure considerations before their clinical trial application (CTA) submissions. First, personal data and commercially confidential information (CCI) are exempt from disclosure. Study sponsors need to determine what constitutes the CCI for their studies. Secondly, public and non-public versions of documents need to be submitted simultaneously. 

Next, after the approval of the study, the documents will become available to the general public. If a deferral was requested, the documents will not be published at that point. Instead, they will be published per the approved deferral timelines. An additional note to remember is that the timelines for pediatric studies and adult studies are different. Lastly, a plain language summary must now be provided for Phase 2-4 trials within 12 months of the close of the trial.

Although there are four major disclosure considerations, each of them has several decision points (e.g. what constitutes the CCI?) and these decisions can have an impact on the entire lifecycle of the study. Study sponsors may benefit from having regulatory affairs teams in-house or consulting external agencies that can provide expert opinions about regulatory affairs. Since this is a time-sensitive endeavor, several study sponsors are currently looking for more information.    

Where can I find more information? 

Some study sponsors struggle with finding reliable and clear information about the CTIS. The European Medicines Agency website is a good place to start. It includes various guideline documents and videos to inform about the new regulations. However, some of these rules can be complicated, and seeking advice from experienced professionals with several years of regulatory experience usually helps.      

Recently, at the Clinical Data Disclosure Day 2022, a virtual webinar series, the Real Life Sciences (RLS) team presented the CTIS readiness webinar. It offered practical tools and reference materials to meet the disclosure-related requirements during the time crunch. RLS experts also shared insights about how to plan for the publication of trial documents under the CTR. For a copy of the recording, contact RLS.

Clinical Trial Transparency: The Relationship Between Data Utility & The Risk of Re-Identification

Anonymizing patient information is crucial for privacy protection. Due to data sharing initiatives across research institutes and research projects, anonymization has become an important step in clinical research. When research studies require expensive diagnostic resources such as fMRI or genetic testing, research organizations come together to share data for secondary analysis. Regulatory bodies such as the US Food and Drug Administration, Health Canada and EMA encourage data sharing initiatives.

Although there are several factors that contribute to decisions around what and how to anonymize patient information, regulatory requirements continue to be one of the most important factors. Depending on the regulatory requirements, disclosures and submissions have their own unique nuances. As such, the transformation approaches used in one submission may not be applicable to others, and a different approach may be needed.

Since 2019, Health Canada has strongly encouraged employing quantitative risk modeling methodologies. The European Medicines Agency has suggested similar guidelines. These guidelines are reinforced not only by Policy 0070 but also by the Clinical Trials Regulation (Regulation (EU) No 536/2014). In particular, they favor pseudonymization of identifiers over outright suppression or redaction in order to preserve data utility. When sample sizes are small, i.e. less than twenty patients, quantitative risk assessments will often yield the decision to outright suppress many of the identifiers. But the quantitative modeling process can help by generating transformation options for each identifier and providing the supporting evidence and rationale for the anonymization approach taken.

A fundamental problem in privacy-preserving data disclosures is how to make the right tradeoff between protecting patient privacy and data utility. Broadly speaking, patient privacy and data utility may have an inverse relationship. Researchers have observed that even for modest privacy gains complete destruction of the data utility may be needed. Another possible unintended consequence is that excessive protection of privacy, using techniques such as suppression (or removing data), can give misleading results and that could pose a public health risk.

These challenges can be tackled with the help of machine learning technology. Let’s elaborate using Figure 1.      

                       Figure 1. Finding an Acceptable trade-off

In Figure 1, data utility is represented on the x-axis and privacy protection is represented on the y-axis. For researchers, preserving data utility while maintaining patient privacy is the ideal situation. But often the ‘ideal situation’ of maximum patient privacy protection and maximum data utility may be impossible to achieve as it is ill-defined. Researchers try to focus on finding an acceptable ‘trade-off’ or the sweet spot where adequate data is retained without compromising patient privacy.  

With the help of statistical modeling, instead of an ideal case scenario, an acceptable trade-off may be computed. This process can sometimes become iterative depending on how much data is lost in each round of quantitative modeling. Therefore, anonymization approaches must counterbalance the level of anonymization with the level of information loss. Typically, this will entail anonymizing certain identifiers with greater levels of anonymization than others, i.e. a tradeoff is made between levels of anonymization across identifiers to maximize the data utility and minimize the loss of information.

At Real Life Sciences (RLS), we have developed the industry-leading, purpose-built anonymization platform RLS Protect. The RLS team of data anonymization experts collaborates with clients to determine the balance between the risk of re-identification and data utility. We welcome any and all queries related to your clinical trial transparency efforts.

Anonymization Techniques

Data sharing and secondary analyses of research data are becoming common practices. Disclosure of individual-level participant data has become the new normal. While sharing data from clinical trials, sponsors and clinical research organizations (CROs) use various techniques to protect the privacy and confidentiality of patients. These techniques are commonly referred to as anonymization techniques. These techniques have their pros and cons. Some techniques protect patient data but make secondary analyses difficult, while others make secondary analyses possible but carry a risk of patient re-identification. 

Here we discuss further details of four popular anonymization techniques.  

Anonymization techniques in clinical research

Earlier, the most common method for data anonymization was suppression. Suppression involves redacting or removing the data, so it is no longer readable. This is also known as “Masking” or “Redaction”. However, a major disadvantage of this procedure is that it decreases the data utility for secondary analyses. The research community needs alternative statistical approaches that help preserve data utility along with low re-identification risk.  

One of the popular methods is “generalization”. Generalization involves making patient identifiers less granular or more general. For instance, instead of specific patient ages, age ranges are shared. Or instead of specific geographical locations, broader areas such as state, country, or even continent are disclosed. This method is useful when data is relatively homogenous and there are no extreme outliers. Outliers may pose a risk of re-identification of patients.  
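
The age example above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `generalize_age` and the five-year band width are assumptions made for the example, and in practice the band width would come out of the risk assessment.

```python
def generalize_age(age: int, width: int = 5) -> str:
    """Replace an exact age with the fixed-width band that contains it."""
    if age < 0 or width < 1:
        raise ValueError("age must be non-negative and width at least 1")
    lower = (age // width) * width          # start of the band, e.g. 40 for age 42
    return f"{lower}-{lower + width - 1}"   # inclusive band, e.g. "40-44"


# Hypothetical sample ages; 42 and 44 end up sharing one band.
ages = [42, 44, 47, 61]
bands = [generalize_age(a) for a in ages]
```

A wider band places more patients in the same group, which lowers re-identification risk at the cost of granularity.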

Another commonly practiced method is pseudonymization. In pseudonymization, new data elements replace existing data. For clinical trials, it involves re-coding a value such as a Patient ID into a different number. This method is typically used for directly identifying variables such as patient ID, medical record numbers, or even phone numbers (e.g., patient ID 5280 will be shared as 2805). One major advantage of this method is that it keeps the link across a study participant's entire data intact. However, it still carries a significant re-identification risk.
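
One way to re-code a patient ID consistently is a keyed hash. The sketch below is an assumption for illustration only: the text does not say how the replacement ID in its example is derived, and the function name, key handling and output length here are hypothetical. It uses Python's standard `hmac` and `hashlib` modules.

```python
import hashlib
import hmac


def pseudonymize_id(patient_id: str, secret_key: bytes, length: int = 12) -> str:
    """Derive a stable pseudonym from a patient ID using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]      # shortened hex string used as the new ID


# Hypothetical key; in practice it would be stored separately from the shared data.
key = b"example-study-key"
first = pseudonymize_id("5280", key)
second = pseudonymize_id("5280", key)       # same input and key, same pseudonym
other = pseudonymize_id("5281", key)
```

Because the same ID always maps to the same pseudonym, records for one participant stay linked. Anyone holding the key can regenerate the mapping, which is one reason pseudonymized data still carries re-identification risk.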

A less popular technique is random noise. This approach involves adding or subtracting random amounts from numeric or date-oriented identifiers to make it difficult to determine the original values. Data scientists may use specific methods such as additive or multiplicative noise when introducing noise. This procedure allows them to reduce the re-identification risk even when outliers or influential observations are present in the dataset. Because this method requires data science expertise, finding suitable experts in a niche domain may not be easy.
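
Additive and multiplicative noise can be sketched as follows. This is an illustrative assumption rather than a recommended configuration: the function names, the uniform distribution and the noise bounds are all chosen for the example, and a real project would select them as part of the risk assessment.

```python
import random


def additive_noise(values, max_shift, rng):
    """Shift each value by a random amount in [-max_shift, +max_shift]."""
    return [v + rng.uniform(-max_shift, max_shift) for v in values]


def multiplicative_noise(values, max_fraction, rng):
    """Scale each value by a random factor in [1 - max_fraction, 1 + max_fraction]."""
    return [v * (1 + rng.uniform(-max_fraction, max_fraction)) for v in values]


rng = random.Random(2024)                   # fixed seed only so the example is repeatable
weights_kg = [58.0, 72.5, 91.0]             # hypothetical measurements
shifted = additive_noise(weights_kg, max_shift=2.0, rng=rng)
scaled = multiplicative_noise(weights_kg, max_fraction=0.05, rng=rng)
```

Additive noise perturbs every value by a similar absolute amount, while multiplicative noise perturbs large values more than small ones.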

Regulatory requirements

Apart from a variety of approaches, sponsors and CROs need to be mindful of regulatory requirements and industry standards. There are suggested data standards from the Clinical Data Interchange Standards Consortium (CDISC), such as the Study Data Tabulation Model (SDTM) and the Analysis Data Model (ADaM), and anonymization guidance for data transformation from industry consortia/working groups (e.g. PhUSE). After thoughtful deliberation, various authorities selected these standards because they provide a reasonable tradeoff between removing risk and preserving data utility, and are reasonable to implement with current technologies and tools. The choice of transformations for a variable depends on:

Sponsors/CROs make anonymization decisions after careful consideration to ensure that they meet regulatory requirements. For example, since 2020, Health Canada has discouraged redactions to promote data sharing. Along with clinical scientists, data scientists weigh in to pick the right anonymization technique. If data expertise is not available in-house, CROs often choose to collaborate with an external data agency such as Real Life Sciences. At RLS, our team of experts treats each clinical trial dataset as a unique challenge and provides customized anonymization solutions. We keep up with current regulatory requirements and suggest solutions accordingly. These solutions have helped our clients get quick approval from the regulatory authorities. Connect with us to further discuss clinical trials and the solutions we offer.

Maximizing Data Utility In Clinical Transparency: Outlier Patients

Regulatory authorities such as Health Canada now encourage data sharing for secondary analyses and derivative research. Data sharing with other institutions needs to be done in a safe manner while protecting patient privacy and confidentiality. One method commonly adopted in the recent past was redacting the personally identifiable information. But this approach reduces the data utility. Health Canada now encourages quantitative risk assessment while anonymizing data for safe data sharing.

A common practice in data anonymization is to transform personal information, such as an actual age, into a range in which the actual value falls. For instance, prior to data anonymization, the age variable in a dataset holds a patient's actual age, such as 42 years. During the data anonymization process, 42 years is replaced by a range of 40-45 years. The age range is defined as part of the risk assessment process. Since more individuals in the dataset will fall within that range, the risk of re-identification is quite low.

This common practice of data anonymization is not well suited to data points that are extreme. These extreme observations are termed ‘outliers’ and need to be dealt with before the dataset is analyzed. The central tendency of a dataset is indicated by the average or the mean. For instance, suppose the average age of patients in a dataset is 49 years. This dataset may nevertheless include individuals who are 98 years of age or 18 years of age. These extreme values are outliers because they lie far away from the mean.

Outliers pose several problems for data analysis. Outliers skew the averages for the group, which may then not be a true representation of the patient group or the control group. Also, these extreme values may lead to invalid results when various statistical tests are conducted to analyze the dataset. Sometimes outliers are simply due to errors in data entry. Hence, sponsors/CROs check data periodically to ensure that correct data is entered and, if possible, repeat a measurement to confirm the value.

The bigger problem regarding outliers from a patient privacy perspective is the risk of re-identification. These outliers are a single patient or a small set of patients with extreme attributes, such as a single elderly patient aged 98 years. In these cases, data utility will often suffer because the patient will not fit into any equivalence class with others. Providing an age range would not be a useful solution, since an adversary may still attempt to re-identify an outlier when a range is provided. The HIPAA ‘safe harbor’ method addresses this for eighteen types of identifiers, such as names, phone numbers and ages over 89. Safe harbor means the data for these eighteen identifiers is removed from the dataset or, in the case of ages over 89, grouped into a single category of 90 or older.

This strategy often leads to suppression of that attribute for all patients, greatly degrading the data utility. One solution to this issue is to allow the system to treat some patients as outliers and suppress their entire record (all attributes). For instance, a sponsor would suppress all attributes of the single elderly patient. While one might lose a little data utility for that single patient, it might lead to a greater increase in data utility by allowing us to retain age for the rest of the patients.
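
The record-level approach can be illustrated with a small sketch. It assumes ages have already been generalized into bands and that a record counts as an outlier when its equivalence class (the group of records sharing the same indirect identifier values) has fewer than k members. The function name, field names and sample data are hypothetical.

```python
from collections import Counter


def suppress_outlier_records(records, quasi_identifiers, k=11):
    """Split records into (kept, suppressed) based on equivalence class size."""
    def class_key(rec):
        return tuple(rec[q] for q in quasi_identifiers)

    class_sizes = Counter(class_key(r) for r in records)
    kept = [r for r in records if class_sizes[class_key(r)] >= k]
    suppressed = [r for r in records if class_sizes[class_key(r)] < k]
    return kept, suppressed


# Twelve similar patients plus one elderly outlier (made-up data).
records = [{"age_band": "45-49", "sex": "F"} for _ in range(12)]
records.append({"age_band": "95-99", "sex": "F"})
kept, suppressed = suppress_outlier_records(records, ["age_band", "sex"])
```

Here the single 95-99 record is withheld in full, while the age band is retained for the other twelve patients.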

Using this approach to increase data utility across smaller populations, which are often more sensitive to single outliers, is beneficial for data sharing.

Data related decisions can be difficult as they may have unintended consequences down the road. Sponsors/CROs often engage expert services for consulting and addressing queries from regulatory bodies. Real Life Sciences (RLS) provides expert advice on challenges such as outliers to meet the requirements of regulatory authorities. RLS data scientists help in identifying outliers and then suppressing their data as needed.  

Maximizing Data Utility In Clinical Disclosure: Reference Populations

Choosing the right reference population maximizes data utility. With the right reference population in place, the right approaches and methods further facilitate data sharing and secondary analyses. Without a reference population, the risk of re-identification of patients is real.

When patients are anonymized, researchers focus on removing uniqueness, that is, blending in any unique patients remaining in a given dataset. When other similar people with the same characteristics exist in the dataset, an adversary might not be able to definitively re-identify a patient. A widely accepted numerical threshold for similar individuals is 11. (Here, similarity is judged with regard to indirect identifiers.) Different ways to select reference populations are worth further discussion.

The risk of re-identification

Reference population means “the group of individuals used to determine the risk of re-identification”. The reference population of a dataset is the set of persons who are considered to have ‘similar’ characteristics to those being modeled for risk. Here ‘similar’ refers to suffering from the same disease(s) as in the trial in question, sharing demographics, time period and location, as well as having participated in a clinical trial for the indication/treatment in question.

Sponsors and Clinical Research Organizations (CROs) need to meet the regulatory requirement of a 0.09 risk threshold, which demonstrates that there are at least 11 individuals with each set of identifying attributes. This is equivalently expressed as a cell size of 10 individuals. There are multiple ways to demonstrate that there are at least 11 individuals with a set of characteristics, but using the k-anonymity model (k=11 threshold) is one of the common ways.
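
The link between the 0.09 threshold and the count of 11 follows from measuring risk as one over the size of the smallest group of similar individuals: 1/11 is roughly 0.0909, conventionally quoted as 0.09. A minimal sketch of that arithmetic, with hypothetical function names and made-up group sizes:

```python
def max_reidentification_risk(class_sizes):
    """Worst-case risk: one over the size of the smallest equivalence class."""
    return 1 / min(class_sizes)


def meets_k_anonymity(class_sizes, k=11):
    """True when every equivalence class has at least k members."""
    return min(class_sizes) >= k


sizes_ok = [11, 15, 40]                             # smallest group has 11 members
sizes_bad = [10, 15, 40]                            # smallest group has only 10
risk_ok = max_reidentification_risk(sizes_ok)       # 1/11
risk_bad = max_reidentification_risk(sizes_bad)     # 1/10
```

The check is written against the group size rather than the rounded 0.09 figure, because 1/11 is slightly above 0.09 before rounding.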

In 2019, Health Canada issued further guidelines about reference population selection. Its guidance document states that:

“The selection of the appropriate reference population determines the total patient group size and the amount of anonymization (i.e. data transformation) that is necessary to reduce the risk of patient re-identification. The reference population can be informed from patients in the single trial in question (smallest population), all patients in similar trials by a specific study sponsor, all patients in similar trials (e.g., by disease or therapeutic intervention category), or all patients in a geographic area (largest population).

When the appropriate reference population is one other than the single trial in question, an extrapolation of the trial population can be applied to achieve an estimate of the population size. In keeping with the first and second guiding principle, risk of re-identification should be informed not only by the number of individuals in a single study, but also by the number that reflects real-world risk.”

Determining a reference population

Since each dataset is unique, a blanket method cannot be applied to selecting the reference population. The selection is done on a case-by-case basis. CROs typically conduct trials with small sample sizes and focus on indications with limited established and pre-existing research. These factors drive the decision-making when selecting reference populations and must be evaluated prior to making the selection.

There are four possibilities for selecting a reference population. The most conservative one is where each individual study serves as its own population. Another method is to combine all the participants. In this method, all the studies in the submission together serve as a “pooled” population. The third method goes beyond the submission and it includes all the studies in the submission plus other recent similar studies for the same indication/treatment. Lastly, the least conservative selection would be having larger geographic / prevalence estimates. 

For rare and ultra-rare diseases, a more conservative approach is usually adopted while selecting the reference population. When each individual study serves as its own population, researchers do not assume a wider disease population. This allows them to focus the risk measurement only on the population of the study. Conversely, if there are additional studies with sufficient patient counts, outside of the study population, then geographical estimates are used. But often in the case of ultra-rare diseases that are yet to be extensively researched, such populations are not found. Please see our case study on working with rare disease populations.

As you can see, selecting a reference population requires a nuanced understanding of the dataset at hand. If the reference population is selected using overly conservative criteria, the risk of re-identification is lower but more anonymization is needed, whereas less conservative criteria may increase the risk of re-identification. At Real Life Sciences, data science experts can provide insights to make the decision-making process easier. Based on the input from sponsors, we provide recommendations for selecting the reference population. Furthermore, these experts can later anonymize sensitive variables in the dataset per regulatory requirements. We welcome further discussions about clinical trial datasets for rare or common diseases.