
Quantitative Anonymization and Clinical Data Sharing for ChAdOx1 nCoV-19, by the University of the Witwatersrand and Real Life Sciences, LLC

Study Summary

The Wits Vaccines and Infectious Diseases Analytics (VIDA) Research Unit is dedicated to advancing global health through high-quality research focused on combating infectious diseases, with an emphasis on vaccine development and evaluation. A notable example is its ChAdOx1 nCoV-19 vaccine trial in South Africa, which assessed the safety and immunogenicity of the AstraZeneca COVID-19 vaccine in both HIV-positive and HIV-negative adults. The study found that the vaccine was well tolerated and induced robust immune responses in both groups, with higher immunogenicity in SARS-CoV-2 seropositive participants. It also elicited cross-reactive binding antibodies against the Beta variant as well as the wild-type virus, suggesting potential efficacy against multiple strains. Because South Africa has one of the world’s highest HIV prevalence rates, the trial provided critical insights into vaccine performance in a diverse population, underscoring the importance of inclusive vaccination strategies in addressing global health challenges.

Data Anonymization Overview:

While completing the ChAdOx1 nCoV-19 Project, the University of the Witwatersrand (Wits VIDA) acted to fulfill its trial transparency obligations under the Gates Foundation Open Access Policy.

The Gates Foundation, the funding partner of the Wits VIDA ChAdOx1 nCoV-19 Project, states in its Open Access Policy that research results must be openly accessible immediately upon completion.

For the University of the Witwatersrand, a critical success criterion was anonymizing and sharing participant-level data so that it could be used in secondary research while protecting participant privacy. Making the anonymized trial data accessible to qualified researchers from academic institutions and/or pharmaceutical companies allows further insights to be gained from the trial without compromising participant privacy. This effort aims to advance secondary research, enhance the scientific value of the data, and preserve its clinical relevance. The anonymized datasets are hosted on the Vivli platform, which offers secure, controlled access and built-in data analysis tools that support in-depth exploration of the trial findings.

Data Anonymization Approach:

Wits VIDA partnered with Real Life Sciences, LLC (RLS), a leader in quantitative, risk-based data anonymization, to carry out this critical task. RLS applied its advanced anonymization techniques through its flagship solution, RLS Protect, striking a careful balance between preserving data utility and safeguarding participant privacy. The RLS team used the k-anonymity (k-map) privacy model to meet the needs of this project.
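To make the privacy model more concrete, here is a minimal Python sketch of sample-based k-anonymity, assuming each participant is reduced to a tuple of quasi-identifier values. Records that share the same tuple form an equivalence class; the dataset's k is the size of its smallest class, and each record's re-identification risk is taken as 1 divided by its class size. Under k-map, the variant named in the text, class sizes would instead be counted against a wider reference population, which this sketch does not model. The function names and toy records are hypothetical and are not taken from RLS Protect.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

Record = Tuple[Hashable, ...]  # one participant's quasi-identifier values


def equivalence_class_sizes(records: Sequence[Record]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(records)


def k_of(records: Sequence[Record]) -> int:
    """k is the size of the smallest equivalence class in the dataset."""
    return min(equivalence_class_sizes(records).values())


def per_record_risk(records: Sequence[Record]) -> list:
    """Risk of each record, taken as 1 / (size of its equivalence class)."""
    sizes = equivalence_class_sizes(records)
    return [1 / sizes[r] for r in records]


if __name__ == "__main__":
    # Made-up records: (age band, sex, race); not data from the trial.
    toy = [("30-39", "F", "Black")] * 3 + [("40-49", "M", "Black")] * 2
    print(k_of(toy))                  # 2
    print(max(per_record_risk(toy)))  # 0.5
```

In this framing, generalizing or suppressing quasi-identifiers merges small classes into larger ones, which raises k and lowers the maximum per-record risk.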

The Project:

The data anonymization project took three (3) weeks from initiation to delivery of the final anonymized data. Within this span, RLS and Wits VIDA held a project kickoff meeting, and RLS led the data analysis, created the anonymization plan, and processed the data. RLS reviewed and confirmed the anonymization options and outputs with the Wits VIDA research team in a few collaborative discussions. This collaboration between RLS and Wits VIDA was crucial for staying aligned with the project's goals and timeline.

Table 1 Wits VIDA Case Study

Data Analysis:


During the initial data analysis, RLS identified several factors that shaped the quantitative anonymization:
• Variable Discrepancy with Participant IDs:
Subject identifier attributes varied widely across datasets. To keep each participant's records consistent and linkable across all datasets, the subject identifiers had to be reformatted. For example, in one subset of the 49 anonymized datasets the record_id attribute had to be renamed to patient_id, and in another subset a patient_id attribute had to be derived from the subject_id attribute.
• Indirect Attributes:
Because the participants' sociodemographic characteristics were important to the study, the goal was to preserve those attributes within the parameters of the risk assessment, as long as they did not pose a risk of re-identification. These included Age, Sex, Weight, and Height, as well as attributes that required additional modeling, such as pregnancy status and social history (smoking, drinking, or drug use).
• Identifying Attributes:
RLS and Wits VIDA identified all attributes with potential re-identification risk, such as hidden verbatim text or personal information. These variables were handled carefully to eliminate re-identification risk while preserving clinical utility.
• Dates:
All participant-associated dates were offset by a random value between 1 and 364 days. Each participant received their own random offset, which was applied to all of their dates, so the original durations between events were preserved.
• Adverse Events (AEs), Concomitant Medications (CMs), and Medical History (MH):
These data were evaluated to determine the best approach for balancing patient protection and clinical utility.
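The ID and date handling described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's code: the renaming rules in the hypothetical `harmonize_id` helper (the actual `subject_id` derivation is not described), the forward direction of the date shift, and all field names are assumptions. What it shows is that drawing one offset per participant and reusing it in every dataset keeps the durations between events intact.

```python
import random
from datetime import date, timedelta


def harmonize_id(row: dict) -> dict:
    """Return a copy of a row whose participant key is always 'patient_id'.

    Hypothetical rules for illustration: 'record_id' is renamed to 'patient_id',
    and 'patient_id' is derived from 'subject_id' by copying it unchanged.
    """
    out = dict(row)
    if "record_id" in out:
        out["patient_id"] = out.pop("record_id")
    elif "subject_id" in out and "patient_id" not in out:
        out["patient_id"] = out.pop("subject_id")
    return out


def draw_offsets(patient_ids, rng: random.Random) -> dict:
    """Draw one offset in [1, 364] days per participant, to be reused in every dataset."""
    return {pid: rng.randint(1, 364) for pid in sorted(set(patient_ids))}


def shift_dates(row: dict, date_fields, offsets: dict) -> dict:
    """Shift every listed date field of a row by that participant's offset.

    Assumption: dates are shifted forward; the text does not state a direction.
    """
    out = dict(row)
    delta = timedelta(days=offsets[out["patient_id"]])
    for field in date_fields:
        if out.get(field) is not None:
            out[field] = out[field] + delta
    return out


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed only to make the example reproducible
    # Two toy datasets that key the same (made-up) participant differently.
    visits = [harmonize_id({"record_id": "P001", "visit_date": date(2020, 7, 1)})]
    events = [harmonize_id({"subject_id": "P001", "onset_date": date(2020, 7, 15)})]
    offsets = draw_offsets([r["patient_id"] for r in visits + events], rng)
    v = shift_dates(visits[0], ["visit_date"], offsets)
    e = shift_dates(events[0], ["onset_date"], offsets)
    # The 14-day gap between the visit and the event onset survives the shift.
    print((e["onset_date"] - v["visit_date"]).days)  # 14
```

Drawing offsets per participant rather than per record is what keeps intervals such as the time between a study visit and an adverse event onset analyzable after shifting.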

Anonymization Plan:

Following data evaluation, RLS and Wits VIDA formulated the anonymization plan by defining participant IDs, quasi-identifiers, date variables, AE/MH/CM variables, identifying variables, verbatim text variables, and the links between each participant's records across datasets. Choosing a privacy model and risk threshold involved careful discussion. Although regulatory disclosure projects often have prescribed risk thresholds, in a data-sharing project such as the ChAdOx1 nCoV-19 Project the trial sponsor can define its own. RLS and Wits VIDA aimed to avoid an overly conservative approach while ensuring low re-identification risk, taking into account the study population, the quasi-identifying attributes, and the controlled data access requirements.


Based on these factors, RLS recommended the k-anonymity model with a 0.33 risk threshold. This approach balanced data accessibility for researchers with privacy protection for participants, facilitating responsible data sharing for secondary research.
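Under k-anonymity, a record in an equivalence class of size f has a re-identification risk of 1/f, so a risk threshold translates into a minimum class size. The Python sketch below assumes the threshold is read as a rounded 1/k, which maps 0.33 to k = 3; a strict 1/k ≤ 0.33 comparison would instead demand k = 4, because 1/3 ≈ 0.333 is slightly above 0.33. It also reports maximum and average risk across records. The helper names and toy data are hypothetical.

```python
from collections import Counter


def k_for_threshold(threshold: float) -> int:
    """Minimum equivalence-class size implied by a risk threshold.

    Assumption: the threshold is read as a rounded 1/k, so 0.33 -> 3 and
    0.09 -> 11 (a strict 1/k <= threshold test would demand k = 4 for 0.33).
    """
    return round(1 / threshold)


def risk_summary(qi_records) -> tuple:
    """Return (maximum, average) per-record risk, where risk = 1 / class size."""
    sizes = Counter(qi_records)
    max_risk = 1 / min(sizes.values())
    # Mean of 1/f over all records equals (number of classes) / (number of records).
    avg_risk = len(sizes) / len(qi_records)
    return max_risk, avg_risk


def meets_threshold(qi_records, threshold: float) -> bool:
    """True when every equivalence class has at least k_for_threshold(threshold) records."""
    return min(Counter(qi_records).values()) >= k_for_threshold(threshold)


if __name__ == "__main__":
    toy = [("30-39", "F")] * 3 + [("40-49", "M")] * 4  # made-up classes of size 3 and 4
    print(k_for_threshold(0.33))       # 3
    print(meets_threshold(toy, 0.33))  # True
    max_risk, avg_risk = risk_summary(toy)
    print(round(max_risk, 3), round(avg_risk, 3))  # 0.333 0.286
```

A threshold of 0.33 is permissive compared with thresholds used for public regulatory releases, which is consistent with the text's point that controlled access on Vivli allowed a less conservative setting.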

Data Processing and Output Metrics:

Below is a summary of how a subset of attributes was anonymized during the quantitative analysis. Note that the examples and values stated here have been modified from their original values to protect the anonymity and privacy of trial participants.

• Participant IDs: IDs are restructured and pseudonymized consistently across all datasets.
Table 2 Case Study
• Quasi-Identifiers: Seven primary quasi-identifiers were present in the datasets: Age, Gender, Race, Weight, Height, Date, and Body Mass Index (BMI).


- Age: Generalized to 10-year age bands
- Race and Gender: Retained
- Dates: Offset by 1 to 364 days
- Height and Weight: Generalized to intervals of 10
- BMI: Generalized to the categories <18.5, 18.5-24.9, 25-29.9, and ≥30

To achieve this level of data utility while staying within the acceptable risk threshold, it was necessary to suppress the quasi-identifiers for 4.23% of the study population.
• Adverse Events, Concomitant Medications and Medical History: Following evaluation and risk assessment, it was decided to redact verbatim terms and retain all other terms, including all coded, lower-level, and higher-level terms. This approach preserves the clinical utility of the study while safeguarding patient privacy.
• Identifying Variables: One hundred and sixty-eight variables were classified as identifying and were therefore redacted. These variables could contain verbatim text or comments that directly identify a participant.
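The generalization and suppression steps for the quasi-identifiers can be sketched in Python as follows. The bands follow the list in the text (10-year age bands, intervals of 10 for height and weight, four BMI categories); treating the BMI categories as half-open intervals [18.5, 25) and [25, 30) so that no value falls between them, the field names, and the rule of suppressing a whole record's quasi-identifiers when its class is smaller than k are assumptions. Dates are omitted because they were handled by offsetting. Dedicated anonymization tools may search over generalization levels and suppress individual values instead; this sketch only shows how a suppression rate such as the 4.23% reported above can be computed.

```python
from collections import Counter


def age_band(age: int) -> str:
    """10-year age band, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def interval_10(value: float) -> str:
    """Interval of width 10, e.g. 172.4 -> '[170, 180)'. Units are not stated in the text."""
    low = int(value // 10) * 10
    return f"[{low}, {low + 10})"


def bmi_category(bmi: float) -> str:
    """BMI category; boundaries treated as half-open intervals (assumption)."""
    if bmi < 18.5:
        return "<18.5"
    if bmi < 25:
        return "18.5-24.9"
    if bmi < 30:
        return "25-29.9"
    return ">=30"


def generalize(row: dict) -> tuple:
    """Generalized quasi-identifier tuple; race and gender are retained as-is."""
    return (age_band(row["age"]), row["gender"], row["race"],
            interval_10(row["height"]), interval_10(row["weight"]),
            bmi_category(row["bmi"]))


def suppress_small_classes(rows, k: int):
    """Suppress (replace with None) the quasi-identifiers of records in classes smaller than k.

    Returns the released quasi-identifier tuples and the fraction of records suppressed.
    """
    qis = [generalize(r) for r in rows]
    sizes = Counter(qis)
    released = [qi if sizes[qi] >= k else None for qi in qis]
    rate = sum(qi is None for qi in released) / len(released)
    return released, rate


if __name__ == "__main__":
    # Made-up participants, not trial data.
    toy_rows = [
        {"age": 31, "gender": "F", "race": "Black", "height": 161, "weight": 62, "bmi": 23.9},
        {"age": 35, "gender": "F", "race": "Black", "height": 165, "weight": 68, "bmi": 24.1},
        {"age": 52, "gender": "M", "race": "White", "height": 181, "weight": 95, "bmi": 29.0},
    ]
    released, rate = suppress_small_classes(toy_rows, k=2)
    print(f"{rate:.1%}")  # 33.3%
```

Coarser bands shrink the number of distinct classes and therefore the suppression rate, which is the trade-off between generalization and suppression that the anonymization plan had to balance.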

The Results:

In addition to the anonymized datasets, RLS provided Wits VIDA with a detailed anonymization report. This report described how each data attribute was modified, including the generalization, anonymization, or redaction applied to each variable. Non-sensitive attributes were retained in their original form, which the report also notes. Together, these details gave the Wits VIDA research team an organized, structured basis for reviewing and assessing the anonymized results. The data's overall utility remained reasonably high because of the study's focus on COVID-19, the large population size, and the applied risk threshold, which together minimized the need for suppression or generalization of attributes such as age, height, weight, and BMI.
A key part of the anonymization process was the risk analysis, which measured re-identification risk before and after anonymization. This analysis can be reviewed in the RLS Protect solution or in the anonymization report.

Table 3 Wits VIDA Case Study

After receiving the completed datasets and anonymization report from RLS, the Wits VIDA team completed their final review and posted the anonymized datasets and anonymization report to the Vivli data repository, where they are immediately visible to secondary researchers.

Accessible Trial Results for Secondary Research:

Vivli, a non-profit organization, manages a data-sharing and analysis platform that the research community can use to share individual participant data from completed clinical trials. The Vivli platform includes an independent data repository, an in-depth search engine, and a secure research environment. Users can search listed studies, request datasets from data contributors, aggregate data, or share their own data. Wits VIDA selected Vivli’s secure platform to make its anonymized data available to researchers. “Combined with other similar COVID-19 trial data, the anonymized Wits VIDA trial data provides significant value to those on-going and future COVID related studies,” said Julie Wood, COO at Vivli.

In Closing:

The fast-moving ChAdOx1 nCoV-19 data anonymization project succeeded thanks to a commitment to responsible data sharing, strong collaboration between Wits VIDA and RLS, and the use of RLS Protect, which provides specialized capabilities for statistical and anonymization scenario analysis. This initiative followed RLS’ industry-leading practices for anonymizing clinical data, broadening access to the trial's results data while protecting participant privacy. The anonymized datasets were immediately made available to support secondary research, enhancing the scientific impact of the ChAdOx1 nCoV-19 Project and promoting clinical transparency. The data also meet the Gates Foundation’s standards for responsible sharing.
“Real Life Sciences’ clinical data anonymization capabilities and frequent communication throughout the three week engagement gave us the confidence we could achieve the expectations of the Gates Foundation’s Open Access Policy without issue,” said Alane Izu, Research Lead at Wits VIDA.
The collaboration between RLS and Wits VIDA was highly effective, with both teams working together to anonymize the data while preserving its scientific value. Wits VIDA contributed their in-depth knowledge of the study, while RLS complemented this with their expertise in quantitative anonymization methodology for clinical data. This partnership facilitated open communication and innovative solutions, successfully balancing privacy protection with data utility for future research.
Researchers can request access to the anonymized ChAdOx1 nCoV-19 data on Vivli.org.

About Wits VIDA
The Wits Vaccines and Infectious Diseases Analytics (VIDA) Research Unit is
dedicated to advancing global knowledge and improving health outcomes,
particularly in Africa. Its mission focuses on conducting high-quality research
to combat infectious diseases, with an emphasis on the epidemiology of
vaccine-preventable diseases and on vaccine development and evaluation.


About Real Life Sciences, LLC
Real Life Sciences (RLS) and its industry-leading anonymization platform,
RLS Protect, protect personal and confidential data with advanced
anonymization techniques grounded in a quantitative, risk-based methodology.
RLS Protect is the leading SaaS solution helping biopharma companies, CROs,
and academic institutions with regulatory disclosure submissions, such as
EMA Policy 0070 and Health Canada Public Release of Clinical Information, and
with voluntary sharing of clinical data and documents.


About Vivli.org
Vivli is a non-profit organization working to advance human health through the
insights and discoveries gained by sharing and analyzing data. It is home to an
independent global data-sharing and analytics platform which serves all elements
of the international research community. The platform includes a data repository,
in-depth search engine and cloud-based analytics, and harmonizes governance,
policy and processes to make sharing data easier. Vivli acts as a neutral broker
between data contributors, data users, and the wider data-sharing community.
