Real Life Sciences has been accepted to present at PHUSE US Connect 2022 in Atlanta.

Leveraging deep NLP frameworks to simultaneously perform anonymization and insight generation from both clinical and real-world datasets.


Clinical documents and datasets from randomized controlled studies contain a variety of structured, semi-structured and entirely unstructured data. Processing unstructured data represents the majority of the workload in the anonymization/disclosure process, so NLP techniques are necessary to perform structuring and classification tasks efficiently.

Unstructured data needs to be classified and converted into a structured format so that disclosure risk can be assessed and patient identifiers anonymized before the data are disclosed or shared. This intermediate structuring/classification step also has downstream benefits for secondary analysis of the clinical data. NLP can also be used to structure other related sources of real-world data, such as patient survey responses, doctors' notes/EHRs or social media data.
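To make the "structure first, then anonymize" idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not RealNLP's implementation or API: a toy regex tagger stands in for a trained NLP model, and the entity labels, patterns and function names (`extract_entities`, `anonymize`, `summarize`) are hypothetical. Free text is first converted into structured entity records. Those records then drive consistent pseudonym replacement, and the same records can be reused for simple secondary analysis.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical regex patterns standing in for a trained NLP entity tagger.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "SUBJECT_ID": re.compile(r"\bSUBJ-\d{4}\b"),
    "AGE": re.compile(r"\b\d{1,3}(?=-year-old\b)"),
}


@dataclass(frozen=True)
class Entity:
    label: str
    text: str
    start: int
    end: int


def extract_entities(text: str) -> list[Entity]:
    """Step 1: structure free text into entity records, ordered by position."""
    found = [
        Entity(label, m.group(), m.start(), m.end())
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(found, key=lambda e: e.start)


def anonymize(text: str, entities: list[Entity]) -> str:
    """Step 2: replace each identifier with a consistent pseudonym.

    Assumes entity spans do not overlap.
    """
    pseudonyms: dict[tuple[str, str], str] = {}
    out, cursor = [], 0
    for e in entities:
        key = (e.label, e.text)
        if key not in pseudonyms:
            # Number pseudonyms per label in order of first appearance.
            n = sum(1 for k in pseudonyms if k[0] == e.label) + 1
            pseudonyms[key] = f"[{e.label}_{n}]"
        out.append(text[cursor:e.start])
        out.append(pseudonyms[key])
        cursor = e.end
    out.append(text[cursor:])
    return "".join(out)


def summarize(entities: list[Entity]) -> Counter:
    """Reuse the structured records for simple secondary analysis."""
    return Counter(e.label for e in entities)


if __name__ == "__main__":
    note = ("SUBJ-0042, a 54-year-old patient, reported nausea on 2021-03-15. "
            "SUBJ-0042 recovered by 2021-03-20.")
    ents = extract_entities(note)
    print(anonymize(note, ents))
    print(summarize(ents))
```

Because the same subject ID always maps to the same placeholder, relationships within a document survive anonymization, which is what makes the structured output useful for downstream analysis.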

We discuss how an integrated NLP platform (RealNLP) is used to 1) generate additional insights from the existing clinical data selected for disclosure and 2) simultaneously analyze and compare clinical and related real-world data assets.

