Real Life Sciences has been accepted to present at PHUSE US Connect 2022 in Atlanta.
Clinical documents and datasets from randomized controlled studies contain a variety of structured, semi-structured and entirely unstructured data. Processing unstructured data accounts for most of the workload in the anonymization/disclosure process, so natural language processing (NLP) techniques are needed to perform structuring and classification tasks efficiently.
Unstructured data needs to be classified and converted into a structured format before disclosure risk assessments and anonymization of patient identifiers can be performed ahead of disclosure/sharing. This intermediate step of structuring and classifying data also has downstream benefits for secondary analysis of the clinical data. NLP can also be used to structure other related sources of real-world data (patient survey responses, doctor's notes/EHR or social media data).
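As a rough illustration of the structuring step described above, the sketch below turns a free-text narrative into structured identifier records and then replaces each identifier with a category label. It is not RealNLP and does not reflect how that platform works: the patterns, the category names (`DATE`, `AGE`, `SUBJECT_ID`), the `SUBJ-0000` identifier format and the function names are all assumptions made for this example, and simple regular expressions stand in for the NLP models a real pipeline would need.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only. A real pipeline would rely on
# trained NLP models and a far broader taxonomy of patient identifiers.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "AGE": re.compile(r"\b\d{1,3}-year-old\b"),
    "SUBJECT_ID": re.compile(r"\bSUBJ-\d{4}\b"),
}


@dataclass(frozen=True)
class Identifier:
    """One structured record for an identifier found in free text."""
    category: str
    text: str
    start: int
    end: int


def structure_identifiers(narrative: str) -> list[Identifier]:
    """Classify identifier mentions and return them ordered by position."""
    found = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(narrative):
            found.append(
                Identifier(category, match.group(0), match.start(), match.end())
            )
    return sorted(found, key=lambda ident: ident.start)


def redact(narrative: str, identifiers: list[Identifier]) -> str:
    """Replace each identifier with its category label.

    Works from the end of the text backwards so that earlier offsets
    stay valid while later spans are being replaced.
    """
    redacted = narrative
    for ident in sorted(identifiers, key=lambda i: i.start, reverse=True):
        redacted = (
            redacted[: ident.start] + f"[{ident.category}]" + redacted[ident.end :]
        )
    return redacted


narrative = "Subject SUBJ-0042, a 67-year-old male, was admitted on 2021-03-15."
identifiers = structure_identifiers(narrative)
redacted = redact(narrative, identifiers)
```

The structured records, rather than the redacted text alone, are what would feed a disclosure risk assessment: once each mention carries a category and position, identifiers can be counted, reviewed and transformed consistently across documents.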
We discuss how an integrated NLP platform (RealNLP) is used to 1) generate additional insights into the existing clinical data selected for disclosure, and 2) simultaneously analyze and compare clinical and related real-world data assets.