On-Demand Webinar: Patient Voice: The Intersection of Real World Data & Natural Language Processing

Using a categorization model for Social, Emotional, Cognitive, Physical & Functional parameters we will discuss amplifying the patient voice via Natural Language Processing. By sorting through large amounts of data in ways never before possible you will learn how to generate novel insights from Patient Outcomes in Real World Data.

View Here: Patient Voice: The Intersection of Real World Data & Natural Language Processing

Duration: 45 Minutes

On-Demand Webinar: Improving the Patient Journey: Leveraging Novel Social Media Data to gain Insights into Patient and Caregiver Reported Outcomes


We will examine traditional (e.g. patient reported outcomes) and non-traditional (e.g. social media) approaches to collecting patient insights that were mapped into a framework modeled around Development Decision Points (DDPs) and milestones. During this webinar we will be highlighting approaches based on the relative prevalence of clinical, economic, functional, behavioral and perceptional data and the efficiency for insight generation. We will look at a published retrospective analysis conducted across 17 case studies from early-development, peri-launch and post-market programs to explore the impact of traditional and non-traditional approaches on patient access.

View Here: Improving the Patient Journey: Leveraging Novel Social Media Data to gain Insights into Patient and Caregiver Reported Outcomes

Duration: 44 Minutes

Curating Self-Reported Patient & Caregiver Narratives

Natural Language Processing (NLP) algorithms enable us to use novel data sources such as electronic health records (EHR), media interview transcriptions, and social media data. For studies using social media, the data is spontaneous, heterogeneous, and less structured, which makes data curation a necessary step. Data curation is one of the main factors that determine the quality of the data.

Data scientists have proposed a variety of strategies for data curation. Some consider data curation unnecessary while others think it is unavoidable. Data curation may be necessary for privacy concerns such as excluding personal identification, or to ensure the data used in the model cannot be ‘hacked’. But most importantly, data curation is necessary to identify good quality, robust, and inclusive data that can be converted into actionable insights.    
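As a rough illustration of the privacy side of curation, the sketch below masks a few obvious personal identifiers before any further processing. The rule list, placeholder tokens, and the `redact` function are assumptions made for this example; they are not a description of the RLS pipeline, and real de-identification needs far broader coverage.

```python
import re

# Hypothetical redaction rules, applied in order. Emails are handled before
# handles so that the "@domain" part of an address is not treated as a handle.
REDACTION_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"(?<!\w)@\w+"), "[HANDLE]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
]


def redact(text: str) -> str:
    """Replace emails, @handles, and US-style phone numbers with placeholders."""
    for pattern, placeholder in REDACTION_RULES:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    print(redact("Email me at jane.doe@example.com or call 555-123-4567, thanks @caregiver_jo"))
```

Pattern-based masking of this kind only catches identifiers with a predictable shape; names, locations, and rare details still call for dedicated tooling and manual review.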

Social media data enables us to understand narrators’ perspectives about specific diseases, treatment, and healthcare options available. Narrators may include stakeholders such as doctors, care team members, primary caregivers, and patients themselves. At Real Life Sciences (RLS), we use sophisticated NLP algorithms to convert unstructured social media data into structured data that can be analyzed to understand the patient journey further.

To get high-quality data, we employ NLP algorithms that weed out irrelevant data and retain usable data. First, we use tools to aggregate several social media sources instead of relying on one popular networking site. Popular social media websites such as Twitter or Facebook often represent only a small portion of the population living with a specific disease. Hence, we need a different approach for comprehensive data collection within a specific population. We include many patient forums and online communities in order to enrich our data. Our NLP algorithms cast a deep, comprehensive, and wide net to include data from disease specific forums, blogs, and verified patient communities along with social media sites.

After collecting a large volume of data, we establish processes to reduce the noise and whittle the data down to what we care about. We start with a simple and popular ‘keyword-based approach’ (or bag of words) to identify relevant posts. The keywords are terms that are definitively associated with the disease and unlikely to be confused with other diseases. For instance, in a recent study related to Alzheimer’s disease, the disease-specific keywords included dementia, AD, and APOE. This step narrowed the initial sample of ten million posts down to one million.
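A keyword filter of the kind described above can be sketched in a few lines. The term lists reuse the three keywords named in the text; splitting them into case-insensitive words and case-sensitive acronyms is an assumption of this example (so that "AD" does not match the word "ad"), as are the function names.

```python
import re

# Terms taken from the example in the text. The split into two lists is an
# assumption of this sketch: acronyms are matched case-sensitively.
CASE_INSENSITIVE_TERMS = ["dementia"]
CASE_SENSITIVE_TERMS = ["AD", "APOE"]


def _whole_word(terms, flags=0):
    """Compile one pattern that matches any of the terms as a whole word."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", flags)


WORD_PATTERN = _whole_word(CASE_INSENSITIVE_TERMS, re.IGNORECASE)
ACRONYM_PATTERN = _whole_word(CASE_SENSITIVE_TERMS)


def is_relevant(post: str) -> bool:
    """Return True if the post mentions at least one disease keyword."""
    return bool(WORD_PATTERN.search(post) or ACRONYM_PATTERN.search(post))


def filter_posts(posts):
    """Keep only the posts that pass the keyword check."""
    return [post for post in posts if is_relevant(post)]


if __name__ == "__main__":
    sample_posts = [
        "Mom was diagnosed with dementia last spring.",
        "Saw a funny ad for running shoes today.",
        "Our neurologist mentioned APOE testing.",
        "Great weather for a hike!",
    ]
    for post in filter_posts(sample_posts):
        print(post)
```

A filter like this is deliberately permissive: it reduces volume cheaply and leaves the question of whether a match reflects a real experience to later steps.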

Next, we apply more sophisticated algorithms to ensure that keyword matches reflect true experiences and not passing mentions or “joke” references. A “concept model” that includes a combination of keywords such as patient and disorder is built. This is also referred to as ‘relational modeling’, because the sentence structure and the syntactic relations between keywords are of prime importance. For instance, a narrator may have mentioned ‘grandfather got forgetful’, which could be based on one event or due to a disease that is not relevant to the research study. The algorithm ensures that narrators have referenced a diagnosed concept: ‘grandfather got forgetful’ does not include a diagnosis and hence would be disqualified from the study, but ‘grandfather got forgetful due to his Alzheimer’s’ would be included in the qualified data.
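The qualification rule in this paragraph can be approximated with a simple co-occurrence check: a post qualifies only if a single sentence mentions both a person and a diagnosis. This is a deliberately crude stand-in for relational modeling, which would rely on syntactic parsing rather than word sets; the term lists and function names are hypothetical.

```python
import re

# Hypothetical vocabularies for illustration only.
PERSON_TERMS = {"i", "my", "mom", "dad", "grandfather", "grandmother", "husband", "wife"}
DIAGNOSIS_TERMS = {"alzheimer's", "alzheimers", "dementia"}


def split_sentences(text: str):
    """Split on sentence-ending punctuation and drop empty fragments."""
    return [part for part in re.split(r"[.!?]+", text) if part.strip()]


def tokens(sentence: str):
    """Lowercase the sentence, normalize curly apostrophes, return its word set."""
    normalized = sentence.lower().replace("\u2019", "'")
    return set(re.findall(r"[a-z']+", normalized))


def references_diagnosed_person(post: str) -> bool:
    """True if any one sentence names both a person and a diagnosis."""
    for sentence in split_sentences(post):
        words = tokens(sentence)
        if words & PERSON_TERMS and words & DIAGNOSIS_TERMS:
            return True
    return False


if __name__ == "__main__":
    print(references_diagnosed_person("Grandfather got forgetful"))
    print(references_diagnosed_person("Grandfather got forgetful due to his Alzheimer's"))
```

Co-occurrence ignores who the diagnosis belongs to, so a sentence such as "my neighbor's dog and dementia research" would still pass; that gap is what a parser-based relation model is meant to close.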

Since the main objective of the study is not to measure the prevalence of the disease but to understand patient journeys, additional models are implemented to extract and analyze the data. For instance, in the Alzheimer’s study, a “suffer relation” model that captured symptoms was applied. If a narrator living with Alzheimer’s posts “last week, I had severe anxiety”, the suffer relation model would capture the post.

Later, a functional “impairment” model was applied. This model allowed us to understand not just symptomatology but also impaired activities (e.g., getting dressed, parenting, driving a car). It helps us identify the functional problems caused by, or occurring alongside, the impairments. For example, a narrator may post ‘Dad’s Alzheimer’s is worse, and he’s had trouble dressing. We need to hire help’. This narrative allows us to look at both the impairment and the impact of that impairment.
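To make the output of these two models concrete, the sketch below turns a post into a small structured record with a symptom list and an impairment list. It uses plain substring matching against hypothetical term lists, so it only illustrates the shape of the result, not how the suffer relation or impairment models actually work.

```python
from dataclasses import dataclass, field

# Hypothetical vocabularies for illustration only.
SYMPTOM_TERMS = ("anxiety", "confusion", "memory loss")
IMPAIRED_ACTIVITIES = {
    "dressing": "getting dressed",
    "driving": "driving a car",
    "parenting": "parenting",
}
DIFFICULTY_CUES = ("trouble", "can't", "cannot", "unable", "struggles")


@dataclass
class NarrativeRecord:
    """Structured view of one free-text post."""
    text: str
    symptoms: list = field(default_factory=list)
    impairments: list = field(default_factory=list)


def extract(post: str) -> NarrativeRecord:
    """Tag symptoms, and tag activities only when a difficulty cue is present."""
    lowered = post.lower().replace("\u2019", "'")
    record = NarrativeRecord(text=post)
    record.symptoms = [term for term in SYMPTOM_TERMS if term in lowered]
    if any(cue in lowered for cue in DIFFICULTY_CUES):
        record.impairments = [
            label for stem, label in IMPAIRED_ACTIVITIES.items() if stem in lowered
        ]
    return record


if __name__ == "__main__":
    print(extract("Last week, I had severe anxiety"))
    print(extract("Dad's Alzheimer's is worse, and he's had trouble dressing. We need to hire help"))
```

Requiring a difficulty cue before tagging an activity is a design choice of this sketch: it keeps neutral mentions such as "driving to the clinic" from being counted as impairments.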

Each research study poses unique research questions, and NLP algorithms need to be tailored accordingly. The process of data curation is iterative and may need to be revisited depending on various factors. Manual review of the data may also be necessary at a later stage. At RLS, our data team works closely with clinical scientists to ensure high-quality data analytics.

Research Ethics In Social Listening

In recent years, clinical researchers have adopted social listening as a technique while conducting social media studies to gain a deeper understanding of diseases and further drug development efforts. Social listening refers to a passive approach where the research team does not provide any prompts or questions but simply ‘listens’ to the narratives posted by people. Data is aggregated from various posts made on social media sites such as verified patient communities or Twitter or disease specific blogs.         

The unsolicited, voluntary social media posts often contain real-time updates of disease progression and patient journeys. Investigating social media data can reveal trends in patients’  knowledge of a certain disease at a certain stage, their thoughts or approach in dealing with the disease, and coping strategies. The information gained from social media data can form valuable insights that help healthcare organizations such as hospitals and pharma companies to create positive patient experiences. Additionally, support for patients and caregivers or group interventions can be delivered via social media. Social media data can even  help public health agencies to develop public health policies. 

Although the benefits of social media research are evident, the risks associated with social listening techniques cannot be ignored. Most research organizations are careful about maintaining patient privacy. But because regulatory authorities had not clearly defined privacy standards, loss of patient confidentiality or inadvertent disclosure of information was a real possibility. The scientific community has debated social media research ethics for over a decade. In 2011, the US Food and Drug Administration (US FDA) issued guidelines for pharma companies regarding their use of social media, but these did not address social listening. The US FDA has itself used natural language processing (NLP) technology and social media for pharmacovigilance purposes, yet no clear guidelines for social listening were available until recently. The FDA has now provided further guidelines about social listening.

Medical product information can be gathered from various sources such as animal toxicology trials, pharmacogenomic studies, registries, clinical studies, and product quality reports. The FDA emphasizes that social media, in particular social listening techniques, can be a valuable source of information about medical products. Social listening techniques can also be useful for other FDA centers such as the Center for Veterinary Medicine and the Center for Food Safety and Applied Nutrition. According to the FDA, social listening techniques may be more efficient than traditional post-marketing surveillance and have the potential for faster signal detection. The FDA acknowledges that social media data can be used in a variety of studies, including qualitative, quantitative, and mixed-methods designs.

In its recent guidance document, the FDA provides three important non-binding recommendations. First, it suggests that social media researchers include a wide variety of social media sites. Some websites may appeal to a certain demographic, and including data from only such sites may introduce bias into the outcomes of a research study. To improve the generalizability of the study outcomes, many different social media sites need to be included. Next, it notes that researchers may have access to identifying information on verified patient community sites. But when the research topic is sensitive in nature, anonymous posts may be included in the study to protect patient privacy. Lastly, the FDA suggests that researchers clearly state limitations of the study when they affect data integrity, e.g., a lack of mechanisms to verify the diagnosis.

At RLS, social listening techniques combined with advanced natural language processing have been used effectively to study diseases of the skeletal system such as ankylosing spondylitis and psoriatic arthritis. Even prior to the issuance of FDA guidelines, respect for persons, including patient privacy and confidentiality, has been and will always be one of our core values. Our research team stays up-to-date on various guidelines provided by regulatory authorities such as the FDA, Health Canada (PRCI), and the EMA, including Policy 0070 and EU CTR 536/2014. Furthermore, Real Life Sciences deploys industry leading technology surrounding Quantitative Risk Modeling and Assessment for Clinical Trial Transparency and Disclosure. While working with various organizations, our team ensures that studies are compliant with institutional review board (IRB) requirements. Social listening techniques combined with AI technology and categorization frameworks such as SPEC-F allow us to understand diseases better and expedite drug development.

PHUSE US CONNECT 2022

Real Life Sciences presented at PHUSE US Connect 2022 in Atlanta.

How Disclosure Teams Are Collaborating And Embracing the Changes Imposed by Global Transparency Regulations

Summary:  

Never before have Disclosure & Transparency teams found themselves more squarely at the center of the clinical process, balancing the various needs of regulators and internal policies. Collaboration between Disclosure/Transparency and Medical Writing, Regulatory, Legal, Biostats and Clinical teams is an absolute requirement for the organization’s success.

To receive a copy of the presentation please contact us here: Contact

Agenda

Making Sense of Novel Real World Data to Improve the Development of Patient Relevant Endpoints: A New Framework

As part of achieving drug approvals and patient access, momentum is shifting from traditional methods of collecting and analyzing clinical trial and patient outcomes data to a hybrid approach that involves assessing novel real world data (RWD) for post-market monitoring. The hybrid model combines the results data collected during structured trials with out-of-clinic patient and caregiver data from the real world. Real world data helps drug companies hear from a wider breadth of patients and gauge the concerns patients, caregivers, and providers are voicing online about their disease state or medication. As patients’ usage of social media increases, gathering this information in a digestible and approachable manner will help organizations assess disease burden. This shift is already leading to a change in operating models for pharmaceutical organizations and Clinical Research Organizations (CROs).

In June 2018, the United States Food and Drug Administration (FDA) encouraged the use of social media to amplify the patient’s perspective, emphasizing the value of considering the patient experience. In 2020, the FDA issued its guidance document, Patient-Focused Drug Development: Collecting Comprehensive and Representative Input.

The FDA also has its own programs for and perspectives on collecting information about adverse events from social media, signaling the increased reliance on utilizing specialized social media in Pharmacovigilance. The FDA continues to explore the value of social media mining for safety signal detection and patient experiences. The FDA outlined its potential direction in FDA Perspectives on Social Media for Postmarket Safety Monitoring.

The FDA in examining collection methods for increasing focus on patient experience outlined the following:

Advances in Artificial Intelligence (AI) and Natural Language Processing (NLP) have made it possible to curate and interpret large amounts of unstructured data. What many organizations, including the FDA, struggle with is the ability to collect and make meaningful use of such large amounts of unstructured data, and to organize and categorize it in a standardized way that allows analysis and decision making.

The FDA states in  FDA Perspectives on Social Media for Postmarket Safety Monitoring that the regulatory body has limitations:

Real Life Sciences has experience across therapeutic areas (Alzheimer's Disease, Oncology, Rheumatoid Arthritis, Psoriatic Arthritis, Multiple Sclerosis, Depression, Anxiety and Diabetes) in novel Real World Data collection and analysis using publicly available social media and verified patient communities. Using the Real Life Sciences patient analytics platform, a combination of natural language processing and data mining approaches was put in place to aggregate, consolidate and structure real world patient reported data from social media and drug safety databases. Pulling together all of this unstructured social media data and structuring it in order to produce outputs has uncovered three overarching parameters:

  1. Natural Language Processing is a fundamental requirement to curate, categorize and structure large amounts of data collected from specialized forums and websites related to various disease areas
  2. In order to operationalize the data there must be a mechanism to apply standardized terminology to large amounts of unstructured data.  The standardized narratives must then be applied to a framework that can organize patient experiences for analysis
  3. Scalable data curation and quality control processes that ensure resulting outputs minimize the noise of RWD

Below is an excerpt from a recent Real Life Sciences’ study “Alzheimer’s Disease: Developing Quantifiable Patient and Caregiver Insights from Self-Reported Specialized Social Media Data” which illustrates how collecting, standardizing and categorizing unstructured patient and caregiver narratives may effectively work:

“Reports were classified against a series of standard medical taxonomies such as the WHO’s International Classification of Functioning, Disability and Health (WHO-ICF) and Medical Dictionary for Regulatory Activities Terminology (MedDRA), and further into the following categorizations:

Social, Physical, Emotional, Cognitive, and Role Activity (SPEC-R)”
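To give a sense of what categorizing narratives into SPEC-R looks like in practice, here is a minimal sketch that assigns categories from a small word lexicon and tallies them across posts. The lexicon is invented for this example; the study quoted above classified reports against WHO-ICF and MedDRA, which this sketch does not attempt to reproduce.

```python
import re
from collections import Counter

# Hypothetical lexicon: a handful of words per SPEC-R category.
SPEC_R_LEXICON = {
    "Social": {"friends", "isolated", "lonely"},
    "Physical": {"fall", "pain", "sleep"},
    "Emotional": {"anxiety", "sad", "angry"},
    "Cognitive": {"forgetful", "confused", "memory"},
    "Role Activity": {"driving", "dressing", "cooking"},
}


def categorize(narrative: str):
    """Return the SPEC-R categories whose lexicon words appear in the narrative."""
    words = set(re.findall(r"[a-z']+", narrative.lower()))
    return [category for category, terms in SPEC_R_LEXICON.items() if words & terms]


def tally(narratives):
    """Count how many narratives touch each category."""
    counts = Counter()
    for narrative in narratives:
        counts.update(categorize(narrative))
    return counts


if __name__ == "__main__":
    posts = [
        "Dad is forgetful and has trouble dressing",
        "I feel lonely and my memory is worse",
    ]
    print(tally(posts))
```

A narrative can land in several categories at once, which matches how patients describe their experience: one post often mixes cognitive, emotional, and functional concerns.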


Real Life Sciences’ Real World Data collection and analysis solution in combination with its proprietary SPEC-R framework is referenced in “Developing an integrated strategy for evidence generation” in the Journal of Comparative Effectiveness Research.

The power of utilizing social media and verified patient communities lies in the ability to connect the data and make it meaningful. These previously disconnected and unstructured free text sources of valuable information can now be used to identify key insights and disease patterns, supporting analysis that uncovers strategic and actionable findings about patient burden patterns. Further analytics, such as those provided by Real Life Sciences, can lead to decisions resulting in new research, new labels, recovery of missed revenue opportunities, stronger instrumentation tools, patient centric research models and more.

Market Access teams, R&D leadership, Medical Affairs, Regulatory Affairs and other teams within pharmaceutical organizations and CROs can leverage Natural Language Processing to be proactive and inclusive around patient and caregiver perspectives during the drug development life cycle. Coupled with an organizational framework such as SPEC-R, these organizations can harness the true utility of RWD from social media and verified patient communities to identify patient disease burden, including Quality of Life (QOL) measures.

Natural Language Processing Uses and Applications in Healthcare

Natural Language Processing (NLP)

Although human languages follow specific structures and patterns, they pose challenges for machines. In the early days, low-level machine languages that used binary codes (0 and 1) were commonly used for human-to-machine communications. Later, high-level languages such as C++ were developed. But it is only in recent times that machines are able to interact with human languages. With the advent of artificial intelligence (AI), machines can process human languages. 

In the early days of AI in linguistics, its use was envisioned primarily for translation services (e.g. English to Spanish). But soon enough other language-related applications were developed. Machines could perform simple tasks such as autocorrection or email response suggestions with accuracy. Natural language processing (NLP) is the branch of AI that enables machines to read, comprehend, and work with human languages. At present, machines can interpret and work with spoken language or written text. This makes NLP well-suited for solving various problems related to big data in the healthcare domain.           

NLP in healthcare for cost reduction

NLP applications can help in performing various repetitive administrative tasks that may otherwise require trained personnel. For instance, in hospital settings, NLP can be employed for information extraction and document categorization. NLP applications can read doctor’s notes and extract relevant information to accurately assign billing codes. This will help in pre-approvals or timely authorizations of treatments, which ultimately reduces the burden of illness. 

Accountable care organizations can use NLP to improve the patient journey. Chatbots or virtual assistants can provide information whenever requested by the patient and thus reduce the anxiety about hospital visits or medical procedures. This also allows patients to plan other aspects of their life such as caregiver schedules, which ultimately creates a positive patient experience. Additionally, a patient could ask for information about activities they can do or need to avoid after undergoing surgery. Chatbots can discuss the options and be useful in improving the quality of life for patients. NLP can also be utilized to study social media posts for understanding patient journeys.     

Clinical applications

Manual reviews of medical records are time-consuming. NLP applications help reduce the time required for manual expert review of unstructured data such as electronic health records (EHR). Doctors and other healthcare professionals are already burdened with paperwork. Additionally, humans are prone to errors and omissions as fatigue sets in. NLP applications can review, analyze, and sort EHR content into meaningful data and insights. In the case of clinical trials, safety reviewers read narratives from adverse events, medical notes, and medical literature. The regulatory authorities that provide oversight, the US Food and Drug Administration (FDA) and the Centers for Disease Control and Prevention (CDC), consider NLP a possible solution for unstructured data.

Real world patient reported data often has gaps, and NLP applications can be used to recover the missing information. A study done in the UK used NLP to text-mine free-text EHR data. The occupation of a patient is usually recorded in a structured field as categorical or numerical data. In mental illnesses, occupation in structured fields is often missing as the patient may be unemployed or still a student. However, a doctor or nurse may have written it as free text in their medical notes. Given that mental illnesses and occupations are correlated, researchers need data about the occupation. The research team developed NLP applications for data mining and identified occupations for a majority of patients. Identifying occupations helps in designing policies for occupational placement and providing increased support for patients.
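The idea of recovering a missing structured field from free-text notes can be illustrated with a small pattern-based sketch. The patterns, the record layout (`occupation` and `notes` keys), and the function names are assumptions made for this example and do not describe the tool built in the UK study.

```python
import re

# Hypothetical patterns, tried in order. The first captures a job title after
# phrases such as "works as a"; the second catches common status words.
OCCUPATION_PATTERNS = [
    re.compile(r"\b(?:works|worked|working|employed) as an? ([a-z ]+?)(?=[.,;]|$)", re.IGNORECASE),
    re.compile(r"\b(unemployed|retired|student)\b", re.IGNORECASE),
]


def extract_occupation(note: str):
    """Return the first occupation mention found in the note, or None."""
    for pattern in OCCUPATION_PATTERNS:
        match = pattern.search(note)
        if match:
            return match.group(1).strip().lower()
    return None


def fill_missing_occupation(record: dict) -> dict:
    """Fill the structured field from the notes only when it is empty."""
    if record.get("occupation"):
        return record
    return {**record, "occupation": extract_occupation(record.get("notes", ""))}


if __name__ == "__main__":
    patient = {"occupation": None, "notes": "Patient works as a bus driver, reports low mood."}
    print(fill_missing_occupation(patient))
```

Values recovered this way are best stored with a flag marking them as derived from text, so they can be reviewed separately from values entered directly into the structured field.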

NLP platforms for clinical research

Pharmaceutical companies have focused on the use of NLP for clinical trials. A recent study surveyed fifteen pharmaceutical companies to study their understanding of pharmacovigilance initiatives. The survey report identified areas such as ‘language translations, case verification, in-line quality control, prioritization/triage, data entry, alerts for cases of interest, workflow management, and monitoring where automation technologies’, including NLP, can be effectively used. NLP applications can further be useful in qualitative studies where data can be extracted from focus groups, interviews, or questionnaires.

NLP applications help in extracting meaningful insights from the data. RLS has a trained data team that has the experience and expertise to work with available data. The Real Life Sciences Technology Platform offers powerful tools to aggregate data, analyze it using standardized medical terminology, and provide answers to interesting research questions. Recently, a study analyzed social media narratives from patients with Alzheimer’s disease and their caregivers to understand the disease burden. Using this novel method of social listening, narratives were categorized into the Social, Physical, Emotional, Cognitive, and Role Activity (SPEC-R) framework. The study revealed current gaps in data captured by clinical instruments and advanced our understanding of the disease burden. This organizational framework of SPEC-R can be applied to study other diseases such as arthritis, Parkinson’s Disease, and trauma.

Social Media and Patient Communities

Social media has been around for about eighteen years now. More and more patients are using social media as an outlet. However, the role of social media data and verified patient communities in Real World Data/Real World Evidence is still evolving. Real World Data (RWD) is observational data from real world settings such as electronic health records (EHR), billing or disease registries. Real World Evidence (RWE) is the clinical evidence obtained from the analysis of RWD. RWE can provide valuable information that a controlled setting of a clinical trial may not be able to provide.  

Social media posts from patients include information about a disease, its symptoms, and its progression. These patient or caregiver generated narratives often describe daily life difficulties or solutions. The research community has debated the use of social media narratives as RWD. But a more important question is whether these social media narratives can advance our current understanding of various diseases.

Social media and Virtual communities

A common misconception is that social media refers to networking sites such as Facebook or LinkedIn. However, social media encompasses several avenues such as wikis, microblogs, photo or video sharing tools, and discussion forums. Instagram or WhatsApp are mobile based interactive platforms, whereas multi-user virtual environments are often found in online gaming. This list of social media continues to grow. 

Patients, caregivers, and even healthcare professionals share information or interact with each other using social media to form virtual communities. Virtual communities can help overcome logistical challenges, such as location or space, in establishing communication channels. Researchers can listen to these channels to advance our current understanding of a disease. But since the information shared on social media is voluntary, it may not be complete or may not address all aspects of a research query. In addition to social listening, health care professionals may need to gently direct conversations to seek responses to their specific research query, e.g., how did a stay-at-home order affect the patient journey for a particular disease?

Real world data and patient perspectives

Broadly speaking, patients and caregivers use social media to learn from other patients’ experiences. Research has revealed that patients use social media for emotional support and to seek information about the disease. Some use social media to boost self-esteem or to gain network support from people in similar situations. Social media posts are also used to make a ‘social comparison’, that is, to gauge how severe their own disease progression is. Patients or caregivers may also use social media for emotional expression, such as releasing negative emotions or venting about the illness. Patients have reported a variety of outcomes of using social media. For instance, while some reported enhanced feelings of well-being and improved self-management, others reported diminished subjective well-being or increased anxiety.

Social media narratives can be extracted, analyzed, and interpreted like any other form of RWD. Social media narratives give us an insight into patient perspectives and lifestyle. In traditional methods of data collection, the data is collected in a healthcare professional’s office using questionnaires or interviews (or, during the pandemic, via the internet as telemedicine gained popularity). However, these instruments often do not capture all aspects of the real world issues in caregiving or living with the disease. Social media narratives can help us bridge that gap. They allow a clinician to see a patient’s point of view in that moment of difficulty and offer care accordingly.

But is it ethical to treat social media narratives as RWD, or does it amount to eavesdropping on social conversations?

Ethics in healthcare and use of social media

From social media narratives, researchers can study patient experiences and behaviors to better design patient-centered care. But there are ethical considerations that research groups need to follow while using social media for scientific inquiry. The US Food and Drug Administration (FDA) has provided guidelines for data collection on social media. For instance, posts on public forums imply that the person posting is aware that their views are ‘out there’ in the public domain, so additional consent may not be necessary. But on password-protected sites, the user may not intend a public posting and may want to maintain some privacy. When it is possible to identify a user, the risk of identification or vulnerability may require obtaining consent. Furthermore, obtaining consent from minors or individuals with cognitive impairment may not be feasible.

Following research ethics guidelines to protect patient privacy is a core value at Real Life Sciences. Our team of researchers uses social media data and analyzes it using our platform to form meaningful insights. During this process, we ensure that patient privacy is maintained. 

FDA Regulations and Social Media Patient Experience Research

During drug development and later in health care research, researchers study patient experiences and behaviors to better design patient-centered care. Social media narratives, i.e., posts made on health forums or social networking sites such as Twitter, offer an insight into real-time patient experiences and are considered real-world data (RWD). Most researchers and healthcare professionals made every attempt to follow research ethics and responsible conduct while using social media for research. But a lack of guidelines from regulatory authorities made it difficult to use social media data effectively while keeping respect for persons, beneficence, and justice front and center.

In 2020, the US Food and Drug Administration (US FDA) published guidelines for including patient experience data in drug development processes. The primary reason for issuing these guidelines is to obtain meaningful patient and caregiver input that can provide valuable insights for medical product development and regulatory decision-making. The FDA has provided guidelines on four different aspects of patient experience-related research methods. Of these, the data collection guideline is relevant to our mission and is therefore discussed in further detail.

According to the FDA guidelines, researchers need to be mindful of the locations of data collection (from whom do you get input) and the rationale behind that location (why?). Additionally, they need to be careful about how the information is collected. Along with traditional methods such as interviews, focus groups, surveys, and medical charts, the FDA now acknowledges novel sources like social media along with verified patient communities and digital health technologies as data collection sources. These novel sources can provide data about a patient's day-to-day functioning and quality of life, disease progression, experiences with treatments such as the burden of treatment, patient’s views on different disease outcomes, and impacts of these outcomes.   

The FDA suggests that social media can be used for two main purposes. First, researchers could conduct targeted social media searches using social media tools such as medical community blogs, crowdsourcing, or even social media pages. This might provide valuable information during the early stages of a study to understand the current landscape of the research problem. It can also result in the development of research tools such as qualitative study discussion guides. The second purpose of social media research is to supplement traditional research approaches (e.g., the interviews and focus groups mentioned above). Researchers need to be aware of the comparative benefits and limitations of each approach.

When data is collected from verified patient communities and social media, it includes online support groups and online educational groups. From a researcher’s perspective, these groups are useful for gathering information about the disease or current health conditions, information about treatments and experiences of care, and also for recruiting research participants. These groups may include information that identifies patients or other reporters such as caregivers/relatives etc. Researchers must have the authorization to obtain protected health information (PHI) before they collect such data. This can be a limitation as it lengthens the study duration but it is necessary to maintain respect for persons. Another limitation is that since the social media posts are voluntary, the representativeness of the sample can be questioned. In spite of these limitations, social media data can provide ‘robust, meaningful, and sufficiently representative patient input’ and hence is used as RWD for healthcare research.  

An independent survey conducted to study trends in the pharmaceutical industry, in particular opportunities and challenges associated with the use of real-world evidence (RWE) and RWD, revealed that RWE is a strategic priority for pharma companies. These companies intended to focus on evidence related to various topics such as the burden of disease, patient safety monitoring, and comparative effectiveness of treatments. The companies reported benefits of RWD including reduced cost of post-marketing regulatory requirements, label expansion, and recruiting research participants during the pandemic. 

The independent study also reported that a majority of pharma companies have already partnered with other organizations to work on new sources of RWD. The respondents welcomed the FDA guidelines for further use of RWE and RWD. At Real Life Sciences, we too appreciate the FDA guidelines. Our methods, expertise, and technology to access and analyze new sources of RWD, such as social media narratives, are in alignment with the FDA guidelines. The RLytics platform can cast a wider net to aggregate data from various disease-specific social media sites along with general social networking sites such as Twitter or Reddit. The platform is further able to analyze the data using standardized medical terminology, and thus help in answering advanced research questions that cannot be completely answered in a research lab. This opens up exciting new possibilities for research groups that struggle with traditional research methods and want to learn more about the patient journey to focus their efforts on patient-centric research.

Research Ethics In Social Listening

In recent years, clinical researchers have adopted social listening as a technique while conducting social media studies to gain a deeper understanding of diseases and further drug development efforts. Social listening refers to a passive approach where the research team does not provide any prompts or questions but simply ‘listens’ to the narratives posted by people. Data is aggregated from various posts made on social media sites such as verified patient communities, Twitter, or disease-specific blogs.

The unsolicited, voluntary social media posts often contain real-time updates of disease progression and patient journeys. Investigating social media data can reveal trends in patients’ knowledge of a certain disease at a certain stage, their thoughts or approach in dealing with the disease, and coping strategies. The information gained from social media data can form valuable insights that help healthcare organizations such as hospitals and pharma companies to create positive patient experiences. Additionally, support for patients and caregivers or group interventions can be delivered via social media. Social media data can even help public health agencies to develop public health policies.

Although the benefits of social media research are evident, the risks associated with social listening techniques cannot be ignored. Most research organizations are careful about maintaining patient privacy. But because the standards for privacy had not been clearly defined by the regulatory authorities, loss of patient confidentiality or inadvertent disclosure of information was a real possibility. The scientific community has debated social media research ethics for over a decade. In 2011, the US Food and Drug Administration (FDA) issued guidelines for pharma companies regarding their use of social media, but these did not address social listening. More recently, the FDA itself has used natural language processing (NLP) technology and social media for pharmacovigilance purposes, but no clear guidelines for social listening were available until the recent past. The FDA has now provided further guidelines about social listening.

Medical product information can be gathered from various sources such as animal toxicology trials, pharmacogenomic studies, registries, clinical studies, and product quality reports. The FDA emphasizes that social media, in particular social listening techniques, can be a valuable source in gathering information about medical products. Social listening techniques can also be useful for other FDA centers such as the Center for Veterinary Medicine and the Center for Food Safety and Applied Nutrition. According to the FDA, it is possible that social listening techniques are more efficient than traditional post-marketing surveillance, and they have the potential for faster signal detection. The FDA acknowledges that social media data can be used in a variety of study designs, such as qualitative, quantitative, and mixed-methods studies.

In its recent guidance document, the FDA provides three important non-binding recommendations. First, it suggests that social media researchers need to include a wide variety of social media sites. Some websites may appeal to a certain demographic of individuals. Including data from only such sites may introduce a bias in the outcomes of a research study. To improve the generalizability of the study outcomes, many different social media sites need to be included. Next, it notes that researchers may have access to identifying information on verified patient community sites. But when the research topic is sensitive in nature, anonymous posts may be included in the study to protect patient privacy. Lastly, the FDA suggests that researchers need to clearly state the limitations of the study when they affect data integrity, e.g., a lack of mechanisms to verify the diagnosis.
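The first recommendation, drawing on a wide variety of sites, can be checked with a simple tally during data curation. The Python sketch below is illustrative only: the function names, the sample data, and the 50% cutoff are all assumptions made for this example, not values taken from the FDA guidance or from any particular platform.

```python
from collections import Counter


def source_shares(posts):
    """Return each source's share of the total post count.

    `posts` is a list of (source, text) pairs; this layout is a
    hypothetical one chosen for the example.
    """
    counts = Counter(source for source, _text in posts)
    total = sum(counts.values())
    return {source: n / total for source, n in counts.items()}


def dominant_sources(posts, max_share=0.5):
    """List sources whose share of posts exceeds `max_share`.

    The 0.5 default is an arbitrary illustrative cutoff, not a
    regulatory threshold.
    """
    shares = source_shares(posts)
    return sorted(s for s, share in shares.items() if share > max_share)


# Made-up sample: three posts from one forum, one each from two other sites.
sample_posts = [
    ("patient_forum", "post text"),
    ("patient_forum", "post text"),
    ("patient_forum", "post text"),
    ("twitter", "post text"),
    ("reddit", "post text"),
]

print(source_shares(sample_posts))
print(dominant_sources(sample_posts))
```

A source flagged by a check like this could prompt the research team either to add data from other sites or to report the concentration as a study limitation.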

At Real Life Sciences, social listening techniques combined with advanced natural language processing have been used effectively to study diseases of the skeletal system such as ankylosing spondylitis and psoriatic arthritis. Even prior to the issuance of the FDA guidelines, respect for persons, including patient privacy and confidentiality, has been and will always be one of our core values. Our research team stays up to date on the guidelines provided by regulatory authorities such as the FDA. Furthermore, while working with various sponsor organizations, our team ensures that studies are compliant with institutional review board (IRB) requirements. Social listening techniques combined with AI technology will allow us to understand diseases better and expedite drug development.