Improve Contextual Awareness using Social Network Data


Public health activities within the chronic disease realm have predominantly relied on survey data to gather information on disease prevalence, behavioral models, risk populations, risk probability, and disease progression. Surveys are subject to a number of known limitations, e.g., respondents’ reluctance to participate, social desirability biases, lag time between questionnaire design, data collection, and data availability, and intermittent coverage of important topics due to implementation costs.

Chronic disease control experts and policy makers lack access to real-time data and efficient tools that provide contextual awareness for the surveys used in chronic disease surveillance and program management. Without a timely and broader understanding of the environment and community, the representativeness and demographic specificity of the assessment suffer, as does the quality of the data used to drive policy and interventions.

This proposal seeks to develop an analytics platform that can be leveraged by both public health and clinical care to build a cohort around a given chronic disease indicator (e.g., tobacco use) by harnessing web and social network data (e.g., Twitter, Facebook, and search data). This national cohort can be used to provide specific insights both longitudinally and prospectively, helping investigators reveal largely assumption-free insights through the systematic generation of hundreds of possible outcomes rather than an arbitrary, priority-driven selection of a few. The approach can also potentially support traditional surveillance by serving as a guiding tool for vetting the inclusion and exclusion of survey questions.

Project Goal

CDC seeks to support the development of an analytics platform that harnesses web and social network data and delivers novel surveillance capabilities for chronic disease indicators. The proposal seeks to build large, nationally representative cohorts of social network users for each indicator by key characteristics (e.g., demographics and activity) that are systematically inferred from user profiles, tweets, posts, and search behaviors. The project will employ appropriate informatics tools and techniques to extract and infer traits from the data and allow the creation of cohorts that are reflective of regional U.S. Census estimates. These cohorts can then be analyzed to gain insights and answer a diverse set of questions for national, subnational, and demographic-specific prevalence estimates. Further analysis could help identify co-occurring themes and potentially answer the questions “How many?” and “Why?” for any given indicator.
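One way a cohort could be made reflective of Census estimates is post-stratification weighting, in which each cohort member is weighted by the ratio of a stratum's population share to its share of the cohort. The sketch below is illustrative only and is not a method prescribed by this topic: the function names, the two age strata, the population shares, and the cohort records are all hypothetical placeholders, and a real platform would draw its targets from published Census tables and would likely stratify on more than one characteristic.

```python
from collections import Counter


def poststratification_weights(cohort_strata, census_shares):
    """Return one weight per stratum: population share / cohort share.

    cohort_strata: list of stratum labels, one per cohort member.
    census_shares: dict mapping stratum label to its population share
                   (hypothetical values here; shares should sum to 1).
    """
    counts = Counter(cohort_strata)
    total = sum(counts.values())
    weights = {}
    for stratum, target_share in census_shares.items():
        observed = counts.get(stratum, 0)
        if observed == 0:
            # A stratum with no members cannot be reweighted.
            raise ValueError(f"no cohort members in stratum {stratum!r}")
        weights[stratum] = target_share / (observed / total)
    return weights


def weighted_prevalence(members, weights):
    """Weighted share of members flagged with the indicator.

    members: list of (stratum, has_indicator) pairs.
    """
    numerator = sum(weights[s] for s, flagged in members if flagged)
    denominator = sum(weights[s] for s, _ in members)
    return numerator / denominator


# Made-up cohort: younger users are over-represented relative to the
# assumed population shares below.
members = (
    [("18-34", True)] * 3 + [("18-34", False)] * 3
    + [("35+", True)] * 1 + [("35+", False)] * 3
)
census_shares = {"18-34": 0.4, "35+": 0.6}  # hypothetical, not real Census data

weights = poststratification_weights([s for s, _ in members], census_shares)
unweighted = sum(flagged for _, flagged in members) / len(members)
weighted = weighted_prevalence(members, weights)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

With these made-up numbers, the over-represented younger stratum is weighted down and the adjusted prevalence is lower than the raw cohort share. The sketch assumes that each member's stratum has already been inferred, which is itself an error-prone step that the weights do not correct.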

Phase I Activities and Expected Deliverables

• Conduct a review of the data access and use policies of Twitter, Facebook, and search engines

• Conduct a preliminary study to determine applicable social network data streams and public health indicators

• Identify appropriate informatics solutions (e.g., natural language processing algorithms) to access, monitor, and extract data

• Develop a prototype analytics platform with a “Cohort builder” function and demonstrate the creation of at least one nationally representative cohort in the chronic disease domain


The overall goal is to leverage innovative health technologies to improve health outcomes and, subsequently, quality of life for individuals living with chronic disease. An analytics platform using social data can more efficiently provide deeper insights into health behaviors as they occur and can improve policy development as well as the delivery of interventions. By harnessing the data produced by social events and interventions, programs can be evaluated as they are implemented, potentially generating real-time feedback to maximize effectiveness. Web and social network data can be an important source for identifying new hypotheses and can greatly influence the future direction and investments of the center. The cohort builder and the cohort analysis capabilities will benefit chronic disease surveillance and program management practices. Access to social behavior data in real time will help to drive:

• contextual awareness for the survey, i.e., knowing your population

• development/modification of survey questions to improve survey data quality

• ability to monitor changes associated with program interventions between surveys

Commercialization Potential

The analytics platform can immediately operate on a subscription-based revenue model serving public health and clinical care. An organization offering the platform can also diversify into other healthcare initiatives (e.g., Community Health Needs Assessment) as additional revenue domains. Information technology companies, government, health systems, health information exchange entities, health care providers, and public health systems are a few of the potential markets.

