Xin Chen's Dissertation Defense: Integrative Big Data Analytics for Public Health Studies

Date of Event

Location: BMI Conference Room, HSC L3-045

Time: noon-2pm

Candidate: Xin Chen

Program: Biomedical Informatics

Advisor: Dr. Fusheng Wang

Dissertation title: Integrative Big Data Analytics for Public Health Studies


Public health studies are moving toward Public Health 2.0 and Precision Public Health, driven by the availability of big data at large scales and finer resolutions, including electronic health records (EHR), user-generated data, and data consolidated from multiple sources. This offers a compelling opportunity to enhance public health studies, improving accuracy for timely disease detection and prevention. Our goal is to explore integrative big data analytics by developing informatics methods and applying them to multiple public health studies, covering the opioid epidemic, cancer epidemiology, marijuana, and school safety and violence.

We demonstrate the power of big EHR data for opioid epidemic research through spatial-temporal analysis of large-scale patient visit data from the New York State Statewide Planning and Research Cooperative System (SPARCS). The study reveals major disparities among population groups and geospatial regions, as well as evolving patterns of the opioid epidemic in New York State.

Social media provides a constant stream of information from a large population and is easily accessible for public health studies. We take advantage of social media data to understand health topics, marijuana in particular. We perform content analysis to reveal the topics discussed and the sentiment of users, which together represent the major opinions of the public.

We perform integrative analysis of social media data and geospatial data by developing spatial querying and a probability-based method for annotating geographic entities with social media data. In particular, we use tweets from Twitter to generate additional knowledge about schools in New York City in order to understand school safety and violence.

By integrating health data at high spatial resolution with socioeconomic and environmental data, we can gain a better understanding of disease patterns with much improved accuracy. This is demonstrated by our work on cancer epidemiology in New York State.

Our research builds an evidence-based approach, driven by big data analytics, for improving public health studies with richer knowledge and higher accuracy for disease detection and prevention.