Research

My research focuses on formulating, implementing, and advancing statistical methodology that is applicable across diverse fields. Specifically, my work develops and applies statistical methods for survival analysis, longitudinal studies, joint analysis of longitudinal and survival data, and high-dimensional data.

This broad scope includes creating statistical tools that enable predictive inference from complex clustered or longitudinal data, modeling length-biased survival outcomes, and, more recently, developing techniques for variable selection, predictive inference, and multiple testing in high-dimensional datasets.

Statistical methods for high-dimensional data. My current research has largely focused on methods for analyzing data with many more predictor variables than observations (i.e., p > n). The statistical issues that arise in these analyses are simultaneously estimating the effects of a large number of predictor variables and controlling the global error rate of the study. A focal point is interpretable machine learning models for continuous outcomes, notably sparse linear regression with high-dimensional predictors. Although this field has seen much development over the last decade, our framework is the first to fit such models with an uninformative prior on the regression coefficients. Using an uninformative prior means that standard methods of performing the E-step of the EM algorithm are not available. As a result, we take an approach motivated by the popular two-group approach to multiple testing, with empirical Bayes estimates of the hyperparameters. The resulting method is a computationally efficient and powerful Bayesian approach to sparse high-dimensional linear regression, with superior performance to other candidate approaches.
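For readers unfamiliar with it, the generic two-group model from the multiple-testing literature can be sketched as follows. This is the textbook form only, not necessarily the exact specification used in the framework described above; the symbols z_j, \pi_0, f_0, and f_1 are notation introduced here purely for illustration.

```latex
% Generic two-group model (illustrative notation, assumed for this sketch)
\[
\begin{aligned}
z_j &\sim f(z) = \pi_0 f_0(z) + (1 - \pi_0) f_1(z), \qquad j = 1, \dots, p, \\
\Pr(\text{null} \mid z_j) &= \frac{\pi_0 f_0(z_j)}{f(z_j)} .
\end{aligned}
\]
```

Here z_j is a summary statistic for the j-th predictor, f_0 and f_1 are its densities under the null and non-null groups, and \pi_0 is the null proportion. In an empirical Bayes analysis, quantities such as \pi_0 are estimated from the data rather than fixed in advance.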

Applications to neuroimaging data. Much of my recent methods development has been motivated by applied projects that use MRI data from patients with chronic left-hemisphere stroke and aphasia. A common goal of these studies is to use high-resolution T2 scans, which indicate stroke-affected areas, to map some domain of cognition to specific regions of the brain. This can provide theoretical insights regarding brain function and can also inform clinical treatment. The outcome of interest is commonly the subject's Aphasia Quotient (AQ), a score quantifying language impairment that is vital to understanding a patient's treatment options. However, collecting AQ is a cumbersome task, particularly for patients who have recently had a stroke. As a result, we have aimed to develop models that can predict a subject's unknown AQ from image data. In this research, we have been able to quantify the uncertainty in predictions from high-dimensional sparse regression models.

Modeling and prediction with longitudinal/clustered data. My pre-tenure methodological research largely focused on joint modeling of longitudinal and survival outcomes, with application to large longitudinal cohorts. This research involved complex longitudinal models that correct for informative dropout resulting from a survival outcome. A sub-theme of this research was predicting longitudinal or survival outcomes from observed data. Since 2017, my research interests have evolved toward modeling and predictive inference for data with spatial-temporal dependence structures. My methodological papers in this area predominantly involve the development of flexible longitudinal, clustered, multilevel, and/or spatial models capable of accommodating non-standard data characteristics. The emphasis is on interpretability, valid confidence and prediction interval estimation, and adaptability to data distributions.

Applications to child health. I have long been interested in applying statistical methodology to child health. I have used various small-area estimation methods to estimate the prevalence of obesity and mental health disorders, including ADHD and ASD, in children under the age of 18. These analyses used spatial random effects models with post-stratification, and we provided methods for robust uncertainty quantification and multiple testing. In other research, I have led efforts to rigorously examine physical activity, sedentary behavior, and weight status among infants and toddlers. These projects underscore my commitment to employing robust statistical approaches to tackle pressing public health issues, thereby contributing to the broader understanding and improvement of child health outcomes.
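As a rough illustration of the post-stratification step mentioned above, the sketch below combines cell-level prevalence estimates into an overall estimate by weighting each cell by its population count. The cell labels, prevalences, and counts are made-up numbers, and the function name `poststratified_prevalence` is hypothetical; the sketch omits the spatial random effects model that would produce the cell estimates.

```python
def poststratified_prevalence(cell_estimates, cell_population):
    """Population-weighted average of cell-level prevalence estimates."""
    total = sum(cell_population.values())
    weighted = sum(cell_estimates[cell] * cell_population[cell]
                   for cell in cell_population)
    return weighted / total


# Hypothetical cells and numbers, for illustration only.
cell_estimates = {"age 2-5": 0.10, "age 6-11": 0.20, "age 12-17": 0.30}
cell_population = {"age 2-5": 1000, "age 6-11": 2000, "age 12-17": 1000}

overall = poststratified_prevalence(cell_estimates, cell_population)
print(f"Post-stratified prevalence: {overall:.3f}")
```

Weighting by known population counts corrects for cells that are over- or under-represented in the survey sample relative to the target population.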

Length-biased survival analysis. My research in length-biased survival analysis develops methods for data collected under current duration sampling, with a focus on time-to-pregnancy (TTP) and infertility. This sampling approach is used in the National Survey of Family Growth (NSFG), where women report the duration of their ongoing pregnancy attempts. Such data are length-biased and right-censored: longer attempts are more likely to be in progress at survey time, and complete TTP values are not available. The NSFG's inclusion of fertility treatments adds complexity, as these alter the natural pregnancy attempt distribution and can bias TTP distribution estimates if ignored.
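The length bias described above can be seen in a small simulation. This is a minimal sketch under simplifying assumptions that are not taken from the NSFG or from our papers: attempts start uniformly over a long window, TTP is exponential with a mean of 6 months, the survey happens at the end of the window, and fertility treatment is ignored. All names and parameter values are hypothetical.

```python
import random
from statistics import mean


def simulate_current_duration_sample(n_attempts=200_000, mean_ttp=6.0,
                                     window=200.0, seed=1):
    """Simulate a cross-sectional survey taken at time `window`."""
    rng = random.Random(seed)
    all_ttp, sampled_ttp, current_durations = [], [], []
    for _ in range(n_attempts):
        start = rng.uniform(0.0, window)        # attempt start time
        ttp = rng.expovariate(1.0 / mean_ttp)   # full (unobserved) TTP
        all_ttp.append(ttp)
        if start + ttp > window:                # still ongoing at survey time
            sampled_ttp.append(ttp)             # length-biased full TTP
            current_durations.append(window - start)  # what is reported
    return all_ttp, sampled_ttp, current_durations


all_ttp, sampled_ttp, current_durations = simulate_current_duration_sample()
print(f"Mean TTP, all attempts:          {mean(all_ttp):.2f}")
print(f"Mean TTP, attempts in the survey: {mean(sampled_ttp):.2f}")
print(f"Mean reported current duration:   {mean(current_durations):.2f}")
```

In this setup, attempts that are in progress at survey time have a noticeably longer average TTP than attempts overall, and each reported current duration is shorter than the full TTP of that attempt, which is the right-censoring noted above.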

I currently serve as an associate editor at Statistics in Medicine and am a Statistical Consultant (reviewer) for the American Journal of Obstetrics and Gynecology.