Building predictive models to identify risk factors for post-acute COVID complications
For my Master's thesis at UTHealth, I investigated whether long-COVID outcomes could be predicted in patients with diabetes. Using a comprehensive dataset, I applied several modeling approaches (logistic regression, LASSO regularization, and ensemble machine learning methods) to identify which clinical and demographic factors were most predictive of prolonged COVID symptoms.
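For readers less familiar with LASSO, it helps to see how the penalty drives feature selection. The sketch below is the standard L1-penalized logistic regression objective, written in the form used by R's glmnet with alpha = 1. The thesis's exact formulation, predictor scaling, and choice of the penalty strength λ are not stated here, so treat this as a generic illustration. Here y_i ∈ {0, 1} is the outcome for patient i, x_i is that patient's predictor vector, and the intercept β₀ is left unpenalized.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}\;
    -\frac{1}{n} \sum_{i=1}^{n}
      \Big[ y_i \big(\beta_0 + x_i^{\top}\beta\big)
            - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
    + \lambda \lVert \beta \rVert_1
```

Because the L1 term is not differentiable at zero, increasing λ pushes some coefficients to exactly zero. The predictors that keep nonzero coefficients form the selected feature set. λ is usually tuned by cross-validation.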
The project required careful feature engineering, cross-validation, and rigorous model comparison using ROC curves and other performance metrics. It showed how statistical rigor, combined with clinical understanding, can produce actionable insights for stratifying patients by risk.
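The write-up does not specify the cross-validation scheme. For a binary clinical outcome, one common choice is stratified k-fold cross-validation, in which each fold keeps roughly the same proportion of outcome cases as the full dataset. The Python sketch below illustrates that idea under this assumption. The original work was done in R, and the helper `stratified_kfold_indices` is hypothetical, not taken from the thesis code.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k=5, seed=0):
    """Hypothetical helper: split row indices into k folds while preserving
    the outcome ratio in each fold. Returns a list of (train, test) index lists."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)

    # Group row indices by outcome label (e.g. 1 = long COVID, 0 = not).
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Shuffle within each class, then deal indices round-robin across folds.
    # The running offset keeps overall fold sizes balanced across classes.
    folds = [[] for _ in range(k)]
    start = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(start + j) % k].append(i)
        start += len(idx)

    # Each fold serves once as the test set; the remaining folds form the training set.
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    labels = [1] * 20 + [0] * 80  # hypothetical data: 20% outcome prevalence
    for train, test in stratified_kfold_indices(labels, k=5):
        # Each fold prints: training size, test size, outcome cases in the test fold.
        print(len(train), len(test), sum(labels[i] for i in test))
```

With 20 cases among 100 rows and k = 5, every test fold contains exactly 4 cases. An unstratified random split could leave some folds with too few cases to estimate the AUC reliably.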
Key Outcomes & Technical Approach
The analysis identified several significant predictors of long-COVID risk among diabetic patients, and the machine learning models achieved strong discrimination (AUC > 0.80). I compared standard logistic regression, LASSO for feature selection, and ensemble methods such as Random Forest.
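An AUC above 0.80 means that, for a randomly chosen patient who developed long COVID and a randomly chosen patient who did not, the model assigns the higher risk score to the first patient more than 80% of the time. The minimal Python sketch below computes AUC directly from that pairwise definition (the Mann-Whitney formulation, with tied scores counted as half). The function name `roc_auc` is illustrative, and the thesis's R tooling is not stated here.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outranks a random
    negative case. Tied scores count as half a correct ordering."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:          # every (positive, negative) pair
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores: 3 of the 4 positive/negative pairs are ordered correctly.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop is O(n_pos × n_neg), which is fine for illustration. Library implementations typically use a rank-based equivalent. Computing AUC on held-out folds, rather than on the training data, is what makes it a fair basis for comparing models.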
All analysis was conducted in R, with reproducible code and comprehensive documentation. The findings contributed to understanding post-acute COVID complications in high-risk populations and demonstrated the value of applying multiple analytical approaches to clinical research questions.