Industry

Child Welfare · Public Health

Client

UTHealth School of Public Health

Linking CPS, Medicaid, and Home Visiting Data for Child Safety Analytics

Engineering data pipelines to connect fragmented child welfare systems

For the Safe Babies initiative, I helped build a predictive model that maps the risk of child maltreatment across Texas. Using Stata, Python, and Tableau, I linked large datasets from Child Protective Services (CPS), Medicaid claims, and home visiting programs, then cleaned, matched, and modeled them to estimate community-level risk at both the ZIP-code and county levels. I also helped design the dashboard, now publicly available on the maltreatment risk page, which visualizes how factors like poverty, maternal health, and school enrollment combine to shape risk patterns. The interactive map lets state and local partners identify at-risk communities and target prevention resources more effectively, turning the data into guidance they can act on. The clean, integrated dataset also supported quasi-experimental analyses to evaluate intervention effectiveness and predictive modeling to identify risk factors.

Technical Implementation & Impact

The project combined data engineering with statistical modeling. I built hierarchical matching algorithms in Python and Stata that applied fuzzy matching on names, dates, and identifiers while handling data quality issues such as missing values and inconsistent formatting. After linkage, I applied LASSO regression and Random Forest models to predict child safety outcomes. The integrated dataset and analytical findings are now being used to guide state-level program interventions and policy decisions. Because the data are sensitive, the code and detailed results remain confidential, but the methodology established a framework for future child welfare data integration projects.
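Since the project code is confidential, the following is only an illustrative sketch of what tiered (hierarchical) record matching can look like, not the implementation used here. It assumes three tiers: an exact identifier match, then an exact name match within the same date of birth, then a fuzzy name match within the same date of birth. The field names (`id`, `name`, `dob`), the 0.85 threshold, and the sample records are all hypothetical, and the similarity measure is the standard library's `difflib.SequenceMatcher`, chosen only to keep the example dependency-free.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase, trim, and collapse whitespace; treat None as empty."""
    return " ".join(str(value or "").lower().split())


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_record(record, candidates, threshold=0.85):
    """Return (matched_candidate, tier), or (None, None) if nothing matches.

    Tiers run from strictest to loosest; a record stops at the first hit.
    The threshold is an illustrative value, not a tuned one.
    """
    # Tier 1: exact match on a shared identifier, when the record has one.
    rid = normalize(record.get("id"))
    if rid:
        for cand in candidates:
            if normalize(cand.get("id")) == rid:
                return cand, "id"

    # Later tiers block on date of birth, so a missing value ends the search.
    dob = normalize(record.get("dob"))
    if not dob:
        return None, None
    same_dob = [c for c in candidates if normalize(c.get("dob")) == dob]

    # Tier 2: exact match on the normalized name within the same birth date.
    name = normalize(record.get("name"))
    for cand in same_dob:
        if name and normalize(cand.get("name")) == name:
            return cand, "exact_name_dob"

    # Tier 3: best fuzzy name match within the same birth date.
    best, best_score = None, 0.0
    for cand in same_dob:
        score = name_similarity(record.get("name"), cand.get("name"))
        if score > best_score:
            best, best_score = cand, score
    if best is not None and best_score >= threshold:
        return best, "fuzzy_name_dob"
    return None, None


# Hypothetical records: one source lacks the identifier and has a
# differently spelled, inconsistently spaced name.
source_record = {"id": "", "name": "Maria  Gonzales", "dob": "2019-03-04"}
target_records = [
    {"id": "M100", "name": "Maria Gonzalez", "dob": "2019-03-04"},
    {"id": "M200", "name": "Mario Gomez", "dob": "2019-03-04"},
]

matched, tier = match_record(source_record, target_records)
print(matched["id"], tier)
```

Ordering the tiers from strict to loose means high-confidence links are settled first, and the fuzzy comparison only runs on records that share a date of birth, which limits both false matches and the number of comparisons. Recording the tier alongside each link makes it possible to review or exclude the looser matches later.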