Publication: Machine Learning Boosts Precision in Social Determinants of Health Analysis

Research commonly uses area-level social determinants of health (SDOH), based on patients’ ZIP codes or census tracts, in place of individual-level SDOH. However, area-level measures may not be an appropriate substitute for individual measures, especially in highly diverse urban neighbourhoods such as those of New York City.

On February 8, 2024, a study by AIMS researchers was published in PLOS ONE. The study analyzed 20,805 patients from the BioMe biobank in New York City. It found that the concordance between survey-reported individual educational attainment and ZIP code-level education derived from the American Community Survey (matched for the participant’s gender and race/ethnicity) was only 47%.
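To make the concordance figure concrete, here is a minimal sketch in Python. It assumes concordance means simple percent agreement between two categorical labels per patient; the article does not spell out the exact definition. The function name, category names, and toy data are hypothetical and are not taken from the study.

```python
def concordance(individual, area_level):
    """Return the fraction of patients whose two education categories agree.

    Assumption: concordance is plain percent agreement, which may differ
    from the exact definition used in the study.
    """
    if len(individual) != len(area_level):
        raise ValueError("inputs must have the same length")
    if not individual:
        raise ValueError("inputs must not be empty")
    matches = sum(a == b for a, b in zip(individual, area_level))
    return matches / len(individual)


# Hypothetical toy data for five patients (not from the study).
survey = ["college", "high_school", "college", "less_than_hs", "college"]
zip_code = ["college", "college", "high_school", "less_than_hs", "high_school"]

rate = concordance(survey, zip_code)
print(f"Concordance: {rate:.0%}")
```

In this toy example, two of the five patients have matching categories, so the ZIP code-level label agrees with the survey answer for 40% of them.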

The researchers then developed a machine learning (ML) model to predict educational attainment, which significantly improved the concordance to 67%. Subsequently, three separate models were developed to predict 5-year cardiovascular hospitalization. Each model took educational attainment from a different source: survey-derived, ZIP code-derived, or ML-predicted. As expected, the model using survey-derived education achieved the highest performance. Notably, the model using ML-predicted education outperformed the model relying on ZIP code-derived education (AUROC 0.75 versus 0.72; p<0.001). The findings suggest that ML techniques can improve the accuracy of SDOH data and, in turn, increase the predictive performance of outcome models.
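For readers unfamiliar with AUROC, the sketch below shows one way to compute it and to compare two models on the same patients. It uses the rank-based definition: the probability that a randomly chosen patient who was hospitalized receives a higher risk score than a randomly chosen patient who was not. The labels and scores are hypothetical toy values, not the study's data, and the sketch does not reproduce the study's significance test.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (positive, negative) pairs in which the positive
    case has the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes: 1 = hospitalized within 5 years, 0 = not.
labels = [1, 0, 1, 0, 1, 0]
# Hypothetical risk scores from two models evaluated on the same patients.
model_a = [0.9, 0.2, 0.7, 0.4, 0.45, 0.5]
model_b = [0.8, 0.6, 0.35, 0.3, 0.7, 0.4]

auc_a = auroc(labels, model_a)
auc_b = auroc(labels, model_b)
print(f"Model A AUROC: {auc_a:.2f}")
print(f"Model B AUROC: {auc_b:.2f}")
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration. For a cohort of the study's size, a sorting-based implementation from a statistics library would be the more practical choice.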

Figure: Receiver-operating characteristic and precision-recall curves of each model for predicting cardiovascular disease hospitalization

Article Source: Takkavatakarn K, Dai Y, Hsun Wen H, Kauffman J, Charney A, et al. (2024) Comparison of predicting cardiovascular disease hospitalization using individual, ZIP code-derived, and machine learning model-predicted educational attainment in New York City. PLOS ONE 19(2): e0297919. https://doi.org/10.1371/journal.pone.0297919
