A novel method leveraging time series data to improve subphenotyping and application in critically ill patients with COVID-19.

Link: https://doi.org/S0933-3657(23)00264-6
Authors: Oh, Wonsuk; Jayaraman, Pushkala; Tandon, Pranai; Chaddha, Udit S; Kovatch, Patricia; Charney, Alexander W; Glicksberg, Benjamin S; Nadkarni, Girish N

Abstract: Computational subphenotyping, a data-driven approach to understanding disease subtypes, is a prominent topic in medical research. Numerous ongoing studies are dedicated to developing advanced computational subphenotyping methods for cross-sectional data. However, the potential of time-series data has remained underexplored. Here, we propose the Multivariate Levenshtein Distance (MLD), a metric that accounts for correlation among multiple discrete features in time-series data. Our algorithm has two distinct components: the MLD itself, and an optimal threshold score that enhances sensitivity in discriminating between pairs of instances. We applied the proposed distance metric with the k-means clustering algorithm to derive temporal subphenotypes from time-series data of biomarkers and treatment administrations from 1039 critically ill patients with COVID-19, and compared its effectiveness to standard methods. In conclusion, the MLD is a novel method for quantifying distance from multiple discrete features in time-series data, and it demonstrates superior clustering performance among competing time-series distance metrics.
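
The abstract does not give the MLD formula, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each time point is a tuple of discrete feature values, that the substitution cost between two time points is the fraction of features that differ, and that a fraction at or below a threshold counts as a match. The names `step_cost`, `multivariate_levenshtein`, and `threshold` are hypothetical and chosen for illustration.

```python
from typing import Sequence, Tuple

# One time point: a tuple of discrete feature values (assumed representation).
Step = Tuple[str, ...]


def step_cost(a: Step, b: Step, threshold: float) -> float:
    """Substitution cost between two time points (illustrative assumption).

    The cost is the fraction of features that differ. Fractions at or
    below `threshold` are treated as a match and cost 0.
    """
    if len(a) != len(b):
        raise ValueError("time points must have the same number of features")
    if not a:
        return 0.0
    mismatch = sum(x != y for x, y in zip(a, b)) / len(a)
    return 0.0 if mismatch <= threshold else mismatch


def multivariate_levenshtein(
    s: Sequence[Step], t: Sequence[Step], threshold: float = 0.0
) -> float:
    """Edit distance between two sequences of multivariate time points.

    Standard Levenshtein dynamic programming with unit insertion and
    deletion costs, and `step_cost` as the substitution cost.
    """
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(
                min(
                    prev[j] + 1.0,                             # deletion
                    curr[j - 1] + 1.0,                         # insertion
                    prev[j - 1] + step_cost(a, b, threshold),  # substitution
                )
            )
        prev = curr
    return prev[-1]


# Hypothetical patients: (biomarker level, treatment status) per time point.
patient_a = [("high", "on"), ("high", "off")]
patient_b = [("high", "on"), ("low", "off")]
distance = multivariate_levenshtein(patient_a, patient_b)
```

Under these assumptions, a pairwise distance matrix built with such a function could be passed to a clustering routine. How the paper sets the threshold and combines the metric with k-means is not described in the abstract.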
