Dear Editor:
We thank the readers for their interest in our article “Unsupervised Machine Learning Reveals Novel Traumatic Brain Injury Patient Phenotypes with Distinct Acute Injury Profiles and Long-Term Outcomes.” 1 The readers' first comment concerns the group counts in Figure 4; the discrepancy appears to be a mistake in the legend. We thank the readers for bringing this to our attention.
To address the readers' second comment about the inclusion of multiple coagulation measurements in the final feature set used to generate cluster assignments, we would like to emphasize that our approach used automated feature selection. The final feature set is the smallest feature set that optimizes a given loss function. Although the international normalized ratio (INR) is derived from prothrombin time (PT), we deliberately did not exclude any features on the basis of a priori domain expertise, and instead treated feature selection as a machine learning optimization problem. If INR and PT demonstrated multi-collinearity, one of them would be expected to be dropped during the automated feature selection process. In addition, the necessity of each feature in the final feature set used to generate the cluster assignment was assessed via permutation testing, as detailed in Figure 2. By two separate metrics, the Jaccard similarity coefficient and the pairwise similarity index, PT and INR were each independently necessary for the cluster assignment (i.e., when either feature was “knocked down,” the remaining feature set could not reproduce a similar clustering assignment). Finally, the cluster assignments obtained with these features were reproduced in the external TRACK-TBI dataset, demonstrating generalizability. Therefore, given the methods used, we do not believe multi-collinearity is a major concern in interpreting the results.
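For readers less familiar with this kind of permutation ("knockdown") test, the general idea can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the pipeline used in the article: the toy two-cluster k-means, the synthetic data, and the function names (`pair_jaccard`, `two_means`, `knockdown_similarity`) are all hypothetical, and the pair-counting form of the Jaccard coefficient shown here is one common formulation that may differ from the one used in the article. The pairwise similarity index is not implemented.

```python
import random
from itertools import combinations


def pair_jaccard(labels_a, labels_b):
    """Jaccard similarity over sample pairs that share a cluster.

    Pair counting makes the score independent of how clusters are numbered.
    """
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += same_a and same_b
        either += same_a or same_b
    return both / either if either else 1.0


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def two_means(rows, n_iter=20):
    """Toy 2-cluster k-means; a stand-in for any clustering method (assumption)."""
    first = rows[0]
    farthest = max(rows, key=lambda r: sq_dist(r, first))
    centers = [list(first), list(farthest)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [
            0 if sq_dist(r, centers[0]) <= sq_dist(r, centers[1]) else 1
            for r in rows
        ]
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def knockdown_similarity(rows, feature, cluster_fn, n_perm=50, seed=0):
    """Mean similarity to the baseline clustering after shuffling one feature."""
    rng = random.Random(seed)
    baseline = cluster_fn(rows)
    scores = []
    for _ in range(n_perm):
        column = [r[feature] for r in rows]
        rng.shuffle(column)  # break the link between this feature and each sample
        permuted = [
            r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)
        ]
        scores.append(pair_jaccard(baseline, cluster_fn(permuted)))
    return sum(scores) / len(scores)


# Synthetic data (not patient data): feature 0 separates two groups,
# feature 1 is pure noise.
data_rng = random.Random(1)
rows = [[data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for _ in range(20)]
rows += [[data_rng.gauss(10, 0.5), data_rng.gauss(0, 0.5)] for _ in range(20)]

sim_signal = knockdown_similarity(rows, 0, two_means)  # informative feature
sim_noise = knockdown_similarity(rows, 1, two_means)   # uninformative feature
```

In this sketch, a feature counts as necessary when shuffling it lowers the similarity to the baseline clustering, as happens for `sim_signal` here, whereas shuffling the noise feature leaves the clustering unchanged. If two features carried fully redundant information, knocking down either one alone would be expected to leave the assignment largely intact.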
