Abstract

We read with great interest the study by Choi et al.,1 which makes a timely and novel contribution by incorporating residential greenness, quantified as the Normalized Difference Vegetation Index (NDVI), into machine learning models predicting 30-day pneumonia readmissions. Integrating satellite-derived environmental data to complement often incomplete electronic health records represents a conceptually important advance, and the authors demonstrated that NDVI survived a rigorous three-step feature selection pipeline (comprising Pearson correlation filtering, variance inflation factor reduction, and backward elimination) and was included among the final 21 predictors across a cohort of 22,600 patients. Building on this promising foundation, we suggest that an ablation analysis, in which the model is retrained without NDVI and compared against the full model, would be a natural and valuable extension of the work. This would provide a more direct estimate of the incremental benefit of incorporating NDVI. While the authors’ permutation importance analysis provides a useful post-hoc evaluation of the relative importance of NDVI in the model’s decision-making process, it serves a different purpose. Permutation importance measures the extent to which a model already trained with NDVI depends on that feature when generating predictions; thus, it is complementary to, but cannot substitute for, an evaluation of whether including NDVI during model training improves overall performance. As formalized by Hooker et al., retraining-based approaches are more appropriate than permute-and-predict methods for evaluating whether, and by how much, a feature truly improves predictive performance.2 Ablation analysis has also been successfully applied in comparable clinical prediction settings, including in-hospital stroke mortality prediction using MIMIC-IV, where removal of vitals and clinical assessment features led to the largest decline in model performance among all feature categories.3
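As an illustration only, the distinction between the two quantities can be sketched on synthetic data. The snippet below is a minimal sketch, not the pipeline of Choi et al.: the cohort, the effect sizes, the plain logistic regression, and all function names (`fit_logistic`, `run_demo`, and so on) are assumptions introduced for the example. It retrains a model without a stand-in NDVI column (ablation) and, separately, shuffles that column at test time for the model trained with it (permutation importance).

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, epochs=200, lr=0.5):
    """Plain batch gradient descent; returns [bias, w_1, ..., w_d]."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict(w, X):
    return [sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) for xi in X]

def auc(y, scores):
    """AUROC: chance that a positive case outranks a negative one (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def drop_column(X, j):
    return [row[:j] + row[j + 1:] for row in X]

def run_demo(seed=0, n=1200, ndvi_col=2):
    rng = random.Random(seed)
    # Synthetic cohort: two stand-in clinical features plus a stand-in NDVI.
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    # Arbitrary, assumed effect sizes; NDVI is made protective on purpose.
    y = [int(rng.random() < sigmoid(-1.0 + 0.8 * a + 0.6 * b - 1.2 * g))
         for a, b, g in X]
    split = int(0.7 * n)
    X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]

    # Ablation: retrain without NDVI and compare held-out AUROC.
    full = fit_logistic(X_tr, y_tr)
    reduced = fit_logistic(drop_column(X_tr, ndvi_col), y_tr)
    auc_full = auc(y_te, predict(full, X_te))
    auc_reduced = auc(y_te, predict(reduced, drop_column(X_te, ndvi_col)))

    # Permutation importance: keep the full model, shuffle NDVI at test time.
    shuffled = [row[ndvi_col] for row in X_te]
    rng.shuffle(shuffled)
    X_perm = [row[:ndvi_col] + [v] + row[ndvi_col + 1:]
              for row, v in zip(X_te, shuffled)]
    auc_perm = auc(y_te, predict(full, X_perm))

    return {"auc_full": auc_full,
            "ablation_gain": auc_full - auc_reduced,
            "permutation_drop": auc_full - auc_perm}

if __name__ == "__main__":
    print(run_demo())
```

The two returned differences answer different questions: `ablation_gain` is the change in held-out discrimination when NDVI is never seen during training, whereas `permutation_drop` is the reliance of an already-trained model on NDVI. In this toy setup the features are independent; when NDVI is correlated with other predictors, the two quantities may diverge further.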
There is also an opportunity to further strengthen the equity dimension of this work. NDVI may be considered not only a predictive feature but also a structural proxy for social determinants of health, given that residential greenness is unequally distributed across socioeconomic contexts.4 In addition, Rigolon et al.’s systematic review found that NDVI-based measures were associated with stronger protective health effects among low-income and minority populations, who may have limited access to other health-promoting resources,5 potentially overlapping with groups whose risk profiles are less completely captured by standard EHR features. Future work examining whether NDVI integration differentially improves model performance across sociodemographic subgroups would be a valuable extension. In particular, studies have shown that readmission and health-risk prediction models can exhibit substantial performance disparities across insurance and racial subgroups in different settings.6,7 More specifically, subgroup-specific ablation analysis could help determine whether NDVI preferentially improves predictive performance for marginalized cohorts whose standard EHR data may be relatively sparse. Assessing whether NDVI integration reduces algorithmic performance disparities between privileged and vulnerable groups would provide important evidence as to whether it functions not only as a meaningful predictor but also as an equity-promoting feature. Such effects could be quantified using group fairness metrics, e.g., the equalized odds ratio. Choi et al. have laid important groundwork in this still underexplored area, and these extensions represent promising directions for future research.
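One way such a subgroup comparison might be computed is sketched below. It is a minimal sketch under stated assumptions: the equalized odds ratio is taken here as the smaller of the between-group min/max ratios of the true-positive rate and the false-positive rate (1.0 indicating parity), the group labels are hypothetical, and the labels and predictions are hand-made toy values chosen only to show the arithmetic, not results from any study.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

def equalized_odds_ratio(y_true, y_pred, groups):
    """Smaller of the min/max between-group ratios for TPR and FPR."""
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        labels, preds = by_group.setdefault(g, ([], []))
        labels.append(t)
        preds.append(p)
    tprs, fprs = zip(*(rates(t, p) for t, p in by_group.values()))

    def ratio(values):
        # Treat "all groups at zero" as parity to avoid dividing by zero.
        return min(values) / max(values) if max(values) > 0 else 1.0

    return min(ratio(tprs), ratio(fprs))

# Toy data: two hypothetical subgroups with identical outcome distributions.
groups = ["group_a"] * 8 + ["group_b"] * 8
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2

# Made-up thresholded predictions from a model without and with NDVI.
pred_without_ndvi = [1, 1, 1, 0, 1, 0, 0, 0] + [1, 0, 0, 0, 1, 1, 0, 0]
pred_with_ndvi = [1, 1, 1, 0, 1, 0, 0, 0] + [1, 1, 0, 0, 1, 0, 0, 0]

eo_without = equalized_odds_ratio(y_true, pred_without_ndvi, groups)
eo_with = equalized_odds_ratio(y_true, pred_with_ndvi, groups)
```

In this toy case, the ratio moves from 1/3 to 2/3 because the hypothetical second group's true-positive rate rises from 0.25 to 0.50 and its false-positive rate falls from 0.50 to 0.25. In a real analysis the two prediction vectors would come from the ablated and full models evaluated on the same held-out patients, ideally with confidence intervals for each subgroup.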
