Abstract
Background and Objectives:
Photoplethysmography (PPG) sensors have been increasingly used for remote patient monitoring, especially during the COVID-19 pandemic, for the management of chronic diseases and neurological disorders. There is an urgent need to evaluate the accuracy of these devices. This scoping review considers the latest applications of wearable PPG sensors with a focus on studies that used wearable PPG sensors to monitor various health parameters. The primary objective is to report the accuracy of the PPG sensors in both real-world and clinical settings.
Methods:
This scoping review was conducted in accordance with Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA). Studies were identified by querying the Medline, Embase, IEEE, and CINAHL databases. The goal was to capture eligible studies that used PPG sensors to monitor various health parameters for populations with a minimum of 30 participants, with at least some of the population having relevant health issues. A total of 2,996 articles were screened and 28 are included in this review.
Results:
The health parameters and disorders identified and investigated in this study include heart rate and heart rate variability, atrial fibrillation, blood pressure (BP), obstructive sleep apnea, blood glucose, heart failure, and respiratory rate. An overview of the algorithms used and their limitations is provided.
Conclusion:
Some of the barriers identified in evaluating the accuracy of multiple types of wearable devices include the absence of standardized reporting of accuracy metrics and a general paucity of studies with large sample sizes in real-world settings, especially for parameters such as BP.
Introduction
A wearable device is an electronic instrument with wireless communication abilities that can be worn on the human body. 1 Concepts for modern wearable devices have existed since the 1990s, including items such as the Active Badge, the first wearable tracking device, and the mBracelet, a smartwatch for completing financial transactions. 1 From 2007 to 2015, the release of devices such as the Fitbit, Samsung smartwatch, and Apple watch drew increased attention to the use of wearable devices for health and fitness monitoring. 1
As technology continues to improve and the number of wearable devices on the market grows, the concept of using wearable devices for clinical health monitoring has become increasingly popular. Recent long-term health trends, such as an increase in obesity 2 and an increase in the elderly population, 3 have put an additional burden on health care systems around the world. Using technology to assist in monitoring and treating patients could help significantly reduce this growing burden. 4,5
The COVID-19 pandemic has further underscored the need for wearable technology in health care. During the pandemic, patients are often unable or unwilling to leave their homes, which keeps them from much-needed clinical visits. Wearable technology can enable remote health monitoring, where a health care provider can monitor a patient's health parameters such as heart rate (HR), temperature, or breathing rate from a distance and thus can still provide effective care. Several studies have shown the effectiveness of using remote health monitoring during the pandemic. 6–8 Golinelli et al. 8 conducted a systematic review describing how the use of digital technologies in general has assisted with diagnosis, surveillance, and prevention. In particular, Tsamis et al. 6 demonstrated that wearable sensors could effectively be used to monitor motor fluctuations in people with Parkinson's disease whose access to health care was disrupted during the pandemic.
This scoping review aims to further the development of wearable technologies by providing an overview of wearable photoplethysmography (PPG) sensors for the purposes of remote health monitoring. PPG sensors work by shining light into the skin and detecting changes in blood flow. Most modern PPG sensors use a green LED because green light provides the best signal-to-noise ratio against motion artifacts. 9
The scope of the review focuses on PPG sensors developed within the 5 years 2017–2021 to reflect the rapid increase in technological advancements during this time. Recognizing the role of PPG-based wearable devices in monitoring various health parameters, this review provides an overview of the accuracy and limitations of the current research and suggests directions for the future development of algorithms toward clinical-grade and large-scale applications (apps).
A thorough summary of the articles that use PPG technology for monitoring various health parameters in wearable devices, followed by a concise report of accuracy metrics and limitations for each health parameter, is provided as supplementary material. To the best of the authors' knowledge, this scoping review is one of the first wearable device reviews to report accuracy metrics for such a large scope of health parameters.
Methods
This scoping review was conducted in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses). Studies were identified by querying the Medline, Embase, IEEE, and CINAHL databases for articles published in the period 2017–2021. The goal was to capture eligible studies that used PPG sensors to monitor various health parameters for populations with a minimum of 30 participants, with at least some of the population having relevant health issues, that is, health issues that either corresponded to the measured health parameter or affected the health parameter in a direct way. The primary objective was to obtain studies that met the above criteria and provided some form of accuracy metric for evaluation within this review.
The following search terms were used in combination in all the databases: (ECG OR PPG OR sensor OR actigraphy OR gyroscope OR accelerometer OR temperature) AND (wearable device) AND (estimation OR measurement OR tracking OR assessment OR quantification OR quantifying OR evaluation). Relevant Medical Subject Headings were included where appropriate, and searches were limited to title, abstract, and keywords. The eligibility criteria were as follows.
- The trials could be in clinical or real-world environments.
- The population had to be a minimum of 30 participants, at least some of whom had health issues.
- The wearable device must have used PPG-based technologies, although additional peripheral sensors such as accelerometry and electrocardiogram (ECG) were permitted.
- The study had to report some form of accuracy metric for the PPG device.
- The study had to be published between January 2017 and April 2021.
Studies were excluded based on the following criteria:
- They were review articles.
- They only included healthy populations.
- They did not report any accuracy metrics.
- They were not published in English.
Five reviewers (S.K., J.L., M.N., C.G., and M.H.C.) independently screened the selected articles by their titles and abstracts and evaluated whether a given study met the inclusion or exclusion criteria. If the information from an abstract was unclear, the full text was retrieved. At all steps of the screening process, the decision to include or exclude an article was made by the five reviewers, with S.K. serving as the final arbiter. The extracted data from all the studies were tabulated and reviewed multiple times for their consistency and accuracy. The extracted data included demographic information such as the total number of participants, average age of the population, medical condition of the participants, and the number of the participants associated with a given condition, the type of health parameters or disorders, the study type (clinical settings vs. real-world settings), the wearable type, methodology, and accuracy metrics used in each article.
It is important to note that not all health parameters measurable by PPG are included in this review because some parameters did not have any studies that met the inclusion criteria. One such important parameter is oxygen saturation (SpO2). Of the search results, 112 studies dealt with the accuracy of SpO2, but they were ultimately excluded because their sample sizes were <30. Although SpO2 is arguably an important health parameter when reviewing the capabilities of PPG, the authors chose to retain the 30-participant cutoff because the goal of the review was not simply to assess the accuracy of PPG but also to assess the accuracy of PPG in the context of moving the technology toward a real-world setting.
Results
Of the 2,996 articles screened, 28 articles are included in this review, with 5 real-world studies and 23 clinical studies (Fig. 1). Among the 28 articles that fulfilled the eligibility criteria, 21 are journal articles, and the rest are abstracts (5) or posters (2). The health parameters and disorders appearing in the included articles are HR/heart rate variability (HRV), atrial fibrillation (AF), blood pressure (BP), obstructive sleep apnea (OSA), blood glucose (BG), heart failure (HF), and respiratory rate (RR). Six articles focused on HR/HRV, with two analyzing real-world settings; seven articles focused on AF, with three analyzing real-world settings; six articles focused on BP, all in clinical settings; four articles focused on OSA, all in clinical settings; three articles focused on BG, all in clinical settings; and one article each focused on HF and RR, both in clinical settings.

Fig. 1. PRISMA flowchart of the identification and selection of studies. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
Each of the above articles reported at least one of the following metrics: accuracy, specificity, sensitivity, correlation coefficient (r), mean error, mean absolute error (MAE), or mean squared error.
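As an illustration of how the metrics listed above relate to one another, the following minimal sketch computes them from paired device and reference values. It is not taken from any of the included studies; the function names and the sample values are hypothetical, and only the Python standard library is assumed.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fraction of true cases detected
        "specificity": tn / (tn + fp),  # fraction of non-cases correctly rejected
    }


def error_metrics(estimates, references):
    """Mean error (bias), mean absolute error, and mean squared error."""
    errors = [e - r for e, r in zip(estimates, references)]
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,
        "mae": sum(abs(x) for x in errors) / n,
        "mse": sum(x * x for x in errors) / n,
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example: wearable HR estimates (bpm) against ECG-derived HR.
ppg_hr = [72, 80, 65, 90]
ecg_hr = [70, 82, 65, 91]
print(error_metrics(ppg_hr, ecg_hr))
print(classification_metrics(tp=40, fp=5, tn=45, fn=10))
```

Because these quantities answer different questions (classification performance versus agreement of continuous values), a study reporting only one of them cannot be compared directly with a study reporting another.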
Data extracted for the results of this review include publication information (authors, publication date, and journal name) as well as specific details such as the subject size, average age, and medical condition of the subjects; wearable type and device features, methodology and accuracy metric(s) used; and a categorized summary of the limitations for the identified health parameters. Table 1 contains sample size and demographic information for each study. Table 2 provides an overview of the wearable devices and their accuracies. Table 3 contains a summary of the limitations found within the studies.
Table 1. Basic Characteristics of the Included Studies
The table summarizes the health parameter of focus, study type, total participants, average age of the population, medical condition of the participants, and the number of subjects associated with that condition.
AF, atrial fibrillation; AHI, apnea–hypopnea index; AWS, Apple watch standby mode; AWW, Apple watch workout mode; BG, blood glucose; BP, blood pressure; FBT, Fitbit wristband; HR, heart rate; HRV, heart rate variability; ICU, intensive care unit; N/R, not reported; RR, respiratory rate.
Table 2. Categorical Summary of the Articles Included in This Review That Use Photoplethysmography Signals on Their Own or in Combination with Other Sensors to Monitor Various Health Parameters or Disorders
The table summarizes the study type, wearable type, methodology, and accuracy for each study.
ρ, Spearman's rank correlation coefficient; app, application; bpm, beats per minute; DBP, diastolic blood pressure; ECG, electrocardiogram; MAP, mean arterial pressure; N/R, not reported; PPG, photoplethysmography; PSG, polysomnography; r, correlation coefficient; SBP, systolic blood pressure; SpO2, oxygen saturation; SPPB, Short Physical Performance Battery (measures of physical functioning); STD, standard deviation; SVT, supraventricular tachycardia.
Table 3. Limitations Associated with Each Health Parameter Among the Included Studies
BMI, body mass index.
STUDY OVERVIEWS
HR and HRV estimation using PPG
HR, HRV, coronary heart disease, and HF are among the parameters and disorders most commonly studied through PPG. 10,11 Six articles included in this review investigate the usability of PPG technology in determining HR and HRV. All six use ECG as the gold standard for determining accuracy.
In the first study, Hwang et al. 12 assessed the accuracy of three different devices (Table 2) for HR monitoring. The HR was measured both at its baseline and during an induced episode of supraventricular tachycardia (SVT). The results showed that all the wearable wristbands in this study were able to measure baseline HR and the induced SVT HR with an accuracy of 94% and 87%, respectively. Although the accuracy of these devices was acceptable, it was observed that with increasing HR, this accuracy tended to decrease in all three devices.
In a similar study, Pelizzo et al. 13 applied a personal fitness tracker (PFT) PPG device for HR measurement in children (Table 1) during elective surgeries such as laparoscopy and open surgery. The HR values obtained from the PFT correlated with continuous ECG and peripheral capillary oxygen saturation right (SpO2R) readings with r = 0.99 and r = 0.99, respectively. However, the PFT performance decreased when HR was >140 beats per minute (bpm).
To highlight the importance of validating health apps, Vandenberk et al. 14 studied the required factors for validating different HR apps. Beat-to-beat intervals of the FibriCheck HR app were compared against synchronized ECG recordings using RR intervals (RRI) and peak-to-peak intervals (PPI). They reported a positive correlation between the PPI from FibriCheck and the RRI from ECG (Spearman's rank correlation coefficient ρ = 0.993, root mean square deviation = 23.04 ms, and normalized root mean square error = 0.012). There were also no significant differences between these different intervals (p = 0.92). It was concluded that beat-to-beat simultaneous comparative measurement analysis of an HR app and an ECG system is the best method of evaluating accuracy for HR apps.
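The beat-to-beat comparison described above can be sketched as follows. This is an illustration only, not the FibriCheck validation code: the function names and interval values are hypothetical, the two interval series are assumed to be already synchronized beat for beat, and the normalization of the root mean square error by the range of the reference intervals is an assumed convention (other conventions, such as dividing by the mean, are also in use).

```python
import math


def rmsd(ppi_ms, rri_ms):
    """Root mean square deviation between paired PPG peak-to-peak
    intervals and ECG RR intervals, both in milliseconds."""
    if len(ppi_ms) != len(rri_ms) or not ppi_ms:
        raise ValueError("interval series must be non-empty and equally long")
    squared = [(p - r) ** 2 for p, r in zip(ppi_ms, rri_ms)]
    return math.sqrt(sum(squared) / len(squared))


def normalized_rmse(ppi_ms, rri_ms):
    """RMSD divided by the range of the reference intervals.
    Assumption: range normalization; the source does not state the convention."""
    span = max(rri_ms) - min(rri_ms)
    return rmsd(ppi_ms, rri_ms) / span


# Hypothetical synchronized intervals (ms).
ppi = [800, 810, 790, 1000]
rri = [800, 800, 800, 1000]
print(round(rmsd(ppi, rri), 2), round(normalized_rmse(ppi, rri), 4))
```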
Other studies, such as those by Antink et al. 15 and Koshy et al. 16, focused on HR and HRV and on the fundamental differences between PPG and ECG signals. Antink et al. introduced a new algorithm, called the continuous local interval estimator (CLIE), that analyzes a signal's entire waveform. They concluded that it is a robust and accurate algorithm for PPG-derived HRV, with a lowest relative root mean square error of 1.16%. In another study, Koshy et al. investigated the utility of PPG for HRV estimation using PPG signals from a smartwatch. They showed that although the smartwatch obtained reasonable accuracies for HR in normal settings, it did not perform as well as ECG for HR in cases where the patient had moderate-to-severe cardiac arrhythmias.
Graham et al. 17 investigated whether PPG-derived HRV metrics are related to the physical function of older individuals. After collecting HRV data through an Empatica E4 wrist-worn PPG device and quantifying them with the HRV triangular index (HRV TI), they examined the bivariate correlation between these data and validated clinical measures of physical function in a cohort of older adults. The clinical measures of physical function used in the study included the Short Physical Performance Battery (SPPB), Timed Up and Go (TUG), and Medical Outcomes Study Short Form 36 (SF-36). The researchers found significant associations between HRV TI and SPPB performance (n = 52; Spearman ρ = 0.41, p = 0.003), TUG (n = 51; ρ = −0.40, p = 0.004), and SF-36 physical composite scores (n = 49; ρ = 0.37, p = 0.009).
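For readers less familiar with rank correlation, the sketch below shows one way to compute Spearman's ρ, the statistic reported in the study above, as the Pearson correlation of average ranks. It is an illustration under the assumption of paired observations without missing values; the function names are hypothetical, and the p-values quoted above are not reproduced here.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)
```

A negative ρ, as reported for TUG, indicates that higher HRV TI values tend to accompany shorter (better) TUG times.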
AF detection using PPG
AF is the most common cardiac arrhythmia. It is characterized by disorganized atrial activity that creates an abnormal cardiac rhythm and leaves patients with a fivefold increased risk of stroke. Thus, early AF detection plays an important role in effective prevention of primary and secondary stroke. 18 However, early detection of AF remains challenging due to asymptomatic or rarely occurring episodes. The advancement in PPG technologies as a noninvasive alternative technique could offer a potential solution to this problem, allowing AF detection to be carried out using different types of wearable devices. 19
Seven of the articles included in this review use PPG technologies for the detection and screening of AF. All of these studies used ECG as their ground-truth method; however, the types of ECG used varied. AF is defined on ECG as an irregular rhythm with the absence of P-waves that lasts 30 s or more. 20
To accurately detect AF using PPG signals, it is imperative that subjects remain stationary during recording. This is currently the main limitation of PPG for AF detection using smartwatches. Three of the studies in the AF category 21–23 focused on developing methods and algorithms that can overcome this limitation. Two of these studies 21,23 proposed different approaches for quality assessment of PPG signals: a motion and noise artifact detection algorithm and supervised machine learning (ML) techniques. These algorithms help ensure that irregular signals due to AF are not misidentified as poor-quality signals. By analyzing PPG signals collected through smartwatches, the two studies achieved overall accuracies of 90.75% and 97.54%, respectively (Table 2). Mutke et al. 22 combined two separate algorithms to achieve an accuracy of 96.1%. Three of the studies in this category 24–26 assessed the precision and accuracy of commercial devices during AF events.
The remaining study, by Sološenko et al. 27, developed a robust PPG-based AF detector for use in a low-power wearable device that does not rely on signal quality assessment using accelerometer information. The authors argued that using accelerometers on AF-detecting devices does not always eliminate motion artifacts. For example, certain activities, such as finger movements, are not detectable with accelerometers but could cause deformation of internal tissues that worsens the signal quality. Conversely, arm movements that do not always cause motion artifacts, and that could contain segments with good signal quality, are always excluded when relying only on the accelerometer information. This work used the PPG signals for both quality assessment and AF detection. The AF detector achieved an accuracy of 87.0% on the clinical database.
Although the results of these studies demonstrate that wearable devices have the capacity to be used for both detection and screening of AF, further research is necessary to overcome the limitations of these studies such as the small sample size, types of applied algorithms, motion and noise artifacts, and poor signal quality (Table 3).
BP monitoring using PPG
Chronic hypertension is one of the major risk factors for the development of cardiovascular disease (CVD) that can be detected early by monitoring of BP. 28 Cuff-based devices are widely used for this purpose; however, more than three in ten home BP monitors are inaccurate, 29 and continuous measurement is not possible. Thus, wearable devices and apps for accurate and continuous BP measurement are gaining tremendous attention.
The similarity between PPG and arterial BP morphologies in the frequency and time domains has encouraged researchers to validate the potential use of PPG to estimate BP. In a study by Eckstein et al., 30 the reliability and suitability of a proposed algorithm in estimating systolic blood pressure (SBP) were tested using the PPG signal of a smartphone camera. The results showed fair estimation of SBP for normotensive and grade I hypertensive participants; however, the algorithm was not suitable in the case of hypertensive participants. Wang et al. 31 pursued the same goal with a two-step approach: the multitaper method was first used for feature extraction, and an artificial neural network was then used for BP estimation. The authors claim better accuracy than previous works and report an MAE of 4.02 ± 2.79 mmHg for SBP and 2.27 ± 1.82 mmHg for diastolic blood pressure (DBP).
A hybrid deep neural network, incorporating the temporal convolutional neural network (CNN) and long short-term memory (LSTM) layers, was used by Baker et al. 32 for estimation of SBP, DBP, and mean arterial pressure (MAP) from raw PPG and ECG signals. Time-dependent variables such as pulse transit time were avoided in this method. For SBP, DBP, and MAP, this model achieved “A” and “Pass” grades from the British Hypertension Society protocol and the Association for the Advancement of Medical Instrumentation (AAMI) standard, respectively. Low MAEs of 4.41 mmHg for SBP, 2.91 mmHg for DBP, and 2.77 mmHg for MAP were reported. The results showed that the proposed CNN-LSTM model is comparable with the sphygmomanometer.
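To make the two grading schemes mentioned above concrete, the following sketch applies the commonly cited criteria to paired estimated and reference pressures: the British Hypertension Society grades are based on the cumulative percentage of absolute errors within 5, 10, and 15 mmHg (A: 60/85/95%, B: 50/75/90%, C: 40/65/85%), and the AAMI criterion requires a mean error within ±5 mmHg with a standard deviation of at most 8 mmHg. This is a simplified illustration, not the evaluation code of the cited study; the function names are hypothetical, the use of the sample standard deviation is an assumption, and protocol requirements such as the minimum number of subjects are not checked.

```python
import statistics

# Minimum cumulative percentages of absolute errors within 5, 10, and 15 mmHg.
BHS_GRADES = [("A", (60, 85, 95)), ("B", (50, 75, 90)), ("C", (40, 65, 85))]


def bhs_grade(estimates, references):
    """Return the BHS grade (A to D) for paired BP values in mmHg."""
    abs_err = [abs(e - r) for e, r in zip(estimates, references)]
    n = len(abs_err)
    pct = [100 * sum(a <= limit for a in abs_err) / n for limit in (5, 10, 15)]
    for grade, thresholds in BHS_GRADES:
        if all(p >= t for p, t in zip(pct, thresholds)):
            return grade
    return "D"


def aami_pass(estimates, references):
    """True if mean error is within +/-5 mmHg and its SD is at most 8 mmHg."""
    errors = [e - r for e, r in zip(estimates, references)]
    return abs(statistics.mean(errors)) <= 5 and statistics.stdev(errors) <= 8


# Hypothetical SBP estimates against a constant 120 mmHg reference.
ref = [120] * 10
est = [122, 118, 123, 117, 121, 119, 124, 116, 127, 132]
print(bhs_grade(est, ref), aami_pass(est, ref))
```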
To further develop a single PPG-based cuffless BP estimation algorithm with acceptable accuracy, Khalid et al. 33 compared three ML algorithms: regression tree, support vector machine (SVM), and multiple linear regression (MLR). A major part of this study was the analysis of the estimation accuracy of the three algorithms among the normotensive, hypertensive, and hypotensive participants. Among the applied algorithms, only the regression tree was reported to be successful in the normotensive category because it is less sensitive to outliers. In terms of the estimation accuracy in each of the BP categories, it achieved the ISO standard for SBP (−1.1 ± 5.7 mmHg) and DBP (−0.03 ± 5.6 mmHg).
Martínez et al. 28 examined the similarity and coherency of arterial BP and PPG waveforms. The full frequency directed transfer function, the direct directed transfer function, and significant coherence and partial coherence (p < 0.01) indicated that this method can distinguish between normotensive and hypertensive subjects using PPG signals. For all groups (Table 1), r > 0.9 was obtained, indicating a strong morphology similarity. The authors combined all synchrony measures and reported 87.5% accuracy in detecting hypertension using a neural network classifier, suggesting that PPG can be used to measure BP.
Lazazzera et al. 34 evaluated the performance of a commercial smartwatch, the CareUp® smartwatch, against a sphygmomanometer. Two PPG waveforms were recorded with the watch: one from the back sensor in contact with the wrist skin and another from the index finger of the opposite hand placed on the front oximeter sensor. The linear regression algorithm embedded in the watch is based on pulse transit time, the time delay for the blood volume to travel from the heart to peripheral organs, which can be estimated from the two PPG sensors. According to the p-values, the measurements obtained from the reference and CareUp shared the same median. The authors concluded that CareUp is easy to use, portable, and, according to the accuracy metrics (Table 2), a suitable candidate for at-home continuous BP monitoring.
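The general idea of estimating BP from pulse transit time with a linear regression can be sketched as an ordinary least-squares fit. The example below is purely illustrative and is not the algorithm embedded in the CareUp watch, whose details are not given here; the function names and the calibration values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict_bp(ptt_s, slope, intercept):
    """Estimate BP (mmHg) from a pulse transit time in seconds."""
    return slope * ptt_s + intercept


# Hypothetical calibration pairs: pulse transit time (s) and cuff SBP (mmHg).
ptt = [0.20, 0.25, 0.30]
sbp = [130, 120, 110]
slope, intercept = fit_line(ptt, sbp)
print(round(predict_bp(0.22, slope, intercept), 1))
```

In this toy data set, a shorter transit time corresponds to a higher pressure, which is why the fitted slope is negative.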
OSA monitoring using PPG
OSA is a condition characterized by periods of breathing cessation (apnea) and periods of reduced breathing effort (hypopnea) during sleep. This condition leads to a deficiency in arterial oxygen and in the long term can result in sleep-related issues and CVDs. Typically, the condition is diagnosed by evaluating a patient's apnea–hypopnea index (AHI) or the average number of oxygen desaturation events (oxygen desaturation index [ODI]) per hour of sleep, with a diagnosis of moderate-to-severe sleep apnea occurring when AHI/ODI >15. An overnight sleep study, or polysomnogram, is the gold standard for OSA diagnosis; however, this method is expensive, involves an overnight hospital stay, and must be done by a registered polysomnography (PSG) technologist (RPSGT). Digital wearable devices, equipped with high-quality sensors, can be utilized to help with the continuous monitoring and detection of OSA. 35,36 However, these devices still require clinical validation before they can be used widely in real-world applications.
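The index-based definition above amounts to a simple rate and threshold, sketched below. The function names and the example counts are hypothetical, and the strict ">15" comparison follows the wording of the preceding paragraph.

```python
def events_per_hour(event_count, sleep_hours):
    """AHI or ODI: number of scored events divided by hours of sleep."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    return event_count / sleep_hours


def is_moderate_to_severe(index, threshold=15.0):
    """Apply the AHI/ODI > 15 cutoff for moderate-to-severe sleep apnea."""
    return index > threshold


# Hypothetical night: 120 scored events over 7.5 hours of sleep.
ahi = events_per_hour(120, 7.5)
print(ahi, is_moderate_to_severe(ahi))
```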
Four articles in this review investigated the viability of using PPG to detect sleep apnea. Steinberg et al. 37 recruited patients with suspected OSA to study the correlation between PSG and a PPG wearable device (Table 2). HR, HRV, SpO2, and RR were compared between PSG and the wearable device. Analysis of the data showed root mean square errors of 1.5 ± 0.7 bpm, 23 ± 10 ms, 2.9% ± 1.4%, and 3.4 ± 0.7 breaths per minute, respectively. SpO2 and HRV variations were consistent with apnea events. Minimum nocturnal SpO2 estimations from the wrist also correlated with the finger clip pulse oximetry measurements (coefficient of determination R² = 0.9, p < 0.05).
Stevens et al. 38 recruited patients to undergo overnight PSG monitoring, observed by an RPSGT, while wearing a Garmin 5X+ watch. The accuracy of the data was evaluated with linear regression analysis on ODI between PSG and algorithm output. The AHI of the cohort was 8.0 (2.6–25.7). When SpO2 data from the PSG were used as input, the R² in the ODI was 0.99 between the RPSGT and the automatic detector scoring. However, when correlated with the clinical AHI, the R² values of the RPSGT-scored ODI and automatic detector ODI were 0.95 and 0.84, respectively. The risk assessment of the watch showed an accuracy of 89%, with a sensitivity and specificity of 96% and 98%, respectively, against the PSG AHI of 15 or above. Thus, this wearable device was shown to differentiate patients with OSA with high specificity and accuracy.
In a similar study, Chiu et al. 39 designed an automated OSA detection algorithm based on an ML technique that was embedded in a watch. PPG signals were collected from subjects with an AHI of 15 or above. The AHI of the cohort was 10.1 ± 18.3 (0–82.7). The average sensitivity and precision were reported as 77.2% and 58.6%, respectively, with a Cohen's kappa of 0.46. The R² value between the watch and RPSGT was 0.81 (95% confidence interval: 0.61–0.91). The results showed that this wrist-worn device could be used to detect OSA with reasonable accuracy.
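Sensitivity, precision, and Cohen's kappa, as reported for the detection algorithm above, can all be derived from a single confusion matrix. The sketch below illustrates the calculation with hypothetical counts; it is not the evaluation code of the cited study, and the function name is an assumption.

```python
def detection_agreement(tp, fp, tn, fn):
    """Sensitivity, precision, and Cohen's kappa from confusion-matrix counts."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    observed = (tp + tn) / n
    # Chance agreement: product of the marginal totals for each class.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, precision, kappa


# Hypothetical counts of detected versus technologist-scored events.
sens, prec, kappa = detection_agreement(tp=20, fp=5, tn=15, fn=10)
print(round(sens, 3), round(prec, 3), round(kappa, 3))
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why it can be modest even when sensitivity appears acceptable.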
The study by Yeh et al. 40 utilized the Belun Ring Platform (BRP) as a PPG-based at-home OSA testing system with a proprietary deep learning algorithm to predict OSA. The results showed a good correlation between the ring-respiratory-event index and PSG-AHI (r = 0.83, p < 0.001). For an AHI above 15, the accuracy, sensitivity, and specificity were 0.808, 0.931, and 0.735, respectively. In subjects taking HR-affecting medications, the results indicated that the sensitivity and specificity of the BRP in predicting OSA were not appreciably affected (p = 0.16 and 0.44, respectively).
BG monitoring using PPG
Diabetes mellitus is a condition that affects millions of individuals worldwide and has become a serious health concern during the last few decades. 41 This incurable chronic disease is characterized by an imbalance in the glucose level of the body, and as such, the best form of control of the disease is through constant glycemic monitoring. Currently available glycemic monitoring devices are generally invasive.
PPG can monitor BG by estimating glucose concentration in body fluids, such as saliva, urine, sweat, or tears 42 ; it can also monitor BG through its relationship to blood viscosity. 43 Alternatively, hypoglycemic episodes can be detected through PPG sensors using HRV. 44,45 Three of the articles included in this review investigate developing a noninvasive glucose monitoring device using PPG.
Zilberstein et al. 46 and Rodin et al. 47 followed the first approach and developed chemochromic sensors that use the level of metabolites in sweat for glucose monitoring. The sweat components of the body change with the BG concentration, and these sensors measure the resulting changes in the optical characteristics of light caused by the user's sweat components. Two different algorithms were used in these studies (neural network algorithms and a proprietary algorithm) to transform the identified changes into BG concentration (Table 2). The two studies used different gold standards to evaluate the accuracy of their proposed PPG sensors (the Accu-Chek Aviva glucose monitor and the YSI 2300 STAT Plus Glucose and L-Lactate Laboratory Bioanalyzer [YSI 2300]); nevertheless, both reported good overall reliability and achieved a correlation coefficient of >0.89 (Table 2).
Lee and Lee 35 followed the second approach, using a wrist-worn device. To increase the chance of acquiring a clean signal and a versatile PPG wavelength, they designed a device with a uniform linear sensor array. The device is equipped with a centralized state sensing algorithm that provides a better estimation of different health parameters and works for any nonlinear estimation task that uses multiple sensors. They evaluated the performance of their ML algorithm for noninvasive BG estimation using data collected from another PPG device and achieved an overall accuracy of 84.29% based on ISO 15197:2013.
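For context, the main accuracy criterion of ISO 15197:2013 requires at least 95% of readings to fall within ±15 mg/dL of the reference at glucose concentrations below 100 mg/dL and within ±15% at or above 100 mg/dL. The sketch below computes the fraction of readings meeting those limits. It is an illustration only; how the cited study derived its 84.29% figure is not described here, and the function names and sample values are hypothetical.

```python
def within_iso_15197_2013(estimate, reference):
    """Check one reading (mg/dL) against the ISO 15197:2013 accuracy limits."""
    error = abs(estimate - reference)
    if reference < 100:
        return error <= 15            # absolute limit below 100 mg/dL
    return error <= 0.15 * reference  # relative limit at or above 100 mg/dL


def fraction_within_limits(estimates, references):
    """Fraction of paired readings that satisfy the limits above."""
    hits = sum(within_iso_15197_2013(e, r) for e, r in zip(estimates, references))
    return hits / len(references)


# Hypothetical device estimates against laboratory reference values (mg/dL).
est = [90, 60, 225, 235]
ref = [80, 80, 200, 200]
print(fraction_within_limits(est, ref))
```

Under this criterion, a device would need the returned fraction to reach 0.95 to meet the standard.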
HF monitoring/detection using PPG
HF is a condition in which the heart can no longer function to meet the needs of the body. It is often difficult to diagnose through physical examination. Shah et al. 48 proposed a method for diagnosing HF using wearable devices. In their study, data were taken from a total of 97 patients, 54 with previously diagnosed HF and the remainder having been admitted to hospital for various heart-related problems such as pulmonary disease. Data were collected using a smartwatch equipped with both PPG and accelerometry in 5-min intervals during which the patients were at rest, followed by 1-min intervals during which the patients performed the Valsalva maneuver. Each patient was evaluated for HF by a clinician, and this evaluation was considered the ground truth for this study.
To analyze the PPG signals as well as other factors, the authors used an SVM. The first analysis used only the PPG data and achieved a sensitivity and specificity of 90% and 54%, respectively. The second analysis used multiple features, including PPG signal, sample entropy, standard deviation of beat-to-beat intervals and accelerometer amplitude, HR, SBP, creatinine, and history of hypertension, and achieved sensitivity and specificity of 90% and 72%, respectively. The results of this study indicate that addition of features, including those surrounding medical history, can increase the accuracy of diagnosis using wearable devices.
RR monitoring using PPG
Continuous monitoring of RR enables physicians to detect abnormalities and apply proper treatments. Currently, the smart fusion (SF) algorithm is the standard method to estimate RR from PPG; however, this algorithm can fail if individual respiratory-induced variations are not prominent. Zhang et al. 49 developed a new algorithm that integrates covariance intersection fusion (CIF) to extract RR from the PPG signal independent of the quality of the estimates. The algorithm offers clinically required accuracy and a high retention rate. The gold standard RR for this study came from manually annotated capnography data. The median root mean square error was 1.4 breaths/min for the CIF and 1.8 breaths/min for the SF. Using CIF led to a considerable increase in the retention rate distribution of all recordings from 0.46 to 0.90 (p < 0.001).
The agreement with the gold standard RR was high, with a Pearson's correlation coefficient of 0.94, a bias of 0.3 breaths/min, and limits of agreement of −4.6 and 5.2 breaths/min (Table 2). The proposed algorithm likely performed better than SF because it did not treat the estimates independently; rather, the potential correlation and redundancy among estimates were considered. In addition, the CIF approach offers real-time monitoring. Given its accuracy and retention rate, the results suggest that CIF could be of high interest for the estimation of RR from PPG using a wearable device.
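The bias and limits of agreement quoted above are the quantities of a Bland-Altman analysis, which can be sketched as follows. The example assumes the common convention of limits at the bias ± 1.96 standard deviations of the paired differences; under that assumption, the reported limits of −4.6 and 5.2 breaths/min around a bias of 0.3 would correspond to a standard deviation of about 2.5 breaths/min. The function name and sample values are hypothetical.

```python
import statistics


def bland_altman(estimates, references):
    """Return (bias, lower limit, upper limit) of agreement.
    Assumption: limits are bias +/- 1.96 sample standard deviations."""
    diffs = [e - r for e, r in zip(estimates, references)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical PPG-derived RR against capnography RR (breaths/min).
ppg_rr = [16, 18, 20, 22]
cap_rr = [15, 19, 19, 23]
bias, lower, upper = bland_altman(ppg_rr, cap_rr)
print(bias, round(lower, 2), round(upper, 2))
```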
Discussion
During the COVID-19 pandemic, the need for accurate home diagnostic tools to measure vital signs such as temperature, BP, and RR has never been greater, especially in cases of home quarantine. 50 CVDs are the leading cause of death globally, and the number of patients continues to increase. This article provides a thorough overview of the current research on the use of PPG in wearable devices; it also reveals many challenges in comparing and analyzing the accuracies of various parameters used by the devices (Table 3).
One of the limitations common across all health parameters is the inconsistency of accuracy metrics reported. The use of different accuracy metrics makes it difficult to quantitatively compare the accuracy of devices and extract useful information such as whether devices are more accurate in settings other than in the real world, whether a given algorithm is more accurate than others, whether PPG provides more accurate health parameters in one field than another, whether the PPG wearables with peripheral sensors performed significantly better than counterparts without peripheral sensors, and whether the location of where a PPG device is worn affects the accuracy.
A second common limitation is the lack of a gold standard to provide ground-truth data against which to evaluate accuracy. For example, Lee et al. 35 took the ground-truth BG levels to be those provided by the industry partner to evaluate the accuracy of their device; however, they did not describe how the levels were obtained. Zilberstein et al. 46 used an Accu-Chek Aviva glucose monitor as their gold standard, whereas Rodin et al. 47 used a YSI 2300 STAT Plus Glucose and L-Lactate Laboratory Bioanalyzer. Similar situations are also found for BP where arterial lines and sphygmomanometers are used as references. For example, Baker et al. 32 used intra-arterial BP measurement as their gold standard, whereas Lazazzera et al. 34 used a sphygmomanometer. There can be discrepancies between the BP measurements from arterial lines and sphygmomanometers.
A comparison of noninvasive and intra-arterial BP measurements 51 shows that noninvasive methods underestimated intra-arterial BP measurements. Arterial lines may measure pressure more peripherally (or in different locations, femoral vs. radial), and one must contend with the effects of hydrostatic pressure differences from the heart to the wrist or pulse pressure amplification as the arterial tree gets smaller. Similar to the reporting of different accuracy metrics, use of differing gold standards makes it impossible to correctly and quantitatively compare different devices measuring the same parameter because one cannot account for differences in the gold standards. The analyses considered for HR, AF, and sleep apnea did not suffer from this limitation because all the articles listed for both the HR and AF parameters used ECG as their gold standard, and all the articles for sleep apnea used PSG as their gold standard.
However, even though the HR studies collectively used ECG as their gold standard, there are certain physiological differences between PPG and ECG that some studies failed to consider. For example, Hwang et al. 12 reported that although their device was accurate for normal HR, its accuracy declined at higher HR. This deviation is a known phenomenon when comparing PPG with ECG and may be due to differing physiologies between the signals. 52 Occasionally, spikes in the ECG signals (i.e., ECG QRS complexes) occur extremely close together during certain cardiac arrhythmias, including significantly increased HR, such that the ventricle does not entirely fill between the signals. This results in only one pulse arriving at the peripheral PPG signal; that is, the second QRS signal is “missed” by PPG.
However, this does not necessarily mean that PPG is less accurate than ECG; rather, PPG may provide slightly different information about the efficiency of a patient's HR with respect to heartbeats reaching the periphery. In short, when discussing the accuracy of PPG compared with ground-truth devices, it is essential to consider any physiological differences between the signals, and additional information from those differences could be extracted for use in remote health monitoring.
When considering variance in the accuracy for above- or below-normal measurements, it is also important to consider the preprocessing and filtering of the device. Many PPG wearables utilize a band-pass filter to eliminate noisy signals. Therefore, signals approaching the cutoff for those filters may be distorted, and the bounds of the band-pass cutoff may need to be adjusted.
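The effect of the filter cutoff can be illustrated with the textbook magnitude response of a Butterworth filter. The sketch below models a band-pass filter as an idealized cascade of a high-pass and a low-pass stage and evaluates the gain at the fundamental frequency of several heart rates. The cutoff values (0.5 and 3.0 Hz) and the filter order are assumptions chosen for illustration and are not taken from any device in this review; the sketch also covers amplitude attenuation only, not phase distortion.

```python
import math


def butterworth_bandpass_gain(freq_hz, low_cut_hz, high_cut_hz, order=2):
    """Gain of an idealized Butterworth high-pass/low-pass cascade.

    Uses the analog magnitude formulas; a real device filter may differ.
    """
    if freq_hz <= 0:
        return 0.0
    high_pass = 1.0 / math.sqrt(1.0 + (low_cut_hz / freq_hz) ** (2 * order))
    low_pass = 1.0 / math.sqrt(1.0 + (freq_hz / high_cut_hz) ** (2 * order))
    return high_pass * low_pass


def gain_at_heart_rate(bpm, low_cut_hz=0.5, high_cut_hz=3.0, order=2):
    """Gain at the fundamental frequency of a given HR (beats/min)."""
    return butterworth_bandpass_gain(bpm / 60.0, low_cut_hz, high_cut_hz, order)


# Hypothetical cutoffs: the pulse at 180 beats/min sits exactly on the
# upper cutoff (3.0 Hz) and is attenuated far more than a resting pulse.
for bpm in (40, 72, 120, 180, 220):
    print(f"{bpm:3d} bpm -> gain {gain_at_heart_rate(bpm):.2f}")
```

Under these assumed cutoffs, a pulse at 180 beats/min loses roughly 30% of its amplitude, whereas raising the upper cutoff to 5.0 Hz restores most of it. This is one reason the cutoff bounds may need to be adjusted for populations with abnormal HR.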
Other limitations within the studies make it difficult to evaluate how accurately PPG may perform in a real-world setting. One such limitation is that most studies contained a relatively small sample size. To meet the inclusion/exclusion criteria for this review, studies had to contain at least 30 participants; that number alone excluded hundreds of studies, including studies about additional important PPG health parameters, such as SpO2. Furthermore, a majority of the final included studies still have fewer than 100 participants (Table 1). When compared with the number of patients who could potentially be wearing a commercially deployed medical PPG device, this number is relatively small. As such, these studies fail to provide a comprehensive picture of how a device may perform when exposed to a diverse population that varies in age, skin color, or competency with electronic devices.
Many of these participants were also in a clinical setting, which greatly reduces real-world factors such as misuse of the device or inconsistent wearing of the device. These two factors expose a major gap in the research because it appears that there is no thorough study of the accuracy of using PPG devices for medical monitoring on a large scale in the real world.
Another such limitation is determining the usability of the devices for patients with health complications, that is, the patients who would most benefit from the device. One of the health parameters in which this is most prevalent is BP. Two articles 28,30 showed that although the device was accurate for healthy participants and those with mild hypertension, it was significantly less accurate for those with severe hypertension. The remaining BP studies were heavily biased toward measuring normal BP with low variability. Baker et al. 32 removed BP measurements below a 20 mmHg threshold and above a 60 mmHg threshold for arterial pulse, biasing the signals to represent normal BP. Khalid et al. 33 removed a majority of hypertensive and hypotensive signals because they were noisy. Finally, Lazazzera et al. 34 did not include hypertensive patients.
Furthermore, measurements from prior studies were taken when patients were at rest, meaning there was little variation in the BP measurements. This already guarantees accuracy of the BP wearable device only for a niche scenario in which a patient has normal BP that does not vary, and the limitation worsens when considering that some studies also did not report whether the training data consisted of multiple training segments taken from the same patient. If this is the case, the ML algorithms are likely overfitted: accurate for the handful of patients represented in the data, but generally inaccurate for any other patient. Aside from BP, a handful of other studies did not include a sufficient number of patients with health complications 32,34,39,40,47 and thus did not report or could not draw meaningful conclusions about the usability of a PPG wearable device for patients with related health complications.
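One common safeguard against the overfitting described above is to split the data by patient rather than by segment, so that no patient contributes segments to both the training and the test set. The following is a minimal sketch of such a split; the function name, the record layout, and the synthetic data are hypothetical and do not describe the procedure of any reviewed study.

```python
import random


def split_by_patient(segments, test_fraction=0.2, seed=0):
    """Split segments so that each patient appears in exactly one set.

    `segments` is a list of (patient_id, features, label) tuples.
    """
    patient_ids = sorted({seg[0] for seg in segments})
    rng = random.Random(seed)            # fixed seed for a reproducible split
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [seg for seg in segments if seg[0] not in test_ids]
    test = [seg for seg in segments if seg[0] in test_ids]
    return train, test


# Synthetic example: 10 patients with 5 PPG segments each (placeholder values).
segments = [
    (f"patient_{p:02d}", [float(p), float(s)], 120.0 + p)
    for p in range(10)
    for s in range(5)
]
train_set, test_set = split_by_patient(segments)
print(len(train_set), "training segments,", len(test_set), "test segments")
```

With a segment-level random split, segments from the same patient would typically land on both sides, and the reported test accuracy could reflect recognition of known patients rather than generalization to new ones.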
The reporting of additional medical information on the participants in the study is yet another limitation that makes determining accuracy of PPG in the real world challenging. Many studies neglected to report the demographic information of the participants such as age, height, body mass index (BMI), race, and sex. 12,13,15–17,22,24,25,31–34,37–40 It is difficult to discern whether any of these factors will affect the accuracy of the device when used in a real-world setting. There are areas, particularly in those studies that utilized ML algorithms, in which including such attributes in the model could help improve the accuracy. An example of one such study that makes use of these additional attributes is a study by Shah et al. 48 However, this study is subject to a potential bias.
It can be seen from Table 2 that the introduction of these attributes greatly increases the accuracy and the specificity of the model at a high-sensitivity cut point (90%). Due to the small sample size, these demographics may be adding information that uniquely identifies the participants, potentially allowing large networks to simply “memorize” and predict the same output every time they see the same combination of demographics. This can be especially problematic if the training set has little variation in the data collected through PPG devices for each participant. One potential approach to evaluate this is to investigate whether the model can match the correct BP when it changes significantly for a given participant, that is, when the demographics stay constant but the PPG signals change.
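One possible way to operationalize this check is to remove each participant's mean from both the reference and the predicted values and then correlate the remaining within-participant deviations. A model that merely memorizes a participant from their demographics predicts a constant value per participant and therefore scores zero. This is a sketch of one such diagnostic, not a method used in the reviewed studies; the function name and the BP values are hypothetical.

```python
import math
from collections import defaultdict


def within_participant_correlation(records):
    """Correlate predicted and reference deviations from each participant's mean.

    `records` is an iterable of (participant_id, reference_bp, predicted_bp).
    Returns 0.0 when the predictions never vary within a participant.
    """
    by_participant = defaultdict(list)
    for pid, ref, pred in records:
        by_participant[pid].append((ref, pred))

    ref_dev, pred_dev = [], []
    for pairs in by_participant.values():
        mean_ref = sum(r for r, _ in pairs) / len(pairs)
        mean_pred = sum(p for _, p in pairs) / len(pairs)
        ref_dev.extend(r - mean_ref for r, _ in pairs)
        pred_dev.extend(p - mean_pred for _, p in pairs)

    cov = sum(a * b for a, b in zip(ref_dev, pred_dev))
    denom = math.sqrt(sum(a * a for a in ref_dev) * sum(b * b for b in pred_dev))
    return cov / denom if denom > 0 else 0.0


# Hypothetical SBP readings (mmHg): each participant's BP changes by 20 mmHg.
memorizing_model = [("p1", 110, 120), ("p1", 130, 120),
                    ("p2", 140, 150), ("p2", 160, 150)]
tracking_model = [("p1", 110, 112), ("p1", 130, 128),
                  ("p2", 140, 143), ("p2", 160, 158)]
print(within_participant_correlation(memorizing_model))
print(within_participant_correlation(tracking_model))
```

Both hypothetical models have a similar average error per participant, yet only the second follows the change in BP, which a pooled accuracy metric alone would not reveal.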
Finally, many studies do not consider motion artifacts. 12,13,15–17,22,24,26,33–35,37,40,46,47 Motion artifacts are distortions in the PPG signal caused by movement; such motion causes the sensor to shift positions and thus creates noise in the signal. Because patients in the real world will be moving and not at rest, failure to account for these artifacts results in a decrease in accuracy in real-world settings; in fact, an AF study already reported PPG signals of poor quality due to motion artifacts. 26
Some studies, such as the previously mentioned studies by Baker et al., 32 Khalid et al., 33 and Lazazzera et al., 34 collected data only at rest and used those readings to determine the accuracy, hence neglecting motion artifacts. The studies by Khalid et al. 33 and Graham et al. 17 go as far as to separate artifact-free PPG signals by hand. This, of course, is not feasible for a commercial wearable because signals cannot be separated manually in real time.
Aside from difficulties in comparing the accuracy of devices and assessing their accuracy in the real world, there are also issues with the general usability of the devices in real-world applications. First, devices will need a long battery life to last for continuous monitoring in a real-world setting. 22 Second, devices must be able to securely communicate across the internet to provide health parameter data to doctors as well as patients in real time. 22 Although many of the algorithms in the reviewed articles are promising in terms of accuracy, none goes so far as to offer a complete design to connect patients to their health care providers.
Finally, for devices to be used in large-scale remote health care monitoring applications, FDA approval is a great asset. Most studies did not mention FDA approval, and it is uncertain how many devices in this review are FDA approved or undergoing the approval process. Although determining the FDA approval status is outside the scope of this review, more information about FDA-approved PPG wearable devices can be found in the FDA 510(k) database.
Conclusion
This scoping review article surveyed current research on the application of PPG wearables in health care. Several databases were searched, and the resulting articles were sorted into categories based on the identified health parameters.
Wearable devices utilizing PPG demonstrate acceptable performance in both detecting and monitoring various health parameters. In addition to the newly developed sensors designed for telemonitoring of health parameters, it was shown that PPG signals collected through commercially available devices can also be utilized for monitoring health parameters. However, further research is required to fill in some of the gaps exposed by this review, such as identifying a gold standard method for each health parameter, conducting studies with a much larger population, improving the usability of the devices in real time and in real-world settings, and providing a more customized service to the users of these devices based on their biological factors such as age, height, BMI, race, and sex.
Already, the research into PPG in wearable devices for health care has begun to show the promise of continuous monitoring and early detection of disorders. For instance, further exploring the usage of PPG for continuous monitoring of BG could be a ground-breaking achievement, potentially benefiting millions of patients with type 1 and type 2 diabetes mellitus. By moving forward into the next stages of research for PPG in remote health monitoring, we come closer to the possibility of improving the quality of health care across the globe.
Footnotes
Authors' Contributions
S.K.: Drafted the Methods, AF Detection Using PPG, and BG Monitoring Using PPG sections and led the first draft of the article. J.L.: Drafted the Introduction, HF Monitoring/Detection Using PPG, Discussion, and Conclusion sections and tabulated the initial results. M.N.: Drafted the BP Monitoring Using PPG, OSA Monitoring Using PPG, and RR Monitoring Using PPG sections. C.G.: Led the database search, contributed to screening the abstracts, and assisted with drafting the abstract. M.H.C.: Drafted the
sections. S.K., J.L., M.N., and M.H.: Contributed to screening the abstracts and full-text articles and assisted with completing the Tables. S.S.-A.: Provided the concept of the article's content and provided direction and feedback. R.J.S.: Oversaw the review and provided direction, feedback, and finalized the article. All authors reviewed and approved the final version for publication.
Disclosure Statement
The authors declare that there are no competing interests.
Funding Information
R.J.S. gratefully acknowledges funding from the Natural Sciences and Engineering Research Council of Canada through its Discovery Grant Program (RGPN-2020-04467), Mitacs (IT27060, IT27564), and Refresh Enterprises, Inc. The funders had no other direct role in the review.
