Abstract
Introduction:
Patients with hoarseness of voice, a previous neck operation, or suspected malignancy are at high risk of having vocal cord palsy (VCP) before thyroidectomy. Therefore, vocal cord (VC) function should be evaluated before surgery. This study aimed to evaluate the accuracy of hoarseness, a voice-related questionnaire (Voice Handicap Index [VHI]-30), and transcutaneous laryngeal ultrasound (TLUSG) in diagnosing VCP, as well as the role of TLUSG in the evaluation of high-risk patients.
Methods:
A total of 1000 patients undergoing thyroidectomy or other endocrine-related neck procedures were prospectively included. Symptoms of hoarseness, the VHI-30 score, and TLUSG were evaluated. Validation laryngoscopies were performed by a separate endoscopist after TLUSG. All assessments were performed one to seven days before surgery. The findings of hoarseness, the VHI-30 score, and TLUSG were correlated with laryngoscopic findings to evaluate diagnostic accuracy.
Results:
Of 1000 patients, nine had preoperative VCP diagnosed by laryngoscopy. The sensitivities of hoarseness, the VHI-30 score, and TLUSG in detecting VCP were 33.3%, 62.5%, and 88.9%, respectively. A total of 342 patients were considered high risk, and eight preoperative VCP were confirmed by laryngoscopy in this group. Although the VCs could not be visualized in 26 (7.6%) high-risk patients, TLUSG had a higher accuracy in detecting VCP than the VHI-30 did (96.8% vs. 74.2%; p < 0.001). If only patients who were unassessable or who had VCP on assessment had been selected for confirmatory laryngoscopy, TLUSG would have saved more patients from laryngoscopic examination than the VHI-30 would have (87.7% vs. 71.3%; p < 0.001). A history of neck operation and suspicion of malignancy did not affect the assessment by TLUSG (p > 0.05).
Conclusion:
TLUSG is a feasible, non-invasive, and sensitive tool for detecting VCP in high-risk patients. Used as a screening tool, it would have safely spared 87.7% of high-risk patients from laryngoscopy. TLUSG should be incorporated as a part of the ultrasound examination of the thyroid.
Introduction
Flexible trans-nasal laryngoscopy remains the primary modality to examine VC function (7). However, it causes discomfort and apprehension, and thus leads to poor patient compliance (8,9). A non-invasive screening tool with minimal discomfort is a reasonable option before inserting an invasive laryngoscope. A questionnaire has been proposed to screen for VCP or laryngeal pathology before thyroidectomy (10). Transcutaneous laryngeal ultrasound (TLUSG) is another non-invasive and easy-to-learn modality to evaluate the movement of VCs. According to previous studies, 74–100% of patients could be assessed by TLUSG (11–14). The sensitivity in detecting VCP was excellent and ranged from 93% to 100% (11–14). However, the majority of these studies either focused on postoperative evaluation (12–14) or lacked laryngoscopic validation (11,13). There is limited evidence on the validity of preoperative TLUSG. Therefore, this study evaluated the diagnostic accuracy and feasibility of two non-invasive screening tools, the Voice Handicap Index (VHI)-30 questionnaire and TLUSG, in the preoperative VC examination, using laryngoscopy for validation. Furthermore, it is unknown whether TLUSG is applicable to patients who are at high risk of having preoperative VCP. Therefore, the role of TLUSG in high-risk patients who needed to be evaluated with a laryngeal examination was also examined.
Patients and Methods
Patients
From June 2012 to September 2015, 1035 consecutive patients undergoing elective thyroidectomy or endocrine-related neck operations were prospectively included. After giving informed consent, all patients underwent standardized voice and VC assessment one to seven days before surgery.
Voice complaints and the VHI-30
All patients were first interviewed by a nurse and asked a standard question: “Did you have any hoarseness of voice?” After that, a validated Chinese version of the VHI-30 questionnaire was distributed to them for completion (15). The VHI-30 is a validated 30-item questionnaire that assesses the impact of voice impairment on emotional, physical, and functional areas (16). It has been widely used to determine voice status before and after thyroidectomy (17–19).
TLUSG and laryngoscopic examination
After the interview and completion of the questionnaire, each patient was directed to a room where TLUSG was performed. All TLUSGs were performed by one surgeon (K.P.W.) using the same portable ultrasound machine (iLook™ 25 Ultrasound System, Sonosite®; SonoSite, Inc., Bothell, WA) and a 5–10 MHz linear transducer (L25). During the assessment, the patient lay flat with the neck slightly extended and the arms at the sides. The ultrasound transducer was placed transversely over the anterior aspect of the middle portion of the thyroid cartilage and moved caudo-cranially until both VCs were visualized. Three sonographic landmarks (false cords, true cords, and arytenoids) were identified whenever possible (Fig. 1) (20). To optimize the images, the depth of the ultrasound image was adjusted to 4 cm for female patients and 5 cm for male patients. The gray scale and frequency of the ultrasound were also adjusted until the false cords appeared hyperechoic and the true cords appeared hypoechoic.

Fig. 1. A sonographic view of the normal symmetrical vocal cords. FC, false cord; TC, true cord; AR, arytenoid.
From June 2012 to January 2014 (phase 1), passive (i.e., quiet spontaneous breathing) and active (i.e., phonation with a sustained vowel “aa”) maneuvers were performed during TLUSG. From February 2014 to September 2015 (phase 2), a Valsalva maneuver was performed in addition to the passive and active maneuvers (21).
Immediately after TLUSG, each patient was directed to the endoscopic suite for laryngoscopic examination (LE). Flexible trans-nasal laryngoscopy (Olympus BF-P40, Bronchoscope; Olympus®, Tokyo, Japan) was performed by a separate experienced endoscopist. Both the TLUSG assessor and the endoscopist were unaware of the quality of voice, the VHI-30 scores, and each other's findings. The details of this set-up have been described previously (12).
Definitions
Patients were considered high risk if they had voice complaints, a history of head and neck or mediastinal operation carrying a risk of RLN palsy, or if they were operated on for suspected malignancy. Patients were classified as low risk if they did not have any high-risk factors.
The VHI-30 was defined as assessable if the patient completed all 30 questions. A VHI-30 score >18 was considered to indicate a significant voice abnormality (16).
The TLUSG was defined as assessable if at least one of the VC sonographic landmarks was clearly visualized and the motion of the VCs could be assessed. Assessability was defined as the proportion of patients whose VCs could be visualized and assessed. VCP on TLUSG was defined as reduced or no movement of any sonographic landmark during the passive, active, or Valsalva maneuver. Normal mobile VCs were reported when both VCs adducted and abducted symmetrically. Similarly, reduced or no VC movement on LE was defined as VCP on LE.
To determine diagnostic accuracy, the presence of voice complaints, findings of a VHI-30 score >18, and TLUSG findings were correlated with LE findings. Table 1 shows the definition of true-negative, true-positive, false-negative, and false-positive results.
VHI-30, Voice Handicap Index-30; TLUSG, transcutaneous laryngeal ultrasound; LE, laryngoscopic examinations; VCP, vocal cord palsy; TP, true positive; FP, false positive; FN, false negative; TN, true negative.
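The definitions above can be illustrated with a minimal Python sketch. It assumes the usual 2×2 formulas for sensitivity, specificity, predictive values, and accuracy; the class name `TwoByTwo` is hypothetical. The counts for hoarseness are derived from figures stated in the Results (1000 patients, nine VCP on LE, 64 patients with voice complaints, three of whom had VCP) and are not taken directly from the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test result versus laryngoscopic examination (LE) as reference."""
    tp: int  # test positive, VCP on LE
    fp: int  # test positive, normal on LE
    fn: int  # test negative, VCP on LE
    tn: int  # test negative, normal on LE

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


# Hoarseness as the index test (derived counts, see lead-in):
# 64 patients with hoarseness, 3 of them with VCP; 9 VCP among 1000 patients.
hoarseness = TwoByTwo(tp=3, fp=64 - 3, fn=9 - 3, tn=1000 - 64 - (9 - 3))

print(f"sensitivity = {100 * hoarseness.sensitivity:.1f}%")
print(f"PPV         = {100 * hoarseness.ppv:.1f}%")
```

With these counts the sensitivity comes out at 33.3%, matching the value reported for hoarseness.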
Analysis and statistics
All voice complaints, demographic and clinicopathologic factors, VHI-30 scores, and TLUSG findings were prospectively collected and entered into a standardized database. The diagnostic accuracy of hoarseness, the VHI-30 score, and TLUSG in diagnosing VCP was evaluated in overall, high-, and low-risk patients.
Statistical analysis was performed using IBM SPSS Statistics for Windows v23.0 (IBM Corp., Armonk, NY). The chi-square test was used to compare sensitivity, specificity, and accuracy. To compare the performance of the three assessment tools in differentiating VCP from normal, receiver operating characteristic (ROC) curves and the area under the curve (AUC) were evaluated. The chi-square test and Fisher's exact test were used for comparison of dichotomous variables, and the Mann–Whitney U-test was used for comparison of continuous variables. To identify factors affecting the assessability of TLUSG, binary logistic regression was performed. p-Values of <0.05 were considered statistically significant.
Results
Over 39 months, 35 patients, including eight high-risk subjects, refused LE and were excluded from the final analysis. The remaining 1000 patients (787 female), with a median age of 53 years, were included in the analysis (Table 2). Nine patients were found to have preoperative VCP by LE. Ninety-three patients reported a history of head and neck surgery, and 64 patients reported voice complaints. Two hundred and forty-seven patients were operated on for suspected or cytologically confirmed malignancy. In total, 342 patients with a median age of 54.5 years were considered high risk, and 658 patients were considered low risk. Eight preoperative VCP were diagnosed by LE in high-risk patients.
VC, vocal cord.
Table 3 shows the characteristics of the nine patients with preoperative VCP. Eight patients (I–VIII) were considered high risk, and one patient (IX) did not have any high-risk factors. Eight out of nine VCP were identified by TLUSG. TLUSG failed to diagnose VCP in patient II, whose LE showed reduced VC movement.
Of all patients (n = 1000), 38 (3.8%) failed to complete the VHI-30 questionnaire, and the VCs of 59 (5.9%) patients could not be visualized by TLUSG (Table 4). TLUSG had a higher sensitivity than hoarseness in detecting VCP (88.9% vs. 33.3%; p = 0.0498). Hypothetically, if TLUSG were used as a screening tool and only patients who were not assessable or had VCP on TLUSG were selected for confirmatory LE, TLUSG would save 91.6% of patients from LE. Similarly, if only patients with hoarseness were selected for confirmatory LE, 93.6% of patients would be saved from LE, which is comparable to TLUSG (93.6% vs. 91.6%; p = 0.876). However, applying hoarseness as a screening tool missed more VCP than TLUSG did (six patients vs. one patient). Even though a Valsalva maneuver was added in phase 2, the sensitivity, accuracy, assessability, and number of patients saved from LE in phase 2 were comparable to those in phase 1 (p > 0.05).
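The sensitivity comparison between TLUSG (eight of nine VCP detected) and hoarseness (three of nine) can be cross-checked with a short Python sketch. The Methods name both the chi-square test and Fisher's exact test without stating which was applied here, so the use of a two-sided Fisher's exact test is an assumption, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative slack guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: TLUSG, hoarseness. Columns: VCP detected, VCP missed.
p_value = fisher_exact_two_sided(8, 1, 3, 6)
print(f"p = {p_value:.4f}")
```

Under this assumption the sketch gives p = 0.0498, in agreement with the reported value.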
All were validated by direct laryngoscopic examination afterward.
p < 0.001 between VHI-30 and hoarseness and VHI-30 and TLUSG.
p < 0.001 between hoarseness and TLUSG and VHI-30 and TLUSG.
p < 0.001 between hoarseness and TLUSG.
p < 0.05 between hoarseness and TLUSG.
p < 0.001 between VHI-30 and TLUSG.
p < 0.05 between VHI-30 and TLUSG.
PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve.
For high-risk patients (n = 342), 13 (3.8%) and 26 (7.6%) patients were unassessable by the VHI-30 and TLUSG, respectively. TLUSG had a higher specificity (97.1% vs. 74.8%; p < 0.001), accuracy (96.8% vs. 74.2%; p < 0.001), and ability to differentiate VCP from normal (AUC = 0.923 vs. 0.624) than the VHI-30 had (Fig. 2). TLUSG also had a higher positive predictive value than the VHI-30 had (43.8% vs. 4.7%; p < 0.001). If the VHI-30 and TLUSG were applied as screening tools, TLUSG would save more patients from LE than the VHI-30 would (87.7% vs. 71.3%; p < 0.001). Similarly, for low-risk patients, TLUSG had a superior specificity, accuracy, and ability to differentiate VCP from normal, and saved more patients from LE than the VHI-30 did (p < 0.001; Table 4).
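As a consistency check on the high-risk figures, the following Python sketch searches for whole-number 2×2 counts that reproduce the reported TLUSG percentages. It is a back-calculation, not data from the tables, and it assumes that all eight high-risk patients with VCP were among the 316 assessable patients and that each percentage was rounded to one decimal place. The function names are hypothetical.

```python
def close(value_pct: float, reported_pct: float, tol: float = 0.05) -> bool:
    """True if value_pct is compatible with reported_pct rounded to 1 decimal."""
    return abs(value_pct - reported_pct) <= tol + 1e-9


def reconstruct_tlusg_high_risk(n_total=342, n_unassessable=26, n_vcp=8):
    """Return all (tp, fp, fn, tn) tuples consistent with the reported values."""
    n_assessed = n_total - n_unassessable
    n_normal = n_assessed - n_vcp  # assumption: every VCP patient was assessable
    matches = []
    for tp in range(n_vcp + 1):
        fn = n_vcp - tp
        for fp in range(n_normal + 1):
            tn = n_normal - fp
            if tp + fp == 0:
                continue  # PPV undefined without any positive result
            sensitivity = 100 * tp / n_vcp
            specificity = 100 * tn / n_normal
            accuracy = 100 * (tp + tn) / n_assessed
            ppv = 100 * tp / (tp + fp)
            # Screening rule: unassessable or TLUSG-positive patients go to LE.
            saved = 100 * (n_total - n_unassessable - tp - fp) / n_total
            if (close(sensitivity, 87.5) and close(specificity, 97.1)
                    and close(accuracy, 96.8) and close(ppv, 43.8)
                    and close(saved, 87.7)):
                matches.append((tp, fp, fn, tn))
    return matches


print(reconstruct_tlusg_high_risk())
```

Under these assumptions a single table fits: seven true positives, nine false positives, one false negative, and 299 true negatives, which corresponds to 42 of 342 high-risk patients being referred for LE.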

Fig. 2. Receiver operating characteristic (ROC) curves for the assessment of vocal cord palsy by the Voice Handicap Index-30 (VHI-30) questionnaire and transcutaneous laryngeal ultrasound (TLUSG) in high-risk patients.
Table 5 shows univariate and multivariate analyses of factors leading to unassessable VCs in high-risk patients. Male sex, higher body mass index, a longer distance from hyoid cartilage to the cricoid cartilage, and a shorter distance from the cricoid cartilage to the sternal notch were associated with unassessable VCs during TLUSG (p < 0.05). The presence of hoarseness, history of head and neck surgery, or operation for suspicion of malignancy did not affect the assessability. On multivariate analysis, male sex was the only independent predictor of unassessable VCs during TLUSG (odds ratio = 11.57 [confidence interval 1.43–93.69], p = 0.022).
OR, odds ratio; CI, confidence interval.
Discussion
To detect VCP before thyroidectomy, hoarseness of voice is neither sensitive nor specific (1,22). In a cohort of 340 patients, up to 32% of patients with VCP did not have any voice complaints (22). This study confirms that the sensitivity of hoarseness in the detection of VCP is low. Six out of nine patients with VCP were asymptomatic. If voice complaints are relied on solely to diagnose VCP or to select patients for laryngoscopy, a large proportion of asymptomatic patients with VCP, such as patient IX, would be overlooked. Similarly, hoarseness is a poor predictor of the presence of VCP. Only 3/64 patients with hoarseness were diagnosed with VCP by LE. Although voice complaints saved more patients from LE than the VHI-30 and TLUSG did, their inferior sensitivity, predictive value, and power of differentiation (AUC = 0.636) make them an unsuitable tool for selecting patients for LE. Although the VHI-30 had a higher sensitivity and ability to differentiate VCP from normal than hoarseness did, it saved fewer patients from LE (79.1% vs. 93.6%). There are limitations in using the VHI-30 for the identification of VCP. In this study, 38 patients could not complete the questionnaire because of illiteracy. In addition, poor compliance in completing a 30-question questionnaire was another reason for failure. Compared with hoarseness and the VHI-30 score, TLUSG had the highest power in differentiating VCP from normal (AUC = 0.935) and saved >90% of patients from LE. It also had a high sensitivity in diagnosing VCP (88.9%). It did not depend on literacy or patient effort, and the examination lasted only 0.5–2 minutes (23). As it is non-invasive with minimal discomfort, patient compliance is not a problem.
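The AUC quoted for hoarseness can be cross-checked with a small Python sketch. It assumes hoarseness was entered into the ROC analysis as a dichotomous variable, in which case the empirical ROC curve has a single operating point and the trapezoidal area equals the mean of sensitivity and specificity. The counts are derived from figures stated in the Results (three of 64 hoarse patients had VCP; nine VCP among 1000 patients), and the function name is hypothetical.

```python
def binary_test_auc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Trapezoidal ROC area for a test with one cut-off.

    The curve runs (0, 0) -> (1 - specificity, sensitivity) -> (1, 1).
    """
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    # Triangle under the first segment plus trapezoid under the second.
    first = 0.5 * false_positive_rate * sensitivity
    second = 0.5 * (1 - false_positive_rate) * (sensitivity + 1)
    return first + second


# Hoarseness versus laryngoscopy (derived counts, see lead-in).
auc_hoarseness = binary_test_auc(tp=3, fp=61, fn=6, tn=930)
print(f"AUC = {auc_hoarseness:.3f}")
```

Under this assumption the result rounds to 0.636, the value given above for hoarseness.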
In high-risk patients, TLUSG had assessability, sensitivity, and accuracy comparable to those in the whole cohort of 1000 patients. If TLUSG were used as a screening tool and only patients whose VCs were not assessable or who had VCP on TLUSG were selected for confirmatory LE, TLUSG would save a substantial proportion of patients from unnecessary LE: 91.6% in the entire cohort and about 88% of high-risk patients. Although male sex was the only independent predictor of failure to assess the VCs, the majority of patients with thyroid disease are female. As in the present study, about 80% of patients undergoing thyroidectomy are female. Therefore, TLUSG is applicable in the majority of patients who require pre-thyroidectomy assessment. On the other hand, the presence of hoarseness, a history of head and neck surgery, and surgery for suspected malignancy did not affect the assessability. The presence of any high-risk factors did not hinder the use of TLUSG. TLUSG is easy to learn, applicable, and feasible in both Western and Eastern patients (13,20). After performing around seven TLUSG, new assessors master the basic technique, and they are proficient after about 40 TLUSG (23). Incorporating TLUSG as a part of the ultrasound examination of the head and neck region is recommended, as it is applicable in both low- and high-risk patients. If a patient's VCs are not assessable or the patient is found to have VCP on TLUSG, LE can be arranged for further assessment (Fig. 3). Theoretically, there is no additional cost if VCs are examined during ultrasonography of the thyroid or the neck region.

Fig. 3. Proposed algorithm for selecting patients for laryngoscopic examinations. TLUSG, transcutaneous laryngeal ultrasound; VC, vocal cord.
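The selection rule described in the text (LE for patients whose VCs are not assessable or who show VCP on TLUSG) can be written as a minimal Python sketch. The type and function names are hypothetical, and the sketch encodes only what the text states: TLUSG is assessable when at least one landmark is visualized, and reduced or absent movement on any maneuver counts as VCP. It is not a transcription of the figure itself.

```python
from enum import Enum
from typing import Mapping


class TlusgResult(Enum):
    NORMAL = "normal"
    VCP = "vcp"
    UNASSESSABLE = "unassessable"


def classify_tlusg(
    landmark_visualized: bool,
    normal_movement_by_maneuver: Mapping[str, bool],
) -> TlusgResult:
    """Classify one TLUSG examination.

    normal_movement_by_maneuver maps each maneuver performed (for example
    "passive", "active", "valsalva") to True if movement was symmetrical
    and normal, or False if it was reduced or absent.
    """
    if not landmark_visualized or not normal_movement_by_maneuver:
        return TlusgResult.UNASSESSABLE
    if all(normal_movement_by_maneuver.values()):
        return TlusgResult.NORMAL
    # Reduced or no movement on any one maneuver is treated as VCP.
    return TlusgResult.VCP


def needs_laryngoscopy(result: TlusgResult) -> bool:
    """Refer for LE unless TLUSG shows normal mobile vocal cords."""
    return result is not TlusgResult.NORMAL


example = classify_tlusg(True, {"passive": True, "active": False, "valsalva": True})
print(example.value, needs_laryngoscopy(example))
```

Keeping the classification separate from the referral decision means the same rule could be reused if a different set of maneuvers were performed.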
With improvements in ultrasound technology, the assessability and diagnostic accuracy of TLUSG have improved considerably. Early studies showed that the sensitivity in detecting VCP with TLUSG was only 66% (24). More recent studies have demonstrated a higher sensitivity ranging from 93.3% to 100% (13,14,25). To optimize the image, the depth, gray scale, and frequency of the ultrasound wave were adjusted. In the authors' experience, an ultrasound wave frequency of around 8 MHz is optimal for visualization of VCs. Since only about 36% of true cords could be visualized and the identification of more sonographic landmarks did not improve the accuracy, the identification of either the false cords or the arytenoids is sufficient (20). Valsalva maneuvers were added from February 2014 in order to evaluate which maneuvers provide higher assessability and diagnostic accuracy. The results have been reported previously (21). A passive maneuver tends to have more false-positive results than active or Valsalva maneuvers (5.4% vs. 2.3% vs. 2.5%, respectively; p = 0.054), while sensitivity and accuracy are comparable. On the other hand, a passive maneuver has a higher ability to differentiate VCP from normal mobile VCs than active or Valsalva maneuvers do (AUC = 0.942 vs. 0.863 vs. 0.893, respectively). Therefore, it is recommended that all three maneuvers be performed. If VCs show reduced or no movement during any one of the maneuvers, this should be considered abnormal, that is, VCP.
Despite these improvements, false-negative results (e.g., patient II) still occurred occasionally. It has been previously reported that one out of seven patients with laryngoscopic VC paresis (decreased movement) was falsely diagnosed as normal by TLUSG, and Carneiro-Pla et al. reported that one out of three patients with laryngoscopic VC paresis was falsely diagnosed as normal (12,25). In contrast, all patients with VC paralysis (no movement) were correctly diagnosed by TLUSG. This suggests that TLUSG is less reliable in diagnosing VC paresis (decreased VC movement) but excellent in diagnosing VC paralysis (no movement). Due to the low incidence of VCP, the actual rate of false-negative results remains unclear (26). The clinical impact, proper treatment, and prognosis of patients with false-negative results are poorly studied at the moment. It is unclear whether the outcome of patients with VC paresis missed by TLUSG differs from that of patients with VC paresis diagnosed by TLUSG. It is also uncertain whether these false-negative findings represent another, less significant entity of VCP. Further study focusing on patients with false-negative results would be worthwhile.
Some drawbacks of this study are recognized. As all the TLUSG were performed by a single assessor, it remains to be demonstrated whether satisfactory results can be reproduced by other assessors. In a previous study of eight surgical residents without prior ultrasound experience, high assessability (92.5–98.9%) and accuracy (88.0–98.6%) were achieved after short formal training (23). It is believed that this result could be reproduced by other assessors after training and adequate experience. This study is also underpowered to assess the diagnostic accuracy of preoperative TLUSG. However, to the authors' knowledge, it is currently the largest reported series on preoperative TLUSG with laryngoscopic validation. Even though 1000 patients were included, there were only nine preoperative VCP. As preoperative VCP is very rare, conducting a sufficiently powered study of an assessment tool is very difficult. Finally, the VCs of about 6% of patients could not be assessed by TLUSG. There is a need to develop a technique that maximizes assessability, especially for male patients. Woo et al. reported 100% assessability of the VCs of 82 male patients when the ultrasound probe was placed over the lateral lamina of the thyroid cartilage (14). While this sounds promising, the reproducibility of these results needs to be confirmed.
Conclusion
Compared with hoarseness and a validated questionnaire, TLUSG is more sensitive and accurate in detecting preoperative VCP. The presence of any high-risk factors did not affect the assessability of TLUSG. In high-risk patients, the VCs of 92.4% could be evaluated by TLUSG, and a high sensitivity of 87.5% was achieved. More than 87% of high-risk patients could be saved from an LE if TLUSG were performed as a first-line preoperative examination of the VCs. TLUSG should be incorporated as a part of the ultrasound examination of the head and neck region.
Acknowledgments
This study was presented as a poster at the 37th Annual Meeting of the American Association of Endocrine Surgeons 2016 in Baltimore, Maryland.
Author Disclosure Statement
Nothing to disclose.
