Abstract
Background:
The majority of autistic adults experience social communication differences that may lead to frustrations during the job application process, even among those with strong work-related skills or histories. Compared with traditional lecture or role-play training, virtual reality (VR) technology generates more realistic and immersive simulated environments, which is crucial for autistic adults who often have difficulties envisioning real-world situations. The study developed a Cognitive Behavioral Therapy (CBT) VR Job Interview and Workplace Communication Training (CBT-VR-JIWCT) program, considering the features of autistic adults, and examined its feasibility and preliminary efficacy.
Methods:
The eight-session program contained various VR scenarios delivered via head-mounted displays as a highly immersive training environment for practicing job interviewing and communication skills. We examined the feasibility and efficacy of the program through the interviewer-rated Mandarin Chinese Version of the Adapted Mock Interview Rating Scale (adapted MIRS-Chinese), speech analysis, and content analysis. Eighty autistic adults were randomized into the intervention group (n = 40) and the control group (n = 40). Among them, 30 from the intervention group eventually completed the training program, whereas 36 from the control group completed pre- and post-tests without intervention.
Results:
Overall, autistic adults reported minimal simulator sickness symptoms. The attendance rate was high. Compared with the controls, the intervention group had significantly better interview performance, as indicated by significant group-by-time interactions on the total adapted MIRS-Chinese scores (p < .001) and speech volume (p = .006). The group-by-time interaction on the frequency of achievement words did not reach significance (p = .056).
Conclusion:
CBT-VR-JIWCT is feasible and effective in enhancing job interview skills among autistic adults. Speech and content analyses demonstrated value in objectively assessing improvements in interview performance, such as word choice and presentation styles, within this population. Overall, this study underscores the potential of VR technology as a promising tool for enhancing employment opportunities among autistic adults.
Community Brief
Why is this an important issue?
Autistic adults frequently face challenges in the context of job applications that primarily stem from inadequate interviewing skills and social communication difficulties. The recurrently experienced frustration can foster conditioned anxiety, diminish self-confidence, and curtail motivation to seek employment opportunities, leading to high unemployment rates within the autistic population.
What was the purpose of this study?
This study developed a unique virtual reality (VR) training program. It integrated a highly immersive VR environment, generated through a head-mounted display, with cognitive behavioral therapy (CBT)-based coaching lessons tailored for autistic adults. This study examined both the feasibility of using VR technology with autistic adults and the preliminary efficacy of this innovative program.
What did we do?
We recruited 80 autistic adults. We then randomly assigned them to the VR group (n = 40), who underwent our VR training program, or the control group (n = 40), who did not receive the training. The VR training program included coaching on job interviewing and communication skills for various workplace situations based on a VR-simulated environment. We recorded the adverse effects participants experienced when they used the VR platform. To evaluate the preliminary efficacy of the VR training program, we assessed their interview performance on a rating scale, as well as content and prosodic features using speech analysis methods, and compared performance before and after the training.
What were the results of the study?
Overall, autistic adults reported minimal discomfort when viewing VR materials. The attendance rate for the VR training program was good. Compared with the controls, the VR group demonstrated significantly improved interview performance. Moreover, they exhibited a greater increase in speech volume, total word count, total phrase count, and the use of achievement-related words compared with the control group.
What do these findings add to what was already known?
Previous studies have demonstrated the effectiveness of using computer screens for job interview training; however, empirical evidence on using highly immersive VR equipment in job training remains limited. Our randomized controlled trial further confirmed the efficacy of using head-mounted displays (HMDs) in enhancing job interview skills. Also, in addition to traditional assessment on rating scales, speech and content analyses may help decompose objective improvement in interview performance, including word choice and presentation style.
What are the potential weaknesses in the study?
The relatively small sample size may potentially limit the study's ability to detect small-effect differences after training. Additionally, the study assessed outcomes in a laboratory environment, which may not capture the complexities and nuances of real-life job interviews and employment scenarios.
How will these findings help autistic adults now or in the future?
The results suggest that our VR training program is both feasible and effective in enhancing job interview skills among autistic adults, demonstrating its potential as a valuable tool for addressing the unique challenges faced by this population in the employment context. With the incorporation of innovative technology, our program successfully engaged and motivated unemployed autistic adults who had previously given up searching for jobs. This study paves the way for continued advancements in technology-driven interventions and support services for autistic adults in the job market.
Background
Autistic individuals often demonstrate distinct variations in verbal and nonverbal communication, social perception, and social styles compared with their non-autistic peers. These differences can impede effective communication with coworkers, hinder integration into workplace culture,1,2 and lead to bias and misunderstandings from non-autistic colleagues.3,4 Moreover, the challenges in social interaction may pose a significant obstacle during traditional job interviews, which prioritize communicative and social skills over the specific skills required for a given position. 4 Furthermore, anxiety during interviews may add to these difficulties, 5 further hindering the interview performance of autistic adults. These difficulties are reflected in higher rates of unemployment among autistic adults compared with their non-autistic peers. 6 Therefore, it is crucial to identify effective strategies for enhancing occupational communication and social skills in this population.
However, providing job training for autistic adults poses significant challenges. First, past workplace experiences of being treated unkindly or misunderstood may lead to reluctance to attempt real-world exposure. Second, many workplace scenarios, such as noisy and crowded environments or public speaking, are difficult to simulate effectively through traditional one-on-one role-play. Third, the preliminary efficacy of interventions in a simulated environment may not readily transfer to real-world settings. In response to these challenges, recent evidence strongly supports the integration of innovative virtual reality (VR) technology as a means to improve learning and living skills in autistic adults.7–9 VR technology creates artificial environments that closely resemble real-world scenarios, providing realistic visual and auditory sensations to the users. This allows autistic adults to easily envision themselves in various situations and repeatedly practice relevant skills without the fear of negative consequences in the real world.
Several studies have examined the feasibility and efficacy of VR systems in improving skills in autistic individuals. While most of these studies have focused on children and adolescents, they have addressed various areas, including emotion recognition,10–12 social skills,13,14 and communication skills.13,15,16 Some studies have also specifically examined the use of VR technology for job interview training (JIT).17–19 One noteworthy study by Smith et al. 20 examined a unique computerized job interview simulator system called the Virtual Reality Job Interview Training (VR-JIT). This system allows trainees to watch video clips of the interviewer, answer questions, and receive immediate feedback within a virtual platform. The study found that the VR-JIT improved job interview skills and subsequent employment outcomes in autistic adults.20,21 To enhance the applicability to autistic youths and assess the effectiveness in community settings, Smith et al. developed a modified version called Virtual Interview Training for Transition Age Youth (VIT-TAY). The VIT-TAY includes three difficulty levels and focuses on 10 essential job interviewing skills based on existing literature. 22 In a randomized controlled trial (RCT), 48 transition-age autistic youth received school-based pre-employment services along with additional VIT-TAY training, whereas 23 transition-age autistic youth received services as usual. The results demonstrated significant improvement in job interview skills and increased job access among those who received VIT-TAY, highlighting the potential benefit of VR technology combined with JIT. 23 A recent study (Adiani et al. 2022) demonstrated a closed-loop adaptive VR-based JIT platform named Career Interview Readiness in VR. 24 This program combined a real-time physiology-based stress detection module and gaze detection module to provide individualized adaptation and showed initial feasibility among autistic users. 
Overall, these studies support the potential of VR technology as a valuable tool for improving various skills, including job interview skills, in autistic adults.
It is important to acknowledge that the studies mentioned earlier, which explored the use of VR systems for JIT, primarily used technologies such as computer screens or tablets. In contrast, head-mounted displays (HMDs) offer users a more immersive and realistic simulated environment.25,26 HMDs provide a heightened sense of immersion by excluding external visual stimuli and directing the user’s attention solely toward the simulated scenarios. This characteristic of HMDs minimizes the disparity between the artificial training environment and the real-world settings, thereby enhancing the transfer of learned skills from the simulated environment to the actual job interview settings. Pioneering studies have demonstrated the feasibility and efficacy of HMD-assisted VR equipment for social skill training in autism.25,27 On this basis, we specifically focused on the application of HMD technology in JIT.
The present study aimed to develop a training program, the Cognitive Behavioral Therapy VR Job Interview and Workplace Communication Training (CBT-VR-JIWCT) program, that incorporates both CBT coaching and a virtual training environment displayed through HMDs. We conducted an RCT to evaluate the feasibility and preliminary efficacy of this program in improving job interview skills among autistic adults. We hypothesized that participants who received the CBT-VR-JIWCT program would demonstrate better interview performance and verbal expression than those in the control group. Moreover, previous studies evaluated training effects predominantly by interviewer-rated job interview performance. We take a step further, aiming to evaluate preliminary efficacy using both interviewer-rated job interview performance and objective assessments, such as speech analysis, content analysis, and coherence analysis, in order to provide a more comprehensive evaluation of the intervention effect.
Methods
Participants
We recruited autistic adults from the Adult Autism clinics in the Department of Psychiatry, National Taiwan University Hospital. All participants met the diagnostic criteria of Autism Spectrum Disorder (ASD) of the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5). The inclusion criteria were: (1) being an adult aged between 18 and 45 who had a diagnosis of ASD by a licensed psychiatrist; (2) having a full-scale Intelligence Quotient (IQ) >70 on the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV); and (3) scoring ≧26 on the Autism Spectrum Quotient (AQ), indicating a significant level of autistic traits. Participants with the following conditions were excluded: (1) having a history of major psychiatric disorders (e.g., bipolar disorder, schizophrenia, or other psychotic disorders) or major neurological diseases and (2) having sensory sensitivities that would preclude participation in the CBT-VR-JIWCT program.
We stratified a total of 80 participants by sex and randomized them into two groups: the “CBT-VR-JIWCT” group (VR group, n = 40) and the control group (n = 40) (Fig. 1). Six participants in the intervention group refused VR practice due to uncomfortable side effects (an overview of the side effects is given in “Simulator Sickness Questionnaire” later). Three of them agreed to be reallocated to the control group. This reallocation may lead to underestimating the severity of the side effects in the VR group but was expected to have limited influence on the main outcome. The VR group completed the eight-session training; the control group did not receive any specific coaching on job-related skills. Both groups underwent a pre-test and a post-test. Throughout the training period, both groups attended the Adult Autism Clinics regularly every 1–3 months. During these visits, the psychiatrist spent approximately 10–20 minutes discussing coping strategies for stressful life events. All participants reported not receiving professional training outside the study during the study period. All participants voluntarily engaged in the training without payment. In the VR group, we reminded the participants weekly to attend training sessions and offered flexibility in the training schedule according to their availability. In the VR group (n = 34), 30 completed the intervention and post-test, whereas in the control group (n = 43), 36 completed the post-test, resulting in a final sample of 30 (males n = 23, 77%) vs. 36 (males n = 30, 83%) for analysis. The mean ages of the two groups were 26.43 (standard deviation [SD] 5.79) and 28.75 (SD 5.84), respectively (Table 1). The two groups did not statistically differ in age, sex, or the levels of autistic characteristics.

Flow diagram of study design. IQ, Intelligence Quotient; AQ, Autism Spectrum Quotient; SRS, Social Responsiveness Scale; EQ, Empathy Quotient; VR, virtual reality.
The Demographic Data of the Study Sample
VR, virtual reality; SD, standard deviation; IQ, Intelligence Quotient.
CBT-VR-JIWCT
The training program consists of eight 60-minute sessions, using an HTC VIVE HMD to provide immersive VR scenarios for practicing job interviewing and communication skills in various workplace scenarios, such as briefings and handling disagreements. Each theme was practiced at basic and advanced difficulty levels, with the basic scenario more supportive and the advanced scenario containing harsh comments and doubts to train participants to respond calmly and appropriately.
Session 1 focused on teaching participants about anxiety, providing relaxation techniques to cope with it, and using the Anxiety Rating Scale to quantify anxiety levels (0–10). 28 In the subsequent sessions (sessions 2–7), participants underwent relaxation training before VR exposure, and they practiced the same VR scenario twice, with specific coaching in between. The training program addressed various key aspects necessary for a successful job interview. 22 These included portraying oneself as a diligent, dependable, and cooperative team player who demonstrates professionalism and negotiating skills, maintains a positive and honest demeanor, expresses genuine interest in the position, and establishes overall rapport with the interviewer. During the briefing scenarios (sessions 4–5), participants acquired skills for delivering concise presentations to an audience of 50 people in a conference room. They received coaching on speech organization and presentation skills aimed at capturing the audience’s attention. This involved avoiding overly technical or detailed language, providing specific examples, summarizing key presentation points, using eye contact to engage the audience, and using body language to emphasize key ideas. In the disagreement scenarios (sessions 6–7), participants learned to handle a conflict situation in which they were challenged by a colleague. They were coached to proficiently adopt the six steps of the Program for the Education and Enrichment of Relational Skills (PEERS®) program 29 in these scenarios, including “keep cool, listen, repeat, explain, say sorry, and solve the problem.” Session 8 served as a review and conclusion of the main points covered in each session. Supplementary Table S1 provides an overview of all the VR scenarios in the eight sessions.
Throughout the eight sessions, we incorporated several elements from the PEERS Social Skills Training program, 29 which has been shown to be effective in improving overall social skills in autistic young adults in Taiwan. 30 These elements included coaching with specific steps, role-play demonstrations and rehearsals, enhancing perspective-taking ability, and emphasizing appropriate body language. After exposure to the VR training scenario in each session, we provided individualized coaching, such as “avoiding criticizing former employers” or “providing one example to support your advantage,” based on the participants’ performance. We pointed out inappropriate responses, demonstrated appropriate examples, and encouraged immediate rehearsal practices. We prompted participants to consider the perspective of the interviewer. For example, the question “How would you react if you were not selected for the position?” may assess the interviewee’s frustration tolerance toward challenging questions. Therefore, it is important to show determination to keep improving and to apply for the position again after self-enhancement rather than simply saying, “I would just move on to the next job.” Finally, we consistently emphasized appropriate body language, including maintaining eye contact, offering social smiles and courtesies, as well as adjusting speech volume, speed, and pitch. In addition, in each session, we used cognitive restructuring based on CBT principles to address unhelpful or dysfunctional cognitive patterns, such as all-or-nothing thinking and catastrophizing, which could lead to poor performance. Our goal was to identify alternative thinking to replace these maladaptive beliefs. Personalized coaching was an integral part of our approach, with participants receiving immediate feedback on their practice performance. To provide immediate feedback to participants, we outlined their strengths and areas to be improved and offered specific coaching accordingly.
We demonstrated appropriate responses on the spot and encouraged them to practice again. If the performance improved, we promptly gave verbal encouragement. Repetitive exposure to VR scenarios was a deliberate strategy to enhance the efficacy of the training and desensitize anxiety responses. We summarize the specific details of the training protocol in Supplementary Table S2. The VR scenarios experienced by the participants are shown in Supplementary Figure S1. The training team consisted of one psychiatrist and three research assistants with backgrounds in psychology. To ensure fidelity to the program, they underwent extensive training that included demonstration, observation, and close supervision before they began delivering the training independently. Before the implementation of this clinical trial, we invited three autistic adults to try this VR training program. Their valuable feedback on the training format and the VR platform helped optimize the training program.
Procedures
The Research Ethics Committee of National Taiwan University Hospital approved this study before implementation [No.201712172RINC]. After informed consent procedures in the clinics, all participants completed baseline assessments, including IQ, AQ, 31 Social Responsiveness Scale (SRS), 32 and Empathy Quotient (EQ) 33 before randomization. The enrolled participants were then randomized into the VR group or the control group. The participants in both groups underwent a mock job interview in the 1st week and the 8th week of the training period, which served as the pre-test and post-test assessments. We recorded the participants’ responses in the pre-test and post-test to evaluate the interview performance on the Mandarin Chinese Version of the Adapted Mock Interview Rating Scale (adapted MIRS-Chinese) and speech characteristics. The evaluators were blind to the grouping status of the participants.
Measures
AQ
The AQ31 is a self-report questionnaire developed to quantify autistic traits in adults with normal IQ on WAIS. It consists of 50 theoretically derived statements depicting personal views, habits, and preferences pertinent to the unique profile of autistic people. Each statement is rated on a four-point scale, with answer categories “definitely agree,” “slightly agree,” “slightly disagree,” and “definitely disagree.” The former two are scored “1” and the latter two are “0,” leading to the total score ranging from 0 to 50, where a higher score depicts the autistic end of the continuum. The AQ has satisfactory internal consistency (0.82) and test–retest reliability (0.70) and good discriminative validity and screening properties for the diagnosis of ASD at a threshold score of 26. 34 The Chinese AQ had good psychometric properties with a 5-factor structure, that is, socialness, mindreading, patterns, attention to details, and attention switching. 35
EQ
The EQ33 is a measure of empathy. Around 81% of autistic adolescents and adults score less than 30 on the EQ, compared with 12% of non-autistic controls; across the two groups combined, the EQ shows excellent internal consistency (0.92) and test–retest reliability (0.97). The Chinese version of EQ has satisfactory reliability and validity (Huang HY & Gau SS, unpublished).
SRS
The SRS32 is a 65-item rating scale on the level of autistic characteristics in natural settings over the past 6 months. It includes domains of social awareness, social information processing, capacity for reciprocal social communication, social avoidance, and autistic mannerisms. Items were rated on a 4-point Likert scale from “0” (not true) to “3” (almost always true). Higher scores reflect greater difficulties in social communication and more prominent autistic features. The Chinese-language SRS has demonstrated a satisfactory four-factor structure with high internal consistency (Cronbach’s alpha, 0.94–0.95), that is, social communication, stereotyped behaviors/interest, social awareness, and social emotion, 36 and has been widely used to assess social difficulties in Taiwan.
Simulator sickness questionnaire
The simulator sickness questionnaire (SSQ) is a 16-item questionnaire for the assessment of simulator sickness. 37 Each SSQ item receives a rating between 0 and 3. The SSQ comprises three categories as follows: nausea, oculomotor, and disorientation. A total simulator sickness score is defined as the sum of the scores of all three categories multiplied by a constant of 3.74. 38 Higher scores indicate more severe sickness symptoms. The severity of sickness symptoms was “negligible” if the total score was <5, “minimal” if 5–9, “significant” if 10–14, “concerning” if 15–19, and “bad” if ≧20. 39 The SSQ has recently been used to provide an optimal virtual environment for HMDs. This study used the SSQ to assess the adverse effects of the VR platform. A total of 107 autistic adults were invited to test the VR platform (see Supplementary Table S3). Some experienced mild sickness on the VR testing platform. The most common symptoms included mild eyestrain (21%), blurred vision (16%), difficulty focusing (15%), general discomfort (15%), fatigue (13%), and vertigo (13%). The mean SSQ total score was 7.83, indicating a “minimal” level of sickness symptom severity. 39
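The scoring rule described above can be sketched as follows. This is an illustration only, not the scoring script used in the study: the function names are hypothetical, the three category sums are taken as given inputs (the item-to-category mapping is not reproduced here), and the severity bands are treated as half-open intervals so that non-integer totals such as 7.83 fall into exactly one band, which is an assumption.

```python
SSQ_CONSTANT = 3.74  # multiplier stated in the text


def ssq_total(nausea_sum, oculomotor_sum, disorientation_sum):
    """Total score: sum of the three category sums times 3.74."""
    return (nausea_sum + oculomotor_sum + disorientation_sum) * SSQ_CONSTANT


def ssq_severity(total):
    """Map a total score to the severity labels quoted in the text.

    Assumption: cut-offs are applied as half-open intervals
    (<5, <10, <15, <20, otherwise "bad").
    """
    if total < 5:
        return "negligible"
    if total < 10:
        return "minimal"
    if total < 15:
        return "significant"
    if total < 20:
        return "concerning"
    return "bad"


if __name__ == "__main__":
    # The total score reported in the text falls in the "minimal" band.
    print(ssq_severity(7.83))
```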
Adapted MIRS-Chinese
We referred to the Mock Interview Rating Scale (MIRS), 40 a validated tool for evaluating job interview performance in adults with schizophrenia, which has recently been applied in autistic adults. 41 We translated the MIRS into Mandarin Chinese, maintaining the original items and 5-point rating anchors, while incorporating additional items assessing speech amount, speech volume, appropriate body language, and confidence specific for autistic individuals. This adapted version, the “adapted MIRS-Chinese,” consisted of 14 items: (1) comfort level, (2) negotiation skills, (3) conveying oneself as a hard worker (dependable), (4) sounding easy to work with (teamwork), (5) sharing things in a positive way, (6) sounding honest, (7) sounding interested in the position, (8) sounding professional, (9) establishing overall rapport with the interviewer, (10) overall impression, (11) speech amount, (12) speech volume, (13) appropriate body language, and (14) confident attitude. Each item was rated on a scale from 1 to 5. We summed the scores of the 14 domains of each individual to derive a total score. The higher score indicated better performance.
The mock job interview entailed standard questions like: “Please give a one-minute brief introduction of your past work experience and reasons for leaving,” “Why do you want to apply for this job?,” and “What are your strengths that make you more suitable for this job than others?” We recorded the trainees’ performance into audio files for scoring purposes and randomly assigned these audio files to three raters who were blind to the experimental condition. Before this study began, the three raters independently rated the same five audio files and attained a high degree of reliability (intraclass correlation coefficient = 0.91).
Data Analysis
Speech analysis
We used the openSMILE V2.0 software 42 to analyze the audio files of the pre- and post-test performance. The analyzed performance dimensions included voice volume, pitch, and quality (i.e., the trembling tendency of a voice). The volume was the mean intensity of the voice. The pitch was defined as the fundamental frequency (F0) of a speech on a logarithmic semitone scale. The voice quality was quantified by several acoustic features, such as jitter (indexing aperiodicity of the F0 signal), shimmer (the difference of the peak amplitudes of consecutive F0 periods), and Harmonics-to-Noise Ratio (indexing the relative amount of additive noise in the voice).
All these performance-indexing speech characteristics indicate a speaker’s emotional state. Specifically, emotions that relate to a higher arousal state or activation, such as anger and happiness, are associated with higher volume, pitch, and pitch variability (the standard deviation of the pitch). In contrast, sadness and boredom usually relate to lower volume, pitch, and pitch variability.43–46 Anxiety, often found in interviewees during job interviews, has been associated with an increased pitch, probably due to increased tension of laryngeal and vocal fold muscles under an anxious state. 47 Anxiety has also been related to lower pitch variability,46,48 an increased proportion of pauses, 46 and a change in voice quality (increased jitter). 49
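To make the acoustic features above concrete, the sketch below computes a semitone-scaled pitch, pitch variability, and simple "local" forms of jitter and shimmer from already-extracted per-period values. It is a simplified illustration and not the openSMILE implementation: the function names, the 27.5 Hz semitone reference, and the choice of the "local" (mean absolute consecutive difference divided by the mean) definition are assumptions made for this example.

```python
import math
import statistics


def hz_to_semitones(f0_hz, ref_hz=27.5):
    """Convert F0 in Hz to semitones above a reference (assumed 27.5 Hz)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def _relative_variation(values):
    """Mean absolute difference of consecutive values over their mean."""
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods):
    """Cycle-to-cycle variation of F0 period lengths (seconds)."""
    return _relative_variation(periods)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of peak amplitudes."""
    return _relative_variation(peak_amplitudes)


def pitch_variability(f0_semitones):
    """Standard deviation of pitch across voiced frames."""
    return statistics.stdev(f0_semitones)


if __name__ == "__main__":
    periods = [0.0100, 0.0102, 0.0099, 0.0101]  # made-up period lengths
    print(local_jitter(periods))
```

A perfectly periodic voice gives a jitter of zero under this definition; larger values indicate a less stable F0.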
Content analysis
Linguistic Inquiry and Word Count (LIWC) 50 is a text analysis program that counts the words relevant to psychologically meaningful categories in written texts or speeches. LIWC transferred qualitative description into quantitative data (i.e., the relative proportions of categories in texts) and can avoid subjectivity and inconsistency in human coding. Chinese LIWC51 was developed from English LIWC dictionary and revised according to Chinese language characteristics. It contains a total of around 6800 words across 30 linguistic categories and 42 psychological categories. The reliability and validity of Chinese LIWC and its equivalence to English LIWC have been proven satisfactory. 51 In this study, we first selected occupation-related word categories from Chinese LIWC. We then compared the percentages of each word category spoken by the VR and control groups between the pre-test and post-test.
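The category percentages described above can be illustrated with a minimal sketch. It assumes the transcript has already been segmented into tokens and uses a toy dictionary invented for this example; neither the dictionary contents nor the function name come from the LIWC software or the Chinese LIWC dictionary.

```python
def category_percentages(tokens, dictionary):
    """Percentage of tokens that belong to each word category."""
    total = len(tokens)
    if total == 0:
        return {category: 0.0 for category in dictionary}
    return {
        category: 100.0 * sum(1 for t in tokens if t in words) / total
        for category, words in dictionary.items()
    }


# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {
    "achievement": {"achieve", "goal", "success"},
    "filler": {"um", "well"},
}

if __name__ == "__main__":
    tokens = ["i", "achieve", "my", "goal"]
    print(category_percentages(tokens, TOY_DICTIONARY))
```

Because the output is a proportion of all tokens, it can be compared between pre-test and post-test transcripts of different lengths.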
Coherence analysis
To investigate text coherence within and between paragraphs, we applied the Sentence-Bidirectional Encoder Representations from Transformers (SBERT), 52 which integrates the Siamese network with a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. BERT 53 is a pre-trained language model developed by Google for solving Natural Language Processing (NLP) problems. Compared with traditional NLP approaches, the BERT model takes the position and context of a word in a sentence into account. The SBERT extends from BERT and can output a semantically meaningful vector representation for an inputted text, be it a word, a sentence, or a paragraph.
These vector representations outputted from SBERT allowed us to quantify the content coherences of a trainee’s pre- or post-test speeches. Specifically, we operationalized the coherence between two consecutive pieces of text by the semantic similarity score calculated from the cosine similarity between the two vectors (−1: highly opposite meanings; 1: highly similar meanings). 54 For each paragraph, a sliding window of two consecutive sentences yielded several coherence scores, which were then averaged to produce a mean coherence for each paragraph. Similarly, but at a larger textual scale, we could represent each paragraph by a vector, and we could calculate a coherence score from two consecutive paragraphs. We could then average these between-paragraph coherence scores to derive a mean between-paragraph coherence.
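The sliding-window coherence score described above can be sketched as follows. The sketch assumes the sentence (or paragraph) vectors have already been produced by an embedding model and are passed in as plain lists of floats; the function names are hypothetical and the example vectors are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (range -1 to 1)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_coherence(vectors):
    """Average similarity over a sliding window of two consecutive vectors.

    Pass sentence vectors for a within-paragraph score, or paragraph
    vectors for a between-paragraph score. Returns None when fewer than
    two vectors are available.
    """
    if len(vectors) < 2:
        return None
    scores = [cosine_similarity(a, b) for a, b in zip(vectors, vectors[1:])]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Three made-up 2-dimensional "sentence vectors".
    sentences = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(mean_coherence(sentences))
```

The same function serves both textual scales because only the unit represented by each vector changes.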
Feasibility study
Feasibility was operationalized as the level of adverse effects while using the VR platform, adherence to the intervention, and user experience. Specifically, we evaluated the adverse effects of the VR platform by the SSQ total scores in all participants. As for adherence, we calculated the attendance rate for the VR group, expressed as the percentage of intervention sessions attended by the participants. User experience was assessed through verbal inquiries regarding the ease of using the VR platform and the level of engagement.
Statistical analyses
We conducted all statistical analyses using SAS 9.4 software. We compared the demographics, degree of autistic traits, and other characteristics between the VR and control groups using Student t-tests and chi-square analyses. For the primary outcome, we conducted generalized estimating equations (GEEs) 55 to evaluate whether CBT-VR-JIWCT was associated with changes between the pre-test and the post-test in the adapted MIRS-Chinese scores. Specifically, we examined whether there was a significant group-by-time interaction to evaluate the intervention effect after 8-week training.
For speech, content, and coherence analyses, considering that more than half of the variables did not follow a normal distribution, we used nonparametric Mann–Whitney U tests to examine the differences between the pre-test and post-test within the two groups. We also conducted GEE to evaluate group-by-time interaction in voice volume, pitch, quality, selected LIWC categories, and the two coherence scores. GEEs are suitable for analyzing data with a repeated measures structure, 56 whereas Mann–Whitney U tests may be more efficient or robust for highly skewed or non-normal data, 57 as in our case. In addition, GEEs primarily focus on modeling mean differences, 58 whereas Mann–Whitney U tests provide insights into the overall distribution of the data, not just the central tendency. 59 Therefore, as both sets of information are important, we presented them together. The effect size r of each Mann–Whitney U test was also computed. 60 In general, an absolute value for r below 0.3 was considered a small effect, between 0.3 and 0.5 as a medium effect, and above 0.5 as a large effect.
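As an illustration of the effect size r, the sketch below computes the Mann–Whitney U statistic, its normal approximation z, and r = z / √N, where N is the total number of observations. This is one common way to obtain r and is assumed here rather than taken from the study's SAS code; the function names are hypothetical, tied values receive average ranks, and no tie or continuity correction is applied to the variance.

```python
import math


def mann_whitney_r(x, y):
    """Return (U, z, r) for two independent samples x and y."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return u, z, z / math.sqrt(n1 + n2)


def effect_size_label(r):
    """Thresholds quoted in the text: <0.3 small, 0.3-0.5 medium, >0.5 large."""
    magnitude = abs(r)
    if magnitude < 0.3:
        return "small"
    if magnitude <= 0.5:
        return "medium"
    return "large"


if __name__ == "__main__":
    u, z, r = mann_whitney_r([1, 2, 3], [4, 5, 6])  # made-up data
    print(u, round(z, 3), round(r, 3), effect_size_label(r))
```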
Results
Job interview skills
We compared job interview role-play performance measured on the adapted MIRS-Chinese between the pre-test and post-test. We observed a significant group-by-time interaction (z = 6.33, p < 0.001) on the adapted MIRS-Chinese total scores, indicating that the VR group improved more in performance scores after the 8-week training than the control group (Fig. 2 and Supplementary Table S4).

Fig. 2. Comparison of the adapted MIRS-Chinese total scores between the pre-test and post-test in the VR and control groups.
Speech analysis
In the post-test interview, both the VR and control groups showed significantly increased volume and decreased pitch compared with the pre-test interview (Supplementary Table S5). When comparing the changes in speech volume, pitch, and quality between the pre-test and post-test, we found that after the 8-week training, the improvement in volume was significantly greater in the VR group (mean 0.0034, SD 0.0047) than in the control group (mean 0.007, SD 0.0030) (U = 655, p = 0.010), whereas improvements in pitch and speech quality did not differ significantly between the two groups (Table 2). These results aligned with the findings from the GEE analyses, which also showed a significant group-by-time interaction (z = 2.76, p = 0.006) on speech volume (Table 2 and Supplementary Table S4).
Table 2. Comparison of the Improvement After Training in Speech Analysis, Content Analysis, and Coherence Analysis Between the Virtual Reality and the Control Groups
A significant group-by-time interaction was shown in the GEE analysis.
VR, virtual reality; SD, standard deviation; GEE, generalized estimating equation.
Content analysis
Both VR and control groups showed significant gains in total word count, total phrase count, “I,” “social process,” “affect process,” “positive emotion,” “cognitive process,” and “work related” words at the post-test compared with the pre-test. However, in the control group, not only useful words but also redundant words such as “non-fluencies” and “fillers” increased significantly from the pre-test to the post-test (Supplementary Table S6).
Using the nonparametric Mann–Whitney U test to compare between-group differences in the change in total word count, total phrase count, and the category words after the intervention (Table 2), we found that the VR group had significantly more improvement in total word count, total phrase count, and “achievement” words than the control group. In the GEE analysis, these group-by-time interactions were no longer significant, although a trend-level group-by-time interaction was observed for “achievement” words (z = 1.91, p = 0.056) (Supplementary Table S4).
Coherence analysis
Using SBERT to analyze coherence within and between paragraphs (Table 2), we found a greater but nonsignificant improvement in between-paragraph coherence in the VR group than in the control group on the nonparametric Mann–Whitney U test, which was consistent with the findings from the GEE analyses (Supplementary Table S4).
Feasibility study
The attendance rate during the 8-week training was 88.2% for the VR group. The mean SSQ score was 7.83, indicating a “minimal” level of sickness symptom severity during the use of the VR platform. Almost all participants verbally reported having an enjoyable experience using the VR training platform.
Discussion
As one of the studies applying VR technology in training autistic adults, this study demonstrated the high feasibility of using VR technology in this population because it was well tolerated. The adverse effects were generally rare and mild, and the attendance rate during the 8-week training was acceptable. The preliminary efficacy of this VR training program was supported by significant group-by-time interactions on the adapted MIRS-Chinese scores and in the speech analysis. Overall, autistic adults in the VR training group had significantly increased adapted MIRS-Chinese total scores and speech volume, and a trend-level increase in “achievement” words after training, compared with the control group. In addition, they exhibited an attitude of being more accountable, expressed genuine interest in the position, portrayed themselves as cooperative team players, and maintained a more positive demeanor after training. Therefore, this study provides evidence that CBT-VR-JIWCT may be a feasible and efficacious program to enhance job interview skills for autistic adults.
The findings of improved job interview performance in the VR group align with previous studies using VR technology for training autistic adults. For instance, Strickland et al. 19 demonstrated that practicing interview skills through role-play with human-controlled avatars in computerized environments led to enhanced content and delivery of responses during job interviews. Moreover, Smith et al. 20,23 presented the effectiveness of a computerized job interview simulator system in improving job interview skills and subsequent employment outcomes. These findings, combined with ours, suggest that VR technology is promising in training autistic adults for job interview scenarios.
Unlike previous studies using computer screens or tablets as training platforms,17–20,23,61 our program represents a novel approach as one of the VR job training programs designed for autistic adults that use HMDs, which offer a more realistic simulation environment. This high degree of realism theoretically increases the potential to apply the acquired skills in real-world situations. Kourtesis et al. 27 recently developed an HMD-assisted VR system for training social skills in autistic adults (n = 25). Although their system had high acceptability and feasibility, there was no control group, and the improvement in performance after training was not quantified. Other existing RCTs focusing on HMDs for life skills in autistic individuals recruited relatively smaller samples (n < 15). 25 Our study stands out with its RCT design and larger sample, providing novel evidence for the efficacy of implementing HMDs in job training for autistic adults.
Furthermore, we assessed the preliminary efficacy of the VR training program through multiple means, including interviewer-rated job interview performance and objective quantitative assessments involving speech analysis, content analysis, and coherence analysis. Both VR and control groups exhibited a significant increase in volume and a decrease in pitch at the post-test compared with the pre-test, with the VR group showing a significantly greater improvement in volume and a significant group-by-time interaction. Previous research has highlighted how anxiety can influence various prosodic features, leading to a higher pitch, 47 reduced pitch variability,46,48 increased proportion of pauses, 46 and alterations in voice quality (increased jitter). 49 Moreover, successful interviews have been associated with higher voice intensity62,63 and greater pitch variability.64,65 In our study, the increase in volume and decrease in pitch observed in both groups at the post-test suggest that repeated practice with mock job interviews may have contributed to anxiety reduction and improved overall interview performance. In addition, the VR group demonstrated a significantly greater improvement in volume after the intervention, further supporting the preliminary efficacy of the VR program in enhancing interview-related speech characteristics. These findings were consistent with the results obtained from interview role-play scoring, where both groups showed a significant increase in total adapted MIRS-Chinese scores at the post-test. Notably, the group-by-time interaction on the total adapted MIRS-Chinese score was significant, with greater gains in the VR group than in the control group, suggesting the importance of individual coaching and VR scenario practice during the 8-week intervention.
Our subsidiary analysis revealed a significant correlation between the adapted MIRS-Chinese total score and speech volume at the post-test (Spearman’s ρ = 0.412, p < 0.001), indicating that speech volume was correlated with overall interview performance (Supplementary Fig. S2, Supplementary Table S7). Further investigation is warranted to explore the impact of the intervention on pitch variability and its association with interview performance.
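For readers unfamiliar with the statistic, Spearman’s ρ can be sketched as the Pearson correlation of the rank-transformed values, which also handles ties. The function names and input values below are hypothetical and serve only to illustrate the calculation; this is not the authors' analysis code, and it does not compute the p value.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    ss_x = sum((a - mean_x) ** 2 for a in rank_x)
    ss_y = sum((b - mean_y) ** 2 for b in rank_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical interview scores and speech volumes (not study data).
scores = [1, 2, 3, 4, 5]
volumes = [2, 1, 4, 3, 5]
print(round(spearman_rho(scores, volumes), 3))
```

Because only ranks enter the calculation, the statistic is insensitive to the skewed distributions noted for the speech variables.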
Regarding the content analysis, both VR and control groups showed significant gains in total word count, total phrase count, “I,” “social process,” “affect process,” “positive emotion,” “cognitive process,” and “work related” words at the post-test compared with the pre-test, suggesting that repeated practice itself could lead to more abundant speech content in autistic adults. However, the increased speech content in the control group reflected not only more useful words but also more redundant words such as “non-fluencies” and “fillers,” an increase that was not observed in the VR group. These results indicate that while repeated practice alone can enrich speech content, the VR training program might have more effectively increased the use of pertinent and valuable words during interviews. A previous study also indicated that better interview performance was associated not only with more fluent speech and more positive words but also with fewer filler words and fewer negative words. 66
Furthermore, the VR group exhibited a significantly greater increase in total word count, total phrase count, and the use of “achievement” words after the training than the control group on the Mann–Whitney U test. These findings were not replicated by the GEE method, in which only a trend-level group-by-time interaction remained for “achievement” words. Future studies may consider a larger sample size to achieve adequate statistical power for the GEE method and may explore the longitudinal changes in participants’ performance during and after training. In addition, Naim et al. 66 demonstrated an association between better interview performance and the use of more unique words and more “we” (versus “I”), which was not observed in our study. Incorporating additional word categories to thoroughly examine the program’s efficacy should be considered in the future.
Limitations
There were several limitations in this study. First, the study was underpowered to definitively establish the intervention’s efficacy; however, it did provide preliminary insights that can guide the design of future trials. Future studies may consider larger and more diverse samples. Regarding feasibility, we assessed only adverse effects while using the VR platform, adherence to the intervention, and user experience. Future studies may consider using standardized questionnaires to evaluate satisfaction or in-depth interviews to assess user experience. Second, the study measured all outcomes in a laboratory environment with artificial test scenarios, which may not fully replicate real-life situations. Therefore, the observed improvements may not correspond to actual gains in obtaining or retaining employment in real-world settings. The transferability of the intervention effect from the experimental environment to the real world should be examined. It would be valuable to incorporate real-life outcomes, such as employment rates, in follow-up studies to examine the connection between our training program and ecologically valid outcomes. Third, we chose highly immersive VR equipment, HMDs, to enhance training realism. However, their limited availability in communities restricts the scalability of our program. Despite this, most autistic participants are interested in this novel technology, 67 which might increase their motivation toward the training. Another advantage is that our replayable simulated scenarios provide an opportunity for repeated exposure practice and may reduce the labor costs of traditional intervention. In the future, investigating partnerships with community institutions will be valuable to enhance scalability and reach more potential beneficiaries. Implementation preparation costs could also be a significant area for further research.
In addition, the use of scripted VR scenarios may lead to unexpected responses from autistic adults if the scenarios do not align with their personal situations. Future VR-JIT programs may address these issues specifically. Moreover, although all participants reported not receiving professional training during the study period, this self-report could not be objectively verified. As one of the novel VR job training programs for autistic adults and the first study to examine training effects for autistic adults through speech and content analyses, our findings need to be validated in larger independent samples. Although we successfully applied speech and content analyses to evaluate job interview performance, a larger sample may be required to enhance statistical power for detecting small-effect differences in voice characteristics and content. Furthermore, future studies may consider analyzing additional prosodic information (e.g., pitch variability and the proportion of speech pauses) and expanding the range of word categories. Exploring correlations between these prosodic and lexical features and job interview performance would provide a more comprehensive understanding of the intervention’s impact and refine its implementation for autistic adults. Finally, due to the small size of the female subsample, this study lacked sufficient statistical power to conduct a subsample analysis. Verifying whether sex differences affect interview performance and identifying the possible reasons for these differences would be valuable in the future.
Conclusion
In conclusion, CBT-VR-JIWCT has demonstrated both preliminary efficacy and feasibility in improving job interview performance among autistic adults. By leveraging innovative technology, it has successfully engaged many unemployed autistic adults who had previously abandoned their job-seeking efforts due to unsuccessful experiences. The scripted VR scenarios are valuable in helping autistic adults manage anxiety during exposure to job interview situations. This study underscores the promising potential of VR technology as a motivating tool for work preparation among autistic adults. Furthermore, the incorporation of speech and content analyses can be valuable in assessing job interview performance in this population. These analyses offer valuable insights that can inform future interventions and research in this area, ultimately contributing to improved support and opportunities for autistic adults in the job market.
Authors’ contributions
Y.-L.C. conceptualized the study, designed the curriculum, recruited participants, and conducted clinical assessment. M.O. and R.-H.L. designed virtual reality training materials. T.-R.H. performed speech, content, and coherence analyses. P.-R.C. performed statistical analyses and drafted the article. All authors edited and approved the final article. The article has been submitted solely to this journal and is not published.
Footnotes
Acknowledgments
The authors thank research assistants Ms. Su-Chin Pan, Mr. Zu-Chin Ting, Ms. Fang-Yu Hsiao, and Ms. Yi-Jing Lee for their contributions in this training program. The authors also thank Mr. Yi-Hao Wei for assisting the analysis. The authors also thank the staff of National Taiwan University Hospital-Statistical Consulting Unit (NTUH-SCU) for statistical consultation and analyses.
Author Disclosure Statement
All authors declare no conflicts of interest.
Funding Information
The work presented in this article is supported by grants from the Ministry of Science and Technology (MOST 108-2628-H-002-009-MY3, 111-2410-H-002-156-MY2), National Taiwan University Hospital (UN109-007, UN112-0037), and National Health Research Institutes (NHRI-EX110-11008PC, NHRI-EX111-11008PC, NHRI-EX112-11008PC), Taiwan. The funding bodies had no role in the study design, data collection, analysis, interpretation of data, or article writing.
