Abstract
In recent years, a plethora of artificial intelligence (AI) chatbots have been developed and made available to the public. Consequently, an increasing number of individuals are integrating AI chatbots into their daily lives for various purposes. This trend has also raised concerns regarding AI chatbot dependence. However, a valid and reliable scale to assess AI chatbot dependence is yet to be developed. Therefore, this study was designed to develop and validate an AI chatbot dependence scale. We obtained initial items from previous publications and in-depth interviews. Subsequently, item analysis, exploratory factor analysis (EFA), confirmatory factor analysis (CFA), reliability, and validity analyses were performed to validate the AI chatbot dependence scale. Seventeen items underwent item analysis and EFA, resulting in a single-factor model with eight items explaining 58.42% of the total variance. The CFA indicated that our AI chatbot dependence scale had acceptable model fitting indices, with standardized loadings ranging between 0.50 and 0.76. In addition, this scale exhibited good reliability and validity. Thus, the current AI chatbot dependence scale can effectively evaluate individuals’ dependence on AI chatbots in their daily lives.
Introduction
Artificial intelligence (AI) is a complex concept whose definition has varied across fields and periods.1,2 Its earliest definition was proposed by AI pioneer John McCarthy in 1955 as “the science and engineering of making intelligent machines.”3,4 As of today, AI has been developing for over 50 years.5,6 However, only recently has AI begun to affect the daily lives of the general public. This change is driven by the emergence of easy-to-use applications such as ChatGPT.
ChatGPT is a large language model developed by OpenAI, based on the Generative Pretrained Transformer (GPT) architecture. It has garnered worldwide attention for its ability to handle challenging language understanding and generation tasks in a conversational format.7 Within five days of its launch, ChatGPT attracted 1 million users,8 and to date, it has amassed over 100 million monthly active users.9 Following the success of ChatGPT, many similar programs have been developed. These conversational AI programs, based on large-scale language models and commonly referred to as AI chatbots, can interact with users through text or voice, provide relevant information, answer questions, and even simulate natural and friendly conversation experiences.8,10–12 Currently, AI chatbots are being applied across various sectors such as medicine,11,13 health,11,13 scientific research,14 business,15 social media,15 and education.16 As their popularity continues to grow, researchers are becoming increasingly concerned about potential negative effects, particularly the risk of dependence on AI chatbots.17,18
While traditional technology dependence behaviors—such as dependence on smartphones, social media, and text messaging—have been extensively studied,19–24 AI chatbots differ fundamentally from these technologies. For instance, AI chatbots provide interactive conversational engagement, whereas traditional technology dependence often revolves around content consumption.25,26 These differences suggest that existing measurement tools for traditional technology dependence may not be fully suitable for assessing AI chatbot dependence. However, there is currently no specialized tool to measure dependence on AI chatbots. For this reason, we conducted this study to fill this gap.
Study 1: The Development of AI Chatbot Dependence Scale
Methods
Measurement item generation and refinement
First, the current study defines AI chatbot dependence as the psychological and behavioral reliance individuals develop on AI chatbots in their daily lives.18,27,28 Next, we derived potential items from previous publications, including the scale of intelligent machines dependence,27 AI dependence,29 and other technology dependence scales,19–24 as well as from in-depth interviews. Two rounds of in-depth interviews were conducted with 16 and 15 heavy users, respectively, all of whom self-reported using AI chatbots almost daily. Subsequently, a group of experts and target users assessed the content validity. The group consisted of three active researchers in psychology and three heavy users. Based on their feedback, we modified the items.
Participants and procedure
We conducted a cross-sectional survey from January 20, 2024 to January 25, 2024. The questionnaire was uploaded to the “Sojump” platform.30 Twenty college students were recruited to share a QR code linked to our questionnaire in their online chat groups. The study’s topic was described as investigating AI chatbot use behavior, and details of the research questions were not disclosed during participant recruitment. Informed consent was obtained from all participants. The inclusion and exclusion criteria for participants are detailed in Supplementary Table S1. This study was approved and supervised by the ethics review board of Southwest University.
Data analyses
Item analysis comprised critical ratio analysis, correlation coefficient analysis, reliability analysis, and factor analysis. The criteria for item deletion are detailed in Supplementary Table S2. The exploratory factor analysis (EFA) was conducted using the principal axis method. A Kaiser–Meyer–Olkin (KMO) value above 0.7 and a significant Bartlett’s test of sphericity (p value < 0.05) were taken to indicate that the data were suitable for factor analysis.31,32 The eigenvalue-greater-than-one rule and the cumulative percentage of variance were used to determine the number of common factors. All statistical analyses were conducted using SPSS 26.0 (SPSS Inc., Chicago, IL, USA), and p values < 0.05 were considered statistically significant.33,34
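As an illustration of the suitability check described above, the following minimal Python sketch computes the Bartlett's test of sphericity statistic from a correlation matrix, using the standard formula χ2 = −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom. It is not the SPSS procedure used in this study; the function names and the example matrix are hypothetical, and the p value (which requires a chi-square distribution function) is deliberately left out.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi_square, df) for a p x p correlation matrix and n_obs cases."""
    p = len(corr)
    det = determinant(corr)
    if det <= 0:
        raise ValueError("correlation matrix must be positive definite")
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(det)
    df = p * (p - 1) // 2
    return chi_square, df


# Hypothetical 3-item correlation matrix, not data from this study.
example_corr = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
chi_square, df = bartlett_sphericity(example_corr, n_obs=233)
```

A large statistic relative to its degrees of freedom means the correlation matrix differs from an identity matrix, which is the precondition for extracting common factors.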
Results
Measurement item generation and refinement
Twenty-nine items were obtained from previous publications and in-depth interviews. Subsequently, we made modifications, mergers, and deletions to these items based on the feedback from the expert and target group. Eventually, we obtained an initial set of 17 items for further item analysis and EFA. The refinement process is detailed in Supplementary Table S3.
Participants’ characteristics
A total of 233 participants were included, 55.8% of whom were male. The majority of participants were young people. Most participants reported having a bachelor’s degree or higher (Table 1).
Table 1. The Participants’ Characteristics
Item analysis and EFA
No item was excluded on the basis of the critical ratio analysis, correlation coefficient analysis, or reliability analysis. However, in the factor analysis, eight items (items 1, 3, 4, 5, 6, 8, 9, and 15) were excluded because their common factor variance (communality) was less than 0.4. In addition, one item (item 13) was excluded due to a factor loading over 1, possibly indicating multicollinearity issues. Eight of the 17 items were therefore retained.
We conducted EFA on the eight retained items. The analysis yielded a KMO value of 0.93 and a significant Bartlett’s test of sphericity (χ2 = 886, p < 0.05), indicating that the data were suitable for EFA. Only one factor had an eigenvalue greater than 1, explaining 58.42% of the total variance. Thus, the single-factor model with eight items was retained for further study (Table 2).
Table 2. The Results of Exploratory Factor Analysis
Study 2: The Validation of AI Chatbot Dependence Scale
Methods
Participants and procedure
We conducted a cross-sectional survey from February 8, 2024 to February 13, 2024. The procedures for participant recruitment and inclusion were identical to those in Study 1.
Instruments and measurements
The single-factor model comprising eight items obtained from Study 1 was further validated in an independent sample. Considering that overuse of technology is often one of the manifestations of dependence, we examined the association between the total score of the final scale and AI chatbot usage behavior. The assessment of AI chatbot usage was based on frequency, with participants answering the following question: “In the last month, how often did you use AI chatbot tools each week?” Demographic information, including gender, age, and education level, was also surveyed.
Data analyses
We conducted a confirmatory factor analysis (CFA) on the eight items, excluding those with standardized loadings below 0.5 or above 0.95. We utilized χ2/degrees of freedom (df), goodness of fit index (GFI), root mean squared error of approximation (RMSEA), comparative fit index (CFI), and normed fit index (NFI) to evaluate model fit. The goodness of fit was assessed using the following criteria: χ2/df <3.00; GFI >0.90; RMSEA <0.08; CFI >0.90; and NFI >0.90.35–37 In addition, a two-dimensional CFA model was developed as an alternative to the initial model. The Akaike information criterion (AIC) was used to assess and compare the fit of the competing model against the initial one. 38
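The cutoff criteria listed above can be expressed as a small check. The Python sketch below is illustrative only: the dictionary keys, the function names, and the example values are hypothetical, and the AIC helper simply applies the usual convention that the model with the lower AIC is preferred.

```python
# Cutoffs quoted in the text; the key names are illustrative.
CUTOFFS = {
    "chi2_df": ("<", 3.00),
    "gfi": (">", 0.90),
    "rmsea": ("<", 0.08),
    "cfi": (">", 0.90),
    "nfi": (">", 0.90),
}


def check_fit(indices):
    """Return a dict mapping each index name to True if it meets its cutoff."""
    results = {}
    for name, (direction, cutoff) in CUTOFFS.items():
        value = indices[name]
        if direction == "<":
            results[name] = value < cutoff
        else:
            results[name] = value > cutoff
    return results


def prefer_by_aic(aic_initial, aic_alternative):
    """Return which model has the lower (better) AIC; ties keep the initial model."""
    return "initial" if aic_initial <= aic_alternative else "alternative"


# Hypothetical fit indices for demonstration, not results from this study.
example = {"chi2_df": 2.40, "gfi": 0.95, "rmsea": 0.06, "cfi": 0.96, "nfi": 0.94}
fit_report = check_fit(example)
all_met = all(fit_report.values())
```

The comparisons are strict, so a value that sits exactly on a cutoff is reported as not meeting it; whether borderline values count as acceptable is a judgment the analyst still has to make.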
The scale’s reliability was assessed using Cronbach’s alpha and Spearman–Brown coefficients, and its validity was evaluated through convergent and construct validity. The criteria for evaluating reliability and validity are detailed in Supplementary Table S4. Spearman’s rank correlation was used to examine the relationship between the total score of the final scale and AI chatbot usage.39 Multiple-group CFA was used to assess measurement invariance across gender.
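For readers less familiar with the two reliability coefficients named above, the following Python sketch shows how they are commonly computed from a respondents-by-items score matrix. It is a minimal illustration rather than the SPSS procedure used in this study. The odd/even item split for the Spearman–Brown coefficient is an assumption (the text does not state how the halves were formed), and the example responses are invented.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_brown_split_half(rows):
    """Split-half reliability with the Spearman-Brown correction.

    Assumption: halves are formed from odd- and even-positioned items.
    """
    odd_half = [sum(row[0::2]) for row in rows]
    even_half = [sum(row[1::2]) for row in rows]
    r = pearson(odd_half, even_half)
    return 2 * r / (1 + r)


# Invented responses (5 respondents x 4 items), not data from this study.
example_rows = [
    [5, 4, 5, 4],
    [3, 3, 2, 3],
    [6, 7, 6, 6],
    [2, 1, 2, 2],
    [4, 5, 4, 5],
]
alpha = cronbach_alpha(example_rows)
split_half = spearman_brown_split_half(example_rows)
```

Both coefficients approach 1 as the items covary more strongly, so they are expected to move together on a unidimensional scale.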
All statistical analyses were conducted using SPSS 26.0 and AMOS 23.0 software (SPSS Inc., Chicago, IL, USA), and p values < 0.05 were considered statistically significant.33,34
Results
Participants’ characteristics
A total of 269 participants were included in the study, with 63.2% being male (Table 3). As in Study 1, the majority of participants were young adults and held a bachelor’s degree or higher.
Table 3. The Participants’ Characteristics
AI, artificial intelligence.
Confirmatory factor analysis
The single-factor model obtained in Study 1 adequately fitted the data (χ2/df = 2.97; GFI = 0.95; RMSEA = 0.08; CFI = 0.96; NFI = 0.94; AIC = 91.58). In addition, no items were excluded due to inappropriate standardized loadings (Fig. 1). Furthermore, the two-dimensional CFA model did not outperform the initial model, so the single-factor model was retained. The details of the alternative model are shown in Supplementary Figure S1.

Figure 1. The standardized loadings of the single-factor model.
Reliability and validity analyses
Our single-factor model demonstrated good reliability (Cronbach’s α coefficient = 0.88 and Spearman–Brown coefficient = 0.86). In addition, it showed strong convergent validity (factor loadings >0.50, average variance extracted = 0.50, and composite reliability = 0.79) and structural validity (indicated by the acceptable model fitting indices). The final version of the AI chatbot dependence scale is presented in Table 4.
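The two convergent validity statistics reported above are conventionally derived from the standardized loadings of a single-factor model. The Python sketch below uses the standard formulas, assuming uncorrelated errors: AVE is the mean of the squared loadings, and composite reliability is (Σλ)² / ((Σλ)² + Σ(1 − λ²)). The loadings in the example are hypothetical placeholders, not the loadings estimated in this study, and the function names are illustrative.

```python
def average_variance_extracted(loadings):
    """AVE: mean of squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR from standardized loadings, assuming uncorrelated error terms."""
    sum_loadings_sq = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return sum_loadings_sq / (sum_loadings_sq + error_variance)


# Hypothetical standardized loadings for an eight-item scale.
example_loadings = [0.55, 0.60, 0.65, 0.70, 0.72, 0.68, 0.74, 0.62]
ave = average_variance_extracted(example_loadings)
cr = composite_reliability(example_loadings)
```

Because both statistics depend only on the loadings, they can be recomputed from any published loading table as a consistency check.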
Table 4. The Final Version of the Artificial Intelligence Chatbot Dependence Scale
Responses were collected on a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree). The original scale is in Chinese and was translated into English using a forward-and-back translation method. Note that the translated version has not been validated.
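A minimal scoring sketch in Python follows. It assumes that the total score is the unweighted sum of the eight item responses and that no item is reverse-scored; neither point is stated explicitly in the text, so both should be treated as assumptions. Under these assumptions, totals range from 8 to 56. The function name and constants are illustrative.

```python
N_ITEMS = 8          # items in the final scale
MIN_RESPONSE = 1     # 1 = strongly disagree
MAX_RESPONSE = 7     # 7 = strongly agree


def total_score(responses):
    """Sum eight 7-point Likert responses (assumed unweighted, no reverse scoring)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for response in responses:
        if not isinstance(response, int) or not MIN_RESPONSE <= response <= MAX_RESPONSE:
            raise ValueError(f"invalid response: {response!r}")
    return sum(responses)


# Invented response pattern for demonstration.
example_total = total_score([4, 5, 3, 6, 4, 5, 4, 4])
```

Rejecting incomplete or out-of-range response sets, rather than imputing them silently, keeps totals comparable across respondents.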
Measurement invariance and correlation analysis
Our multiple-group CFA results indicate that the model fails to demonstrate measurement equivalence across gender groups (p < 0.05). There was a positive correlation between the AI chatbot dependence score and AI chatbot usage, although the correlation was relatively weak (correlation coefficient = 0.27; p < 0.01).
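The rank correlation used here can be sketched in a few lines of Python. The version below assigns average ranks to tied values (ties are likely for a usage-frequency question with few response options) and then computes a Pearson correlation on the ranks. It is an illustration rather than the SPSS routine used in this study; the example data are invented, and significance testing is omitted.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined (division by zero) if either input is constant


# Invented scale totals and weekly usage categories, not data from this study.
example_scores = [22, 35, 41, 30, 48, 27]
example_usage = [1, 2, 3, 2, 4, 2]
rho = spearman_rho(example_scores, example_usage)
```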
Discussion
In past psychological research, dependence measurement tools have been widely developed and used.40,41 However, studies that focused specifically on AI chatbot dependence remain limited. One study developed a three-item scale to measure dependence on intelligent machines, but it was primarily designed for work-related scenarios, which limits its applicability for evaluating general AI chatbot dependence.27 Another study adapted a smartphone addiction scale to assess AI dependence, showing good reliability.29 However, the inherent differences between smartphones and AI chatbots suggest that such adaptations may not fully capture the unique aspects of AI chatbot dependence, potentially leading to biased results. To address this gap and enhance the accuracy of future research on AI chatbot dependence behaviors, we developed and validated an eight-item AI chatbot dependence scale, which provides a practical and reliable tool for future studies.
Due to the lack of established and validated AI chatbot dependence scales, we were unable to conduct a rigorous concurrent validity test. As an alternative, we examined the relationship between AI chatbot usage and dependence, because overuse of technology is often one of the manifestations of dependence.25 Our findings demonstrated a positive but modest correlation between AI chatbot dependence and usage frequency (r = 0.27, p < 0.01). This result supports the notion that AI chatbot dependence may be reflected, to some extent, in increased usage frequency.
Our scale did not demonstrate measurement equivalence between genders, indicating that it may function differently for each group, making direct comparisons problematic. However, the smaller female sample size in Study 2 (36.8%) could have reduced the power to detect measurement invariance. Moreover, we did not observe significant score differences between genders (male: 35.28 ± 9.25; female: 35.33 ± 10.43; Mann-Whitney U test, p = 0.84). We suggest that future research recruit more balanced samples to properly validate measurement equivalence.
Some limitations should be noted when interpreting the findings of the current study. First, due to research limitations, the current validation of the scale is not fully comprehensive. We recommend that future studies further validate the scale by conducting tests such as concurrent validity and predictive validity. Second, our scale was originally developed in Chinese, which may introduce cultural bias. In addition, the English translation provided in this study has not yet been validated. Therefore, its reliability and accuracy need to be confirmed before using the English version. Finally, the majority of participants in both studies were young individuals with a high level of education, a demographic more likely to be exposed to and adapt to new technologies. Future research should evaluate the effectiveness of this scale among older adults and individuals with lower levels of education to ensure its broader applicability.
Conclusions
To support future research on AI chatbot dependence, we developed an eight-item unidimensional scale that demonstrates strong psychometric properties. Its concise format facilitates easy incorporation into most studies focused on assessing AI chatbot dependence. To advance research in this area, we recommend that future studies evaluate the scale’s predictive validity and test its stability over time.
Footnotes
Authors’ Contributions
X.Z.: Conceptualization, formal analysis, investigation, methodology, writing—original draft, and writing—review and editing. M.Y.: Investigation and writing—review and editing. M.Z.: Investigation and writing—review and editing. Z.L.: Writing—review and editing. H.L.: Investigation, supervision, writing—original draft, and writing—review and editing.
Author Disclosure Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Funding Information
This research received no external funding.
References
Supplementary Material
