The Development and Validation of an Artificial Intelligence Chatbot Dependence Scale

Abstract

In recent years, a plethora of artificial intelligence (AI) chatbots have been developed and made available to the public. Consequently, an increasing number of individuals are integrating AI chatbots into their daily lives for various purposes. This trend has also raised concerns regarding AI chatbot dependence. However, a valid and reliable scale to assess AI chatbot dependence is yet to be developed. Therefore, this study was designed to develop and validate an AI chatbot dependence scale. We obtained initial items from previous publications and in-depth interviews. Subsequently, item analysis, exploratory factor analysis (EFA), confirmatory factor analysis (CFA), reliability, and validity analyses were performed to validate the AI chatbot dependence scale. Seventeen items underwent item analysis and EFA, resulting in a single-factor model with eight items explaining 58.42% of the total variance. The CFA indicated that our AI chatbot dependence scale had acceptable model fitting indices, with standardized loadings ranging between 0.50 and 0.76. In addition, this scale exhibited good reliability and validity. Thus, the current AI chatbot dependence scale can effectively evaluate individuals’ dependence on AI chatbots in their daily lives.

Introduction

Artificial intelligence (AI) is a complex concept whose meaning has evolved across fields and periods.^1,2 Its earliest definition was proposed by AI pioneer John McCarthy in 1955 as “the science and engineering of making intelligent machines.”^3,4 As of today, AI has been developing for over 50 years.^5,6 However, only recently has AI begun to affect the general public, a change driven by the emergence of easy-to-use applications such as ChatGPT.

ChatGPT is a large language model developed by OpenAI, based on the Generative Pretrained Transformer (GPT) architecture. It has garnered worldwide attention for its ability to handle challenging language understanding and generation tasks in a conversational format.⁷ Within five days of its launch, ChatGPT attracted 1 million users,⁸ and to date, it has amassed over 100 million monthly active users.⁹ Following the success of ChatGPT, many similar programs have been developed. These conversational AI programs, based on large-scale language models and commonly referred to as AI chatbots, can interact with users through text or voice, provide relevant information, answer questions, and even simulate natural and friendly conversation experiences.^8,10–12 Currently, AI chatbots are being applied across various sectors such as medicine,^11,13 health,^11,13 scientific research,¹⁴ business,¹⁵ social media,¹⁵ and education.¹⁶ As their popularity continues to grow, researchers are becoming increasingly concerned about potential negative effects, particularly the risk of dependence on AI chatbots.^17,18

While traditional technology dependence behaviors—such as dependence on smartphones, social media, and text messaging—have been extensively studied,^19–24 AI chatbots differ fundamentally from these technologies. For instance, AI chatbots provide interactive conversational engagement, whereas traditional technology dependence often revolves around content consumption.^25,26 These differences suggest that existing measurement tools for traditional technology dependence may not be fully suitable for assessing AI chatbot dependence. However, there is currently no specialized tool to measure dependence on AI chatbots. For this reason, we conducted this study to fill this gap.

Study 1: The Development of AI Chatbot Dependence Scale

Methods

Measurement item generation and refinement

First, the current study defines AI chatbot dependence as the psychological and behavioral reliance individuals develop on AI chatbots in their daily lives.^18,27,28 Next, we derived potential items from previous publications, including the scale of intelligent machines dependence,²⁷ AI dependence,²⁹ and other technology dependence scales,^19–24 as well as from in-depth interviews. Two rounds of in-depth interviews were conducted with heavy users who self-reported using AI chatbots almost daily: 16 in the first round and 15 in the second. Subsequently, an expert and target group, consisting of three active researchers in psychology and three heavy users, assessed content validity. Based on their feedback, we modified our items.

Participants and procedure

We conducted a cross-sectional survey from January 20, 2024 to January 25, 2024. The questionnaire was uploaded to the “Sojump” platform.³⁰ Twenty college students were recruited to share a QR code linked to our questionnaire in their various online chat groups. The study’s topic was described as investigating AI chatbot use behavior, and details of the research questions were not disclosed during participant recruitment. Informed consent was obtained from all participants. The inclusion and exclusion criteria for participants are detailed in Supplementary Table S1. This study was approved and supervised by the ethics review board of Southwest University.

Data analyses

Item analysis comprised critical ratio analysis, correlation coefficient analysis, reliability analysis, and factor analysis. The criteria for item deletion are detailed in Supplementary Table S2. The exploratory factor analysis (EFA) was conducted using the principal axis method. A Kaiser–Meyer–Olkin (KMO) value above 0.7 and a significant Bartlett’s test of sphericity (p < 0.05) were taken to indicate that the data were suitable for factor analysis.^31,32 The eigenvalue-greater-than-one rule and the cumulative percentage of variance were used to determine the number of common factors. All statistical analyses were conducted using SPSS 26.0 (SPSS Inc., Chicago, IL, USA), and p values < 0.05 were considered statistically significant.^33,34
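As a rough illustration of the suitability check described above, the following Python sketch computes Bartlett's test statistic from a correlation matrix using the standard formula χ² = −(n − 1 − (2p + 5)/6) · ln|R|, with p(p − 1)/2 degrees of freedom. It is not the SPSS procedure used in the study; the function names and the toy matrix are hypothetical.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest absolute value in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi_square, degrees_of_freedom) for Bartlett's test."""
    p = len(corr)
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df


# Hypothetical 3-item correlation matrix, for illustration only.
toy_corr = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
chi_square, df = bartlett_sphericity(toy_corr, n_obs=233)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

The p value would then be read from a chi-square distribution with `df` degrees of freedom, for example through a statistics library. A statistic that is large relative to `df` gives a small p value, which is the outcome that supports factor analysis.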

Results

Measurement item generation and refinement

Twenty-nine items were obtained from previous publications and in-depth interviews. Subsequently, we made modifications, mergers, and deletions to these items based on the feedback from the expert and target group. Eventually, we obtained an initial set of 17 items for further item analysis and EFA. The refinement process is detailed in Supplementary Table S3.

Participants’ characteristics

A total of 233 participants were included, with 55.8% being male. The majority of participants were young people. Most participants reported having a bachelor’s degree or higher (Table 1).

Table 1.

The Participants’ Characteristics

Variable	Category	n	Percentage
Gender	Male	130	55.8%
Gender	Female	103	44.2%
Age (year)	18∼25	179	76.8%
	26∼30	42	18.0%
	>30	12	5.2%
Educational background	Primary school	0	0%
	Junior high school	0	0%
	High school	4	1.7%
	Junior college	16	6.9%
	Undergraduate	164	70.4%
	Master	39	16.7%
	Doctoral	10	4.3%

Item analysis and EFA

No item was excluded on the basis of the critical ratio analysis, correlation coefficient analysis, or reliability analysis. However, in the factor analysis, eight items (items 1, 3, 4, 5, 6, 8, 9, and 15) were excluded because their communalities were below 0.4. In addition, one item (item 13) was excluded due to a factor loading over 1, possibly indicating multicollinearity issues.

We conducted EFA on the remaining eight items. The analysis revealed a KMO value of 0.93 and a significant Bartlett’s test of sphericity (χ² = 886, p < 0.05), indicating that the sample was suitable for EFA. Only one factor had an eigenvalue greater than 1, explaining 58.42% of the total variance. Thus, the single-factor model with eight items was used in the subsequent study (Table 2).

Table 2.

The Results of Exploratory Factor Analysis

Item	Factor loading	Communality
2	0.64	0.41
7	0.79	0.62
10	0.70	0.49
11	0.65	0.42
12	0.74	0.55
14	0.76	0.57
16	0.72	0.52
17	0.79	0.62
Eigenvalue	4.67
Percentage of variance	58.42%
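
The percentage of variance in Table 2 follows from the eigenvalue by simple arithmetic: with standardized items, each item contributes one unit of variance, so a factor's share is its eigenvalue divided by the number of items. The Python sketch below reproduces this from the rounded values in Table 2. As an assumption about how the table was built, it also checks that each communality is close to the squared loading, as expected for a single-factor solution. Variable names are illustrative.

```python
# Values copied from Table 2 (rounded to two decimals in the source).
loadings = {2: 0.64, 7: 0.79, 10: 0.70, 11: 0.65,
            12: 0.74, 14: 0.76, 16: 0.72, 17: 0.79}
communalities = {2: 0.41, 7: 0.62, 10: 0.49, 11: 0.42,
                 12: 0.55, 14: 0.57, 16: 0.52, 17: 0.62}
eigenvalue = 4.67

# Share of total variance = eigenvalue / number of standardized items.
percent_variance = 100 * eigenvalue / len(loadings)
print(f"{percent_variance:.1f}% of total variance")

# With one factor, communality is the squared loading (up to rounding).
for item, loading in loadings.items():
    gap = abs(loading ** 2 - communalities[item])
    print(f"item {item}: loading^2 = {loading ** 2:.2f}, gap = {gap:.3f}")
```

Because the table values are rounded, the computed share agrees with the reported 58.42% only to within rounding error.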

Study 2: The Validation of AI Chatbot Dependence Scale

Methods

Participants and procedure

We conducted a cross-sectional survey from February 8, 2024 to February 13, 2024, to obtain data. The procedures of participant recruitment and inclusion were identical to Study 1.

Instruments and measurements

The single-factor model comprising eight items obtained from Study 1 was further validated in a separate sample. Considering that overuse of technology is often one of the manifestations of dependence, we examined the association between the total score of the final scale and AI chatbot usage behavior. The assessment of AI chatbot usage was based on frequency, with participants answering the following question: “In the last month, how often did you use AI chatbot tools each week?” Demographic information, including gender, age, and education level, was also surveyed.

Data analyses

We conducted a confirmatory factor analysis (CFA) on the eight items, excluding those with standardized loadings below 0.5 or above 0.95. We utilized χ²/degrees of freedom (df), goodness of fit index (GFI), root mean squared error of approximation (RMSEA), comparative fit index (CFI), and normed fit index (NFI) to evaluate model fit. The goodness of fit was assessed using the following criteria: χ²/df <3.00; GFI >0.90; RMSEA <0.08; CFI >0.90; and NFI >0.90.^35–37 In addition, a two-dimensional CFA model was developed as an alternative to the initial model. The Akaike information criterion (AIC) was used to assess and compare the fit of the competing model against the initial one.³⁸
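
The decision rules above can be expressed as a small check. The following Python sketch encodes the stated cut-offs and the "smaller AIC is preferred" comparison; the function names and the example values are hypothetical and are not results from this study.

```python
# Cut-offs as stated in the text: (comparison, threshold).
FIT_CRITERIA = {
    "chi2/df": ("<", 3.00),
    "GFI": (">", 0.90),
    "RMSEA": ("<", 0.08),
    "CFI": (">", 0.90),
    "NFI": (">", 0.90),
}


def check_fit(indices):
    """Map each index name to True if it meets its cut-off."""
    results = {}
    for name, (direction, threshold) in FIT_CRITERIA.items():
        value = indices[name]
        results[name] = value < threshold if direction == "<" else value > threshold
    return results


def prefer_by_aic(aic_a, aic_b):
    """Return 'a' or 'b' for the model with the smaller AIC ('tie' if equal)."""
    if aic_a == aic_b:
        return "tie"
    return "a" if aic_a < aic_b else "b"


# Hypothetical indices for illustration; not results from this study.
example = {"chi2/df": 2.10, "GFI": 0.95, "RMSEA": 0.06, "CFI": 0.97, "NFI": 0.93}
print(check_fit(example))
print(prefer_by_aic(90.0, 120.0))
```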

The scale’s reliability was assessed using Cronbach’s alpha and Spearman–Brown coefficients, and its validity was evaluated through convergent and construct validity. The criteria for evaluating reliability and validity are detailed in Supplementary Table S4. Spearman’s rank correlation was used to examine the relationship between the total score of the final scale and AI chatbot usage.³⁹ Multiple-group CFA was used to assess measurement invariance across gender.
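
For readers less familiar with the two reliability coefficients, here is a minimal Python sketch of how they are commonly computed. The text does not say how the items were split for the Spearman–Brown coefficient, so the odd-even split below is an assumption; the data are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` is a list of per-respondent item lists."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5


def spearman_brown_split_half(responses):
    """Split-half reliability (odd vs even items) with Spearman-Brown step-up."""
    odd = [sum(row[0::2]) for row in responses]
    even = [sum(row[1::2]) for row in responses]
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)


# Hypothetical answers from five respondents to four 7-point items.
toy = [
    [2, 3, 2, 3],
    [5, 5, 6, 5],
    [4, 4, 3, 4],
    [6, 7, 6, 6],
    [3, 2, 3, 3],
]
print(round(cronbach_alpha(toy), 2), round(spearman_brown_split_half(toy), 2))
```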

All statistical analyses were conducted using SPSS 26.0 and AMOS 23.0 software (SPSS Inc., Chicago, IL, USA), and p values < 0.05 were considered statistically significant.^33,34

Results

Participants’ characteristics

A total of 269 participants were included in the study, with 63.2% being male (Table 3). As in Study 1, the majority of participants were young adults and held a bachelor’s degree or higher.

Table 3.

The Participants’ Characteristics

Variable	Category	n	Percentage
Gender	Male	170	63.2%
Gender	Female	99	36.8%
Age (year)	18∼25	201	74.7%
	26∼30	49	18.2%
	>30	19	7.1%
Educational background	Primary school	0	0%
	Junior high school	0	0%
	High school	4	1.5%
	Junior college	9	3.3%
	Undergraduate	209	77.7%
	Master	37	13.8%
	Doctoral	10	3.7%
AI chatbot usage	Never used	4	1.5%
	Average one day a week	67	24.9%
	Average two days a week	29	10.8%
	Average three days a week	53	19.7%
	Average four days a week	55	20.4%
	Average five days a week	37	13.8%
	Average six days a week	8	3.0%
	Average seven days a week	16	5.9%

AI, artificial intelligence.

Confirmatory factor analysis

The single-factor model obtained in Study 1 adequately fitted the data (χ²/df = 2.97; GFI = 0.95; RMSEA = 0.08; CFI = 0.96; NFI = 0.94, and AIC = 91.58). In addition, no items were excluded due to inappropriate standardized loadings (Fig. 1). Furthermore, the two-dimensional CFA model failed to replace the initial model. The details of the alternative model are shown in Supplementary Figure S1.

FIG. 1.

The standardized loadings of the single-factor model.
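
The reported χ²/df and AIC can be cross-checked with a little arithmetic. The Python sketch below assumes a standard single-factor specification without a mean structure (eight loadings and eight residual variances, with the factor variance fixed to 1) and the common definition AIC = χ² + 2q, where q is the number of free parameters. Neither assumption is stated in the text, so this is an illustration rather than a reconstruction of the AMOS output.

```python
n_items = 8

# Unique variances and covariances among the observed items.
n_moments = n_items * (n_items + 1) // 2

# Assumed free parameters: 8 loadings + 8 residual variances.
n_free = 2 * n_items

df = n_moments - n_free

# Back out chi-square from the reported AIC, assuming AIC = chi2 + 2q.
reported_aic = 91.58
chi_square = reported_aic - 2 * n_free
ratio = chi_square / df

print(f"moments = {n_moments}, free parameters = {n_free}, df = {df}")
print(f"implied chi-square = {chi_square:.2f}, chi-square/df = {ratio:.2f}")
```

Under these assumptions the implied ratio is close to the reported χ²/df of 2.97.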

Reliability and validity analyses

Our single-factor model demonstrated good reliability (Cronbach’s α coefficient = 0.88 and Spearman–Brown coefficient = 0.86). In addition, it showed strong convergent validity (factor loadings >0.50, average variance extracted = 0.50, and composite reliability = 0.79) and structural validity (indicated by the acceptable model fitting indices). The final version of the AI chatbot dependence scale is presented in Table 4.
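
Average variance extracted (AVE) and composite reliability (CR) are usually derived from the standardized loadings. The Python sketch below uses the widely cited Fornell–Larcker formulas, which are assumed here because the study's own criteria are given only in Supplementary Table S4. The loadings are hypothetical, since the study's values appear only in Fig. 1.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(x ** 2 for x in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: (sum of loadings)^2 over itself plus the summed error variances."""
    sum_loadings_sq = sum(loadings) ** 2
    error_var = sum(1 - x ** 2 for x in loadings)
    return sum_loadings_sq / (sum_loadings_sq + error_var)


# Hypothetical standardized loadings, for illustration only.
example_loadings = [0.70, 0.65, 0.72, 0.68]
print(round(average_variance_extracted(example_loadings), 2))
print(round(composite_reliability(example_loadings), 2))
```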

Table 4.

The Final Version of the Artificial Intelligence Chatbot Dependence Scale

Items
1. 如果无法使用人工智能聊天机器人, 我会感到焦虑或不适。
1. If unable to use AI chatbots, I would feel anxious or uncomfortable.
2. 在开始工作或任务前, 我必须先启动人工智能聊天机器人。
2. I need to open AI chatbots before starting work or tasks.
3. 如果无法使用人工智能聊天机器人, 我会感到难以获取所需的信息。
3. If unable to use AI chatbots, I would find it difficult to obtain the necessary information.
4. 即使面对那些自己能轻松完成的工作或任务, 我也倾向于让人工智能聊天机器人帮忙。
4. Even when facing tasks or jobs that I could easily complete myself, I tend to seek assistance from AI chatbots.
5. 相对于其他人或物, 我更愿意将时间花费在人工智能聊天机器人上。
5. Compared with other people or things, I prefer to spend time on AI chatbots.
6. 即使不使用人工智能聊天机器人, 我也要保持它在后台的登录/启动状态。
6. Even if not actively using AI chatbots, I keep them logged in or running in the background.
7. 我花费在人工智能聊天机器人上的时间越来越多了。
7. I am spending increasingly more time on AI chatbots.
8. 对我而言, 没有人工智能聊天机器人的生活对是不方便的。
8. For me, life without AI chatbots would be inconvenient.

Responses were collected on a 7-point Likert scale, where 1 = strongly disagree and 7 = strongly agree. The original scale is in Chinese and was translated into English using a back-translation method. Note that the translated version has not been validated.
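
A minimal Python sketch of scoring follows. It assumes the total score is the plain sum of the eight item responses, which is consistent with the "total score" used in the analyses but is not spelled out as a scoring rule in the text; the function name is hypothetical.

```python
N_ITEMS = 8
MIN_RESPONSE, MAX_RESPONSE = 1, 7


def total_score(responses):
    """Sum eight 7-point Likert responses into a total dependence score."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for value in responses:
        if not isinstance(value, int) or not MIN_RESPONSE <= value <= MAX_RESPONSE:
            raise ValueError(f"response {value!r} is outside 1-7")
    return sum(responses)


# Hypothetical respondent: possible totals range from 8 to 56.
print(total_score([4, 5, 3, 6, 2, 4, 5, 5]))
```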

Measurement invariance and correlation analysis

Our multiple-group CFA results indicate that the model fails to demonstrate measurement equivalence across gender groups (p < 0.05). There was a positive correlation between the AI chatbot dependence score and AI chatbot usage, although the correlation was relatively weak (correlation coefficient = 0.27; p < 0.01).
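
Spearman's rank correlation, used for the score-usage association, is the Pearson correlation of the ranked data. The Python sketch below shows one way to compute it with average ranks for ties, which matter here because usage is recorded in only eight categories. The data and function names are hypothetical, and the sketch does not compute a p value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-adjusted ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: total scale scores and days of chatbot use per week.
scores = [20, 35, 41, 28, 50, 33]
days_per_week = [1, 3, 3, 2, 6, 4]
print(round(spearman_rho(scores, days_per_week), 2))
```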

Discussion

In past psychological research, dependence measurement tools have been widely developed and used.^40,41 However, studies that focused specifically on AI chatbot dependence remain limited. One study developed a three-item scale to measure dependence on intelligent machines, but it was primarily designed for work-related scenarios, which limits its applicability for evaluating general AI chatbot dependence.²⁷ Another study adapted a smartphone addiction scale to assess AI dependence, showing good reliability.²⁹ However, the inherent differences between smartphones and AI chatbots suggest that such adaptations may not fully capture the unique aspects of AI chatbot dependence, potentially leading to biased results. To address this gap and enhance the accuracy of future research on AI chatbot dependence behaviors, we developed and validated an eight-item AI chatbot dependence scale, which provides a practical and reliable tool for future studies.

Due to the lack of established and validated AI chatbot scales, we were unable to conduct a rigorous concurrent validity test. As an alternative, we examined the relationship between AI chatbot usage and dependence. This is because overuse of technology is often one of the manifestations of dependence.²⁵ Our findings demonstrated a positive but modest correlation between AI chatbot dependence and usage frequency (r = 0.27, p < 0.01). This result supports the notion that AI chatbot dependence may be reflected, to some extent, in increased usage frequency.

Our scale did not demonstrate measurement equivalence between genders, indicating that it may function differently for each group, making direct comparisons problematic. However, the smaller female sample size in Study 2 (36.8%) could have reduced the power to detect measurement invariance. Moreover, we did not observe significant score differences between genders (male: 35.28 ± 9.25; female: 35.33 ± 10.43; Mann-Whitney U test, p = 0.84). We suggest that future research recruit more balanced samples to properly validate measurement equivalence.

Some limitations should be noted when interpreting the findings of the current study. First, due to research limitations, the current validation of the scale is not fully comprehensive. We recommend that future studies further validate the scale by conducting tests such as concurrent validity and predictive validity. Second, our scale was originally developed in Chinese, which may introduce cultural bias. In addition, the English translation provided in this study has not yet been validated. Therefore, its reliability and accuracy need to be confirmed before using the English version. Finally, the majority of participants in both studies were young individuals with a high level of education, a demographic more likely to be exposed to and adapt to new technologies. Future research should evaluate the effectiveness of this scale among older adults and individuals with lower levels of education to ensure its broader applicability.

Conclusions

To support future research on AI chatbot dependence, we developed an eight-item unidimensional scale that demonstrates strong psychometric properties. Its concise format facilitates easy incorporation into most studies focused on assessing AI chatbot dependence. To advance research in this area, we recommend that future studies evaluate the scale’s predictive validity and test its stability over time.

Footnotes

Authors’ Contributions

X.Z.: Conceptualization, formal analysis, investigation, methodology, writing—original draft, and writing—review and editing. M.Y.: Investigation and writing—review and editing. M.Z.: Investigation and writing—review and editing. Z.L.: Writing—review and editing. H.L.: Investigation, supervision, writing—original draft, and writing—review and editing.

Author Disclosure Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Funding Information

This research received no external funding.

Supplementary Material

References

1. Saghiri, Vahidipour, Jabbarpour, et al. A survey of artificial intelligence challenges: Analyzing the definitions, relationships, and evolutions. Appl Sci, 2022; 12(8):4054.

2. Kok, Boers, Kosters, et al. Artificial intelligence: Definition, trends, techniques, and cases. Artificial Intelligence, 2009; 1(270–299):51.

3. Mintz, Brodie. Introduction to artificial intelligence in medicine. Minim Invasive Ther Allied Technol, 2019; 28(2):73–81.

4. Grewal. A critical conceptual analysis of definitions of artificial intelligence as applicable to computer engineering. Iosrjce, 2014; 16(2):9–13.

5. Flasiński, Flasiński. History of artificial intelligence. In Introduction to Artificial Intelligence. Springer: Cham, 2016, pp. 3–13.

6. Haenlein, Kaplan. A brief history of artificial intelligence: On the past, present, and future of artificial intelligence. California Management Review, 2019; 61(4):5–14.

7. Wu, He, Liu, et al. A brief overview of ChatGPT: The history, status quo and potential future development. IEEE/CAA J Autom Sinica, 2023; 10(5):1122–1136.

8. Kim. Application of artificial intelligence chatbot, including ChatGPT in education, scholarly work, programming, and content generation and its prospects: A narrative review. J Educ Eval Health Prof, 2023; 20:38.

9. Duarte. Number of ChatGPT Users (Apr 2024). 2024. Available from: https://explodingtopics.com/blog/chatgpt-users

10. Au Yeung, Kraljevic, Luintel, et al. AI chatbots not yet ready for clinical use. Front Digit Health, 2023; 5:1161098; doi: 10.3389/fdgth.2023.1161098

11. Lee, Bubeck, Petro. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N Engl J Med, 2023; 388(13):1233–1239; doi: 10.1056/NEJMsr2214184

12. Pan, Musheyev, Bockelman, et al. Assessment of artificial intelligence chatbot responses to top searched queries about cancer. JAMA Oncol, 2023; 9(10):1437–1440; doi: 10.1001/jamaoncol.2023.2947

13. Lee. AI-based healthcare chatbot. Int Res J Eng Technol, 2023; 10:563–568.

14. Fatani. ChatGPT for future medical and dental research. Cureus, 2023; 15(4):e37285.

15. Asha, Reddy, Nithya, et al. Implication and Advantages of Machine Learning-Based Chatbots in Diverse Disciplines. IEEE, 2023.

16. Chang, Lin MP-C, Hajian, et al. Educational design principles of using AI Chatbot that supports self-regulated learning in education: Goal setting, feedback, and personalization. Sustainability, 2023; 15(17):12921.

17. Hassan, Abdelfattah, Al Halbusi, et al. Me and My AI Bot: Exploring the ‘AIholic’ Phenomenon and University Students’ Dependency on Generative AI Chatbots - Is This the New Academic Addiction? Research Square, 2023.

18. Farhi, Jeljeli, Aburezeq, et al. Analyzing the students’ views, concerns, and perceived ethics about chat GPT usage. Comput Educ Artificial Intell, 2023; 5:100180.

19. Igarashi, Motoyoshi, Takai, et al. No mobile, no life: Self-perception and text-message dependency among Japanese high school students. Comput Hum Behav, 2008; 24(5):2311–2324; doi: 10.1016/j.chb.2007.12.001

20. Jenkins-Guarnieri, Wright, Johnson. Development and validation of a social media use integration scale. Psychology of Popular Media Culture, 2013; 2(1):38–50.

21. Elphinston, Noller. Time to face it! Facebook intrusion and the implications for romantic jealousy and relationship satisfaction. Cyberpsychol Behav Soc Netw, 2011; 14(11):631–635.

22. van den Eijnden RJJM, Lemmens, Valkenburg. The Social Media Disorder Scale. Comput Hum Behav, 2016; 61:478–487; doi: 10.1016/j.chb.2016.03.038

23. Andreassen, Torsheim, Brunborg, et al. Development of a Facebook addiction scale. Psychol Rep, 2012; 110(2):501–517.

24. Trub, Barbot. The paradox of phone attachment: Development and validation of the Young Adult Attachment to Phone Scale (YAPS). Comput Hum Behav, 2016; 64:663–672.

25. De-Sola Gutiérrez, Rodríguez de Fonseca, Rubio. Cell-phone addiction: A review. Front Psychiatry, 2016; 7:175.

26. Elhai, Hall, Levine, et al. Types of smartphone usage and relations with problematic smartphone behaviors: The role of content consumption vs. social smartphone use. CP, 2017; 11(2).

27. Tang, Koopman, Yam, et al. The self-regulatory consequences of dependence on intelligent machines at work: Evidence from field and experimental studies. Human Resource Management, 2023; 62(5):721–744; doi: 10.1002/hrm.22154

28. Haman, Školník. Behind the ChatGPT hype: Are its suggestions contributing to addiction? Ann Biomed Eng, 2023; 51(6):1128–1129.

29. Huang, Lai, Ke, et al. AI Technology panic—is AI dependence bad for mental health? A cross-lagged panel model and the mediating roles of motivations for AI use among adolescents. Psychol Res Behav Manag, 2024; 17:1087–1102.

30. Shen. Introduction of social media to aid active-learning in medical teaching. Interactive Learn Environ, 2022; 30(10):1932–1939.

31. Shrestha. Factor analysis as a tool for survey analysis. AJAMS, 2021; 9(1):4–11.

32. Shkeer, Awang. Exploring the items for measuring the marketing information system construct: An exploratory factor analysis. IRMM, 2019; 9(6):87–97.

33. Zhao, Hu, Feng, et al. Association of cycling with risk of all-cause and cardiovascular disease mortality: A systematic review and dose–response meta-analysis of prospective cohort studies. Sports Med, 2021; 51(7):1439–1448.

34. Jayedi, Gohari, Shab-Bidar. Daily step count and all-cause mortality: A dose–response meta-analysis of prospective cohort studies. Sports Med, 2022; 52(1):89–99.

35. Schermelleh-Engel, Moosbrugger, Müller. Evaluating the fit of structural equation models: Tests of significance and descriptive goodness-of-fit measures. Methods Psychol Res, 2003; 8(2):23–74.

36. Hu L-T, Bentler. Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 1998; 3(4):424–453.

37. Zhang, Feng, Peng, et al. Using structural equation modeling to examine pathways between physical activity and sleep quality among Chinese TikTok users. Int J Environ Res Public Health, 2022; 19(9):5142.

38. Zhang, Browning, Luo, et al. Can sports cartoon watching in childhood promote adult physical activity and mental health? A pathway analysis in Chinese adults. Heliyon, 2022; 8(5):e09417.

39. Schober, Boer, Schwarte. Correlation coefficients: Appropriate use and interpretation. Anesth Analg, 2018; 126(5):1763–1768.

40. Miele, Carpenter, Cockerham, et al. Substance Dependence Severity Scale (SDSS): Reliability and validity of a clinician-administered interview for DSM-IV substance use disorders. Drug Alcohol Depend, 2000; 59(1):63–75.

41. Hausenblas, Downs. How much is too much? The development and validation of the exercise dependence scale. Psychology and Health, 2002; 17(4):387–404.
