Abstract
Electronic commerce (EC) has become the most critical business activity in the world, and China has become the world’s largest EC market. Over the past three decades, numerous studies have examined the state of monolingual EC research in specific scenarios. However, the paradigm shift in EC development has not yet been examined through the dynamic evolution of semantic information, and the distinctions and connections between multilingual EC studies have not yet been established. This study analyzed 16,207 English and 17,850 Chinese EC-related articles from the Web of Science database and CNKI by combining the BERTopic topic model with SBERT sentence embedding-based similarity computations. The results reveal the distributions of global and local topics in the English and Chinese EC literature, the semantic intricacies of topic convergence and evolution over time, and the distinctions and connections between English and Chinese topics. Finally, the evolutionary patterns and life cycles of three crucial English and Chinese topics are explored, including their emergence, development, maturity, and decline. Overall, this study provides a comprehensive overview of EC studies from a topic perspective.
Introduction
Electronic commerce (EC) is the commercial activity of purchasing, selling, shipping, and trading goods, services, or data over the Internet or other networks [1]. EC is the basic medium for the majority of consumer and customer-related activities in today’s market. Businesses have made EC a priority due to the increasing user demand for online services and their capacity to provide a competitive advantage [2].
The significance of EC research is expanding, as evidenced by the continued growth of research interest and scholarly production. Recent studies summarize the level of development of particular dimensions of EC. For example, Hazarika and Mousavi [3] presented the main themes in the context of cross-border EC (CBEC) and provided a framework to explain the main factors contributing to CBEC. Li et al. [4] offered recommendations for potential future uses of EC credit, as well as an analytical exploration of how credit risk develops in the sector from an economic perspective. Through a bibliometric analysis of 229 publications in top information systems journals, Bawack et al. [5] examined the present state of research on artificial intelligence in EC.
China’s EC began relatively late, in 1997. Since the start of the 21st century, however, China’s Internet economy has advanced significantly in terms of technology, business models, applications and cross-border integration [6]. Since 2013, China has been the largest EC market in the world and the most attractive market for investment [7]. According to market research firm Nielsen [8], EC is the most effective channel for entering China.
Academic advancement is intrinsically tied to economic growth, and research on EC in China is expanding rapidly. Numerous reviews of EC research in China have been published in international journals. For instance, Giuffrida et al. [9] conducted a literature study on cross-border EC logistics in China and discussed the role that Greater China plays in enabling cross-border EC logistics. On the basis of village-level survey data and the Heckit approach, Chao et al. [6] provided empirical evidence of the influence of EC on poverty reduction in rural China. Fang and Fang [10] subdivided the study period into four five-year phases based on China’s five-year plans for national economic and social development, and used a co-word network-based technique to identify topic clusters for Chinese EC research in each phase.
Existing research has greatly enriched the body of knowledge on EC, providing useful insights into EC-related studies from a macro perspective. However, the majority of existing studies focus on a limited number of scenarios and have yet to demonstrate a paradigm shift in EC development through a dynamic analysis of semantic information. Moreover, past research has centered on monolingual publications, and the distinctions and connections between multilingual EC investigations are still unknown. Publications in English represent the international level of EC research, whereas monolingual studies emphasize country-specific research priorities. Therefore, this work concentrates on answering the following research questions (RQs): RQ1: What are the primary topics discussed in English and Chinese EC research, and what topics are discussed at different stages? RQ2: What are the evolutionary patterns of each of the identified English and Chinese EC topics? RQ3: What are the differences and connections between the identified English and Chinese topics, both overall and at different time stages? RQ4: What are the developmental dynamics and life cycles of crucial global topics?
Overall, in this study, we provide a framework that combines the BERTopic topic model and SBERT sentence embedding-based similarity computation to reveal the distributions of global and local topics in the English and Chinese EC literature, the semantic intricacies of topic convergence and evolution over time, and the distinctions and connections between English and Chinese topics. Specifically, we employed the BERTopic topic model to extract the global topics from the overall English and Chinese corpora, respectively, as well as the local topics in each time slice. Then, the similarity between topics was calculated based on sentence embeddings, which reflects the patterns of monolingual topic evolution and the connections between the English and Chinese topics. Finally, by using the strength of correlations between global topics and local topics across time slices, we obtained the specific development dynamics and life cycles of crucial global EC topics.
The main contributions of this study are summarized as follows. First, this study contributes to the EC literature by identifying the evolution of research topics in English and Chinese EC literature, as well as highlighting inter-thematic distinctions and connections. It enhances our understanding of the trends in EC research topics and provides valuable insights into the future directions of the field. Additionally, the comparative analysis of English and Chinese literature aids policymakers and practitioners in developing targeted localization strategies when implementing scientific and technical actions.
Second, this study introduces a novel and effective framework for conducting multilingual comparative research. Unlike traditional topic modeling approaches, BERTopic has recently emerged as the most promising model for topic extraction, offering more reliable results in this study. Moreover, in previous multilingual studies, researchers relied on BERT-based sentence embeddings [11]. However, BERT-based sentence embeddings have been shown to capture sentence-level semantics poorly, leading to significant differences between the vectors of supposedly similar sentences. In contrast, the SBERT technique employed in this study captures the semantic similarity of text better than BERT.
The remainder of the paper is structured as follows. The relevant work is outlined in Section 2. The data and methodology are described in Section 3. In Section 4, the topic evolution in the English and Chinese literature is discussed, as well as the connections between multilingual global topics. In addition, the entire life cycle of three crucial global topics is explored. In Section 5, we summarize and discuss the research results, limitations and future research.
Related work
Topic modeling for topic evolution
Topic modeling is a task that uses unsupervised learning approaches, with a high degree of flexibility and independence from domain specialists, to analyze the knowledge structure of a particular field [12]. It is a reliable method for illustrating the general landscape of large-scale textual data and identifying the hidden topics in a batch of documents [13]. Indeed, a significant amount of research has used topic modeling techniques to show the knowledge structure of a domain, including fuzzy research [14], information literacy research [4], 3-D printing technology [15], human resource management [16], and artificial intelligence [17].
One of the most commonly used techniques in work that provides an overview of a particular research area based on topic models is Latent Dirichlet allocation (LDA) [18]. LDA is a bag-of-words-based model that describes each document as a distribution over topics, with each topic described as a probability distribution over words. However, bag-of-words approaches extract topics based solely on document-word co-occurrence frequencies and disregard the contextual semantics of each word. In the era of deep learning, traditional topic modeling techniques that ignore semantic information have been superseded by approaches based on pre-trained word embeddings. Typically, these algorithms use a pre-trained model to compute vector representations of the documents and of the words, and embed them in the same semantic space. After density-based clustering in the semantic space, topic words are extracted from each cluster. Approaches based on pre-trained word embeddings therefore pay more attention to the semantic information of the text than conventional topic models do. Examples are Top2Vec [19] and CTM [20].
Later research identified an inconsistency between density-based clustering and center-based sampling of topic terms in the previous approaches, especially Top2Vec. Techniques using BERT-based architectures have since been developed for topic modeling tasks [21]. The BERTopic model addressed the inconsistency issue and has become the most promising method for topic modeling in recent years [22]. It employs a class-based variant of TF-IDF to extract the topic words for each cluster, demonstrating greater synthesis capabilities on several well-known large datasets. The model employs Sentence-BERT to acquire embeddings with comprehensive semantic information and has been used for topic modeling in several applications [23]. Therefore, BERTopic was selected in this study to provide more reliable results for topic extraction.
Similarity analysis in topic evolution
Evolutionary relationships between scientific themes are critical for capturing key domain developments and tracking innovation and knowledge flows [24]. In efforts to track changes in scientific development topics and identify patterns of topic evolution, semantic patterns between words are potential drivers of topic evolution [25]. The semantic relevance of topic words may play a significant role in explaining why thematic evolution occurs [26].
Historically, the traditional methods for describing the transformation of topic semantics neglected the fact that topic words carry distinct semantic information in different contexts. For example, the phenomenon of a word departing from one topic and then reappearing in another was termed “word migration” through thematic channels [27]. Wang et al. [28] employed the distance between two words in the co-word network to evaluate their associations.
With the development of machine learning technology, word embedding-based models were introduced in topic evolution studies to determine the semantic similarity between topics. Word embedding techniques trained on large-scale unlabelled text corpora represent words in a dense, low-dimensional vector space that captures the semantic meaning of each word. To determine topic evolution, words are first represented as vectors of a specific dimension by pre-trained models, and different methods are then used to discover semantic similarities between words. For example, Chen et al. [29] represented words in a high-dimensional vector space based on the Word2vec model and then compared the cosine similarity of the vectors to examine the features of the topic evolution process. Xie et al. [12] represented the topic vector using an average tensor derived from a pre-trained BERT model of 30 sentences that contained the first 5 topic words, with the final topic similarity equal to the LDA probability value of each topic word multiplied by that value. Ma et al. [30] vectorized each keyword in the topic with the Word2vec model and then obtained the vector representation of the topic by weighted summation.
As mentioned previously, approaches based on pre-trained models are successful at recognizing semantic similarities between words in topics and have produced useful results in investigations of topic evolution. Building on pre-trained models, this article uses SBERT to embed the sentences that contain the target topic words and then employs the average sentence vector to characterize the topic meaning, which enriches the contextual information carried by the topic words. Universal sentence embedding learning has been widely studied since it is a fundamental task in NLP [31]. SBERT is a state-of-the-art sentence embedding method that can provide more comprehensive and trustworthy semantic information.
Data and method
Data collection and pre-processing
Figure 1 shows the research framework of this paper. To analyze the evolution and association of English and Chinese topics in the field of EC, we built a dataset with two parts, English papers and Chinese papers, from Web of Science (WoS) and China National Knowledge Infrastructure (CNKI; www-cnki-net.web.bisu.edu.cn), respectively. We selected the SCI-Expanded and SSCI databases of WoS and conducted a topic search with the keywords “e-commerce” or “ecommerce” or “electronic commerce” or “e-business” or “electronic business” or “electronic markets”, covering papers published up to the date of search and limited to articles and reviews. The language of the papers was restricted to English. CNKI is the largest database of Chinese journals, and we conducted a similar topic search for Chinese papers. To ensure the quality of the research results, only articles and reviews published in core journals were downloaded. The data download was completed on October 16, 2022, and all papers with empty titles or abstracts were removed to improve the quality of the topic modeling corpus, leaving 16,207 English papers and 17,850 Chinese papers.

Figure 1. The research framework of this paper.
This study requires time-slicing in order to observe the evolution of and connections between monolingual and bilingual topics at different stages. By observing the annual publication volume of EC studies, we identified five time slices, namely, up to 2002, 2003–2007, 2008–2012, 2013–2017, and 2018–2022. Table 1 shows the number of papers collected at each stage.
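The slicing described above can be sketched as a small lookup. This is only an illustration: the function name `assign_slice` and the slice labels are hypothetical, and the sketch assumes that the first slice includes 2002, which the topic identifiers used later in the paper (for example 92-02) suggest.

```python
# Hypothetical helper: map a publication year to one of the five time slices.
# Assumption: the first slice covers every year up to and including 2002.
SLICES = [
    ("-2002", None, 2002),
    ("2003-2007", 2003, 2007),
    ("2008-2012", 2008, 2012),
    ("2013-2017", 2013, 2017),
    ("2018-2022", 2018, 2022),
]


def assign_slice(year):
    """Return the label of the time slice that contains `year`."""
    for label, start, end in SLICES:
        if (start is None or year >= start) and year <= end:
            return label
    raise ValueError(f"year {year} falls outside the studied period")


# Example: group a few (paper_id, year) records by slice.
papers = [("p1", 1997), ("p2", 2005), ("p3", 2021)]
by_slice = {}
for paper_id, year in papers:
    by_slice.setdefault(assign_slice(year), []).append(paper_id)
```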
Table 1. The number of papers in each language across the five time slices
The collected documents were used to construct the corpus for topic modeling. To prepare the text data for analysis, the title, abstract, and author keyword texts were subjected to the following preprocessing steps. Firstly, numbers, punctuation, and stop words were eliminated. Additionally, the copyright notice present in the abstract section of English articles was removed. Secondly, the jieba tool, a classic Chinese word segmentation tool, was used to segment the Chinese text into words. For English text, tokenization was performed, followed by N-gram extraction using bigram and trigram models, and lemmatization was then applied. Words that occurred fewer than 5 times or in more than 50% of the documents were filtered out. Lastly, the processed words were reassembled into their corresponding sentences. It is noteworthy that lemmatization is not mandatory for the topic modeling approach chosen in this study. However, the aforementioned preprocessing steps help BERTopic differentiate topics more finely. These tasks were executed using the Natural Language Toolkit (NLTK) in Python.
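The frequency filter in the preprocessing can be illustrated with a minimal sketch. It is not the authors' code: the function name `filter_vocabulary` is hypothetical, and it assumes that "fewer than 5 times" refers to total occurrences in the corpus while the 50% limit refers to the share of documents containing the word.

```python
from collections import Counter


def filter_vocabulary(tokenized_docs, min_count=5, max_doc_ratio=0.5):
    """Drop rare and overly common words from already tokenized documents.

    Assumptions (not stated in the paper): min_count applies to total corpus
    occurrences, max_doc_ratio to the fraction of documents containing a word.
    """
    term_counts = Counter(tok for doc in tokenized_docs for tok in doc)
    doc_counts = Counter(tok for doc in tokenized_docs for tok in set(doc))
    n_docs = len(tokenized_docs)
    keep = {
        tok
        for tok, count in term_counts.items()
        if count >= min_count and doc_counts[tok] / n_docs <= max_doc_ratio
    }
    # Rebuild each document from the surviving words, preserving order.
    return [[tok for tok in doc if tok in keep] for doc in tokenized_docs]


# Toy corpus with a lower min_count so the effect is visible.
docs = [
    ["ecommerce", "trust"],
    ["ecommerce", "logistics"],
    ["ecommerce", "trust"],
    ["ecommerce", "payment"],
]
filtered = filter_vocabulary(docs, min_count=2, max_doc_ratio=0.5)
```

In the toy corpus, "ecommerce" is removed because it appears in every document, and "logistics" and "payment" are removed because they occur only once.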
This study is based on the latest topic modeling framework, BERTopic, which uses advanced deep learning models from NLP to cluster documents into topics. By learning contextual information from vast amounts of text data, the BERT model can better capture the semantic relationships between words, overcoming the limitation of traditional topic models, which overlook differences in word meaning across contexts. We followed three steps to generate meaningful topic clusters. The importance of each word to a topic is scored with the class-based TF-IDF, $W_{x,c} = tf_{x,c} \cdot \log\left(1 + \frac{A}{f_x}\right)$, where $W_{x,c}$ is the importance score of word $x$ in category $c$, $tf_{x,c}$ denotes the occurrence frequency of word $x$ in category $c$, $f_x$ denotes the occurrence frequency of word $x$ in all categories, and $A$ denotes the average number of words in each category. The larger $f_x$ is, the smaller the importance score of $x$, indicating that an overly common word cannot characterize the category it is in. Finally, the MMRC (Maximal Marginal Relevance Coherence) algorithm, a method of redefining document order values, is implemented to improve word coherence and diversity.
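To make the word-importance score concrete, the following sketch computes it for a toy set of categories. It assumes the standard class-based TF-IDF form used by BERTopic (written out in the docstring) with raw counts as frequencies; the function name `c_tf_idf` and the toy data are illustrative only.

```python
import math
from collections import Counter


def c_tf_idf(class_tokens):
    """Score words per category with W[x, c] = tf[x, c] * log(1 + A / f[x]).

    class_tokens maps a category id to the list of all tokens in that
    category. tf[x, c] is the count of x in c, f[x] the count of x over all
    categories, and A the average number of words per category.
    """
    tf = {c: Counter(tokens) for c, tokens in class_tokens.items()}
    f = Counter()
    for counts in tf.values():
        f.update(counts)
    avg_words = sum(len(tokens) for tokens in class_tokens.values()) / len(class_tokens)
    return {
        c: {x: n * math.log(1 + avg_words / f[x]) for x, n in counts.items()}
        for c, counts in tf.items()
    }


# Two toy categories: "shop" is spread across both, "pay" is specific to 0.
scores = c_tf_idf({0: ["pay", "pay", "shop"], 1: ["shop", "shop", "ship"]})
top_word_0 = max(scores[0], key=scores[0].get)
```

Here "pay" outranks "shop" in category 0 because "shop" is frequent across all categories, which is the behaviour the score is designed to produce.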
Based on the topic modeling results, we obtained the meaningful topics in each time slice, as well as the topic words they contain and their probabilities. Following the topic similarity formula established by Xie et al. [11], each topic word is represented by the average tensor of a number of sentences containing that word. We use Sentence-BERT for sentence embedding; compared with previous studies, Sentence-BERT overcomes the drawback that the sentence vectors obtained from BERT carry little semantic information.
For each topic word, we first embed all sentences that contain the topic word in the current time slice. We then calculate the similarity between each of the resulting sentence tensors and the topic word tensor, and retain the 10 sentence tensors with the highest similarity. The average of these 10 sentence tensors is used to characterize the true semantics of the topic word in the current corpus. In contrast to studies that consider only the word vectors of topic words and ignore the situational semantics of words in a text, representing a topic with average sentence tensors interprets the information of a topic in its textual context more accurately and helps us better explore the relevance between different topics.
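The top-10 selection and averaging can be sketched as follows. The sketch assumes that the embeddings have already been produced by an SBERT encoder and are available as plain lists of floats, and that the similarity between a sentence and the topic word is cosine similarity; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def contextual_word_vector(word_vec, sentence_vecs, k=10):
    """Average the k sentence vectors most similar to the topic word vector.

    word_vec: embedding of the topic word itself.
    sentence_vecs: embeddings of all sentences containing the word in the
    current time slice (assumed to be precomputed).
    """
    if not sentence_vecs:
        raise ValueError("no sentences contain the topic word")
    ranked = sorted(sentence_vecs, key=lambda s: cosine(word_vec, s), reverse=True)
    top = ranked[:k]
    return [sum(column) / len(top) for column in zip(*top)]


# Toy 2-dimensional example with k=2: the dissimilar third sentence is dropped.
vec = contextual_word_vector([1.0, 0.0], [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], k=2)
```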
The specific formula for the similarity between topics is shown below:
Descriptive results
BERTopic has the capability to automatically determine the optimal number of topics by considering factors such as the information entropy, importance, and similarity of each topic. Its goal is to strike a balance between these factors and generate a cohesive and meaningful set of topics. To ensure the resulting topic clustering is informative and comprehensive, only topics with a document count exceeding 1% of the total number of documents are retained. Twelve global topics were extracted from each of the English and Chinese datasets using BERTopic. The results of the global topics are displayed in Table 2, which includes the topic id, the suggested topic tags, and the five most probable words. The global topics indicate the hottest topics in EC at a macro level and facilitate the analysis of data regarding the evolution of topics on local time slices. When comparing the global topic labels, the 12 topics that were extracted by topic modeling in Chinese and English varied significantly. In terms of the most popular topics, the Chinese studies focus on the level of EC development in China, while the most popular English studies focus on EC recommendation.
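The 1% retention rule can be illustrated with a short sketch. It assumes a list holding one topic id per document, as a topic model would return, and it additionally skips the id -1, which BERTopic uses for outlier documents; the function name `retain_major_topics` is hypothetical.

```python
from collections import Counter


def retain_major_topics(doc_topics, min_share=0.01, outlier_id=-1):
    """Return the topic ids whose document count exceeds min_share of all documents.

    doc_topics: one topic id per document.
    outlier_id: id of the outlier cluster, which is never retained.
    """
    total = len(doc_topics)
    counts = Counter(doc_topics)
    return sorted(
        topic
        for topic, n_docs in counts.items()
        if topic != outlier_id and n_docs > min_share * total
    )


# 100 toy documents: topic 2 holds exactly 1% of them and is therefore dropped.
assignments = [0] * 50 + [1] * 48 + [2] * 1 + [-1] * 1
major = retain_major_topics(assignments)
```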
Table 2. Global topics description in English and Chinese
In addition, topic extraction was performed on the Chinese and English data in each time slice. Because BERTopic assigns documents that fit no cluster to an outlier topic labeled -1, we excluded that topic and kept only the main topics in each time slice, judged by the number of documents each topic contains. Table 3 lists the number of English and Chinese topics in the five time slices.
Table 3. The number of topics in each language across the five time slices
Tables 4 and 5 provide the top five important topic words that characterize the meaning of the topics in English and Chinese, respectively. It can be observed from the two tables that some of the topics have the same or similar topic words in different stages, which indicates that they share similar knowledge in both stages.
In English EC research, EC agent research was a popular issue up to 2007, corresponding to global topic 8. From 2008 to 2017, EC recommendation became the most popular global topic and grew in popularity. After 2018, academics began to focus on EC platforms, which do not appear among the global topics, indicating that this is a recently emerging theme that may attract further attention in the future. In Chinese EC research, tax policy was a prominent topic up to 2002. From 2003 to 2007, the research focus changed to accounting and auditing in EC, corresponding to global topic 3. After 2008, talent development remained the leading subject of study for two consecutive time slices. In the past four years, the topic of power in EC has been extensively researched.
Table 4. Topic description in English
Table 5. Topic description in Chinese
To clarify the respective evolutionary patterns of English and Chinese topics, this section uses the similarity scores between topics to visualize such patterns. Links with low similarity would clutter the topic evolution maps, so we set a threshold on the similarity value. Based on the experience of existing studies, the threshold was set to the upper quartile of the similarities between all topics in two adjacent time periods, which characterizes strong correlation between topics [31]. Connections below this threshold were discarded. We reveal thematic convergence and divergence through topics and their predecessor-descendant linkages, which helps us to uncover potential knowledge flows and knowledge reorganization among EC research topics.
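A minimal sketch of the upper-quartile filter is given below. It assumes the pairwise similarities between topics of two adjacent time slices are stored in a dictionary keyed by topic pair; the function name `strong_links`, the topic labels, and the choice of the inclusive quantile method are illustrative assumptions rather than details from the paper.

```python
import statistics


def strong_links(similarities):
    """Keep only the topic links at or above the upper quartile of similarity.

    similarities: dict mapping (earlier_topic, later_topic) to a similarity
    score for every topic pair in two adjacent time slices.
    Returns the threshold and the retained links.
    """
    values = list(similarities.values())
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    threshold = statistics.quantiles(values, n=4, method="inclusive")[2]
    kept = {pair: s for pair, s in similarities.items() if s >= threshold}
    return threshold, kept


# Toy similarities between topics of two adjacent slices.
sims = {
    ("t1_a", "t2_a"): 0.1,
    ("t1_a", "t2_b"): 0.2,
    ("t1_b", "t2_a"): 0.3,
    ("t1_b", "t2_b"): 0.4,
    ("t1_c", "t2_b"): 0.5,
}
threshold, kept = strong_links(sims)
```

Only the retained links would be drawn as streams in the evolution map; a topic with no retained incoming link appears as emerging, and one with no retained outgoing link appears as disappearing.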
English topic evolution
Figure 2 shows the trend of topic evolution for English topics over the five time spans. Adjacent topic nodes are connected by gray streams that indicate the strength of the correlation between temporal topics. It is worth noting that the topics generated by BERTopic are ranked by the number of documents they contain, and after the similarity links below the threshold are removed, the originally top-ranked topics do not necessarily contribute the most knowledge to the next period. For example, 92-02_topic1 (agent, negotiation, software) is the topic with the largest share in the first stage, but it only has a high similarity with 03-07_topic7 (ontology, semantic, domain) in the next stage, which indicates that the most popular research of the first stage did not become the most popular topic of the second stage.

Figure 2. Topic evolution of the English papers from 1992 to 2022.

Figure 3. Topic evolution of the Chinese papers from 1996 to 2022.
In the early stages of EC development, topics 92-02_topic2 and 92-02_topic7 formed the knowledge base, contributing the largest share of similarity to later topics. 92-02_topic2 (model, business, process) is a conceptually broad topic that focuses on application programming and model building in EC research. The applications of EC in this period cover many aspects, including inter-organizational systems (IOS), electronic payment systems (EPS), financial services, retailing, and so on [31]. 92-02_topic7 (market, electronic, buyer) focuses on the characteristics of EC in the context of electronic marketplaces, such as assessing the cost differences between traditional marketplaces and EC from the perspective of buyers and sellers [32].
Most of the topics in the 03-07 phase carry over well between the two phases: they inherit the knowledge of the previous time period and develop into various sub-themes. 03-07_topic2 and 03-07_topic3 directly absorb topics from the previous period. The advent of EC brings a huge opportunity for manufacturers and retailers, but also significant challenges [33]. This phase of research focuses on the technological aspects of the Internet, and technical issues are an essential subfield of EC research that can assist users and businesses in solving business difficulties.
Disappearing topics are topics that have little or no relevance in the subsequent period, such as 08-12_topic8 and 08-12_topic10, and can be considered breakpoints in the development of a certain topic. 08-12_topic10 (negotiation, agent, contract) pertains to a variety of Internet technologies from the previous era, focusing on agents and negotiation. Agents can significantly enhance the services provided to buyers and sellers in EC in order to maximize the benefits for their users [34]. Negotiation capabilities are also critical for B2C EC systems. 08-12_topic2 (scheme, protocol, signature) is still an extension of the technical issue, driven by the workflow model in the earlier topic 03-07_topic4. During this time period, a substantial amount of research centered on the systems viewpoint of auction mechanisms (08-12_topic3), with an emphasis on the types of auction processes, related algorithms, and the effectiveness of auction types. Experiments, simulations, and analysis of completed auction data are typical study methodologies.
Emerging topics are topics that had little or no relevance in the previous period, such as 08-12_topic5 (privacy, information, disclosure). This topic points to the support and implementation of EC, including the perception of privacy, trust, and risk, which is related to the health of EC. Privacy concerns can lead to a reduction in willingness to shop socially [35]. As EC continues to expand, it is essential to identify strategies to assist consumers in overcoming their perceptions of uncertainty and risk and to persuade them to continue using EC (Jones and Leonard, 2008).
A topic of interest in the next period is 13-17_topic5 (tourism, hotel, travel), which concerns EC in the tourism industry. EC is the primary medium for customer- and client-related activities, enabling the entire customer experience, including design, communication, delivery, and evaluation [36]. The rapid and synergistic interplay between e-business and tourism brings about significant changes in the industry. 08-12_topic2 and 08-12_topic9, two topics related to technical issues, were carried over in this period to a practical user level (13-17_topic2), supporting the improvement of security in terms of user ratings. Security and privacy have a direct and significant impact on consumers’ trust in a website [37]. Trust is seen as an important prerequisite for people to adopt e-services [38]. 13-17_topic7 (manufacturer, price, retailer) mainly absorbs mobile payment (08-12_topic6) and enterprise workflow (08-12_topic7), with a focus on pricing decisions and EC channels. Different items face different levels of competition in the EC channel, requiring manufacturers and retailers to adopt distinct pricing tactics [39].
In the last stage, just three topics that are remarkably similar to the previous topics appear on the evolutionary map. 18-22_topic4 (tourism, hotel, tourist) remains focused on EC in the hotel and tourism industry, showing a clear topic evolution path for tourism EC to this point. 18-22_topic5 (sentiment, review, aspect) emphasizes the use of sentiment analysis to solve some key problems in EC, such as categorizing the polarity of product reviews, or personalizing recommendations based on users. Sebastianelli and Tamimi [40] demonstrated a significant interaction between product ratings and the number of reviews, an effect that suggested the importance of online reviews for perceived EC credibility. The credibility of online reviews among potential consumers has a positive effect on consumers’ purchase intentions [41-43]. Therefore, this topic bears a high degree of resemblance to topics such as user privacy and trust. 18-22_topic6 (urban, city, delivery) is considered to be the topic about logistics and transportation that examines the enhancement of EC efficiency and sustainability through last mile logistics. The operational model of EC, the mapping of customer preferences, and pricing policies have remained closely related to it, as shown by the four topics associated with it in the last time slice. In addition, 03-07_topic8 (tourism, tourism industry, travel agencies) is an emerging theme in this period, mainly focusing on the study of tourism EC concept, the current situation and countermeasures of tourism EC, information technology in tourism EC and tourism consumer behavior.
Chinese topic evolution
In this section, the analysis is centered on the development of EC study in China, as revealed by the Chinese dataset. Topic words contained in the topics are translated.
Developing countries offer a huge potential market for mobile commerce service providers [44]. Boateng et al. [45] believed that e-commerce has increased access to local markets for companies in developed countries, and that developing country companies may therefore face new competition and challenges. From 1996 to 2002, with the rapid development of telecommunications and network infrastructure, the cost of Internet access dropped dramatically, creating the conditions for the spread of EC in China. The first generation of EC platform companies emerged one after another. The largest topic in this period, 96-02_topic1 (taxation, taxation, taxation), focuses on taxation in public policy, and Chinese scholars engaged in heated debates on whether and how to tax cross-border EC. From the perspective of topic evolution, 96-02_topic2 (economy, network, Internet), 96-02_topic4 (logistics, logistics distribution, system), and 96-02_topic5 (countermeasures, development, EC) contribute most to the knowledge base in this period.
The development of EC posed challenges to traditional accounting and auditing, which became the major topic in this period (03-07_topic1). 96-02_topic5 and 03-07_topic1 (accounting, auditing, accounting information) form an interrupted line of thematic evolution, with the former focusing on responses to the development of EC in China and the latter being a subdivision of the former. The topics most similar to those of the previous period are 03-07_topic9 (model, trust, customer) and 03-07_topic2 (development, China, countermeasures). EC is a new driver of the digital economy, and as the demand for EC talent increases, how to cultivate EC talent that can adapt to the needs of the industry in the digital economy era has become an urgent problem. 03-07_topic7 (professional, teaching, talent cultivation), which emphasizes the cultivation of EC talent, is worthy of attention in this period. Chinese colleges have made some useful forays into the training of EC experts as the country’s burgeoning EC industry enters a period of fast development. Academics have made many different suggestions regarding the direction of e-business personnel training and the design of an e-business educational system.
Based on the diffusion mechanism among themes, 08-12_topic5 (international trade, trade, facilitation) has the highest similarity with the existing knowledge base in the next period. EC is considered a new form of international trade, which is changing not only domestic trade but also, gradually, the pattern of international trade. The highest similarity is between 08-12_topic7 (logistics, logistics industry, logistics distribution) and 03-07_topic5 (taxation, taxation, taxation), suggesting that these subjects are frequently discussed together. China’s EC logistics has undergone a period of rapid diversification and growing intelligence, with many different logistics technologies in use and many different logistics models coexisting. Researchers are interested in figuring out how to create novel policies for the online taxation of EC logistics trade.
The most central topic evolution in 2013–2017 is the continuation of logistics issues (13-17_topic8). In China, e-tailers have begun to compete on logistics services in addition to price [46]. 13-17_topic3 (o2o, model, business model) is an emerging theme in this stage. EC business models have diversified as they developed, and the O2O model is a typical representative. 13-17_topic7 (finance, banking, commercial banking) and 08-12_topic5 (international trade, trade, facilitation) have the highest level of similarity, indicating that the integration of Internet finance and EC has deepened and the models have been enriched.
In the latter four years, the novel topic 18-22_topic4 (feed, business, marketing) places a strong emphasis on the growth of Chinese feed businesses. In the EC setting, large and medium-sized feed businesses alike look to internet marketing methods to promote their products. This topic is most similar to 13-17_topic8, which reflects the issue of inadequate logistics capacity in the growth of EC in feed businesses. The most comprehensive examination of the effects of EC on regional economic growth is found in 18-22_topic5 (city, space, development), which looks at the different stages of EC development across Chinese cities and regions. The development of rural EC is a crucial aspect of the later period (beginning in 2008) and has a clear development trajectory. The Chinese government actively supports a series of targeted anti-poverty programs based on EC platforms, with the aim of developing a systematic and complete rural EC ecosystem to boost the economic vitality of rural markets.
Topic similarity relations in different languages
This section explores the correlations between global topics in English and Chinese research at a macro level, as well as the correlations between local topics in different time periods, demonstrating the contrasts and links in the evolution of English and Chinese EC research.
As shown in Fig. 4(a), the top portion of the chord diagram is divided into English topics, while the bottom portion is divided into Chinese topics. We concentrate our comparison on the Chinese and English topics with the highest degree of similarity at each stage, i.e., the topics with the largest color blocks. Among the global topics, the English topics EN_Mobile payments and EN_Sentiment analysis in EC have the highest similarity to Chinese research, with EN_Mobile payments and CN_Tourism having the highest similarity score. Even though the global topics in English and Chinese are vastly different, the text similarity results indicate that there is still a flow of knowledge between the two languages.

Topic similarity relations in English and Chinese research.
From a micro perspective, Fig. 4(b)-(f) show the semantic intersection between the English and Chinese research topics in each time slice. For example, prior to 2002, the popular English topics focused on agents and models of EC, while the Chinese topics emphasized taxation and Internet businesses. The two English topics most closely related to Chinese topics at this time are en_topic4 and en_topic6. The pair en_topic6 and cn_topic8 has the greatest similarity score, which may suggest that early Chinese EC research was heavily focused on privacy and security issues.
The similarity between the English and Chinese topics helps identify where the content of EC studies in different languages intersects and, like the text-based methodology we employed, does not rely on machine translation. Table 6 presents the top three English and Chinese topic pairs with the highest degree of similarity in each of the five time slices, along with the subject phrases that best explain the correlation.
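The ranking of cross-language topic pairs described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: it assumes each topic has already been mapped to a vector in a shared embedding space (for example by a multilingual sentence encoder), and the three-dimensional vectors, the function names `cosine` and `top_k_pairs`, and the topic labels used as dictionary keys are hypothetical placeholders rather than values from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_pairs(en_topics, cn_topics, k=3):
    """Return the k most similar (English, Chinese) topic pairs in one time slice."""
    scored = [
        (cosine(en_vec, cn_vec), en_name, cn_name)
        for en_name, en_vec in en_topics.items()
        for cn_name, cn_vec in cn_topics.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(en, cn, round(score, 3)) for score, en, cn in scored[:k]]


# Toy embeddings (assumed for illustration only, not taken from the study).
en_topics = {
    "en_topic4": [0.9, 0.1, 0.2],
    "en_topic6": [0.1, 0.9, 0.3],
}
cn_topics = {
    "cn_topic2": [0.8, 0.2, 0.1],
    "cn_topic8": [0.1, 0.8, 0.4],
}

pairs = top_k_pairs(en_topics, cn_topics, k=3)
for en_name, cn_name, score in pairs:
    print(en_name, cn_name, score)
```

Because the comparison happens in the embedding space, no topic label or document has to be translated before the pairs are ranked, which is the property the paragraph above relies on.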
The top three most similar English and Chinese topics in each of the five time slices
The chain topic evolution paths found in the preceding section emphasize semantic changes in local topics; further evolutionary details from a macro perspective, such as the developmental dynamics and life cycle of global topics, should also be investigated. In this part, we compute the correlation between each global topic and the local topics of each time period in order to clarify how the global topic evolves across time slices. Based on the theory of [27], the greater the similarity between global and local topics, the more developed the global topic was during that time period. We take the mean of all correlations (0.5) as the threshold for weak positive correlation, so that any correlation above it counts as at least weak. Together with the thresholds for medium correlation (0.75) and strong correlation (0.85), this describes the dynamic growth of the global topic during the time frame.
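The thresholding step can be illustrated with a short sketch. The three cut-offs (0.5, 0.75, 0.85) come from the text; everything else is an assumption made for illustration: the function names `correlation_level` and `life_cycle`, the choice to treat the medium and strong cut-offs as inclusive, and the toy scores, which are not results from the study.

```python
# Thresholds stated in the text.
WEAK, MEDIUM, STRONG = 0.5, 0.75, 0.85


def correlation_level(score):
    """Map a global-local similarity score to a correlation level.

    Assumption: the weak threshold is exclusive ("greater than the mean"),
    while the medium and strong thresholds are treated as inclusive.
    """
    if score >= STRONG:
        return "strong"
    if score >= MEDIUM:
        return "medium"
    if score > WEAK:
        return "weak"
    return "none"


def life_cycle(best_scores):
    """Label each time slice by the level of its best-matching local topic."""
    return {period: correlation_level(s) for period, s in best_scores.items()}


# Hypothetical best similarity per time slice for one global topic.
toy_scores = {
    "96-02": 0.55,
    "03-07": 0.78,
    "08-12": 0.88,
    "13-17": 0.90,
    "18-22": 0.60,
}

for period, level in life_cycle(toy_scores).items():
    print(period, level)
```

Reading the resulting sequence of levels from the first to the last time slice gives the kind of trajectory discussed in the following subsections, where a rise toward stronger correlation is interpreted as development and maturity.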
Three crucial English global topics
We first analyzed the most prevalent topic in English EC research, EC recommendation, whose evolutionary pattern is shown in Fig. 5(1). Recommender systems allow companies to use knowledge about their consumers to find items or develop personalized experiences that may interest them and suit each customer’s needs [47]. Early on, EC recommendation was only tenuously related to model design and e-marketplaces, indicating that the topic was in an early stage of development. During the 03-07 period, a moderately related topic centered on the design of recommendation systems emerged. Recommendation systems search for products that are comparable to those the customer has already purchased, and recommendation algorithms determine the products best suited to the user. During this time span, collaborative filtering was by far the most widely used algorithm. One of the keys to recommendation technology is the use of collaborative filtering algorithms to calculate the similarity between an item and a user. EC recommendation first reached maturity during the 08-12 phase and remained the most popular mature topic during the subsequent phase. By the 18-22 stage, EC recommendation gradually separates into research on sentiment analysis and travel EC applications, probably indicating a stronger emphasis on consumer sentiment research and the development of application scenarios for recommendation systems.
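To make the collaborative filtering idea concrete, the following is a minimal sketch of one common variant, user-based collaborative filtering with cosine similarity. It is not drawn from any of the studies reviewed here; the ratings, user names, item names, and the function names `cosine_sim` and `recommend` are all hypothetical.

```python
import math


def cosine_sim(r1, r2):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(r1) & set(r2)
    if not shared:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in shared)
    n1 = math.sqrt(sum(v * v for v in r1.values()))
    n2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (n1 * n2)


def recommend(ratings, user, k=1):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_sim(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # skip items the user already rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:k]


# Toy purchase ratings on a 1-5 scale (made up for illustration).
ratings = {
    "alice": {"laptop": 5, "mouse": 4},
    "bob": {"laptop": 5, "mouse": 5, "keyboard": 5},
    "carol": {"novel": 3, "mouse": 1, "bookmark": 4},
}

top = recommend(ratings, "alice", k=2)
print(top)
```

In this toy example the user most similar to "alice" contributes most to the predicted ratings, which is the intuition behind recommending products comparable to those a customer has already purchased.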

The life cycle of three crucial English topics.
Next, we extract the evolutionary pattern of the second most popular global topic, Consumer privacy, which follows a pattern similar to that of EC recommendation: both are in a state of development or adjustment in the early stages and reach maturity in the middle and late stages, as Fig. 5(2) demonstrates. It is worth noting that consumer privacy was not a popular topic at either stage of reaching maturity. This suggests that consumer privacy issues are long-standing in EC and that technological change is creating increasingly complex privacy issues.
Finally, we present a topic that matured at a later stage, Travel EC. The synergistic and accelerated interaction of EC with hospitality and tourism has resulted in fundamental changes to the industry and increased competition in the business climate [37]. In the early years, Travel EC maintained only weak relevance to local topics, and it was evident that travel was not the focus of early EC studies. Not until the 13-17 phase did the topic mature and sustain a medium level of relevance to the topic of mobile payments. The trend toward high cell phone penetration offers the opportunity to shop anytime, anywhere, thus enhancing the benefits of e-commerce [48]. In the last period, the topic reached a higher degree of maturity and increased in popularity. In the future, Travel EC is anticipated to continue advancing steadily.
In a similar manner, we analyzed the evolutionary dynamics of three important Chinese global topics. As depicted in Fig. 6, the most popular Chinese topic, Current status of EC development in China, is manifestly an early-focused topic, as it reaches maturity during the first two historical periods. Local topics in the 08-12 and 13-17 phases are less relevant to this global topic, and based on the content of the local topics, the emphasis in these two phases is on specific studies rather than a macro understanding of the development of EC research in China. It is not until the 18-22 phase that Current status of EC development in China again has a moderately relevant local topic, one that focuses on digital economy research in China, a similarly macro topic.

The life cycle of three crucial Chinese topics.
Subsequently, an investigation of the evolutionary patterns of the topic EC taxation reveals an important result, namely that taxation has been a topic of interest for Chinese EC research for a long time. China has encountered issues in terms of tax collection methods, tax jurisdictions, and tax elements as a consequence of the rapid growth of EC. Since its appearance, EC taxation has experienced four successive phases of maturity, indicating that taxation issues have been examined throughout EC research. Until recently, its association with 18-22_topic5 was weak, which may indicate that taxation research is once again on the move.
Lastly, we examine a representative Chinese topic that matured in the later stages, EC in agriculture, which has also been the focus of Chinese EC research in recent times. In the first stage of development, EC in agriculture maintains a weak correlation with the two local topics until the 08-12 phase, when it gains a medium correlation with 08-12_topic2 (agricultural products, agricultural, rural), which signals the beginning of the development of research on EC in agriculture. In the subsequent phase, there is a medium correlation between EC in agriculture and a logistics topic, 13-17_topic2 (logistics, logistics distribution, logistics industry). The development of logistics is crucial to the distribution of agricultural products, which invigorates the growth of EC in agriculture. During this time, the most relevant local topic (13-17_topic5) emerges, signifying that EC in agriculture has reached maturity. The development of EC in agriculture provides an effective solution to the issue of poor distribution caused by the structural, seasonal, and regional characteristics of agricultural products. In the last stage, EC in agriculture is still mature and will likely continue to expand steadily in the future.
The identification of significant issues and the measurement of their evolutionary dynamics are of tremendous interest to scholars in all disciplines. Numerous earlier studies have offered literature reviews and classification strategies in the field of EC through various lenses. Based on recent work in the area of e-commerce, we first discuss the similarities and differences in research findings regarding topic-based paradigm shifts.
Bawack et al. [5] found that the main research themes of artificial intelligence in English e-commerce research include sentiment analysis and trust, among others, with the aim of providing users with more personalized recommendation systems. Extensive research has also been conducted on how to optimize the performance of algorithms for recommender systems. In addition, privacy, explainability, and ethics are also relevant to the study of AI in e-commerce. This is consistent with our findings for the middle period of EC development. In the mid-term, EC recommendation emerged as a prominent topic of EC research, with close ties to model design and e-marketplaces. There is an increasing amount of research on the design and key technologies of recommender systems.
Fang and Fang [11] emphasized that Chinese EC research shifted from public policy to talent development, with consumer rights gaining popularity in the latter phase, which is similar to our findings. It might be argued that early EC research in China focused on establishing information systems and Internet infrastructure for EC and resolving numerous technical issues, and then shifted to enterprise EC applications. With the rise of cross-border EC, international trade became a pillar of EC development in the later period and was fully integrated into China’s economic and social development. As a result, there is a rising number of studies on logistics and distribution, rural EC, consumer rights, and talent development, which has contributed to the substantial development of EC in China.
In addition, this paper offers a complete assessment of the evolution of topics. It finds that the most popular topics in English are agents and EC models, whereas the most popular topics in Chinese are taxation and Internet business. The early knowledge origins of English EC research are application programming, model construction, and the characteristics of EC in the context of electronic markets. The knowledge origins of Chinese EC research include the Internet economy, logistics and distribution, and Chinese EC policy. Over time, the knowledge from these originating topics has been assimilated and expanded into other topics.
From an evolutionary perspective, both the English and Chinese EC environments have improved over time. The English-language literature reflects the international level of EC research. Early EC research centered on EC agents, including the technologies, development tools, and platforms that enable them, as well as the trust challenges involved in picking trading partners via agents. In the mid-term, EC recommendation emerged as a prominent topic of EC research, with close ties to model design and e-marketplaces. More and more studies are being conducted on the design and key technologies of recommendation systems. In the latter phase, sentiment analysis research and travel EC application research increasingly became distinct, reflecting a greater emphasis on consumer sentiment research and recommendation system applications. As IT increasingly penetrates the industry through its own progress and with the support of adaptive business models, travel EC has shown great dynamism by absorbing Internet and IT-mediated business processes, such as consumer-to-consumer (C2C) social activities.
Despite disparities in thematic overviews, English and Chinese EC studies share many similarities in textual meanings based on a comparison of global topics. This characteristic is seen as a degree of developmental resemblance. Given the disparities in economic position and research contexts, it is noteworthy that Chinese EC studies comprise a significant number of China-specific EC studies.
Finally, we explore the developmental and maturational trajectories of crucial global topics in terms of their evolutionary dynamics. For English global topics, two popular topics mature in the medium term, with development tending towards a post-maturity split, whereas travel EC is expected to continue to grow steadily and is likely to become a popular topic in the future. For Chinese global topics, the most popular global topics matured in the early years and later developed from macro to more specific research topics. It is worth noting that taxation issues have been a long-standing focus of EC research in China, persisting through most of its development. Along with significant advancements in logistics technology, the development of rural EC reached maturity at a later stage and is anticipated to grow gradually in the foreseeable future.
Conclusion
Overall, this paper investigates the knowledge distribution, evolutionary patterns, and their differences and connections in the English-Chinese EC literature by combining the BERTopic topic model and SBERT sentence embedding-based similarity calculations. The results of the study answer the four proposed RQs. Firstly, we use BERTopic to identify global and local topics in English and Chinese literature in order to answer RQ1. Then, based on the semantic similarity of local topics, we determine the evolution of English-Chinese EC topics in each of the five time slices and analyze the information communicated by the evolution in order to answer RQ2. RQ3 is answered by calculating the similarity between English and Chinese global and local topics and identifying the local topics with the highest similarity in each time slice, thereby revealing the connections between English and Chinese EC studies. Using the strength of correlation between global and local topics across the time slices, we determine the developmental dynamics of crucial global EC topics, including when they developed and when they reached maturity, thereby answering RQ4.
This study provides valuable insights for researchers and practitioners, facilitating a comprehensive understanding of global and local trends and themes in the field of EC. By identifying the emergence and development of new EC themes, this research enables the prediction of their future trajectories and the anticipation of potential trends and innovations. Consequently, organizations can make informed decisions, develop targeted strategies, and maintain competitiveness in the rapidly evolving EC environment. The technique proposed in this study serves the purpose of analyzing both English and Chinese EC literature, effectively avoiding machine translation errors. This multilingual approach to EC research proves invaluable for conducting cross-cultural market analyses and identifying similarities and differences in consumer behaviors, preferences, and trends across various regions and languages. By understanding these cross-cultural dynamics, international cooperation and knowledge exchange are further promoted as differences and connections between diverse language perspectives are established.
Although we have made every effort to ensure that this study is as exhaustive as possible, there are still some limitations. First, in order to ensure the quality of the articles, the English data sources were restricted to SCI-Expanded or SSCI-indexed journals, which may not contain the entirety of the field’s material. Second, the time span is divided artificially, which calls for more scientific ways of dividing it in the future. Third, it is challenging to accurately characterize all aspects of a topic due to thematic overlap and conceptually misleading language, a prevalent issue in topic modeling studies. In order to ensure the reliability of the experimental results, this study adheres exactly to the methodology of prior research. Lastly, trends in topic evolution as explained by the semantics of the topic words also merit further exploration in the future.
Author contributions
Xiaorong He: Conceptualization, Methodology, Reviewing and Editing. Anran Fang: Conceptualization, Methodology, Software, Visualization, Writing-Original draft preparation. Dejian Yu: Conceptualization, Reviewing and Editing.
Funding
This manuscript was supported by the Jiangsu Provincial Social Science Foundation (No. 20GLC010) and the Ministry of Education of Humanities and Social Science project (19YJC630208).
Data availability
All data included in this study are available from the corresponding author upon request.
Declarations
Conflict of interest
The authors declare that they have no conflicts of interest to report regarding the present study.
Ethics approval
Not applicable.
Consent to participate
Consent.
Consent for publication
Consent.
