Abstract
Named entity recognition (NER) is a core task in natural language processing that identifies and classifies entities, such as people, organizations, and locations, within text. It has traditionally been applied in areas like text summarization, machine translation, and question answering. In recent years, NER has gained growing importance in health care, where electronic clinical records and online platforms generate large amounts of unstructured medical data. However, applying NER in clinical contexts introduces unique challenges due to the complexity of medical terminology and the need for high accuracy. In this study, we focus on the development of a real-time, low-latency NER system designed for cross-lingual speech-to-text applications, with a particular emphasis on cancer therapy-related clinical records and traditional Chinese medicine (TCM). We explore the integration of deep learning (DL) architectures optimized for low-latency neural processing to extract structured information from multilingual spoken content in medical settings, particularly in multimodal environments. We evaluate DL-based methods and propose a semi-supervised approach that combines TCM-specific corpora with biomedical resources to improve recognition accuracy. The findings provide both a systematic review of current methods and practical insights for building real-time clinical applications that support decision-making and information management in health care.
Introduction
Named entity recognition (NER) is a key natural language processing (NLP) task that involves identifying and categorizing predefined semantic types—such as person (P), location (L), organization (O), medical codes, quantities, temporal expressions, and percentages—from unstructured text.1,2 Serving as a foundational tool for information extraction (IE), NER supports a variety of applications including knowledge base construction, machine translation, text understanding, automatic summarization, question answering, and information retrieval. 3
The named entity (NE) task involves detecting names (P-L-O), medical codes, quantities, time, and percentage expressions, and categorizing them into predefined classes within unstructured text.4,5 In real-time clinical environments, especially those involving speech-based input such as electronic clinical records (ECRs) or voice-assisted diagnostics, NER systems must function with both high accuracy and low latency. Deep learning (DL) techniques have significantly enhanced NER performance by effectively modeling complex linguistic patterns through nonlinear activation functions. These advancements are especially critical in cancer treatment, where timely and accurate extraction of clinical information from text can aid in diagnosis and therapy planning.
Over the past few years, NER has garnered increasing interest, prompting numerous scientific conferences and evaluation campaigns (e.g., Semantic Evaluation, 6 the Conference on Computational Natural Language Learning [CoNLL03], 7 and the Information Retrieval and Extraction Exercise [IREX]) to dedicate significant time and effort to this subject. According to Wacholder et al., 8 an NE is a proper noun serving as a name for something or someone. A common challenge in NER is semantic ambiguity, where proper nouns can carry multiple meanings depending on the context. The term “Named,” as per Kripke, 9 confines the task to entities with rigid designators. As defined in Mansouri et al., 10 rigid designators encompass proper names as well as natural kind terms such as biological species and substances. Notwithstanding variations in NE definitions, researchers have arrived at a consensus regarding the types of NEs to be recognized. NEs are generally classified into generic categories (e.g., person and location) and domain-specific categories (e.g., enzymes, genes, and proteins). Specifically, this study focuses on both generic and domain-specific NEs within the context of uni-lingual (English language) traditional Chinese medicine (TCM).
The scope of this article does not encompass NER works in multilingual contexts (https://corenlp.run/). A National Cancer Institute-designated comprehensive cancer center analyzed ECR data on 1587 patients with stage I to III breast cancer and colorectal cancer treatment data to determine the frequency of switches and discontinuations of adjuvant endocrine drugs in real-world settings, as well as to examine possible causes of drug switches and discontinuations.11,12 Detection and treatment of diseases at an early stage are beneficial for patients. In addition to measuring recurrence, death, and drug switch and discontinuation rates, we tracked adverse events and menopause status as potential factors. NLP is increasingly prevalent in the field of TCM analysis, aiding in the automated processing and examination of ECRs. For training and testing automated NER systems, a human-annotated corpus of named entities is essential.
In the realm of ECRs in English, various systems and databases contribute to NER, including medical terminology systems, clinical ontology systems, and medical databases like DrugBank. Notable examples include the Unified Medical Language System (UMLS) and the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), among others. 13 Similarly, the Traditional Chinese Medicine Language System (TCMLS) serves as a standard for TCM terminology, with specific resources developed in China to facilitate NER tasks. 14 At present, certain entity types within Chinese clinical records have been annotated, including medications, anatomy, treatments, tests, symptoms, body parts, temporal words, drugs, and operations.15–19 In the biomedical field, studies have demonstrated the efficacy of deep neural networks in enhancing NER, as evidenced by efforts such as GeneView, 20 BioNER, 21 BioCreative, 22 HUNER, 23 GENIA, 24 and PubTator. 25 This study focuses on detecting TCM-NER entities within clinical records pertinent to cancer treatment therapies that use Chinese herbs. NER methodologies fall into four main streams of techniques: rule-based, unsupervised learning, feature-based supervised learning, and DL-based approaches, each of which is briefly explained in the section “TCM-NER Background”.
In addition, this study provides a comprehensive review and taxonomy of DL-based NER techniques for TCM within cancer-related clinical records, with particular emphasis on therapies using Chinese herbs. 26
Previous works have mainly concentrated on general biomedical NER or on single aspects of input representations, whereas our study extends the scope by also proposing a semi-supervised methodology for extracting clinical terms from TCM corpora and integrating biomedical resources, thereby offering broader support for future research and applications. Our main contributions are as follows.
(1) We provide researchers and practitioners with an up-to-date understanding of DL techniques in NER through an intensive review of their applications to cancer treatment therapies in clinical records. (2) We introduce a semi-supervised methodology that extracts TCM-related clinical terms using feature words as a foundation; this approach combines TCM-NER techniques with biomedical corpora to offer valuable resources for the broader research community. (3) We present a comprehensive survey of DL techniques for TCM-NER in ECRs and online resources concerning cancer treatment with Chinese herbs. (4) We propose a new taxonomy: we extract Chinese medicine clinical terms from a TCM-related corpus, including in multimodal environments, and organize DL-based TCM-NER methods by input representation, context encoder, and tag decoder.
The remainder of this article is structured as follows: The second section reviews the relevant literature and background; the third section describes the methodology; the fourth and fifth sections present the results and implications, respectively; and the final section concludes the study.
TCM-NER Background
TCM-NER
Medical terms and the relationships between medical NEs in medical and prescription data are the main information components of TCM medical knowledge base systems. For example, NEs in the general domain include P-L-O, whereas NEs in the biomedical domain or clinical records include drug, protein, disease, and gene names. In NER, named entities in clinical text records are located and classified into predefined categories. TCM, which uses Chinese herbs as its main source, has demonstrated efficacy in treating and preventing colorectal cancer through thousands of years of practice.27–31 The main focus of this study is to detect TCM named entities in clinical records concerning the treatment of various cancers (e.g., lung cancer). As natural medicines, TCM herbs also offer the benefits of low adverse reactions and low toxicity. About 400 herbs in the Chinese Herbal register are used to treat lung, liver, brain, stomach, prostate, and breast cancer, out of more than 2000 registered herbs.32,33
Formally, each input text sample is split into a sequence of tokens, from which the NER system identifies entity mentions and their types.

NER system recognizes three named entities from the given single sentence. NER, named entity recognition.
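To make the token-sequence view concrete, the following minimal Python sketch converts per-token BIO tags into (start, end, type) tuples. The sentence, the tag names, and the helper `bio_to_spans` are hypothetical illustrations and are not taken from the study's data or code.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert per-token BIO tags into (start, end, type) tuples (end inclusive)."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and etype == tag[2:]
        if not continues:
            # Any tag that does not continue the open entity closes it.
            if etype is not None:
                spans.append((start, i - 1, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
            # A stray I- tag without a matching open entity is ignored.
    if etype is not None:
        spans.append((start, len(tags) - 1, etype))
    return spans


# Hypothetical sentence with three entities (person, organization, location).
tokens = ["Li", "Wei", "visited", "Sichuan", "University", "in", "Chengdu"]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "B-LOC"]
spans = bio_to_spans(tags)
for start, end, etype in spans:
    print(etype, " ".join(tokens[start:end + 1]))
```

Each tuple records the boundaries and the type of one entity, which is the form in which an NER system's output is usually compared against gold annotations.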
TCM-NER database
Below, we provide a summary of extensively utilized datasets in monolingual tools (Chinese) for NER tasks within ECR focusing on cancer treatment therapies. These datasets were predominantly employed for coarse-grained NER tasks, primarily involving the annotation of entity types in news articles prior to 2005. Notably, the tag types employed in MUC-II to MUC-VII exhibit considerable similarity. 34 In addition to these domain-specific datasets, we present an array of text-based datasets that have been curated with pertinent information, as outlined in Table 1.
List of annotated datasets for mono-lingual NER (English)
Wikipedia.
NER, named entity recognition.
Table 2 summarizes datasets widely used for multilingual NER in the medical industry and academia. The multilingual annotations include CoNLL03 as well as Urdu (Ur), German (Gr), Chinese (Chi), Arabic (AR), Italian (It), French (Fr), and Dutch (Du) resources. A number of NER tools with pretrained models are also available online.
List of annotated datasets for bilingual NER
aWikipedia.
NER, named entity recognition.
Dataset
The central focus of the content encompasses five essential terms: clinical manifestation, syndrome, disease, treatment law, and herbs. Additionally, ancillary information relevant to TCM clinical treatment records, including the patient’s age and name, is included, as depicted in Table 3. A total of three tag types of entities with TCM term categories have been annotated in the texts, as shown in Table 4.
An example of the TCM clinical records for a patient with various symptoms
An herb is a plant with a soft, nonwoody stem that dies down at the end of the growing season.
bPDB ID: 2XU1.
TCM, traditional Chinese medicine.
The complete statistics of the ACE 2004 dataset
aAutomatic Content Extraction dataset: developed by the U.S. National Institute of Standards and Technology (NIST).
For this study, we selected the ACE 2004 dataset as the foundation for training and evaluation. This dataset is widely recognized in the NLP community and provides a reliable benchmark for tasks such as event extraction, relation detection, and entity recognition. Its balanced and diverse linguistic coverage makes it suitable for testing generalization across different domains. We focus exclusively on the Chinese portion of the dataset, as it offers a sufficiently large sample size (approximately 105k instances) to support robust model training and evaluation. The 80:20 split between training and testing ensures a fair distribution for both learning and performance assessment, as shown in Table 5.
Five label types of TCM clinical terms (BIO tags)
TCM, traditional Chinese medicine.
Moreover, the dataset’s well-annotated structure, inclusion in prior research, and frequent adoption as a standard benchmark justify its selection, as it allows for comparability with existing approaches and ensures that results can be interpreted within a broader research context. The dataset annotations were validated by a domain expert holding a PhD, which ensured the reliability and accuracy of the entity labels for TCM clinical records.
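An 80:20 split of the kind described above can be made reproducible by shuffling with a fixed seed. The sketch below is only an illustration under assumed names: `split_train_test`, the seed value, and the placeholder sentences are hypothetical, and the study's actual splitting procedure is not specified.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(items: Sequence[T], train_ratio: float = 0.8,
                     seed: int = 13) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data with a fixed seed, then cut it at train_ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Placeholder instances standing in for annotated sentences.
sentences = [f"sentence-{i}" for i in range(1000)]
train, test = split_train_test(sentences)
print(len(train), len(test))
```

Fixing the seed keeps the test set stable across runs, so that scores from different models remain comparable.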
Cancer recurrence and survival
Colorectal cancer is a type of cancer that develops from the colon or rectum (parts of the large intestine). Cancer arises when cells grow abnormally and invade or spread to other parts of the body. 35 Colorectal cancer can cause persistent tiredness, blood in the stool, and changes in bowel movements, all of which harm patients’ health. 36 Consequently, early diagnosis has become an important research direction, because patients benefit from early detection and early treatment. Grundner et al. 37 trained an algorithm on the genetic markers of colorectal cancer patients. In addition to predicting overall survival (OS), disease-free survival (DFS), and recurrence survival rates, their model can predict other clinical outcomes. Based on clinical data, Peng et al. 38 developed a prognostic artificial neural network scoring system for stage IIA colorectal cancer patients that predicts OS and DFS over the next 10 years, as shown in Table 6.
Artificial intelligence and predicted colorectal cancer
TCM, traditional Chinese medicine.
Therapies in Cancer Treatment Method
The proposed framework is shown in Figure 2. In recent years, it has become increasingly common for DL-based TCM-NER models to achieve state-of-the-art (SOTA) results.41,42 Compared with feature-based approaches, DL has the advantage of discovering hidden features automatically. In the next section, we will briefly discuss the role of DL that is important for TCM-NER.
Knowledge-based TCM-NER systems

The framework of the proposed method.
Knowledge-based TCM-NER systems rely on semantic similarity rules in the word sense disambiguation (WSD) method. Rules can be designed for specific purposes based on syntactic-lexical patterns, domain-specific gazetteers,18,43,44 and biomedical WSD. 44 Yin 18 proposed using knowledge-based Q/A (KBQA) rules for smart customer service. This system generates fuzzy rules that automate query matching and combines them with an NER model, as shown in Table 7. In the biomedical domain, ProMiner, designed by Hanisch et al., 46 uses a preprocessed synonym dictionary to identify proteins and associated genes in biomedical text. In the next section, we will discuss the details of the proposed methodology.
Previously proposed knowledge-based Q/A models for TCM-NER systems and their limitations
TCM, traditional Chinese medicine; NER, named entity recognition.
Research design
Supervised learning approaches
Supervised learning for TCM-NER, especially within the context of cancer therapies, is cast as a sequence-labeling or multi-class classification task.47–49 Features for each training example are carefully designed from annotated data samples, which often relate to specific applications such as ultrasound, targeted destruction of tumors, drug delivery systems, and advanced imaging techniques. Machine learning and DL algorithms are then employed to recognize similar patterns in previously unseen biomedical data, particularly within ECRs focused on cancer therapy.
A supervised TCM-NER system requires feature engineering at the word level and beyond. A word is represented by a feature vector whose entries can be of three kinds: Boolean, numeric, and nominal. 50 The most widely used word-level features in supervised NER are morphology, case, and part-of-speech tags.3,51,52 List lookup features, drawn from resources such as Wikipedia and DBpedia gazetteers, support recognition of entities relevant to drug delivery methods and cancer therapies.53–57 Additionally, document and corpus features, such as local syntax and multiple occurrences, help capture context associated with ultrasound diagnostics, targeted tumor destruction techniques, and imaging modalities.58–62
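As a rough illustration of such hand-crafted features, the sketch below builds a small feature dictionary for one token, mixing Boolean, numeric, and nominal values with a list lookup. The gazetteer entries, the feature names, and the example sentence are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Union

Feature = Union[bool, int, str]

# Hypothetical mini-gazetteer; a real system would load Wikipedia or DBpedia lists.
HERB_GAZETTEER: Set[str] = {"ginseng", "astragalus", "licorice"}


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Build word-level, list-lookup, and local-context features for token i."""
    word = tokens[i]
    return {
        "is_capitalized": word[:1].isupper(),              # Boolean, case
        "has_digit": any(c.isdigit() for c in word),       # Boolean
        "length": len(word),                               # numeric
        "suffix3": word[-3:].lower(),                      # nominal, morphology
        "in_gazetteer": word.lower() in HERB_GAZETTEER,    # list lookup
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",  # local context
    }


tokens = ["Patient", "received", "Ginseng", "decoction", "for", "14", "days"]
feats = token_features(tokens, 2)
print(feats)
```

Dictionaries of this shape are a common input format for feature-based classifiers such as SVMs or CRFs, after nominal values have been encoded.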
In supervised NER applications, including those designed to support advanced cancer therapies, algorithms such as support vector machines (SVMs), 39 hidden Markov models (HMMs), 63 conditional random fields (CRFs), 64 maximum entropy or likelihood models, 40 and decision trees 65 have been applied based on these features. Shen et al. 66 proposed an HMM-based NER system for the biomedical domain that deals with the cascaded phenomenon and includes a simple algorithm to solve the abbreviation problem, both of which are critical for accurately annotating records in targeted tumor destruction and drug delivery. In addition, Richman et al. 67 developed a bootstrapped multilingual NER system that uses a corpus generated from Wikipedia as the training set.
A major merit of their approach is that it determines the type of an NE from the category structure inherent to Wikipedia, demonstrating how Wikipedia can be used to identify possible named entities. Chieu et al. 68 proposed a maximum entropy approach that applies global features; such global features enhanced NER performance, a technique echoed in other maximum entropy-based studies relevant to imaging techniques and cancer treatment modalities.40,69
Alokaili and Menai 70 used an ensemble learning method based on SVM features to approach the NER problem in a new way. SVM kernel functions, including linear, radial basis function (RBF), and polynomial kernels, were studied in the context of multiple ensemble learning algorithms, including bagging, boosting, and voting. Documents and sentences are tokenized, and each token is classified by each classifier in a binary fashion into one of eight classes using BIOES tags. In our study, the order of the tokens is important and is encoded with B- (Beginning), I- (Inside), O- (Outside), and E- (End) tags for the PER, ORG, LOC, and MISC entity types. A limitation of SVMs is that neighboring words are not taken into account when predicting entity labels. A CRF, by contrast, takes context into consideration by modeling the tag sequence as a whole. Settles 71 used a CRF with a range of traditional and new features to enable simultaneous recognition of protein, DNA/RNA, cell line, and cell type mentions. Experiments on the BioNLP/NLPBA 2004 training and evaluation sets showed that this approach can achieve an overall F1 measure of around 0.70.
CRF-based NER has been used widely in a variety of domains, including biomedical text,51,72 tweets, 73 and chemical text.74,75 In a two-stage CRF design, the first CRF extracts named entities using an ML method, and the second CRF builds a feature set from the statistics of token classes to train a new classifier.
Unsupervised learning approaches
Unsupervised learning for NER is typically a clustering approach, 19 in which clustering-based systems extract named entities from clustered groups based on context similarity; the idea is to use lexical resources and statistics computed over a large corpus to infer mentions of named entities. Cucchiarelli et al. 76 presented two unsupervised algorithms, based on machine learning and fine-grained evidence, namely syntactic and semantic contextual knowledge, to classify NEs. Similarly, KNOWITALL 43 extracts 50,000 class instances from a small set of generic patterns using three methods (pattern learning, subclass extraction, and list extraction).
Deep learning role in TCM-NER
The traditional machine-learning technique was limited in its capability to process complex raw natural data effectively.
Recent advances in DL have opened avenues for improved data processing, especially in medical imaging and cancer treatment domains.77–80 Techniques such as focused ultrasound, high-intensity focused ultrasound (HIFU), low-intensity pulsed ultrasound (LIPUS), and super resolution ultrasound imaging (SRUI) have significantly benefited from DL approaches. A comprehensive review of DL-based recommendation systems, examining compiled studies along dimensions such as application systems, therapeutic tasks, clinical alertness, purposive properties, and domain prevalence, underscores the advantages of integrating DL methods into these emerging medical technologies. For instance, HIFU and LIPUS, utilized extensively in treating cancers such as gliomas, prostate, pancreatic, liver, breast, and soft tissue sarcoma, are increasingly enhanced by DL-based nonlinear models. The DL-based approaches facilitate capturing intricate and nonlinear relationships in clinical data through activation functions, thus improving diagnostic and therapeutic outcomes. Furthermore, SRUI, empowered by DL techniques, demonstrates notable improvement in imaging resolution and tumor detection sensitivity, showcasing a compelling initial sensitivity rate of 94% in ongoing clinical trials. Overall, the adoption of DL models significantly enhances the analytical power and efficacy of ultrasound-based medical treatments and diagnostic systems.
Learning input distributed representations
One-hot vectors are a straightforward way to represent words. In one-hot vector space, any two different words have orthogonal representations, so similarity between words is not captured.81,82 A distributed representation instead maps each word to a learned, low-dimensional vector of latent features. With distributed representations, semantic and syntactic properties of words can be captured automatically, even though they are not explicitly present in the input to NER, as shown in Figure 3.
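The contrast between one-hot and distributed representations can be checked with cosine similarity. In the sketch below, the three-word vocabulary and the dense vectors are made-up values for illustration only; they are not trained embeddings.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vocab = ["ginseng", "astragalus", "fever"]


def one_hot(word: str) -> List[float]:
    """One dimension per vocabulary word, with a single 1.0 entry."""
    return [1.0 if w == word else 0.0 for w in vocab]


# Made-up dense vectors: the two herbs are placed close together on purpose.
dense: Dict[str, List[float]] = {
    "ginseng": [0.9, 0.1, 0.2],
    "astragalus": [0.8, 0.2, 0.1],
    "fever": [0.1, 0.9, 0.3],
}

print(cosine(one_hot("ginseng"), one_hot("astragalus")))  # orthogonal one-hot vectors
print(cosine(dense["ginseng"], dense["astragalus"]))      # similar dense vectors
print(cosine(dense["ginseng"], dense["fever"]))           # less similar dense vectors
```

With one-hot vectors, every pair of distinct words has similarity zero, whereas dense vectors can place related words near each other.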

The DL-based TCM-NER. The input sentence is represented to encode and decode for predicting BIOE tags, which consists of distributed representations for output, encoder, and decoder. DL, deep learning; TCM, traditional Chinese medicine; NER, named entity recognition.
The architecture of a DL-based NER model for TCM text uses the BIOE tagging scheme. The input sentence is first transformed into multilevel embeddings, including pretrained word vectors (WE), character embeddings (CE), and POS tags. These representations are processed by neural encoders such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), or transformers, with a bidirectional long short-term memory (Bi-LSTM) capturing bidirectional context. Each token is then assigned a label under the BIOE scheme—marking beginnings (B), insides (I), ends (E), or nonentities (O).
For example, “Dujiangyan City, Sichuan Province” is recognized as a location (B-Loc to E-Loc), while “Good Agricultural Practice of Medicinal Plants and Animals” is tagged as an organization (B-Org to E-Org), as shown in Figure 4. Finally, a SoftMax layer predicts the most probable tag sequence, enabling accurate entity recognition.

RNN-based and transformer-based models for mining character-level and word-level representation for a word and sentence level. The three kinds of embedding with BIOE tags are also connected and linked to SoftMax. RNN, recurrent neural network.
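The final tagging step can be sketched as a softmax over per-token scores followed by an argmax. The scores below are invented for illustration, and the tag set is restricted to location tags; note that an independent argmax per token does not by itself guarantee a valid BIOE sequence.

```python
import math
from typing import List

TAGS = ["B-Loc", "I-Loc", "E-Loc", "O"]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one token's tag scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_tags(logits: List[List[float]]) -> List[str]:
    """Pick the most probable tag for each token independently."""
    predicted = []
    for row in logits:
        probs = softmax(row)
        predicted.append(TAGS[probs.index(max(probs))])
    return predicted


tokens = ["Dujiangyan", "City", ",", "Sichuan", "Province"]
# Invented scores, one row per token and one column per tag in TAGS.
logits = [
    [2.5, 0.3, 0.1, 0.2],
    [0.2, 2.1, 0.4, 0.1],
    [0.1, 1.8, 0.3, 0.9],
    [0.3, 1.9, 0.6, 0.2],
    [0.1, 0.5, 2.7, 0.3],
]
predicted = predict_tags(logits)
print(list(zip(tokens, predicted)))
```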
Next, we discuss the three types of distributed representations used in NER models: character-level, word-level, and hybrid representations. Modeling language as distributions over characters has become possible with recent advances in RNNs. Pretrained word embeddings used as NER model input can either be kept fixed or be fine-tuned during training. Several studies 83 incorporated character-based word representations learned by end-to-end neural models, as opposed to using word representations alone. Character-based models capture morpheme-level patterns and can infer representations for previously unseen words. In a similar vein, Xiong et al. 45 introduced an improved, semantically enhanced text matching approach for open health NLP in clinical data. Additionally, Luo et al. 84 proposed a hierarchical contextualized representation model that covers sentence-level and document-level contexts. Peters et al. 85 put forward the Embeddings from Language Models (ELMo) word representation, which builds on a two-layer bidirectional language model (LM) with character-level convolutions.
Zhai et al. 86 compared LSTM-based and CNN-based character embeddings in BiLSTM-CRF models for recognizing chemical and disease entities. With either type of character-level embedding, the BiLSTM-CRF models demonstrate performance comparable to the SOTA on the Chemical–Disease Relation (CDR) corpus from BioCreative, while reducing the training complexity of biomedical NER (B-NER). Kuru 83 proposed stacked bidirectional LSTMs that take characters as input and output tag probabilities for each character. A Viterbi decoder then converts these probabilities into consistent NE tags.
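To show how a Viterbi decoder turns per-position scores into a consistent tag sequence, here is a minimal sketch over log-scores. The tag set, the emission scores, and the single penalized transition (O followed by I) are illustrative assumptions, not values from any cited system.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag path under additive log-scores."""
    # best[t][tag] holds the best score of any path ending in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


tags = ["B", "I", "O"]
# All transitions are neutral except O -> I, which is heavily penalized.
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["I"] = -100.0
emissions = [
    {"B": -1.0, "I": -2.0, "O": -0.5},
    {"B": -1.5, "I": -0.4, "O": -1.0},
    {"B": -2.0, "I": -1.5, "O": -0.3},
]
greedy = [max(tags, key=lambda tag: e[tag]) for e in emissions]
path = viterbi(emissions, transitions, tags)
print("greedy:", greedy)
print("viterbi:", path)
```

In this toy case the per-position argmax yields an I tag directly after an O tag, while the Viterbi path avoids that inconsistent transition.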
Kim et al. 87 proposed a Korean NER model extended with a sub-character-level representation, jamo, enabling more effective capture of the language’s unique syntactic structure and rich morphological variations. Bojanowski et al. 88 proposed two alternatives to the classical RNN model. In the first, the character-level representation is conditioned on the previous word representation. In the second, the output probability is conditioned on the character history. Word-level representations 92 are generally pretrained on large text collections using unsupervised algorithms such as the continuous bag-of-words and continuous skip-gram models.89–91 A recent study 93 has shown the importance of DL techniques in NER. Lin et al. 94 presented a hierarchical attention neural semi-Markov conditional random field (semi-CRF) model that performs sequence labeling using a segmental recurrent neural network (ASRNN). Wang et al. 95 proposed an integrated framework that leverages word embedding clustering along with CNNs for expanding short texts. Qiu et al. 96 proposed a CRF with a residual dilated (RD) CNN (RD-CNN-CRF). Additionally, fastText 97 and GloVe 98 are widely used in NER applications. An example of a word embedding for a sentence from an ECR instance is shown in Figure 5.

The architecture of RNN-based context encoder for input representation, computing two hidden state features for each node. RNN, recurrent neural network.
More specifically, in the proposed model, RoBERTa (Robustly Optimized BERT Pretraining Approach) is combined with an LSTM to examine sentiments over time. RoBERTa maps words into a compact, meaningful embedding space, and the LSTM effectively captures long-distance contextual semantics. Thus, this survey categorizes these context-sensitive language-model embeddings as hybrid representations.
Context encoder architectures
In this section, we examine common context encoder architectures, such as the recurrent neural network (RNN).
In neural LMs, word representations are distributed to reduce the curse of dimensionality, which is a problem with n-gram LMs. When words are represented as real-valued feature vectors, the cosine similarity between the feature vectors of semantically similar words is high. Formally the word
As a result of calculating the hidden state
Usually, the word embedding input and output (
Numerous sequence labeling tasks have been empirically shown to benefit from language-model-augmented knowledge. Peters et al. 100 proposed an LM-augmented sequence tagger (TagLM). TagLM works in three steps: pretrain word embeddings and an LM on unlabeled corpora, extract the word and LM embeddings for each token, and use them in a supervised sequence tagging model. Every token in the input sequence is thus represented with both pretrained word embeddings and bidirectional LM embeddings for sequence labeling, as shown in Figure 6.

The forward and backward-LSTM-CRF architecture model combined with sample of characters and words. LSTM, long short-term memory; CRF, conditional random field.
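The idea of computing one forward and one backward hidden state per token can be illustrated with a toy recurrent update. This sketch uses a plain tanh (Elman-style) cell with scalar weights shared across dimensions rather than an LSTM with weight matrices; the weights and inputs are arbitrary illustrative values.

```python
import math
from typing import List

Vector = List[float]


def rnn_step(x: Vector, h_prev: Vector, w_in: float, w_rec: float) -> Vector:
    """Toy update: h_t = tanh(w_in * x_t + w_rec * h_{t-1}), applied per dimension."""
    return [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h_prev)]


def run_rnn(inputs: List[Vector], w_in: float, w_rec: float) -> List[Vector]:
    """Run the toy cell left to right and return the hidden state at each token."""
    h = [0.0] * len(inputs[0])
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_in, w_rec)
        states.append(h)
    return states


def bidirectional_states(inputs: List[Vector], w_in: float = 0.5,
                         w_rec: float = 0.8) -> List[Vector]:
    """Concatenate forward and backward hidden states for every token."""
    forward = run_rnn(inputs, w_in, w_rec)
    backward = run_rnn(inputs[::-1], w_in, w_rec)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Three tokens, each with an arbitrary two-dimensional input embedding.
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = bidirectional_states(inputs)
for state in states:
    print([round(v, 3) for v in state])
```

Each token ends up with a vector whose first half summarizes the left context and whose second half summarizes the right context, which is what a tag decoder such as a CRF would consume.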
Experiments and Results
Configuring latency benchmarks
To support the low-latency claim of our TCM-NER pipeline, we performed end-to-end inference profiling on both GPU and CPU, explicitly separating tokenization from model forward passes. Unless otherwise specified, experiments were performed using BioLinkBERT-base with a maximum sequence length (L), a batch size of 16, the Adam optimizer, and a learning rate of 1e-3. We report the minimum (best-case), median, and 95th percentile (p95) latencies over repeated iterations, as well as the throughput measured in sentences per second. These configurations confirm that the system can achieve a per-sentence latency below the target (e.g., 20 ms) on commodity hardware such as the GTX 1070, while remaining efficient on the CPU, thus enabling real-time or near-real-time biomedical NER applications.
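The reported statistics (minimum, median, p95, and throughput) can be collected with a small timing harness like the one sketched below. The `benchmark` helper and the whitespace tokenizer used as the workload are hypothetical stand-ins; the actual profiling code and model calls are not given in the text.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def benchmark(fn: Callable[[List[str]], object], batch: List[str],
              warmup: int = 3, iters: int = 50) -> Dict[str, float]:
    """Time fn(batch) repeatedly and summarize latency (ms) and throughput."""
    for _ in range(warmup):  # warm-up runs are excluded from the statistics
        fn(batch)
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(batch)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    # Nearest-rank 95th percentile.
    p95_index = min(len(latencies_ms) - 1, math.ceil(0.95 * len(latencies_ms)) - 1)
    total_s = sum(latencies_ms) / 1000.0
    return {
        "min_ms": latencies_ms[0],
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "sentences_per_s": len(batch) * iters / total_s if total_s > 0 else float("inf"),
    }


def fake_tokenize(batch: List[str]) -> List[List[str]]:
    """Stand-in workload (whitespace split); swap in a real tokenizer or model call."""
    return [sentence.split() for sentence in batch]


batch = ["patient received ginseng decoction for fourteen days"] * 16
stats = benchmark(fake_tokenize, batch)
print(stats)
```

Timing the tokenizer and the model forward pass with separate calls to such a harness keeps the two costs apart. When timing GPU code, the device usually has to be synchronized before each clock read, otherwise asynchronous kernels make the latencies look too small.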
Evaluation metrics
To rigorously evaluate the performance of the proposed models, we used standard evaluation metrics from information retrieval and classification tasks, namely accuracy (Acc), precision (Pre), recall (Rec), and F1-score. Precision measures the proportion of correctly identified entities among all entities predicted by the model, thus indicating its ability to reduce false positives. Recall quantifies the proportion of true entities in the dataset that were successfully identified, characterizing the model’s sensitivity and its ability to reduce false negatives. Both precision and recall are important in biomedical text mining, 66 where ignoring a relevant medical entity or introducing spurious terms can be equally problematic. We therefore report the F1-score, the harmonic mean of precision and recall, which provides a balanced metric for overall performance. These metrics are particularly suitable for the evaluation of NER systems and text classifiers in biomedical domains, where class distributions are often unbalanced and single metrics such as accuracy can be misleading.
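For reference, entity-level precision, recall, and F1 can be computed from sets of (start, end, type) spans as sketched below. Exact-match scoring is assumed here, and the spans and label names are hypothetical examples rather than data from the study.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]


def prf1(gold: Set[Span], pred: Set[Span]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold and predicted spans for one document.
gold = {(0, 1, "HERB"), (4, 4, "DISEASE"), (7, 9, "SYNDROME")}
pred = {(0, 1, "HERB"), (4, 4, "SYNDROME"), (7, 9, "SYNDROME"), (11, 11, "HERB")}
scores = prf1(gold, pred)
print(scores)
```

Here two of four predictions are correct (precision 0.5) and two of three gold entities are found (recall about 0.67), so F1 is 4/7, roughly 0.57.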
Baseline
LSTMs 86 are a type of RNN designed to capture long-range dependencies in sequential data by mitigating the vanishing gradient problem. CNNs 31 are effective for extracting local patterns and hierarchical features from text sequences through convolution and pooling operations. BiLSTM extends LSTM by processing sequences in both forward and backward directions, enabling the model to capture past and future context simultaneously. CRFs 101 are probabilistic models widely used for structured prediction tasks like sequence labeling, ensuring optimal label dependencies across tokens. SVMs 39 are supervised learning algorithms that classify data by finding the optimal hyperplane that maximizes the margin between classes. The BioLinkBERT 102 model was fine-tuned into a prediction model using a question–answering formulation of the task.
Performance comparison
Beyond representations at the word and character levels, some studies also include information at the level of the whole sentence. Span-Based Label Classification (SBLC) is a hybrid model for disease NER that uses semantic bidirectional LSTMs 103 and CRFs, 101 while semi-Markov conditional random fields (SCRFs) are hybrid semi-CRFs for neural sequence labeling. 104 The hybrid representation combines feature-based representation with DL-based representation. Adding such additional information can improve TCM-NER performance, but it may harm the generality of the model. NER accuracy can be improved using a hybrid DL approach. 105
Yao et al. 106 proposed a biomedical NER method based on a multilayer deep neural network. In each layer, features are abstracted from the features generated by the previous layers. The F-score under fivefold cross-validation was 71.01%, which illustrates that DL can be applied to biomedical NER effectively using just part-of-speech (POS) features. Recently, hybrid DL models based on transformers, namely robustly optimized BERT (Bidirectional Encoder Representations from Transformers) and BioLinkBERT, 102 have been used to combine the strengths of sequence models while offsetting their limitations.
Table 8 compares the performance of several models on the ACE-2004 dataset using four standard evaluation metrics: F1-score, precision, recall, and accuracy. The results show a general progression from the baseline methods to the more advanced transformer-based model. Among the baseline models, LSTMs achieved the lowest overall performance, with an F1-score of 32.73%, indicating a limited ability to capture long-range dependencies in this dataset. CRFs performed better, with an F1 of 41.46%, benefiting from their strength in sequence labeling but still showing limited precision and recall. Biomedical NER models improved substantially, reaching an F1 of 49.48%, which reflects the benefit of domain adaptation in handling biomedical terminology. SVMs achieved a slightly higher F1 of 50.03% but suffered from lower precision (44.56%), suggesting misclassification issues despite reasonable accuracy. The best performance came from BioLinkBERT, which achieved the highest scores across all metrics, with an F1 of 55.36%, precision of 53.44%, recall of 54.89%, and accuracy of 57.13%. This suggests that transformer-based models, when fine-tuned for domain-specific tasks, outperform both traditional machine learning and earlier DL models by capturing context and semantic nuances more effectively. Overall, the results highlight the value of modern transformer architectures such as BioLinkBERT for TCM-NER tasks, as they offer a better balance of precision, recall, and accuracy, together with greater robustness, than traditional methods.
The evaluation performance of the models on the ACE-2004 dataset
LSTM, long short-term memory; CRF, conditional random field; NER, named entity recognition; SVM, support vector machines.
Error analysis
To illuminate the limitations of the model and guide future improvements, we combined a short quantitative error analysis with a qualitative inspection of our TCM-NER sequence-labeling output. 6 Quantitatively, we report span-level and token-level confusion metrics, per-class precision/recall/F1, and boundary-specific F1 (BIOES) to separate labeling errors from segmentation errors. We further stratify errors by entity length and frequency to reveal long-tail effects. Qualitatively, we examine representative misclassifications from the ACE-2004 (Chinese) test split: (1) boundary slips in nested mentions (e.g., procedures embedded in imaging phrases), (2) type confusions between semantically close classes (e.g., ORG/hospital vs. ORG/treatment institution; modality vs. device), (3) abbreviation and synonym variation (e.g., ultrasound/HIFU/LIPUS), and (4) out-of-vocabulary herbal/compound names or duplications. We also analyze cross-sentence context failures, where the cues needed for disambiguation lie outside the immediate window. This analysis, included with examples in the section “Therapies in Cancer Treatment Method,” shows the main sources of false positives (span overextension and modality/device conflation) and false negatives (rare, long, or nested entities), and it suggests concrete goals: better span decoders, lexicon enhancements, and context expansion.
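As an illustration of how span-level scoring can separate labeling errors from segmentation errors, the following minimal sketch decodes BIOES tag sequences into spans and sorts each wrong prediction into a type confusion, a boundary slip, or a spurious span. The function names, the error buckets, and the HERB/ORG/DEV labels are assumptions made for this example; they are not the evaluation code or label set used in the study.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, inclusive end index, entity type)


def bioes_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIOES tag sequence into a set of typed spans."""
    spans: Set[Span] = set()
    start, cur_type = None, None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":                      # single-token entity
            spans.add((i, i, etype))
            start, cur_type = None, None
        elif prefix == "B":                    # open a new multi-token entity
            start, cur_type = i, etype
        elif prefix == "I":                    # continue; drop ill-formed runs
            if start is None or etype != cur_type:
                start, cur_type = None, None
        elif prefix == "E":                    # close the entity if well formed
            if start is not None and etype == cur_type:
                spans.add((start, i, etype))
            start, cur_type = None, None
        else:                                  # "O" resets any open entity
            start, cur_type = None, None
    return spans


def analyze_errors(gold_tags: List[str], pred_tags: List[str]) -> Dict[str, float]:
    """Span-level precision/recall/F1 plus a coarse breakdown of wrong spans."""
    gold, pred = bioes_to_spans(gold_tags), bioes_to_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0

    gold_bounds = {(s, e) for s, e, _ in gold}
    type_confusions = boundary_slips = spurious = 0
    for s, e, _ in pred - gold:
        if (s, e) in gold_bounds:              # right boundaries, wrong type
            type_confusions += 1
        elif any(s <= ge and gs <= e for gs, ge, _ in gold):
            boundary_slips += 1                # overlaps a gold span
        else:
            spurious += 1                      # no gold span nearby
    return {
        "precision": precision, "recall": recall, "f1": f1,
        "type_confusions": type_confusions,
        "boundary_slips": boundary_slips,
        "spurious": spurious,
        "missed": len(gold - pred),
    }


# Hypothetical example: one correct span, one type confusion, one boundary slip.
gold = ["B-HERB", "E-HERB", "O", "S-ORG", "O", "B-DEV", "I-DEV", "E-DEV"]
pred = ["B-HERB", "E-HERB", "O", "S-DEV", "O", "O", "B-DEV", "E-DEV"]
report = analyze_errors(gold, pred)
print(report)
```

Because BIOES tags describe flat spans only, a sketch like this cannot score nested mentions directly; those would need a span-based representation.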
Ablation study
The ablation study illustrates how different components of the proposed framework contribute to overall performance, as shown in Figure 7. Removing boundary constraints (“W/o boundary”) leads to moderate drops in F1, precision, recall, and accuracy, highlighting their role in improving entity span detection. Similarly, eliminating filtering (“W/o filter”) and prototype modules (“W/o prototype”) further reduces scores across all metrics, suggesting that these components enhance robustness and reduce noise in predictions. By contrast, the full insight model consistently achieves the highest performance, surpassing all ablated variants and confirming the complementary effect of integrating boundary detection, filtering, and prototype mechanisms. These results demonstrate that each module contributes incrementally, but their combination is critical for achieving state-of-the-art performance in TCM-NER.

Overall performance of the proposed model over multiple strategies.
Discussion
This discussion raises several issues necessary for interpreting our TCM-NER results and shaping future work. 21 First, our experiments rely on the Chinese portion of ACE-2004, a reliable benchmark for sequence labeling that is not purpose-built for TCM medical narratives. Its genre mix (e.g., newswire, broadcast) deviates from ECRs, potentially contributing to boundary slips and type confusion when entities exhibit clinical morphology (e.g., natural herbal bioactive constituents, procedure–device composites). 26 To better assess deployment effectiveness in oncology workflows, future evaluations should include TCM-specific corpora annotated with clinical entity schemas (herb, formula, syndrome, treatment law, manifestation).107,108 Second, standard biomedical resources under-represent TCM terminology, polysemy, and classical phrases. Our domain-specific fine-tuning partially addresses this, but two challenges remain: sparse coverage of long-tail herbs and regional transcriptions, and the conceptual gap between classical text and contemporary ECR style. Collecting incremental domain corpora (clinic notes, prescriptions, discharge summaries) and continuing pretraining with vocabulary expansion (e.g., dedicated sub-word units for long herb n-grams) should reduce type confusion and span truncation. Third, since high-quality TCM annotation is expensive, we advocate augmenting our semi-supervised pipeline with TCMLS/UMLS dictionaries, herbal index tables, and KBQA patterns under distant supervision, complemented by confidence-based self-training that emphasizes rare subtypes and includes correctness and consistency checks to manage the label noise introduced by weak signals.
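One way to picture confidence-based self-training that emphasizes rare subtypes is a pseudo-label filter with a lower acceptance threshold for rare entity types. The sketch below is only an assumption about how such a filter could look: the `Candidate` class, the threshold values, and the choice of which types count as rare are all hypothetical and are not taken from the pipeline described here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    """A model-predicted entity on unlabeled text (hypothetical structure)."""
    text: str
    entity_type: str
    confidence: float  # model probability for the predicted span and type


def select_pseudo_labels(
    candidates: List[Candidate],
    rare_types: FrozenSet[str],
    base_threshold: float = 0.90,   # assumed value, to be tuned on held-out data
    rare_threshold: float = 0.75,   # assumed value; lower so rare types survive
) -> List[Candidate]:
    """Keep predictions confident enough to be reused as training labels."""
    selected = []
    for cand in candidates:
        threshold = rare_threshold if cand.entity_type in rare_types else base_threshold
        if cand.confidence >= threshold:
            selected.append(cand)
    return selected


# Hypothetical predictions on unlabeled clinical text.
pool = [
    Candidate("herb_a", "HERB", 0.95),        # kept: above base threshold
    Candidate("herb_b", "HERB", 0.80),        # dropped: common type, too uncertain
    Candidate("formula_a", "FORMULA", 0.80),  # kept: rare type, relaxed threshold
    Candidate("formula_b", "FORMULA", 0.60),  # dropped: below both thresholds
]
kept = select_pseudo_labels(pool, rare_types=frozenset({"FORMULA"}))
print([c.text for c in kept])
```

Relaxing the threshold for rare types trades some label noise for coverage, so in practice it would be paired with the consistency checks mentioned above.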
In addition, data anonymization involves removing or generalizing personally identifiable information, such as names or IDs, so that individual patients cannot be re-identified from clinical records. Entity masking is accomplished by replacing sensitive terms, such as patient names, addresses, or medical record numbers, with placeholder tokens (e.g., <NAME>, <ID>), thereby preserving the structure of the text for analysis. In parallel, edge computing-based local inference ensures privacy by processing clinical text directly on local or hospital servers rather than transferring it to external clouds, keeping sensitive health data within secure boundaries.
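A minimal sketch of entity masking with placeholder tokens is shown below. It assumes that patient names have already been identified (for example, by an upstream NER step) and that record numbers follow a hypothetical `MRN-` pattern; real identifiers vary by institution, so both the pattern and the function name are illustrative only.

```python
import re
from typing import Iterable

# Hypothetical record-number format: "MRN" followed by 6 to 10 digits.
ID_PATTERN = re.compile(r"\bMRN[-:\s]?\d{6,10}\b")


def mask_entities(text: str, names: Iterable[str]) -> str:
    """Replace record numbers and known names with placeholder tokens."""
    masked = ID_PATTERN.sub("<ID>", text)
    # Mask longer names first so a short name never splits a longer one.
    for name in sorted(names, key=len, reverse=True):
        if name:  # skip empty strings, which would match everywhere
            masked = re.sub(re.escape(name), "<NAME>", masked)
    return masked


note = "Patient Li Wei (MRN-00123456) was admitted for follow-up."
print(mask_entities(note, ["Li Wei"]))
```

The sentence structure around the placeholders is preserved, which is what allows the masked text to remain usable for downstream analysis.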
In the context of TCM, the application of AI-based data mining techniques has become increasingly necessary to process large amounts of unstructured text data derived from classical medical literature, medical records, and herbal prescriptions. Advanced NLP models, including transformer-based architectures such as BERT and LLaMA (Large Language Model Meta AI), can automatically identify and extract key entities, such as herbs, symptoms, syndromes, and prescriptions, and transform narrative TCM texts into structured, machine-readable formats. To support the accuracy and reliability of these AI-based solutions, formal validation and verification techniques can be used. These techniques include rule-based consistency verification, ontology alignment, and constraint-based validation to check that the extracted data is consistent with the conceptual framework and diagnostic principles of TCM. Furthermore, model validation and logical reasoning methods can be integrated to verify that AI reasoning processes follow the hierarchical relationships between symptoms, diagnosis, and treatment within TCM. By integrating machine learning with formal methods, the proposed framework is intended to preserve semantic accuracy and interpretability, reducing the risk of data inconsistency or misclassification. This approach not only improves the reliability of AI-assisted TCM data mining but also lays a transparent foundation for the development of intelligent, evidence-based decision support systems in the field of integrative medicine.
Implications, Challenges, and Future Directions
Research implications
This study advances the theoretical understanding of NER within TCM by demonstrating how DL models, particularly BioLinkBERT integrated with probabilistic frameworks, can adapt to domain-specific language. Unlike prior approaches that rely heavily on handcrafted features or limited biomedical corpora, our method systematically captures linguistic nuances of TCM terminology, including herbs, syndromes, and treatment laws. From a theoretical perspective, this work contributes to the broader NLP literature by showing that domain adaptation, semi-supervised signals, and hybrid embeddings can collectively enhance entity recognition in low-resource and culturally specific domains. Scholars and researchers in computational linguistics, biomedical informatics, and digital health will benefit from these insights as they provide a foundation for extending language models to other specialized medical subfields.
On the practical side, the findings highlight how TCM-NER can directly support oncology and clinical workflows by extracting critical information—such as herbs, treatment modalities, and patient manifestations—from ECRs. By achieving low-latency performance and reliable entity recognition, the system can be integrated into hospital information systems, decision-support platforms, and digital health applications to assist practitioners in diagnosis, therapy planning, and monitoring. Patients and health care providers stand to benefit through improved efficiency, reduced manual documentation effort, and more accurate tracking of cancer therapies involving Chinese herbs. 109 Additionally, policymakers and health IT developers can leverage this framework to design culturally sensitive, AI-driven solutions that align with both modern medical practices and traditional knowledge systems.
Research challenges
The DL-based TCM-NER tag and context representations (decoders/encoders) are discussed in the section “Future Directions” of this article. Various pretrained embeddings for language modeling, such as Google Word2vec, GloVe, and more recent pretrained language models, have been employed in DL-based TCM-NER over the years. This allows us to revisit the challenges and potential directions of TCM-NER without the need for intricate feature engineering. Data annotation remains a significant requirement for supervised TCM-NER systems, including DL-based approaches. Despite technological advancements, annotating data remains a time-intensive and costly endeavor. Maintaining both the quality and consistency of annotations is crucial to tackle language ambiguity. It is important to note that a patient’s historical diagnoses serve as seeds that integrate into the comorbidity network for that patient. This network iteratively expands the latent risk across potential diseases that might affect the patient. Remote diagnoses can pose challenges in distinguishing relevant information from noise. 110
Data-driven approaches often use temporal features from ECRs to predict disease, readmission time, and patient diagnosis. Because labels are missing for some temporal events in supervised training, and because generic interpretability must be provided at the same time, existing methods are difficult to apply. 111 Integrating TCM multi-omics ECR data is also challenging, because noise in the integrated data is a major cause of overfitting and poor generalization. 112 Inconsistent data annotation means that a model trained on one dataset may not perform well on another, even when the documents in both datasets come from the same domain.
TCM uses natural-language Chinese to record scientific procedures, as illustrated in Chinese medical records for COVID-19 prevention. 109 The TCM-NER task and the dataset used here are purely in Chinese; two techniques can be applied to recognize named entities in this setting. TCM-NER systems often handle user-generated text in many application scenarios, such as when a patient is prescribed Chinese herbs such as Cordyceps sinensis, which has been used in TCM for centuries to treat various ailments. 32 Their purported health benefits have made such herbs increasingly popular in smart cities. In ECR data collection, a flexibly structured narrative is preferred because it balances structured data entry with free-text entry. In labor-intensive clinical environments, TCM practitioners find it genuinely difficult to produce high-quality structured data.
Future directions
As the field of language modeling continues to advance and real-world applications expand, researchers are expected to devote greater attention to TCM-NER. TCM-NER is typically applied as a preliminary step before downstream processing. The specific nature of named entities and the presence of nested entities determine the scope of a TCM-NER task. The following research directions for TCM-NER are derived from the studies surveyed in this article. One challenge arises when a named entity (NE) is allowed to have multiple NE types, which is particularly evident in fine-grained NER. TCM-NER methods commonly adopt BIOES tagging for context encoding and decoding.110,111 Detecting NE boundaries should be treated as a distinct task, independent of NE types. Decoupling NE-type classification from boundary detection makes it possible to share robust boundary detection approaches across domains while using domain-specific strategies for classification.
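To make the decoupling concrete, the short sketch below encodes entity spans as BIOES tags and then splits each tag into a type-independent boundary label and a separate type label. The helper names and the HERB/SYNDROME labels are hypothetical, chosen only to illustrate the idea; the sketch handles flat spans and does not cover nested entities.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start index, inclusive end index, entity type)


def spans_to_bioes(length: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping spans as a BIOES tag sequence."""
    tags = ["O"] * length
    for start, end, etype in spans:
        if start == end:
            tags[start] = f"S-{etype}"          # single-token entity
        else:
            tags[start] = f"B-{etype}"          # beginning
            for i in range(start + 1, end):
                tags[i] = f"I-{etype}"          # inside
            tags[end] = f"E-{etype}"            # end
    return tags


def split_boundary_and_type(
    tags: List[str],
) -> Tuple[List[str], List[Optional[str]]]:
    """Separate boundary labels (B/I/E/S/O) from entity-type labels."""
    boundaries = [tag.split("-", 1)[0] for tag in tags]
    types = [tag.split("-", 1)[1] if "-" in tag else None for tag in tags]
    return boundaries, types


# Hypothetical five-token sentence with a three-token herb and a one-token syndrome.
tags = spans_to_bioes(5, [(0, 2, "HERB"), (4, 4, "SYNDROME")])
boundaries, types = split_boundary_and_type(tags)
print(tags)
print(boundaries)
print(types)
```

The boundary sequence contains no domain-specific labels, so a boundary detector trained on it could in principle be shared across domains, while the type sequence is left to a domain-specific classifier.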
This aspect of research holds promise for further exploration in the future. As part of a pipeline framework, TCM-NER and entity linking are typically treated as separate tasks. Improving the scalability of neural TCM-NER models remains a challenge. 113 Recommender systems play a crucial role in managing information overload. Information filtering will continue to be integral for personalization as long as recommender systems research remains active. Despite the significant progress and promising outcomes demonstrated by DL, there is still room for improvement, particularly in aspects like accuracy and scalability. While some DL-based TCM-NER models achieve strong performance with substantial computational resources, addressing processing power costs remains a notable concern.
Conclusions
This study provides a comprehensive exploration of real-time NER systems applied to multilingual speech-to-text data in the context of cancer treatment, with a particular focus on TCM in ECRs. With the accelerating integration of artificial intelligence and digital health, particularly in multilingual and multimodal environments, the development of low-latency neural architectures has become critical to ensure timely and accurate IE from spoken medical content. By leveraging DL methods, BIOES-labeled datasets, and encoder–decoder architectures, we demonstrate how modern NER systems can be adapted to handle the complexities of multilingual speech input and the precise terminology of TCM. The study also emphasizes the importance of domain-specific resources and tools to enhance the accuracy and applicability of NER in clinical settings, particularly in situations where speech-based data is increasingly available.
Authors’ Contributions
P.N.A.: Conceptualization, data curation, methodology, resources, software, visualization, writing—review and editing; M.S.A.: Formal analysis, funding acquisition, project administration, resources, supervision, writing—original draft, writing—review and editing; S.M.: Conceptualization, data curation, formal analysis, validation, writing—review and editing; A.U.R.: Conceptualization, data curation, methodology, software, validation, visualization; M.Z.: Formal analysis, investigation, methodology, software, visualization.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
Ethical Statement
The authors acknowledge that the use of traditional Chinese medicine (TCM) data carries cultural and ethical considerations. All analyses in this study are conducted with respect for culturally embedded medical knowledge, and the authors emphasize that extracted insights are intended solely for research purposes. Responsible use requires safeguarding patient privacy, ensuring proper attribution of traditional knowledge, and avoiding any misrepresentation of TCM practices.
