Abstract
Introduction
According to global cancer statistics for 2002, liver cancer was the sixth most common cancer worldwide and the third most common cause of cancer death. 1 According to cancer statistics for the United States published in 2010, the death rate from liver cancer increased in both men and women between 1990–1991 and 2006. 2 Hepatocellular carcinoma (HCC), the most common primary liver cancer, 3–5 has been the leading cause of cancer death in Taiwan since 1984. 6
Tracking a patient's condition over time and collecting the clinical features relevant to HCC both depend on the clinical data accumulated in narrative reports. Analyzing these data efficiently requires an automated method for information extraction (IE).
During the past two decades, researchers have successfully applied natural language processing to medical narrative reports for case finding, case identification, and IE for specific diseases and topics (e.g., heart failure, 7,8 diabetes, 9 hypertension, 10 smoking, 11–13 and cancer 14–17), for identifying and detecting problems 18 and adverse events, 19,20 for acquiring relations (e.g., disease manifestation-related symptoms and adverse drug events 21), and for assessing the quality of care, 22 among others.
The goal of this work was to report liver cancer patients' statuses from different types of narrative reports in order to facilitate clinical research at three levels: (1) the single-report level identified the major target information and its related attributes in each report; (2) the cross-report temporal level arranged the extracted results by temporal order and report type, so that the variation of each observed target over time, or the status before or after a specific time point, could be presented; and (3) the summarized level answered whether a patient's status met the criteria specified in the clinical research.
Materials and Methods
The proposed method had two phases for tracking patient disease progression over time and for answering whether the patients met the specific criteria of clinical research from the mixture of narrative reports. The overview of this method is shown in Figure 1. In the first phase, the IE module was developed for extracting the concepts related to patient disease from narrative reports. These extracted concepts were further ordered and arranged for grouping the duplicated information and presenting the variances of patient status over time. In the second phase, the rule-based classifier was developed for answering three clinical questions according to the arranged information.

Figure 1. The overview of this method: the information extraction module phase and the rule-based classifier phase. HCC, hepatocellular carcinoma.
Subjects
The narrative reports of 152 patients receiving ultrasound-guided radiofrequency ablation in National Taiwan University Hospital (Taipei, Taiwan) between 2007 and 2009 were involved in the development and evaluation of this method. The development set contained the data of 74 patients who received radiofrequency ablation between 2007 and 2008. This dataset was used to develop the IE module and rule-based classifier. The testing set came from the data of 78 patients who received radiofrequency ablation in 2009. The narrative reports in the testing set were not previously reviewed or analyzed by system developers. This dataset was used to evaluate the proposed method. Different types of reports were produced by different groups of clinicians. For example, radiology reports were produced by about five radiologists specializing in abdomen interpretation. Ultrasound reports were produced by about 15 trained gastroenterology/hepatology specialists. The discharge summaries and admission notes were produced by supervised attending physicians with after-review of trained residents. Operation notes were produced by about five liver surgeons.
The time period covered by each patient's reports in the development set ranged from 0.13 years to 10 years (a range of 9.87 years), and the mean±standard deviation interval was 5.63±3.41 years. The average number of reports per patient was 36. The time period covered by each patient's reports in the testing set ranged from 0.15 years to 10 years (a range of 9.85 years), and the mean±standard deviation interval was 5.55±3.58 years. The average number of reports per patient was 30.
In total, 759 reports were used to evaluate concept identification. All of the patients' personal reports (n=2,351 reports) were used to evaluate the classification of personal summarized status. The distribution of report types was as follows: radiology (53%), ultrasound (24%), discharge (14%), pathology (5%), admission (2%), and operation (2%).
In this environment, the computerized patient records include structured data (such as laboratory test results) and unstructured data (such as narrative reports). This study addressed only the narrative reports; structured data sources such as laboratory test results, which can be analyzed statistically, were outside its scope.
Concept Model
Concepts
Following the approach proposed by Friedman et al., 23 a liver cancer concept model was defined to assist in the IE. In this work, the important clinical factors were collected from existing research, including cancer diagnosis, cancer staging, tumor information, comorbidity diagnosis, treatment, and recurrent status. 24–26 These clinical factors were regarded as the major concepts in the study.
To identify the related information of the major concept, each major concept had a set of related concepts for specifying its relevant information. Temporal information and report types were the related concepts for tracking the disease progression and confirming the examination sources. Table 1 shows these major concepts and a listing of their related concepts.
Table 1. The Major Concepts and Their Related Concepts
BCLC, Barcelona Clinic Liver Cancer; HCC, hepatocellular carcinoma; RFA, radiofrequency ablation; TACE, transcatheter arterial chemoembolization.
Expressions
For identifying the concept from clinical texts, each concept has its corresponding set of regular expressions for matching the different expressions of concepts. The developers previously reviewed the reports in the development set and manually identified the expressions of concepts, including synonyms, typical and atypical abbreviations, common misspellings, etc. Table 2 shows examples of expressions for the concepts.
Table 2. The Textual Expression Examples of Different Concepts
MR, magnetic resonance; PEI, percutaneous ethanol injection; PMCT, percutaneous microwave coagulation therapy; TACE, transcatheter arterial chemoembolization.
IE module (first phase): identification of major and related concepts
The hot-spotting technique was used to identify the locations of the major concepts of interest in the clinical text (e.g., the expressions of tumor: tumor, lesion, nodule, mass, etc.). These locations served as anchors for searching the surrounding text for related information (e.g., the location and size of the tumor). After the major concepts were located in the text, their related concepts were captured within a predefined window. The regular expressions defined in the concept model were used to capture the texts and values relevant to the major concepts and their related concepts.
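The hot-spotting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the patterns `TUMOR`, `SIZE`, and `SEGMENT`, the 40-character `WINDOW`, and the function name `extract_tumors` are all assumptions made for the example, since the paper's actual regular expressions and window size are not given in the text.

```python
import re

# Hypothetical patterns; the paper's real expressions live in its concept model.
TUMOR = re.compile(r"\b(tumou?rs?|lesions?|nodules?|mass(?:es)?)\b", re.IGNORECASE)
SIZE = re.compile(r"(\d+(?:\.\d+)?)\s*(cm|mm)\b", re.IGNORECASE)
SEGMENT = re.compile(r"\b(?:S|segment\s*)(\d)\b", re.IGNORECASE)

WINDOW = 40  # characters searched on each side of a hot spot (assumed value)


def extract_tumors(text):
    """Locate tumor expressions, then search a window around each for attributes."""
    findings = []
    for hit in TUMOR.finditer(text):
        # The hot spot anchors the search for related concepts.
        start = max(0, hit.start() - WINDOW)
        end = min(len(text), hit.end() + WINDOW)
        context = text[start:end]
        size = SIZE.search(context)
        segment = SEGMENT.search(context)
        findings.append({
            "concept": "tumor",
            "expression": hit.group(0),
            "size": (float(size.group(1)), size.group(2).lower()) if size else None,
            "segment": int(segment.group(1)) if segment else None,
        })
    return findings


report = "A 2.3 cm hypoechoic nodule in S6 of the liver."
print(extract_tumors(report))
```

Because the search is anchored on the hot spot rather than on a full parse of the sentence, the same code works on ungrammatical fragments and on tabular lines, at the cost of occasionally binding an attribute that belongs to a neighboring finding.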
Binding of matched concepts
To map the relationships among multiple concepts extracted from the same document, or even the same sentence, a rule-based scheme was built for binding these concepts together. A set of rules was defined for retaining or filtering out the bound concepts based on the properties of these concepts. A value of positive indicated positively correlated information that was retained, and a value of negative indicated extraneous information that was filtered out. For temporal IE, this method focused on extracting temporal information explicitly stated in the reports. Different patterns of temporal information were encoded as regular expressions. The temporal information (year, month, and date) might appear together (e.g., “2011/10/27”) or separately (e.g., “In 2011,…,10/27”). If only partial temporal concepts were identified (e.g., “month/date”), the method searched the surrounding text for the other temporal information (e.g., “year”) and combined these concepts into one complete temporal concept (e.g., “year/month/date”).
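The completion of partial temporal concepts can be illustrated with a short sketch. It assumes that a missing year is taken from the nearest preceding phrase of the form "in YYYY" or "since YYYY"; that cue, the three patterns, and the function name `extract_dates` are hypothetical choices for the example and are not specified in the paper.

```python
import re

# Complete dates such as "2011/10/27".
FULL_DATE = re.compile(r"\b(\d{2,4})/(\d{1,2})/(\d{1,2})\b")
# Partial "month/date" that is not part of a longer slash-separated date.
PARTIAL_DATE = re.compile(r"(?<![\d/])(\d{1,2})/(\d{1,2})(?![\d/])")
# Assumed cue for a separately stated year, e.g. "In 2011".
YEAR = re.compile(r"\b(?:in|since)\s+(\d{4})\b", re.IGNORECASE)


def extract_dates(text):
    """Return (year, month, day) tuples, completing partial dates when possible."""
    dates = [tuple(int(g) for g in m.groups()) for m in FULL_DATE.finditer(text)]
    for m in PARTIAL_DATE.finditer(text):
        # Search the preceding text for the nearest explicitly stated year.
        years = YEAR.findall(text[:m.start()])
        if years:
            dates.append((int(years[-1]), int(m.group(1)), int(m.group(2))))
    return dates


print(extract_dates("Follow-up CT on 2011/10/27 showed no recurrence."))
print(extract_dates("In 2011, HCC was found. RFA was performed on 10/27."))
```

A partial date with no year in the preceding text is dropped in this sketch rather than guessed.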
Normalization
The process of normalization was used for the following tasks: (1) normalizing synonyms of the same concept, (2) normalizing the various extracted textual expressions that represented the same concept using the full name, typical abbreviations, and atypical abbreviations, (3) standardizing the units of numeric clinical measurements, (4) translating mixed-language text (English and Chinese) into English, and (5) standardizing two different types of temporal information, namely Anno Domini (A.D.) and the “year” used in Taiwan starting from 1911 A.D. (e.g., “100/10/27,” which uses the Taiwan “year,” was standardized as “2011/10/27”).
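Three of these normalization tasks (synonyms and abbreviations, units, and calendar years) can be sketched as below. The synonym table entries, the choice of centimeters as the target unit, and the rule that any year below 1000 is a Taiwan calendar year are assumptions made for illustration; only the 1911 offset follows from the “100/10/27” example in the text.

```python
import re

ROC_OFFSET = 1911  # Taiwan calendar year + 1911 = A.D. year (100 -> 2011)

# Hypothetical synonym table; the paper's real expression lists are not shown.
SYNONYMS = {
    "hcc": "hepatocellular carcinoma",
    "hepatoma": "hepatocellular carcinoma",
    "hepatocellular carcinoma": "hepatocellular carcinoma",
    "rfa": "radiofrequency ablation",
    "radiofrequency ablation": "radiofrequency ablation",
}


def normalize_term(expression):
    """Map a full name, abbreviation, or synonym to one canonical name."""
    key = re.sub(r"\s+", " ", expression.strip().lower())
    return SYNONYMS.get(key, key)


def normalize_size_cm(value, unit):
    """Express a tumor size in centimeters (assumed target unit)."""
    return round(value / 10, 2) if unit.lower() == "mm" else value


def normalize_date(year, month, day):
    """Convert a Taiwan calendar year to A.D.; assumes years below 1000 are Taiwan years."""
    if year < 1000:
        year += ROC_OFFSET
    return f"{year:04d}/{month:02d}/{day:02d}"


print(normalize_term(" HCC "))
print(normalize_size_cm(23, "mm"))
print(normalize_date(100, 10, 27))
```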
Sorting and grouping
The concepts extracted from the clinical text were sorted according to their temporal information. The grouping step then gathered clinical findings that originated from the same report but were mentioned again in other reports. For example, the status of liver cirrhosis was originally mentioned in the ultrasound report on 2009/5/1, and this finding was also mentioned in the later admission note and discharge summary. These clinical findings originating from the ultrasound report on 2009/5/1 (relevant to the diagnosis of liver cirrhosis) were grouped together.
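A minimal sketch of the sorting and grouping step follows. It assumes each extracted finding is a dictionary that records the date and type of the report in which the finding originated (`source_date`, `source_type`) separately from the report that mentions it (`mentioned_in`); these field names and the function `sort_and_group` are hypothetical.

```python
from collections import defaultdict


def sort_and_group(findings):
    """Group mentions of the same original finding, ordered by source date."""
    groups = defaultdict(list)
    for finding in findings:
        # Mentions sharing the same originating report and concept are duplicates.
        key = (finding["source_date"], finding["source_type"], finding["concept"])
        groups[key].append(finding)
    # ISO-formatted date strings sort chronologically.
    return [(key, groups[key]) for key in sorted(groups)]


findings = [
    {"concept": "liver cirrhosis", "source_date": "2009-05-01",
     "source_type": "ultrasound", "mentioned_in": "discharge summary"},
    {"concept": "liver cirrhosis", "source_date": "2009-05-01",
     "source_type": "ultrasound", "mentioned_in": "ultrasound"},
    {"concept": "tumor", "source_date": "2008-11-20",
     "source_type": "radiology", "mentioned_in": "radiology"},
    {"concept": "liver cirrhosis", "source_date": "2009-05-01",
     "source_type": "ultrasound", "mentioned_in": "admission note"},
]

for key, mentions in sort_and_group(findings):
    print(key, len(mentions))
```

In the example, the three mentions of cirrhosis collapse into a single group attributed to the ultrasound report, which is what keeps repeated statements from inflating the timeline.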
Rule-based classifier for summarizing the patient's status (second phase)
The following questions could be answered by the classifier:
1. HCC patient: Did the patient have HCC?
2. First treatment: Was the specific treatment the patient's first treatment for HCC?
3. Recurrent HCC: Did the patient have a recurrence after the specific treatment?
Examples of evidence for these questions are shown in Table 3. The classifier answered each question according to the direct or indirect evidence found in the sorted and grouped extraction results. Furthermore, report type-oriented properties were used during classification when concepts were extracted from more than one type of report. For example, for checking the confirmed diagnosis, the method would check the extracted concepts in the order of pathology, radiology, ultrasound, and other types of reports.
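The report-type ordering used for the confirmed diagnosis can be sketched as a small rule. The sketch assumes findings are dictionaries with hypothetical fields `concept`, `asserted`, `report_type`, and `date`; the actual rules and evidence patterns of the classifier are not reproduced here.

```python
# Order in which report types are consulted, as stated in the text;
# any other report type ranks after these three.
REPORT_PRIORITY = ["pathology", "radiology", "ultrasound"]


def rank(report_type):
    """Lower rank means the report type is checked earlier."""
    if report_type in REPORT_PRIORITY:
        return REPORT_PRIORITY.index(report_type)
    return len(REPORT_PRIORITY)


def confirmed_hcc(findings):
    """Answer 'Did the patient have HCC?' and return the supporting evidence."""
    candidates = [
        f for f in findings
        if f["concept"] == "hepatocellular carcinoma" and f["asserted"]
    ]
    if not candidates:
        return False, None
    # Prefer the highest-priority report type, then the earliest date.
    best = min(candidates, key=lambda f: (rank(f["report_type"]), f["date"]))
    return True, best


findings = [
    {"concept": "hepatocellular carcinoma", "asserted": True,
     "report_type": "ultrasound", "date": "2008-01-10"},
    {"concept": "hepatocellular carcinoma", "asserted": True,
     "report_type": "pathology", "date": "2008-02-01"},
    {"concept": "hepatocellular carcinoma", "asserted": False,
     "report_type": "radiology", "date": "2007-12-01"},
]

answer, evidence = confirmed_hcc(findings)
print(answer, evidence["report_type"], evidence["date"])
```

Returning the evidence alongside the answer mirrors the idea that clinicians review a brief answer plus its supporting sentence instead of every extracted result.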
Table 3. The Evidence Examples of a Patient's Summarization Status
CT, computed tomography; HCC, hepatocellular carcinoma; RFA, radiofrequency ablation.
Evaluation
Report annotation
Gold-standard annotations for reports and classifications for patients' personal summarized statuses were determined by annotators. Normally, each report was annotated by two annotators. For disagreements between two annotators, a third annotator adjudicated the annotations and classifications.
Inter-annotator agreement
The inter-annotator agreements (IAAs) for IE and classification were calculated between the annotated results of two annotators. In IE, the F-score was used for measuring IAA. 27 In the tasks of classification, Cohen's kappa 28–30 was used for measuring IAA because positive and negative cases of patients' classifications could be specified:

$$\kappa = \frac{P_a - P_e}{1 - P_e}$$

where $P_a$ was the relative observed agreement among the annotators and $P_e$ was the expected agreement due to chance:

$$P_a = \frac{TP + TN}{TP + FP + TN + FN}, \qquad P_e = \frac{(TP + FP)(TP + FN) + (TN + FN)(TN + FP)}{(TP + FP + TN + FN)^2}$$

where TP was true-positives, FP false-positives, TN true-negatives, and FN false-negatives.
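As a worked illustration of Cohen's kappa, the following sketch computes it from the four cells of a two-annotator agreement table, treating one annotator as the reference. The function name and the example counts are assumptions: the counts (75, 1, 2, 0) are not taken from the paper; they are one hypothetical table with a single disagreement among 78 patients that happens to reproduce the 79.37% kappa mentioned in the Discussion.

```python
def cohen_kappa(tp, fp, tn, fn):
    """Cohen's kappa from a 2x2 agreement table between two annotators."""
    n = tp + fp + tn + fn
    p_a = (tp + tn) / n  # observed agreement
    # Chance agreement: both say positive plus both say negative.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    if p_e == 1:
        return 1.0  # both annotators used a single class for every case
    return (p_a - p_e) / (1 - p_e)


# Hypothetical counts: 75 agreed positives, 2 agreed negatives, 1 disagreement.
kappa = cohen_kappa(tp=75, fp=1, tn=2, fn=0)
print(f"{kappa:.2%}")
```

The example shows why kappa is sensitive here: with very few negative cases, chance agreement is already about 94%, so a single disagreement lowers kappa far more than it lowers raw agreement (77/78).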
Evaluation metrics for IE module
For the evaluation of the proposed IE module, the precision, recall, and F-score were used. These metrics were frequently used for evaluating the methodology of IE. 31 –33
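The definitions are listed as follows:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F\text{-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$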
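For concept extraction, these metrics can be computed by comparing the set of extracted concepts with the gold-standard annotations. The sketch below assumes exact matching of (report, concept) pairs and uses the balanced F-score; the matching criterion, the function name, and the sample data are illustrative assumptions, not details from the paper.

```python
def precision_recall_f1(extracted, gold):
    """Precision, recall, and balanced F-score for exact-match extraction."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)   # extracted and annotated
    fp = len(extracted - gold)   # extracted but not annotated
    fn = len(gold - extracted)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical (report id, concept) pairs.
gold = {("r1", "tumor"), ("r1", "HCC"), ("r2", "RFA"), ("r2", "cirrhosis")}
extracted = {("r1", "tumor"), ("r1", "HCC"), ("r2", "RFA"), ("r2", "TACE")}

print(precision_recall_f1(extracted, gold))
```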
Evaluation metrics for classification
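The following metrics were used for evaluating the rule-based classification results:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad \text{PPV} = \frac{TP}{TP + FP}, \qquad \text{NPV} = \frac{TN}{TN + FN}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

where PPV was the positive predictive value and NPV the negative predictive value.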
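The classification metrics can be computed from a confusion table as in the sketch below. The function name and the example counts are assumptions: the counts are a hypothetical 78-patient table, not values from Table 7, chosen only because they show how a few false-negatives among a small number of negative cases affect NPV much more than accuracy.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, PPV, NPV, sensitivity, and specificity from a confusion table."""
    def ratio(numerator, denominator):
        # None marks a metric that is undefined for this table.
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical table for one clinical question over 78 patients.
metrics = classification_metrics(tp=61, fp=0, tn=14, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```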
Results
Tables 4 and 5 show the IAAs of the concept identification from the total of 759 reports and of the classification of 78 personal statuses. The human annotators achieved the following IAAs: from 93.26% to 99.04% (F-scores) for the six major concepts and the temporal and report type concepts, and from 71.83% to 79.37% (kappa values) for the classification of 78 patients' personal statuses.
Table 4. Inter-annotator Agreement for Information Extraction of Concept Entities
BCLC, Barcelona Clinic Liver Cancer; HCC, hepatocellular carcinoma.
Table 5. Inter-annotator Agreement for Classification
HCC, hepatocellular carcinoma.
The IE module achieved an F1-score from 92.40% to 99.59%, precision from 91.86% to 100%, and recall from 92.94% to 99.19% (Table 6). The rule-based classifier achieved an accuracy from 96.15% to 100%, PPV from 94.12% to 100%, NPV from 82.35% to 100%, sensitivity from 95.31% to 100%, and specificity from 95.56% to 100% (Table 7).
Table 6. Results for Evaluating the Information Extraction Module
BCLC, Barcelona Clinic Liver Cancer; HCC, hepatocellular carcinoma.
Table 7. Results for Evaluating the Rule-Based Classification
HCC, hepatocellular carcinoma; NPV, negative predictive value; PPV, positive predictive value.
Discussion
For resolving the problem of information overload for clinicians to review large amounts of patient records, previous investigations put efforts into the creation of concept and problem-oriented viewing of patient records. 34 –36
The proposed method reduced the problem of information overload 34 in three ways: first, it retrieved only the reports that included concepts relevant to liver cancer; second, it grouped duplicated clinical findings that originated from the same report but were mentioned in different reports; and third, it provided answers to the clinical questions, together with evidence sentences, using a rule-based classifier. To check a patient's personal summarized status, clinicians need only check the brief answers and evidence sentences instead of reviewing all of the extracted and grouped results.
The time spent by human reviewers and by the automated method (IE module and rule-based classifier) was compared. The human reviewers took an average of 34 min per patient, whereas the computer analyzed the reports of 6 patients per minute on average, that is, about 10 s per patient. The automated method therefore saves considerable cost compared with manually reviewing large quantities of patients' narrative reports and extracting the information by hand.
For each classification question, each patient has only one classification result, so this study had only 78 classification results per question. With so few cases, even a few disagreements can reduce the kappa score substantially. This might be the reason that the human IAAs were in the range of 71–79% in kappa scores. For example, in the classification of HCC patient, only one disagreement occurred in the total of 78 classifications, and the kappa score was 79.37%.
To handle both grammatical and ungrammatical sentences, and both narrative and tabular textual formats, the hot-spotting technique was used to match the concepts of interest, and relationships were then bound among the matched concepts. If only large sets of grammar rules and syntactic rules were used to parse all sentences, the ungrammatical sentences and tabular textual formats might not be handled well. Therefore, the method in this study first identified the major concepts in a sentence and then parsed the surrounding text, anchored on those major concepts, to identify their related concepts. Although this flexibility may produce errors in some cases, the method achieved good extraction and classification scores in this study.
For the reports from a homogeneous group of patients, the concepts relevant to liver cancer were collected by clinicians and could be identified with these methods. For other patient groups, specific concepts might be collected from other specialized clinicians. The architecture of this system could be reused for the reports of other patient groups by replacing components such as the regular expressions and the classification rules. Therefore, these methods could be applied not only to patients with liver disease but also to patients with other diseases.
Conclusions
The application was successfully applied to mixed types of narrative clinical reports with characteristics that included, among others, a partial mixture of languages, synonyms, typical and atypical abbreviations, two types of temporal information, and grammatical and ungrammatical sentences. The application normalized these different expressions and grouped the extracted findings that came from the same sources, which reduced the problem of information overload. For future applications, the concepts designed here might serve as tracking items for patients with other cancers, so the method might be applied to extracting key information for other types of cancer patients. For clinical practice, the system may assist clinicians in understanding a patient's status from large amounts of reports in a more effective way. For clinical research, the system may assist researchers in identifying patients who meet the clinical research eligibility criteria from large sets of patients. For the development of clinical applications, the experience of developing this system may be applied to the design of systems for searching desired data in electronic medical records.
Footnotes
Disclosure Statement
No competing financial interests exist.
