Introduction
The International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) has been used in many countries for 30 years (Tanpowpong et al., 2021) as a basic tool for coding mortality and morbidity, and can guide public health policy towards specific and impactful changes (Almeida et al., 2020; Barke et al., 2018; Barraza et al., 2020). The development of ICD-11 started in 2007. Following an alpha draft in May 2011 and a beta draft in May 2012, a stable version of the ICD-11 Mortality and Morbidity Statistics (ICD-11 MMS) was released in June 2018 to enable member states to begin planning for implementation. ICD-11 contains over 55,000 codes and is considerably more complex than ICD-10 (Balhara et al., 2020; Lou et al., 2017; Reed et al., 2018a). It addresses the changing needs of digitised healthcare systems and healthcare innovation and is capable of coding detailed features of a patient’s clinical condition (Krawczyk and Swiecicki, 2020). In 2017, before the release of ICD-11, the World Health Organization (WHO) conducted a field trial in which 1673 coding experts from 31 countries (including Korea) performed 112,383 code assignments (IFHIMA, 2018). In total, 298 generic line coding terms and 30 case scenarios were included, and user guidelines, ICD-11 tools and training materials were revised in line with the results of the trial.
Korean participants experienced great difficulty in post-coordination, which was newly introduced into ICD-11 and required the use of multiple codes to describe some clinical concepts (WHO, 2021). Education about the ICD-11 structure was found necessary (Korean Medical Record Association, 2017) and a domestic field trial for diagnoses that are frequent in Korea was also required. Statistics Korea supported two domestic ICD-11 field trials in 2018 and 2019. The study chapters and volume were decided by the needs and annual budget of Statistics Korea. A university conducted the first trial (Choi, 2018) for Chapter 1 (Certain infectious and parasitic diseases), Chapter 2 (Neoplasms), Chapter 3 (Diseases of the blood or blood forming organs) and Chapter 4 (Diseases of the immune system). In 2019, the Korean Health Information Management Association performed the second field trial for Chapter 5 (Endocrine, nutritional or metabolic diseases), Chapter 9 (Diseases of the visual system) and Chapter 10 (Diseases of the ear or mastoid process). The current study reports the results of the second trial, the aim being to find effective ways to increase the accuracy of coding of diagnoses that are frequent in Korea, in order to support a stable transition to ICD-11 by examining coding differences between KCD-7 and ICD-11.
Method
Trial diagnoses selection and standard responses
Six health information managers, each with more than 10 years of clinical coding experience and working in six tertiary and general hospitals, selected target diagnoses according to the following criteria: 1. diseases frequent in the six research hospitals; 2. diagnoses with different descriptions of the same concept in ICD-11 and KCD-7; 3. diagnoses present in KCD-7 only; 4. diagnoses coded as code pairs under the dagger and asterisk system in KCD-7; 5. chapter transition; 6. diagnoses changed or removed from KCD-7; and 7. different mapping chapters between ICD-11 and KCD-7 for the same diagnosis.
For line coding, 56 diagnostic terms were selected, and 17 case scenarios for case coding were prepared from patient records, including pathology reports and procedure notes. In all, there were 111 diagnoses. Selected diagnostic terms by chapter are shown in Box 1.
Number of selected diagnostic terms by chapter.
a Other chapters in line coding: codes moved to other chapters.
b Other chapters in case coding: codes from other chapters required to complete clinical descriptions.
A code cluster is a group of codes composed by post-coordination. Cluster coding presents the code cluster using a specific syntax to indicate which codes belong together when post-coordination is used. At the time of the field trial, the WHO guidelines for cluster coding were not sufficiently clear to be applied. Standard responses for line coding were developed based on a consensus of six health information managers (HIMs), who consulted relevant doctors for the 20 coding cases on which they did not reach consensus. Standard responses for case coding could not be established.
Field trial
The ICD-11 browser (MMS 2019 April) was provided with a Korean translation in a web-based Korean ICD-11 field trial programme, which ran for one week in July 2019. Participants were required to log in to the programme with their own ID and password. The programme had a Questions and Answers (Q & A) bulletin board, where participants could obtain real-time support on coding methods, case coding and the survey. Diagnostic terms and case scenarios were provided to 27 participants with more than five years of clinical coding experience, who worked in one of 13 tertiary or general hospitals. Participants coded each diagnosis in both ICD-11 and KCD-7. For each code or code cluster, they were required to answer survey questions about the granularity difference between ICD-11 and KCD-7, and the difficulty of ICD-11 clinical coding. Face-to-face training covered: ICD-11 features; 2018 field trial results in Korea; introduction and use of the Korean ICD-11 field trial programme; ICD-11 conventions; ICD-11 reference guide; and field trial examples.
Statistical analysis
The accuracy rate of ICD-11 and KCD-7 line coding was calculated for each of the 56 diagnostic terms. Because consensus could not be reached on standard responses for case coding, accuracy rates for case coding could not be provided. Inter-rater agreement for both line coding and case coding in ICD-11 and KCD-7 was expressed as the percentage agreement for each line coding term or case scenario, calculated as follows:
Percentage agreement (%) = (number of participant pairs who assigned identical codes / total number of participant pairs) × 100
The total number of pairs for the percentage agreements was 27C2 (= 351). Accuracy rates and percentage agreements were calculated with and without regard to the order of codes in each cluster. R version 4.0.5 was used for statistical analysis.
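As a rough illustration of the pairwise calculation, the following Python sketch computes the percentage agreement for one coding case, with and without regard to the order of codes in a cluster. It assumes that each response is a string of codes joined by a slash and that agreement means identical responses within a pair of raters; the function names and the toy responses are hypothetical, and this is not the R code or the data used in the trial.

```python
from itertools import combinations
from math import comb


def normalise(response, ordered):
    """Split a slash-joined response into codes; sort them if order is ignored."""
    codes = tuple(code.strip() for code in response.split("/"))
    return codes if ordered else tuple(sorted(codes))


def percentage_agreement(responses, ordered=True):
    """Share (%) of rater pairs whose responses are identical."""
    keys = [normalise(r, ordered) for r in responses]
    pairs = list(combinations(keys, 2))
    agreeing = sum(1 for a, b in pairs if a == b)
    return 100.0 * agreeing / len(pairs)


# With 27 raters there are C(27, 2) pairs per coding case.
print(comb(27, 2))  # 351

# Hypothetical responses from four raters (illustration only, not trial data).
toy = ["GB61.5/5A11", "5A11/GB61.5", "GB61.5/5A11", "GB61.5"]
print(percentage_agreement(toy, ordered=True))   # only one of six pairs matches
print(percentage_agreement(toy, ordered=False))  # 50.0
```

In this toy case, ignoring code order raises the agreement because two raters chose the same codes in a different sequence, which is the kind of difference the two calculation modes are meant to separate.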
Ethics approval
This study was determined to be exempted from deliberation in accordance with Article 13, Paragraph 2 of the Enforcement Rule of Bioethics and Safety Act among the Ordinance of the Ministry of Health and Welfare, Korea.
Results
Accuracy of line coding
The average accuracy rate of ICD-11 coding was 74.5% when the order of the codes in a cluster was not considered and 71.6% when order was considered. The accuracy of 14 diagnostic terms was 100%. However, nine terms showed accuracy of less than 30%. The average accuracy rate of KCD-7 coding was 80.2%, slightly higher than that of ICD-11.
Selected line coding examples of ICD-11.
XK6G was in the 2019 MMS frozen version.
Selected line coding examples of KCD-7.
We also identified term changes, removed codes and changes to coding rules. For example, ‘globe’ was used in Chapter 9 (Diseases of the eye and adnexa) of KCD-7. However, in ICD-11, for some codes, only ‘eyeball’ was used, while ‘globe’ was not provided even in index terms. Codes ‘H04.0 Dacryoadenitis’ and ‘E28.0 Oestrogen Excess’ were not available in ICD-11. For ‘J40 Bronchitis, not specified acute or chronic’, a coding note (‘Bronchitis not specified as acute or chronic in those under 15 years of age can be assumed to be of acute nature and should be classified to J20.-.’) was provided in KCD-7 to clarify the use of the code. However, in ICD-11 ‘CA20.Z Bronchitis, unspecified’, the coding note was removed, which confused participants who followed the coding note in KCD-7.
Percentage agreement of clinical coding
Line coding
The mean percentage agreement across the 56 diagnostic terms of line coding was 64.2% in ICD-11 and 72.1% in KCD-7. The diagnostic terms that showed low accuracy rates generally showed low percentage agreements. Post-coordination and cluster formation had negative effects on percentage agreement in ICD-11. For 20 diagnostic terms (22%), the percentage agreement was above 90%, and for two diagnoses (primary hypothyroidism, eyelid apraxia), the percentage agreement of ICD-11 was 20% higher than that of KCD-7.
Case coding
Percentage agreement of ICD-11 and KCD-7 for each case scenario (N = 351).
Review of cluster coding in case coding
Participants submitted different numbers of code clusters depending on how they combined the Principal Diagnosis and other diagnoses. Participants who combined the most diagnoses submitted 39 clusters for 55 diagnoses, whereas those who combined the fewest submitted 54 clusters. Where there was a cause-and-effect relationship between diagnoses, the codes should be combined into a cluster. However, the number and composition of code clusters differed according to how participants combined diagnoses or codes. The percentage agreement was low when the cause-and-effect relationship between the Principal Diagnosis and other diagnoses was unclear.
Survey results
Difficulty and granularity of ICD-11 coding, N (%).
‘No answer’ responses were excluded from the χ2 analysis.
Discussion
Language barrier
The biggest obstacle to the Japanese ICD-11 field trial in 2017 was English proficiency (Sato et al., 2019). Raters had problems understanding ICD-11. About 70% of participants in the additional questionnaire said they were not confident with their English skills. In Korea, we used the ICD-11 browser and Korean translation in the Korean ICD-11 field trial programme. In addition, participants were familiar with the English version of ICD-11 because diagnostic terms in Korean hospitals are usually written in English. Thus, English was not a barrier in the field trial in Korea.
Accuracy and agreement of clinical coding
The average accuracy rate of ICD-11 line coding was 74.5% when the order of the codes in a cluster was not considered, and 71.6% when order was considered, both of which were higher than the accuracy of a field trial for pain (63.2%) (Barke et al., 2021). Mean percentage agreement of line coding was 64.2% in ICD-11 and 72.1% in KCD-7. For case coding, it was 15.3% in ICD-11 and 26.6% in KCD-7. Other studies, which adopted different inter-rater reliability coefficients, have shown that agreement with ICD-11 was better than that with ICD-10 (Barke et al., 2021; Gaebel et al., 2020); in the present study, by contrast, agreement was lower in ICD-11 and the percentage agreements of case coding were very low. One explanation for these results may have been the selection criteria, which made coding challenging for the participants. Other explanations could include post-coordination and code differences in ICD-11. In KCD-7 coding, a single code is generally chosen from pre-listed disease and disability codes. Post-coordination is a newly introduced capability of ICD-11 and is a generalised form of the ‘dagger and asterisk’ system in ICD-10. It allows more detailed description of diseases (Almeida et al., 2020; Fung et al., 2020). On the other hand, coders may select and combine stem codes and extension codes in diverse ways, which can decrease the accuracy of ICD-11 clinical coding. The same issues relating to post-coordination have been raised by Germany (Gaebel et al., 2018) and Japan (Nishio et al., 2019). Participants also coded in various ways when the title of a code was changed. They suggested adding KCD-7 terms to the index terms of ICD-11 to keep coding consistent. Where a code in the previous version was removed or changed, or a coding rule was changed, as in J40, clear instructions or guidance should be prepared.
Percentage agreement cannot eliminate agreement by chance (Suen and Lee, 1985) when used in multiple-choice questions with a limited number of answers. Other researchers used Krippendorff’s Alpha (Eisele et al., 2019) or the intraclass kappa coefficient (Reed et al., 2018b) to assess the agreement of ICD-11 coding. However, we adopted percentage agreement in this study to calculate inter-rater reliability because participants assigned codes using their coding knowledge and each answer was different as determined by the coding cases, which should have minimised agreement by chance.
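The reasoning about chance agreement can be illustrated with a small sketch. Assuming, purely for illustration, that two raters choose independently and uniformly among k possible answers, the expected chance agreement is 1/k, so a kappa-style correction changes an observed agreement very little when k is large. The uniform-choice model and the function names are assumptions, and this is not the coefficient used in the studies cited above.

```python
def chance_agreement(probabilities):
    """Expected agreement of two independent raters sharing one answer distribution."""
    return sum(p * p for p in probabilities)


def chance_corrected(observed, expected):
    """Kappa-style correction: (p_o - p_e) / (1 - p_e)."""
    return (observed - expected) / (1.0 - expected)


def uniform(k):
    """Hypothetical model: k equally likely answers."""
    return [1.0 / k] * k


observed = 0.642  # mean ICD-11 line coding agreement reported in this study

# Compare a two-option question with progressively larger answer spaces.
for k in (2, 10, 1000):
    expected = chance_agreement(uniform(k))
    print(k, round(expected, 4), round(chance_corrected(observed, expected), 4))
```

Under this simplified model the correction is substantial for a two-option question but becomes negligible once the number of plausible answers runs into the hundreds, which is the situation the argument above relies on. Real coders do not choose uniformly, so the true chance agreement would be somewhat higher than 1/k.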
Survey results of granularity and difficulty
Approximately half of all participants reported that the granularity of ICD-11 was similar to that of KCD-7. Before the trial, we had expected participants to report finer granularity in ICD-11 because it has many more codes than KCD-7, but we also anticipated that they might perceive the granularity as almost the same because of the limited range of codes used in the field trial (only three chapters from ICD-11 were included). The Japanese field trial in 2017, which covered all chapters, reported that the granularity of ICD-11 differed by chapter and suggested the need for more detailed analysis of the different chapters (Sato et al., 2019).
Most participants in the current study reported that ICD-11 was easy or moderately difficult for the study chapters included. ‘Difficult’ or ‘very difficult’ was reported for 15.3% in line coding and 10.9% in case coding. Participants may have reported less difficulty with case coding than with line coding because with case coding they were more easily able to locate the correct ICD-11 code using additional information such as chief complaint and present illness.
Suggestions for accurate and reliable coding
Need for detailed guidance or support system for post-coordination and/or cluster coding
Post-coordination was the main reason for the inconsistencies found in ICD-11 coding. Therefore, detailed guidance or a support system for post-coordination or clustering is required (Chen et al., 2019). Coding results changed according to the way participants used post-coordination. Most participants had difficulties because there was no detailed guidance for using post-coordination. For example, for the diagnosis ‘severe non-proliferative retinopathy in diabetes mellitus’, 45% of participants selected diabetes as the stem code and added diabetic retinopathy as post-coordination, whereas 41% put diabetic retinopathy as the stem code. Even when participants selected the same stem code, they added the extensions in different ways. Some put laterality first, others put severity first, while others added only one of the two extensions. This was an important reason for inconsistency of coding results.
For case coding, the effect of post-coordination was stronger. The cause-and-effect relationship between diagnoses was an important factor in the creation of a cluster code. For example, in case #2, the Principal Diagnosis was ‘chronic kidney disease, stage 5’ with three other diagnoses: ‘Non-insulin-dependent diabetes mellitus with established diabetic nephropathy’, ‘glomerular disorder in diabetes mellitus’ and ‘Acute renal failure’. For this case, 20 of the 27 participants submitted different cluster codes. Participants had difficulties making clusters because the diseases had cause-and-effect relationships and they did not know how to combine each disease code in the right order in a cluster (see Table 4). In KCD-7, some codes carry the symbols ‘*(asterisk)’ and ‘+(dagger)’ so that coders can easily select related codes. However, in ICD-11, all related codes must be selected from the post-coordination disease list (associated disease, manifestation). Since the disease range provided in the post-coordination list included all diabetic conditions and all body systems, coders needed more time to find the correct manifestation from a long list. At the time of the study, the ICD-11 coding tool had limited functionality to support post-coordination, and the participants did not use it.
Provision of guidelines for changed code or coding rule
Where there were changes in a code or coding rule, many coding specialists who were accustomed to KCD-7 were confused when using ICD-11. A special section or booklet with information about changed codes or coding rules might help reduce this confusion.
Need for intensive education to provide clinical knowledge
Doctors in Korea do not always record all information needed for coding, such as the cause-and-effect relationship between diseases or the severity of disease, and some coders need more clinical knowledge to enable them to raise an efficient query to doctors. For example, in case coding case #2 (see Box 2), the clinical knowledge of participants concerning the relationship between diabetes, chronic kidney disease and acute kidney disease affected the coding result. Greater clinical knowledge was needed for clinical coders to select the stem code and to apply post-coordination in ICD-11, indicating that further study and education in clinical knowledge should be provided to clinical coders before transition to ICD-11.
Case coding example: case #2.
1G40: Sepsis without septic shock; GB4Z: Glomerular disease, unspecified; GB60.0: Acute kidney failure stage 1; GB60.Z: Acute kidney failure; GB61.5: Chronic kidney disease, stage 5; GB61.Z: Diabetic nephropathy; 5A11: Non-insulin-dependent diabetes mellitus; 5A14: Diabetes mellitus, type unspecified; 5A2Y: Other specified acute complications of diabetes mellitus; MF83: Glomerular disorder in diabetes mellitus.
Slanted bar (/) is used to combine two or more stem codes in cluster coding. Vertical bar (|) separates clusters. Participants were asked to make appropriate clusters to express all diseases.
a Participants made a cluster with principal disease, diabetes and diabetic nephropathy (# 5, 6).
b Participants tried to put codes for each diagnosis expressing the cause-and-effect relationship, leading to duplicate coding (# 11-20).
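The cluster notation in the legend for case #2 can be made concrete with a short sketch. It assumes that a submission is written as one string in which a vertical bar separates clusters and a slash joins the stem codes within a cluster; the function name and the example submission are hypothetical and are not taken from participants' responses.

```python
def parse_submission(submission):
    """Return a list of clusters, each a list of stem codes."""
    clusters = []
    for cluster in submission.split("|"):
        # A slash joins stem codes inside one cluster.
        codes = [code.strip() for code in cluster.split("/") if code.strip()]
        if codes:
            clusters.append(codes)
    return clusters


# Hypothetical submission built from codes listed in the legend (illustration only).
submission = "GB61.5/5A11/GB61.Z | GB60.Z"
clusters = parse_submission(submission)

print(len(clusters))                  # 2
print(clusters[0])                    # ['GB61.5', '5A11', 'GB61.Z']
print(sum(len(c) for c in clusters))  # 4
```

Counting clusters per submission in this way gives the kind of comparison reported in the results, where the same set of diagnoses was expressed with different numbers of clusters depending on how participants combined codes.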
Limitations
In the absence of an official gold standard for clinical coding, senior health information managers decided on the most suitable coding responses for line coding, with additional information provided by consulting doctors. For case scenarios, no consensus was reached for standard responses. As the study did not allow enough time for training, results did not fully reflect participants’ classification capacity. Appropriate training for clinical knowledge and ICD-11 coding might lead to higher quality coding results in future studies. Results of this study may not be applicable to more recent versions of ICD-11 because ICD-11 is continuously updated. Therefore, the possibility that some issues discussed in this report may have already been updated should be taken into account.
Conclusion
As ICD-11 contains very detailed codes, participants in this study were easily able to code some terms. However, participants reported difficulties where terms had changed in ICD-11 or where a code used in KCD-7 was not present in ICD-11. To ensure a smooth transition from KCD-7 to ICD-11, it is necessary to examine the consistency of terms and codes. The lack of detailed instructions on how to add post-coordination to a stem code was another major factor that made coding difficult. If the WHO could provide more detailed reference guidelines and more efficient training for clinical coding professionals in each country, ICD-11 would be an excellent tool for gathering relevant information about diseases.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This research project was funded by Statistics Korea (11-1240000-001341-01). The results of this study do not represent the official position of Statistics Korea, and the authors are solely responsible for its content and conclusions.
