Abstract
Background
The need for quantitative assessment of interstitial lung involvement on thin-section computed tomography (CT) has arisen in interstitial lung diseases (ILDs), including ILD associated with connective tissue disease (CTD).
Purpose
To evaluate the capability of machine learning (ML)-based CT texture analysis for disease severity and treatment response assessments in comparison with qualitatively assessed thin-section CT for patients with CTD.
Material and Methods
A total of 149 patients with CTD-related ILD (CTD-ILD) underwent initial and follow-up CT scans (a total of 364 paired serial CT examinations), pulmonary function tests, and serum KL-6 level tests. Based on all follow-up examination results, the paired serial CT examinations were classified as “Stable” (n = 188), “Worse” (n = 98), or “Improved” (n = 78). Changes in quantitative indexes were then determined by software, and qualitative disease severity scores were assessed by consensus of two radiologists. To evaluate differences in each quantitative index and in disease severity score between paired serial CT examinations among the three statuses, Tukey's honestly significant difference (HSD) test was performed. Stepwise regression analyses were performed to determine the associations between changes in each pulmonary functional parameter and changes in all quantitative indexes between paired serial CT scans.
Results
Δ%normal lung, Δ%consolidation, Δ%ground glass opacity, Δ%reticulation, and Δdisease severity score showed significant differences among the three statuses (P < 0.05). Changes in all pulmonary functional parameters were significantly affected by Δ%normal lung, Δ%reticulation, and Δ%honeycomb (0.16 ≤ r² ≤ 0.42; P < 0.05).
Conclusion
ML-based CT texture analysis has better potential than qualitatively assessed thin-section CT for disease severity assessment and treatment response evaluation for CTD-ILD.
Keywords
Introduction
Connective tissue diseases (CTD), sometimes also referred to as collagen vascular diseases, are a group of autoimmune diseases that include systemic sclerosis (SS), rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), Sjögren syndrome, inflammatory myositis (including polymyositis [PM], dermatomyositis [DM], and antisynthetase syndrome), and mixed connective tissue disease (MCTD) (1). Patients with CTD are frequently affected by disease-related interstitial lung disease (ILD). ILD is an umbrella term for several types of diffuse parenchymal lung abnormalities characterized by inflammation or progressive scarring and fibrosis of the lung parenchyma (1). Because ILD is associated with severe morbidity and early mortality in patients with CTD (2), early diagnosis and quantification of ILD severity are important for treatment initiation, monitoring, and prognostication (3,4).
Thin-section computed tomography (CT) of the chest has long been used, together with clinical assessment and physiological measures, for diagnosing ILD, assessing disease severity and progression, and assessing treatment efficacy, and thus plays an important role in the management of patients with CTD. Interstitial abnormalities visible on thin-section CT include reticulation, consolidation, traction bronchiectasis, ground glass opacity (GGO), architectural distortion, volume loss, and honeycombing (5). Currently, interstitial lung involvement on thin-section CT is evaluated by reader-based visual assessment or by subjective semi-quantitative scoring methods (6). However, these assessments are subjective and time-consuming, and even in research settings substantial inter- and intra-observer variability has been reported, which may affect the management of ILD (7). The need for quantitative assessment of interstitial lung involvement on thin-section CT has therefore arisen, and several investigators have tested commercially available or proprietary software for CT texture analysis using methods based on histogram analysis or artificial intelligence (4,8–14). With these systems, thin-section CT findings can be categorized as (i) normal lung, (ii) GGO, (iii) reticulation, (iv) emphysema, (v) nodular lesion, (vi) consolidation, and (vii) honeycomb, and moderate or good agreement among radiologists has been demonstrated (8,10,12).
Against this background, we have developed a new system for computer-aided detection and computer-aided volumetry based on both non-artificial intelligence (non-AI) and AI methods, and have investigated its capability in basic and clinical trials (15–17). As part of this system, we developed new machine learning (ML)-based software for fully automated CT texture analysis on thin-section chest CT in patients with various pulmonary diseases (18). However, no studies have yet evaluated this system's capability for quantitative assessment of disease severity and treatment response in patients with CTD. We hypothesized that our newly developed ML-based algorithm has better potential than qualitatively assessed disease severity for disease severity assessment and treatment response evaluation in patients with CTD. The aim of the present study was thus to prospectively and directly evaluate the capability of ML-based CT texture analysis software for disease severity and treatment response assessments in comparison with qualitatively assessed thin-section CT in patients with CTD.
Material and Methods
Protocol, support, and funding
The training and validation cases in this study were obtained retrospectively, and all test cases were collected prospectively; the study was approved by the institutional review board of Kobe University Hospital. Written informed consent was obtained from each individual. This study was financially and technically supported by Canon Medical Systems Corporation, the Smoking Research Foundation, and Grants-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science and Technology (JSTS.KAKEN; No. 18K07675 and No. 20K08037). Three of the authors (K.A., Y.F., and N.S.) are employees of Canon Medical Systems but did not have control over any of the data used in this study.
Participants
Training set
A total of 45 cases were retrospectively collected at our institution between April 2015 and October 2015. The inclusion criteria for the training cases were: (i) CT images reconstructed at 1-mm section thickness; (ii) diagnosis of ILD based on updated diagnostic criteria through multidisciplinary involvement of experienced clinical experts, radiologists, and pathologists (19); (iii) diagnosis of COPD based on the Global Initiative for Chronic Obstructive Lung Disease criteria (20); (iv) diagnosis of infectious disease based on radiological and microbiological examinations of sputum or of specimens obtained transbronchially, under CT guidance, or by video-assisted thoracic surgery; (v) availability of baseline and >1-year follow-up CT scans; (vi) availability of baseline and >1-year follow-up results of pulmonary function tests and microbiological examinations; (vii) no combined complications, such as acute exacerbation in patients with ILD or cardiopulmonary disease, or pneumonia in patients with COPD and ILD; (viii) no evidence of other diseases, such as thoracic malignancies, pneumothorax, or pulmonary thromboembolism, at the time of baseline CT acquisition; and (ix) no specific treatment for pulmonary diseases at the time of baseline CT acquisition.
Of these, 17 cases were excluded because they were lost to follow-up (n = 5) or had insufficient follow-up CT studies during a <2-year follow-up period (n = 12). Eventually, 28 cases were included as the training set: 15 men (mean age = 66 ± 12 years; age range = 48–85 years) and 13 women (mean age = 63 ± 14 years; age range = 43–87 years) with usual interstitial pneumonia (UIP; n = 5), non-specific interstitial pneumonia (NSIP; n = 5), rheumatoid arthritis (RA; n = 3), dermatomyositis (DM; n = 2), polymyositis (PM; n = 2), mixed connective tissue disease (MCTD; n = 2), COPD (n = 2), nontuberculous mycobacterial infection (NTM; n = 2), tuberculosis (TB; n = 2), sarcoidosis (n = 2), or mycoplasma pneumonia (n = 1). Data from a total of 11,250 slices obtained from these patients were used as the training dataset.
Validation set
A total of 27 cases were retrospectively collected as the validation set at our institution between November 2015 and December 2015 by applying the same inclusion criteria as for the training cases. Ten cases were excluded because they were lost to follow-up (n = 6) or had insufficient follow-up CT studies during a <2-year follow-up period (n = 4). Eventually, 17 cases with thin-section CT data were included as the validation set: ten men (mean age = 63 ± 11 years; age range = 48–71 years) and seven women (mean age = 61 ± 12 years; age range = 45–76 years) with UIP (n = 3), NSIP (n = 3), COPD (n = 3), NTM (n = 3), RA (n = 2), TB (n = 2), or sarcoidosis (n = 1). Data from a total of 6875 slices obtained from these patients were used as the validation dataset.
Test set
Between January 2016 and December 2017, 206 patients with CTD-related ILD (CTD-ILD) were originally included in this study. All had been prospectively examined at our institution with unenhanced initial and follow-up CT scans on 320- and 64-detector row CT systems and had undergone pulmonary function tests and serum Krebs von den Lungen-6 (KL-6) tests. The inclusion criteria were: (i) diagnosis of CTD-ILD based on updated diagnostic criteria through multidisciplinary involvement of experienced clinical experts, radiologists, and pathologists (21); (ii) availability of baseline and at least two years of follow-up CT examinations; (iii) availability of baseline and at least two years of follow-up pulmonary function tests (PFTs); (iv) availability of baseline and at least two years of follow-up laboratory tests such as KL-6; and (v) no combined complications such as pneumonia or acute exacerbation at the time of initial CT acquisition. Next, 57 cases were excluded because of specific treatment for pneumonia or acute exacerbation at the time of baseline CT acquisition (n = 28), insufficient follow-up studies (n = 19), or loss to follow-up (n = 10). Finally, 149 consecutive patients with CTD-ILD, consisting of 56 men (mean age = 60 ± 11 years; age range = 39–83 years) and 93 women (mean age = 58 ± 9 years; age range = 30–85 years), were included as test cases. During at least two years of follow-up, each patient underwent ≥2 follow-up CT examinations after the initial CT examination; the total number of paired serial CT examinations in this study was therefore 364.
These 149 patients comprised 54 cases of progressive scleroderma (PSS), 34 of dermatomyositis (DM), 29 of rheumatoid arthritis (RA), 14 of polymyositis (PM), seven of mixed connective tissue disease (MCTD), seven of anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis, and four of Sjögren syndrome. After integrating the clinical, laboratory, and paired serial CT data for each patient, all paired serial CT examinations (n = 364) were categorized into “Stable” (n = 188), “Worse” (n = 98), or “Improved” (n = 78) groups by a multidisciplinary team of board-certified chest radiologists, pulmonologists, and immunologists with >10 years of experience, none of whom was involved in the image analyses of this study.
Thin-section CT examination
All CT data were obtained with a 320-detector row CT scanner (Aquilion ONE; Canon Medical Systems, Otawara, Tochigi, Japan) and two 64-detector row CT scanners (Aquilion 64; Canon Medical Systems). In each patient, baseline and follow-up CT examinations were performed on the same scanner. For the 320-detector row CT, wide volumetric scanning (also known as step-and-shoot scanning) was used with the following parameters: collimation = 320 × 0.5 mm; 270 mA; 120 kVp; gantry rotation time = 0.5 s; matrix = 512 × 512; field of view (FOV) = 300–350 mm; and 3 steps. For the 64-detector row CT, helical scanning was used with the following parameters: collimation = 64 × 0.5 mm; 270 mA; 120 kVp; beam pitch = 0.83; gantry rotation time = 0.5 s; matrix = 512 × 512; and FOV = 300–350 mm. All thin-section CT data were then reconstructed with filtered back projection at a contiguous section thickness of 1 mm using a standard lung reconstruction kernel (FC51, Canon).
Pulmonary functional test
Pulmonary function testing was performed with an automatic spirometer (System 9, Minato Ikagaku) according to American Thoracic Society (ATS) standards (22,23). We measured forced vital capacity (FVC), forced expiratory volume in 1 s (FEV1), vital capacity (VC), lung volume, and diffusing capacity of the lung for carbon monoxide (DLCO), and then calculated the ratio of FEV1 to FVC (FEV1/FVC%), the percentage of predicted VC (%VC), and the percentage of predicted DLCO corrected for alveolar volume (%DLCO/VA) according to the ATS-ERS guidelines (24). DLCO was measured with a 10-s breath-hold.
ML-based texture analysis software algorithm
Details of the ML-based texture analysis software have been described previously (18); the algorithm is briefly summarized here. Fig. 1 shows a schematic diagram of the ML-based texture analysis software used in this study. Given a set of chest CT images as input, it classifies every voxel into one of seven radiological texture patterns: (i) normal lung; (ii) GGO; (iii) reticulation; (iv) emphysema; (v) nodular lesion; (vi) consolidation; and (vii) honeycomb. It uses three-dimensional (3D) ML methods and consists of four stages, described in the following sections: (i) preprocessing; (ii) feature extraction; (iii) classification; and (iv) postprocessing.

Flow chart of the machine learning-based method for computed tomography texture analysis. At the feature extraction stage, the likelihood of occurrence of each texture pattern is calculated for every voxel. At the classification stage, the probability of each texture pattern is calculated from the features extracted at each voxel. Finally, each voxel is labeled with the texture pattern that has the maximum posterior probability.
Preprocessing
Given a set of chest CT images as input, the software resamples it into an isotropic volume with 0.6-mm voxel spacing. The lung region in the volume is then automatically segmented with the segmentation method implemented in the Lung Nodule Analysis application of Vitrea (Vital Images, Inc., Minnetonka, MN, USA). Only voxels within the resulting lung mask are processed in the subsequent steps.
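The preprocessing step can be sketched in Python with NumPy and SciPy. This is a minimal illustration, not the vendor's code: linear interpolation for CT values is an assumption, and because the Vitrea lung segmentation is proprietary, the sketch takes a precomputed boolean lung mask as a hypothetical input.

```python
import numpy as np
from scipy import ndimage

TARGET_SPACING_MM = 0.6  # isotropic voxel spacing stated in the text


def resample_isotropic(volume, spacing_mm, order):
    """Resample a (z, y, x) volume with per-axis spacing (mm) to 0.6-mm voxels."""
    zoom_factors = [s / TARGET_SPACING_MM for s in spacing_mm]
    return ndimage.zoom(volume, zoom_factors, order=order)


def preprocess(ct_hu, spacing_mm, lung_mask):
    """Return the isotropic CT volume, isotropic lung mask, and in-lung HU values.

    lung_mask is a hypothetical precomputed boolean mask on the original grid;
    the software itself uses Vitrea's segmentation, which is not reproduced here.
    """
    # Assumption: linear interpolation (order=1) for attenuation values.
    iso_ct = resample_isotropic(ct_hu.astype(np.float32), spacing_mm, order=1)
    # Nearest-neighbour interpolation (order=0) keeps the mask binary.
    iso_mask = resample_isotropic(lung_mask.astype(np.uint8), spacing_mm, order=0) > 0
    # Only voxels inside the lung mask go on to feature extraction.
    return iso_ct, iso_mask, iso_ct[iso_mask]


# Toy usage: 1-mm sections with 0.6-mm in-plane pixels (synthetic data).
ct = np.full((10, 64, 64), -850.0)
mask = np.zeros(ct.shape, dtype=bool)
mask[:, 16:48, 16:48] = True
iso_ct, iso_mask, lung_values = preprocess(ct, (1.0, 0.6, 0.6), mask)
```

Here the 1-mm slice axis is stretched by a factor of 1/0.6 while the in-plane axes are left unchanged, so the 10 sections become 17 isotropic planes.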
Feature extraction
In the feature extraction stage, the likelihood of each texture pattern is calculated for each voxel. For the six texture patterns other than nodular lesion, the extremely randomized trees (ERT) method (25) is used to calculate the likelihoods. ERT is a tree-based ensemble method for supervised classification that selects split cut-points at random. To capture texture patterns at multiple scales, 3D regions of interest (ROIs) of three sizes are used as inputs for ERT. For the nodular texture pattern, the radial structure tensor (RST) (26) is used to calculate the likelihood of occurrence. RST is a filter that enhances blob-like structures by correlating the positions and directions of gradient vectors in a local neighborhood. Average pooling with multiple local window sizes is then applied to the features extracted by both ERT and RST.
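A rough sketch of the ERT part of feature extraction follows, assuming scikit-learn's `ExtraTreesClassifier` as the ERT implementation. The ROI sizes, pooling windows, and per-ROI features (local mean and standard deviation of HU) are hypothetical choices for illustration; the software's actual features are not specified here, and the RST filter for nodular lesions is not shown. Training on labelled voxels from the same toy volume only keeps the example self-contained.

```python
import numpy as np
from scipy import ndimage
from sklearn.ensemble import ExtraTreesClassifier

ROI_SIZES = (3, 7, 15)   # hypothetical cubic ROI edge lengths (voxels)
POOL_SIZES = (3, 9)      # hypothetical average-pooling window sizes (voxels)


def multiscale_features(ct):
    """Local mean and SD of HU in cubic ROIs of several sizes, for every voxel."""
    feats = []
    for size in ROI_SIZES:
        mean = ndimage.uniform_filter(ct, size=size)
        mean_sq = ndimage.uniform_filter(ct * ct, size=size)
        std = np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))
        feats.extend([mean, std])
    return np.stack(feats, axis=-1)  # (z, y, x, 2 * len(ROI_SIZES))


def ert_likelihoods(ct, train_idx, train_labels, seed=0):
    """Fit extremely randomized trees on labelled voxels; return per-voxel likelihoods."""
    features = multiscale_features(ct)
    flat = features.reshape(-1, features.shape[-1])
    ert = ExtraTreesClassifier(n_estimators=50, random_state=seed)
    ert.fit(flat[train_idx], train_labels)
    probs = ert.predict_proba(flat)  # (n_voxels, n_classes)
    return probs.reshape(ct.shape + (probs.shape[1],))


def average_pool(maps):
    """Average-pool each likelihood map with several window sizes (classes not mixed)."""
    pooled = [ndimage.uniform_filter(maps, size=(s, s, s, 1)) for s in POOL_SIZES]
    return np.concatenate(pooled, axis=-1)


# Toy usage on a synthetic two-pattern volume (0 = normal-like, 1 = GGO-like).
rng = np.random.default_rng(0)
truth = np.zeros((8, 16, 16), dtype=int)
truth[:, :, 8:] = 1
ct = np.where(truth == 1, -400.0, -850.0) + rng.normal(0.0, 30.0, truth.shape)
train_idx = rng.choice(truth.size, size=300, replace=False)
maps = ert_likelihoods(ct, train_idx, truth.ravel()[train_idx])
pooled = average_pool(maps)
```

The pooling window of size 1 on the last axis keeps each pattern's likelihood map separate while smoothing it spatially at each scale.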
Classification
In the classification stage, the probability of each texture pattern is calculated from the extracted features for each voxel using a multi-class support vector machine (SVM) (27), a supervised learning method for classification. The output probabilities of the SVM are then refined with a conditional random field (CRF) (28), a discriminative probabilistic model, which yields optimal probabilities for the whole volume by taking into account differences in both location and voxel values between adjacent voxels.
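The classification stage can be approximated as below, assuming scikit-learn's `SVC` with Platt-scaled probabilities as the multi-class SVM. The CRF step is represented by a simplified Potts-model CRF solved with a few mean-field iterations over 6-connected neighbours; the CRF described above also weighs voxel-value differences between neighbours, which this sketch omits, and the `weight` and `n_iter` values are hypothetical.

```python
import numpy as np
from scipy import ndimage
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Face-connected (6-neighbour) kernel in 3D, centre voxel excluded.
NEIGHBOURS = ndimage.generate_binary_structure(3, 1).astype(float)
NEIGHBOURS[1, 1, 1] = 0.0


def svm_probabilities(features, train_idx, train_labels):
    """Multi-class SVM with Platt-scaled probability outputs for every voxel."""
    flat = features.reshape(-1, features.shape[-1])
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True, random_state=0))
    svm.fit(flat[train_idx], train_labels)
    probs = svm.predict_proba(flat)
    return probs.reshape(features.shape[:-1] + (probs.shape[1],))


def mean_field_potts(probs, weight=1.0, n_iter=5, eps=1e-8):
    """Mean-field inference for a CRF with SVM unaries and a uniform Potts prior.

    Each iteration rewards a label in proportion to how strongly the six
    neighbours currently believe in it. Unlike the CRF described in the text,
    this simplified version ignores voxel-value differences between neighbours.
    """
    unary = np.log(probs + eps)
    q = probs.copy()
    for _ in range(n_iter):
        neighbour_support = np.stack(
            [ndimage.convolve(q[..., k], NEIGHBOURS, mode="nearest") for k in range(q.shape[-1])],
            axis=-1,
        )
        logits = unary + weight * neighbour_support
        logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
        q = np.exp(logits)
        q /= q.sum(axis=-1, keepdims=True)
    return q


# Toy usage: one synthetic HU feature per voxel, two classes.
rng = np.random.default_rng(1)
truth = np.zeros((6, 12, 12), dtype=int)
truth[:, :, 6:] = 1
hu = np.where(truth == 1, -400.0, -850.0) + rng.normal(0.0, 60.0, truth.shape)
train_idx = rng.choice(truth.size, size=200, replace=False)
probs = svm_probabilities(hu[..., None], train_idx, truth.ravel()[train_idx])
refined = mean_field_potts(probs)
```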
Postprocessing
Finally, each voxel is labeled with the texture pattern that has the maximum posterior probability. Voxels with attenuation below −950 Hounsfield units (HU) are then relabeled as emphysema. Voxels labeled as honeycomb are excluded from this simple thresholding, because this texture pattern may contain voxels below −950 HU.
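The postprocessing rule is simple enough to state directly in code. This sketch assumes the posteriors arrive as an array with one channel per texture pattern, in the order listed in the text; the array layout is an illustrative assumption.

```python
import numpy as np

# Label order follows the seven patterns listed in the text.
LABELS = ("normal lung", "GGO", "reticulation", "emphysema",
          "nodular lesion", "consolidation", "honeycomb")
EMPHYSEMA = LABELS.index("emphysema")
HONEYCOMB = LABELS.index("honeycomb")
EMPHYSEMA_THRESHOLD_HU = -950.0


def postprocess(posteriors, ct_hu):
    """Label each voxel by maximum posterior, then apply the -950 HU emphysema rule."""
    labels = posteriors.argmax(axis=-1)
    # Voxels below -950 HU become emphysema unless already labelled honeycomb.
    low_attenuation = (ct_hu < EMPHYSEMA_THRESHOLD_HU) & (labels != HONEYCOMB)
    labels[low_attenuation] = EMPHYSEMA
    return labels


# Toy usage: three voxels with hand-made posteriors.
posteriors = np.zeros((3, len(LABELS)))
posteriors[0, LABELS.index("normal lung")] = 1.0   # -980 HU -> emphysema
posteriors[1, HONEYCOMB] = 1.0                      # -990 HU, stays honeycomb
posteriors[2, LABELS.index("GGO")] = 1.0            # -600 HU, stays GGO
labels = postprocess(posteriors, np.array([-980.0, -990.0, -600.0]))
print([LABELS[k] for k in labels])  # ['emphysema', 'honeycomb', 'GGO']
```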
Image analysis
Quantitative assessment of disease severity using ML-based CT texture analysis software
All measurements with the newly developed ML-based CT texture analysis software were performed by a board-certified radiologist (T.Y.) on a commercially available workstation (Vitrea; Vital Images, Inc., Minnetonka, MN, USA); the software itself was a proprietary prototype (CT Lung Parenchyma Analysis, Prototype v. 3) provided by Canon Medical Systems and installed on this Vitrea workstation. With this software, the thin-section CT findings of each patient with ILD were divided into seven radiological finding categories derived from the Fleischner Society glossary of terms for thoracic imaging (29): (i) normal lung; (ii) emphysema; (iii) nodular lesion; (iv) consolidation; (v) GGO; (vi) reticulation; and (vii) honeycomb. The volume of each finding was then automatically calculated and normalized to the lung volume determined from the CT data, so that all radiological finding volumes (%normal lung, %emphysema, %nodular lesion, %consolidation, %GGO, %reticulation, and %honeycomb) were expressed as percentages of total lung volume. In addition, the volume change (Δ) of each finding between paired serial CT scans was automatically calculated for each patient.
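Converting voxel labels into the percentage indexes and their changes might look as follows. The sketch assumes isotropic voxels (so voxel counts are proportional to volume) and defines Δ as the later scan minus the earlier one, which matches the signs reported in the figure legends for worsening and improvement; the function names are hypothetical.

```python
import numpy as np

LABELS = ("normal lung", "GGO", "reticulation", "emphysema",
          "nodular lesion", "consolidation", "honeycomb")


def finding_percentages(lung_labels):
    """Percentage of total lung volume occupied by each finding.

    lung_labels holds one label index per voxel inside the lung mask. With
    isotropic voxels, voxel counts are proportional to volumes.
    """
    counts = np.bincount(np.ravel(lung_labels), minlength=len(LABELS))
    return {name: 100.0 * c / counts.sum() for name, c in zip(LABELS, counts)}


def finding_changes(earlier, later):
    """Change (Δ, later minus earlier) in each percentage between paired scans."""
    return {name: later[name] - earlier[name] for name in LABELS}


# Toy usage: a 100-voxel lung whose normal lung is partly replaced by reticulation.
baseline = finding_percentages([0] * 80 + [1] * 20)
follow_up = finding_percentages([0] * 70 + [1] * 20 + [2] * 10)
delta = finding_changes(baseline, follow_up)
```

In this toy pair, Δ%normal lung is −10% and Δ%reticulation is +10%, while Δ%GGO is 0.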
Qualitative disease severity assessment based on thin-section CT findings
To evaluate the severity of ILD in patients with CTD, disease severity was qualitatively and independently scored by two chest radiologists (Y.O. and D.T.) with 24 and 26 years of experience, respectively, on a picture archiving and communication system (PACS) (Shade Quest; Yokogawa Electric, Tokyo, Japan). Both reviewers were blinded to serum KL-6 levels and pulmonary function test results for all individuals.
As previously described (29–31), visual evaluation comprised a severity score and an extent score. The severity score is based on parenchymal abnormalities assumed to reflect increasing severity of lung involvement: GGO (score = 1); irregular pleural margins (score = 2); septal and subpleural lines (score = 3); honeycombing (score = 4); and subpleural cysts (score = 5) (30–32). The overall severity score thus ranged from 0 (no abnormality) to 15 (all abnormalities present). The extent score was obtained by counting, for each of these abnormalities, the bronchopulmonary segments in which it was observed: involvement of 1–3 segments was scored 1; of 4–9 segments, 2; and of >9 segments, 3. The overall extent score thus ranged from 0 (no abnormality in any segment) to 15 (all five abnormalities in more than nine segments). The severity and extent scores were then added to obtain a total disease severity score (range = 0–30) (29–31). In this study, each investigator evaluated the total disease severity score at two different times, and the final value for each investigator was the average of the two assessments. In addition, the change in total disease severity score (Δ) between paired serial CT scans was calculated for each patient.
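The visual scoring scheme can be written as a small function. Reading the extent score as one 0–3 score per abnormality is inferred from the stated 0–15 range; the abnormality names used as dictionary keys and the function names are illustrative.

```python
SEVERITY_POINTS = {
    "GGO": 1,
    "irregular pleural margins": 2,
    "septal and subpleural lines": 3,
    "honeycombing": 4,
    "subpleural cysts": 5,
}


def extent_points(n_segments):
    """Map the number of involved bronchopulmonary segments to 0-3 points."""
    if n_segments == 0:
        return 0
    if n_segments <= 3:
        return 1
    if n_segments <= 9:
        return 2
    return 3


def total_disease_score(segments_by_abnormality):
    """Severity (0-15) plus extent (0-15) for one reading.

    segments_by_abnormality maps each abnormality name to the number of
    segments in which it is seen; absent abnormalities may be omitted.
    """
    severity = sum(SEVERITY_POINTS[a] for a, n in segments_by_abnormality.items() if n > 0)
    extent = sum(extent_points(n) for n in segments_by_abnormality.values())
    return severity + extent


def reader_score(first_reading, second_reading):
    """Final value for one investigator: mean of two readings."""
    return (total_disease_score(first_reading) + total_disease_score(second_reading)) / 2


print(total_disease_score({a: 10 for a in SEVERITY_POINTS}))  # 30 (maximum)
print(total_disease_score({"GGO": 5, "honeycombing": 2}))      # 5 + 3 = 8
```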
Statistical analysis
To determine the reproducibility of the ML-based CT texture analysis software, the first and second measurements of each radiological finding on the CT examination at each time point were compared, and their correlation and reproducibility were statistically assessed by means of Pearson's correlation and Bland–Altman analysis (33).
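A minimal sketch of these agreement statistics, using SciPy's `pearsonr` and a hand-written Bland–Altman calculation. It assumes that the reported reproducibility coefficients and limits of agreement take the form mean difference ± 1.96 SD of the paired differences; the input values are synthetic.

```python
import numpy as np
from scipy import stats


def agreement(first, second):
    """Pearson correlation plus Bland-Altman bias and 95% limits of agreement."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    r, p = stats.pearsonr(first, second)
    diff = first - second
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)  # reported as "bias ± half_width"
    return {"r": r, "p": p, "bias": bias, "half_width": half_width,
            "limits": (bias - half_width, bias + half_width)}


# Toy usage: repeated measurements of one finding (%) in four examinations.
res = agreement([10.0, 12.0, 14.0, 16.0], [11.0, 11.0, 15.0, 15.0])
```

The same function applies unchanged to intra- and inter-observer comparisons of disease severity scores.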
To evaluate intra-observer agreement on total disease severity scores for CT examinations at each time point, the correlation and reproducibility coefficients between the first and second evaluations by each investigator were determined using the same statistical analyses.
To assess inter-observer agreement on total disease severity score between the two investigators for CT examinations at each time point, the correlation between their scores was evaluated by means of Pearson's correlation, and the limits of agreement between them were calculated by Bland–Altman analysis.
To determine the associations between each pulmonary functional parameter and the quantitatively and qualitatively assessed indexes on CT examinations at each time point, univariate regression analyses were performed between each parameter and all quantitative indexes and disease severity scores. In addition, stepwise regression analyses were performed to determine the associations between each pulmonary functional parameter and all quantitative indexes.
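Stepwise regression can be implemented in several ways, and the text does not state the direction or entry criterion used. The sketch below assumes forward selection with a partial F-test and an entry threshold of P < 0.05, using NumPy least squares and SciPy's F distribution; it omits the removal step of bidirectional stepwise procedures. The data are synthetic, and the predictor names merely borrow the index labels.

```python
import numpy as np
from scipy import stats


def _rss(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, names, p_enter=0.05):
    """Forward selection: add the predictor with the smallest partial-F p-value
    while that p-value is below p_enter. Returns selected names and model r^2."""
    n = len(y)
    design = np.ones((n, 1))          # intercept-only starting model
    rss_current = _rss(design, y)
    selected = []
    while True:
        best = None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            candidate = np.column_stack([design, X[:, j]])
            rss_new = _rss(candidate, y)
            df_resid = n - candidate.shape[1]
            f_stat = (rss_current - rss_new) / (rss_new / df_resid)
            p = stats.f.sf(f_stat, 1, df_resid)
            if best is None or p < best[0]:
                best = (p, j, candidate, rss_new)
        if best is None or best[0] >= p_enter:
            break
        _, j, design, rss_current = best
        selected.append(j)
    r2 = 1.0 - rss_current / float(((y - y.mean()) ** 2).sum())
    return [names[j] for j in selected], r2


# Toy usage with synthetic data (not study data): an outcome driven by two indexes.
rng = np.random.default_rng(2)
names = ["%normal lung", "%GGO", "%reticulation", "%honeycomb"]
X = rng.normal(size=(150, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=150)
chosen, r2 = forward_stepwise(X, y, names)
```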
To compare the changes between paired serial CT examinations in each quantitative index and in disease severity score among the three statuses used in this study, Tukey's honestly significant difference (HSD) test was performed.
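Tukey's HSD comparison among the three statuses is available in SciPy as `scipy.stats.tukey_hsd` (SciPy 1.8 or later). The example below uses synthetic Δ%normal lung values with the study's group sizes; the numbers are illustrative and are not the study's results.

```python
import numpy as np
from scipy import stats

# Synthetic changes in %normal lung for the three statuses (illustrative only;
# group sizes match the study, the values do not come from it).
rng = np.random.default_rng(3)
stable = rng.normal(0.0, 2.0, size=188)
worse = rng.normal(-6.0, 4.0, size=98)
improved = rng.normal(4.0, 4.0, size=78)

result = stats.tukey_hsd(stable, worse, improved)  # requires SciPy >= 1.8
for (i, a), (j, b) in [((0, "Stable"), (1, "Worse")),
                       ((0, "Stable"), (2, "Improved")),
                       ((1, "Worse"), (2, "Improved"))]:
    print(f"{a} vs {b}: mean diff = {result.statistic[i, j]:.2f}, P = {result.pvalue[i, j]:.4g}")
```

`result.statistic[i, j]` is the difference between the means of groups i and j, and `result.pvalue[i, j]` is the family-wise adjusted P value for that pair.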
Finally, changes in each pulmonary functional parameter between paired serial CT scans were correlated with changes in each quantitative index and in disease severity score by means of univariate regression analysis. In addition, stepwise regression analyses were performed to determine the associations between changes in each pulmonary functional parameter and changes in all quantitative indexes observed on paired serial CT scans.
Results
Representative cases are shown in Figs. 2 and 3.

A 62-year-old male patient with dermatomyositis (first line, L to R: thin-section CT scans at baseline, acute exacerbation phase, and after treatment phase; second line, L to R: CT texture analysis results obtained by means of machine learning-based software at baseline and the same two phases). A comparison of CT scans at baseline (i.e. “Stable” group) and at the acute exacerbation phase (i.e. “Worse” group) shows an increase in GGO, reticulation, and honeycomb area, and a decrease in normal lung area. Δ%normal lung, Δ%GGO, Δ%reticulation, and Δ%honeycomb were −5.5%, 4.4%, 6.2%, and 7.1%, respectively, while Δdisease severity score was 10. A comparison of CT scans obtained at the acute exacerbation phase (i.e. “Worse” group) and at the after treatment phase (i.e. “Improved” group) shows a decrease in GGO, reticulation, and honeycomb area, and an increase in normal lung area. Δ%normal lung, Δ%GGO, Δ%reticulation, and Δ%honeycomb were 3.3%, −1.5%, −2.8%, and −1.1%, respectively, while Δdisease severity score was −4. CT, computed tomography; GGO, ground glass opacity.

A 65-year-old female patient with progressive scleroderma (first line, L to R: thin-section CT scans at baseline, acute exacerbation phase, and after treatment phase; second line, L to R: CT texture analysis results obtained by means of machine learning-based software at baseline and the same two phases). A comparison of CT scans obtained at baseline (i.e. “Stable” group) and at the acute exacerbation phase (i.e. “Worse” group) shows an increase in GGO and consolidation area and a decrease in normal lung area. Δ%normal lung, Δ%GGO, and Δ%consolidation were −16.9%, 13.2%, and 2.5%, respectively, while Δdisease severity score was 6. A comparison of CT scans obtained at the acute exacerbation phase (i.e. “Worse” group) and at the after treatment phase (i.e. “Worse” group) shows an increase in GGO, reticulation, and honeycomb area and a decrease in normal lung area. Δ%normal lung, Δ%GGO, Δ%reticulation, and Δ%honeycomb were −19.5%, 14.9%, 4.2%, and 0.2%, respectively, while Δdisease severity score was 15. CT, computed tomography; GGO, ground glass opacity.
Correlations between the first and second ML-based measurements were significant and excellent or perfect for the volumes of all radiological findings (r ≥ 0.99; P < 0.0001), and the reproducibility coefficients were 0.0 ± 0.0%.
Correlations and intra-observer agreement between the first and second evaluations of qualitative disease severity score were significant and excellent for both investigators (first investigator: r = 0.99; P < 0.0001; second investigator: r = 0.99; P < 0.0001), with reproducibility coefficients of 0 ± 2 for both investigators.
Correlation and inter-observer agreement between the two investigators were significant and excellent for qualitative disease severity scores (r = 0.98; P < 0.0001), and the limits of agreement between them were 0 ± 2.
Table 1 shows the results of univariate regression analysis between each pulmonary functional parameter and all radiological finding volumes derived from ML-based CT texture analysis, as well as qualitative disease severity scores. FEV1/FVC% correlated significantly with all quantitative indexes except %nodular lesion (%normal lung: r = 0.51; P < 0.0001; other indexes: −0.39 ≤ r ≤ −0.17; P ≤ 0.0001) and with disease severity score (r = −0.46; P < 0.0001). %VC also correlated significantly with all quantitative indexes except %nodular lesion (%normal lung: r = 0.59; P < 0.0001; other indexes: −0.45 ≤ r ≤ −0.22; P < 0.0001) and with disease severity score (r = −0.56; P < 0.0001). Further, %DLCO/VA correlated significantly with all quantitative indexes except %nodular lesion (%normal lung: r = 0.51; P < 0.0001; other indexes: −0.39 ≤ r ≤ −0.17; P < 0.0001) and with disease severity score (r = −0.46; P < 0.0001). Finally, KL-6 level correlated significantly with all quantitative indexes except %emphysema, %nodular lesion, and %consolidation (%normal lung: r = −0.24; P < 0.0001; other indexes: 0.10 ≤ r ≤ 0.20; P ≤ 0.01) and with disease severity score (r = 0.22; P < 0.0001).
Univariate regression analysis of differences between each pulmonary functional parameter and all radiological finding volumes derived from ML-based CT texture analysis as well as qualitative disease severity scores on CT examination at each time point.
CT, computed tomography; ML, machine learning.
On stepwise regression analysis between all quantitative radiological indexes and each pulmonary functional parameter as well as serum KL-6 level, FEV1/FVC% was significantly affected by %normal lung and %reticulation (r² = 0.27; P < 0.05), %VC by %normal lung, %reticulation, and %GGO (r² = 0.36; P < 0.05), and %DLCO/VA by %normal lung, %reticulation, and %GGO (r² = 0.27; P < 0.05). In addition, serum KL-6 level was significantly affected by %normal lung and %reticulation (r² = 0.27; P < 0.05).
Table 2 shows the comparison of changes in pulmonary functional parameters, serum KL-6 level, each radiological index, and disease severity score among “Stable,” “Worse,” and “Improved” statuses for all patients with CTD-ILD. ΔFEV1/FVC%, Δ%VC, Δ%DLCO/VA, Δserum KL-6 level, Δ%normal lung, Δ%consolidation, Δ%GGO, Δ%reticulation, and Δdisease severity score showed significant differences among the three statuses (P < 0.05). In addition, Δ%honeycomb differed significantly between the “Stable” and “Worse” groups (P < 0.05) and between the “Worse” and “Improved” groups (P < 0.05).
Comparison of differences in pulmonary functional parameter, serum KL-6, each radiological index and disease severity score among “Stable,” “Worse,” and “Improved” statuses for paired serial CT examinations in all patients with CTD-ILD.
Values are given as mean ± SD.
*Significant difference with “Stable” group (P < 0.05).
Significant difference with “Worse” group (P < 0.05).
CT, computed tomography; CTD-ILD, connective tissue disease-related interstitial lung disease.
Table 3 shows the results of univariate regression analysis between changes in each radiological index and changes in each pulmonary functional parameter as well as serum KL-6 level for all patients with CTD-ILD. All quantitative radiological indexes except Δ%emphysema and Δ%nodular lesion, as well as Δdisease severity score, showed significant correlations with ΔFEV1/FVC% (Δ%normal lung: r = 0.51; P < 0.0001; other quantitative indexes: 0.34 ≤ r ≤ 0.21; P ≤ 0.0001; Δdisease severity score: r = −0.37; P < 0.0001) and Δserum KL-6 level (Δ%normal lung: r = −0.38; P < 0.0001; other quantitative indexes: −0.33 ≤ r ≤ 0.20; P < 0.0001; Δdisease severity score: r = −0.33; P < 0.0001). All quantitative radiological indexes and Δdisease severity score showed significant correlations with Δ%VC (Δ%normal lung: r = 0.64; P < 0.0001; other quantitative indexes: −0.45 ≤ r ≤ −0.11; P < 0.05; Δdisease severity score: r = −0.52; P < 0.0001) and Δ%DLCO/VA (Δ%normal lung: r = 0.58; P < 0.0001; other quantitative indexes: −0.36 ≤ r ≤ −0.12; P < 0.05; Δdisease severity score: r = −0.51; P < 0.0001).
Univariate regression analysis of differences between each radiological index and each pulmonary functional parameter as well as serum KL-6 level for paired serial CT examinations in all patients with CTD-ILD.
CT, computed tomography; CTD-ILD, connective tissue disease-related interstitial lung disease.
Table 4 shows the results of stepwise regression analysis between changes in all quantitative radiological indexes and changes in each pulmonary functional parameter as well as serum KL-6 level for all patients with CTD-ILD. Changes in all pulmonary functional parameters as well as in serum KL-6 level were significantly affected by Δ%normal lung, Δ%reticulation, and Δ%honeycomb (0.16 ≤ r² ≤ 0.42; P < 0.05).
Stepwise regression analysis of differences among all quantitative radiological indexes and each pulmonary functional parameter as well as serum KL-6 level for paired serial CT examinations in all CTD-ILD patients.
CT, computed tomography; CTD-ILD, connective tissue disease-related interstitial lung disease.
Discussion
Our findings demonstrate that ML-based CT texture analysis can quantitatively and reproducibly assess pulmonary functional loss in patients with CTD-ILD and evaluate disease status using correlations between quantitatively assessed radiological findings and pulmonary functional parameters as well as serum KL-6 level. In addition, ML-based CT texture analysis software is potentially more effective, in terms of reproducibility, correlations among evaluated radiological and pulmonary functional parameters, and follow-up examinations at various disease stages, than qualitatively assessed disease severity score by board-certified chest radiologists. To the best of our knowledge, no other studies have compared the capability of ML-based CT texture analysis for disease assessment with disease severity qualitatively assessed by board-certified chest radiologists in this setting.
Assessment of the reproducibility coefficients of ML-based CT texture analysis for each radiological finding evaluation and each board-certified chest radiologist as well as inter-observer agreement between two investigators, showed that the correlation and reproducibility coefficient of the radiological finding assessments using this software were superior to those of qualitatively assessed disease severity scores for patients with CTD-ILD. Therefore, ML-based CT texture analysis had better potential for fewer discrepancies than qualitatively assessed disease severity scores for the assessment of patients with CTD-ILD.
On the basis of univariate and stepwise regression analyses of quantitatively and qualitatively assessed radiological indexes, pulmonary functional parameters, and serum KL-6 level, %normal lung correlated better than the disease severity score with pulmonary functional parameters and serum KL-6 level, while %GGO and %reticulation were significant factors for improving these correlations in the stepwise regression analyses. These findings are consistent with previous reports, although the techniques used differed (8–14). Therefore, quantitative assessment using ML-based CT texture analysis may be superior to qualitative assessment for thin-section CT-based disease severity assessment in routine clinical practice.
When the differences between paired serial CT scans in each quantitative radiological index, disease severity score, pulmonary functional parameter, and serum KL-6 level were assessed, Δ% normal lung, Δ% consolidation, Δ% GGO, Δ% reticulation, Δ% honeycomb, and Δdisease severity score, as well as the differences in each pulmonary functional parameter and in serum KL-6 level, could be used to distinguish among the “Stable,” “Worse,” and “Improved” statuses. In addition, these radiological differences correlated significantly with changes in pulmonary functional parameters and serum KL-6 level between paired serial CT scans. Furthermore, Δ% normal lung, Δ% reticulation, and Δ% honeycomb significantly influenced the differences in all pulmonary functional parameters and in serum KL-6 level. Finally, these quantitative radiological indexes obtained with ML-based CT texture analysis correlated better with disease status changes than did the qualitatively determined disease severity scores in routine clinical practice. Therefore, our results indicate that ML-based CT texture analysis can be expected to be useful for addressing various questions in both academic and clinical studies, not only of patients with CTD-ILD but also of patients with other diffuse lung diseases.
The present study has some limitations. First, this was a retrospective study: the training and validation data were obtained at a single institution, and only a few CT systems from a single vendor were used. Moreover, the number of training and validation cases was limited, and no external test set was used to confirm the results. Because CT data from other vendors were not tested, future studies using such data will be needed to validate the usefulness of this software before it is applied in routine clinical practice. Second, neither the CT acquisition protocol nor the reconstruction algorithm was modified using state-of-the-art CT techniques, so the diagnostic performance of our proprietary software may have been affected by these shortcomings. Third, only two radiologists assessed all radiological findings. As previously mentioned, however, increasing the number of investigators from different institutions and specialties may reduce inter-observer agreement in radiological finding evaluation, especially for honeycombing (34–36). Moreover, ILD diagnoses were not evaluated, and the effect of the software on ILD diagnosis was not determined in this study. Further investigations of the application of this software in routine clinical practice are therefore warranted.
In conclusion, newly developed ML-based CT texture analysis shows better potential than qualitatively assessed thin-section CT for disease severity assessment and treatment response evaluation in patients with CTD-ILD.
Footnotes
Acknowledgements
The authors thank Shinichiro Seki (Division of Functional and Diagnostic Imaging Research, Department of Radiology, Kobe University Graduate School of Medicine); Yuji Kishida (Department of Radiology, Kobe University Graduate School of Medicine); Shintaro Tokunaga, Shuya Hori, Motoko Tachihara, Kazuyuki Kobayashi, Yoshihiro Nishimura (all from the Division of Respiratory Medicine, Department of Internal Medicine, Kobe University Graduate School of Medicine); Wakiko Tani, Noriyuki Negi, and Takamichi Murakami (all from the Center for Radiology and Radiation Oncology, Kobe University Hospital) for their valuable contributions to this study.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: KA, YF, and NS are employees of Canon Medical Systems Corporation but had no control over any data or information submitted for publication or included in this study.
Funding
The author(s) received the following financial support for the research, authorship, and/or publication of this article: This work was financially or technically supported by Canon Medical Systems Corporation, the Smoking Research Foundation, and Grants-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science and Technology (JSTS.KAKEN; No. 18K07675 and No. 20K08037).
