Abstract
OBJECTIVE:
To develop and test a novel method for automatic quantification of hepatic steatosis in histologic images, based on a deep learning scheme designed to predict the fat ratio directly. The method aims to improve the accuracy of non-alcoholic fatty liver disease (NAFLD) diagnosis by providing an objective assessment of the severity of hepatic steatosis in place of subjective visual estimation.
MATERIALS AND METHODS:
Thirty-six 8-week-old New Zealand white rabbits of both sexes were fed a high-cholesterol, high-fat diet and sacrificed under deep anesthesia at various time points to obtain pathological specimens. All rabbits underwent multislice computed tomography to monitor density changes of the liver parenchyma. A deep learning scheme using a convolutional neural network was developed to directly predict the liver fat ratio from the pathological images. The average error value, standard deviation, and accuracy (error <5%) of the deep learning scheme were evaluated against manual segmentation results. Pearson’s correlation coefficient was also calculated.
RESULTS:
The deep learning scheme performed successfully on rabbit liver histologic data, showing a high degree of accuracy and stability. The average error value, standard deviation, and accuracy (error <5%) were 3.21%, 4.02%, and 79.10% for the cropped images and 2.22%, 1.92%, and 88.34% for the original images, respectively. A strong positive correlation with the labeled fat ratio was also observed for cropped images (R = 0.9227) and original images (R = 0.9255).
CONCLUSIONS:
This new deep learning scheme may aid in the quantification of steatosis in the liver and facilitate its treatment by providing an earlier clinical diagnosis.
Introduction
Non-alcoholic fatty liver disease (NAFLD) is defined as intracellular deposition of fat droplets exceeding 5% of triglycerides in patients without clinically significant alcohol intake, infection, or other liver disease induced by a specific cause [1–4]. The earliest and most significant histologic characteristic of NAFLD is the intracellular accumulation of triglycerides within hepatocytes. The resulting oxidative stress leads to liver injury and inflammation, with progression from simple hepatic steatosis to borderline non-alcoholic steatohepatitis (NASH), NASH, and eventually hepatocellular carcinoma and cirrhosis [5, 6]. Recently, NAFLD has become an important public health concern due to the increasing incidence of obesity and diabetes in adults [7, 8], and it constitutes a significant risk factor for cardiac complications [9]. Steatosis in NAFLD patients can be reversed by timely intervention [9], underscoring the need for accurate quantification of the liver fat ratio in the early evaluation. Rapid and reliable quantification of steatosis is also essential in liver transplantation, since steatosis in the donor’s liver increases the risk of organ failure in the recipient [9–11].
Currently, NAFLD is often assessed by non-invasive methods for quantification of hepatic steatosis, such as ultrasonography (US), computed tomography (CT), and magnetic resonance imaging (MRI), which show low accuracy and may yield unreliable results [12, 13]. In addition, there are no specific serum biomarkers for hepatic fat content quantification, and the aminotransferases are neither sensitive nor specific enough for detection of liver fat content. Given the limitations of the imaging modalities, liver biopsy remains the gold standard for diagnosis of fatty liver diseases and quantification of hepatic steatosis. Biopsy is an invasive method with inherent complications such as bleeding; it is expensive, requires expertise for interpretation, is subject to sampling error, and carries some morbidity and a very rare risk of mortality. Nevertheless, it is the only reliable method for distinguishing NASH from simple steatosis [12, 15]. At biopsy, hepatic steatosis is assessed according to a commonly used grading system [9, 16]. The grading system categorizes hepatic steatosis as normal (grade 0) when the fat ratio is between 0% and 5%, mild (grade 1) when the ratio is between 5% and 33%, moderate (grade 2) when the ratio is between 34% and 66%, and severe (grade 3) when the ratio is greater than 67%. However, the visual assessment of hepatic steatosis in biopsy is subjective and depends on the pathologist’s experience, contributing to inter- and intra-observer variability in the evaluation of the proportion of liver fat and the diagnosis of NAFLD [13, 18]. To address this problem, numerous image processing techniques have been developed for the quantification of steatosis in histologic sections [19–26]. Most of these methods were developed using a limited dataset, and their feasibility and accuracy need to be validated with a large number of samples.
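The grading rule described above can be written as a small lookup. This is only an illustrative sketch: the function name `steatosis_grade` is hypothetical, and the handling of the boundary values (exactly 5%, and the gaps between 33% and 34% and between 66% and 67%) is an assumption, because the text does not say which grade those values fall into.

```python
def steatosis_grade(fat_ratio_percent: float) -> int:
    """Map a fat ratio in percent to a steatosis grade (0 to 3).

    Boundary handling is an assumption made for this sketch:
    values below 5 map to grade 0, values up to 33 to grade 1,
    values up to 66 to grade 2, and anything above 66 to grade 3.
    """
    if not 0.0 <= fat_ratio_percent <= 100.0:
        raise ValueError("fat ratio must lie between 0 and 100 percent")
    if fat_ratio_percent < 5.0:
        return 0  # normal
    if fat_ratio_percent <= 33.0:
        return 1  # mild
    if fat_ratio_percent <= 66.0:
        return 2  # moderate
    return 3      # severe


# Hypothetical fat ratios, chosen only to exercise each branch.
grades = [steatosis_grade(v) for v in (3.0, 13.0, 50.0, 80.0)]
```

A direct prediction of the fat ratio, as pursued in this study, can always be mapped to a grade in this way, whereas a grade cannot be mapped back to a ratio.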
The available image analysis-based approaches utilize the traditional threshold processing, image segmentation, and machine learning methods. Some of them suffer from low robustness and require user interaction [9, 24]. Although machine learning methods can reduce the necessity for user interactions, their development is difficult given the extremely high number of possible scenarios and a small dataset. With the advent of artificial intelligence, deep learning technologies, such as Convolutional Neural Network (CNN) [27], Fully Convolutional Network (FCN) [28], and U-net [29], have been applied in digital pathology and microscopy image processing [30]. Deep learning has been used for histologic images since 2013 [31–46], including tasks like segmentation of nuclei, classification of lymphoma sub-types, and scoring of liver fibrosis stages.
Among these previously reported studies, Yu et al. [33] used a deep learning method for the scoring of liver fibrosis stages. Li et al. [34] used a convolutional neural network for hepatocellular carcinoma nuclei grading. Fu et al. [35] proposed a novel CNN for identifying fibrosis in cardiac histological images. Mazo et al. [36] used transfer learning in different CNNs to classify cardiovascular tissues in histological images. Xie et al. [37] utilized a modified CNN model for cell detection, using a structured regression layer instead of a classifier. Chen et al. [38] implemented a deep regression network along with transferred knowledge for automatic detection of mitosis and documented the efficiency of their approach. Saha et al. [39] also used a deep learning model for mitosis detection in breast histopathology images. Van Eycke et al. [40] proposed a method to automate segmentation of the glandular epithelium and data augmentation. Das et al. [41] used a deep CNN approach for the automatic identification of clinically relevant regions in oral tissue histological images. Saltz et al. [42] employed a CNN to generate maps of tumor-infiltrating lymphocytes. Khosravi et al. [43] presented deep learning approaches for classifying pathology images in the presence of extensive tumor heterogeneity. Pal et al. [44] used a deep CNN for segmentation in images of psoriasis skin biopsies. Xie et al. [45] proposed a fully residual CNN in place of the conventional pixel-wise classification method for cell detection. These results indicate that deep learning is well suited to the analysis of pathology images.
The objective of the present study was to develop a novel method for automatic quantification of hepatic steatosis in histologic images, based on a deep learning framework designed to predict the fat ratio directly. To the best of our knowledge, this work is the first attempt to quantify the degree of hepatic steatosis by employing a deep learning method.
Materials and methods
Rabbit NAFLD model
Animal care and all experimental procedures were performed in accordance with the Guideline for Animal Experiment of Guizhou Medical University, and the research protocol was approved by the Animal Experimentation Ethics Committee of Guizhou Medical University. Thirty-six 8-week-old New Zealand white rabbits of both sexes were randomly divided into three groups (12 rabbits/group). All animals were fed a high-fat and high-cholesterol diet (HFD, a standard diet with an additional 2% cholesterol, 6% yolk powder, 10% lard, and 2% maltose dextrin). All rabbits underwent multislice computed tomography to measure the density changes of the liver parenchyma in Hounsfield units (HU) at various time points (Fig. 1). The rabbits were sacrificed at 4, 8, and 12 weeks after switching to the HFD. Figure 2 shows the differences in fat cells in liver histologic images at different weeks. This process was conducted by the Laboratory Animal Research Center of Guizhou Medical University.

Surveillance of density changes in the liver parenchyma by computed tomography, expressed in Hounsfield units (HU), at different time points. The CT value of the liver parenchyma was 75 HU at 1 week, 71 HU at 2 weeks, 48 HU at 4 weeks, 40 HU at 8 weeks, and 9 HU at 12 weeks.

Liver histologic images of rabbits sacrificed at 4, 8, and 12 weeks.
All rabbits were sacrificed under deep anesthesia by injection of 5 ml of potassium chloride through the auricular vein. All liver specimens were stained with hematoxylin and eosin (H&E). The histologic images of the liver parenchyma were acquired with a LEICA DM2500 microscope using the Leica LASX software. Five images of different histologic sections (eyepieces: 10×, objective lens: 40×, NA: 0.65) were obtained from each rabbit, giving a total of 180 images from 36 rabbits. The size of each original image was 1944×2592 pixels. The dataset was divided into six parts, one part for validation (30 images from 6 rabbits) and the other five parts for training (150 images from the other 30 rabbits). The training and validation images came from different rabbits. The fat cells in all images were manually segmented by two pathologists with at least four years of clinical experience. The fat ratio ranged from 13% to 50%.
Outline of the methodology
The block diagram of the proposed method is shown in Fig. 3. The training set was cropped into a sequence of sub-images of size 324×324, and the three-color channels (RGB) of each sub-image were used as the input of the CNN model. The output result was the predicted fat ratio. The development of the method was divided into two stages: computation of fat ratio in the input image, and training of the CNN model. After training, the system was deployed to perform the automatic quantification tasks. The two stages are detailed below.

The block diagram of the developed algorithm. The three-color channels (RGB) of cropped image were used as the input of CNN model. The output result was the predicted fat ratio.
In this step, the 180 liver histologic images were segmented by two pathologists with at least four years of clinical experience. During the segmentation, all fat cells in the images were manually labeled. Pixels belonging to fat cells were assigned a value of one, and all other pixels a value of zero. Binary images of all cropped images were then generated from the manually segmented fat-cell regions. Figure 4 shows two representative examples of the three channels of cropped histologic images and the segmented binary image of all fat cells. The labeled fat ratio, shown below the binary images, was computed by dividing the number of pixels in the white region (fat cells) by the number of pixels in the entire binary image (liver tissue).
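As an illustration of the labeled fat ratio described above, the following minimal sketch counts the non-zero pixels of a binary mask and divides by the total pixel count. The function name `fat_ratio` and the synthetic mask are assumptions made for this example; the study itself was implemented in MATLAB.

```python
import numpy as np


def fat_ratio(mask: np.ndarray) -> float:
    """Fraction of pixels labeled as fat (non-zero) in a binary mask."""
    mask = np.asarray(mask)
    if mask.size == 0:
        raise ValueError("mask must contain at least one pixel")
    return float(np.count_nonzero(mask)) / mask.size


# Hypothetical 324x324 mask whose top-left quarter is marked as fat.
demo_mask = np.zeros((324, 324), dtype=np.uint8)
demo_mask[:162, :162] = 1
demo_ratio = fat_ratio(demo_mask)  # 162*162 / (324*324) = 0.25
```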

Examples of cropped histologic images, corresponding RGB channels, and the segmented binary images. The labeled fat ratio is listed below the binary images.
In the proposed CNN model, shown schematically in Fig. 5, the three color channels of the cropped image of size 324×324 pixels were used as the input. In convolution layer 1 (Conv1), 64 convolved maps were produced by 64 filters of size 3×3 with a stride of 3. The filter weights were randomly initialized from a Gaussian distribution with zero mean and a standard deviation of 0.01. In convolution layer 2 (Conv2), 128 convolved maps were produced by 128 filters of size 3×3 with a stride of 1. Max pooling of size 2×2 with a stride of 2 was used to downsample the convolved maps. The rectified linear unit (ReLU) was used as the non-linear activation function after each convolution layer and fully connected layer. All convolution layers were followed by batch normalization (BN) and then ReLU activation. Finally, the CNN feature of size 3×3×128 was connected to two fully connected layers (FC1 and FC2), yielding a single output, which is the predicted fat ratio of the input image. In addition, a dropout layer and L2 regularization were used in our model to prevent overfitting. The dropout ratio was set to 0.4.
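The spatial size of the feature maps follows from the usual convolution arithmetic, which the short sketch below applies to the first layers. The padding values are assumptions (the text does not state them): no padding is assumed for Conv1 and for pooling, and a padding of 1 is assumed for the 3×3, stride-1 convolution. The helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output width of a convolution or pooling layer along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


# Conv1: 3x3 filters, stride 3, assumed zero padding.
conv1 = conv_output_size(324, kernel=3, stride=3)                 # 108
# Conv2: 3x3 filters, stride 1, assumed padding of 1 (size-preserving).
conv2 = conv_output_size(conv1, kernel=3, stride=1, padding=1)    # 108
# Max pooling: 2x2 window, stride 2.
pooled = conv_output_size(conv2, kernel=2, stride=2)              # 54
```

Under these assumptions the stride-3 first layer reduces the 324×324 input to 108×108 in a single step, which keeps the later layers inexpensive.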

The block diagram of the CNN for prediction of the fat ratio. It consists of one convolution layer with 64 filters of size 3×3 and a stride of 3, and three convolution layers with 128 filters of size 3×3 and a stride of 1. All convolution layers were followed by batch normalization (BN) and then ReLU activation. The output of the CNN was the predicted fat ratio of the input image.
In the training process, the training set was augmented by cropping each image into sub-images, owing to the lack of an appropriate dataset. Each original image of size 1944×2592 in the training set was cropped into 48 images of size 324×324 without any overlapping areas. We also applied rotation by 90°, 180°, and 270° and flip transformations to each cropped image. To facilitate training, each image was preprocessed to standardize the minimum and maximum intensities in each RGB channel to 0 and 1, respectively. The training set generated in this manner contained 57,600 histologic images. Each original image of the validation set was likewise cropped into 48 images of size 324×324 without any overlapping areas, so that the 30 original images in the validation set yielded 1,440 histologic images. The labeled fat ratio of each cropped image was obtained from the manual segmentation of the original image. The network was trained in MATLAB 2018a on a Windows operating system, with 57,600 images used for training and 1,440 images for validation. The network was trained by minimizing the Euclidean loss function. The Euclidean loss (E) is defined, in its standard form, as
\[
E = \frac{1}{2N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2
\]
where N is the number of samples, $y_i$ is the labeled fat ratio of the i-th image, and $\hat{y}_i$ is the corresponding predicted fat ratio.
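The cropping, augmentation, and intensity scaling described above can be sketched as follows. This is an illustrative Python version, not the MATLAB code used in the study. The eightfold augmentation (four rotations, each with and without a horizontal flip) is an assumption: the text does not list the flips, but eight variants per tile is consistent with the reported total of 150 × 48 × 8 = 57,600 training images. All function names are hypothetical.

```python
import numpy as np

TILE = 324  # tile edge length in pixels, from the text


def crop_tiles(image: np.ndarray, tile: int = TILE) -> list:
    """Split an image into non-overlapping square tiles."""
    h, w = image.shape[:2]
    if h % tile or w % tile:
        raise ValueError("image size must be a multiple of the tile size")
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h, tile)
            for c in range(0, w, tile)]


def dihedral_variants(tile: np.ndarray) -> list:
    """Four rotations (0, 90, 180, 270 degrees), each with and without a flip."""
    rotations = [np.rot90(tile, k) for k in range(4)]
    return rotations + [np.fliplr(r) for r in rotations]


def rescale_channels(tile: np.ndarray) -> np.ndarray:
    """Scale each color channel so its minimum is 0 and its maximum is 1."""
    tile = tile.astype(np.float64)
    lo = tile.min(axis=(0, 1), keepdims=True)
    hi = tile.max(axis=(0, 1), keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat channels
    return (tile - lo) / span


# Synthetic stand-in for one 1944x2592 RGB histologic image.
image = np.random.default_rng(0).integers(0, 256, size=(1944, 2592, 3), dtype=np.uint8)
tiles = crop_tiles(image)               # 6 rows x 8 columns of tiles
variants = dihedral_variants(tiles[0])
scaled = rescale_channels(tiles[0])
n_training = 150 * len(tiles) * len(variants)
```

Because rotations and flips only permute pixels, the labeled fat ratio of a tile is unchanged by these transformations, so each augmented copy can reuse the label of its source tile.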
The proposed method was developed with MATLAB 2018a in a 64-bit Windows 10 platform, which ran on a computer with an Intel 4.2GHz CPU and 64GB RAM. The parameter training of CNN used an NVIDIA graphics processing unit GTX1080Ti to accelerate the training process.
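To evaluate the performance of the developed method, the absolute error, average error value, standard deviation, and Pearson’s correlation coefficient (R) were calculated against the ground truth (manual segmentation results). Let $y_i$ and $\hat{y}_i$ denote the labeled and predicted fat ratios of the i-th image. The absolute error is defined as:
\[
e_i = \left| y_i - \hat{y}_i \right|
\]
If N denotes the total number of samples, the average error value is calculated by:
\[
\bar{e} = \frac{1}{N} \sum_{i=1}^{N} e_i
\]
Pearson’s correlation coefficient can be written as:
\[
R = \frac{\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}\,\sqrt{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}}
\]
where $\bar{y}$ and $\bar{\hat{y}}$ are the means of the labeled and predicted fat ratios, respectively.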
In addition to the above metrics, we report the accuracy (error <5%), computed by dividing the number of validation images with an absolute error of less than 5% by the total number of images in the validation set.
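The evaluation metrics described above can be computed in a few lines. The sketch below is illustrative only: the function name `evaluate`, the use of the population standard deviation, and the four example values are assumptions, not data from the study.

```python
import numpy as np


def evaluate(labeled, predicted, tolerance=5.0):
    """Error statistics between labeled and predicted fat ratios (in percent)."""
    labeled = np.asarray(labeled, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_err = np.abs(labeled - predicted)
    return {
        "mean_abs_error": float(abs_err.mean()),
        # Assumption: population standard deviation of the absolute error.
        "std_abs_error": float(abs_err.std()),
        "pearson_r": float(np.corrcoef(labeled, predicted)[0, 1]),
        # Share of images whose absolute error is below the tolerance.
        "accuracy_percent": float(np.mean(abs_err < tolerance) * 100.0),
    }


# Hypothetical labeled and predicted fat ratios for four images.
metrics = evaluate([13.0, 20.0, 35.0, 50.0], [15.0, 19.0, 42.0, 47.0])
```

In this toy example the absolute errors are 2, 1, 7, and 3 percentage points, so three of the four images count as accurate at the 5% tolerance.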
In order to evaluate the CNN model in our research, we selected 30 original images for validation each time and left the remaining images for training. The training process was repeated 6 times. Each time, 30 original images were cropped into 1,440 images of size 324×324 without any overlapping areas and were used to evaluate the developed method. The labeled fat ratio and predicted fat ratio were acquired from manual segmentation method and the CNN model, respectively. Throughout the experiments, the average processing time of our deep learning network to compute the fat ratio was 0.008 s for each image of the validation set and 0.4 s for an original image.
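The six-fold procedure described above can be sketched as a split at the level of rabbits, so that no animal contributes images to both training and validation. The random assignment of rabbits to folds and the fixed seed are assumptions for this illustration, since the text does not state how the folds were chosen; the function name `rabbit_level_splits` is hypothetical.

```python
import random

IMAGES_PER_RABBIT = 5  # five histologic images per rabbit, from the text


def rabbit_level_splits(n_rabbits=36, n_folds=6, seed=0):
    """Return (train_ids, val_ids) pairs that never share a rabbit."""
    if n_rabbits % n_folds:
        raise ValueError("n_rabbits must be divisible by n_folds")
    ids = list(range(n_rabbits))
    random.Random(seed).shuffle(ids)  # assumption: random fold assignment
    size = n_rabbits // n_folds
    folds = [ids[i * size:(i + 1) * size] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        val_ids = folds[k]
        train_ids = [r for j, fold in enumerate(folds) if j != k for r in fold]
        splits.append((train_ids, val_ids))
    return splits


splits = rabbit_level_splits()
image_counts = [(len(train) * IMAGES_PER_RABBIT, len(val) * IMAGES_PER_RABBIT)
                for train, val in splits]
```

Each split yields 150 training images and 30 validation images, matching the counts given in the text.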
Table 1 presents the validation results of each training run, including the number of cropped images in different ranges of absolute error, the average error value, the standard deviation, and the accuracy (error <5%). The number of cropped images in each error range was obtained from the absolute error between the labeled and predicted fat ratio of each cropped image. The average absolute error of the proposed method was 3.21%, and the standard deviation was 4.02%. The average accuracy was 79.10%. Pearson’s correlation coefficient between the predicted and labeled fat ratios was 0.9227 (averaged over the six validations). The accuracy (error <5%) was computed by dividing the number of cropped images with an absolute error of less than 5% by the size of the entire validation set (1,440 images).
Number of cropped images in different ranges of absolute error, average error value and standard deviation (mean±STD), and accuracy (error <5%) in each validation
The scatter and box plots obtained by applying the proposed method to the 30 original images in each validation are shown in Fig. 6. In Fig. 6a, the horizontal and vertical coordinates of each scatter point are the labeled and predicted fat ratios, respectively. According to Pearson’s correlation coefficient (R), Fig. 6a shows that the results predicted by the proposed method correlate strongly with those of the manual segmentation method. Figure 6b illustrates the variation of the error across validations due to the use of different sets for training and validation. Table 2 shows the average error value, standard deviation, Pearson’s correlation coefficient (R), and accuracy (error <5%) for original images in each validation set. The accuracy (error <5%) was computed by dividing the number of original images with an absolute error of less than 5% by the number of original images in each validation set. The average error value and standard deviation were 2.22% and 1.92%, respectively. The results obtained by the deep learning method correlated strongly with the results generated by manual segmentation (R = 0.9255). The average accuracy (error <5%) was 88.34%.

The scatter plot (a) and box plot (b). Valid 1 to Valid 6 denote the results for the 30 original images in each validation, comparing the labeled and predicted fat ratios. The error is equal to the labeled fat ratio minus the predicted fat ratio.
Average error value and standard deviation (mean±STD), Pearson’s correlation coefficient (R), and accuracy (error <5%) for original images in each validation.
The present work proposes a deep learning method utilizing a CNN for the quantification of hepatic steatosis in histologic images. The proposed technique was validated on 1,440 images. The average processing time for prediction of the fat ratio was 0.008 s for each image in the validation dataset, and the computation time for an uncropped original image was 0.4 s. Compared with subjective visual estimation, this approach improves accuracy, and it requires far less processing time than previously published techniques.
According to the values of average error and standard deviation shown in Table 1, the proposed method predicts the fat ratio with an error below 5% in most of the images. However, the absolute error in some cases exceeds 10%. In these instances, the source of error is the presence of the central vein, portal vein, sinusoid, and bile duct in the validation images. These anatomical features form white regions in liver biopsies [21]. Figure 7 illustrates these structures following H&E staining. The white regions of these features can be counted as fat by the CNN model. When these elements are absent, the trained network yields results with a lower error. As indicated in Table 1, more than 98% of the images had an error smaller than 10%, and 79.10% of the images had an error smaller than 5%.

White regions in liver histologic images. (a) Central vein; (b) Bile duct, portal artery and portal vein; (c) Sinusoid. Note that the size of these images is 456×456 pixels.
Several recent studies, listed in Table 3, have attempted to assess steatosis based on thresholding, image segmentation, classification, and machine learning. Lee et al. [19] utilized an image analysis tool for pathology analysis, and Nativ et al. [20] presented an automated image analysis method to quantify macrovesicular steatosis in human liver. Vanderbeck et al. [21] used supervised machine learning for the classification of white regions in liver biopsies. Automated and accurate identification of fat droplets in histologic sections has been presented by Sciarabba et al. [22] and Homeyer et al. [23]. Tsimplakidou et al. [24] proposed an image processing method for automated hepatic steatosis assessment with 20 liver biopsy images. Batool et al. [25] presented an algorithm based on morphological filtering and sparse linear models with 38 high-resolution images. Giannakeas et al. [13] measured steatosis using a method that combines machine learning and morphological analysis. More recently, Homeyer et al. [26] used pixel classification and a Random Forest classifier to improve the robustness of the method and evaluated different analysis protocols using 970 histologic images from patients. However, all of the above studies used different and smaller datasets, and their effectiveness on a larger dataset needs to be validated. The major advantage of the proposed deep learning method is that it does not require the classification of pixels or pre-processing of the image. The average processing time for prediction of the fat ratio is 0.4 s for each original image, which meets the real-time requirement of clinical diagnosis. Although the average error value and standard deviation are higher than with the other methods, the results reveal that the proposed method correlates strongly with the manual segmentation method.
Of relevance, the proposed CNN model calculates the fat ratio of histologic images directly, offering an objective quantification of steatosis instead of only classifying the histologic grade.
Comparison between the proposed method and the previously applied methods for the quantification of hepatic steatosis
N, Number of images; R, Pearson’s correlation coefficient; CCC, Concordance Correlation Coefficient.
Furthermore, we also compared the predicted result with the labeled fat ratio of the original image. The average error value and standard deviation for the original images were lower than those for the cropped images. The average accuracy for the original images (88.34%, Table 2) is also higher than that for the cropped images (79.10%, Table 1). These results demonstrate that a few cropped images with a larger error have only a small effect on the predicted fat ratio of the original image. This is because the predicted fat ratio of the original image is the average of the values for its 48 non-overlapping cropped images.
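The averaging argument above can be checked numerically. In the sketch below, which uses a synthetic random mask rather than study data, the mean of the 48 tile-level fat ratios equals the fat ratio of the whole image because all tiles have the same size, and a single tile whose prediction is off by 12 percentage points (an arbitrary example value) shifts the image-level estimate by only 12/48 = 0.25 percentage points.

```python
import numpy as np

TILE = 324
rng = np.random.default_rng(0)

# Synthetic binary mask standing in for one segmented 1944x2592 image.
mask = (rng.random((1944, 2592)) < 0.3).astype(np.uint8)

# Fat ratio of each of the 48 non-overlapping tiles.
tile_ratios = np.array([mask[r:r + TILE, c:c + TILE].mean()
                        for r in range(0, 1944, TILE)
                        for c in range(0, 2592, TILE)])
image_ratio = mask.mean()

# Perturb one tile prediction by 0.12 (12 percentage points) and average again.
predicted = tile_ratios.copy()
predicted[0] += 0.12
shift = predicted.mean() - image_ratio  # approximately 0.12 / 48 = 0.0025
```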
Some limitations of the current study have to be acknowledged. Firstly, the training dataset was small, which was mitigated by cropping each original image into sub-images to generate the data used in network training. The performance of the proposed method could be lower when the histologic images are blurry or prepared with a different staining. Secondly, there is no definitive approach to choosing hyper-parameters, such as the number of layers, filter size, and learning rate, when designing the CNN framework. Using a larger database might help address these issues. Thirdly, only the rabbit NAFLD model was used, due to the lack of human patient data. Finally, the inter-reader variability between the two pathologists was not assessed in this study.
The experiments illustrate that the proposed method has a high accuracy on a large validation dataset and can be extended to NAFLD patients. To the best of our knowledge, the present work is the first attempt to apply a deep learning method to the prediction of hepatic steatosis in histologic images. In future work, we will focus on the application of other CNN frameworks and on improving the accuracy of the methodology.
The present work resulted in the development of a deep learning method in the CNN framework for automatic quantification of hepatic steatosis in histologic samples. More than 50,000 images were used to train the neural network. The results demonstrated high accuracy and stability of this methodology, which constitutes a substantial improvement in the quantification of steatosis in comparison with subjective visual estimation and previously employed automated techniques. Although further testing and validation of the deep learning methodology using a larger number of human histologic images is warranted, it is expected that the technique developed in this study can reduce the risk of failure in liver transplantation and facilitate the diagnosis of NAFLD.
Conflict of interests
The authors declare that they have no competing interests.
Acknowledgments
We are thankful to Dr. Lujun Dai, who supported the techniques of H&E staining and Masson trichrome staining. The authors gratefully thank all the participants and staff of the Affiliated Hospital of Guizhou Medical University. We are also thankful to the StudyForBetter Team, whose members shared their research spirit throughout the experiments. This work was supported partly by the National Natural Science Foundation of China (Grant nos. 81660298, 81960338, and 81760312), the 2011 Collaborative Innovation Program of Guizhou Province (no. 2015-04), the Joint Fund of Guizhou Department of Science and Technology (Qiankehe LH [2017]7028), and the Doctoral Research Initiation Fund of the Affiliated Hospital of Guizhou Medical University.
