Abstract
The central retinal artery and its branches supply blood to the inner retina. Vascular manifestations in the retina indirectly reflect vascular changes and damage in organs such as the heart, kidneys, and brain, because these organs share a similar vascular structure. Diabetic retinopathy and the risk of stroke are associated with increased venular caliber, and the severity of such diseases depends on changes in the arterioles and venules. The ratio between the calibers of arterioles and venules (AVR) varies and is considered a useful diagnostic indicator of various associated health problems. However, measuring it is not easy because little is known about which features should be used to classify retinal vessels as arterioles or venules. This paper proposes a method for classifying retinal vessels into arterioles and venules based on an improved U-Net architecture and graph cuts. The accuracy of the proposed method is about 97.6%. The proposed method outperforms the compared methods on the RITE and AVRDB datasets.
Introduction
In recent years, many dangerous diseases have appeared around the world. Several systemic and ophthalmic diseases can be diagnosed from the morphology of the retinal blood vessels. The retinal vessels comprise arteries and veins. The arteries transport oxygen-rich blood to the organs, so they appear brighter, whereas the veins carry deoxygenated blood from the organs back to the heart, so they appear darker [1–3].
Doctors have used retinal images to diagnose health problems such as arteriolar narrowing [4], diabetic retinopathy [5], glaucoma [6], age-related macular degeneration [7], and arteriosclerosis and hypertension [8]. These diseases are related to changes in the arterioles and venules. For example, decreased arteriolar caliber is associated with coronary artery disease [9], and arteriolar narrowing at an early stage is related to hypertensive retinopathy [10]. Narrowed arterioles are also reported as a sign related to the pancreas [11]. Diabetic retinopathy and the risk of stroke are associated with increased venular caliber [5]. The severity of these diseases depends on the changes in the arterioles and venules. The ratio between the calibers of arterioles and venules (AVR) varies [12] and is considered a useful diagnostic indicator of various associated health problems. For instance, an abnormal AVR contributes to the diagnosis of high blood pressure or high cholesterol levels [13]. Detecting an abnormal AVR is therefore necessary to support doctors in diagnosis. However, the task is not easy because little is known about which features should be used to classify the retinal vessels as arterioles or venules.
Acute hypertension often causes reversible vasospasm in the retina, and malignant hypertension can cause papilledema. Prolonged or severe hypertension leads to vascular changes, resulting in endothelial damage and necrosis. Other changes, such as thickening of the vessel wall and arteriovenous crossing, often require years of hypertension to progress. Symptoms usually do not develop until late in the disease and include blurred or narrowed vision. In the early stages, ophthalmoscopy helps to detect arterial spasm, which is associated with a decreased ratio of arteriolar to venular aperture. Therefore, the classification of arteries and veins in retinal images is used to calculate the ratio of the arterial to the venous aperture, which helps in the initial diagnosis of hypertension.
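As a rough illustration of the ratio discussed above, the following sketch computes an arteriole-to-venule ratio from caliber measurements. The function name `arteriovenous_ratio`, the use of a plain mean of widths, and the sample values are assumptions made only for this example; they are not taken from the paper, and clinical protocols may summarise calibers differently.

```python
from statistics import mean

def arteriovenous_ratio(arteriole_widths, venule_widths):
    """Return mean arteriolar caliber divided by mean venular caliber."""
    if not arteriole_widths or not venule_widths:
        raise ValueError("both vessel classes need at least one measurement")
    venule_mean = mean(venule_widths)
    if venule_mean <= 0:
        raise ValueError("venular caliber must be positive")
    return mean(arteriole_widths) / venule_mean

# Hypothetical caliber measurements in pixels (not taken from the paper).
avr = arteriovenous_ratio([6.0, 7.0, 8.0], [9.0, 10.0, 11.0])
print(round(avr, 2))
```

A lower value of this ratio corresponds to arterioles that are narrow relative to the venules, which is the situation the paragraph above links to early hypertension.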
In this paper, we propose a method to classify retinal vessels into arterioles and venules based on an improved U-Net and graph cuts. The proposed method includes four steps: pre-processing, the improved U-Net architecture, initial arteriole and venule classification, and graph cuts. We test on the RITE dataset [14] and the AVRDB dataset [39] to compare the proposed method with other recent methods in terms of accuracy.
The main contributions of this paper are as follows. First, we modify the U-Net architecture (MoU-Net) to suit the classification of arterioles and venules. Second, we combine MoU-Net with graph cuts to improve the classification accuracy.
The rest of the paper is organized as follows. The literature review is presented in Section 2. The proposed method for arteriole and venule classification is presented in Section 3. The experiments and evaluation results are presented in Section 4. Finally, Section 5 gives the conclusions and future work.
Literature review
In general, research on arteriole and venule segmentation falls into two categories: graph-based and feature-based methods [16]. Among graph-based methods [17, 18], Rothaus [19] classified the retinal blood vessels into arterioles and venules using an automated graph separation algorithm. Dashtbozorg [20] proposed an approach with three stages. First, the entire vascular tree is analyzed based on the type of each intersection point. Second, each vessel segment is assigned one of two labels. Finally, the graph-based labeling results are combined with a set of intensity features to classify the retinal vessels as arterioles or venules.
In feature-based methods [21, 22], the target pixels are classified using features captured from fundus image patches centered on those pixels. These methods usually share several common stages: first, the segmented vasculature tree is obtained, or the method starts from it directly; second, the vessel centerlines are extracted from the vasculature; finally, each vessel segment is labeled as an arteriole or a venule depending on the features extracted from it or on the type of the centerline pixels belonging to it.
In other words, most recent studies apply a two-step approach to retinal artery and vein classification: first, the vessels are segmented from the image; second, the segmented vessels are classified into arteries and veins, either by applying purely handcrafted features in feature-based techniques or by combining edge information in graph-based techniques. In Welikala [23], arterioles and venules were classified by applying several methods, including support vector machines, neural networks, and profile-based and region-of-interest-based feature extraction. Zhang [24] improved arteriovenous segmentation by applying dual-modal fundus images and extending a cascade refined U-net. In that approach, each dual-modal fundus image consists of a regular color image and two monochromatic images acquired at two different wavelengths. These monochromatic images provide more useful information on the arterioles and venules. The proposed cascade refined U-net fully used this information and obtained high performance.
Recently, with the rapid development of Deep Learning, several convolutional neural network-based approaches have been proposed that combine vessel segmentation and artery/vein (A/V) classification. Xu [25] proposed a method that segments the vessels and classifies them into arteries and veins simultaneously by enhancing the fully convolutional network (FCN) architecture. AlBadawi [26] built an FCN architecture with an encoder-decoder structure for pixel-wise classification of vessels into arteries and veins. Meyer [27] also used the FCN architecture for A/V classification; the method showed high performance on thick vessels. Badawi [28] proposed a fully convolutional neural network with an encoder-decoder structure for grouping the retinal vessels into arterioles and venules, which does not require any preliminary vessel segmentation stage; a multiloss function is optimized to learn the pixel-wise and segment-wise retinal vessel labels. Hemelings [29] proposed a novel method that not only segments blood vessels but also discriminates A/V simultaneously. Li [30] produced highly confident predictions for the peripheral vessels by capturing the structural information among vessels in a post-processing step. However, both automatic vessel segmentation and A/V classification remain difficult because of the following challenges [31]. First, the performance of these methods is high for large-scale structures but low for small-scale structures, because the multiscale structure of blood vessels is easily overlooked. Second, only a small number of pixels in a retinal image belong to vessels. Third, the blood vessels and the choroid are quite similar in appearance, so it is difficult to classify them accurately.
Among current technical trends, Deep Learning is considered a major technology in many fields, especially in medical image processing. For example, many kinds of objects in images have been segmented using Deep Learning [34, 42]. Convolutional neural networks (CNN) are a typical network structure in this technology. A CNN is useful for segmenting objects in an image because it reduces the manual effort of feature extraction and feature selection, which makes segmentation more accurate. Building on these benefits, Wei Zhang [42] proposed a method to segment the iris based on an enhanced U-Net. The method combines fully dilated convolution with U-Net: the original convolution was replaced by dilated convolution to increase the number of extracted global features.
Recently, much effort has gone into stopping pandemics. However, the tools used to detect the novel coronavirus are expensive and time-consuming. To overcome these limitations, a growing number of researchers have proposed methods to classify suspected subjects at an early stage. Singh [43] proposed an automated coronavirus screening model. The method ensembles deep transfer models and addresses the sensitivity issue by increasing the feature extraction.
Xinghua [44] introduces a deep translation-based change detection network for optical and synthetic aperture radar images. The deep translation maps images from one domain (optical) to another (synthetic aperture radar) via a cyclic structure, so that both lie in the same feature space. After the deep translation, similar characteristics become comparable, which makes it possible to separate the unchanged pixels from the changed pixels.
To address the limitations mentioned above, this study presents a feature-based method to classify the retinal blood vessels into arteries and veins by improving the U-Net architecture and applying graph cuts.
Artery and vein classification of retinal vessel image method
There are two vessel types in a retinal blood vessel image: arterioles and venules. Arterioles are brighter than venules because they transport oxygen-rich blood to other parts of the body. Near the optic disk, an artery usually lies next to two venules, and vice versa [1]. The process of the proposed method is presented in Fig. 1. As shown in Fig. 1, the proposed method includes four parts: pre-processing, the improved U-shaped network architecture, initial arteriole and venule classification, and graph cuts. The pre-processing part is detailed in Section 4.1. Below, we explain the remaining steps of the proposed method one by one.

Fig. 1. The process of the proposed method.
U-Net is a fully convolutional neural network that has brought many advances in medical image segmentation [32]. However, the original U-Net has a drawback in extracting some complex features. The venules and arterioles in a retinal image are very small and difficult to distinguish from each other, so we need to modify the U-Net architecture to separate them. The U-Net architecture includes two paths: an encoding path and a decoding path. We use the encoding path to extract low-level features and the decoding path to extract high-level features. Figure 2 presents the modified U-Net architecture (MoU-Net) that we propose to separate venules and arterioles. Like U-Net, MoU-Net consists of two parts: the encoder and the decoder.

Fig. 2. The MoU-Net architecture.
The encoder part includes five blocks. Each block consists of three convolution layers with kernel size 3×3 and stride 2×2, followed by a ReLU nonlinear activation layer and a pooling layer consisting of a convolution layer with kernel size 2×2 and stride 2×2. The decoder part also includes five blocks. Each block consists of three convolution layers with kernel size 3×3, stride 2×2, and padding 1×1, followed by a ReLU nonlinear activation layer and an unpooling layer consisting of a convolution layer with kernel size 2×2 and stride 2×2. The layer at the bottom is the middle layer between the encoder and decoder parts. It uses three convolution layers with kernel size 2×2 and stride 2×2, and a ReLU nonlinear activation layer, followed by an unpooling layer with kernel size 2×2 and stride 2×2. The result is sent to a convolution layer with kernel size 1×1 and stride 1×1, with the sigmoid function as the activation function.
The proposed MoU-Net architecture is deeper (5 pooling, 5 unpooling) than U-Net (4 pooling, 4 unpooling). Moreover, we have modified the unpooling layer to use kernels of size 2×2 with stride 2×2, so that the output is consistent with the graph cuts algorithm used to perform the classification. As a result, MoU-Net is expected to filter features from the input image better than U-Net.
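The effect of the extra pooling level on the feature-map size can be sketched with the usual output-size formula for convolution and pooling layers. The sketch below tracks only the 2×2, stride-2 pooling layers and assumes, purely for illustration, that the other convolutions in each block preserve the spatial size; that assumption and the helper names `conv_output_size` and `sizes_after_pooling` are not taken from the paper.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial size after a convolution or pooling layer (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1

def sizes_after_pooling(input_size, levels):
    """Track the feature-map side length through `levels` 2x2, stride-2 poolings."""
    sizes = [input_size]
    for _ in range(levels):
        sizes.append(conv_output_size(sizes[-1], kernel=2, stride=2))
    return sizes

print(sizes_after_pooling(512, 4))  # pooling depth stated for U-Net
print(sizes_after_pooling(512, 5))  # pooling depth stated for MoU-Net
```

Under these assumptions, a 512×512 input reaches a 32×32 map after four poolings and a 16×16 map after five, so the deeper network summarises a larger image context at its bottom layer.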
Suppose that, in the downsampling path of the encoder, $EnC_i$ represents the $i$-th feature map, and in the upsampling path of the decoder, $DeC_j$ represents the $j$-th feature map. We used a bottleneck convolution with a kernel of size 2×2 to create an auxiliary output for each of these feature maps. The channels for all the feature maps at all scales are defined following [15], as given in equation (1).
To train the MoU-Net model, we used the binary cross-entropy loss function [32]. Assume that the training dataset consists of n data points with labels, as shown in equation (2).
Let θ denote the set of all parameters of the MoU-Net network, including the convolutional, bias, and softmax parameters. As in logistic regression, training minimizes the loss function in formula (3) with respect to θ:
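For reference, the binary cross-entropy named above is commonly written as the mean of -[y log p + (1 - y) log(1 - p)] over the n data points, where y is the 0/1 label and p the predicted probability. The sketch below implements that standard form; it is an assumption that the paper's equations (2) and (3), which are not reproduced in this text, follow the same convention, and the function name and the clamping constant `eps` are illustrative.

```python
import math

def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    if not labels:
        raise ValueError("at least one data point is required")
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Two hypothetical pixels: one labeled 1 predicted at 0.9, one labeled 0 predicted at 0.1.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```

The loss shrinks as the predicted probability of the correct class approaches 1, which is what the minimization over θ exploits.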
The following hyperparameters are used during model training:
Adam optimizer: The Adam optimizer combines the advantages of two commonly used optimizers, AdaGrad and RMSProp [33]. It is one of the most powerful and efficient optimization algorithms commonly used in Deep Learning. Adam has four hyperparameters (α, $\beta_1$, $\beta_2$, ε) [4]: the learning rate α (step size); $\beta_1, \beta_2 \in [0, 1)$, which control the exponential decay rates of the moving averages of the gradient ($m_t$) and the squared gradient ($v_t$), with default values 0.9 and 0.999; and ε, which is set to $10^{-8}$.
Learning rate: The learning rate (α) is a hyperparameter used in training neural networks. Its value is a positive number, usually between 0 and 1. The learning rate controls how quickly the model changes its weights to fit the classification task. A large learning rate lets the neural network train faster, but it can also reduce the accuracy. We ran multiple tests with different learning rates and chose $\alpha = 10^{-4}$ as the appropriate learning rate to train the model. The remaining three hyperparameters work well with the default values given above [35].
Batch size: The batch size is a hyperparameter that determines the number of training samples processed before the model parameters are updated. In this task, we choose a batch size of 1 to train the model.
Epochs: The number of epochs is the number of times the training algorithm runs through the entire training dataset. It was set to 700 during our model training.
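The hyperparameters listed above can be tied together with a minimal sketch of one Adam update on a scalar parameter. The update rule is the standard Adam formulation; the `CONFIG` dictionary layout, the function name `adam_step`, and the toy quadratic objective are assumptions for illustration and are not the paper's training code.

```python
import math

# Values stated in the text; the dictionary layout itself is illustrative.
CONFIG = {"learning_rate": 1e-4, "beta1": 0.9, "beta2": 0.999,
          "epsilon": 1e-8, "batch_size": 1, "epochs": 700}

def adam_step(theta, grad, m, v, t, cfg=CONFIG):
    """Apply one Adam update to a scalar parameter; t is the 1-based step count."""
    m = cfg["beta1"] * m + (1 - cfg["beta1"]) * grad         # moving average of gradient
    v = cfg["beta2"] * v + (1 - cfg["beta2"]) * grad * grad  # moving average of squared gradient
    m_hat = m / (1 - cfg["beta1"] ** t)                      # bias-corrected first moment
    v_hat = v / (1 - cfg["beta2"] ** t)                      # bias-corrected second moment
    theta = theta - cfg["learning_rate"] * m_hat / (math.sqrt(v_hat) + cfg["epsilon"])
    return theta, m, v

# Toy objective (theta - 3)^2, used only to exercise the update rule.
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 6):
    grad = 2 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)
```

Because each step is roughly the learning rate in magnitude when successive gradients agree in sign, a small α such as the one chosen here moves the parameters slowly, which is consistent with the large number of epochs used.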
After the MoU-Net model is trained, we perform the initial vascular segmentation on the test dataset to obtain a probability distribution map. This map is the input to the complete arterioles and venules classification step using the graph cuts method.
To further improve the classification results, we use the graph cuts algorithm, which performs the classification by globally minimizing a constructed energy function and thereby addresses the problems discussed above.
We observed that the larger the vascular area in the image, the more accurate the results. Therefore, we choose the image with the largest vascular area as the initial partition image. The graph cuts energy function is shown in equation (4).
The boundary term B(f) is determined by using a classical boundary function [6] shown by the formula (6).
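To make the role of the two energy terms concrete, the sketch below evaluates an energy of the common form E(f) = R(f) + λ·B(f) for a given labeling on a tiny grid. It is an assumption that equations (4) and (6), which are not reproduced in this text, take this form: the negative log-probability region cost, the contrast-sensitive boundary weight exp(-(I_p - I_q)^2 / (2σ^2)), the 4-connected neighbourhood, and the values of `lam` and `sigma` are all illustrative choices. The code only scores labelings; it does not run a min-cut solver.

```python
import math

def region_cost(prob_artery, label):
    """Unary cost: negative log-likelihood of the label (1 = arteriole, 0 = venule)."""
    p = min(max(prob_artery, 1e-6), 1 - 1e-6)
    return -math.log(p) if label == 1 else -math.log(1 - p)

def boundary_weight(i_p, i_q, sigma):
    """Contrast-sensitive weight between two neighbouring pixels."""
    return math.exp(-((i_p - i_q) ** 2) / (2 * sigma ** 2))

def energy(labels, probs, intensity, lam=1.0, sigma=0.1):
    """Sum of region costs plus lam times the boundary weights of all cut edges."""
    rows, cols = len(labels), len(labels[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += region_cost(probs[r][c], labels[r][c])
            # 4-connected neighbourhood: visit right and down neighbours once each.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and labels[r][c] != labels[rr][cc]:
                    total += lam * boundary_weight(intensity[r][c], intensity[rr][cc], sigma)
    return total

# Hypothetical 2x3 patch: the left part looks arterial, the right part venous.
probs = [[0.9, 0.8, 0.2], [0.9, 0.7, 0.1]]
intensity = [[0.8, 0.8, 0.3], [0.8, 0.7, 0.3]]
clean = [[1, 1, 0], [1, 1, 0]]
noisy = [[1, 0, 0], [1, 1, 0]]
print(energy(clean, probs, intensity) < energy(noisy, probs, intensity))
```

In this toy patch, the labeling that follows the intensity edge has the lower energy, which is the behaviour a global minimization is meant to select.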
The position and size of blood vessels in adjacent pixels do not change much. So, these kinds of context information are used to reconstruct the energy function for subsequent blood vessels. Therefore, a new energy function is expressed in the form of formula (7).
To improve the classifier performance and avoid over- or under-partitioning of the vascular region, the vascular grayscale information is added to the definition of the region term in equation (8), defined as follows.
where $I_R$ and $\sigma_R$ represent the mean brightness and the standard deviation of all pixels in the vascular region, respectively.
F(f) represents a position constraint function. It is established on the basis that the vascular position does not change appreciably between adjacent pixels. It is defined in equation (9).
where $D(I_i)$ is the shortest Euclidean distance between pixel $I_i$ and the previous vascular region; for each pixel $I_i$ inside the considered vascular region, $D(I_i) = 0$. The parameter μ controls the search range of the vascular region. A value of μ that is too large or too small can cause over- or under-partitioning of the vascular area. Based on many tests and on the previous work in [37], suitable values of μ lie in the range [0.02, 0.05]. Therefore, we set μ = 0.04 in our experiments.
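The distance used by the position constraint can be sketched as a brute-force search over the pixels of the previous vascular region. The representation of the region as a set of (row, column) coordinates and the function name `shortest_distance` are assumptions for illustration. Since equation (9) is not reproduced in this text, the sketch stops at the distance itself and does not show how μ enters the constraint.

```python
import math

def shortest_distance(pixel, previous_region):
    """Shortest Euclidean distance from `pixel` to any pixel of the previous vascular region."""
    if not previous_region:
        raise ValueError("previous vascular region must contain at least one pixel")
    r, c = pixel
    return min(math.hypot(r - pr, c - pc) for pr, pc in previous_region)

# Hypothetical previous vascular region given as (row, col) coordinates.
previous_region = {(2, 2), (2, 3), (3, 3)}
print(shortest_distance((2, 3), previous_region))  # pixel inside the region
print(shortest_distance((5, 7), previous_region))  # pixel outside the region
```

A pixel that belongs to the region gets distance 0, matching the definition above; for large images a distance transform would usually replace this brute-force loop.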
Materials and datasets
Our experiments are implemented in Python on a computer with an Intel Core i7 2.9 GHz CPU and 16 GB of DDR2 memory. We use two publicly available datasets for our experiments: the RITE (Retinal Images vessel Tree Extraction) dataset [14] and the AVRDB (Annotated Dataset for Vessel Segmentation and Calculation of Arteriovenous Ratio) dataset [39].
The RITE database [14] enables comparative studies on the classification of arterioles and venules in retinal fundus images. It is built on the publicly available DRIVE database. The dataset contains 40 color retinal fundus images, each annotated by expert ophthalmologists with venules, arterioles, and the intersections of venules and arterioles [14]. From this dataset, we randomly selected a subset of 30 images for training and used the remaining 10 images to evaluate the results. The images are cropped to a size of 512×512, and the dataset is augmented by scaling, random clipping, etc., which expands the training dataset to 2500 images. Each image is stored in PNG format. Some of the images of this dataset are presented in Fig. 3. In the labeled images in Fig. 3, arterioles are shown in red and venules in blue. Overlaps of arterioles and venules are labeled in green, and vessels that are uncertain are labeled in white [14].
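The random train/test selection and the 512×512 cropping described above can be sketched as follows. The function names, the fixed seed, and the use of integer image ids are assumptions for illustration; the paper does not state how the random selection or the crop positions were generated.

```python
import random

def split_dataset(image_ids, n_train, seed=0):
    """Randomly split image ids into disjoint training and test lists."""
    ids = list(image_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    return sorted(ids[:n_train]), sorted(ids[n_train:])

def random_crop_box(width, height, crop=512, rng=random):
    """Return (left, top, right, bottom) of a random crop-by-crop window inside the image."""
    if width < crop or height < crop:
        raise ValueError("image is smaller than the crop size")
    left = rng.randint(0, width - crop)
    top = rng.randint(0, height - crop)
    return left, top, left + crop, top + crop

# RITE: 40 images, 30 for training and 10 for testing, as stated in the text.
train_ids, test_ids = split_dataset(range(1, 41), n_train=30)
print(len(train_ids), len(test_ids))
```

The same helper applies to other split sizes by changing the id range and `n_train`.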

Fig. 3. Some images in the RITE dataset. (a) Original image. (b) Annotated image.
The AVRDB dataset consists of 100 retinal fundus images annotated with the help of expert ophthalmologists from the Armed Forces Institute of Ophthalmology. The vascular network of each image in this dataset is classified into arteriolar and venular patterns. The images have a size of 1504×1000 and contain retinal arteries and veins [39]. From this dataset, we randomly selected a subset of 70 images for training and used the remaining 30 images to evaluate the results. The images are also cropped to a size of 512×512, and the dataset is augmented by scaling, random clipping, etc., which expands the training dataset to 1200 images. Each image is stored in JPG format. Some of the images of this dataset are presented in Fig. 4. In the labeled images in Fig. 4, arterioles are shown in red and venules in blue [39].

Fig. 4. Some images in the AVRDB dataset. (a) Original image. (b) Annotated image.
To evaluate the venule and arteriole classification, we compare the results of the proposed method with the expert assessment (ground truth) provided with the above datasets for each retinal image. We use the true positive rate (TPR) for arterioles and venules, and the accuracy, as evaluation metrics. $TPR_{arterioles}$ is the ratio of correctly classified artery centerline pixels to all artery centerline pixels, and $TPR_{venules}$ is the ratio of correctly classified vein centerline pixels to all vein centerline pixels. Finally, the accuracy is the ratio of all correctly classified centerline pixels to all centerline pixels with a ground truth label [21]. They are computed as in equations (10), (11) and (12).
We test all images of the two datasets. As an illustration, we present only two case studies for each dataset: Figs. 5 and 6 for the RITE dataset, and Figs. 7 and 8 for the AVRDB dataset.
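The three metrics can be sketched on a list of centerline-pixel labels. The sketch assumes that each TPR is normalised by the number of ground-truth pixels of its own class, and that pixels without an arteriole or venule ground-truth label are ignored; the label codes 'A', 'V' and 'U', the function name, and the sample data are hypothetical.

```python
def classification_metrics(truth, predicted):
    """Compute TPR for arterioles ('A'), TPR for venules ('V'), and accuracy.

    `truth` and `predicted` are equal-length sequences of centerline-pixel labels.
    Pixels whose ground-truth label is neither 'A' nor 'V' are ignored.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = [(t, p) for t, p in zip(truth, predicted) if t in ("A", "V")]
    n_artery = sum(1 for t, _ in pairs if t == "A")
    n_vein = sum(1 for t, _ in pairs if t == "V")
    tp_artery = sum(1 for t, p in pairs if t == "A" and p == "A")
    tp_vein = sum(1 for t, p in pairs if t == "V" and p == "V")
    return {
        "tpr_arterioles": tp_artery / n_artery if n_artery else float("nan"),
        "tpr_venules": tp_vein / n_vein if n_vein else float("nan"),
        "accuracy": (tp_artery + tp_vein) / len(pairs) if pairs else float("nan"),
    }

# Hypothetical labels for eight centerline pixels ('U' = uncertain, ignored).
truth = ["A", "A", "A", "V", "V", "V", "V", "U"]
predicted = ["A", "A", "V", "V", "V", "V", "A", "A"]
print(classification_metrics(truth, predicted))
```

In this toy example two of three arteriole pixels and three of four venule pixels are correct, so the accuracy is five out of the seven labeled pixels.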

Fig. 5. Some result images of the proposed method from the RITE test dataset.

Fig. 6. Partially extracted image from a segmented image in the RITE test dataset.

Fig. 7. Some result images of the proposed method from the AVRDB test dataset.

Fig. 8. Partially extracted image from a segmented image in the AVRDB test dataset.
Figures 5 and 7 present some result images of the proposed method from the RITE test dataset and the AVRDB test dataset, respectively. Figures 6 and 8 present the corresponding images partially extracted from the segmented images.
From the above experiments and others, we can see that the proposed method works well and gives reliable results. Table 1 and Table 2 present the evaluation results of the proposed method on the above datasets.
Table 1. Results on individual images of the proposed method from the RITE test dataset
Table 2. Results on individual images of the proposed method from the AVRDB test dataset
As Table 1 and Table 2 show, the proposed method gives good results. To evaluate its effectiveness objectively, the results of the proposed method are compared with those of the Xiayu method [21], the AlBadawi method [26], the Galdran method [40], and the Meng method [38] on the RITE dataset, using accuracy as the evaluation criterion. The results of the proposed method are also compared with those of the Xiayu method [21], the AlBadawi method [26], and the Meng method [38] on the AVRDB test dataset, again using accuracy as the evaluation criterion.
Figure 9 presents the result of a test case of the proposed method alongside the Xiayu method [21], the AlBadawi method [26], the Galdran method [40], and the Meng method [38] on the RITE dataset. In this case, the result of the proposed method is better than those of the other methods. Table 3 presents the arteriole and venule classification results compared with the other methods on the RITE dataset.

Fig. 9. A case of the proposed method results compared with other methods in the RITE dataset. (a) Original image (b) Result of expert ophthalmologists (c) Result of Xiayu method [21] (d) Result of AlBadawi method [26] (e) Result of Galdran method [40] (f) Result of Meng method [38] (g) Result of proposed method.
Table 3. The arterioles and venules classification results compared with the other methods in the RITE dataset
From Table 3, on the RITE dataset, the accuracy of the proposed method is 97.6%, while the accuracies of the Xiayu method, AlBadawi method, Galdran method, and Meng method are 92.4%, 93.7%, 89.0%, and 96.5%, respectively.
Figure 10 presents the result of a test case of the proposed method alongside the Xiayu method [21], the AlBadawi method [26], and the Meng method [38] on the AVRDB test dataset. In this case, the result of the proposed method is also better than those of the other methods. Table 4 presents the arteriole and venule classification results compared with the other methods on the AVRDB dataset.

Table 4. The arterioles and venules classification results compared with the other methods in the AVRDB dataset
From Table 4, on the AVRDB dataset, the accuracy of the proposed method is 96.9%, while the accuracies of the Xiayu method, AlBadawi method, and Meng method are 91.9%, 93.1%, and 95.9%, respectively. From Table 3 and Table 4, we see that the proposed method gives better results than the other methods on both the RITE dataset and the AVRDB dataset.
As presented in Section 3.1 and Section 3.3, the MoU-Net architecture is deeper (5 pooling, 5 unpooling) than U-Net (4 pooling, 4 unpooling), which makes its feature filtering from the input image better than that of U-Net. Moreover, to further improve the classification results, we also use a graph cuts algorithm that performs the classification by globally minimizing the constructed energy function. The Xiayu method [21] is evaluated on the correctly detected vessels, the AlBadawi method [26] on known vessel centerline locations, the Galdran method [40] with pixelwise uncertainty estimates, and the Meng method [38] on all detected vessels, whereas the proposed method relies on the improved U-Net architecture and graph cuts optimization. We attribute the better results of the proposed method to these differences.
The central retinal artery and its branches supply blood to the inner retina. The inner retina is made up of the retinal nerve fiber layer, the ganglion cell layer, and the inner reticular layer. The central retinal artery arises from the ophthalmic artery, which is the first branch of the internal carotid artery in most people. Vascular manifestations in the retina indirectly reflect vascular changes and damage in organs such as the heart, kidneys, and brain because of the similar vascular structure in these organs. Therefore, classifying arterioles and venules in retinal images is a necessary first step toward the diagnosis of diseases such as hypertensive retinopathy and diabetic retinopathy. In this paper, we proposed a method for classifying the retinal vessels into arterioles and venules based on an improved U-Net architecture and graph cuts, with the four steps presented above. The proposed method gives better results than the other methods on the RITE dataset and the AVRDB dataset. However, we have not yet compared the execution times of the methods or tested on other datasets. In the future, we will evaluate the execution time and test on more datasets. Furthermore, we will detect abnormalities in blood vessels to assist in diagnosis.
Acknowledgments
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number B2019-20-05. We acknowledge the support of time and facilities from Ho Chi Minh City University of Technology (HCMUT), VNU-HCM for this study.
