Abstract
Medical decisions, especially when diagnosing Hepatitis C, are challenging to make as they often have to be based on uncertain and fuzzy information. In most cases, that puts doctors in complex yet uncertain decision-making situations. Therefore, it would be more suitable for doctors to use a semantically intelligent system that mimics the doctor’s thinking and enables fast Hepatitis C diagnosis. Fuzzy ontologies have been used to remedy the shortcomings of classical ontologies by using fuzzy logic, which allows dealing with fuzzy knowledge in ontologies. Moreover, Fuzzy Bayesian networks are well-known and widely used to represent and analyze uncertain medical data.
This paper presents a system that combines fuzzy ontologies and Bayesian networks to diagnose Hepatitis C. The system uses a fuzzy ontology to represent sequences of uncertain and fuzzy data about patients and some features relevant to Hepatitis C diagnosis, enabling more reusable and interpretable datasets. In addition, we propose a novel semantic diagnosis process based on a fuzzy Bayesian network as an inference engine. We conducted an experimental study on 615 real cases to validate the proposed system. The experimentation allowed us to compare the results of existing machine learning algorithms for the Hepatitis C diagnosis with the results of our proposed system. Our solution shows promising results and proves effective for fast medical assistance.
Keywords
Introduction
Hepatitis C is an inflammation of the liver caused by the hepatitis C virus (HCV), which can lead to acute as well as chronic hepatitis. The severity of hepatitis C varies and can range from a mild form, lasting only a few weeks, to a serious lifelong illness, and the disease causes an estimated 1.4 million deaths annually [1].
Early detection and treatment of the hepatitis C virus can reduce the number of deaths caused by the virus. Nevertheless, detecting the hepatitis C virus at an early stage is a difficult task due to the uncertainty associated with HCV prediction. This uncertainty may arise for several reasons, notably: (a) the medical field is uncertain by nature, and the relations between the presence of the virus in a patient and its risk factors, symptoms, etc. are uncertain (randomness); (b) doctors' decisions are subjective (epistemic uncertainty); (c) information about patients may be inaccurate or incomplete; and (d) communication within the hospital may be ambiguous or inadequate.
Despite all these challenges, doctors are making very difficult decisions. Hence, it seems very interesting to design and implement automated healthcare systems that consider these challenges and ease the diagnosis task for the doctors.
Semantic Web technologies, notably ontologies, have been adopted in many medical-aided systems. They allow representing the main features of the domain concepts and the relations among them in a structured format that machines can read and process. Several interoperable health care systems such as [2–5] have been implemented based on classical ontologies. Nevertheless, these systems need to be extended and enhanced in order to deal with the uncertain knowledge related to the medical field. Indeed, classical ontologies fail to handle the uncertainty that is inherently present in most medical scenarios.
Fuzzy ontologies, on the one hand, allow representing and reasoning with the vague and imprecise knowledge that classical ontologies cannot capture. Recently, they have been applied successfully in a large variety of domains such as [6–11]. Nonetheless, they cannot handle the probabilistic knowledge involved in the domain.
Fuzzy Bayesian networks (FBNs), on the other hand, are deemed efficient models for dealing with knowledge in uncertainty-rich domains. In particular, they represent a set of random variables (crisp or fuzzy) and their probabilistic relationships, and they allow dealing with probabilistic events with vague knowledge. Many experiments have shown the benefits of FBNs in a wide diversity of domains and applications, such as the contributions presented in [12–17].
Lately, several studies have been proposed to diagnose HCV. Some of these studies are based on Machine Learning (ML) methods [18–22], which lack semantics when processing data, whereas other studies use classical ontologies as a knowledge base [23–26]. However, classical ontologies are limited when representing and processing uncertain medical data. In contrast to the above-mentioned studies, our study proposes an aided system to diagnose HCV based on probabilistic inference with fuzzy evidence. The developed system relies on a fuzzy ontology and a fuzzy Bayesian network to benefit from the advantages of both models. It also supports interoperability between health care information systems. The main contributions of this paper can be briefly summarized as follows:
(a) We implement a system to diagnose HCV, which combines the expressiveness of fuzzy ontologies and the reasoning power provided by fuzzy Bayesian networks under uncertain data.
(b) We propose a fuzzy ontology to model fuzzy data semantics about patients and populate it with individuals from a real patient dataset.
(c) We propose a semantic reasoning process to predict the state of the Hepatitis C Virus, based on the fuzzy Bayesian network, for a given patient stored in the fuzzy ontology.
The rest of this paper is organized as follows: Section 2 explores the background knowledge of the research; Section 3 presents the architecture of our proposed system; and Section 4 presents the experimental results of our proposed system. Finally, Section 5 concludes the paper and presents our future work.
Conceptual background
In this section, we briefly recall some necessary background knowledge including Bayesian networks, fuzzy Bayesian networks, and fuzzy ontologies.
Fuzzy Bayesian networks
Bayesian networks (BNs) [27–29] are hybrid models, which combine graph theory and probability theory. They allow dealing with probabilistic knowledge, which is derived from statistical experiments and from the frequency of occurrence of an event [30].
Fuzzy Bayesian networks combine the capabilities of Bayesian networks and fuzzy logic to benefit from the advantages of the two models. They allow dealing with probabilistic events with fuzzy evidence.
$N = \{X_1, \ldots, X_m\}$ is the set of the nodes that constitute $G$. We also define $N = \Phi \cup \Psi$.
$\Phi = \{Z_1, \ldots, Z_k\}$ is the set of the fuzzy nodes of $F$, with cardinality $k$ and $\Phi \subseteq N$.
$\Psi = \{Y_1, \ldots, Y_t\}$ is the set of the crisp nodes of $F$, with cardinality $t$ and $\Psi \subseteq N$.
$A = \{(X_i, X_j)\}$, with $X_i, X_j \in N$, is a set of arcs. Each $(X_i, X_j) \in A$ represents a dependency link between $X_i$ and $X_j$.
$P$ is the probability distribution of $F$.
$M = \{\mu_{F_1}, \mu_{F_2}, \ldots, \mu_{F_l}\}$ is a finite set of the membership functions used to fuzzify fuzzy nodes.
Each node $X \in N$ has a finite set of states $S = \{x_1, \ldots, x_a\}$. If $X \in \Phi$, i.e., it is fuzzy, then for each $x_i \in S$ a membership function $\mu_{F_i} \in M$ is associated with $x_i$ in order to fuzzify it.
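As a reading aid, the components listed above can be mirrored in a small data structure. The following is only a minimal sketch under assumptions: the class names `Node` and `FuzzyBayesianNetwork`, the example nodes `ALB` and `Diagnosis`, and the two membership functions are hypothetical illustrations, and the probability distribution $P$ (the conditional probability tables) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

MembershipFn = Callable[[float], float]


@dataclass
class Node:
    name: str
    states: List[str]
    # Empty for a crisp node; one membership function per state for a fuzzy node.
    membership: Dict[str, MembershipFn] = field(default_factory=dict)

    @property
    def is_fuzzy(self) -> bool:
        return bool(self.membership)


@dataclass
class FuzzyBayesianNetwork:
    nodes: Dict[str, Node]          # N
    arcs: List[Tuple[str, str]]     # A, as (parent, child) pairs

    def fuzzy_nodes(self) -> List[str]:
        """Names of the nodes in Phi."""
        return sorted(n for n, node in self.nodes.items() if node.is_fuzzy)

    def crisp_nodes(self) -> List[str]:
        """Names of the nodes in Psi."""
        return sorted(n for n, node in self.nodes.items() if not node.is_fuzzy)

    def validate(self) -> None:
        """Check that arcs use known nodes and fuzzy states are all fuzzified."""
        for parent, child in self.arcs:
            if parent not in self.nodes or child not in self.nodes:
                raise ValueError(f"arc ({parent}, {child}) uses an unknown node")
        for node in self.nodes.values():
            if node.is_fuzzy and set(node.membership) != set(node.states):
                raise ValueError(f"{node.name}: one membership function per state")


# Hypothetical example: one fuzzy node (ALB) and one crisp node (Diagnosis).
alb = Node("ALB", ["low", "normal"], {
    "low": lambda x: max(0.0, min(1.0, (40.0 - x) / 10.0)),
    "normal": lambda x: max(0.0, min(1.0, (x - 30.0) / 10.0)),
})
diagnosis = Node("Diagnosis", ["healthy", "hcv"])
fbn = FuzzyBayesianNetwork({"ALB": alb, "Diagnosis": diagnosis},
                           [("Diagnosis", "ALB")])
fbn.validate()
```

Because a node is fuzzy exactly when it carries membership functions, the sets $\Phi$ and $\Psi$ are disjoint by construction and their union is $N$.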
Virtual Evidence (VE) is a method to incorporate evidence with uncertainty in Bayesian inference, proposed by Pearl in [27]. Virtual evidence is interpreted as evidence with uncertainty that quantifies the observer’s strength of confidence toward the observed event [32].
The virtual node allows storing the uncertainty and the imprecise knowledge about an observed event [27, 33–36]. The idea is to add a virtual evidence node to each fuzzy node as a child node.
The merit of this method is that it allows both approximate and exact algorithms to incorporate uncertainty when computing the inference (updating the belief) without changing those algorithms. The choice of the algorithm type depends on the complexity of the studied problem: as the complexity of the problem increases, approximate algorithms become more appropriate.
Fuzzy evidence can be interpreted as evidence with vagueness that quantifies the gradual belonging of an observation to different states of the considered node (lack of precision of an observation).
Consider the nodes depicted in Fig. 1, which represents a node X and its virtual node Vx. The CPT of Vx is given in Table 1.

A node X with its virtual node Vx.
The CPT of Vx
Then, the belief distribution of X based on a fuzzy evidence
Where λ is a normalization constant.
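The belief update with a virtual node can be illustrated numerically. The sketch below assumes the standard form of Pearl's virtual evidence for a single node, in which the belief in each state is proportional to its prior probability times the likelihood stored in the CPT of Vx, and it further assumes that these likelihoods are the membership degrees of the observation. The function name, the state names and all numbers are hypothetical.

```python
def belief_with_fuzzy_evidence(prior, membership):
    """Return the normalized belief over the states of X.

    prior:      state -> P(X = state)
    membership: state -> membership degree of the observation in that state,
                used here as the likelihood held by the virtual node Vx.
    """
    unnormalized = {s: prior[s] * membership.get(s, 0.0) for s in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("the evidence is incompatible with every state")
    # 1 / total plays the role of the normalization constant (lambda).
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical prior and membership degrees for a three-state node.
prior = {"low": 0.2, "normal": 0.7, "high": 0.1}
membership = {"low": 0.75, "normal": 0.25, "high": 0.0}
belief = belief_with_fuzzy_evidence(prior, membership)
```

In this example the observation belongs mostly to the state `low`, which raises the belief in `low` above its prior, while the state `high` receives a belief of zero because its membership degree is zero.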
Fuzzy ontologies extend the components of standard ontologies in order to allow them to deal with vague and imprecise knowledge [37–39]. To make their representation possible with Semantic Web technologies, several extensions of Semantic Web languages have been developed. In particular, an extension of the Web Ontology Language (OWL 2) named Fuzzy OWL 2 [40] has been proposed. It uses OWL 2 annotation properties to encode the semantics of fuzzy ontologies.
Fuzzy OWL 2 allows representing fuzzy concepts, fuzzy properties (or roles), fuzzy data types, fuzzy modifiers, and fuzzy axioms. While there is no standard for representing fuzzy ontologies, we use Fuzzy OWL 2 in this study because it provides the necessary means for fuzzy ontology representation.
Fuzzy concepts are concepts that have no clear boundaries; they can be used to represent fuzzy sets of individuals. When a concept is fuzzy, its individuals are associated with degrees of membership in the interval [0, 1]. For example, a temperature of 24 degrees could be classified as an instance of a fuzzy concept Hot_Temperature with a degree of 0.8.
Fuzzy properties are divided into two classes. On the one hand, fuzzy object properties allow fuzzy binary relations among concepts or individuals. They assign a degree to the association among the instances of concepts (crisp or fuzzy). For example, a fuzzy relationship Distance can be used to represent a vague statement such as “the distance between Region A and Region B is far with a degree of 0.9”. On the other hand, fuzzy data properties are used to assign a degree to the association between an individual and a data value.
Fuzzy data types are used to fuzzify the attribute values of concepts, such as the range of data properties. They can also be attached to a concept instance. For example, the patient X has the temperature "Hot_Temperature".
Fuzzy modifiers are used to change the interpretation of fuzzy concepts and fuzzy data types.
Fuzzy axioms ensure, for example, that concept and role assertions can involve vagueness: the membership of an individual in a fuzzy concept need not be certain. The ontology hierarchy can also involve vagueness: subsumption between two concepts may be vague and hold only partially.
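The Hot_Temperature example can be made concrete with a right-shoulder membership function, one of the shapes commonly used for fuzzy data types. This is a minimal sketch: the parameters 20 and 25 are assumptions chosen only so that a temperature of 24 degrees receives the degree 0.8 quoted above, and they do not come from any ontology described in this paper.

```python
def right_shoulder(a, b):
    """Membership function that is 0 up to a, rises linearly, and is 1 from b on."""
    if not a < b:
        raise ValueError("right_shoulder requires a < b")

    def mu(x):
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)

    return mu


# Hypothetical parameters for the fuzzy concept Hot_Temperature.
hot_temperature = right_shoulder(20.0, 25.0)

# A fuzzy concept assertion pairs an individual with a membership degree.
assertion = ("temperature_24", "Hot_Temperature", hot_temperature(24.0))
```

The degree stored in `assertion` always lies in [0, 1], which is what distinguishes a fuzzy concept assertion from a crisp one.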
DATIL [41] (DATatypes with Imprecision Learner) is a framework that automatically learns fuzzy data types for fuzzy ontologies from different types of inputs. DATIL implements several unsupervised clustering algorithms: k-means, fuzzy c-means and mean-shift.
For each data property in an ontology with a numeric range, DATIL collects an array of real numbers corresponding to the values of the property for different individuals. A clustering algorithm computes a set of centroids from these arrays of values. These centroids are then used as parameters to construct fuzzy membership functions that partition the domain. An example is illustrated in Fig. 2.

Example of membership functions built from the centroids [41].
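The idea of deriving membership functions from cluster centroids can be sketched as follows. This is not DATIL's implementation: it is a small stand-in that runs a one-dimensional k-means with a deterministic initialisation and then assumes a common construction in which the smallest centroid yields a left-shoulder function, the largest a right-shoulder function, and every other centroid a triangular function peaking at that centroid. The function names and the sample values are hypothetical.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd iterations on real numbers; returns sorted centroids."""
    data = sorted(values)
    # Deterministic initialisation from k evenly spaced order statistics.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def _membership(left, peak, right):
    """Shoulder (left or right is None) or triangular function around peak."""
    def mu(x):
        if x <= peak:
            if left is None:
                return 1.0
            return 0.0 if x <= left else (x - left) / (peak - left)
        if right is None:
            return 1.0
        return 0.0 if x >= right else (right - x) / (right - peak)
    return mu


def partition_from_centroids(centroids):
    """Build one membership function per centroid, partitioning the domain."""
    c = sorted(centroids)
    if len(c) < 2 or len(set(c)) != len(c):
        raise ValueError("need at least two distinct centroids")
    return [_membership(c[i - 1] if i > 0 else None,
                        c[i],
                        c[i + 1] if i < len(c) - 1 else None)
            for i in range(len(c))]


# Hypothetical values of a numeric data property.
values = [30, 31, 32, 40, 41, 42, 50, 51, 52]
centroids = kmeans_1d(values, k=3)
low, medium, high = partition_from_centroids(centroids)
```

With this construction the membership degrees of any value sum to one, so a value lying between two centroids is shared between the two neighbouring fuzzy sets.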
Fuzzy DL [42] was proposed in order to allow fuzzy ontology representation and to provide reasoning services. It provides several interfaces that ease the handling of fuzzy ontologies.
To sum up, fuzzy ontologies provide adequate support for handling the vague and imprecise knowledge of real-world problems. They consist of crisp components, whose meaning is precise and well defined, and vague components, whose meaning is vague and imprecise.
This section reviews the most related contributions to our work. It classifies the well-known approaches into two classes: (a) Ontology-based approaches and (b) Machine Learning approaches.
Machine learning based approaches
Authors in [18] proposed a system that is capable of predicting whether a patient suffering from HCV is likely to survive or die. The result of the network showed
In [19], a comparative study is presented to predict the diagnosis of HCV. Several machine learning algorithms were selected, especially, Linear Classifiers: Logistic Regression, Naive Bayes Classifier, Support Vector Machines, Decision Trees, Random Forest, Neural Networks, Boosted Trees, Nearest Neighbor. The reported results indicate that the Logistic Regression algorithm gives the best accuracy of
In [20], a study has been carried out to predict the diagnosis of HCV. In the first step, the data set was passed in a Data Preprocessing and Balancing module and then, the performances of several machine learning algorithms are compared, notably, Random Forest (RF), K-nearest neighbor (KNN), Decision Tree (DT), Naive Bayes (NB), Logistic Regression (LR), Xgboost classifier (XGB), Support Vector Machine (SVM). The study reported that KNN shows better result than other classifiers with the highest accuracy of (
Lately, in [21], an automatic system was designed to build clinical risk models that predict the extent of fibrosis in patients with chronic Hepatitis C. Nine ML algorithms were investigated based on patient demographics and laboratory test values. The study compares Logistic Regression, Naïve Bayes, Decision Tree, Random Forest, Extreme Gradient Boosting, k-Nearest Neighbor, Support Vector Machine, Neural Networks and Ensemble Method. It is reported that Extreme Gradient Boosting was able to evaluate for fibrosis with the high accuracy of
Authors in [22] have compared two machine learning algorithms, which are
Ontology based approaches
The study presented in [23] proposes a process for developing an ontology for the HCV infection. The developed ontology is represented in OWL format and is implemented using the Protégé-OWL framework. The validation of the proposed ontology is performed by experts.
The authors in [24] focus on constructing and querying a hepatitis ontology, with the aim of providing a framework for ontology-based medical services. The paper concerns the query expansion algorithm for the hepatitis ontology, including synonym expansion, hypernym-hyponym expansion and expansion of similar words. It applies semantic similarity to compute the similarity of retrieval terms.
In [25] the hepatitis ontology ‘HEPO’ has been designed and validated for designing abductive medical diagnostic systems. It supports abductive reasoning for ontology-driven medical diagnostic systems. This ontology can assist domain experts in order to diagnose hepatitis patients and can also help them to give suggestions for the treatment according to the patients’ state and hepatitis types. It covers three hepatitis types including Hepatitis A, Hepatitis B, and Hepatitis C.
In [26], the Viral Hepatitis Ontology is developed using the Ontology of Biomedical Reality (OBR) framework (footnote 1). It covers the hepatitis A, B, C and D viruses, which are the most widespread types among males and females. This ontology is encoded in the OWL language. The Viral Hepatitis Ontology aims at ensuring semantic interoperability between intelligent systems and physicians so as to share, reason about, and exploit this knowledge in different ways.
Materials and methods
In this paper, we present a solution for diagnosing HCV with uncertain data, which uses fuzzy ontologies and fuzzy Bayesian networks. It makes it possible to handle probabilistic and vague knowledge at the same time. Thus, our system has the following main features:
(a) The representation of fuzzy medical knowledge using a fuzzy ontology. This ontology defines structured vocabularies, grouping together useful concepts of a domain and their relationships, which serve to organize and exchange fuzzy knowledge in an unambiguous way.
(b) The representation of probabilistic and incomplete knowledge by exploiting the capacities offered by FBNs. The fuzzy Bayesian network is used primarily to calculate the probabilities of events under fuzzy observations that are related to each other by cause-and-effect relationships.
(c) Dealing with fuzzy and probabilistic knowledge at the same time.
Our system consists of three layers, as shown in Fig. 3. Its main components are:
Fuzzy ontology: provides a direct representation of patient and hepatitis C data. This ontology forms the fuzzy knowledge base of our system, in which a clear distinction is made between fuzzy data and certain data. It is queried by the inference engine to retrieve data about a given patient.
Fuzzy Bayesian network: represents the probabilistic knowledge of the domain as a causal graph. It is used by the inference engine to compute the probability of a diagnosis based on the information stored in the fuzzy ontology.
Query engine: handles queries received from the application layer.
Probabilistic reasoning engine: contains the inference engine, which is responsible for determining the medical diagnosis for a given patient based on the information entered. The inference engine takes into account both fuzzy data and probabilistic data. It relies on an FBN model to calculate the likelihood of a patient having hepatitis C, using the information stored in the system’s fuzzy knowledge base.

The architecture of fuzzy ontology and fuzzy Bayesian network based HCV diagnosis system.
The creation of FuzOnto-HCV starts from the need to represent data rows about patients, including their risk information and their tests, as well as to identify whether a patient has HCV or not based on the available evidence.
In order to develop the fuzzy ontology, we have followed the methodology proposed in [40], which consists of two steps:
Firstly, we developed the core part of the ontology using the free ontology editor Protégé (footnote 2). FuzOnto-HCV consists of several classes, as illustrated in Fig. 4:
The class The class The class

FuzOnto-HCV classes.
The core ontology developed and visualised in Protégé is illustrated in Fig. 5.

FuzOnto-HCV model.
Secondly, we modeled the fuzzy part of the ontology with additional annotation properties, using the Fuzzy OWL 2 plugin (footnote 3). Indeed, all the fuzzy data properties were annotated semantically with fuzzy labels in order to represent the vague knowledge using membership functions with the arguments shown in Table 2. These membership functions are defined automatically using DATIL, and their annotations using fuzzy DL are shown in Fig. 6. For instance, one fuzzy class represents a collection of Albumin instances whose value is assigned the LowALB data type.
Fuzzy concepts

Membership functions annotations using fuzzy DL.
The enrichment of the HCV fuzzy ontology consists in populating the ontology by adding instances to the concepts. The dataset used in our system contains laboratory values from blood donors and patients with hepatitis C, together with demographic values such as age and sex [43]. It contains 615 cases. All of the dataset values were collected at the time of the medical examination.
The steps of the proposed HCV Fuzzy-Ontology enrichment as formulated in
Proposed fuzzy Bayesian network
This phase aims to model the probabilistic knowledge using FBNs. This section describes the two steps of the modeling.

The structure of the FBN.

The FBN with parameters.
This subsection explains the proposed prediction model, which uses the fuzzy Bayesian network to predict the disease and to measure its severity.
The reasoning process for the diagnosis is shown in Fig. 9 and proceeds as follows:
(1) A controller checks the validity of the data. If the data is not valid, the doctor is notified. If the data is valid, it is stored in the fuzzy ontology and the process continues with the next step.
(2) The recorded data is fuzzified according to the membership functions listed in Table 2.
(3) The fuzzified data is sent as fuzzy evidence to the FBN.
(4) The inference is computed for the diagnosis node.
(5) The system chooses the state with the highest probability as the diagnosis.

Semantic HCV diagnosis prediction process.
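The reasoning process can be sketched end to end in a few lines. This is an illustration under strong assumptions, not the system's implementation: the fuzzy ontology is replaced by a plain list, the network is reduced to a diagnosis node whose children are conditionally independent fuzzy features, and the membership functions, prior and CPT values are hypothetical. The function name `diagnose` and the feature name `ALB` are likewise placeholders.

```python
def diagnose(record, membership_fns, prior, cpt, store):
    """Validate, store, fuzzify, infer, and return (diagnosis, posterior)."""
    # Validity check: every fuzzy feature needs a non-negative number.
    for feature in membership_fns:
        value = record.get(feature)
        if not isinstance(value, (int, float)) or value < 0:
            return None  # the caller should notify the doctor
    store.append(record)  # stand-in for storing the data in the fuzzy ontology

    posterior = dict(prior)
    for feature, fns in membership_fns.items():
        # Fuzzification: membership degree of the value in each state.
        degrees = {state: fn(record[feature]) for state, fn in fns.items()}
        # Fuzzy evidence: weight P(state | diagnosis) by the degrees.
        for d in posterior:
            posterior[d] *= sum(cpt[feature][d][s] * degrees[s] for s in degrees)

    total = sum(posterior.values())
    if total == 0.0:
        raise ValueError("the evidence is incompatible with every diagnosis")
    posterior = {d: p / total for d, p in posterior.items()}
    # The state with the highest probability is returned as the diagnosis.
    return max(posterior, key=posterior.get), posterior


# Hypothetical model with a single fuzzy feature.
membership_fns = {"ALB": {
    "low": lambda x: max(0.0, min(1.0, (40.0 - x) / 10.0)),
    "normal": lambda x: max(0.0, min(1.0, (x - 30.0) / 10.0)),
}}
prior = {"healthy": 0.8, "hcv": 0.2}
cpt = {"ALB": {"healthy": {"low": 0.1, "normal": 0.9},
               "hcv": {"low": 0.7, "normal": 0.3}}}
store = []

result_low = diagnose({"ALB": 30.0}, membership_fns, prior, cpt, store)
result_mixed = diagnose({"ALB": 32.0}, membership_fns, prior, cpt, store)
result_invalid = diagnose({"ALB": None}, membership_fns, prior, cpt, store)
```

With these hypothetical numbers, a fully low albumin value is enough to outweigh the prior, whereas a value that is only partly low is not, which shows how the membership degrees modulate the inference.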
This section describes an empirical study to evaluate the effectiveness of our proposed system.
The first experiments aim at comparing the proposed approach to machine learning classifiers (Logistic Regression, Support Vector Classifier, Decision Tree, Random Forest and Gaussian Naive Bayes). In this experimentation, we opted for the cross-validation strategy, which allows every case in the dataset to be used for both training and validation, in different folds. Indeed, we divide the dataset in
The evaluation parameters used for each class are: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).
Our study addresses a multi-class classification problem with five classes. Based on the confusion matrix presented in Table 3, an illustration of the performance parameters for each class can be found in Table 4.
Confusion matrix
Illustration of performance metrics
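For a multi-class problem, the four evaluation parameters are obtained per class by treating that class as positive and all other classes as negative. The sketch below shows the computation; it assumes that rows of the confusion matrix hold the actual classes and columns the predicted classes, and the three-class matrix is a hypothetical example, not the matrix of Table 3.

```python
def per_class_counts(matrix):
    """Return a list of {TP, FP, FN, TN} dictionaries, one per class."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    counts = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # actual c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, actual other
        tn = total - tp - fn - fp
        counts.append({"TP": tp, "FP": fp, "FN": fn, "TN": tn})
    return counts


def accuracy(matrix):
    """Overall accuracy: correctly classified cases over all cases."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
matrix = [
    [50, 2, 1],
    [3, 20, 2],
    [0, 1, 10],
]
counts = per_class_counts(matrix)
overall = accuracy(matrix)
```

For every class the four counts add up to the total number of cases, which is a quick sanity check on any table of this kind.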
The histogram in Fig. 10 outlines the results of the performed evaluation. It shows that the machine learning classifiers achieve lower predictive performance than our proposed system. The proposed system outperforms the machine learning classifiers for all values of k = 2, 3, 4, 5, 6, and 7. Indeed, our system exhibits the highest accuracy for each value of k, especially when k = 2, where it reaches an accuracy of 95.4%. Furthermore, a second comparative analysis, in terms of accuracy, was carried out between our proposed system and some previous studies.

Performance of HCV diagnosis prediction.
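The way a dataset of 615 cases is split for the different values of k can be illustrated with a small helper. This sketch assumes that k denotes the number of cross-validation folds and that folds are contiguous and unshuffled; the function name is hypothetical and the actual splitting procedure used in the experiments may differ.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Fold sizes for the dataset size and the values of k used in the comparison.
fold_sizes = {k: [len(test) for _, test in k_fold_indices(615, k)]
              for k in range(2, 8)}
```

Each case appears in exactly one validation fold and in the training set of all other folds, so the whole dataset contributes to both training and validation.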
Figure 11 illustrates the results of this analysis. We can see from Fig. 11 that our proposed system, based on a fuzzy ontology and a fuzzy Bayesian network, presents the best accuracy (95.4%) compared to 85%, 94.4%, 81%, 87.17%, and 83.6% obtained by the previous studies [18–22], respectively.

A comparison of HCV diagnosis prediction between the proposed system and some previous studies.
These results can be interpreted by the fact that our system is based on the fuzzy probabilistic inference provided by FBN. It can predict diagnosis for a given patient and allow dealing with randomness, vagueness, and incompleteness at the same time:
To sum up, the accuracy obtained by our system (95.4%) is close to 100%, which means that its predictions agree with the doctor’s diagnosis in the large majority of cases. In that sense, our system behaves similarly to human expertise.
The authors of this paper have proposed a novel medical decision support system, which can work under uncertain knowledge. It uses mainly a fuzzy ontology and a fuzzy Bayesian network to predict HCV. The current paper demonstrates the great potential of fuzzy ontology and fuzzy Bayesian network in tackling the uncertainty in knowledge for diagnosing HCV by facilitating complex decision-making.
The evaluation illustrates the expressiveness offered by the fuzzy ontology and the capability of the fuzzy probabilistic reasoning provided by the fuzzy Bayesian network. Indeed, the conducted evaluation of our system indicates that the fuzzy ontology and the fuzzy Bayesian network have improved the performance of the HCV diagnosis task. The merit of our proposed system is that it uses a fuzzy ontology, which allows representing the HCV data as semantically structured knowledge in a way that is understandable and processable by machines, especially if our system is deployed in open environments such as the Internet of Things (IoT). Also, the advantage of using a fuzzy Bayesian network is that it can predict the diagnosis for a given patient even with missing values and incomplete data.
To expand our work, we envisage improving our medical diagnostic aided system, which is based on a fuzzy Bayesian network model and a fuzzy ontology, by:
(a) Experimenting with the system as part of a home follow-up for patients with chronic diseases.
(b) Enriching the ontology with other diseases and taking into account additional factors of uncertainty.
(c) Extending the study to detect HCV automatically based on data captured by IoT sensors.
Acknowledgements
The authors would like to thank the Algerian Ministry of Higher Education and Scientific Research (MESRS) and the General Direction of Scientific Research (DGRSDT) for their support. We would like also to thank the anonymous reviewers for their valuable comments on the first version of this paper.
Footnotes
1. A framework developed in order to build biomedical Ontologies so as to facilitate inferences across the boundaries in anatomy Ontology, pathology Ontology, etc.
2. http://protege.stanford.edu
3. https://protegewiki.stanford.edu/wiki/FuzzyOWL2
