Abstract
The use of electronic media for the detection and monitoring of animal disease outbreaks is crucial for disease surveillance and early warning systems. Animal health specialists regularly query web pages using various formulations to obtain up-to-date news on disease outbreaks. This task, however, is often manual and time-consuming. Visualization techniques can make these web searches easier than traditional searches. This article presents EpidVis, a visual web query tool designed for animal health experts who conduct epidemic intelligence activities from news sources on the Internet. It consists of several views that help the domain experts efficiently build and launch queries, as well as visualize the results. Moreover, it supports external information integration to help domain experts enrich their knowledge and adapt their queries. EpidVis was assessed for usability (user study) and usefulness for experts (case study). The results show that our tool helps domain experts in their daily surveillance tasks, allowing them to extract accurate information on disease outbreaks from the web in a timely manner.
Introduction
There is currently a substantial global increase in published news related to the occurrence of disease outbreaks. For risk managers, this reinforces the need for continuous monitoring of emerging and new disease outbreaks. However, most news sources report on human diseases more than on animal diseases, despite the strong interactions between animals and humans. In addition, animals are the source of many pathogens that are zoonotic to humans,1 such as influenza (transmitted by air) and rabies (transmitted by bite and saliva). Therefore, in this article, we focus on the animal health domain by proposing a tool that will reinforce the monitoring of online news on animal disease outbreaks.
In the past two decades, we have seen the expansion of the World Wide Web as a primary source of information.2–4 News regarding animal disease outbreaks, such as the emergence of a new pathogen, is generally first posted online via unofficial sources before being reported and disseminated by official organizations (e.g. World Organisation for Animal Health (OIE)). Delays in official reporting are generally due to internal validation of information at different levels until official notification to OIE. As a consequence, this may lead to further spread of disease, especially if control measures are not rapidly implemented. Animal health specialists (hereafter referred to as domain experts) are thus challenged to monitor multiple web sources on a daily basis and stay alert for potential animal outbreak information.
The main challenge in this task is to effectively formulate queries in order to obtain the correct information. The simplest way is to formulate web queries manually, which is time-consuming. Moreover, domain experts are rarely familiar with the use of logical operators (e.g. OR and AND) to ensure high-quality results. This process underlines the need for a tool to formulate complex queries and ensure access to the sought-after information faster than via traditional searches. Visualization offers promising ways to help domain experts by providing far richer information descriptions than the word-based counterparts. 5 This could help them manage complex data and express their knowledge in the form of graphical objects, thus facilitating the formulation and launch of queries. Current searches display web data in the form of maps or statistical diagrams6,7 rather than helping to refine the queries.
We propose a new visual interactive web querying tool, EpidVis, that helps build and launch queries based on domain expert knowledge and facilitates the visualization of web results (https://youtu.be/DixKWcIDDXs, accessed 7 April 2019). Furthermore, it integrates external knowledge to enrich queries and provide more accurate results.
Domain problem characterization
Background
The work that we present in this article is intended for health professionals conducting epidemic intelligence (EI) activities on infectious diseases using online news sources, whether for animal or human health. EI encompasses all activities related to early identification of potential health hazards, their verification, assessment, and investigation in order to recommend health control measures.
We apply our original approach in collaboration with the EI team of the French Platform for Animal Health Surveillance (in French: Veille Sanitaire Internationale (VSI)). Since 2013, the VSI team has checked online news sources daily for information on animal health hazards not present in France but threatening animal populations in France. In order to detect relevant news, the domain experts use a list of keywords covering names of diseases, hosts, and symptoms. Once relevant news is detected, the VSI team verifies it and issues early warning reports to French animal health authorities and animal health professionals when potential threats are confirmed.
Knowledge extraction workflow
The first suspicion of a disease outbreak is the appearance of a number of (non-) specific symptoms in susceptible animals (hosts). Therefore, for EI purposes domain experts are interested in identifying early signals of disease emergence, including affected hosts and manifested symptoms. Figure 1 shows the knowledge extraction workflow. Domain experts start with a list of keywords, usually stored in a text file. They also hold additional knowledge about these keywords in mind, such as their relations (for instance, a given symptom is related to a given disease). Based on this knowledge, they construct a query by copying and pasting the keywords and then launch it in a web browser. The results obtained help them extract knowledge. Sometimes, they need to refine the query because the results do not match what they were looking for. Sometimes, they also extract new keywords and relations from the results, which enrich their knowledge. Finally, colleagues or text-mining processes can suggest further keywords to enrich their initial knowledge.

Knowledge extraction workflow.
The main drawback of this current practice is the lack of a unique platform for managing all the steps of the information extraction workflow. Storing keywords and their relation in a text file or in their mind induces loss of information. They do not know how to manage external knowledge. The queries they create are poor and can miss some important keywords. They cannot easily save the interesting articles and retrieve them later. That is why, in this article, we propose EpidVis, a unique platform integrating all the features to perform the workflow described earlier.
EpidVis is part of a larger project for animal disease surveillance involving epidemiologists, Natural Language Processing (NLP), and InfoVis researchers. The work presented in this article deals with the identification of relevant web pages for epidemiological analysis, as described in the aforementioned workflow. This is the first part of our Epidemic Intelligence System. The second part consists of extracting epidemiological keywords (e.g. symptoms and hosts) and spatio-temporal entities from these pages with text-mining approaches. Finally, the last part is to create a visual analytics tool based on spatio-temporal information for exploring the web results extracted with EpidVis.
Requirement analysis
After a first interview with the domain experts to identify the aforementioned workflow and understand the difficulties encountered with current practices, computer scientists formulated a list of requirements that a tool should fulfill to help them. These requirements were then presented to the experts and adjusted according to their feedback, converging on the following list.
[R1] Keyword management
Domain experts use their knowledge on diseases in order to suggest keywords and launch queries on search engines such as Google, Bing, and Yahoo. More specifically, domain experts use three main categories of keywords: (1) diseases, (2) hosts, and (3) symptoms. For example, in order to detect online news on outbreaks of avian influenza, domain experts use the following query: “influenza chicken lethality.” In this case, “influenza” is the disease, “chicken” is the host, and “lethality” is the symptom. Currently, the keywords are stored and updated manually in a text file, and queries are also launched manually. To overcome these limitations, a visualization tool can help domain experts to organize/express their knowledge through graphical elements. As a result, domain experts could visually build and launch queries.
[R2] Relationship management
Discussions with domain experts highlighted the importance of taking into account strong or weak relationships between the keywords of different categories. This is especially important for diseases that affect multiple hosts, with different receptivity. For example, avian influenza virus principally infects birds; influenza thus has a strong relationship with birds. However, avian influenza infections can potentially occur in humans, horses, and pigs, so the relationship between this pathogen and these hosts is weaker. With the current approach to querying the web, the domain experts mainly launch queries based on strong relationships, thus limiting the possibility of detecting weak signals of disease emergence in hosts other than the principal hosts. Therefore, a visual comparison/analysis of the strength of the relationships between keywords is crucial for domain experts and their decisions about which queries to build.
[R3] Integration of external knowledge
Since 2016, the VSI team has additionally used external knowledge to build queries, for example, keywords proposed by disease specialists or keywords extracted with text mining from a corpus of relevant news reports on a given disease. 8 Domain experts manually collect and store relevant news on a given disease, automatically extract relevant keywords, 9 and manually update the list of keywords and build the queries. These actions are long and tedious, especially when news reports are long and numerous. Thus, a visualization can help domain experts to easily analyze the collected news reports and build queries from the relevant keywords mentioned in the news.
[R4] Result management
Domain experts require a visual representation of the resulting news reports. They also need to dynamically refine the query based on the results they have previously obtained. Finally, they want relevant results to be filtered and saved for further analysis and interpretation.
Related work
The design of visual query systems (VQS) is an old issue, and an abundant literature is available on the topic. Catarci et al. 10 proposed a survey of the first systems, classifying them according to the kind of representation they are based on: forms (which means tables in their article), diagrams, icons, or hybrid. Note that in our context, users need to manipulate keywords with their relationships in order to construct their query. Such a dataset can be modeled as a graph, in which nodes are keywords and links are their relationships. A popular and intuitive approach to visualize graphs is to draw them as node–link diagrams, 11 where nodes are represented as points and links are represented as lines between the points. We follow this approach. Thus, EpidVis naturally falls into the diagram-based representation for the query visualization. The result representation, in contrast, is form-based, as it enhances the classical web browser visualization.
In this section, we first review the main classes of VQS from the query construction perspective. Then, we list the visual systems currently used in epidemiology. As this article focuses on the query construction used to retrieve web pages, we do not review the document visualization tools such as Jigsaw, 12 except when they deal with epidemiology surveillance. The interested reader can refer to a dedicated survey 13 for an introduction (textvis.lnu.se, accessed 5 March 2019).
Visual querying and suggestion systems
Traditional VQS help users construct SQL queries to extract data from relational databases. For instance, SeeDB 14 is an interactive platform where selection lists are available to dynamically create the SQL queries. Then, when the user launches the query, classical bar charts show overviews of the results. Another example of this kind of tool is Polaris. 15 In this case, drag and drop features are provided to select fields of the database. Then, a table-based visualization containing point clouds represents the corresponding part of the database. Additional widgets such as selection lists and checkboxes enable selection of the structure of the table and the rendering of the displayed elements. The purpose of these VQS is to query relational databases without writing SQL queries (they are automatically created via the graphical interface). This purpose differs from ours, since we query the web. However, they show how traditional widgets can be used to design complex queries, which is a point we explore in our
Another kind of approach contains complex visual interfaces to explore multidimensional datasets. Krueger et al. 16 proposed VESPa 2.0 to select movement sequences. In this case, the dataset contains series of trajectories and the visual interface helps construct graph-based patterns to extract the corresponding trajectories. A timeline and a map show the results. Even if the dataset differs from ours, providing a graph-based view to design queries is interesting in our context because our dataset of keywords is also a graph. An original approach to query multidimensional data from different sources is DataMeadow. 17 The authors provide a VQS based on a set of interactive radial views that help select specific ranges on attributes and spread the selection to the other views. Here also, the dataset differs from ours, so the solution cannot be directly used in our context. However, the radial layout and the use of Boolean operators are similar to EpidVis.
An interesting set of VQS help query multivariate graph datasets.18,19 They propose an interface to create a graph pattern where the user can specify ranges of values associated with nodes and/or links. When the user launches the query, the subgraphs matching the pattern are returned. These approaches are related to ours because the visual query is also constructed with a graph. However, in our context, this graph helps construct a web query instead of being directly launched to query a graph dataset. An interesting functionality provided by some graph querying systems is to assist the user when designing a query by suggesting additional data. For instance, Yi et al. 20 and Cuenca et al. 21 described systems in which additional edges and nodes are proposed to refine the current query with regard to the whole graph dataset. In EpidVis, [R3] consists of integrating external information into the list of keywords used to construct the queries. While previous systems enable refining the queries with elements from the dataset on which the queries are launched, we need to provide a system that helps refine the initial set of keywords without considering the target dataset.
The set of VQS dedicated to the exploration of microblog messages is of particular interest in our context. For instance, TreeQueST 22 is a visual tool to explore tweets. The user starts by creating a first “seed” query. Then, different views are proposed to explore a topic hierarchy related to the initial query and refine it. A view also shows the resulting list of tweets. Another example of this kind of VQS is ScatterBlogs2. 23 It is a visual interface containing several views that help filter tweets and refine queries according to the initial set of data extracted. These approaches develop the idea of suggesting query refinements based on the analysis of previous query results. This possibility is not valuable for our purpose because the domain experts already have a strong knowledge of the keywords they need for creating the query. They require functionality to manage these keywords and their relationships but not to automatically extract suggested keywords from a list of previously obtained results.
A last group of VQS contains tools to query the web, such as EpidVis. For instance, with VisiQ, 24 a user first enters a set of keywords/terms. Then, a view shows the related query space, which is a bipartite graph containing the concepts and the terms related to the initial set of keywords. The user can select additional terms to enrich the query and send it on the Google Search engine. InfoCrystal 25 is another example of web VQS. The purpose of this approach is to provide a visual query language to create complex queries. After selecting a set of keywords, an iconic display shows all the possibilities to construct a query involving these keywords related by logical operators. As we will see in section “EpidVis design,” EpidVis also helps construct queries containing logical operators. The main difference is that the types of operators between the keywords depend on the categories of the keywords (diseases, hosts, and symptoms). Thus, we do not need to provide all the possible Boolean queries, and the visual design is less complex.
Visualization in epidemiology
Most visualization tools in epidemiology 26 present web results related to a given disease outbreak. These tools use results from online news media and public health sources and plot the spatial information on a map,6,7,27,28 or use statistical diagrams to represent aggregated information,29,30 or both. 31 For example, HealthMap 6 uses text processing algorithms for querying, filtering, and visualizing unstructured online reports on disease outbreaks. The system plots the disease and spatial location of a potential outbreak on an interactive map. To filter specific results, users select their choice from a fixed list of diseases.
Other tools focus on descriptive visualization such as Gapminder, 29 which represents global disease trends using a bubble map. Some visualization tools focus on a particular disease, such as Nextflu, 30 which monitors influenza sequence data from the GISAID EpiFlu database. This tool shows a phylogenetic tree corresponding to the disease information: mutation, genotype, sampling location, and statistical information. In Gapminder and Nextflu, the visual queries are used to select the data displayed. For example, a visual query can be the filtering of diseases by selecting a time period using sliders.
Other tools simulate and display a disease spread. For instance, GLEaMviz 27 simulates the human-to-human disease spread across the world (www.gleamviz.org, accessed 5 March 2019). Users first create a compartmental model with the following three layers: geographic information about population data, geographic information about the mobility of the people, and an epidemic model of the infection dynamics. Next, the model is launched in the GLEaMviz database in order to simulate the disease. A dynamic map shows the evolution of the disease spread.
Most of the epidemiology visualization tools that we described earlier are result-oriented rather than problem-oriented: they focus on providing efficient tools for exploring a given data collection, but they do not provide any feature to extract interesting data collections from the web. By proposing an interactive tool for designing complex web queries, EpidVis fills this gap.
EpidVis design
EpidVis is an interactive visual tool for querying online news sources on animal disease outbreaks. Multiple visualization and interactive features of EpidVis allow domain experts to: (1) express their knowledge with keywords and relationships thereof ([R1],

EpidVis overview. (a)
KEYWORD MANAGER
The
Visual mapping
Entities and relationships can be modeled as a graph, where nodes represent keywords [R1] and weighted links represent relationships [R2]. A common way to visualize graphs is to use a node-link diagram, where nodes are represented by points and links by lines. 32 In our context, the nodes are grouped into three categories, and links only connect nodes from different categories. Such a structure is called a tripartite graph. An efficient approach to visualize a tripartite graph as a node–link diagram is to use a Hive plot: 33 nodes of the same category are plotted along axes organized radially, and links are shown as curves between them (Figure 2(a)). A benefit of this type of layout, compared to a traditional force-directed layout, 34 is that it highlights the categories of the nodes by representing them on different portions of the plane. The radial organization of the axes representing the categories also avoids inducing a hierarchy between these categories and/or the links. This would not be the case if we had employed a Sugiyama-style layout with parallel lines 35 or concentric circles 36 to represent the categories.
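To make the data model concrete, the following minimal Python sketch stores keywords with their category and rejects links between keywords of the same category, which is the tripartite constraint described above. The class name `KeywordGraph`, its methods, the example weights, and the assumption that weights lie in [0, 1] are all illustrative and are not taken from the EpidVis implementation.

```python
from dataclasses import dataclass, field

# The three keyword categories named in the article.
CATEGORIES = ("disease", "host", "symptom")


@dataclass
class KeywordGraph:
    """Hypothetical container for keywords and weighted relationships."""
    categories: dict = field(default_factory=dict)  # keyword -> category
    links: dict = field(default_factory=dict)       # frozenset({a, b}) -> weight

    def add_keyword(self, keyword, category):
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.categories[keyword] = category

    def add_link(self, a, b, weight):
        # Tripartite constraint: links only connect different categories.
        if self.categories[a] == self.categories[b]:
            raise ValueError("links must connect keywords of different categories")
        # Assumption: weights are normalized to [0, 1].
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must be in [0, 1]")
        self.links[frozenset((a, b))] = weight


graph = KeywordGraph()
graph.add_keyword("influenza", "disease")
graph.add_keyword("chicken", "host")
graph.add_keyword("turkey", "host")
graph.add_keyword("lethality", "symptom")
graph.add_link("influenza", "chicken", 0.9)    # illustrative weight
graph.add_link("influenza", "lethality", 0.6)  # illustrative weight
```

Storing each link under an unordered pair reflects that the relationships described here have no direction.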
A category is associated with each axis and identified by a specific color of nodes (blue for diseases, orange for hosts, and green for symptoms). These colors have been selected to avoid color blindness issues and the lightness is the same to avoid perceiving one type of keywords as more important than the others (projects.susielu.com/viz-palette?colors=["#6ec4a9","#fc6c36","#5babd7"], accessed 5 March 2019).
The user can add nodes, which first appear with a fixed size. He or she can merge keywords that are semantically similar (see below); the size of the merged node is proportional to the number of embedded keywords. Nodes are ordered on each axis by the order in which the user created them, from bottom to top.
We aim to use all available space on each axis to position the nodes. To achieve this goal, we uniformly arrange nodes along the axis. Let
where
When several nodes are merged by the user, we position the resulting node at the barycenter of the merged nodes. Then, we use a node overlap removal algorithm 37 to arrange the layout on the axis while preserving the relative distances between the consecutive pairs of nodes. 38
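As a rough illustration of the two placement rules just described, the Python sketch below spreads nodes evenly over an axis and places a merged node at the barycenter of its members. The function names, the choice to put the first and last nodes on the axis endpoints, and the numeric axis range are assumptions made for this example; the overlap removal step is not shown.

```python
def uniform_positions(n, axis_start, axis_end):
    """Evenly spaced positions for n nodes on [axis_start, axis_end].

    Assumption: the first and last nodes sit on the axis endpoints,
    which is one way of using all the available space.
    """
    if n <= 0:
        return []
    if n == 1:
        return [(axis_start + axis_end) / 2]
    step = (axis_end - axis_start) / (n - 1)
    return [axis_start + i * step for i in range(n)]


def barycenter(positions):
    """Position of a merged node: the mean of its members' positions."""
    return sum(positions) / len(positions)


positions = uniform_positions(5, 0.0, 100.0)
merged_position = barycenter(positions[1:3])  # merge the 2nd and 3rd nodes
```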
The user can also add links between the nodes to model keyword relationships. For instance, the user can add a link between a given symptom and a given disease if this disease involves this symptom. When creating a link, the user can select a weight for this link. For instance, if he or she knows that a given disease often involves a particular symptom and sometimes involves another symptom, he or she can create two links with a high weight for the first one and a lower weight for the latter. Weights of relations are encoded with the width of their corresponding links.
KEYWORD MANAGER interactions
The
To ease the manipulation of the three categories of keywords, EpidVis includes a toolbar (Figure 2(e)) where users can add, remove, and update keywords and links, and merge and split nodes in each category. Users can also move nodes along their axis and visualize the composition of merged nodes with a tree view (Figure 3). Each level in the tree represents a merge action performed by the user. For example, in Figure 3, the user can decide to first merge “chicken” and “turkey” into “poultry” and then merge “poultry” and “duck” into “bird,” which is the root of the tree.

A tree representation of the merged node “bird.” The merged node is composed of “duck” and “poultry,” and the latter is composed of “chicken” and “turkey.”
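The merge tree of Figure 3 can be pictured with a small recursive structure. This Python sketch is only illustrative: `KeywordNode`, `merge`, and `split` are hypothetical names, and the size rule simply counts embedded keywords, in line with the statement that a merged node's size is proportional to that number.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordNode:
    """A keyword, or a merged node whose children are the nodes it embeds."""
    label: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Return the original keywords embedded in this node."""
        if not self.children:
            return [self.label]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found

    def size(self):
        """Number of embedded keywords (drives the drawn node size)."""
        return len(self.leaves())


def merge(label, *nodes):
    """Each merge action adds one level to the tree."""
    return KeywordNode(label, list(nodes))


def split(node):
    """Undo the last merge by returning the node's direct children."""
    return list(node.children)


poultry = merge("poultry", KeywordNode("chicken"), KeywordNode("turkey"))
bird = merge("bird", poultry, KeywordNode("duck"))
```

Keeping the children rather than flattening them is what allows a split to restore the intermediate node “poultry”.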
Users can also drag and drop the entire
Furthermore, the user can create, save, or open his own
QUERY BUILDER
The main purpose of the
Visual mapping
The

The
The query is displayed with keywords linked by logical operators. After discussion with domain experts, the logical operator “OR” links keywords of the same category, and the logical operator “AND” links keywords from different categories (Figure 5). Following this rule, we obtain a complete query as shown in Figure 4(c). Furthermore, to make the query understandable, EpidVis color codes the three different keyword categories. Finally, the user can launch the query in one of the search engines currently available in EpidVis: Google, Google News, and Google Advanced (Figure 4(d)).

Logical operators of the query: keywords from the same category are separated by the logical operator OR and between different categories by the logical operator AND.
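The OR/AND rule above is easy to express in code. The following Python sketch is a minimal illustration, not the EpidVis implementation: the function name `build_query`, the quoting of multi-word keywords, and the parentheses around OR groups are assumptions about how such a query string might be written for a search engine.

```python
def build_query(selected):
    """Join keywords with OR inside a category and AND between categories.

    `selected` maps a category name to the list of keywords chosen in it.
    """
    groups = []
    for category in ("disease", "host", "symptom"):
        keywords = selected.get(category, [])
        if not keywords:
            continue
        # Assumption: multi-word keywords are quoted to be searched as phrases.
        terms = [f'"{k}"' if " " in k else k for k in keywords]
        group = " OR ".join(terms)
        if len(terms) > 1:
            group = f"({group})"
        groups.append(group)
    return " AND ".join(groups)


query = build_query({
    "disease": ["influenza"],
    "host": ["chicken", "turkey"],
    "symptom": ["lethality"],
})
multilingual = build_query({
    "disease": ["African swine fever", "peste porcine africaine"],
})
```

Here `query` evaluates to `influenza AND (chicken OR turkey) AND lethality`, and `multilingual` shows the single-category case where several disease names are linked by OR.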
Query refinement and interactions
As mentioned earlier, the user can select three thresholds by moving the sliders at the top of the view. Each threshold concerns the links between two categories. For instance, when the user selects 0.5 for Diseases–Hosts, the links between diseases and hosts with a weight below 0.5 are no longer considered, and the corresponding keywords are removed from the query. Thus, the thresholds filter irrelevant keywords out of the query according to the weights of their links with other keywords. We now explain further and more formally how the query is created.
We emphasize that users can choose keywords from any category to create queries and refine them according to the selected category. For this reason, we give a general formulation that works with keywords from any chosen category (diseases, hosts, and symptoms).
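Before the formal description, the thresholding step can be pictured with a minimal Python sketch. It assumes one threshold per pair of categories and keeps a link only if its weight reaches that threshold; the tuple layout, the function names, and the example weights are hypothetical, and the handling of keywords that have no link at all is not covered here.

```python
def filter_links(links, thresholds):
    """Keep links whose weight reaches the threshold of their category pair.

    links: list of (keyword_a, category_a, keyword_b, category_b, weight)
    thresholds: dict mapping frozenset({category_a, category_b}) -> minimum
                weight; a pair without a threshold keeps all of its links.
    """
    kept = []
    for a, cat_a, b, cat_b, weight in links:
        minimum = thresholds.get(frozenset((cat_a, cat_b)), 0.0)
        if weight >= minimum:
            kept.append((a, cat_a, b, cat_b, weight))
    return kept


def keywords_in(links):
    """Keywords that still take part in at least one kept link."""
    found = set()
    for a, _, b, _, _ in links:
        found.update((a, b))
    return found


links = [
    ("avian influenza", "disease", "bird", "host", 0.9),          # strong
    ("avian influenza", "disease", "pig", "host", 0.2),           # weak
    ("avian influenza", "disease", "lethality", "symptom", 0.3),
]
thresholds = {frozenset(("disease", "host")): 0.5}
remaining = keywords_in(filter_links(links, thresholds))
```

With the Diseases–Hosts slider at 0.5, the weak link to “pig” is dropped and “pig” no longer appears among the remaining keywords.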
In the following, let
We generate the query
If there is no link between two keywords
The

QUERY RESULTS VIEW
The

Third, we associate check boxes with each result. Users can use them to select and save relevant web results (see also the “save” and “load” buttons at the top of the application, Figure 2(h)). Finally, classical buttons are available to change pages, as shown in Figure 7(e).
Users can sort the web results with the colored buttons shown in Figure 7(d). The colors refer to the categories. Results with snippets containing selected categories are positioned at the top, and results with snippets containing none of the selected categories appear at the end of the list. For example, Figure 7(a) shows results ordered within the disease category (blue button).
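One simple way to obtain this ordering is a stable sort on the number of selected categories found in each snippet. The Python sketch below is an assumption about how such a ranking could work, not a description of the EpidVis code; the result dictionaries, the case-insensitive substring matching, and the function name are illustrative.

```python
def sort_results(results, keywords_by_category, selected_categories):
    """Put results whose snippet mentions the selected categories first.

    A category counts as present when any of its keywords occurs in the
    snippet. Results with equal counts keep their original order.
    """
    def matched(result):
        snippet = result["snippet"].lower()
        return sum(
            any(k.lower() in snippet for k in keywords_by_category.get(c, []))
            for c in selected_categories
        )

    return sorted(results, key=matched, reverse=True)


results = [
    {"title": "C", "snippet": "Weather report for the weekend"},
    {"title": "A", "snippet": "Pig farms reopen after inspection"},
    {"title": "B", "snippet": "African swine fever detected in wild boar"},
]
keywords_by_category = {
    "disease": ["African swine fever"],
    "host": ["pig", "wild boar"],
}
by_disease = sort_results(results, keywords_by_category, ["disease"])
by_both = sort_results(results, keywords_by_category, ["disease", "host"])
```

Because Python's sort is stable, results that match none of the selected categories stay in the order returned by the search engine.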
The tool allows users to modify the query (Figure 7(c)). In this case, the
SUGGESTION VIEW
Additional original functionality of EpidVis is the enrichment of the
The
Visual mapping
The

Both views are based on a circle divided into arcs representing the keywords. The circular visualization is easy to understand and manipulate, while providing enough visual variables to convey the various types of information (categories, keywords, and relationships).
In order to facilitate the comparison of the two views, we must plot the same set of keywords at the same position in both the views. To meet this requirement, we plot the union of the keywords of the
Each arc is divided into a set of sub-arcs. Relations between keywords are represented as curves between sub-arcs in such a way that one link is associated with a unique sub-arc. The radius of a sub-arc represents the weight of the link starting from it. Each link has an inverse color interpolation related to colors of linked sub-arcs to ease the recognition of the opposite sub-arc when the user looks at a particular one. All the relations involve the selected keyword. Thus, we could only display bipartite graphs showing the keywords of two other categories. After discussion with experts, we found that adding suggested data to the
The
SUGGESTION VIEW interactions
EpidVis combines several interactive features to explore the suggested data and add some of these suggested data to the
For example, Figure 9(a) shows the highlighted triplet: “bluetongue”–“sheep”–“fever outbreak,” and Figure 9(b) shows the current keyword view, thus allowing users to compare both data views and choose triplets to be added to the

The suggested keyword view mouseover feature. (a) Hovering over a sub-arc in the suggested keywords view highlights the corresponding triplet. (b) The same triplet appears in the current keywords view.
Users can add and remove the triplets in the
As shown in Figure 10, the user selects triplets in the suggested keyword view by clicking on the corresponding arcs. Selected triplets are represented with a red color and appear in both the

We synchronize both the
The slider at the bottom of the view can be used to filter relationships between keywords according to their weights. The user selects a threshold value, and relationships with weights below the threshold are removed. Figure 11 shows an example where the threshold 0.22 has been selected. We observe that some links have been removed between Figures 8 and 11.

Filter suggested relationships from 0.01 to 0.22.
Technical considerations
EpidVis is a web-based application. The user interface was implemented in JavaScript using Ext JS (www.sencha.com/products/extjs/, accessed 5 March 2019). The different views were implemented with the D3.js library. 39 We used a PHP server to launch the queries and obtain the online news results. The user interface is connected to the server using the jQuery library (jquery.com, accessed 5 March 2019).
Evaluation
We evaluated EpidVis from two perspectives: (1) a user study, to evaluate its usefulness and its usability and (2) a case study, to highlight how the tool helps domain experts to find pertinent news sources.
User study
We conducted a user study to test the usefulness and the usability of the tool. The purpose of this study was not to show how the tool helps domain experts in their daily tasks (see the next section for examples), but to evaluate how the design choices facilitated general tasks, that is, tasks not involving particular skills in epidemiology. For instance, on the
Twelve participants were involved in this study (Table 1). As no particular skill in epidemiology was required, there were three experts in epidemiology (referred to as domain experts) and nine people who are not experts in epidemiology (referred to as non-experts).
User study participants.
First, we gave them a detailed demonstration of the tool and its functionalities. Then, they were asked to familiarize themselves with the tool. There was no time constraint for this task, as they had online access to the tool for several days. When they were ready, they were asked to fill in an online questionnaire about the usefulness and usability of the tool. For instance, in this questionnaire, participants were asked to perform tasks with the
Table 2 shows the average results of the evaluation. No significant differences appeared between domain experts and non-experts, so they are not separated in the table. Participants highly appreciated the different visualizations in terms of usefulness (mean score: 4.2/5) and usability (mean score: 4.3/5). In particular, the usefulness of the
EpidVis qualitative evaluation: average scores among participants.
Comments left by participants indicate that the features for filtering and saving results were appreciated. Participants did not highlight further needs for the tool or any specific limitation in its usefulness/usability, other than the inclusion of other search engines such as Bing and Yahoo. Our system can handle these engines, but we did not have free access to their APIs during the prototype development.
Case study
The case study provides an example of the application of EpidVis to animal disease surveillance. This example helps to determine the usability and the added value of EpidVis for domain experts. In this case study, the tool was tested and evaluated by an expert working in disease surveillance and EI for the French Platform for Animal Health Surveillance (ESA Platform, www.plateforme-esa.fr, accessed 5 March 2019). Here, we present her feedback.
Before the evaluation, the tool designers met with the domain expert to present the tool and its functionalities. The expert had several days to test the tool before drawing up the evaluation protocol described below.
For this evaluation, queries were built using keywords from the following three categories: diseases, hosts, and symptoms. The expert decided to select African swine fever as the example of disease to test the tool, given its spread across Eastern Europe and its introduction into Belgium, which represents a great threat to pig and wild boar populations in France.
The following aspects were analyzed:
Test 1: usability of the tool by using a simple query corresponding to a search using only the disease name (one keyword category).
Test 2: impact of taking languages into account by merging the disease names in different languages (one keyword category with several keywords linked with the OR operator).
Test 3: impact of complex queries by adding several keywords from the same category and/or adding keywords from other categories of keywords.
Test 4: impact of using the suggestion option to enrich queries.
The queries of the four tests were run using the Google News search API. The first 10 results of each query were analyzed by the expert to evaluate pertinence, that is, articles/web pages containing information related to an outbreak or control measures for the studied disease.
Test 1: basic disease search
Regarding the first test, the expert created a simple keyword view that corresponded to a basic query on a search engine. Only one axis (disease) was used, with one keyword (name of the disease). This query was used as a baseline to compare with other tests and was done in English (with the keyword African swine fever) and in French (with the keyword peste porcine africaine).
For the query in English, 8 of the first 10 results were pertinent, that is, related to a disease outbreak or to control measures to prevent the introduction of the disease. Of the eight pertinent results, six were from media sources (from China, the United States, the United Kingdom, and Luxembourg) and two were from official sources (United Nations Food and Agriculture Organization (FAO) and International Atomic Energy Agency (IAEA)). The results that were not pertinent referred either to a general description of African swine fever on Wikipedia or to a media article describing the clinical signs and thus focusing on awareness.
For the query in French, all of the first 10 results were pertinent. Eight were from media sources (mostly Belgian, but also French and Luxembourgish), one result was from the official Wallonia authorities’ website and one was from the Belgian Royal Hunter’s Association (French is the language spoken in the southern part of Belgium). The high number of pertinent results and the strong representation of Belgium are due to the fact that African swine fever emerged in Belgium in September 2018 and has since caused over 700 deaths in wild boars, threatening not only Belgian pig farms but also pig farms in bordering countries, such as France.
This first test is as simple as searching for the keyword manually in Google News. It serves as a baseline against which to compare the added value of the other tests.
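The pertinence measure used for this baseline can be illustrated with a short sketch. It assumes that the expert's judgements are recorded as one boolean per result, in ranked order; the function name `precision_at_k` is ours and is not part of EpidVis. The counts (8 of 10 in English, 10 of 10 in French) come from Test 1, whereas the positions of the two non-pertinent results in the list are arbitrary.

```python
def precision_at_k(judgements, k=10):
    """Return the fraction of the first k results judged pertinent."""
    top = judgements[:k]
    if not top:
        raise ValueError("no results to evaluate")
    return sum(1 for pertinent in top if pertinent) / len(top)


# Judgements from Test 1. Only the counts are taken from the case study;
# the rank positions of the non-pertinent results are an assumption.
english = [True] * 8 + [False] * 2
french = [True] * 10

print(precision_at_k(english))  # 0.8
print(precision_at_k(french))   # 1.0
```

The same function could be applied to the first 10 results of Tests 2 to 4 to compare them with this baseline.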
Test 2: adding other languages
By considering different languages, the objective was to extract further pertinent web pages and to show how the merge functionality can help to do this. The languages included were French (“peste porcine africaine”), German (“Afrikanische Schweinepest”), and Polish (“afrykańskiego pomoru świń”). French was selected because it is the language spoken in the southern part of Belgium, where African swine fever was introduced in September 2018 (cf. Test 1). German and Polish were chosen because Germany shares a border with Poland, where African swine fever has been circulating since February 2014. 40 English was not used in this case study in order to avoid detecting scientific publications (for which English is the most commonly used language) and because it is so widely used that it would overshadow other diseases in the query results. All disease names were merged together into a single node using the dedicated functionality. As the name of the merged node is taken into account in the query (it cannot be the same name as one of the keywords), the French disease name was used as the name of the merged node (Figure 12).

Figure 12. Merging keywords representing the same disease in different languages.
All of the first 10 results were pertinent. Seven of them were also returned by the Test 1 query in French. Adding languages to the query led to the identification of three new pertinent results, of which two were in German and one in Polish. The two articles in German came from official sources and contained information on prevention measures to mitigate the risk of the disease in Luxembourg (on the website of the veterinary services of Luxembourg) and in Germany (on the website of the state of Baden-Württemberg), information which is of utmost importance for disease outbreak surveillance. The result in Polish was from a media source, describing new outbreaks of African swine fever in Poland (where the disease has been circulating since 2014).
This more complex query, which included the disease name in three languages, would have taken more time to launch manually in Google News. Also, the “Save” option allows the expert to save the model, reuse it on a daily basis, and easily adapt it in the framework of EI activities. This was highlighted as a strong advantage and a considerable time-saving option for users.
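As an illustration of the merge step in Test 2, the following sketch joins the three disease names with the OR operator. It is a minimal example and not the EpidVis implementation: the function name `merge_keywords`, the use of double quotes for exact phrases, and the enclosing parentheses are assumptions about a typical search-engine query syntax.

```python
def merge_keywords(keywords):
    """Join alternative names into one OR group (hypothetical helper).

    Each name is quoted as an exact phrase; blank and duplicate entries
    are dropped while the original order is preserved.
    """
    cleaned = []
    for keyword in keywords:
        keyword = keyword.strip()
        if keyword and keyword not in cleaned:
            cleaned.append(keyword)
    if not cleaned:
        raise ValueError("at least one keyword is required")
    return "(" + " OR ".join(f'"{keyword}"' for keyword in cleaned) + ")"


# Disease names used in Test 2 (French, German, Polish).
disease_names = [
    "peste porcine africaine",
    "Afrikanische Schweinepest",
    "afrykańskiego pomoru świń",
]
query = merge_keywords(disease_names)
print(query)
```

Dropping duplicates means that a merged node named after one of its own keywords, as the French name was here, would not repeat that keyword in the query string.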
Test 3: adding keywords for hosts and symptoms
The expert added complexity to Test 1 (in English this time) by adding three keywords from each of the two other keyword categories: “mortality,” “haemorrhagic,” and “fever” for the symptom category, and “pig,” “wild boar,” and “porcine” for the host category. These keywords were linked to the disease keyword with varying weights, attributed by the expert according to their epidemiological relevance to the disease. They were assembled in the query as shown in Figure 13. The search results were compared to the results from Test 1 (English) to identify the added value of including keywords from other categories.

Figure 13. Adding keywords for hosts and symptoms.
By adding keywords from two other categories (symptoms and hosts), the expert was able to identify four additional pertinent results that were not identified in Test 1. These results were from media sources (Spanish and Chinese media) and from official sources (United States Department of Agriculture or FAO). They added new information regarding African swine fever outbreaks in China and prevention measures in the United States, which would not have been found using a simple query (Test 1). The other results from Test 3 were either already present in Test 1 (four results) or not pertinent (two results). The latter were related to a general description of the disease (Wikipedia or the official United States Department of Agriculture website) or to a pig import health certificate website (UK health authorities).
Similar to Test 2, adding extra keywords related to hosts and symptoms could have been done manually in Google News, but would have taken more time not only to enter the keywords and operators but also to aggregate the results. The “Save” option was again underlined as a time-saving functionality.
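To make the manual effort mentioned above concrete, the sketch below assembles the Test 3 keywords into a single query string. The article does not specify how EpidVis combines categories or how the expert's weights enter the query, so the combination rule used here (OR within a category, AND between categories) is an assumption, the weights are not modeled, and the names `or_group` and `build_query` are hypothetical.

```python
def or_group(keywords):
    """Quote each keyword and join the group with OR."""
    return "(" + " OR ".join(f'"{keyword}"' for keyword in keywords) + ")"


def build_query(categories):
    """AND together one OR group per non-empty category (assumed rule)."""
    groups = [or_group(keywords) for keywords in categories.values() if keywords]
    if not groups:
        raise ValueError("at least one keyword is required")
    return " AND ".join(groups)


# Keywords used in Test 3, grouped by category.
test3 = {
    "disease": ["African swine fever"],
    "host": ["pig", "wild boar", "porcine"],
    "symptom": ["mortality", "haemorrhagic", "fever"],
}
print(build_query(test3))
```

Typing such a string by hand for every variant of the query is the kind of repetitive work that a saved model avoids.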
Test 4: using suggestion
In this test, the expert enriched the query from Test 3 with additional keywords from both the symptom and host categories. To do so, the expert uploaded a CSV file giving association scores between terms describing hosts and clinical signs. This score is calculated by combining text- and web-mining methods detailed in the study by Arsevska et al. 8
The tool allows a threshold to be set to filter the keywords according to the weight of their associations. Given the high number of keywords, the expert set a threshold to visualize only keyword associations with a weight of 0.2 or more (Figure 14).

Figure 14. Adding keywords and links from an external file (top: without filtering; bottom: after filtering out associations with a weight lower than 0.2).
The query from Test 4 allowed the expert to identify three additional pertinent articles that were not identified using the query from Test 3. These three articles were from media sources (from the United States, the United Kingdom, and China) and contained information on prevention measures in Luxembourg and on control measures and economic compensation in China to limit the impact of the disease. This information, which is of utmost importance, would not have been identified without enriching the query with new pertinent keywords.
Uploading the CSV file and using the “suggestion” functionality of EpidVis allowed the expert to visualize keywords and their weighted associations, which facilitated the enrichment of the query. Keywords were easily selected and integrated into the existing associations of the query. Identifying pertinent keywords and combining them with the existing keywords while taking into account the weight of associations would have taken much more time if done manually in Google News. Again, the expert was able to save the enriched model and run it again later without having to recreate it from scratch.
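The threshold filter used in Test 4 can be sketched as follows. The layout of the uploaded file is not described in this article, so the column names (`host`, `symptom`, `weight`) are assumptions, the sample rows are invented purely for illustration, and `load_associations` is a hypothetical helper rather than an EpidVis function. Only the threshold of 0.2 comes from the case study.

```python
import csv
import io


def load_associations(csv_file, threshold=0.2):
    """Read host/symptom associations and keep those at or above the threshold.

    The column names are assumed; the result is sorted by decreasing weight.
    """
    kept = []
    for row in csv.DictReader(csv_file):
        weight = float(row["weight"])
        if weight >= threshold:
            kept.append((row["host"], row["symptom"], weight))
    return sorted(kept, key=lambda association: association[2], reverse=True)


# Invented sample data standing in for the uploaded file.
sample = io.StringIO(
    "host,symptom,weight\n"
    "pig,fever,0.45\n"
    "wild boar,mortality,0.30\n"
    "pig,lameness,0.10\n"
)
for host, symptom, weight in load_associations(sample):
    print(host, symptom, weight)
```

With this sample, the association with a weight of 0.10 is filtered out and the two remaining ones are listed by decreasing weight.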
Conclusion of the case study
From this analysis, the expert concluded that the tool was easy to use and intuitive, and that it saves time in daily surveillance activities. Queries could easily be modified by adjusting or changing keywords following a preliminary search. The tool offered several very useful functions that helped the expert in her daily work, including visualizing the links between keywords, the possibility to merge keyword nodes, the possibility to save models for later reuse, and the suggestion option, which enriches the query with new pertinent keywords.
For instance, this case study highlighted the added value of including the disease name in different languages, particularly languages from countries affected or threatened by the disease, which leads to many pertinent results. Adding keywords from the symptom and host axes allows the expert to find new pertinent information that would have been missed if only the disease name had been used in the query. The expert thus avoids missing important information about a disease outbreak that could have a severe impact on animal populations, and can alert health authorities in time for them to implement the appropriate measures.
The expert can easily upload a previously saved model and run the query rapidly or even modify the query to adapt to a new context (adding keywords for example). The queries tested in this case study could have been done in Google News manually, but using EpidVis allowed the expert to run the queries and combine the search results in a limited amount of time.
To conclude, integrating EpidVis in daily surveillance work allows experts to be more exhaustive and save time in their routine activities such as media monitoring. This case study also provides insight into the importance of identifying the relevant keywords and the epidemiological links between diseases/hosts/symptoms.
Discussion
Although domain experts advocate that EpidVis helps them with their daily monitoring of online news reports for EI activities, the tool still presents technical limitations. First, the main limitation of the

We encounter similar problems with the
These limitations were not highlighted by the domain experts during their first use of our system. They currently use separate files of keywords and suggestions containing small datasets that are efficiently handled by EpidVis. To conclude, temporary solutions have been used by experts (separate files and zooming in). However, the identified limitations constitute challenging issues for future work on new visual representations and interactive features in a context of large datasets for EI activities.
Conclusion and future work
In this article, we presented EpidVis, a new visual web querying tool for monitoring animal disease outbreak information from online news sources. It combines several views including
Our results show that EpidVis helps domain experts create precise queries based on their own knowledge as well as external knowledge, and that they can even adapt their queries according to the results. EpidVis gives domain experts rapid access to pertinent information related to animal disease outbreaks. This can decrease delays in outbreak detection and, in turn, in control, and thus limit the impact and spread of pathogens. EpidVis has been applied to a context of disease surveillance in animal health, which is highly related to both human health (e.g. animal disease mutating and reaching humans) and agriculture (e.g. avian influenza outbreak involving the slaughter of hundreds of thousands of ducks in France in 2017).
EpidVis is a part of a larger project which aims at providing an integrated platform allowing users to extract and explore epidemiological data from the web for the early detection and monitoring of disease outbreaks. In this context, while EpidVis will provide relevant web sources for the analysis, we plan to combine it with a text-mining module to extract from these pages useful information 4 such as dates, locations, diseases, hosts, and symptoms and to provide a visual system 31 to geographically compare this information with official reports from organizations such as OIE or the United Nations FAO.
For future work, we also plan to explore the possibilities of adapting the tool to a One Health context by including disease surveillance in both human and environmental health. We also want to extend it to other domains involving web querying, such as documentary search (where categories could be authors, topics, and types of documents) or call for proposal search (where categories could be domains, topics, and funders).
Footnotes
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Ministry of Higher Education and Scientific Research of Algeria and the SONGES project (FEDER and Occitanie). We thank Renaud Lancelot (ASTRE, Cirad) and Sarah Valentin (ASTRE & TETIS, Cirad) for their expertise in epidemiological surveillance.
