Abstract
Interactive Topic Detection and Tracking (iTDT) refers to TDT work that focuses on user interaction, user evaluation and user interface aspects. This article investigates and identifies the design elements of an interface that aims to help journalists perform TDT tasks such as tracking and detection. It presents an iTDT interface called Interactive Event Tracking (iEvent), and evaluates the usability of the features introduced. The findings indicate the features that helped the participants in performing both tasks: cluster labelling and top terms features in Cluster View, a histogram with the timeline and document content features in Document View, and a keyword approach feature in Term View. Meanwhile, features such as cluster visualisation in Cluster View and the histogram with the timeline in Term View only helped participants during the tracking task. The study shows that the interface enables journalists to perform well in TDT tasks.
1. Introduction
Research on Topic Detection and Tracking (TDT) serves as a core technology for a system that monitors news broadcasts: it helps to alert information professionals such as journalists to new and interesting events happening in the world. Journalists are keen to track and, in particular, to know the latest news about a story from the large amount of information that arrives daily. In addition, Forex traders are interested in tracking and detecting news announcements, such as political events that affect microeconomic factors. This has created an opportunity for TDT research to focus on user interaction, user evaluation and user interfaces as a way to visualise and represent news in a meaningful way.
TDT tasks such as stream segmentation, link detection, story detection and story tracking focus mainly on the evaluation of algorithms in the context of Text REtrieval Conference (TREC) and without user involvement [1]. We believe that TDT is very much an interactive task which is interrelated with and complemented by user interaction. Therefore, research in TDT should be tackled from the user perspective, which enables us to view the complete TDT [2, 3]. For example, users can validate and define a new story in a new event detection task.
Efforts have been made on user interfaces to improve the TDT system by investigating not just the interaction aspect, but also user and task-oriented evaluation [4]. We believe that interfaces play a vital role in interactive Topic Detection and Tracking (iTDT), and so we set out to design a new interface for iTDT that is meant to support the user in all the tasks related to TDT. The importance of user interaction in the real world is the reason why iTDT is receiving more attention, and it was clear from the literature that a well-designed iTDT interface is important to guide users in performing TDT tasks. The design of such an interface should incorporate the best and most successful components or features of existing interfaces.
The term iTDT is used in this article to refer to the TDT works that focus on these aspects. It is important to provide a means for people such as journalists to understand and interpret what is happening in the news. TDT research is still active: researchers in this area have focused on developing algorithms for better TDT performance, and the evaluation of these algorithms is the main activity in TREC evaluation [5]. A few TDT researchers have investigated techniques such as information visualisation and automatic timelines to support users with dynamic and interactive usage, but very few researchers have worked on interfaces and user interaction for TDT. According to [6] and [7], an effective interface should be well designed and generate positive feelings of success, competence, mastery, pleasure and clarity in the user community.
2. Related work
TDT research continues with a growing focus on iTDT, and the present work is aimed in this direction. TDT researchers have attempted to build better document models by developing similarity metrics or better document representations [8]. This has led to a series of research efforts that concentrate on improving document representation by applying Named Entity Recognition [9–14]. A few researchers then started to move from laboratory-style experiments to interactive TDT, focusing mainly on the graphical user interface. Event Organizer [15], TDTLighthouse [16], TimeMine [17] and the Topic Tracking Visualisation tool [18] are examples of TDT works that investigate certain approaches to improving TDT system performance using a graphical user interface. We reviewed these works by discussing the features and approaches used and how they motivated this work. The reviewed works on iTDT enabled us to identify the similarities and differences of the components and features used, as shown in Table 1.
Table 1. Comparison of iTDT features
Most of the iTDT interfaces reviewed had Document View (DV) as the important component that displays information such as the document timeline, document content and list of topics or documents. Meanwhile, Cluster View (CV) is an important component that presents stories or documents by visualising them in a cluster or box form. Term View (TV), by contrast, is less common than the other two. Exploring and combining these three views (DV, CV and TV) with features such as cluster visualisation and the timeline on the user interface could be an effective way to perform TDT tasks. None of the works reviewed measured the effectiveness of their approach and of the features applied to the interfaces through a formal user evaluation. Most of them reported on the technique’s effectiveness for system performance using the Information Retrieval and TDT style evaluation. Past research has shown that user interfaces can significantly improve the effectiveness of the TDT task [15]. Therefore, the challenging questions are how to analyse and present news effectively in a meaningful and efficient manner, and what kinds of additional and critical information will contribute to an iTDT interface design. These questions are addressed in the next section.
3. Interactive event tracking (iEvent) interface
The reviewed work on iTDT interfaces discussed previously affected the design of iEvent. iEvent comprises three components: the CV, DV and TV [3]. In this section we describe the design of iEvent and discuss its components and features.
The layout and order of the components displayed on the interface begin from the CV, followed by the DV and finally the TV. The CV is displayed on top of the interface as the main component, since this is the starting point where users are presented with a large amount of information for rapid interpretation. Visualising the cluster based on the size and density of the documents might help them to identify the important and related cluster based on the task given. The CV allows users to browse the whole collection before they narrow their search to a specific cluster: this is the reason why the DV is ordered after the CV. The DV allows the users to view the whole document in a cluster with the specific timeline: it provides an effective form of presentation and a very fast graphical overview of the information that a cluster contains. The DV generates an interactive timeline displaying the major events and uses it as a browsing interface to a document collection contained in a cluster. Finally, the TV is displayed at the bottom of the interface to be more specific on the named entities contained in the cluster. Named entities are information units such as names, including person, organisation and location names, and numeric expressions including time, date, money and percentage expressions. Users get the whole view of the corpus before they receive specific information on the documents and the named entities occurring in a cluster. The sequence or the ordering of the components on iEvent helps users to narrow down their browsing and to be focused in their search, thus helping them to perform TDT tasks.
iEvent has two settings. Set-up 1 (Figure 1) is the baseline set-up that uses keywords, and Set-up 2 (Figure 2) is the experimental set-up that uses Named Entity Recognition. Table 2 shows the comparison of the features in the set-ups. We extracted the named entities using ANNIE (A Nearly-New Information Extraction system), which is an information extraction component of the General Architecture for Text Engineering (GATE). We used it for its accurate entity, pronoun and nominal co-references extraction [19]. In this article, we have evaluated the usability of iEvent interface without comparing the set-ups. Therefore, in section 5 (Results), we present the results for both settings. We associated the findings with the set-up if there was a significant finding when comparing the results between set-ups.
Table 2. Comparison of features in Set-up 1 and Set-up 2
Figure 1. Keywords set-up (Set-up 1).
Figure 2. Named entities set-up (Set-up 2).
3.1. Cluster View
The CV displays information related to the size and the density of a cluster, and the 10 most frequent keywords (Set-up 1) or named entities (Set-up 2) in a cluster. The clusters are visualised based on their size and density. On the one hand, clusters with a large size and high density contain many documents that have appeared over a short period of time: therefore, they are supposed to represent very important events. On the other hand, clusters with a small size and low density contain a small number of documents that have appeared over a long period of time, thus presenting recurring but relatively unimportant events. Cluster visualisation is intended to help the user to make a rapid interpretation of a topic. It should be noted that, given the difficulty in story segmentation, sometimes a cluster with a large size and low density might indicate the presence of more than one topic in the cluster. Clusters are labelled using the three most frequent named entities. When a user clicks on the cluster, additional information on the 10 most frequent named entities in that cluster is presented. These features are useful in TDT tasks, since they provide information on the most frequent named entities that occur in a specific cluster.
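The quantities shown in the CV can be illustrated with a minimal sketch. The text does not give a formula for density, so the measure below (documents per day over the period covered by the cluster) is an assumption, as are the function name `cluster_summary`, the input format and the toy documents.

```python
from collections import Counter
from datetime import date


def cluster_summary(docs, n_label=3, n_top=10):
    """Summarise one cluster for a Cluster View style display.

    docs is a list of (publication_date, [terms]) pairs. This input
    shape is hypothetical and is not the format used by iEvent.
    """
    dates = [day for day, _ in docs]
    span_days = (max(dates) - min(dates)).days + 1  # inclusive span
    counts = Counter(term for _, terms in docs for term in terms)
    top_terms = [term for term, _ in counts.most_common(n_top)]
    return {
        "size": len(docs),
        # Assumed density measure: documents per day over the span.
        "density": len(docs) / span_days,
        # Label from the three most frequent terms, as in the CV.
        "label": top_terms[:n_label],
        "top_terms": top_terms,
    }


# Toy documents, invented for illustration only.
docs = [
    (date(1998, 1, 21), ["Pope", "Cuba", "Castro"]),
    (date(1998, 1, 22), ["Pope", "Cuba", "Havana"]),
    (date(1998, 1, 22), ["Pope", "Castro"]),
]
summary = cluster_summary(docs)
print(summary["size"], summary["density"], summary["label"])
```

Under this assumed measure, a cluster with many documents in a short span receives a high density, which matches the qualitative description of important events above.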
3.2. Document View
The DV displays information about the document timeline and the documents contained in a cluster. The document timeline is displayed in histogram form to show the occurrence and document frequency for a specific date. The height of the histogram indicates the number of documents which have occurred on that specific date in a cluster. This feature is an attempt to support the user in analysing the discourse or information flow in a press article. Discourse analysis is a general term that includes many approaches to analysing the use of language, and one important application of it is to news [20].
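As a rough illustration of the document timeline, the sketch below counts documents per date and renders the counts as a text histogram. The function names and the sample dates are hypothetical; iEvent itself draws a graphical histogram.

```python
from collections import Counter
from datetime import date


def document_timeline(doc_dates):
    """Count documents per publication date, in chronological order."""
    return sorted(Counter(doc_dates).items())


def render_histogram(timeline):
    """Render one text bar per date; bar height is the document count."""
    return "\n".join(
        f"{day.isoformat()} {'#' * count} ({count})" for day, count in timeline
    )


# Invented publication dates for the documents of one cluster.
doc_dates = [date(1998, 1, 22), date(1998, 1, 21), date(1998, 1, 22)]
timeline = document_timeline(doc_dates)
print(render_histogram(timeline))
```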
Timelines are a useful way to present information that has a temporal dimension. Journalists often generate timelines to describe the course of events. Our evaluation examines whether automatically generated timelines are valuable for navigating the results of a TDT system and for iTDT. The timeline feature is offered in both DV and TV. Users can see the occurrence of documents and named entities within the timeline in histogram form for each cluster. In addition, users can see the document content; named entities are highlighted only in Set-up 2.
3.3. Term View
TV displays information related to named entities’ timeline in a cluster. The timeline is displayed in histogram form to show the named entities’ occurrence and their frequency for a specific date. The histogram with the timeline shows the relevance score of named entities using term frequency (tf). The timeline feature provides journalists with the whole view of named entities’ occurrences in the cluster. This is helpful in providing information about when the event occurred and in supporting the new event detection task. This feature also helps the user in the topic detection task by presenting information about the latest occurrence of a named entity from the timeline.
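The per-date term frequency behind the TV histogram can be sketched as follows. We assume here that tf is the raw number of occurrences of a named entity on a given date within the cluster; the text does not state whether any normalisation is applied. The function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def term_timeline(docs):
    """Return {term: {date: tf}} for (date, [terms]) pairs in one cluster.

    tf is assumed to be the raw occurrence count of the term on that date.
    """
    timeline = defaultdict(Counter)
    for day, terms in docs:
        for term in terms:
            timeline[term][day] += 1
    return {
        term: dict(sorted(per_day.items())) for term, per_day in timeline.items()
    }


def latest_occurrence(timeline, term):
    """Return the most recent date on which the term occurs, or None."""
    per_day = timeline.get(term)
    return max(per_day) if per_day else None


# Invented named-entity mentions, for illustration only.
docs = [
    (date(1998, 1, 21), ["Pope", "Cuba"]),
    (date(1998, 1, 22), ["Pope", "Pope", "Castro"]),
]
timeline = term_timeline(docs)
print(timeline["Pope"])
print(latest_occurrence(timeline, "Castro"))
```

The `latest_occurrence` helper corresponds to the use of the timeline described above, where the most recent occurrence of a named entity is read off the histogram.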
4. Method
4.1. Procedure
During the user experiment 240 tasks were performed, with 160 tracking tasks and 80 detection tasks. There were eight topics for the tracking task (T1–T8), and four clusters for the detection task (D1–D4) in two sessions. After completion of the tasks, participants completed a questionnaire about using the interface. They were given two hours to attempt the entire tracking task and 15 minutes for each topic. Participants had 40 minutes to complete the detection task and were given 10 minutes for each cluster. The whole user experiment took about 2 hours 40 minutes to 3 hours, excluding a short training session. The time assigned to each task was sufficient based on the feedback received from the pilot test conducted. Each participant session lasted between one and one-and-a-half hours depending on the time taken to complete the assigned tasks, and the time taken by the participants to complete the questionnaires. The participants were offered a short break (5–15 minutes) after the first session.
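The task counts and time budget above can be checked with simple arithmetic. The sketch assumes that each of the 20 participants reported in Section 4.2 attempted every topic and every cluster, which is consistent with the reported totals; the constant names are ours.

```python
# Figures taken from Sections 4.1 and 4.2 of the text.
PARTICIPANTS = 20
TRACKING_TOPICS = 8       # T1 to T8
DETECTION_CLUSTERS = 4    # D1 to D4
MINUTES_PER_TOPIC = 15
MINUTES_PER_CLUSTER = 10

tracking_tasks = PARTICIPANTS * TRACKING_TOPICS
detection_tasks = PARTICIPANTS * DETECTION_CLUSTERS
total_tasks = tracking_tasks + detection_tasks

tracking_minutes = TRACKING_TOPICS * MINUTES_PER_TOPIC
detection_minutes = DETECTION_CLUSTERS * MINUTES_PER_CLUSTER
total_minutes = tracking_minutes + detection_minutes

print(tracking_tasks, detection_tasks, total_tasks)  # 160 80 240
print(divmod(total_minutes, 60))                     # (2, 40)
```

The lower bound of 2 hours 40 minutes quoted for the whole experiment therefore equals the sum of the per-task time limits.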
The participants had the opportunity to perform the tasks using the interface. A Latin square [21–23] was used to construct the experimental design (see Table 3). This allowed us to evaluate the same topic using different set-ups. The order of topics assigned in the tracking tasks and the order of clusters given in the detection task were rotated to avoid any learning and fatigue effects. This principle also applied during the training session. Topic 1 (‘Oprah Lawsuit’), for example, had a chance to be the first, second, third and fourth in order, during the tracking task. The clusters assigned in the detection task were invisible when participants performed the tracking task to avoid any intersection of clusters. This is important, as the intersection affects participants’ performance: they might have come across the clusters used in the detection task during the tracking task, therefore making the tasks challenging to the participants. The topics and clusters selected for the user experiment combined good and poor clustering performance based on the F1-measure. This was important to determine whether the iEvent interface helped the participants to perform the TDT tasks, even when they were given a bad cluster to track or a bad topic to detect.
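The rotation of topic order can be illustrated with a cyclic Latin square. This is a generic construction shown under the assumption of four topics per session; it is not necessarily the square used in Table 3, which is not reproduced here, and the function name is ours.

```python
def cyclic_latin_square(items):
    """Build a Latin square by cyclically shifting the item order.

    Each item appears exactly once in every row (one presentation order)
    and exactly once in every column (one position in the order).
    """
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


topics = ["T1", "T2", "T3", "T4"]
square = cyclic_latin_square(topics)
for order in square:
    print(order)

# Position (0-based) taken by T1 in each of the four orders.
t1_positions = sorted(order.index("T1") for order in square)
print(t1_positions)
```

In this construction each topic occupies the first, second, third and fourth position exactly once across the four orders, which is the property described for Topic 1 above.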
Table 3. Experimental design
S1 = Set-up 1 (baseline set-up); S2 = Set-up 2 (experimental set-up).
4.2. Evaluation participants
We conducted a user experiment to evaluate the iEvent interface. The participants were a combination of journalists and postgraduate journalism students from the Scottish Centre for Journalism Studies, University of Strathclyde. Twenty participants were recruited, of whom half were journalists and half were students. The average age of the participants was in the 30–40 years range. In terms of educational background, 70 per cent of the participants had or were pursuing a postgraduate degree, 25 per cent had an undergraduate degree, and 5 per cent had a Higher National Diploma; 85 per cent of the participants had working experience in journalism, with 30 per cent having more than 10 years’ experience, and the most common type of journalist was daily news reporter.
4.3. User tasks and questionnaire
The participants were given two types of tasks: tracking and detection. The start time to perform the tracking task for a topic was defined as the moment when the participant started using iEvent, and 15 minutes later was defined as the end time for the task. This principle also applied to the detection task, with 10 minutes allowed for each cluster. Since iEvent was new and unfamiliar to the participants, they received a training session on how to use it. A short time, around 30 minutes, was allocated for training at the start of the experiment. In all cases this appeared sufficient for the participants to familiarise themselves with iEvent. The training session was broken down into a series of stages.
1. The purpose of iEvent was explained – i.e. to cluster news stories into the same group of events or topics by visualising the clusters.
2. Participants were introduced to the interface components and features that appeared in the iEvent interface. We also printed a screenshot of the interface to describe the components and features of iEvent, which we believed would help the participants to understand how iEvent works and perform the task better.
3. Participants were given a live demonstration of each set-up using the same topic, ‘General Motors Strike’, for the tracking task, and Cluster 24 (‘Pope Visits Cuba’) for the detection task.
4. A training session was issued and participants were given the chance to attempt the tracking and detection tasks. This gave participants an opportunity to use iEvent in a realistic news tracking and detection context, and to become accustomed to the interface features.
5. The training session stopped once participants felt comfortable using iEvent.
6. Participants were allowed to comment or ask questions at any point during the session.
The questionnaires were designed in the form of an entry questionnaire, tracking task questionnaire, detection task questionnaire and iEvent post-evaluation survey.
4.3.1. Tracking task
In the tracking task, the participants had to track the cluster that contained the given topic and show that the system provides a sufficient amount of information on the event. This is in line with the journalist’s task of reporting news. There were two sub-activities in this task: reporting and profiling. The procedure for performing the tracking task was as follows.
1. Participants were welcomed and asked to read the introduction to the experiment provided on an information sheet. This set of instructions was developed to ensure that each participant received precisely the same information. Participants could retain the information sheet after the experiment.
2. The participants were given a short overview of what the experiment would entail. We also explained our role in this experiment – i.e. to observe participants’ interaction with the systems, provide participants with technical support and remind participants of the time taken in performing the tasks.
3. Participants were asked to complete an entry questionnaire. This provided background information on their education, work experience and previous experience of news network tools used.
4. Participants were given a demonstration of the iEvent interface with both set-ups by following the experimental design (as shown in Table 3). This included the features available on the interface, followed by a training session. The training session was the same for all participants using both set-ups, which gave participants a chance to familiarise themselves with the interface. Participants could ask questions or ask for general assistance at any time during the session.
5. Tracking task – once comfortable with iEvent, participants were asked to perform the tracking task. As indicated previously, there were two sub-activities in this task: reporting and profiling. Reporting required the participant to write an article on a topic by drafting the important facts. For profiling, the participant had to make a profile of a story by providing the important keywords.
5a. Participants were given 15 minutes to search, and could stop early if they were unable to find any more relevant information. Searching in this experiment refers to identifying the cluster related to a given topic.
5b. After completing the search (successfully or otherwise), participants were asked to complete the questionnaire.
The remaining tasks were given to the participants in the second session using a different set-up, following steps 5a–b. The participants were offered a short break after the first session.
4.3.2. Detection task
Meanwhile, in the detection task, the participants had to identify the topic dealt with by a specific cluster. This is in line with the journalist’s task of identifying an important event that happened on a specific day. The procedure to perform the detection task was as follows.
6a. Participants were given 10 minutes to search and could stop early if they were unable to find any more relevant information. Searching in this experiment refers to detecting the topic for a given cluster.
6b. After completing the search (successfully or otherwise), the participants were asked to complete the questionnaire.
The remaining tasks were given to the participants in the second session using a different set-up, following steps 6a–b.
At the end of the experiment, participants were asked to complete the post-evaluation questionnaire and an informal post-experiment interview was conducted. The post-evaluation questionnaire compared participants’ performance between set-ups.
5. Results
Participant performance was analysed to identify the effectiveness of iEvent interface in facilitating them to perform the tracking and detection tasks. As indicated previously, during the experiment 240 tasks were performed: 160 (66.67%) of these tasks were tracking, and 80 (33.33%) of these tasks were detection.
The findings revealed that 70 per cent of the participants liked iEvent, and 50 per cent of the participants liked to use iEvent in both tasks. A possible explanation for these results might be the participants’ success in performing both tasks (as reported in Sections 5.1.1 and 5.2). Of the participants, 20 per cent disliked iEvent, and 10 per cent of the participants were not sure. Those who disliked iEvent were all journalists with an average age of 30–40 years and average working experience of more than 10 years. In the interview session, these participants reported having previously used news network tools such as PressDisplay.com, PaidContent.org and Google Fast Flip, and thus had high expectations when using iEvent. The 10 per cent who were not sure mentioned some interesting features of iEvent; however, they disliked the fact that they had to scroll and mouse over the CV to find the topic in the tracking task.
The participants were asked about their topic familiarity and topic interest before they started using iEvent. Each degree of agreement was given a numerical value from 1 to 5, where a higher value corresponded to greater familiarity. The findings revealed no statistically significant difference in participants’ topic familiarity across topics (Mann–Whitney test, p = 0.483). This indicated that the topics given during the experiment did not influence participants’ topic familiarity. In addition, the participants were not familiar with the topics given in the tracking task (mean = 2.01, SD = 1.03), and there was no statistically significant difference between the participants and their topic interest (Mann–Whitney test, p = 0.842). Their topic interest was average (mean = 3.27, SD = 1.09). This was a good indication for the experiment: prior topic familiarity and topic interest were unlikely to act as external factors when we compared participants’ results after using iEvent. The Wilcoxon signed ranks test showed a statistically significant difference in both topic familiarity and topic interest before and after using iEvent. The means for topic familiarity (before = 2.01, after = 3.26) and topic interest (before = 3.27, after = 3.63) increased after using iEvent.
The percentage of participants who were familiar with the topic increased more than fivefold, from 8 per cent before to 46 per cent after using iEvent. The percentage of participants who were not familiar with the topic decreased from 69 per cent before to 27 per cent after using iEvent. Of the participants, 69 per cent were not familiar with the topic beforehand because the collection used dates from 1998 (TDT2 and TDT3 corpus). This supports the evaluation that iEvent influenced their topic familiarity and topic interest. If participants had been given more recent topics, they might have been more familiar with them or had better knowledge of them, and this would have influenced their performance in the tracking task.
For topic interest, there was an increasing percentage of participants who were interested in the topic before (50%) and after (59%) using iEvent. Meanwhile the percentage of participants who were not interested in the topic beforehand decreased to 10 per cent after using iEvent. These indicate that the participants were more familiar with and more interested in the topics in the tracking task after using iEvent.
A Mann–Whitney test confirmed that there was no statistically significant difference (p = 0.492) in topic interest before using iEvent across set-ups. However there was a statistically significant difference in topic interest after using iEvent across set-ups (Mann–Whitney test, p = 0.003). The participants were more interested in a topic in the tracking task after using Set-up 2 (mean = 3.81, SD = 1.032). Participants found that using Set-up 2 enhanced their topic interest. There was a ratio of 7:1 participants who found that they were more interested in a topic after using Set-up 2. They found that using Set-up 2 of iEvent significantly enhanced their topic interest, with 46.3 per cent of participants agreeing that they were interested (scale 4) in a topic. These results indicate that the participants were more familiar with the topics in the tracking task after using iEvent. In addition, they were more interested in the topics in the tracking task after using Set-up 2 of iEvent.
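For readers unfamiliar with the Mann–Whitney test used for the comparisons across set-ups, the sketch below computes only its U statistic by direct pair counting; it does not compute a p-value. The ratings are invented 5-point values for illustration and are not the study data, and the function name is ours.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs in which a value from sample_a exceeds a value
    from sample_b; ties contribute one half. U_a + U_b = len_a * len_b.
    """
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Invented ratings on a 1 to 5 scale, NOT the ratings from the experiment.
setup_1 = [3, 3, 4, 2, 3]
setup_2 = [4, 5, 4, 3, 4]
u_setup_2, u_setup_1 = mann_whitney_u(setup_2, setup_1)
print(u_setup_2, u_setup_1)  # 21.0 4.0
```

A large imbalance between the two U values suggests that one group tends to give higher ratings; a statistics package would be needed to turn U into the p-values reported in this section.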
The participants were given an entry questionnaire before they performed the tracking and detection task. They were asked to list out the news network, tools or search engines used. They mostly used Google (95%) and BBC news (90%) as their main news network tools. In addition, the participants were asked to rate their experience of using the news network tools: 45 per cent found the news network tools that they used were easy (scale 4) (mean = 4.05, SD = 0.759); 50 per cent found that the news network tools were relaxing (scale 4) (mean = 3.45, SD = 0.887); and 35 per cent agreed that the news network tools were neither simple nor complex (scale 3) (mean = 3.10, SD = 0.912). Based on participants’ satisfaction levels, 35 per cent of them were dissatisfied (scale 2) and found that the news network tools were average (scale 3) (mean = 3.00, SD = 0.918). Finally, based on participants’ interest, 45 per cent found that the news network tools were averagely interesting (scale 3) and interesting (scale 4) (mean = 3.35, SD = 0.617). The participants interviewed mentioned that the Google style of searching contributed to ease of use of the news network tools, thus making the search process more relaxed and interesting.
5.1. Tracking task
Several analyses were performed on the captured data, and the following sections present the findings. First, the participants’ overall opinions of iEvent were examined. Next, we investigated participants’ performance using iEvent in the reporting task: that is, the amount of news written. Then we investigated the features of iEvent that the participants perceived as useful, effective, helpful and interesting.
5.1.1. Overall opinions
We analysed whether participants perceived the iEvent interface as easy, relaxing, simple, satisfying and interesting during the tracking task.
Easy. A ratio of 3:1 participants found that iEvent was easy to use (mean = 3.57, SD = 1.079), with 38.8 per cent agreeing that it was easy (scale 4). During the interview session, the participants informed us that iEvent was easy to use because it has structured and clear components: the CV, DV and TV. There was a statistically significant difference in participants’ opinions (easy) across set-ups (Mann–Whitney test, p = 0.004), with 45 per cent of participants agreeing that Set-up 2 (mean = 3.85, SD = 0.828) was easy (scale 4). Interestingly, 67.5 per cent of the participants found that Set-up 2 was easier, compared to 5 per cent who found it difficult. This indicates that 14 participants found that using Set-up 2 of iEvent made the tracking task easier. The participants interviewed agreed that Set-up 2 provides significant information on important named entities, which makes the tracking task easier.
Relaxing. A ratio of 4:1 participants found that iEvent was relaxing (mean = 3.50, SD = 0.883); 38.8 per cent agreed that it was relaxing (scale 4). The participants interviewed again associated the relaxing factor with the structured and clear components of iEvent, which also supports the perceived easiness of using iEvent to perform the tracking task. There was a statistically significant difference in participants’ opinions (relaxing) across set-ups (Mann–Whitney test, p = 0.003): 41.3 per cent agreed that Set-up 2 (mean = 3.71, SD = 0.860) was relaxing (scale 4), and 60 per cent found that Set-up 2 was more relaxing, compared to 7.5 per cent who found it stressful. This indicates that eight participants found using Set-up 2 of iEvent makes the tracking task more relaxing.
Simple. There was a ratio of 2:1 participants who found that iEvent was simple (mean = 3.30, SD = 1.039), with 37.5 per cent indicating that it was simple (scale 4). A Mann–Whitney test confirmed that there was no statistically significant difference in the participants’ opinion on simple (p = 0.840) in conjunction with the set-ups. The participants interviewed related this opinion to the clear and structured components of iEvent, but there were suggestions to revise the layout of iEvent, in particular for the CV to be vertical instead of horizontal. Thus the layout issue would be interesting for future work on iEvent.
Satisfying. There was a ratio of 5:1 participants who found iEvent to be satisfying (mean = 3.50, SD = 0.854), with 40 per cent agreeing that it was satisfying (scale 4). A Mann–Whitney test confirmed that there was no statistically significant difference in participants’ opinion on satisfying (p = 0.500) in conjunction with the set-ups.
We measured participants’ satisfaction by analysing their agreement on sufficient information gathered during the tracking task and the reporting task results. We believe that the participants were satisfied with iEvent if they managed to perform the tracking task by receiving sufficient information for a topic, and if they managed to report the story assigned by tracking the correct cluster. The participants were deemed to be satisfied if they found the information that they needed. Further analysis showed that 39.4 per cent agreed that they had gathered enough information using iEvent (mean = 3.50, SD = 1.082) during the tracking task. There was a ratio of 3:1 participants who agreed that they had gathered enough information using iEvent.
Interestingly, the satisfying factor was also related to the high percentage of clusters that were tracked correctly. This gives strong evidence that iEvent mostly helped the participants to track the correct cluster (mean = 3.87, SD = 0.49). We classified the correctness of the tracked clusters into four categories:
none – where participants did not provide any information or they did not complete the task;
wrong – where participants tracked the wrong cluster;
partially correct – where participants listed the minor cluster as their main finding; and
correct – where participants listed the major cluster as their main finding.
Overall, the tracking task was successful, with 91.9 per cent of the tasks being correct and 4.4 per cent being partially correct. There were two participants (1.3%) who did not complete the task on the topic ‘National Tobacco Settlement’. These participants were using Set-up 2 (experimental set-up), which displays information on named entities (e.g. ‘Congress’, ‘Clinton’); however, they were looking for the term ‘tobacco’. This was the reason why they spent the full 15 minutes allocated and were still not able to find the correct cluster. There were four participants (2.5%) who were wrong about the topic ‘Mobil–Exxon Merger’. These participants, who were using Set-up 1 (baseline set-up), were confused by the term ‘merge’, which also highlighted the cluster on the topic ‘Microsoft Merger’. These uncompleted and wrong tasks represented just a small percentage compared to the successful tasks. This indicates that iEvent helped the participants to perform well in the tracking task.
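The reported shares can be reproduced from a tally over the four correctness categories. Only the counts for none (2) and wrong (4) are stated in the text; the counts for correct (147) and partially correct (7) are back-calculated here from the reported percentages of the 160 tracking tasks and should be read as an inference, not as reported data.

```python
from decimal import Decimal, ROUND_HALF_UP

TRACKING_TASKS = 160

# 'none' and 'wrong' are stated in the text; the other two counts are
# inferred from the reported percentages (an assumption).
outcomes = {"correct": 147, "partially correct": 7, "wrong": 4, "none": 2}


def percentage(count, total):
    """Share of total as a percentage, rounded half up to one decimal."""
    value = Decimal(count) * 100 / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


shares = {label: percentage(n, TRACKING_TASKS) for label, n in outcomes.items()}
for label, share in shares.items():
    print(f"{label}: {share}%")
```

Half-up rounding is used because the default float rounding in Python would turn 1.25 into 1.2, whereas the text reports 1.3 per cent for the two uncompleted tasks.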
Interesting. This opinion of iEvent received the highest ratio with 9:1 participants finding that iEvent was interesting (mean = 3.89, SD = 0.956), and 34.4 per cent agreeing that it was very interesting (scale 5). There was a statistically significant difference in participants’ opinion (interesting) across set-ups (Mann–Whitney test, p < 0.05). Set-up 2 (mean = 4.23, SD = 0.779) received the highest percentage, with 43.8 per cent of the participants agreeing that it was very interesting (scale 5). Surprisingly, none of the participants found that Set-up 2 was boring and 78.8 per cent found that it was more interesting. This indicates that the participants found that using Set-up 2 of iEvent made the tracking task more interesting than Set-up 1.
The participants interviewed agreed that the histogram with the timeline was one of the most interesting features, and from our observation they used it mostly as their main strategy when performing the tracking task. One of the participants said: ‘This is a new paradigm of monitoring news in journalism and it is absolutely interesting.’
5.1.2. Reporting task
This section reports the findings on the participants’ performance during the reporting task, one of the sub-activities of the tracking task. We analysed the number of lines that participants wrote, as this was an important measure of how effective iEvent was in providing information to the participants. The amount that they wrote indicated that the participants received enough information and were able to deliver it in written form. We also analysed the number of lines that the participants wrote across set-ups. There was no statistically significant difference in the amount of news written between the set-ups (Mann–Whitney test, p = 0.434), and no statistically significant difference in the amount of news written for different topics (Mann–Whitney test, p = 0.202). These results indicate that the participants wrote a similar amount of news using both set-ups, and for every topic given in this experiment.
The findings revealed that the participants wrote on average nine lines using iEvent (mean = 9.44, SD = 6.455). There was a statistically significant difference in the amount of news written between the types of participant (Mann–Whitney test, p < 0.05). The journalists (mean = 7.09, SD = 5.45) wrote less than the students (mean = 11.79, SD = 6.56), which suggests that the journalists were more selective and critical when writing news.
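For readers unfamiliar with the Mann–Whitney test used throughout this section, the following is a minimal sketch of a two-sided test for two independent groups, such as the lines written by journalists and by students. It uses the normal approximation with a tie correction and no continuity correction; the article does not state which variant or software was used, so this is an assumption. The two samples are hypothetical and are not the study data.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(sample_a), len(sample_b)
    n = n1 + n2
    # Pool the observations, remembering which group each came from.
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])

    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    u_b = n1 * n2 - u_a
    u = min(u_a, u_b)

    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all observations identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p


# HYPOTHETICAL line counts, for illustration only.
journalist_lines = [4, 6, 7, 5, 9, 3, 8, 6]
student_lines = [10, 12, 9, 15, 11, 14, 8, 13]

u_stat, p_value = mann_whitney_u(journalist_lines, student_lines)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

The normal approximation is reasonable for groups of this study's size; for very small samples an exact test would be preferable.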
iEvent also helped the participants to report the correct news (mean = 3.80, SD = 0.708): 91.3 per cent managed to report the correct news and, interestingly, none of the participants provided the wrong information. There was no statistically significant difference in the amount of correct news written between the set-ups (Mann–Whitney test, p = 0.651), which indicates that the participants wrote a similar amount of correct news using both set-ups. We classified the correctness of news written into four categories:
none – where participants did not provide any information or did not complete the task;
wrong – the news written did not match the topic;
partially correct – part of the news written matched the topic; and
correct – the news written matched the topic.
5.1.3. Features
In this section, we analyse each feature of iEvent and assess which set-up participants perceived as useful during the tracking task, as shown in Table 4.
Percentage of participants who perceived the features of iEvent as useful in the tracking task
(−)ive = scale 1, 2; (+)ive = scale 4, 5.
Scale from 1–5, higher = better; highest value shown in bold.
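The summary statistics quoted for each feature (mean, SD, and the ratio of positive to negative opinions) can be derived from the raw five-point responses as sketched below. This assumes that the ratios compare the number of (+)ive responses (scale 4, 5) with the number of (−)ive responses (scale 1, 2), and that SD is the sample standard deviation; neither is stated explicitly in this section. The function name `summarise_likert` and the responses are hypothetical.

```python
from statistics import mean, stdev


def summarise_likert(responses):
    """Summarise 1-5 Likert responses: mean, SD, and positive:negative ratio."""
    positive = sum(1 for r in responses if r >= 4)  # (+)ive = scale 4, 5
    negative = sum(1 for r in responses if r <= 2)  # (-)ive = scale 1, 2
    total = len(responses)
    return {
        "mean": mean(responses),
        "sd": stdev(responses),  # ASSUMPTION: sample SD (n - 1 denominator)
        "positive_pct": 100.0 * positive / total,
        "negative_pct": 100.0 * negative / total,
        # The ratio is undefined when no participant gave a negative rating.
        "ratio": positive / negative if negative else None,
    }


# HYPOTHETICAL responses for one feature, for illustration only.
responses = [5, 4, 4, 3, 5, 2, 4, 3, 4, 5, 4, 1]
summary = summarise_likert(responses)
print(f"mean = {summary['mean']:.2f}, SD = {summary['sd']:.3f}")
print(f"ratio = {summary['ratio']}:1")
```

With these hypothetical responses, eight participants are positive and two are negative, giving a ratio of 4:1.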
Useful. The highest ratio for this opinion was for the ‘CV: cluster visualisation’ feature: a ratio of 10:1 participants found that this feature was useful in the tracking task (mean = 3.86, SD = 1.008); and 36.3 per cent thought that the ‘CV: cluster visualisation’ feature was useful (scale 4), as shown in Table 4. The size and the density of the clusters shown by this feature allowed the participants to identify how many topics were in each cluster, such that clusters with large size and high density indicated a high number of documents distributed over a long period of time.
There were also two features perceived as useful by a ratio of 6:1 participants: ‘DV: histogram with the timeline’ (mean = 3.82, SD = 1.192), and ‘TV: histogram with the timeline’ (mean = 3.79, SD = 1.099). Of the participants, 44.4 per cent thought the ‘TV: histogram with the timeline’ feature was useful (scale 4), and 35 per cent found the ‘DV: histogram with the timeline’ feature to be very useful (scale 5). These features allowed the participants to see the document and occurrence of the term for a specific date.
Topics such as ‘Jonesboro Shooting’ did mention 29 April as the hearing date for the case, and using these features was an advantage in reporting the outcome of the trial. Moreover, 60 per cent of the participants agreed that the ‘DV: histogram with the timeline’ feature was a way to carry out discourse analysis. Discourse analysis is important in journalism as it studies the information flow in a press article. These findings help to explain why the participants gave a high score (scale 5) on the usefulness of this feature in the tracking task. In addition, 40 per cent of the participants agreed that the document histogram with the timeline was the best feature of iEvent.
There was a statistically significant difference in the ‘CV: cluster visualisation’ feature (Mann–Whitney test, p = 0.002) and the ‘DV: histogram with the timeline’ feature (Mann–Whitney test, p < 0.05) between students and journalists. These two features were significantly more popular among students than among journalists. Students found the ‘CV: cluster visualisation’ feature useful (scale 4) and the ‘DV: histogram with the timeline’ feature very useful (scale 5).
The findings also revealed that there was a statistically significant difference in the ‘CV: cluster labelling’ feature across set-ups (Mann–Whitney test, p < 0.05). The ‘CV: cluster labelling’ feature in Set-up 2 of iEvent was more useful (mean = 3.94, SD = 0.919) compared to Set-up 1 (mean = 3.16, SD = 1.267). A ratio of 9:1 participants perceived the ‘CV: cluster labelling’ feature in Set-up 2 of iEvent as useful, with 38.8 per cent agreeing that it was useful (scale 4). This indicates that the participants found the ‘CV: cluster labelling’ feature in Set-up 2 of iEvent more useful than in Set-up 1.
Effective. A ratio of 11:1 participants found that the ‘DV: document content’ feature was effective in the tracking task (mean = 4.41, SD = 0.968). It can be seen from the data in Table 5 that the most striking result was that 45 per cent found the ‘DV: document content’ feature very effective (scale 5). Further analyses of the interaction logs among the successful tracking tasks showed that there was high activity using the ‘DV: document content’ feature, with 71.4 per cent of participants using it. This indicates that this feature was effective in facilitating the participants in tracking the correct cluster.
Percentage of participants who perceived the features of iEvent as effective in the tracking task
(−)ive = scale 1, 2; (+)ive = scale 4, 5.
Scale from 1–5, higher = better; highest value shown in bold.
There were also two further features perceived as effective by the participants. A ratio of 7:1 participants found that the ‘DV: histogram with the timeline’ (mean = 3.97, SD = 1.096) was effective. Moreover, 39.4 per cent found this feature to be very effective (scale 5). A ratio of 6:1 participants also found that the ‘CV: cluster visualisation’ feature (mean = 3.73, SD = 0.951) was effective, with 36.9 per cent rating it as effective (scale 4).
There was a statistically significant difference in perception of the ‘CV: cluster visualisation’ feature (Mann–Whitney test, p = 0.001) and ‘DV: histogram with the timeline’ feature (Mann–Whitney test, p < 0.05) between students and journalists. The ‘CV: cluster visualisation’ feature was popular among the students. The students interviewed mentioned that it was effective since it gave them information quickly on the number of documents and the density. They mentioned that clusters with large size and high density had more than one topic, so they preferred to investigate the clusters with medium size with medium or high density. The ‘DV: histogram with the timeline’ was popular among the journalists. The journalists interviewed claimed that the ‘DV: histogram with the timeline’ was effective, since this was critical when looking for very specific information. This feature allows them to answer the question: ‘When was the event?’
The findings also revealed that there was a statistically significant difference in the ‘CV: cluster labelling’ feature across set-ups (Mann–Whitney test, p = 0.008). The ‘CV: cluster labelling’ feature in Set-up 2 of iEvent was more effective (mean = 3.66, SD = 0.927) compared to Set-up 1 (mean = 3.21, SD = 1.110). A ratio of 5:1 participants perceived the ‘CV: cluster labelling’ feature in Set-up 2 of iEvent as effective, with 41.3 per cent of the participants agreeing that it was effective (scale 4). This indicates that the participants found the ‘CV: cluster labelling’ feature in Set-up 2 of iEvent more effective than in Set-up 1.
Helpful. A ratio of 12:1 participants found that the ‘TV: histogram with the timeline’ feature was helpful in the tracking task. The participants interviewed mentioned that they could see the specific occurrences of a specific term. For the topic ‘Jonesboro Shooting’, for example, the feature allowed them to scan the timeline for significant terms such as ‘Mitchell Johnson’ and ‘Andrew Golden’. Accordingly, 38.1 per cent of the participants perceived the ‘TV: histogram with the timeline’ feature to be very helpful (scale 5), as shown in Table 6.
Percentage of participants who perceived the features of iEvent as helpful in the tracking task
(−)ive = scale 1, 2; (+)ive = scale 4, 5.
Scale from 1–5, higher = better; highest value shown in bold.
There were three features that were perceived to be helpful by 6:1 participants: the ‘CV: top terms’ feature (mean = 3.77, SD = 0.992), the ‘CV: cluster visualisation’ feature (mean = 3.71, SD = 0.948) and the ‘DV: document content’ feature (mean = 3.90, SD = 1.083).
There was a statistically significant difference in the ‘TV: histogram with the timeline’ feature (Mann–Whitney test, p = 0.029) between the topics. This feature was particularly popular for topic 7 (‘German Train Derails’) because it required the participants to report an accident, where the timeline was an important feature for tracking the story of the accident, its investigation and its consequences. Further analysis of the interaction logs for topic 7 showed that the participants were using this feature more frequently for this topic, with 10.2 per cent of activity compared to an average usage of 9 per cent. There was a statistically significant difference in the ‘CV: cluster visualisation’ feature (Mann–Whitney test, p < 0.05) and the ‘DV: histogram with the timeline’ feature (Mann–Whitney test, p < 0.05) between students and journalists. The two features were more popular among the students than among the journalists. The students found that the features were helpful (scale 4).
The findings also revealed that there was a statistically significant difference on two features across set-ups. The ‘CV: top terms’ feature (Mann–Whitney test, p = 0.033) and the ‘TV: keyword approach’ features (Mann–Whitney test, p = 0.011) in Set-up 2 of iEvent were more helpful than Set-up 1. A ratio of 17:1 participants found that the ‘CV: top terms’ feature in Set-up 2 of iEvent was perceived as significantly helpful, with 35 per cent agreeing that it was very helpful (scale 5). A ratio of 9:1 participants found that the ‘TV: keyword approach’ feature in Set-up 2 was perceived as significantly helpful, with 35 per cent agreeing that it was helpful (scale 4). This indicates that the participants found the ‘CV: top terms’ and the ‘TV: keyword approach’ features in Set-up 2 of iEvent more helpful than Set-up 1.
Interesting. It is apparent from Table 7 that there were three features that participants perceived as interesting which had a high ratio (more than 10:1) compared to other features.
Percentage of participants who perceived the features of iEvent as interesting in the tracking task
(−)ive = scale 1, 2; (+)ive = scale 4, 5.
Scale from 1–5, higher = better; highest value shown in bold.
A ratio of 14:1 participants found that the ‘DV: histogram with the timeline’ feature was interesting (mean = 4.04, SD = 0.983). The participants found that the ‘CV: cluster labelling’ feature (mean = 4.04, SD = 0.983) was interesting, with 42.5 per cent finding it very interesting (scale 5). During the informal interview session, the participants said that this feature was very interesting because they received information quickly on the topic using the three most frequent terms for the cluster. A ratio of 11:1 participants found that the ‘CV: top terms’ feature (mean = 4.03, SD = 0.968) was also interesting, with 38.8 per cent indicating that they felt this feature was very interesting (scale 5).
There was a statistically significant difference on the ‘CV: cluster visualisation’ feature (Mann–Whitney test, p = 0.049) between students and journalists. This feature was popular among students since they not only found it effective and helpful, but also interesting (scale 4).
The findings also revealed that there was a statistically significant difference in four features across set-ups. The ‘CV: cluster labelling’ (Mann–Whitney test, p = 0.033), ‘CV: top terms’ (Mann–Whitney test, p = 0.026), ‘DV: document content’ (Mann–Whitney test, p = 0.013) and ‘TV: keyword approach’ features (Mann–Whitney test, p = 0.035) in Set-up 2 of iEvent were more interesting than in Set-up 1. Two features in Set-up 2 were perceived as very interesting (scale 5) by the participants: the ‘CV: cluster labelling’ feature (48.8%) and the ‘CV: top terms’ feature (43.8%). Two further features in Set-up 2 were perceived as interesting (scale 4) by the participants: the ‘DV: document content’ feature (36.3%) and the ‘TV: keyword approach’ feature (37.5%). Surprisingly, a high percentage of participants (70–82.5%) found each of these four features interesting. This indicates that the participants found the ‘CV: cluster labelling’, ‘CV: top terms’, ‘DV: document content’ and ‘TV: keyword approach’ features in Set-up 2 of iEvent more interesting than in Set-up 1.
5.2. Detection task
The detection task as a whole was successful, with 85 per cent of task results being correct and 15 per cent being partially correct. Surprisingly, there were no uncompleted detection tasks and no participants who detected the wrong topic, which indicates that iEvent facilitated the participants in performing well in the detection task. We also classified the correctness of the topic detected into four categories:
none – where participants did not provide any information or they did not complete the task;
wrong – where participants detected the wrong topic;
partially correct – where participants listed the minor topic as their main finding; and
correct – where participants listed the major topic as their main finding.
Interestingly, a ratio of 11:1 participants found that it was easy to detect the topic in this task; 51.3 per cent found that it was easy to detect the topic (scale 4), and 20 per cent found that it was very easy (scale 5) using iEvent. There was no statistically significant difference in the ease of detecting a topic between the clusters given (Mann–Whitney test, p = 0.735). This indicates that although the participants were given a combination of good and poor cluster performance, they managed to complete the detection task and perform well using iEvent. Further results from the interaction logs among the successful tasks showed that the participants took 4 minutes and 49 seconds (mean clicks = 39) on average to perform this task, much less than the 10 minutes given to complete it.
There was a statistically significant difference between the participants’ opinion of ease to detect a topic and the set-ups (Mann–Whitney test, p < 0.05): 60 per cent agreed that it was easy (scale 4) to detect a topic using Set-up 1. Surprisingly, none of the participants found that it was hard to detect a topic using Set-up 1, and 92.5 per cent found that Set-up 1 made the detection task easier than Set-up 2. The results also showed that there was no statistically significant difference in ease of detecting a topic in conjunction with the type of participant (Mann–Whitney test, p = 0.477). Both students and journalists found that iEvent assisted them in detecting the topic easily (mean = 3.85, SD = 0.813).
The feature used by the highest percentage of participants was ‘CV: top terms’ (83.8%), while the lowest was ‘CV: cluster visualisation’ (53.8%). The participants used the ‘CV: top terms’ feature to get more information when detecting the topics, and used the ‘CV: cluster visualisation’ feature less. A possible explanation for this might be that the participants only dealt with a specific assigned cluster without having to compare it with other clusters, which made this feature less useful in the detection task. Further analyses of the interaction logs showed that there was low activity (2.7%) using the ‘CV: cluster visualisation’ feature.
There were three features which also received a high percentage of use, namely the ‘DV: document content’ (81.3%), ‘TV: keyword approach’ (81.3%) and ‘DV: histogram with the timeline’ (80%) features. Further analyses of the interaction logs showed that there was high activity using the ‘DV: document content’ feature, with 77.6 per cent of the participants using it. There was also 3.9 per cent of activity using the ‘TV: keyword approach’ and 7.2 per cent of activity using the ‘DV: histogram with the timeline’.
A possible explanation for this might be that the participants received more information from the ‘DV: document content’ feature to detect the topic, and that the ‘DV: histogram with the timeline’ feature gave an overall view on the distribution of the topics for the specific cluster. The participants could identify how many topics the cluster contained, while the ‘TV: keyword approach’ feature gave good information on the most frequent terms appearing in the cluster, thus making it easier for the participants to relate to the topics.
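The activity percentages drawn from the interaction logs can be understood as each feature's share of all logged interactions. The sketch below illustrates this computation under the assumption that the log is a sequence of (participant, feature) events; the actual iEvent log format is not described in this section, and the events and the function name `activity_share` are hypothetical.

```python
from collections import Counter


def activity_share(events):
    """Return each feature's share (in per cent) of all logged interactions."""
    counts = Counter(feature for _participant, feature in events)
    total = sum(counts.values())
    return {feature: 100.0 * n / total for feature, n in counts.items()}


# HYPOTHETICAL log events (participant id, feature used), for illustration only.
log = [
    ("p01", "DV: document content"),
    ("p01", "DV: document content"),
    ("p01", "CV: top terms"),
    ("p02", "DV: document content"),
    ("p02", "DV: histogram with the timeline"),
    ("p02", "DV: document content"),
    ("p03", "TV: keyword approach"),
    ("p03", "DV: document content"),
]

shares = activity_share(log)
for feature, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{feature}: {pct:.1f} per cent of activity")
```

Note that a feature's share of activity differs from the percentage of participants who used it at least once, which is why a feature can be used by most participants and still account for a small share of the logged interactions.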
6. Discussion
We investigated the effectiveness of iEvent (an iTDT interface) in facilitating journalists in performing TDT tasks. In particular, we set out to determine which features of iEvent facilitate tracking and detection tasks. This is the first evaluation of an interface that contains these features, and we are not evaluating whether iEvent is better than another interface; rather, we examine which features were perceived as being useful (from the interview data) and which features were actually used (from the log data). None of the works reviewed measured the effectiveness of their approach, or of the features applied on their interfaces, from a formal user aspect. Most of them reported on the effectiveness of the technique for system performance using information retrieval and TDT-style evaluation. Therefore, to our knowledge, this is the first evaluation from a formal user aspect. This experiment has shown that, generally, iEvent facilitates the participants in performing well, with a high percentage of successful tracking and detection tasks. Surprisingly, only 3.8 per cent of tasks were unsuccessful in the tracking task, and none in the detection task. The findings reveal that the participants were more familiar with the topics in the tracking task after using iEvent. They were also more interested in the topics in the tracking task after using Set-up 2 of iEvent.
These were the features with the highest ratio that participants perceived as useful, effective, helpful and interesting, as shown in Table 8. The results reveal that generally, CV was useful and interesting, DV was effective and interesting, and TV was helpful.
The ratio of each feature across participants’ opinion in the tracking task
Higher = better; highest value shown in bold.
For CV, a ratio of 11:1 participants agreed that the ‘CV: cluster labelling’ and ‘CV: top terms’ features were interesting. The participants found that the ‘CV: cluster labelling’ feature in Set-up 2 of iEvent was more useful, effective and interesting than Set-up 1. They also found that the ‘CV: top terms’ feature in Set-up 2 of iEvent was more helpful and interesting than Set-up 1. Meanwhile, a ratio of 10:1 participants perceived the ‘CV: cluster visualisation’ feature as useful during the tracking task. Thus this feature received the highest ratio for usefulness, but it was the lowest in the detection task because the participants only dealt with one specific cluster to detect the related topics compared to the tracking task, where participants had to track several related clusters. For the detection task, there was only one feature in CV, the ‘CV: top terms’, which received the highest percentage.
For DV, a ratio of 14:1 participants found that the ‘DV: histogram with the timeline’ feature was interesting, and 11:1 participants agreed that ‘DV: document content’ was effective. Interestingly, these two features also received the highest ratio on the opinions mentioned. The participants found that the ‘DV: document content’ feature in Set-up 2 of iEvent was more interesting than Set-up 1. It also appears that the DV was an important component, since two features in it – ‘DV: histogram with the timeline’ and the ‘DV: document content’ – received a high percentage in the detection task. These indicate that the DV with the features in it did facilitate the participants in performing both tasks. In addition, the ‘DV: document content’ feature in Set-up 2 of iEvent was used more frequently compared to Set-up 1 during the detection task.
For TV, a ratio of 5:1 participants agreed that the ‘TV: keyword approach’ feature was effective. They also found that the ‘TV: keyword approach’ feature in Set-up 2 of iEvent was more helpful and interesting than Set-up 1. Finally, the ‘TV: histogram with the timeline’ feature received the highest ratio, with 12:1 participants agreeing that it was helpful. Meanwhile, in the detection task, there was only one feature in TV that received a high percentage: the ‘TV: keyword approach’. In addition, the ‘TV: keyword approach’ feature in Set-up 1 of iEvent was used more frequently compared to Set-up 2 during the detection task.
As shown in Table 9, the CV with the features in it, such as the ‘CV: cluster labelling’ feature and the ‘CV: top terms’ feature, facilitated the participants in performing both tasks. Meanwhile, the ‘CV: cluster visualisation’ feature facilitated the participants during the tracking task, but not during the detection task, due to the nature of the task itself. The DV with the features in it, such as ‘DV: histogram with the timeline’ and ‘DV: document content’, did facilitate the participants in performing both tasks. The participants found that the ‘TV: keyword approach’ feature was popular in both tasks because they needed to detect the related topics. The ‘TV: keyword approach’ feature allowed them to see the most frequent terms in the specific cluster assigned. Meanwhile, the ‘TV: histogram with the timeline’ feature was popular during the tracking task, which probably had to do with participants’ behaviour in trying to match the pattern of the ‘DV: histogram with the timeline’ feature with that of the ‘TV: histogram with the timeline’ feature. These results indicate that the ‘TV: keyword approach’ feature facilitated the participants in both tasks, while the ‘TV: histogram with the timeline’ feature only facilitated the participants in the tracking task. We believe that these results reflect well on the iEvent interface and could serve as a guideline for the design of iTDT interfaces.
The comparison of each feature in facilitating TDT tasks
7. Conclusion
Overall, these findings reveal that the iEvent interface generally facilitated the journalists in performing well in the TDT tasks. A few features in Set-up 2 of iEvent in particular facilitated the journalists in performing well in the TDT tasks. This indicates that highlighting the named entities with different colours affected the participants’ opinions of iEvent. It would therefore be interesting, in future work on iEvent, to merge Set-up 1 and Set-up 2 in one interface, such that journalists have the option to enable the highlighting of named entities in the features of iEvent. Some comments suggested a revision of the iEvent layout, which is also of interest for future work. A key contribution of this work was the design of a novel iTDT interface. The findings of this work have fundamental implications for the design of iTDT interfaces and their evaluation. The set of guidelines reported in this article is useful for future iTDT interface design and could enhance the effectiveness of users’ performance in TDT tasks. The contributions made in this work will thus benefit the iTDT research community.
