Abstract
Within conversational media, how others perceive a contribution affects how they interact with it. This research explores a novel addition to conversational software, one that provides real-time assessment of quality across user contributions. An analysis of 2,157 online conversations examined attributes of quality, including lexical complexity and prompt-specific vocabulary. These factors helped to inform the redesign of an existing asynchronous online discussion board (AOD). More specifically, a real-time quality analyzer was constructed, which provides users with a visual breakdown of their post in relation to the overall group discussion thread. An experiment across two populations found that the system increased overall levels of quality in conversations, while also increasing interactions with higher quality contributions across the system. The results were supplemented with survey data and a social network analysis (SNA), which revealed higher levels of system satisfaction and group cohesion.
Introduction
Members of a community of practice (CoP) work together towards common goals, collaborate on common problems, share best practices, support one another and share a common identity [25]. Online social networks (OSN) can be effective at replicating face-to-face CoPs [37] and provide common forums for information exchange, social support and social interaction [24]. In addition to the time and space benefits afforded by asynchronous technologies, OSN software provides affordances that face-to-face CoPs cannot, such as promoting higher levels of reciprocity and giving all participants the opportunity to have their voices heard [43]. OSN software can also facilitate trust within a CoP, giving users more opportunities to decide how and with whom they are comfortable interacting [40].
OSN software offers numerous modes of interaction, including instant messaging, discussion forums, blogs and collaborative writing, each of which can be one-to-one or one-to-many in nature and promote reciprocity among users. This research focuses on asynchronous online discussion board (AOD) software, which, as detailed by Pituch and Lee [31], explicitly binds users’ participation within a CoP and promotes cognitive, on-topic, on-task and sustained discourse among participants.
A significant challenge within AODs, and academic AODs in particular, is quality. As explored in [28], the quality of an AOD is determined not just by rates of participation among users but also by the content and context of those interactions. Moreover, since users tend to mirror the reading levels of their peers [6], elevating informal conversation by guiding participants away from short, reactive comments toward more sustained, substantive comments is an important research pursuit.
Advances in learning analytics and educational data mining, as defined in [11], offer new approaches to addressing quality issues within a CoP. However, existing systems address quality only after content is submitted, such as the automatic essay grading systems first introduced in [2,21] and extended in [44]; no systems exist that aim to provide real-time visual quality assessment as presented in this research. This research extends work in [38] and adopts a design science research methodology to design, construct, implement and test a real-time quality visualization component for an AOD. More specifically, this research provides users with a dynamic assessment of their response quality. Results found that the software improved overall levels of conversational quality, increased levels of participation across online conversations and produced a higher quality CoP.
Background and related work
User engagement
Within asynchronous learning environments, knowledge construction and comprehension occur best when participants are fully engaged with one another [26]. Yet engagement in a CoP can be influenced by many factors, including a member’s perception of the overall level of engagement among their peers [34,42]. Eryilmaz et al. [12] investigated additional limiting factors, including time, effort and attention, that restrict collaboration activities among participants. Even when participants are fully engaged, further engagement can be influenced by a member’s own contributions and how they are perceived within the group [4]. In each case, reciprocity and the added value of member contributions can play important roles. Furthermore, as suggested in [5], the ability to monitor and quantify individual contributions can play an integral role in promoting user engagement within the CoP.
Captology
A distinct purpose of this research is to modify user behavior in order to enhance the quality of online conversations. A subfield of human-computer interaction, captology studies the influence of technological artifacts on human behavior. More specifically, captology involves thinking clearly about target behaviors and how technology can be incorporated to achieve them. Captology has been used extensively as a framework for modifying behavior within online CoPs. As presented in [15], technology can apply social dynamics to convey social presence and to persuade users. In [13], researchers introduced multimedia blogging and podcasts to influence user interaction. In [1], captology guided software construction to enhance activity awareness and increase social support within learning groups. In [39], more closely aligned with AOD design, researchers introduced a real-time sentiment analyzer to encourage more positive interactions among CoP members. Furthermore, as shown in [29], visualization systems, specifically within higher education, promote competition, social comparison and social learning. In this research, conversational media is enhanced across multiple dimensions to evaluate contributions in real time and represent these evaluations through visual cues. This research falls under captology because these changes seek to modify behavior within a CoP to support higher quality user contributions and increase interactions with those contributions.
Quality factors
Automated assessment of text remains a relevant computing problem [8]. To better understand aspects of quality within existing CoPs, a historical analysis of 2,157 online conversations was performed and focused on lexical complexity and prompt-specific vocabulary usage. These features have been investigated across automatic essay grading systems [2,21,44] and comment complexity within OSNs [22]. These past studies yield important insights for this research, which aims to provide dynamic visualization of user contributions.
Lexical complexity
Lexical complexity refers to the overall complexity of a text or, simply stated, how difficult the text is to read. Numerous methods can estimate lexical complexity, but no absolute formula determines the difficulty of a text, and each method has its own limitations. The proposed method for calculating lexical complexity was derived from a review of existing formulas. The first method considered was the Automated Readability Index (ARI), which calculates complexity from word difficulty (letters per word) and sentence difficulty (words per sentence) [36]. This technique is similar to the Coleman-Liau Index, which also considers letters per word but pairs it with sentences per word, both measured per 100 words. The rationale, according to Coleman [7], was that there is no need to estimate syllables, since word length in letters is a better predictor of readability than word length in syllables.
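To make the two letter-based formulas concrete, the sketch below computes ARI and the Coleman-Liau Index from raw text. It is a minimal illustration, not the Readability Metrics API the system relied on. The regex-based word and sentence splitting in the hypothetical helper `text_counts` is a simplifying assumption, and real readability tools tokenize more carefully.

```python
import re


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (letters, words, sentences) using simple regex heuristics."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(1 for ch in text if ch.isalpha())
    # Treat each run of ., ! or ? as one sentence terminator; assume at least one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return letters, len(words), sentences


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * (letters/words) + 0.5 * (words/sentences) - 21.43."""
    letters, words, sentences = text_counts(text)
    return 4.71 * (letters / words) + 0.5 * (words / sentences) - 21.43


def coleman_liau_index(text: str) -> float:
    """CLI = 0.0588 * L - 0.296 * S - 15.8, with L and S measured per 100 words."""
    letters, words, sentences = text_counts(text)
    L = letters / words * 100    # average letters per 100 words
    S = sentences / words * 100  # average sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8
```

Both formulas yield a U.S. grade level, so very short, simple texts can produce negative scores. This is one reason a normalization step is needed before scores are combined.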
While readability and comprehension are important, higher levels of education bring higher levels of discourse (i.e. more elaborate sentence transitions and more complex word structures). In short, difficult sentences are not necessarily detrimental to the flow of a conversation and can signal deeper conversations and knowledge integration. For this reason, other readability metrics were also considered. The Flesch-Kincaid Readability Test was integrated because it considers syllables per word in addition to words per sentence: the more syllables a word has, the more complex that word is taken to be [14]. The Gunning Fog Index [19] uses a similar technique, considering word structure and syllables.
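The syllable-based metrics can be sketched the same way. This sketch assumes the Flesch-Kincaid grade-level variant rather than Reading Ease. It relies on a rough vowel-group syllable counter (`count_syllables`, a hypothetical helper) and a simplified Gunning Fog that treats every word of three or more syllables as complex, without the usual exceptions for proper nouns and common suffixes.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, then drop one for a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "make" -> 1, but "-le" endings such as "table" keep their final syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(1, count)


def _words_and_sentences(text: str) -> tuple[list[str], int]:
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return words, sentences


def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    words, sentences = _words_and_sentences(text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59


def gunning_fog(text: str) -> float:
    """Fog = 0.4 * (words/sentences + 100 * complex_words/words)."""
    words, sentences = _words_and_sentences(text)
    # Simplification: every word with three or more syllables counts as complex.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * ((len(words) / sentences) + 100 * complex_words / len(words))
```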
While exploring application programming interfaces for sentence structure and word complexity, the Readability Metrics API [23] was identified. It provides reading scores across numerous methods, including those described above. For the initial prototype, a hybrid model was adopted that produced an average readability score based on ARI, Coleman-Liau, Flesch-Kincaid and Gunning Fog. Linear mapping was then used to normalize these readability metrics to a 0- to 8-point scale.
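The hybrid score can be illustrated as a plain average followed by a linear map. The section does not state the bounds used in the prototype, so the grade-level bounds for the mapping (`lo=0`, `hi=16`) and the clamping of out-of-range averages below are hypothetical choices.

```python
def hybrid_readability(scores: dict[str, float], lo: float = 0.0, hi: float = 16.0) -> float:
    """Average several grade-level scores, then map linearly onto a 0-8 scale.

    lo/hi are hypothetical grade bounds; averages outside them are clamped.
    """
    if not scores:
        raise ValueError("need at least one readability score")
    mean = sum(scores.values()) / len(scores)
    clamped = min(max(mean, lo), hi)
    return (clamped - lo) / (hi - lo) * 8.0
```

For example, scores of 10, 12, 11 and 15 from the four formulas average to grade 12, which maps to 6.0 on the 0-8 scale under these bounds.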
Figure 1 illustrates trends across origination posts and response posts with respect to lexical complexity. Posts with higher readability received more responses across both origination and response posts, and readability scores for origination and response posts trended similarly. The working assumption is that the easier a post is for a user to digest, the more likely it is to receive a response.

Lexical complexity (LC) vs. response count.
Prompt-specific vocabulary
Educational dialogue functions best when participants remain focused on central topics. Within technical disciplines, where technical expressions are continuously evolving, awareness and integration of key terms play an important role in dialogue. Furthermore, integrating these terms into conversational dialogue indicates, to some extent, participants’ applied comprehension of them.
Prompt-specific vocabulary refers to words and phrases related to a given topic. Online conversations largely center on a specific topic, and when responses diverge from the central premise, threads can start to lose meaning. Therefore, another aspect of a quality conversation is how well it stays on topic. A limitation of this approach, of course, is that it may restrict users’ creativity and pigeonhole contributions into focusing only on information related to the discussion at hand. However, this limitation should be weighed against the overall objectives of a conversation. Additionally, prompt-specific vocabulary should be constructed so as to afford users a broad range of vocabulary, including tangentially related topics that a user can draw from and elaborate on.
Using the initial seed set of 2,157 conversations, prompt-specific vocabulary was analyzed using subject matter keywords, which can help contextualize conversations and anchor them to the discussion topic. Subject matter keywords were identified through a combination of automated extraction from relevant texts, using a keyword generator [17], and expert-driven identification. Experts in the field reviewed all generated keywords for their relevance to CoP discussion topics. This approach, while rudimentary, offers a good starting point for identifying topic focus. It could be further enhanced with activity-based topic discovery, which links terms based on topic areas, as proposed in [45].

Prompt-specific vocabulary (PSV) vs. response count.
Figure 2 illustrates trends across origination posts and response posts with respect to prompt-specific vocabulary. Keyword density was calculated as the number of keywords found in a post divided by the post’s total word count excluding stopwords. This analysis yielded several findings. First, mean keyword density was low, averaging below 10%. Second, posts with the largest number of keywords received the most response posts. Keyword density also trended downward across response posts, suggesting that users tended to gravitate toward more familiar text when responding within a thread. A significant goal of this research is to reverse this trend and encourage higher-density response posts with more targeted vocabulary.
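The keyword density calculation described above can be sketched as follows. The stopword list is an abbreviated, hypothetical one. Keywords are assumed to be single lowercase words, and multi-word phrases would need phrase matching.

```python
import re

# Hypothetical, abbreviated stopword list; a deployment would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "that", "this", "for", "on", "with"}


def keyword_density(post: str, keywords: set[str]) -> float:
    """Keywords found in a post divided by the post's non-stopword word count."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", post)]
    content = [t for t in tokens if t not in STOPWORDS]
    if not content:
        return 0.0
    hits = sum(1 for t in content if t in keywords)
    return hits / len(content)
```

In "The network uses a protocol for routing data.", five words remain after stopword removal. Two of them (protocol, routing) are keywords, which gives a density of 0.4.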
Represented in Fig. 3 is the initial formula for a quasi-quality index (QQI), which considers the sum of average lexical complexity and weighted scores for prompt-specific vocabulary usage.

Two-factor quasi-quality index.
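The Fig. 3 formula is not reproduced in this text, so the sketch below only mirrors its verbal description: average lexical complexity plus a weighted prompt-specific vocabulary score, rescaled to 100. The weight `w_psv`, the 10% saturation point `target_density` and the mapping of density onto the 0-8 range are hypothetical parameters, not the values used in the study.

```python
def quasi_quality_index(lc: float, density: float,
                        w_psv: float = 1.0, target_density: float = 0.10) -> float:
    """Two-factor QQI sketch on a 0-100 scale (hypothetical weights).

    lc:      hybrid lexical complexity score on the 0-8 scale.
    density: keyword density in [0, 1].
    """
    if not 0.0 <= lc <= 8.0:
        raise ValueError("lc must lie on the 0-8 scale")
    # Map keyword density onto the same 0-8 range, saturating at target_density.
    psv = min(density / target_density, 1.0) * 8.0
    raw = lc + w_psv * psv
    return 100.0 * raw / (8.0 + w_psv * 8.0)  # divide by the maximum raw score
```

Under these placeholder weights, a post with a lexical complexity of 6.0 and 5% keyword density scores 62.5.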

QQI vs. response count.
The algorithm was evaluated against the same initial set of 2,157 origination and response posts. As detailed in Fig. 4, QQI scores are represented out of 100 for easier assessment. The mean QQI score for origination posts was 88, compared to 72 for response posts. Lower QQI scores for response posts were expected because responses are not required to be as substantive as origination posts. Additionally, response posts tend to involve considerable back-and-forth dialogue, which can be off-topic or tangential to the high-level discussion topic. As discussed further in Section 2.3.4, this finding led to modifications of the initial algorithm to encourage more on-topic response posts.
Additional considerations were incorporated into the final QQI formula. More specifically, a moderator feature was included that allows conversation moderators to set appropriate lengths for origination posts and response posts. This modification was important because, as noted in [30], guidelines for quality can vary from discussion to discussion and are often not explicitly stated. This addition also helped to prevent gaming of the system with short statements containing high ratios of prompt-specific vocabulary.
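One way to express the moderator-defined length requirement is a multiplicative penalty on short posts. The proportional scaling and the function name `apply_length_requirement` are assumptions for illustration. The section states only that moderators set appropriate lengths and that this discouraged short, keyword-dense posts.

```python
def apply_length_requirement(qqi: float, word_count: int, min_words: int) -> float:
    """Scale a QQI score by the fraction of a moderator-set minimum length a post meets.

    Proportional scaling is a hypothetical choice: it stops short posts that are
    packed with prompt-specific vocabulary from reaching high scores.
    """
    if min_words <= 0:
        return qqi  # no requirement configured for this discussion
    return qqi * min(word_count / min_words, 1.0)
```

With a hypothetical 50-word minimum for response posts, a 20-word reply scoring 90 would be reduced to 36. Any reply of 50 words or more would keep its full score.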
System design
Software designers are presented with unique opportunities for influencing behavior. While controlling interaction within an OSN remains a challenge, as noted in [32], designers can construct software artifacts that aim to modify human behavior. Design science research considers this very notion, in which researchers devise technical artifacts to bring about change [35]. In this research, a dynamic quality analyzer is proposed and integrated within an existing AOD. The system is innovative because it provides a real-time, visual analysis of user response quality. It is also purposeful because it extends users’ interactions within AODs and aims to increase overall CoP quality.
Online social networking platform
The AOD constructed in this research is built atop the Elgg open source social networking engine, which provides a robust framework on which to build an OSN [10]. The software comes bundled with functionality for threaded discussions, blogging, file-sharing and peer-to-peer (P2P) networking capabilities, including friending and messaging. Figure 5 illustrates the existing AOD design, which uses the Threads v2.0 plugin and is integrated into the larger Elgg OSN platform. The AOD interface mirrors traditional threaded discussion boards and provides a basic, clean user interface. This AOD is also the primary artifact to be redesigned.

AOD: control software design.
The AOD was redesigned to include a real-time data visualizer, and minor improvements were made to the algorithm detailed in Section 2.3.3. The first improvement expanded the keyword dictionary to incorporate a wider range of appropriate vocabulary. These modifications involved natural language processing (NLP) techniques for sentence parsing, more precise word tokenization, and word stemming and lemmatization. While the system could still be improved through better NLP, the preliminary system focused specifically on capturing terms common across the presented learning materials.
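Stemming lets inflected forms such as "networks" and "networking" match a dictionary entry like "network". The suffix stripper below is a deliberately crude stand-in, not the Porter stemmer or a lemmatizer, and its suffix list is a hypothetical choice. It only illustrates why normalizing tokens widens keyword matching.

```python
import re

# Deliberately crude, hypothetical suffix list; a real system would use a
# proper stemmer (e.g. Porter) or a lemmatizer instead.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "ed", "s")


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    token = token.lower()
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            return stem + "y" if suffix == "ies" else stem  # "policies" -> "policy"
    return token


def count_keyword_hits(post: str, keywords: set[str]) -> int:
    """Count tokens whose crude stem matches the stem of any dictionary keyword."""
    stems = {crude_stem(k) for k in keywords}
    tokens = re.findall(r"[A-Za-z']+", post)
    return sum(1 for t in tokens if crude_stem(t) in stems)
```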
The design analyzes users’ posts against aggregate contributions made by the group. The system can be decomposed into data, business and presentation layers. The presentation layer focuses on the user interface. The business layer represents the business rules implemented as programming logic. The data layer comprises the underlying data model and how user and application data are stored. To conserve space, the data layer is folded into the presentation and business layer descriptions.
Presentation layer
Within the presentation layer of the AOD, three distinct views are highlighted.
Response Level: As illustrated in Fig. 6, users are presented with a standard input box for responding to discussion posts. Before submitting a new entry, a user is required to analyze the contribution text. This feature produces a visualized QQI score, which is calculated against the aggregate QQI scores of all contributions. At this point, a user can choose to revise and reanalyze the post or continue with the submission. Responses to origination posts and responses to peers may differ considerably. For this reason, top-level responses to origination posts are compared against other top-level discussion responses, while threaded discussion posts are compared against other peer-to-peer responses.
AOD: treated design 1 – thread-level.
User Level: To the right of the AOD (not shown, to conserve space), individuals are shown a list of trending keywords, or terms relevant to the overall discussion topic. This feature allows users to view and integrate relevant topic keywords.
AOD: treated design 2 – group level.
Discussion Level: As illustrated in Fig. 7, each discussion board is assigned an overall QQI score, which visualizes the overall quality level of that discussion. This calculation incorporates the algorithm outlined in Section 2.3.3 and adds an extra value for discussion requirements, which can be customized to the unique needs of the CoP.
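The comparison logic behind the response-level and discussion-level views can be sketched as below. The `Post` record, the use of a simple mean as the group aggregate and the 80/20 blend between mean post QQI and a 0-100 requirements score (`w_req`) are all hypothetical. The section specifies only which posts are compared with which, and that a requirements value is added at the discussion level.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Post:
    qqi: float
    is_top_level: bool  # True: response to the origination post; False: threaded peer reply


def comparison_pool(posts: list[Post], draft_is_top_level: bool) -> list[float]:
    """Top-level drafts are compared with top-level posts; threaded replies with threaded replies."""
    return [p.qqi for p in posts if p.is_top_level == draft_is_top_level]


def relative_to_group(draft_qqi: float, posts: list[Post], draft_is_top_level: bool) -> float:
    """Difference between a draft's QQI and the mean of its comparison pool (0 if the pool is empty)."""
    pool = comparison_pool(posts, draft_is_top_level)
    return draft_qqi - mean(pool) if pool else 0.0


def discussion_qqi(posts: list[Post], requirement_score: float, w_req: float = 0.2) -> float:
    """Hypothetical discussion-level score: mean post QQI blended with a 0-100 requirements score."""
    base = mean(p.qqi for p in posts) if posts else 0.0
    return (1 - w_req) * base + w_req * requirement_score
```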


Business layer
The business layer comprises the underlying algorithm and the computer logic that implements the proposed design. Fig. 8 illustrates the system architecture, which uses Mashape Cloud (now RapidAPI) to calculate lexical complexity scores. Prompt-specific vocabulary usage scores and discussion requirements scores are processed locally. At the data layer, the system stores all processing data in a local MySQL database, including 14 readability indices, keyword density and discussion requirements. This speeds up processing when group QQI scores are compared and aggregated against individual QQI scores.

AOD: web services architecture.
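The caching role of the local database can be illustrated with Python’s built-in `sqlite3` module standing in for MySQL. The table name, columns and helper functions are hypothetical. The point is that storing per-post scores once lets group means be computed with a single aggregate query, instead of calling the remote readability service again.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the MySQL store; the schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE post_scores (
            post_id            INTEGER PRIMARY KEY,
            discussion_id      INTEGER NOT NULL,
            is_top_level       INTEGER NOT NULL,  -- 1 = response to origination post
            lexical_complexity REAL NOT NULL,     -- cached hybrid LC score (0-8)
            keyword_density    REAL NOT NULL,
            qqi                REAL NOT NULL
        )
        """
    )
    return conn


def save_score(conn: sqlite3.Connection, post_id: int, discussion_id: int,
               is_top_level: bool, lc: float, density: float, qqi: float) -> None:
    """Insert or overwrite the cached scores for one post."""
    conn.execute(
        "INSERT OR REPLACE INTO post_scores VALUES (?, ?, ?, ?, ?, ?)",
        (post_id, discussion_id, int(is_top_level), lc, density, qqi),
    )


def group_mean_qqi(conn: sqlite3.Connection, discussion_id: int, is_top_level: bool):
    """Mean QQI of the comparison pool; None when no matching posts exist."""
    (avg,) = conn.execute(
        "SELECT AVG(qqi) FROM post_scores WHERE discussion_id = ? AND is_top_level = ?",
        (discussion_id, int(is_top_level)),
    ).fetchone()
    return avg
```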
Design science research requires that a design be rigorously tested. To evaluate the proposed design, a field study was performed on a targeted population of users. Data from both a control group (Group 1) and a treatment group (Group 2) were collected and analyzed. Each group consisted of upper-division undergraduate college students taking a cross-disciplinary general education course required for graduation. Group 1 was 47% female and 53% male, with ages ranging from 20 to 23. Group 2 was 46% female and 54% male, with ages ranging from 20 to 23. Discussion topics and requirements for both groups were identical: a single top-level response post and three additional response posts for each weekly discussion topic. The only difference between the groups was that Group 1 used a more traditional AOD, first evaluated in [41], while Group 2 received the redesigned AOD. While population sizes differed between the groups, general comparisons and system performance evaluations can still be made.
Results
Design validation
In total, 1,209 new posts were generated across the two groups, one treated and one control. A content analysis examined general system effects across both populations and was followed by a social network analysis (SNA). Additionally, perceived levels of general system satisfaction were collected, including constructs related to interaction and user engagement.
Quasi-Quality Index (QQI) scores
System usage
Group 1 consisted of 19 users. Page visits across the entire CoP totaled 18,621, or 980 pages per user. Users created 563 posts in total, or approximately 30 posts per user. Group 2 consisted of 23 users. Page visits across the entire CoP totaled 22,757, or 989 pages per user. Users created 646 posts in total, or approximately 28 posts per user.
Overall quality results
As detailed in Table 1, QQI scores were generated across both groups for all discussion responses. Overall, the mean QQI score for Group 1 was 73, compared to 86 for Group 2. Of the 563 posts generated by Group 1, 397 received no responses, with a mean QQI score of 72. Of the 646 posts generated by Group 2, 481 received no responses, with a mean QQI score of 77. For posts receiving at least one response, the mean QQI score was 77 across 166 posts for Group 1, compared to 86 across 165 posts for Group 2. For posts receiving three or more responses, Group 1 had a mean QQI score of 87 across 55 posts and Group 2 had a mean QQI score of 89 across 62 posts.
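The per-user rates in the System usage subsection follow directly from the reported counts. So does the share of posts without responses, which the discussion cites again as 71% and 74%. This small check derives both.

```python
# Counts as reported in the System usage and Overall quality results subsections.
groups = {
    "Group 1": {"users": 19, "page_visits": 18_621, "posts": 563, "no_response": 397},
    "Group 2": {"users": 23, "page_visits": 22_757, "posts": 646, "no_response": 481},
}


def summarize(g: dict) -> dict:
    """Derive per-user rates and response shares from the raw counts."""
    return {
        "visits_per_user": g["page_visits"] / g["users"],
        "posts_per_user": g["posts"] / g["users"],
        "with_response": g["posts"] - g["no_response"],
        "share_no_response": g["no_response"] / g["posts"],
    }


for name, g in groups.items():
    s = summarize(g)
    print(f"{name}: {s['visits_per_user']:.0f} visits/user, "
          f"{s['posts_per_user']:.1f} posts/user, "
          f"{s['with_response']} posts with responses, "
          f"{s['share_no_response']:.1%} without responses")
```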
QQI score vs. response count
As detailed in Table 2, QQI scores across G1 and G2 were tested for significance using the Mann–Whitney U test, a simple nonparametric test for comparing two independent samples that is suitable for small sample sizes. A Mann–Whitney U test can be used when the data are ordinal or when the assumptions of the t-test are not fully met [27]. QQI scores across G1 and G2 were compared under the null hypothesis that the proposed system would have no effect on the distribution of scores. The Mann–Whitney U-value was 81, with a Z-score of 3.46207 and a p-value of .00027, resulting in significance at
Mann-Whitney U test for significance
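The reported statistics can be checked with the normal approximation to the Mann–Whitney U distribution. Assume the test compared per-user scores (19 users in G1, 23 in G2) and applied a continuity correction without a tie correction. Under those assumptions, U = 81 gives the reported Z of about 3.462, and the reported p-value matches a one-tailed test. These are inferences from the numbers, not details stated in the text. The rank-based `mann_whitney_u` helper shows how U would be obtained from raw scores.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic: the smaller of U1 and U2."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return min(u1, n1 * n2 - u1)


def normal_approx(u, n1, n2, continuity=True):
    """z and one-tailed p for U under the normal approximation (no tie correction)."""
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (abs(u - mu) - (0.5 if continuity else 0.0)) / sigma
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_tailed


z, p = normal_approx(81, 19, 23)
print(f"z = {z:.5f}, one-tailed p = {p:.5f}")
```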
In addition to overall QQI scores, scores for lexical complexity and prompt-specific vocabulary were captured for origination and response posts. As detailed in Table 3, G2 yielded higher overall QQI scores across both origination and response posts. Lexical complexity scores were similar across both groups, although G2 maintained higher scores overall and across response posts. For prompt-specific vocabulary, G2 scores were higher for posts receiving at least one response but lower for posts receiving no responses.
Lexical Complexity (LC), Prompt-Specific Vocabulary (PSV) and Quasi-Quality Index (QQI)
0 = Posts received no responses, ≥1 = Posts received one or more responses
SNA background
Social network analysis helps to identify interactions within a network. More specifically, SNA provides a visualized analysis of an existing social structure and allows for a better understanding of interaction patterns across online environments [16]. This ability to view the social graph structure and community evolution can be a crucial measure of a software design and can serve as an early indicator of the design’s success [3]. Beyond a visual representation of the social structure, an SNA provides quantitative metrics that identify central players and the relationships among them.
SNA measurement
SNA graphs were constructed using the NodeXL Basic Template for Microsoft Excel, version 1.0.1.380. NodeXL is open source and provides a range of basic network analysis and visualization features [20]. The Fruchterman-Reingold algorithm was used to generate a force-directed layout, which positions users (i.e. nodes) so that edges are of roughly equal length and as few edges as possible cross. Each arrow represents a weighted interaction, where larger arrows indicate a greater number of interactions between individuals. Bi-directional arrows occur when participants interact with each other, as measured by indegree and outdegree values. Higher indegree and outdegree values indicate that participants interact more frequently with one another.
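Indegree and outdegree as used here can be computed from a list of reply events. The `(author, replied_to)` input format and the exclusion of self-replies are assumptions. The sketch also flags reciprocal pairs, which correspond to the bi-directional arrows in the sociograms.

```python
from collections import Counter


def degree_summary(replies):
    """Build weighted directed edges from (author, replied_to) reply events.

    Returns (edges, indegree, outdegree): indegree counts responses received and
    outdegree counts responses made. Self-replies are skipped (an assumption).
    """
    edges = Counter((src, dst) for src, dst in replies if src != dst)
    indegree, outdegree = Counter(), Counter()
    for (src, dst), weight in edges.items():
        outdegree[src] += weight
        indegree[dst] += weight
    return edges, indegree, outdegree


def reciprocal_pairs(edges):
    """Unordered user pairs that replied to each other (bi-directional arrows)."""
    return {frozenset((src, dst)) for (src, dst) in edges if (dst, src) in edges}


def mean_degrees(indegree, outdegree):
    """Mean indegree and outdegree over all users appearing in either counter."""
    users = set(indegree) | set(outdegree)
    return sum(indegree.values()) / len(users), sum(outdegree.values()) / len(users)
```

Total indegree always equals total outdegree, because every response is both made and received. Over the same set of users, mean indegree and mean outdegree therefore coincide, which is why a single mean indegree/outdegree value can be reported.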
SNA sociograms
SNA graphs were generated for both groups. As illustrated in Fig. 9 (Group 1) and Fig. 10 (Group 2), each node represents a single user, and individuals are depicted by their placement within the graph. Each node is labeled with the user’s mean QQI score, followed in parentheses by the user’s average scores for lexical complexity and prompt-specific vocabulary, respectively. Group 1 consists of 19 nodes and Group 2 of 23 nodes.

Group 1 SNA. Nodes are labeled with their QQI score and scores for lexical complexity and prompt-specific vocabulary in parentheses.

Group 2 SNA. Nodes are labeled with their QQI score and scores for lexical complexity and prompt-specific vocabulary in parentheses.
Key metrics, elaborated on in the discussion, were captured for indegree (the average number of responses received) and outdegree (the average number of responses made). Mean indegree/outdegree was 9.7 (
System feedback
User perceptions of both designs were captured through surveys, which provided a rudimentary baseline for comparing interventions. Survey instruments used a five-point numeric scale ranging from Strongly Agree to Strongly Disagree. Feedback was obtained from 43 participants.
AOD design
A set of survey items examined the perceived usefulness and ease of use of the QQI and was administered only to Group 2, which received the new design. In total, 23 users from this group were surveyed. The instrument was tested for internal reliability, yielding a Cronbach’s alpha of 0.79. This score is above the generally accepted threshold of 0.70, indicating acceptable internal consistency for further analysis.
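Cronbach’s alpha compares the sum of item variances with the variance of respondents’ total scores. The sketch assumes complete responses (no missing items) arranged one row per respondent. It is illustrative and not the statistical package used in the study.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).

    responses: one row per respondent, one column per survey item (e.g. 1-5 Likert).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose rows into per-item columns
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)
```

When every item moves in lockstep across respondents, the total-score variance dominates and alpha reaches 1.0. Items that vary independently pull alpha toward zero.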
As detailed in Table 4 and elaborated on in the discussion, 87% of users agreed that the AOD was easy to use and 79% felt comfortable using it. Further, 80% of users agreed that the QQI score was useful, and 63% indicated that individual and group QQI scores influenced their responses. Finally, 75% felt that the QQI score helped improve their response quality.
Perceptions on system design
Group 1 perceptions of interaction and community
Group 2 perceptions of interaction and community
A second set of questions focused on individual perceptions of interaction and community. Pre-validated instruments were tested for internal consistency across this construct, yielding Cronbach’s alpha scores of 0.86 for the pretest instrument and 0.84 for the posttest instrument, which indicates that these instruments also had adequate internal consistency.
Responses to these items are detailed in Table 5 and Table 6, respectively, and elaborated on in the discussion. Group 1 perceptions of interaction, information exchange and community shifted from pretest (90%, 95% and 90%) to posttest (82%, 88% and 70%). Group 2 perceptions of interaction, information exchange and community shifted from pretest (92%, 92% and 100%) to posttest (95%, 90% and 90%).
Discussion
This research explores the effect of a real-time visualization system across attributes of conversational quality, including individual response quality, peer-to-peer interactions and overall community satisfaction.
Conversational quality
The Technology Acceptance Model suggests that the success of a system can be determined by how useful users perceive the system to be and how easy it is to use [9]. The quality visualizer was designed to give users a dynamic yet simple method for assessing the quality of a contribution before submitting it to the discussion thread. In measuring response quality, multiple factors are considered. The first factor is users’ perceptions of the proposed system. Posttest survey results found that 71% of participants understood the purpose of the QQI and 75% indicated that the QQI helped to improve their individual response scores, suggesting that the system provided perceived utility. Users also found the system easy to use. Even with the two-step process of analyzing a post’s QQI score before submission, results do not reflect negative perceptions of usability. Of survey respondents, 87% agreed that the system was easy to use and 79% were comfortable using it.
Perceived levels of satisfaction are important but capture only users’ attitudes toward a system, which can be complicated by numerous factors including, but not limited to, individual rater bias. Therefore, secondary factors are considered, which aim to establish a more objective relationship between the proposed system and user performance. For these, QQI scores were compared between groups. The overall mean QQI score for Group 2 was 86, considerably higher than Group 1’s mean of 73. A Mann–Whitney U test also showed a significant difference when comparing individual quality levels. This improvement can be attributed to a number of distinct design features. For example, a post received a lower quality score if it suffered from poor punctuation or incomplete sentences, or if it failed to use key vocabulary terms related to the discussion topic. Conversely, users received higher scores for longer contributions that were highly readable and integrated key terms related to the discussion topic. As suggested by perceived levels of usefulness, users in Group 2 tended to be more mindful of their contributions. As a result, the system gave individuals the ability to assess the quality of their contributions and helped keep conversations topic-focused and more academic in nature. Future work aims to improve the QQI algorithm through more sophisticated NLP. As a rudimentary measure of quality, however, the incorporation of QQI scores into the AOD design can be considered successful when compared against the control software.
Peer-to-peer quality
As collaborative systems, AODs support interaction between participants. In this research, peer-to-peer interactions are compared across both groups using multiple factors. The first factor looks at aggregate QQI scores. A content analysis found that posts with lower QQI scores received fewer responses in both groups, with a larger share of unanswered posts in the treated group. For Group 2, 74% of posts, with a mean QQI score of 77, received no responses. For Group 1, 71% of posts, with a mean QQI score of 72, received no responses. Simply stated, lower quality posts received fewer responses. This is seen as a positive finding, since the goal of the system was to support interaction around higher quality responses. These findings support the notion that individuals are not only producing higher quality posts but also gravitating toward posts with higher QQI scores. Consequently, in addition to facilitating higher quality posts overall, QQI scores helped guide users away from lower scoring posts, thus increasing interactions with more substantive, higher quality posts. A simple explanation is that users targeted discussion posts with more substance in order to formulate higher quality responses of their own. This finding is particularly interesting because QQI scores were intentionally hidden at the response level, so as not to broadcast ‘high’ versus ‘low’ quality posts and contribute to the fear factors identified in [9,20]. Instead, QQI scores were shown only at the aggregate group level and at the individual submission level.
A secondary factor considers key SNA metrics. Overall, while the mean number of posts was similar (
CoP quality
After measuring the system’s influence on response quality and on the quality of P2P interactions, the overall quality of the CoP was also assessed. To measure this construct, key SNA metrics were analyzed, including network density and median betweenness centrality.
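Both network metrics can be computed without external libraries. The sketch treats the sociogram as an unweighted directed graph and uses Brandes’ algorithm for betweenness centrality. Whether NodeXL’s reported values use the same directed-density definition and normalization is an assumption to verify, so absolute numbers may differ.

```python
from collections import deque
from statistics import median


def directed_density(nodes, edges):
    """Distinct directed ties divided by the n * (n - 1) possible ties."""
    n = len(nodes)
    return len(set(edges)) / (n * (n - 1)) if n > 1 else 0.0


def betweenness(nodes, edges):
    """Unnormalized betweenness centrality via Brandes' algorithm (unweighted, directed)."""
    adj = {v: [] for v in nodes}
    for src, dst in set(edges):  # repeated interactions collapse to one tie
        adj[src].append(dst)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        stack, preds = [], {v: [] for v in nodes}
        sigma, dist = dict.fromkeys(nodes, 0), dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def median_betweenness(nodes, edges):
    return median(betweenness(nodes, edges).values())
```

Under the directed-density definition, a density of 0.53 means roughly 181 of 342 possible ties among 19 users, but roughly 268 of 506 among 23 users. A stable density in a larger group therefore implies more connections.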
The strength of a social network can be measured by evaluating the density of its connections. Density is important because it measures the strength of ties among all participating members. On its own, the number (e.g. 0.53 for Group 2) provides little meaning. When compared against a benchmark, however, it can provide insight into the strength of the overall network. A comparison of SNA metrics showed that Group 1 and Group 2 had the same network density of 0.53. However, social networking theory asserts that as a network increases in size (by 21% in the case of Group 2), density should decrease. In an academic setting this often holds true: as a classroom population grows, with more students enrolling and participating, it becomes less feasible for every student to connect with every other student. Yet density in the larger treated group did not decrease. This can be partly attributed to the enhanced AOD design, which gave students a visible metric for identifying quality response posts and, thus, more options to explore. This allowed users to cast a wider net when responding to their classmates, since QQI scores functioned as a support structure for producing higher quality posts.
Another factor considered is median betweenness centrality, which reflects an individual's placement within a social network, specifically how often that individual lies on the shortest paths between other members. Social network theory holds that nodes with higher betweenness centrality exert more control and influence over a network. SNA metrics identified Group 2 as having a higher distribution of centrality across the network (Mdn 5.6,
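Betweenness centrality credits each member with the fraction of shortest paths between every other pair of members that pass through that member. The sketch below implements Brandes' algorithm for an unweighted, undirected network and takes the median across members. Treating reply ties as undirected is an assumption, as is the use of unnormalized scores, although a normalized score (bounded by 1) could not reach the reported median of 5.6.

```python
from collections import deque
from statistics import median


def betweenness(adj: dict[str, set[str]]) -> dict[str, float]:
    """Unnormalized betweenness centrality via Brandes' algorithm.

    Assumes an unweighted, undirected graph given as a symmetric
    adjacency mapping (every neighbour is also a key).
    """
    scores = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search from source, counting shortest paths.
        order = []                    # nodes in non-decreasing distance
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        sigma = {v: 0 for v in adj}   # number of shortest paths from source
        dist = {v: -1 for v in adj}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # In an undirected graph each pair is counted once from each endpoint.
    return {v: s / 2 for v, s in scores.items()}


def median_betweenness(adj: dict[str, set[str]]) -> float:
    """Median of the unnormalized betweenness scores across all members."""
    return median(betweenness(adj).values())


# Toy network: one hub linked to three members who are not linked to each
# other. The hub lies on all 3 member-to-member shortest paths.
star = {"h": {"x", "y", "z"}, "x": {"h"}, "y": {"h"}, "z": {"h"}}
print(betweenness(star)["h"])    # 3.0
print(median_betweenness(star))  # 0.0
```

The toy star shows why the median is informative: a network whose brokerage is concentrated in one member has a low median, whereas a higher median indicates that brokerage is spread across more members.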
Finally, this concept links back to group awareness theory. It is plausible that providing users with a visual snapshot of their individual quality in relation to the group's quality made them more comfortable engaging with more members of the CoP. This can be particularly helpful for individuals who are reluctant to share information when they are uncertain how their post will be perceived by weak ties, that is, peers with whom they communicate less frequently. This apprehension, or general fear of participating, as described in [33], may impede overall engagement or lead users to adopt more defensive behavior, as described in [18].
System quality
This research introduces a real-time system for analyzing quality in conversational media. At the presentation layer, users reported that the interface was easy to use (87% agreement) and had an acceptable color scheme (92% agreement). The majority of users also understood the general purpose of the system (71% agreement), and most felt that the system affected their individual contributions (63% agreement). However, the system can still be improved. For example, keywords were not generated automatically but were prepared separately for each discussion board, a process that should be automated in future implementations. In addition, the quality analyzer was used 2,582 times in total. Because each post is analyzed automatically prior to submission, subtracting those 646 automatic analyses leaves 1,936 user-initiated analyses; even so, only 33% of posts were analyzed further by their authors prior to final submission. Future iterations could extend the software to include automated keyword suggestions and prompts. Linking keywords to additional resources, such as the group wiki, would also allow users to expand their domain-level knowledge. Capturing quality statistics and affixing them to a user's profile may provide additional incentives for higher-quality contributions. Finally, integrating the system into other features of the OSN, such as blogs and wikis, may prove useful and offer an enhanced CoP experience.
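As one possible starting point for automating keyword generation, candidate keywords could be drawn from the discussion prompt itself. The sketch below is hypothetical and is not how keywords were produced in this study: it ranks non-stopword terms in the prompt by frequency, using a small illustrative stopword list and a simple lowercase tokenizer.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real deployment would need a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "be", "by", "for", "from", "how", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "what",
    "whether", "with", "you", "your",
}
TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def suggest_keywords(prompt: str, k: int = 5) -> list[str]:
    """Return up to k candidate keywords: the most frequent non-stopword terms.

    Ties keep the order in which terms first appear in the prompt.
    """
    tokens = TOKEN.findall(prompt.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(k)]


prompt = ("Discuss how trust forms in online communities. Explain whether "
          "trust in online communities differs from offline trust.")
print(suggest_keywords(prompt, 3))
```

A production version would likely need a fuller stopword list and a weighting scheme, such as TF-IDF computed across all prompts, so that terms common to every prompt are not suggested.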
Limitations
Limitations should also be addressed. First, regarding the software, the existing system still suffers from the cold start problem common to recommender systems. The current system provides a visual snapshot of a user's QQI score against the group's. However, should a user be the first to respond, there is no aggregate data to compare against. One possible extension would be to use historical data to provide that user with feedback. Second, future experiments should increase the population sizes of both the control and treated groups. The population size was relatively small (19 users in the control group and 23 users in the treated group). Although a Mann-Whitney U test was performed, more advanced statistical analyses and better user sampling may prove more telling. Finally, the QQI score requires refinement: few posts ever achieved a perfect score of 100. Future analysis will compare QQI scores against expert ratings to construct a more accurate score.
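For reference, the Mann-Whitney U test compares two independent samples by ranking all observations together and checking whether one sample's ranks run systematically higher. The sketch below computes U with average ranks for ties and a two-sided p-value from the normal approximation, omitting tie and continuity corrections. The variant used in this study is not restated here, so the sketch should not be read as a reproduction of the reported analysis.

```python
import math


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (U, z, two-sided p) for independent samples x and y.

    Uses average ranks for ties and the normal approximation without
    tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, sample) in zip(ranks, pooled) if sample == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, z, p


u, z, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(z, 3), round(p, 4))
```

Called with, for example, per-post QQI scores from the control and treated groups, the function returns U, the z statistic and the two-sided p-value. With samples as small as 19 and 23 users, an exact test or a tie-corrected variance may be preferable to this simplified approximation.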
Conclusion and next steps
This research explored the introduction of a dynamic, real-time quality analyzer within conversational software. Prior to constructing the system, data mining was performed on 2,157 historical online conversations. This analysis measured aspects of quality, including lexical complexity (i.e. readability) and prompt-specific vocabulary usage (i.e. keyword density). A third factor, discussion post structure, was added prior to the final system design, which deterred users from gaming the system. The final artifact consisted of a real-time quality visualizer, which produced positive results in pilot testing.
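Keyword density, one of the quality attributes named above, can be expressed as the fraction of a post's tokens that match the prompt's keywords. The sketch below assumes simple lowercase word tokenization, exact matching, and lowercase keywords; the study's actual tokenization, and how density is weighted within the QQI score, are not specified here.

```python
import re

TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def keyword_density(post: str, keywords: set[str]) -> float:
    """Fraction of tokens in post that appear in keywords (0.0 for an empty post).

    Keywords are assumed to be lowercase, matching the tokenizer's output.
    """
    tokens = TOKEN.findall(post.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in keywords)
    return hits / len(tokens)


print(keyword_density("Trust builds online communities slowly",
                      {"trust", "online", "communities"}))
```

A plain ratio like this rewards repeating keywords, which is one way a user could game a density-based measure; a structural factor such as the one described above is one way to counter that.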
A between-groups experiment measured the differences between the proposed system and a control group. The enhanced system produced a greater ratio of quality conversations. A social network analysis linked these conversations to the larger social network and showed that the treated group had higher betweenness centrality than the control group. Users also perceived higher levels of interaction and a stronger overall course community. These findings suggest that integrating a quality analyzer into conversational software can encourage users to produce higher-quality contributions, which in turn yields higher-quality peer-to-peer interactions and, ultimately, a higher-quality OSN. Moreover, the proposed design provides a proof of concept for broader integration into other OSN features, including blogging, collaborative writing and messaging systems.
