Abstract
The measurement of knowledge contribution in collaborative knowledge building is an important research topic in computer-supported collaborative learning (CSCL). This study proposes information measures of knowledge contribution based on information theory, comprising two measures: amount of information and information gain. Discourse data collected from a collaborative knowledge building activity were analyzed to validate these measures. The results showed that the information measures can complement the traditional behavioral measures. With the help of the two measures, the community-level variation tendency and individual-level knowledge contribution characteristics could be analyzed in collaborative knowledge building activities. A log function was used to fit the community knowledge variation tendency to measure the convergence of knowledge building. Students were clustered into five types according to their behaviors and contributions in collaborative knowledge building. Both teachers and researchers can benefit from using these two information measures in practice.
Introduction
Knowledge building plays an important role in the process of learning. Combined with the affordance of a CSCL environment, Scardamalia and Bereiter (1991) proposed a collaborative knowledge building approach which focuses on the learners’ collective cognitive responsibility for the advancement of knowledge. Use of collaborative knowledge building to improve classroom teaching and learning has received considerable attention in recent years (Cress & Kimmerle, 2008; Ghazal et al., 2019; Hong & Scardamalia, 2014; Lei & Chan, 2018; Scardamalia & Bereiter, 1991, 2006, 2014). The theory of knowledge building has become a theoretical rationale and pedagogical basis for CSCL (Ming & Law, 2006). In collaborative knowledge building, students develop knowledge by collaborating to discuss and compare ideas in solving problems (Scardamalia & Bereiter, 2006). Collaborative knowledge building can develop students’ collective agency and responsibility for sustained idea improvement (Scardamalia & Bereiter, 2014). The aim of collaborative knowledge building is to create valuable knowledge structures as a group (Yücel & Usluel, 2016). Although knowledge building is a group phenomenon, contributions come from identifiable individuals (Hong & Scardamalia, 2014). However, measuring the contributions in the knowledge building process is difficult in teaching practice and the research field. One commonly used method is outer behavioral measurement, such as the number of the posts, length of the posts, and topics read (Xie, 2013). Another method is content analysis. Researchers evaluate the contributions of posts according to the coding schema (Stegmann et al., 2012; Xie et al., 2018). However, previous research illuminates the shortcomings of these methods. Behavioral measurement does not consider the content of the subject. Thus, while it is useful in capturing behavioral patterns, it is not good at measuring knowledge contribution (Hong & Scardamalia, 2014). 
Research has found no association between behavioral participation in forums and knowledge gain (Konstan et al., 2014; Wang et al., 2015). Content analysis can provide a more accurate evaluation of knowledge contribution. However, it is time-consuming, and when the coding scheme is ambiguous the results can be subjective (Strijbos et al., 2006).
To address these deficiencies, we propose a new method to measure knowledge contribution in collaborative knowledge building environments from the perspective of information theory: Information Measures of Knowledge Contribution (IMKC). IMKC is based on the keywords of the posts. Keywords are the important words and phrases in a field, as defined by experts. Students contribute to the topic through the keywords they write in their posts. Their contribution is measured by calculating the amount of information and the information gain of the posts, two quantities derived from the basic principles of information theory. In so doing, the lack of content in behavioral measurement is avoided, and the efficiency of content analysis is improved. Using IMKC, both community- and individual-level knowledge contribution can be measured automatically. Furthermore, the characteristics of individuals in knowledge building can also be discovered.
Literature Review
Community and Individual Characteristics in Collaborative Knowledge Building
Collaborative knowledge building is a crucial activity to promote individual and collective learning in the educational world (Alonso et al., 2015). Understanding the knowledge building process can help us organize it better. There are two typical granularities in the analysis of knowledge building activities: community/group level and individual level.
From the perspective of community-level, prior researchers have depicted the general process of knowledge building in knowledge building activities (Garrison et al., 2001; Gunawardena et al., 1997). Gunawardena et al. proposed a five-phase analysis model for examining social construction of knowledge. The five phases are sharing/comparing of information; discovery and exploration of dissonance or inconsistency among ideas, concepts or statements; negotiation of meaning/co-construction of knowledge; testing and modification of proposed synthesis or co-construction; and agreement statements or applications of newly constructed meaning (Gunawardena et al., 1997). Furthermore, Garrison et al. proposed a four-phase model to describe the cognitive presence in the community of inquiry. The four phases are triggering event, exploration, integration, and resolution (Garrison et al., 2001). Although they outlined different details of the phase, they all agreed that at the beginning of the process, participants begin to share their own ideas, which contributes a large amount of information. At the end of the process, they tended to reach an agreement with each other, and the information tended to converge. Knowledge building seeks systematicity, coherence, and convergence, as participants engage in meaning-making to extend their understanding of knowledge building (Hewitt, 2001; Suthers, 2001; Wells, 1999). So according to these previous studies, the ideal tendency of the community knowledge in knowledge building activities should be as follows: the amount of information increases very quickly at the beginning of the discussion, and then levels off gradually in the last part of the discussion process.
As to individual-level characteristics, studies have focused on the patterns of students’ behaviors in knowledge building activities. Hod and Ben-Zvi proposed five co-development patterns in humanistic knowledge building communities: exploring prior knowledge, experience and self; re-interpreting prior experience; coordinating self with experience; connecting knowledge to personal experience; and making a holistic change (Hod & Ben-Zvi, 2018). Beaudoin classified students as learners and lurkers, defining a lurker as a student who reads but never writes in the discussion forum (Beaudoin, 2002). Chan et al. proposed eight user roles in knowledge building according to students’ participation: popular initiators, popular participants, joining conversationalists, supporters, taciturns, elitists, grunts, and ignored (Chan et al., 2010). Similarly, Rodrigues et al. used clustering to identify participation patterns in MOOC forums (Rodrigues et al., 2016).
These studies revealed the behavioral characteristics of knowledge building activities at both the community level and the individual level from a qualitative perspective. However, limited research has examined knowledge contribution at these different granularities in a quantitative way.
Measurement of Knowledge Contribution
Knowledge contribution is a critical element in an online knowledge community, as individuals share and combine knowledge for their own benefits while enhancing the value of the community (Faraj et al., 2011). How to measure the knowledge contribution is crucial to improve the quality of knowledge building activities. The commonly used measurement methods can be classified into four categories: self-assessment and peer-assessment; automated analysis based on outer behaviors; content analysis; and calculation based on keywords.
The first kind of measurement is self-assessment and peer-assessment of community knowledge and contributions (Cheng & Warren, 2000; Lejk & Wyvill, 2001; Ma et al., 2020). In self-assessment and peer-assessment, students are integrated into the assessment process as assessors to evaluate the contributions of their own or others’ products (Orsmond et al., 2002). Self-assessment and peer-assessment are commonly used to identify the important knowledge contributions in collaborative knowledge building (Lu & Zhang, 2012; Ma et al., 2020), and have been shown to be a reliable and valid assessment approach (Lu & Zhang, 2012; Topping, 1998). Nevertheless, self-assessment and peer-assessment are time-consuming, and their validity depends on students’ individual factors such as cognitive level, motivation, engagement, and attitude.
The second measurement is automated analysis based on outer behaviors. Typical behavioral measures in collaborative knowledge building include the number of the posts, the length of posts, the number of topics read, and number of replies (Almatrafi & Johri, 2019; da Silva et al., 2019; Xie, 2013). These outer behaviors are suitable for identifying underlying behavioral patterns and have a high processing efficiency. However, outer behaviors ignore the content and subject of the knowledge, which is very important in knowledge building. Wang et al. also pointed out that behavioral participation is not correlated with knowledge gain (Wang et al., 2015). Thus, it is a partial solution for the measurement of knowledge contribution.
The third measurement is content analysis. Researchers define a coding scheme to evaluate the online posts in collaborative knowledge building according to their research objectives (Strijbos et al., 2006). For example, content analysis has been employed to index web resources (Yi et al., 2016). Seven speech acts were defined to analyze forum posts in Massive Open Online Courses (Arguello & Shaffer, 2015). A four-class coding scheme has been used to analyze the leadership role in knowledge building activities (Xie et al., 2018). Although content analysis is a relatively accurate method to measure knowledge contribution, it is time-consuming to code the posts and difficult to implement in massive learning situations such as MOOCs. In addition, the results are mostly used by researchers, not by teachers or students.
The fourth kind of measurement is calculation based on keywords. Researchers count the keywords in a post to measure its knowledge contribution. Hong and Scardamalia designed key-term measures to assess community knowledge in knowledge building environments (Hong & Scardamalia, 2014). In addition, a key term-based indicator framework has been proposed to measure the level of knowledge elaboration in terms of coverage, activation, and equitability (Zheng et al., 2018). This method can achieve high efficiency in measuring knowledge contribution. It uses word frequency, that is, the number of times a word appears in the text, to define knowledge contribution. However, it ignores the fact that different keywords contribute differently to a given topic, so simply counting keyword occurrences does not measure their contribution. Words with a higher frequency in a topic may contribute less to knowledge building. For example, in the field of database security, “data encryption” contributes more to knowledge building than “database security”, even though it may appear less frequently. We therefore need a new method that measures knowledge contribution without relying directly on frequency.
Principles of Information Theory
Information theory studies the quantification, storage, and communication of information. Claude Shannon and Warren Weaver founded a still valid paradigm for the mathematical analysis of communication (Klüver, 2011). In their classic papers, they defined information as the resolution of uncertainty which refers to epistemic situations involving imperfect or unknown information, and outlined mathematical approaches to measure a communication system (Shannon, 1948; Shannon & Weaver, 1949). Shannon considered information as a set of possible messages, where the goal is to send these messages over a channel. He defined several quantities of information to measure the statistical uncertainty of an information system.
The first is the self-information of an event, which is used to measure its information content. When event a occurs in an information system, the amount of uncertainty reduced by the event can be measured using the following formula:
$$I(a) = -\log p(a)$$
In this formula, p(a) represents the probability that event a is chosen from all possible choices in the state space A.
Information entropy is used to measure the amount of uncertainty of an information source, which may have many basic events. The information entropy of an information source A can be calculated using the following formula:
$$H(A) = -\sum_{a \in A} p(a) \log p(a)$$
Information theory provides a quantitative method to measure the information content of an information-related system. It has been widely used in many fields, including communication (Vrankovic et al., 2020), chemistry (Mac Fhionnlaoich & Guldin, 2020), computer science (Zhang et al., 2020), and psychology (Exel et al., 2019). These applications confirm the effectiveness and universality of information theory. In our study, self-information is used to calculate the knowledge contribution of a keyword in a post, which is the basis for calculating the knowledge contribution of the post.
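As a minimal sketch of the two quantities defined above, the following Python functions compute self-information and entropy. The logarithm base 2 (results in bits) is an assumption of this sketch; the function names are illustrative only.

```python
import math


def self_information(p: float) -> float:
    """Self-information -log2(p), in bits, of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


def entropy(probs) -> float:
    """Shannon entropy, in bits, of a discrete probability distribution.

    Events with zero probability contribute nothing to the sum.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * self_information(p) for p in probs if p > 0)


# A rarer event carries more information than a common one.
print(self_information(0.5))        # 1.0
print(self_information(0.125))      # 3.0
print(entropy([0.5, 0.25, 0.25]))   # 1.5
```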
The Present Study
Based on the effectiveness and universality of information theory in terms of information measurement, we employed it to measure information in a collaborative knowledge building context. When the teacher releases a topic, the topic forms a small information system. All the keywords related to the topic constitute the uncertainty of the space, and all the content under the topic makes up this information system. Students write posts under the topic to contribute to building knowledge on it. If a post contains useful information for the topic, it makes knowledge contributions to the topic and reduces the uncertainty stemming from this small information system. Since words and phrases are the basic units in Chinese language processing (Liu et al., 2011), keywords which are the core words and phrases in a topic are used as the basic units to quantise the information in text posts. Each post is composed of a set of keywords. The information content of a post can be measured by the sum of the self-information of all the keywords therein. We named this method “Information Measures of Knowledge Contribution (IMKC)”. Using this method, we can explore different levels of knowledge contribution in collaborative knowledge building environments. For example, we can explore the variation tendency of community knowledge in the process of collaborative knowledge building. We can also identify the characteristics of individual-level knowledge contribution in collaborative knowledge building. The research hypotheses proposed in this study are as follows:
Research Hypothesis 1 (H1): The results of the IMKC method can complement the results of the traditional behavioral method.
Research Hypothesis 2 (H2): With the help of IMKC, we can explore the variation tendency of community knowledge in collaborative knowledge building.
Research Hypothesis 3 (H3): With the help of IMKC, we can explore the characteristics of individual knowledge contribution in collaborative knowledge building.
Method
Research Design
Figure 1 depicts the research design of this study. In a course named “Principles and practice of databases,” all students were asked to express their opinions on certain topics in the online discussion forum. The course lasted for 12 weeks. Every two weeks, the teacher released a topic relevant to the learning content as a thread in the discussion forum, and students replied with their opinions under this thread. To ensure that each reply added new information to the community, all the students were asked to read all the existing posts before submitting their own. Over the 12 weeks, six topics were discussed in the online discussion forum; these are shown in Table 1. Two kinds of measures were collected from these data, as shown in Table 2. The first was the traditional behavioral measures of students in the online discussion, comprising three measures: number of posts (including new posts and replies), average number of words per post, and number of posts viewed. The second was the information measures of the IMKC method. As mentioned, two information measures of knowledge contribution based on information theory are proposed in this study: the amount of information and the information gain. The methods used to calculate these two information measures are described in section 3.4 “Method to calculate amount of information in IMKC” and section 3.5 “Method to calculate information gain in IMKC”.

Figure 1. Research Design of this Study.
Table 1. Six Topics Released in the Course.
Table 2. List of Traditional Behavioral Measures and Information Measures.
Participants and Data Source
In total, 45 students took part in this course: 32 females and 13 males. They were between 19 and 21 years old, with an average age of 20.02. All the students were university sophomores majoring in educational technology and had used the online discussion forum for at least one year. As such, they were highly familiar with the collaborative knowledge building environment in this study. During the course, the teacher released six topics about the main content of the course “Principles and practice of databases”, which are shown in Table 1. Participation was not compulsory. The teacher announced that students could earn a reward of 2 to 5 points in the final score according to their participation in the discussion. By the end of the course, students had written 204 posts in total in these six threads. Table 3 shows the distribution and the average length of the posts. In addition, each student’s final score was recorded. The final score came from a final exam set by the teacher of the course.
Table 3. Distribution and the Average Length of the Posts.
Keywords Extraction Method
In this study, keywords were used as an important indicator to measure the knowledge contribution of students. Thus, the first step in processing the posts was extracting keywords from the text. To decide which words to treat as keywords, three graduate students were trained and then manually extracted keywords from the textbook, teaching materials, and related research papers to form an initial set of keywords for each topic. This initial set reduced the workload of the expert teachers and improved the accuracy of the results. To refine the initial sets, two expert teachers, both with long experience in teaching databases, independently checked each keyword in the whole set, removed those unrelated to the topics, and added words they considered essential to the topics. Inter-rater agreement for the two teachers was 0.83. Any cases of disagreement were discussed until both teachers reached agreement. Finally, 1025 words and phrases were selected as the keyword sets used in this study. Table 4 shows part of the keywords.
Table 4. Keywords in the Course “Principles and Practice of Databases” (Translated From Chinese).
Based on the predefined keyword sets, a computer program was developed to extract the keywords from each post. For each post, the program produced the list of keywords that appeared in the post and the frequency of each keyword. These data were then used as the basis of the IMKC method.
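The extraction step can be illustrated with a minimal Python sketch. It is not the program used in the study: it assumes plain substring matching against a predefined keyword set, and the keyword set and post text below are hypothetical examples. Note that with substring matching, a short keyword contained in a longer one would be counted for both.

```python
from collections import Counter


def extract_keywords(post: str, keyword_set) -> Counter:
    """Count how often each predefined keyword occurs in a post.

    Matching is case-insensitive substring matching (an assumption of this
    sketch); keywords that do not occur are left out of the result.
    """
    text = post.lower()
    counts = Counter()
    for keyword in keyword_set:
        occurrences = text.count(keyword.lower())  # non-overlapping matches
        if occurrences:
            counts[keyword] = occurrences
    return counts


# Hypothetical keyword set and post, for illustration only.
keywords = {"data encryption", "database security", "access control", "backup"}
post = ("Data encryption protects database security; "
        "data encryption also complements access control.")

print(extract_keywords(post, keywords))
```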
Method to Calculate Amount of Information in IMKC
After the keywords were extracted, the knowledge contribution of each post could be calculated using information theory. When students submit a post, they contribute to knowledge building through the meaningful keywords they use in the post (Zheng et al., 2018). The quantity of the information content of the post can be calculated from the keywords in the post. In this study, we defined the amount of information of a post as the quantity of information content contained in its keywords, calculated as the sum of the self-information of all the keywords in the post. According to the definition of self-information in Shannon’s theory (Shannon, 1948; Shannon & Weaver, 1949), the amount of information of a post can be calculated using the following formula:
$$I_{\text{post}} = -\sum_{j=1}^{M} \log p_j$$
In this formula, M represents the number of keywords contained in the post. The probability p_j that keyword j appears in the current topic I is calculated as follows:
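The sum of self-information over a post's keywords can be sketched in Python as follows. Two points are assumptions of this sketch rather than statements from the text: p_j is estimated as the relative frequency of keyword j among all keyword occurrences in the topic, and each distinct keyword in a post is counted once. The logarithm base 2 and the sample posts are also illustrative.

```python
import math
from collections import Counter


def keyword_probabilities(topic_posts):
    """Estimate p_j for every keyword in a topic.

    topic_posts is a list of keyword lists, one list per post. The estimate
    is the keyword's share of all keyword occurrences in the topic
    (an assumption of this sketch).
    """
    totals = Counter(kw for post in topic_posts for kw in post)
    n = sum(totals.values())
    return {kw: count / n for kw, count in totals.items()}


def amount_of_information(post_keywords, probs) -> float:
    """Sum of self-information (in bits) over the distinct keywords of a post."""
    return sum((-math.log2(probs[kw]) for kw in set(post_keywords)), 0.0)


# Hypothetical topic with three posts, already reduced to their keywords.
topic = [
    ["index", "index", "transaction"],
    ["index", "transaction", "lock"],
    ["index", "deadlock"],
]
probs = keyword_probabilities(topic)
for post in topic:
    print(post, amount_of_information(post, probs))
```

In this toy topic the frequent keyword "index" adds 1 bit to a post while the rare keyword "lock" adds 3 bits, which illustrates why rarer keywords weigh more than frequent ones.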
Method to Calculate Information Gain in IMKC
Based on the definition of the amount of information of a post, we can further define the information gain of a post using the following method.
After the teacher released a topic, students discussed it in their posts over time. All the posts about a certain topic were placed on a timeline, as shown in Figure 2.

Figure 2. Timeline of Posts.
Each post consists of a few keywords that can be extracted using the method proposed in section 3.3. The keywords may be repeated in different posts. If a post merely repeats the keywords of other posts and adds no new keywords, then we can assume that it has contributed little new information to the topic. Thus, the information gain of a post is based mostly on the new keywords therein. Following this idea, we defined the calculation method of the information gain of a post as follows:
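The idea that information gain rests on new keywords can be sketched as below. This is a simplified reading, not the study's exact definition: it assumes the gain of a post is the sum of the self-information of the keywords that no earlier post in the topic has used, with keyword probabilities estimated as relative frequencies within the topic. The sample posts are hypothetical.

```python
import math
from collections import Counter


def information_gains(topic_posts):
    """Return one information-gain value (in bits) per post, in timeline order.

    Assumption of this sketch: only keywords not seen in any earlier post
    of the same topic contribute to a post's gain.
    """
    totals = Counter(kw for post in topic_posts for kw in post)
    n = sum(totals.values())
    seen = set()
    gains = []
    for post in topic_posts:
        new_keywords = set(post) - seen
        gain = sum((-math.log2(totals[kw] / n) for kw in new_keywords), 0.0)
        gains.append(gain)
        seen |= set(post)
    return gains


# Hypothetical timeline: the third post only repeats earlier keywords.
timeline = [
    ["index", "transaction"],
    ["index", "lock"],
    ["index", "transaction"],
    ["index", "deadlock"],
]
print(information_gains(timeline))  # [3.0, 3.0, 0.0, 3.0]
```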
Method to Explore Individual Characteristics
In this study, we had no predetermined characteristics for the students at the beginning of the experiment. We therefore used an unsupervised machine learning algorithm, the k-means algorithm, to explore the characteristics of students. K-means is a clustering algorithm that groups a set of objects so that the objects in the same group (called a cluster) are more similar to each other than to those in other groups. The k-means algorithm has two parameters: the initial cluster centers and the number of clusters k. In most situations, the initial cluster centers are randomly chosen from the data set. The value of k depends on the specific dataset, and different values of k produce different clustering results. The best k value gives the smallest within-class distance and the largest between-class distance. It is usually determined by approaches based on the Silhouette coefficient (Aranganayagi & Kuttiyannan, 2008; Sai et al., 2017) or the Elbow method (Syakur et al., 2018). The basic process of the k-means algorithm (Jain & Dubes, 1988) is shown in Table 5.
Table 5. Basic Process of K-Means Algorithm.
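For readers who prefer code, the standard k-means iteration (assign each point to its nearest center, then move each center to the mean of its members) can be sketched in plain Python. This is a generic illustration and may differ in detail from the process in Table 5; the seed, the iteration limit, and the sample points are assumptions of the sketch.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster points into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical two-dimensional data with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = k_means(points, k=2)
print(labels)
print(centers)
```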
Results
H1: The results of the IMKC method can complement the results of the traditional behavioral method
When analyzing traditional learning behavior, commonly used learning indicators are students’ overt behaviors such as number of the posts, number of the words per post, number of words per sentence, number of posts viewed, time spent discussing, and so on (Almatrafi & Johri, 2019; Xie, 2013). To explore the relation between information measures and traditional behavioral measures in measuring knowledge contribution, we compared the information measures and the traditional behavioral measures using a correlation analysis. The five measures of each student in the discussion were calculated and subjected to a Pearson correlation analysis. Furthermore, a correlation analysis was also conducted for the information measures and student’s final score. Table 6 provides the results of the Pearson correlation analysis.
Table 6. Correlations Between Information and Traditional Behavioral Measures.
**p < 0.01. *p < 0.05.
The results show that the average amount of information is significantly associated with three traditional behavioral measures: number of posts, average number of words per post, and number of posts viewed. Average information gain per post is significantly correlated with two traditional behavioral measures: number of posts and average number of words per post. There was no significant correlation between the amount of information and the final score. However, information gain was significantly associated with the final score. From the results we can see that the amount of information is consistent with the result of the traditional behavioral measures, while the information gain is not. But the information gain is correlated with student’s final score, which may be used as a predictor of the final score. This phenomenon may indicate that information measures can reflect new information beyond traditional behavioral measures and it can be used as a complement to the latter.
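The Pearson coefficient used in this analysis can be computed as in the following minimal sketch. The sample values are hypothetical and are not the study's data; a significance test is omitted, and in practice a statistics library such as SciPy (`scipy.stats.pearsonr`) would also return the p-value.

```python
import math


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical per-student values: average information gain and final score.
information_gain = [0.8, 1.1, 1.9, 2.4, 3.0]
final_score = [62, 70, 75, 83, 90]
print(round(pearson_r(information_gain, final_score), 3))
```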
H2: With the help of IMKC, we can explore the variation tendency of community knowledge in collaborative knowledge building

Figure 3. Variation Tendency of Amount of Information of the Six Topics. (a) Topic 1. (b) Topic 2. (c) Topic 3. (d) Topic 4. (e) Topic 5. (f) Topic 6.
The blue lines in the graphs show the variation tendency of community knowledge contribution. According to previous research, the ideal tendency of information in knowledge building activities is as follows: the amount of information increases very quickly at the beginning, and then levels off gradually in the last part of the process (Garrison et al., 2001; Gunawardena et al., 1997; Hewitt, 2001; Suthers, 2001; Wells, 1999). A log function has this shape; thus, a log function is used to fit the curve as the baseline. In our method, we used the log function y = a * log2(x + 1) to fit the variation tendency curve. The least squares method was used to find the optimal fitting parameter of the log function. Here, parameter a determines the slope of the curve and represents the rate at which the amount of information increases. The red dotted lines in the graphs show the fitted curves of the log function. The fitted equations are also indicated on the graphs.
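Because the model y = a * log2(x + 1) is linear in its single parameter, the least squares estimate has a closed form: with g_i = log2(x_i + 1), a = Σ g_i y_i / Σ g_i². The sketch below applies this formula. It assumes x is the position of a post in the timeline and y the community's amount of information after that post; the data are hypothetical.

```python
import math


def fit_log_curve(xs, ys) -> float:
    """Least squares estimate of a in y = a * log2(x + 1)."""
    g = [math.log2(x + 1) for x in xs]
    denominator = sum(gi * gi for gi in g)
    if denominator == 0:
        raise ValueError("at least one x must be greater than 0")
    return sum(gi * yi for gi, yi in zip(g, ys)) / denominator


def sum_squared_residuals(xs, ys, a) -> float:
    """Sum of squared differences between the data and the fitted curve."""
    return sum((y - a * math.log2(x + 1)) ** 2 for x, y in zip(xs, ys))


# Hypothetical cumulative amount of information after posts 1..8.
post_index = [1, 2, 3, 4, 5, 6, 7, 8]
cumulative_info = [0.16, 0.22, 0.31, 0.33, 0.40, 0.41, 0.46, 0.47]
a = fit_log_curve(post_index, cumulative_info)
print(round(a, 3), sum_squared_residuals(post_index, cumulative_info, a))
```

A small residual sum indicates that the topic follows the fast-then-flattening pattern closely, while a large one signals a deviation from the baseline.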
The results show that the amount of information in all six topics increases gradually. This means that in the process of the discussion, new posts always bring new contributions to the topics. The graphs also reveal three further patterns. First, the value of parameter a in topic 5 (a = 0.184; Figure 3E) is the greatest of the six topics, and its curve also fits the log function most closely. The second highest value of a is in topic 3 (a = 0.178; Figure 3C) and the lowest is in topic 4 (a = 0.127; Figure 3D). The larger parameter a is, the more closely the variation tendency fits the log function.
The large value of parameter a for topic 5 means that at the beginning of the discussion, students provided different opinions on the topic. Then, as the discussion progressed, the rate of increase of the amount of information started to slow, indicating that the topic tended to converge. Topic 4 differs from topic 5. The curve of topic 4 deviates seriously from the baseline of the log function curve. At the beginning of the discussion, the amount of information did not increase fast enough, indicating that the students did not present enough new information for the topic. At the end of the discussion, the amount of information increased faster than it did at the beginning, showing that the students did not reach an agreement with each other.
Second, the amount of information increased sharply for some posts, such as n_25 in topic 1, n_3 in topic 2, and n_32 in topic 4. The increase in the amount of information came from the new keywords that appeared in the post. Thus, a sharp rise in the curve means that the post contributed many new keywords to the topic, and therefore made a great contribution to knowledge building for the topic.
Third, the line from post n_13 to post n_16 in topic 4 (Figure 3D) is almost horizontal. This means that the amount of information barely increased for posts n_14, n_15, and n_16. These posts contain few new keywords. They merely repeated what was said in other posts and made little contribution to the knowledge building of the topic.
H3: With the help of IMKC, we can explore the characteristics of individual knowledge contribution in collaborative knowledge building
The k-means algorithm introduced in section 3.6 “Method to explore individual characteristics” was used to explore the individual characteristics of students in knowledge building. The Silhouette coefficient method was employed to find the best value of k (Aranganayagi & Kuttiyannan, 2008; Sai et al., 2017). Figure 4 provides the results. In Figure 4, the x-axis is the value of k and the y-axis is the silhouette coefficient score. The higher the silhouette coefficient score, the better the k value. For our data, the silhouette coefficient score is highest when k equals 2. However, two clusters were too coarse to distinguish students’ characteristics. Thus, we chose the k value with the second highest silhouette coefficient score: k equals 5.

Figure 4. Silhouette Coefficient of Different K Values in K-means Algorithm.
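The silhouette coefficient score compared across k values can be computed as in this minimal sketch. For each point, a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; the point's score is (b - a) / max(a, b), and the overall score is the mean over all points. Treating points in single-member clusters as scoring 0 is a common convention adopted here as an assumption, and the data are hypothetical.

```python
import math


def silhouette_score(points, labels) -> float:
    """Mean silhouette coefficient of a clustering (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # Distance from the point to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denominator = max(a, b)
        scores.append(0.0 if denominator == 0 else (b - a) / denominator)
    return sum(scores) / len(scores)


# Hypothetical data: a natural grouping scores higher than an arbitrary one.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(silhouette_score(points, [0, 0, 0, 1, 1, 1]))
print(silhouette_score(points, [0, 1, 0, 1, 0, 1]))
```

To choose k, the same score would be computed for the clustering obtained at each candidate k and the values compared, as in Figure 4.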
Figure 5 shows the clustering result when k equals 5. In the three-dimensional space in Figure 5, the x-axis represents the average amount of information per topic of a student. The y-axis represents the average information gain per topic of a student, and the z-axis the number of topics a student engaged in for the discussion. In total, 45 students were classified into 5 clusters, as shown by the different icons in the figure.

Figure 5. Clustering Results of the Students.
To better understand these clusters, we calculated the mean value of the three characteristics for each cluster, the results of which are shown in Table 7.
Table 7. Characteristics of the Five Clusters.
These five clusters each have unique characteristics. Cluster 1 has the highest amount of information and information gain, and the students engaged in all the topics of the discussion. As such, the students in this cluster were actively involved in the discussion and contributed high-quality posts that included a large amount of information and many new contributions. We classified these students as “active students”. Cluster 2 has the second highest amount of information and information gain, and these students engaged in almost all the topics. As such, they were highly active in the discussion, and their posts were of a pretty good quality, although this was slightly lower than that of those in cluster 1. We named students in this cluster “excellent students”. Cluster 3 has a medium amount of information and information gain. However, these students engaged in a lower number of topics than those in cluster 1 and 2. As such, the students in this cluster only participated in and contributed to part of the topics and ignored other topics. For those topics they didn’t participate in, they have little contribution towards knowledge building. We named these the “self-involved students”. Cluster 4 has a medium amount of information and high number of topics engaged in. However, information gain was relatively low. As such, these students engaged in almost all the topics, and their posts seem to include a high amount of information. However, few new contributions were proposed, and they tended to repeat what others had already posted. We named these students the “repeaters”. Cluster 5 has the lowest amount of information, information gain, and number of topics engaged in. Students in this cluster engaged in few topics and their posts provided few contributions to knowledge building. As such, they demonstrated markedly negative performance in collaborative knowledge building. We named these students the “passive students”.
Discussion
The IMKC measures are partly correlated with the traditional behavioral measures. The amount of information is correlated with most of the behavioral measures, but not with the final score. This indicates that the amount of information varies in the same way as the behavioral measures. This finding is consistent with previous research. For example, Vu et al. and Wang et al. both reported that overall activity in discussion forums did not predict the submission score (Vu et al., 2015; Wang et al., 2015). Information gain was correlated with only two traditional behavioral measures, but was significantly correlated with the final score. This means it has stronger predictive power for the learning outcome than the traditional behavioral measures. Compared to the amount of information, information gain captures more of the factors that are correlated with students’ learning outcomes. The reason for this may be that the method used to calculate information gain emphasizes the new and innovative ideas in a post, that is, ideas that differ from those in previous posts. The method used to calculate the amount of information, by contrast, emphasizes the information content of the current post without any comparison to previous posts. Students can therefore easily repeat what others have already said to increase the information content of a post. Providing new ideas and innovations points to a deeper understanding of the topic, which may be associated with a student’s learning outcomes.
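The contrast between the two measures can be illustrated with a simplified keyword-based sketch. It assumes that each keyword k has a known probability p(k) and scores it by its self-information, -log2 p(k); the exact formulas of the study are given in its method section and may differ. The keyword probabilities, the posts, and both function names are hypothetical.

```python
import math


def amount_of_information(keywords, prob):
    """Sum of self-information -log2 p(k) over the distinct keywords of a post."""
    return sum(-math.log2(prob[k]) for k in set(keywords))


def information_gain(keywords, seen, prob):
    """The same sum, restricted to keywords absent from all earlier posts."""
    return sum(-math.log2(prob[k]) for k in set(keywords) - seen)


# Hypothetical keyword probabilities (assumption: estimated from some corpus).
prob = {"index": 0.25, "join": 0.25, "schema": 0.5}

# Three hypothetical posts in one topic; the second repeats the first.
posts = [["index", "join"], ["index", "join"], ["join", "schema"]]

seen = set()      # keywords that appeared in earlier posts
results = []      # (amount of information, information gain) per post
for post in posts:
    results.append(
        (amount_of_information(post, prob), information_gain(post, seen, prob))
    )
    seen |= set(post)

for amount, gain in results:
    print(amount, gain)
```

Under these assumptions, the second post receives the same amount of information as the first but an information gain of zero, because it adds no keyword that was not already present in the topic.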
The second finding of the study concerns the variation tendency of community knowledge, as revealed by the IMKC. Prior researchers have reached a certain consensus on the basic process of collaborative knowledge building. While the process differs slightly across models, the fundamental trend is similar (Garrison et al., 2001; Gunawardena et al., 1997; Hewitt, 2001). In the first stage, students share their own opinions, and the information content for the topic increases very quickly. In the final stage, students need to reach agreement on the topic, and the information content increases slowly. While researchers have qualitatively described this trend, no quantitative method has been proposed to measure it. This study addressed this gap by using a log function to fit the variation tendency curve. The slope of the curve represents the rate at which the information content increases. Using this parameter, teachers can compare the progress of different topics and decide whether to intervene in a topic. In addition, students who make either significant contributions or none can be easily identified from the curve, and teachers can provide these students with individual instruction based on the results. As such, both teachers and researchers can benefit from this quantitative method of measuring knowledge contribution.
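A log fit of this kind can be sketched as an ordinary least-squares regression of the cumulative information content y on ln(x), where x is the index of the post within the topic, giving y = a ln(x) + b. The functional form, the data, and the names fit_log_curve and growth_rate are assumptions made for illustration; this section does not state the exact form used in the study, and taking the coefficient a as the growth parameter is one reading of what the text calls the slope.

```python
import math


def fit_log_curve(post_index, cumulative_info):
    """Least-squares fit of y = a * ln(x) + b; returns (a, b)."""
    xs = [math.log(x) for x in post_index]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cumulative_info) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cumulative_info))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def growth_rate(a, x):
    """Slope of the fitted curve at post x: d/dx (a ln x + b) = a / x."""
    return a / x


# Hypothetical topic: cumulative information after each of six posts,
# generated from y = 3 ln(x) + 2 so that the fit can be checked by eye.
post_index = [1, 2, 3, 4, 5, 6]
cumulative_info = [3.0 * math.log(x) + 2.0 for x in post_index]

a, b = fit_log_curve(post_index, cumulative_info)
print(round(a, 6), round(b, 6))
print(growth_rate(a, 1), growth_rate(a, 6))
```

Because the local slope a / x shrinks as x grows, the fitted curve rises quickly in the early posts and flattens later, which matches the qualitative description of the sharing and agreement stages. Comparing a across topics is one possible way to compare how quickly each topic accumulates information.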
The third finding pertains to the classification of students into five clusters, based on the IMKC and traditional behavioral measures in the knowledge building process: “active students”, “excellent students”, “self-involved students”, “repeaters”, and “passive students”. Previous studies characterized students in knowledge building environments based on their outer behaviors, which essentially reflect their participation, and ignored their knowledge contribution (Beaudoin, 2002; Chan et al., 2010; Hod & Ben-Zvi, 2018). To address this gap, we used both a behavioral measure (the number of topics the student engaged in) and measures based on information theory (amount of information and information gain). Using these measures, we could determine both participation and contribution. Among the five clusters in our study, a repeater has a high degree of participation and a high amount of information, but a relatively low information gain, meaning that he or she makes little contribution to the topic. A self-involved student has a low degree of participation but a relatively good average information gain, meaning that he or she does contribute to the topics in which he or she engages. These two kinds of students are difficult to identify using traditional participation-based clustering. These characteristics, which are based on knowledge contribution, enable teachers to provide personalized instruction for students.
Conclusion and Implications for Practice
The measurement of knowledge contribution is an important research topic in collaborative knowledge building environments. This study proposed the IMKC method, based on information theory, to measure knowledge contribution. The method comprises two information measures: amount of information and information gain. The study showed that the amount of information is significantly correlated with traditional behavioral measures, and that information gain is significantly correlated with students’ learning outcomes. Using these two measures, the community-level and individual-level characteristics of knowledge contribution in collaborative knowledge building were also explored.
Both teachers and researchers can benefit from this method. With its help, teachers can gain a better understanding of the class and of the topics students discussed. They can also identify students who contribute a great deal or very little to the discussion, and provide individualized instruction to those students. Researchers can also use this method in research practice. Besides counting the number of posts or annotating the content manually, researchers now have another option, or a supplement, for measuring contributions in collaborative knowledge building. With this method, researchers can calculate contributions automatically and improve accuracy by using it as a supplement to other methods.
Limitations and Future Study
There are three limitations associated with this study. The first is that information gain may not always depend on new keywords alone. For example, if a post contains incorrect information and the next post corrects it, our keyword-based method may not detect this contribution. The second is the relatively small sample size. The experimental data contained only 45 students and 204 posts, which is somewhat small for a robust quantitative analysis. The third is that this study covered only one subject: all the data came from the course “Principles and practice of databases.” To ensure the generalizability of the measurement method, more data from other courses are needed. In future studies, the research sample will be expanded, and data from different courses will be collected to verify the generalizability of the measures based on information theory.
Footnotes
Authors’ Note
The data of this study cannot be made openly available due to confidentiality agreements and ethical concerns. They can be accessed by contacting the author. Ethical approval was obtained from the hosting institution.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This paper was supported by Chinese National Natural Science Foundation Projects (No.61772012, No.71974073), the Humanities and Social Science project of Chinese Ministry of Education (No.17YJA880104), Open Fund of Hubei Research Center for Educational Informationization, Central China Normal University (No.HRCEI2020F0105), and E-learning research project by School of Educational Information Technology in Central China Normal University (No.CCNUSEIT202004).
