Abstract
Some questions posted in community question answering sites (CQAs) fail to attract a single answer. To address the growing volumes of unanswered questions in CQAs, the objective of this paper is two-fold. First, it aims to develop a conceptual framework known as the Quest-for-Answer to explain why some questions in CQAs draw answers while others remain ignored. The framework suggests that the answerability of questions depends on both metadata and content. Second, the paper attempts to empirically validate the Quest-for-Answer framework through a case study of Stack Overflow. A total of 3000 questions divided equally between those answered and unanswered were used for analysis. The Quest-for-Answer framework yielded generally promising results. With respect to metadata, the asker’s popularity and participation, as well as the asking time of questions, were found to be significant in predicting if answers would be forthcoming. With respect to content, the level of detail, specificity, clarity and socio-emotional value of questions were significant in enhancing or impeding responses.
1. Introduction
With knowledge-sharing platforms such as community question answering sites (CQAs), users harness the collective wisdom of the online community through asking and answering questions. A typical question–answering cycle in CQAs commences when a user asks a question, and awaits answers from others. The question is said to have been resolved once the asker chooses a best answer from among the responses.
However, not every question asked in CQAs is resolved. In fact, numerous questions fail to attract a single answer. For example, about one-fifth of unresolved questions in popular CQAs such as Yahoo! Answers fail to receive responses, while some 43% of unresolved questions remain unanswered in Baidu Knows, a popular Chinese CQA [1]. A CQA that consistently fails to return answers may deter future participation by users. Eventually, as asking and answering activities wane, the relevance of the site will be called into question.
The longevity of CQAs clearly depends on users’ goodwill in responding to questions. Recognizing the importance of voluntary contributions, some studies have examined users’ motivation to share knowledge in online communities [2, 3]. Users often share knowledge with the expectation that their contribution will be worthy of the effort to create new value in the community [4, 5]. Their motivation is further fuelled by factors related to satisfaction and knowledge self-efficacy [6].
In addition to research on motivation, the growth of the number of unanswered questions in CQAs has recently started to pique scholarly interest in question quality [7, 8]. The use of interrogative words such as ‘what’ and ‘where’ was found to enhance question quality [9]. Moreover, lack of clarity emerged as a key reason why a question might deter responses [8]. Despite these studies, however, the extant literature lacks an overarching framework to predict questions’ likelihood of attracting answers in CQAs.
Therefore, this paper seeks to shed light on the conundrum of unanswered questions in CQAs through the following two-fold objectives. First, it aims to develop a conceptual framework known as the Quest-for-Answer to explain why some questions in CQAs draw answers while others are ignored. The framework suggests that answerability depends on both the metadata and the content of questions.
Second, the paper validates the Quest-for-Answer framework using data from Stack Overflow, a closed-domain CQA devoted to programming and software engineering. A total of 3000 questions divided equally between those answered and unanswered were used for analysis. In particular, Stack Overflow was chosen because it is currently one of the largest CQAs, with more than 1.6 million registered users who have contributed some 1 million questions and 2.8 million answers [10, 11]. Its popularity stems from the fact that earning badges of expertise in Stack Overflow serves as credentials for computer programmers seeking jobs [12]. Even though Stack Overflow is known to attract answers promptly [13], the volume of unanswered questions has been rising steeply in recent years [14]. This makes it a suitable platform for investigation.
On the theoretical front, this paper ventures into a relatively under-represented area of CQA research. The scope of most CQA studies hitherto has been confined to answer-related themes such as the quality [15–17] and speed [13, 18, 19] of answers. Research on question-related themes, on the other hand, is only budding. Therefore, developing the Quest-for-Answer framework to predict the answerability of questions in CQAs represents a timely endeavour. On the practical front, the findings might offer prescriptive guidelines for users posting questions in CQAs so as to improve their chances of receiving answers. Moderators of CQAs might also use the findings to devise strategies to minimize the volume of unanswered questions.
The remainder of this paper is organized as follows. The next section reviews the extant literature, and describes the Quest-for-Answer framework. The methods for data collection, operationalization, coding and analysis are presented next. This is followed by the results and the discussion. The paper concludes by highlighting its implications.
2. Literature review
2.1. CQAs
Most users rely on search engines as primary gateways to seek information. Those who are familiar with library services may also take advantage of virtual reference services to meet their needs for specialized information [20]. The advent of CQAs represents another convenient avenue for online information seeking. Users are now able to ask questions directly on the Web in order to harness the collective wisdom of crowds [21].
Users’ reliance on CQAs for information seeking has attracted considerable scholarly interest. In particular, a burgeoning area of research focuses on questions submitted in CQAs. A common objective involves developing question clustering algorithms [22] so that unanswered questions can be answered using the existing corpus of responses [23]. To further mitigate the problem of unanswered questions, studies such as Zheng et al. [24] sought to automatically recommend answer providers who could be interested in responding to specific questions.
Despite such scholarly efforts, the reasons why some questions in CQAs draw answers and others are ignored have not been comprehensively investigated. Among the few related studies, the presence of interrogative words such as ‘what’ and ‘where’ was found crucial in determining if a question will be answered [9]. Furthermore, vague questions that lack clarity are known to deter answers [8]. Building on such studies, this paper develops the Quest-for-Answer framework to examine the answerability of questions in CQAs.
2.2. The Quest-for-Answer framework
The clue to attracting answers could be found in features related to both the metadata and the content of questions. Metadata contain additional information about questions that are potentially related to their answerability. For example, questions asked by reputed askers, as indicated by reputation scores, might be more compelling than those submitted by askers with a mediocre reputation [25, 26]. Likewise, content of questions, such as the way questions have been phrased, could either engender or impede responses from answerers [7–9]. For instance, clearly phrased questions are usually better poised to attract answers compared with those that are vague.
Metadata features are those that are automatically captured and displayed in CQAs. Some of these are retrieved directly from the interface, while others could be derived in the form of ratios from the retrievable metadata. They are henceforth referred to as primary and secondary metadata respectively.
Three pertinent primary metadata are popularity and participation of askers, as well as asking time of questions. Popularity refers to the extent to which an asker is recognized in the community. Recognition could be shaped in part by the asker’s inherent ability to contribute to the CQA. Most CQAs display a reputation score to summarize individuals’ standing in the community. Any message contributed by a reputed source is viewed more favourably compared with one shared by an amateur source [27]. For instance, the answerer’s reputation shapes the perceived quality of answers in CQAs [25]. Extrapolating this to the context of questions, the asker’s reputation might shape the answerability of questions.
Participation is a measure of the level of activity of an asker in the CQA community. Most CQAs indicate the user’s participation by displaying the volumes of questions asked, and answers posted. An asker who participates regularly in asking and answering questions is perceived as an active member [26]. Questions posted by an active member might be deemed sincere, thereby making them likely to be answered.
Asking time refers to the hour of day as well as the day of week on which questions are posted in CQAs. In general, CQAs are not necessarily frequented by comparable user-bases throughout a day or a week [28]. If a question is posted at a time when numerous users are online, it might be answered. Hence, it could be interesting to study the extent to which a question’s asking time is related to its answerability.
Two secondary metadata could further determine if answers would be forthcoming in response to questions. These include an asker’s derived role and derived popularity. Derived role indicates the ways in which an asker contributes to the CQA community. In general, CQA users have two primary ways to contribute: asking and answering. The volumes of questions asked and answers posted by CQA users are known to shape their reputation in the community [29]. These could also suggest whether a user predominantly plays the role of an asker or an answerer, or both. If a question is posted by one who has the reputation of submitting frivolous enquiries indiscriminately, the CQA community might ignore it. In contrast, answers could be forthcoming in response to a question posted by someone who is known to contribute questions as well as answers proportionately.
Derived popularity refers to the extent to which an asker is trusted by the CQA community. Given that most CQAs allow users to like and dislike one another’s activities, the ratio of applause and criticism serves as a proxy for an asker’s derived popularity. Such ratios have been used in prior CQA studies as indicators for community acceptance [30]. Questions posted by askers who attract more applause than criticism could be likely to attract answers.
In addition to metadata, content features could also engender or impede responses to questions. This paper assesses the content of questions based on content structure and content quality. The former denotes the structural elements of questions, while the latter measures the extent to which questions are well articulated.
Two important content structure features are level of detail and specificity of questions. Level of detail is a measure of the extent to which questions extensively express the asker’s information needs. Overly sketchy questions might not attract answers. Overly lengthy questions could also deter responses [14]. However, Yang et al. [28] suggested that questions that were overly sketchy as well as those that were too lengthy were more likely to attract answers compared with those of moderate length. The lack of consensus calls for further investigation into the relationship between questions’ length and answerability.
Specificity refers to the extent to which questions precisely express the asker’s information needs. Precision of questions is enhanced through the use of specific interrogative words such as ‘what’ and ‘where’, which in turn, could make the enquiries attractive to answers [9].
Three content quality features could be further related to questions’ answerability. These include accuracy, clarity and socio-emotional value. Accuracy refers to the extent to which questions are correct. Questions marred by incorrect spellings or ambiguous logic could fail to attract answers [31, 32].
Clarity refers to the understandability of questions. Clearly articulated questions that express information needs comprehensively could be more likely to attract answers than those that are complex [8, 33].
Socio-emotional value refers to the extent to which questions express emotions and subjectivity. Askers could thank answerers in advance by expressing emotions of appreciation [34]. Moreover, subjective expressions in questions could make answers forthcoming by portraying the social context for the asker’s information needs [28, 32].
The features discussed above are aggregated into the Quest-for-Answer framework as shown in Table 1. Specifically, the framework identifies features related to both metadata and content that could explain why some questions remain unanswered in CQAs. Metadata includes five features, namely, popularity, participation, asking time, derived role and derived popularity. Content includes five features, namely, level of detail, specificity, accuracy, clarity and socio-emotional value.
Table 1. The Quest-for-Answer framework.
3. Methods
3.1. Data collection
As indicated earlier, data for this paper were drawn from Stack Overflow. The question–answering cycle in Stack Overflow commences when a user asks a question. Specifically, the CQA requires askers to post questions comprising two parts, namely, titles and descriptions. Stack Overflow allows askers to annotate questions with one or more tags such as ‘Java’ or ‘C++’ to indicate the subject matter of the enquiries. After receiving answers to a given question, the corresponding asker can choose a response as satisfactory. The question is then said to be accepted or resolved. Throughout this cycle, users can praise one another’s activity through upvotes, or criticize by casting downvotes. The extent to which users are popularly trusted by the community is indicated by their reputation scores.
For collecting data, SQL queries were executed in the Data Explorer service provided by Stack Exchange in February 2014. In particular, all answered and unanswered questions with the tag ‘Java’ that had been posted in Stack Overflow since 2008 were retrieved. The ‘Java’ tag was chosen because it emerged as the most popular tag as of January 2014. This is consistent with the fact that Java was the most popular programming language among IT professionals as of January 2014. The data collection yielded an initial pool of 21,801 questions (12,940 answered + 8861 unanswered). Of these, simple random sampling was used to identify a total of 3000 distinct questions (1500 answered + 1500 unanswered) for admission into the final dataset. This was necessary to make the dataset size manageable for content analysis (cf. Section 3.3).
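The sampling step can be illustrated with a minimal Python sketch. It assumes that the draw was made separately within the answered and unanswered pools, and it uses placeholder integer identifiers and a fixed seed; the function name `sample_balanced` is hypothetical and none of these details are taken from the study itself.

```python
import random

def sample_balanced(answered_ids, unanswered_ids, per_group=1500, seed=2014):
    """Draw per_group distinct IDs from each pool, without replacement."""
    rng = random.Random(seed)  # fixed seed is an assumption, used for repeatability
    answered = rng.sample(answered_ids, per_group)
    unanswered = rng.sample(unanswered_ids, per_group)
    return answered, unanswered

# Placeholder IDs standing in for the 12,940 answered and 8861 unanswered questions.
answered_pool = list(range(12940))
unanswered_pool = list(range(12940, 12940 + 8861))

answered, unanswered = sample_balanced(answered_pool, unanswered_pool)
print(len(answered), len(unanswered))
```

Sampling without replacement guarantees that the 3000 selected questions are distinct, as the text requires.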
For every question, the Data Explorer returned a total of some 36 data fields. Of these, two represented content, namely, question title and question description including code snippets (if any). The remainder included metadata, which could be grouped into four categories. The first entailed identifiers such as question identifiers and asker identifiers. The second comprised timestamps such as those for asking questions and the most recent activity (if any). The third entailed details about questions such as number of tags, scores and view counts. The fourth comprised details about askers such as reputation scores, as well as counts for upvotes and downvotes. Informed by the literature [14, 25, 29, 33], the features in the Quest-for-Answer framework were operationalized by a set of 20 measures that could be assessed from the retrieved data fields.
3.2. Operationalization
With respect to metadata, the popularity of askers was operationalized as their reputation score, as well as volumes of upvotes and downvotes [25]. Conceivably, popular askers could have higher reputation scores with more upvotes but fewer downvotes than less recognized individuals. An asker’s level of participation was operationalized as their duration of membership in years in the community, as well as the volumes of questions asked and answers posted [26]. For duration of membership in years, ceiling values were taken. Asking time was operationalized as the hour of a day and the day of a week when questions were posted [28]. These were computed from the raw timestamps of question posting using built-in functions offered in Microsoft Office Excel. An asker’s derived role was operationalized as the number of answers they had posted per question [29]. A lower ratio would indicate that the asker posted questions in CQAs without attempting to answer others’ queries. On the other hand, a higher ratio would indicate that the asker played an active role in the community not only by posting questions but also by answering. An asker’s derived popularity was operationalized as the number of upvotes garnered per downvote. A lower ratio would indicate lower derived popularity and vice-versa [30].
With respect to content, level of detail was operationalized based on the length of questions in words [14]. Since questions comprise two parts, titles and descriptions, the number of words in both was considered. Specificity of questions was operationalized as the number of tags, the presence of code snippets and the presence of interrogative words such as ‘what’, ‘when’, ‘where’, ‘which’, ‘who’, ‘whom’, ‘whose’, ‘why’ and ‘how’ [9]. Accuracy was operationalized as the extent of linguistic errors in questions (reverse-coded) [32]. Clarity entailed the completeness and the complexity (reverse-coded) of questions [33]. Socio-emotional value was operationalized as the politeness and subjectivity expressed in questions [34]. The operationalized measures for the features in the Quest-for-Answer framework are presented in Table 2.
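The derived metadata measures can be sketched in Python as follows. The study computed asking time in Excel, so this is only an equivalent illustration. Several details are assumptions rather than reported choices: membership is measured up to a hypothetical `collected_at` date, a year is taken as 365.25 days, and the upvotes-per-downvote ratio is left undefined (`None`) when an asker has no downvotes. All names and example values are made up.

```python
import math
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def metadata_measures(asked_at, joined_at, collected_at,
                      questions, answers, upvotes, downvotes):
    """Compute asking time, membership and the two ratio-based measures."""
    # Ceiling of membership duration in years (365.25-day year is an assumption).
    membership_years = math.ceil((collected_at - joined_at).days / 365.25)
    return {
        "membership_years": membership_years,
        "hour_of_day": asked_at.hour,                      # 0-23
        "day_of_week": DAY_NAMES[asked_at.weekday()],      # Monday is index 0
        "answers_per_question": answers / questions,       # derived role
        # Derived popularity; None when downvotes == 0 (assumed handling).
        "upvotes_per_downvote": upvotes / downvotes if downvotes else None,
    }

example = metadata_measures(
    asked_at=datetime(2014, 1, 17, 21, 30),
    joined_at=datetime(2011, 6, 1),
    collected_at=datetime(2014, 2, 1),
    questions=8, answers=12, upvotes=40, downvotes=5,
)
print(example)
```

An asker always has at least one question in this dataset, so the answers-per-question ratio needs no zero guard here; the downvote ratio does.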
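The content measures that can be read directly from the data might be computed along the following lines. This is a sketch under stated assumptions: words are split on whitespace, interrogative words are matched case-insensitively as whole alphabetic tokens in the title and description combined, and the code-snippet flag is passed in rather than detected. The function name and the sample question are hypothetical.

```python
import re

INTERROGATIVES = {"what", "when", "where", "which", "who",
                  "whom", "whose", "why", "how"}

def content_measures(title, description, tags, has_code):
    """Return length and specificity measures for one question."""
    # Lower-case alphabetic tokens, so "What's" yields the token "what".
    tokens = re.findall(r"[a-z]+", (title + " " + description).lower())
    return {
        "title_length": len(title.split()),
        "description_length": len(description.split()),
        "tag_count": len(tags),
        "has_code_snippet": bool(has_code),
        "has_interrogative": any(t in INTERROGATIVES for t in tokens),
    }

example = content_measures(
    title="How do I reverse a String in Java?",
    description="I tried a loop but it fails on empty input.",
    tags=["java", "string"],
    has_code=False,
)
print(example)
```

Whether code snippets were counted towards description length is not stated in the text, so the sketch simply counts whatever string it is given.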
Table 2. Operationalized measures for the features in the Quest-for-Answer framework.
3.3. Coding
The operationalized measures for all the metadata, as well as those for level of detail and specificity, could be obtained directly from the dataset. However, the operationalized measures pertaining to accuracy (linguistic errors), clarity (completeness and complexity) and socio-emotional value (politeness and subjectivity) required content analysis [35]. For this purpose, three research associates were recruited as coders. They were graduate students of information systems, and users of Stack Overflow. Moreover, they had adequate knowledge of Java programming.
The coders were asked to code the quality of questions in terms of the five dimensions, namely, linguistic errors, completeness, complexity, politeness and subjectivity, on a scale of 1 (lowest) to 5 (highest). The coding comprised two stages. In the first stage, the coders conferred among themselves to become acquainted with the five dimensions, and coded an initial set of 180 questions (90 answered + 90 unanswered) randomly drawn from the dataset. The mean pair-wise Cohen’s κ among the coders indicated a non-chance level of inter-coder agreement as follows: linguistic errors (κ = 0.744), completeness (κ = 0.752), complexity (κ = 0.786), politeness (κ = 0.751) and subjectivity (κ = 0.792).
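For readers unfamiliar with the agreement statistic, the mean pair-wise Cohen’s κ can be sketched as below. The sketch assumes the unweighted form of κ, which treats the 1 to 5 ratings as nominal categories; the text does not state whether a weighted variant was used. The ratings shown are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length lists of ratings."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from each coder's marginal distribution.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    # Undefined (division by zero) if both coders use one identical category.
    return (observed - expected) / (1 - expected)

def mean_pairwise_kappa(ratings_by_coder):
    """Average kappa over every pair of coders."""
    pairs = list(combinations(ratings_by_coder, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

# Made-up ratings from three coders on ten questions.
coder_1 = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
coder_2 = [1, 2, 3, 4, 5, 1, 2, 3, 4, 4]
coder_3 = [1, 2, 3, 3, 5, 1, 2, 3, 4, 5]
print(round(mean_pairwise_kappa([coder_1, coder_2, coder_3]), 3))
```

With three coders there are three pairs, so each reported κ in this paper would be the average of three pair-wise values under this reading.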
In the second stage, the coders separately coded the remaining 2820 questions. Each coder received comparable volumes of answered and unanswered questions. To avoid biases, coders were not apprised of whether a given question was answered or unanswered throughout the duration of the content analysis. Moreover, they were requested not to look up the question’s status on the Internet.
3.4. Data analysis
Hierarchical logistic regression was used for data analysis. The dependent variable (DV) was whether questions were answered or unanswered. It was dummy coded such that 1 indicated answered questions, and 0 denoted unanswered ones. The independent variables (IVs) included the 20 operationalized measures of the features in the Quest-for-Answer framework. Four of these IVs were categorical in nature. These included hour of the day (24 categories) and day of the week (seven categories) of asking questions, as well as the presence or absence of code snippets and interrogative words in questions (two categories for each). For each categorical IV, indicator contrast was used for analysis with the last category as the reference.
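Indicator contrast with the last category as the reference can be illustrated as follows. The ordering of categories is an assumption made for the sketch (hours 0 to 23, Monday to Sunday, and ‘absent’ before ‘present’); the text does not list the order actually used.

```python
HOURS = list(range(24))
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
PRESENCE = ["absent", "present"]  # 'present' is the reference in this sketch

def indicator_code(value, categories):
    """Return k-1 dummy values; the last category maps to all zeros."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories[:-1]]

print(indicator_code("Monday", DAYS))   # one dummy set for Monday
print(indicator_code("Sunday", DAYS))   # reference category: all zeros

# Number of predictors implied by this coding scheme.
n_dummies = (len(HOURS) - 1) + (len(DAYS) - 1) + 2 * (len(PRESENCE) - 1)
n_continuous = 20 - 4   # 20 measures, of which four are categorical
n_controls = 3
print(n_dummies + n_continuous + n_controls)
```

Under these assumptions the coding yields 31 dummy variables, which together with the 16 continuous measures and three control variables gives 50 predictors. This matches the 50 degrees of freedom reported for the Omnibus test of the final model, although the paper does not spell out the breakdown.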
The hierarchical logistic regression analysis comprised three models. The first model included three control variables, namely, question’s age (in days), view count and asker’s profile view count because each of these variables could have a positive relationship to the question’s answerability. Moreover, they could be related to some of the operationalized measures such as reputation score, upvotes and downvotes, which were retrieved at the point of data collection and not at the time when a given question was asked. The second model comprised the IVs pertaining to metadata. The third entailed the IVs pertaining to content. Such a hierarchical approach allowed examination of the extent to which metadata and content could independently predict answerability of questions even after accounting for control variables.
Prior to analysis, the logistic regression model was checked for multicollinearity. A variance inflation factor value of <10 for all the IVs confirmed that multicollinearity was not a problem [36]. Thereafter, the performance of each of the three models was probed using the Omnibus test and the Hosmer–Lemeshow goodness-of-fit test. The extent to which the IVs in the models could account for the variability in the DV was examined using Cox and Snell, as well as Nagelkerke pseudo-R2 measures. The classification accuracy was also checked. The relationship between each IV and the DV was examined in terms of odds ratio.
4. Results
Table 3 presents the descriptive statistics of answered questions (N = 1500) and unanswered questions (N = 1500) pertaining to the continuous IVs. The distributions of answered and unanswered questions in the dataset across hour of the day as well as day of the week are depicted in Figures 1 and 2, respectively. A total of 1820 questions (827 answered + 993 unanswered) in the dataset were found to contain code snippets. Of the remaining 1180 questions, 673 were answered and 507 were unanswered. In addition, a total of 669 questions (409 answered + 260 unanswered) contained interrogative words. Of the remaining 2331 questions, 1091 were answered and 1240 were unanswered.
Table 3. Descriptive statistics (means ± standard deviations, SD) of answered and unanswered questions.

Figure 1. Distribution of answered and unanswered questions in the dataset across the hour of the day.

Figure 2. Distribution of answered and unanswered questions in the dataset across the day of the week.
As evident from Table 3, several IVs had standard deviations greater than means. This in itself is not an anomaly. Several prior studies have documented variables for which standard deviations exceeded means [37–40]. Nonetheless, this hints at possible non-normality, which might interfere with the multivariate analysis of logistic regression. To mitigate this problem, variable transformation approaches such as logarithm transformation or square root transformation are advocated [41–43]. Specifically, square root transformation was applied to nine IVs, namely, reputation score, upvotes, downvotes, membership, questions, answers, answers per question, upvotes per downvote and description length. Logarithm transformation was deliberately avoided as it was found to result in multicollinearity problems.
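The effect of the square root transformation on a right-skewed variable can be seen in a small sketch. The reputation scores below are invented solely to mimic a distribution whose standard deviation exceeds its mean; they are not drawn from the dataset.

```python
import math
import statistics

def sqrt_transform(values):
    """Apply the square root transformation to non-negative values."""
    return [math.sqrt(v) for v in values]

def sd_to_mean(values):
    """Ratio of the sample standard deviation to the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Made-up, heavily right-skewed reputation scores.
reputation = [1, 5, 12, 30, 75, 210, 1500, 9800]
transformed = sqrt_transform(reputation)

print(round(sd_to_mean(reputation), 2))    # spread relative to mean, raw
print(round(sd_to_mean(transformed), 2))   # spread relative to mean, transformed
```

The transformation compresses large values more than small ones, so the spread relative to the mean shrinks, which is the motivation given in the text for applying it before the regression.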
Table 4 presents the results of the three logistic regression models. For brevity, only IVs with statistically significant odds ratio in the final model are shown. The control variable of questions’ age was positively related to the DV (exp(β) = 1.002, p < 0.001), indicating that questions that had been posted earlier were more likely to attract answers compared with those submitted recently.
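Because the odds ratio for question age is expressed per day, its practical size is easier to judge once compounded over longer periods. The sketch below assumes that the effect is linear in the log-odds and that all other IVs are held constant; the function name is hypothetical.

```python
def compounded_odds_ratio(per_unit_odds_ratio, units):
    """Odds ratio implied by a difference of `units` in the predictor."""
    return per_unit_odds_ratio ** units

for days in (30, 365):
    print(days, round(compounded_odds_ratio(1.002, days), 3))
```

Under these assumptions, a question that is a year older has roughly double the odds of being answered, even though the per-day odds ratio is barely above 1.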
Table 4. Results of the hierarchical logistic regression analysis.
*** p < 0.001; ** p < 0.01; * p < 0.05; a Square root transformed variables.
With respect to metadata, volume of downvotes was positively related to questions’ likelihood to attract answers (exp(β) = 1.124, p < 0.05). Interestingly, answers were generally forthcoming in response to questions submitted by askers with numerous downvotes. Duration of the asker’s membership was negatively related to the DV (exp(β) = 0.524, p < 0.001). This indicates that questions posted by askers who were relatively new in the community were attractive to answers. Volume of questions asked by askers was, however, positively related to the DV (exp(β) = 1.102, p < 0.001). This shows that questions posted by askers who had submitted large numbers of queries were likely to be answered. In terms of asking time, questions posted during the following hours of the day were likely to attract answers: hour 20 (exp(β) = 2.229, p < 0.05), hour 21 (exp(β) = 2.063, p < 0.05) and hour 22 (exp(β) = 1.790, p < 0.05). Moreover, questions posted on the following days of the week deterred responses: Monday (exp(β) = 0.459, p < 0.001), Tuesday (exp(β) = 0.619, p < 0.05), Wednesday (exp(β) = 0.599, p < 0.01) and Thursday (exp(β) = 0.551, p < 0.01).
With respect to content, title length of questions was negatively related to the question’s answerability (exp(β) = 0.950, p < 0.001). Likewise, the description length of questions was negatively related to the DV (exp(β) = 0.919, p < 0.001). These indicate that questions with succinct titles and descriptions were more likely to attract answers compared with those that were verbose. The number of tags provided in questions was also negatively related to the DV (exp(β) = 0.703, p < 0.001). The fewer the tags associated with a question, the more likely it was to receive an answer. The absence of code snippets was positively related to the DV (exp(β) = 1.641, p < 0.001). In other words, questions without code snippets were more likely to attract answers. The absence of interrogative words such as ‘what’ and ‘where’ in questions was negatively related to the DV (exp(β) = 0.642, p < 0.001). Stated otherwise, questions without interrogative words deterred responses. Completeness of questions was positively related to the DV (exp(β) = 2.045, p < 0.001). This indicates that questions that expressed information needs comprehensively were more likely to attract answers compared with those that were incomplete. Complexity of questions was negatively related to the DV (exp(β) = 0.363, p < 0.001), indicating that less complex questions were more likely to attract answers. Politeness in questions was also negatively related to the DV (exp(β) = 0.901, p < 0.05). In other words, less polite questions were ironically found attractive to answerers.
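As a rough cross-check on the two binary content measures, unadjusted odds ratios can be computed from the counts of answered and unanswered questions reported at the start of this section. These crude ratios ignore every other IV and the control variables, so they are not expected to equal the adjusted values from the regression; the function name is hypothetical.

```python
def odds_ratio(answered_a, unanswered_a, answered_b, unanswered_b):
    """Odds of being answered in group A relative to group B."""
    return (answered_a / unanswered_a) / (answered_b / unanswered_b)

# Questions without code snippets (673 / 507) versus with (827 / 993).
no_code_vs_code = odds_ratio(673, 507, 827, 993)

# Questions without interrogative words (1091 / 1240) versus with (409 / 260).
no_interrogative_vs_interrogative = odds_ratio(1091, 1240, 409, 260)

print(round(no_code_vs_code, 2))
print(round(no_interrogative_vs_interrogative, 2))
```

The crude ratios come out at about 1.59 and 0.56, pointing in the same direction as the adjusted odds ratios of 1.641 and 0.642 reported above.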
The final logistic regression model corresponding to the Quest-for-Answer framework was statistically significant based on the Omnibus test in predicting the answerability of questions (χ2 = 1347.32, d.f. = 50, p < 0.001). A non-significant Hosmer–Lemeshow goodness-of-fit test (χ2 = 6.24, d.f. = 8, p > 0.05) further confirmed good model fitness. Based on Cox and Snell, as well as Nagelkerke pseudo-R2 measures, the model accounted for about 36–48% of variability in the DV. A classification accuracy of 77.50% was recorded to distinguish between answered and unanswered questions.
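The reported pseudo-R2 range can be reproduced from the Omnibus statistic. The sketch assumes that the Omnibus χ2 is the likelihood-ratio statistic of the final model against an intercept-only model, and that the null model predicts the observed base rate of 1500 answered questions out of 3000. The formulas are the standard Cox and Snell and Nagelkerke definitions; the function names are hypothetical.

```python
import math

def cox_snell_r2(model_chi_square, n):
    """Cox and Snell R2 = 1 - exp(-chi2 / n)."""
    return 1 - math.exp(-model_chi_square / n)

def nagelkerke_r2(model_chi_square, n, n_positive):
    """Cox and Snell R2 rescaled by its maximum attainable value."""
    p = n_positive / n
    # Log-likelihood of the intercept-only model.
    null_ll = n_positive * math.log(p) + (n - n_positive) * math.log(1 - p)
    max_r2 = 1 - math.exp(2 * null_ll / n)
    return cox_snell_r2(model_chi_square, n) / max_r2

cs = cox_snell_r2(1347.32, 3000)
nk = nagelkerke_r2(1347.32, 3000, 1500)
print(round(cs, 3), round(nk, 3))
```

With a balanced sample the maximum Cox and Snell value is 0.75, so the two measures come to roughly 0.36 and 0.48, in line with the range of 36–48% stated above.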
5. Discussion
Three main findings were gleaned from the results. First, questions posted by askers with numerous downvotes and short duration of membership were found to be likely to be answered. Additionally, questions asked by those who had submitted large volumes of queries were more attractive to answerers compared with those posted by askers who had submitted few queries. The community perhaps welcomes questions from newly registered users as well as those with several downvotes out of altruism. In addition, it seems to favour questions from those who are active in posting queries. Interestingly, it was found that an asker’s reputation score and volume of upvotes, as well as derived role and derived popularity, were not significantly related to answerability. The community appears to accept questions from all users – novices and experts alike – without engaging in online hegemony. This contradicts Yang et al. [28], who found that questions posted by expert askers were more likely to be answered in Yahoo! Answers compared with those asked by newcomers.
Second, the ways in which questions were phrased were significantly related to their answerability. Questions with short titles, short descriptions, and few tags attracted answers. This is consistent with Saha et al. [14], who found that answered questions were shorter on average compared with unanswered ones in Stack Overflow. Interestingly, Yang et al. [28] found that sketchy as well as lengthy questions were more likely to be answered in Yahoo! Answers compared with those of moderate length. However, lengthy questions in Stack Overflow failed to attract answers. In another related study that analysed some 385 questions from Stack Overflow, questions with code snippets were likely to attract answers [44]. In contrast, using a dataset of 3000 questions, this paper found that such questions were less likely to be answered compared with those without code snippets. Although Stack Overflow is supposed to deal with programming issues, the community appears disinclined to respond to questions with code snippets. Perhaps, such questions are perceived as being lazy [45]. Moreover, evaluating code snippets, especially lengthy ones, and pointing out syntactical or logical errors could be daunting. If a suggested answer fails to work, the unsatisfied asker might seek further clarification from the answerer, leading to rounds of back-and-forth exchanges. This could be another possible reason why questions with code snippets deter responses.
Moreover, the presence of interrogative words in questions, as well as their completeness, was related to questions’ answerability [8, 9]. Conceivably, complex questions were unlikely to be answered. Consistent with prior studies on Yahoo! Answers [28], questions with an overly polite tone were found to be unlikely to be answered. Askers could use polite words in questions to thank potential answerers in advance. However, the use of such words perhaps backfires by serving as a distraction. Being a technical forum, the Stack Overflow community might not appreciate formalities in questions. Interestingly, the presence of linguistic errors in questions did not necessarily deter responses. This is inconsistent with Kitzie et al. [32], who suggested that questions plagued with linguistic errors in Yahoo! Answers would fail to attract answers. Stack Overflow seems to be patronized by users who are not too particular about the linguistic accuracy of questions as long as the queries are technically understandable.
Third, the time of posting questions in CQAs was significantly related to questions’ likelihood of attracting answers. Questions posted between 8 and 11 p.m. were generally most receptive to answers. Consistent with prior studies on Stack Overflow [8], the volumes of unanswered questions were generally lower at week-ends compared with week days. Interestingly, the majority of questions in Stack Overflow appeared to be posted from 6 a.m. onwards until 7 p.m. (Figure 1), and from Monday to Friday (Figure 2). Such an asking time strikingly overlaps with business hours to a large extent. Given that Stack Overflow is a CQA devoted to programming and software engineering, it is perhaps widely used by IT professionals as an information-seeking platform for their office work. However, the findings suggest that askers should avoid asking questions during business hours. Instead, it might be better to submit questions between 8 and 11 p.m. on Friday, Saturday and Sunday to maximize chances of receiving answers. The findings gleaned by validating the Quest-for-Answer framework using data drawn from Stack Overflow are summarized in Table 5.
Table 5. Findings in Stack Overflow pertaining to the Quest-for-Answer framework.
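The posting-time comparison discussed above can be illustrated with a minimal sketch. It is not the procedure used in this paper; it only shows one way to tabulate the share of answered questions by time slot and by day type. The function names (`answered_share_by`, `time_slot`, `day_type`), the exact hour boundaries and the five toy records are assumptions made purely for illustration and are not drawn from the study's dataset.

```python
from collections import defaultdict
from datetime import datetime


def answered_share_by(records, key):
    """Return {bucket: share of answered questions} for a bucketing function."""
    totals = defaultdict(int)
    answered = defaultdict(int)
    for posted_at, is_answered in records:
        bucket = key(posted_at)
        totals[bucket] += 1
        if is_answered:
            answered[bucket] += 1
    return {bucket: answered[bucket] / totals[bucket] for bucket in totals}


def time_slot(ts):
    # Assumed boundaries: "8 to 11 p.m." is read as hours 20-22,
    # and "6 a.m. to 7 p.m." as hours 6-18.
    if 20 <= ts.hour < 23:
        return "evening (8-11 p.m.)"
    if 6 <= ts.hour < 19:
        return "daytime (6 a.m.-7 p.m.)"
    return "other"


def day_type(ts):
    # datetime.weekday(): Monday is 0, Sunday is 6.
    return "weekend" if ts.weekday() >= 5 else "weekday"


# Made-up toy records: (posting time, whether the question was answered).
records = [
    (datetime(2024, 1, 1, 10, 0), False),   # Monday morning
    (datetime(2024, 1, 1, 21, 0), True),    # Monday evening
    (datetime(2024, 1, 2, 14, 30), True),   # Tuesday afternoon
    (datetime(2024, 1, 6, 20, 15), True),   # Saturday evening
    (datetime(2024, 1, 7, 9, 0), False),    # Sunday morning
]

by_slot = answered_share_by(records, time_slot)
by_day = answered_share_by(records, day_type)
print(by_slot)
print(by_day)
```

Passing the bucketing rule as a function keeps the tabulation separate from the choice of time boundaries, so the same routine could be reused with other definitions of business hours.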
6. Conclusion
This paper examined the answerability of questions in Stack Overflow, and its theoretical contribution is three-fold. First, it enriches the CQA literature by developing the Quest-for-Answer framework. Building on prior studies [7–9, 14], the framework identifies 10 features related to metadata and content that could be associated with questions’ answerability. With respect to metadata, the asker’s popularity, participation and asking time of questions were significant in predicting if answers would be forthcoming. With respect to content, the level of detail, specificity, clarity and socio-emotional value of questions were significant in enhancing or impeding responses. The findings help explain why some questions in CQAs draw answers while others remain unanswered.
Second, this paper enriches the current body of work on questions’ answerability by revealing findings that are both consistent and inconsistent with prior studies. Specifically, vague and long questions deter answers [8, 14], while questions that contain specific interrogative words [9] and are not overly polite [28] appear to be answerable. However, questions posted in Stack Overflow could still attract answers even if they were marred by linguistic errors. This is contrary to a prior study done in Yahoo! Answers [32]. It could well be that users in Stack Overflow focus more on the technical aspects of questions than on their linguistic deficiencies. Also, in another study conducted using a small dataset of 385 questions in Stack Overflow, question answerability correlated with the use of code snippets [44]. Yet in this paper, which draws data from 3000 questions, the opposite was found to be true.
Third, this paper represents one of the earliest attempts to categorize metadata into primary and secondary based on their level of conspicuousness. Specifically, those retrieved directly from the interface were deemed primary, while those derivable from primary metadata were referred to as secondary. Even though primary metadata features such as the volume of downvotes received by askers were significantly related to questions’ answerability, neither of the two secondary metadata features, namely, derived role and derived popularity, was associated with the dependent variable. The non-significant relationship could be attributed to answerers’ patterns of observational learning. Perhaps primary metadata, which are generally more conspicuous than secondary metadata, play a greater role in eliciting responses from answerers.
On the practical front, this paper offers insights into how and when questions should be posted in Stack Overflow for answers to be forthcoming. It suggests that askers should ask clear and succinct questions without being overly polite. They should use specific interrogative words, but abstain from using code snippets. Moreover, they would do well to post questions between 8 and 11 p.m. on Friday, Saturday and Sunday. Moderators of Stack Overflow could also use the findings to educate askers on ways to ask questions. This is especially important for Stack Overflow given that it is widely frequented by employees of IT companies. This paper therefore has community management implications for effective knowledge sharing among IT professionals.
Three limitations of this paper need to be addressed by future research. First, it investigated the Quest-for-Answer framework only in Stack Overflow. Future research could validate the framework by drawing data from multiple platforms. The operationalized measures of metadata could be tweaked to suit other CQAs. For example, since CQAs such as Yahoo! Answers lack the upvote–downvote functionality for askers, it could be replaced by the proportion of best answers and non-best answers. Validating the framework is especially necessary because over 50% of the variability in questions’ answerability remained unexplained. However, triangulation with large datasets would require considerable time and effort given the content analysis involved. Computational linguists might hence explore the possibility of developing automated tools to measure features such as socio-emotional value.
Second, this paper dichotomized questions as either answered or unanswered, ignoring whether any of the answers received was marked as accepted. Since answered questions do not always have an accepted answer, future studies might investigate nuances among questions with an accepted answer, questions that were answered without any answer being accepted, and unanswered questions to yield richer findings.
Third, the cross-sectional nature of the dataset prevents any inference of causality [46]. For example, although questions posted by askers with several downvotes were found likely to be answered, it could not be verified whether the former caused the latter. Future research could rely on experimental designs to examine causality in terms of the extent to which the metadata and the content of questions motivate answering intentions. It could be interesting to study the ways in which conspicuous features such as the asker’s reputation score differ from relatively inconspicuous features such as questions’ description length in shaping answerers’ knowledge-sharing intentions in CQAs. Answerers’ individual differences could also be taken into account to offer greater nuance.
Acknowledgements
The authors would like to acknowledge Gary Kuen for his help in preparing the initial draft of this paper.
Funding
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
