Abstract
During the methods crisis in psychology and other sciences, much discussion developed online in forums such as blogs and other social media. Hence, this increasingly popular channel of scientific discussion itself needs to be explored to inform current controversies, record the historical moment, improve methods communication, and address equity issues. Who posts what about whom, and with what effect? Does a particular generation or gender contribute more than another? Do blogs focus narrowly on methods, or do they cover a range of issues? How do they discuss individual researchers, and how do readers respond? What are some impacts? Web-scraping and text-analysis techniques provide a snapshot characterizing 41 current research-methods blogs in psychology. Bloggers mostly represented the demographic categories of psychology’s traditional leadership: primarily male, mid- to late career, associated with American institutions, White, and with established citation counts. As methods blogs, their posts mainly concern statistics, replication (particularly statistical power), and research findings. The few posts that mentioned individual researchers substantially focused on replication issues; they received more views, social-media impact, comments, and citations. Male individual researchers were mentioned much more often than female researchers. Further data can inform perspectives about these new channels of scientific communication, with the shared aim of improving scientific practices.
Science is renewed by periodic disruptions, and the current methods crisis seems to be no exception to this abrupt form of self-correction. In addition to gradual types of feedback—peer review, replication and extension, competitive model testing—more urgent paradigm shifts also have a role in self-correction. For example, in the late 1970s social psychology underwent a scientific crisis arguably worsened by incomplete methods reporting and inadequate operationalization (Greenwald, 1976). Methodological reforms suggested reporting effect sizes and conducting power analyses (Rosenthal, 1979). Critics also identified the crisis as having theoretical roots (Deutsch, 1976; Smith, 1978).
Psychology’s current crisis revolves more pointedly around improving methods. The field is debating false-positive results, neglect of null findings, confirmation bias in data analysis, and incomplete reporting, among others (Nelson, Simmons, & Simonsohn, 2018). Questionable research practices, unsophisticated power analyses, and neglect or misuse of meta-analyses add to the issues discussed (Shrout & Rodgers, 2018). Proposed remedies often include open-science norms (Munafò et al., 2017) such as preregistering hypotheses and analyses, more complete reporting, posting data and code, and more open peer review.
The current crisis in psychology also stands out in a different way. Much of the discussion surrounding these topics seems to have developed online, as social media have become popular forums for research methods and metascience. For example, on Facebook, two general psychology-methods discussion groups were created in 2015 (perhaps the first of their kind), the larger of which has more than 12,000 members. Many of the authors at the forefront of the methods discussion maintain blogs (e.g., Nelson et al., 2018). One blogger’s post reportedly had more views than the journal article to which it was a response (Lakens, 2017). However, beyond these informal observations, few sources report usage trends in scientific social media. Thus, here we provide an initial exploration of the author demographics, properties, content, and impact of one of these online forums: research-methods blogs.
Blogs have the potential to reach large audiences, promote discussion, and shape scientific norms faster than traditional channels. However, some authors have voiced concerns about the current online discourse being often unmoderated, arguing, for example, that online criticism of an individual researcher across social media can become excessive (Fiske, 2016). Others have spoken out against the “(relatively infrequent) occasions when disagreements erupt into personal attacks and harassment” (Klaiman, 2017, para. 8) both online and offline. Some have argued that these behaviors could lead to intolerance for new research ideas (Sabeti, 2018), whereas others have speculated that they relate to researchers’ feelings of disengagement with the field (West & Skitka, 2017). Other much-discussed issues include blogger diversity (e.g., race, gender): Some point out blogs’ ability to provide everyone with a voice (e.g., Lakens, 2017), whereas others discuss demographic discrepancies in who participates (e.g., West & Skitka, 2017; see next sections). Of course, most of these concerns apply to more traditional scientific-communication venues, such as peer reviews and conferences. However, social media provide an accessible format for studying these issues in a quantifiable manner. Nonetheless, we caution throughout against drawing contrasting inferences between scientific forums from our results alone. This article is exploratory.
Why Blogs?
We focus on blogs among other social media for several reasons, all supported by data (see Results). First, blogs have appeared as scientific forums for much longer than other social-media outlets such as Facebook, providing data over several years and making them a potentially more established medium. Second, blog posts are longer than posts on other social media, allowing for a more in-depth analysis. Finally, several other minor concerns favored using blogs over other data-source options; for example, blogs allow the possibility of collecting variables such as post citations.
Documenting the features of psychology-methods blogs matters to our current science because it constructively describes their approach to improving psychological methods, and it can inform their continuing contributions to methods development. In addition, recording a slice of current history, such as documenting some of blogs’ basic features, will facilitate later comparisons to current trends. Previously discussed ongoing controversies about blogs may also benefit from evidence such as that provided herein. Finally, evidence can inform some equity concerns about who blogs, what is blogged, about whom, and with what effect. In each case, the current exploration is potentially useful because a priori the results are not obvious; indeed, the authors disagreed about what we might find, and this is new territory, so we opted for an exploratory approach.
Who Blogs?
One perspective could be that, because blogs afford democratic open access to voice, we might expect better demographic representation; a contrasting view is that traditional power structures would reproduce in this new medium. For gender, male bloggers could predominate, for example, because of traditional gender patterns in emergent leaders (e.g., Eagly & Karau, 1991) or assertiveness (e.g., Feingold, 1994). Another possibility is that women blog more than men do because blogging might fit gendered leadership roles (e.g., democratic, participatory, communal leadership; Eagly & Karau, 1991) or because of the increased numbers of female academics (e.g., in the United States; Burrelli, 2008; National Science Foundation, 2015). Likewise, blogger career stage would be hard to predict. Because blogging is a newer digital medium, younger bloggers might predominate. Alternatively, more established researchers might enjoy a larger reach and thus blog more often. Given blogs’ role in scientific communication, a lack of diverse voices could be detrimental to equity and perhaps even to minorities’ attitudes toward the medium itself, as well as to its future adoption patterns.
Blogs What?
Regarding what blogs say, the controversies described earlier represent two contrasting points of view: One is that the blogs contain entirely technical topics (e.g., statistics); the other is that they mainly discuss specific individuals’ research. Both perspectives are unrealistic prototypes, but they illustrate the endpoints of a potential distribution of topics. A quantitative examination of blogs’ contents is lacking. Content analysis describes the historical moment and how blogs are approaching methods improvement (e.g., determining which topics, such as meta-analyses or statistical power, to emphasize and cover).
Blogs About Whom?
Coverage of researchers as the focus of criticism could inform controversies and equity concerns. For example, women score lower than men on some metrics of influence (Diener, Oishi, & Park, 2014). Thus, men could be mentioned more often to the extent that more publications and impact are a factor in social-media postpublication review. Alternatively, this same power imbalance, as well as gender bias, could lead to higher rates of mentioning female researchers (e.g., see Elsesser, 2018). Women might also be salient because they are more cited per article than men (in at least one flagship journal; Cikara, Rudman, & Fiske, 2012).
Blogs With What Effects?
Finally, several possibilities emerge for the impact of blogging. For example, gender and career-stage differences could reflect the previously noted patterns in more traditional outlets (e.g., journal publications). Alternatively, blogs may provide a democratized space that allows different demographics to have a more equal reach (Lakens, 2017).
Overall, as noted, analyzing blogs could inform how to improve methods, describe current historical trends, bring data to controversies, and address equity concerns. These are among the motivations for exploring the who, what, whom, and effects of methods blogs.
Precedents for Who Posts What About Whom, and With What Effects?
A few scattered studies are beginning to describe social media’s roles in psychology. For example, in a sample of 327 speakers at the 2016 Psychonomic Society conference, 19% had Twitter accounts and 6% had blogs (Weinstein & Sumeracki, 2017). These findings suggest a passive use of social media: Few are leading the conversation, but many more may be following it. In addition, women in the sample tended to have Twitter accounts more often than men. As another example, a short demographic analysis of bloggers (described below) appeared, but it did not further explore the characteristics of their blogs (PsychBrief, 2017).
Others have described participants’ self-reports about platforms other than blogs. For example, a survey of members of the Society for Personality and Social Psychology listserv found that men reported using Facebook groups more often than women did; the most common reported uses were reading about and sharing new research findings and methods, as well as learning academic gossip (West & Skitka, 2017). In addition, a nontrivial number of respondents reported feeling personally attacked in the groups (or on Twitter) at least once (54% said never; 12% said sometimes).
Overview
Given the lack of data on the characteristics of psychology’s social media (especially blogs), the discussions that have surrounded these forums, and their apparently increasing role in setting scientific norms, the current article aims to provide additional insights about research-methods blogs in psychology. As new forms of communication grow in popularity, understanding patterns of adoption, content, and impact might provide insights into how to maximize their positive influence toward improving our science. We explore who is posting, what they are posting, who they are mentioning, and how people are engaging with them. We use web-scraping and natural-language processing to describe some properties of a sample of current blogs, including their general characteristics (e.g., frequency of posts), bloggers (e.g., demographics), content (e.g., topics covered), and impact (e.g., post citations).
Reported results are strictly exploratory (p values are not reported in the main text but are included in Table S6 in the Supplemental Material available online; uncertainty measures are provided not for inferential purposes but only as an exhortation that estimates not be interpreted as overly precise). All data and code are available at https://osf.io/qf8zv/.
Method
Materials and procedures
This section describes how we obtained the blog sample as well as how we operationalized who is blogging (blogger characteristics), what they are posting (topics), about whom (researcher mentions), and with what effects (engagement and citations).
Blog sample
A list of prominent and active research-methods blogs in psychology and related disciplines appears in PsychBrief’s blog feed (PsychBrief, n.d.), which included 43 blogs at the time of data collection (by April 11, 2017). However, only 41 blogs (11,539 posts) were usable because one of the links listed was a journal, and another’s server was unstable, so we could not access it reliably. The blog feed did cover a sizable number of research-methods blogs in the area at the time of data collection, and the source was independent from the authors’ judgment. We also sought suggestions from some experts in the field, all of which were included in the feed. Nonetheless, we are aware of at least 5 blogs at the time of collection that could meet the criteria of being research-methods blogs in psychology that were not included in the feed at collection time, and other readers would surely be aware of more. As with any convenience sampling, we note that this sample is incomplete and biased, including toward salience or other preferences the feed author might have (although note that the author of the feed had actively attempted to minimize gender, race, and career-stage bias by the time of data collection; PsychBrief, 2017). See the Supplemental Material for a description of web scraping and preprocessing.
Some computed variables describe blog characteristics, including the number of posts, date of blog creation, and posting rate (number of posts divided by weeks since date of creation). For analyses, these are blog-level variables (i.e., they do not vary within blogs). Variables in the following sections may vary between posts of the same blog (i.e., they are at the post level).
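As a minimal sketch of the posting-rate variable described above (posts divided by weeks since creation), the following Python function computes the rate for one blog. The function name and the example blog are hypothetical illustrations, not the authors' code; only the collection date (April 11, 2017) comes from the text.

```python
from datetime import date


def posting_rate(n_posts: int, created: date, collected: date) -> float:
    """Posts per week between blog creation and data collection."""
    weeks = (collected - created).days / 7
    if weeks <= 0:
        raise ValueError("collection date must fall after the creation date")
    return n_posts / weeks


# Hypothetical blog: 43 posts, created 172 weeks before the collection date.
rate = posting_rate(43, date(2013, 12, 24), date(2017, 4, 11))
print(rate)  # 0.25 posts per week
```

Because the rate is a blog-level quantity, it takes a single value per blog and does not vary across that blog's posts.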
Blogger characteristics: who?
There were 70 total identifiable bloggers across the 41 blogs, 5 of which were multiauthored. Bloggers’ demographic data (gender, nation of institutional affiliation, and career stage) were obtained from various sources. We used institutional websites such as the blogger’s or university website or reputable journalistic sources to code gender based on pronouns. Thus, our coding may not match some of the bloggers’ identities. Missing values remained, not only for secondary authors in multiauthored blogs but also for four primary authors (2 of whom were anonymous bloggers).
Early-career researcher (ECR) was defined as someone who has been hired to perform research within the past 5 years (Research Excellence Framework, 2014); we included graduate students and postdocs in this category. Because of the time-changing nature of this coding, we also categorized as ECRs bloggers who met the criteria for most of their posts. ECR determinations were based on the bloggers’ websites and curriculum vitae. In addition, to provide some insight into the geographic distribution of blogs, we coded the country of institutional affiliation of the bloggers from sources such as Google Scholar.
For analyses, gender was further recoded into male or female and career stage into ECR or not-ECR. Guest authors are uncommon for most blogs and were not included in the analyses; specific authors were identified only when the information was available as metadata. For multivariate analyses, we focus on gender and career stage as the data most relevant to current controversies and for which we considered the coding reliable for most bloggers.
For models at the blog level, we recoded demographics such that multiauthored blogs with gender-congruent data (e.g., two female bloggers) were coded accordingly (three blogs); otherwise, if the blog had a lead author (based on the author page, blog metadata, and number of posts), his or her information was used (two blogs).
Topics: what?
After preprocessing the text, we used latent Dirichlet allocation (LDA; Blei, Ng, & Jordan, 2003) to model the topics covered by the posts. LDA is a probabilistic generative model that attempts to discover latent semantics embedded in document collections. LDA assumes that topics can be represented as word distributions, such that each topic is associated with each word from the posts to different extents. For example, one of the topics suggested by the model (which we later labeled to be about teaching) had high probabilities for words such as “student” and “class” but low probability for words unrelated to teaching, such as “Bayesian” or “Republican.”
LDA also assumes that documents (i.e., posts) can be represented as distributions of topics, such that each post can cover a variety of topics to different extents. For example, LDA suggested that a post titled “Bayesian Perspectives on Publication Bias” had high proportions of words assigned to two topics, which we later labeled replication (52%) and statistics (32%), whereas more than 75% of the words in a post titled “Why Brain Scanners Make Your Head Spin” were assigned to a topic we labeled neuroscience. Thus, through LDA, we obtained the number and proportion of words within each post that came from the different topics discovered (for more information, see the Supplemental Material; for a comprehensive review, see also Griffiths, Steyvers, & Tenenbaum, 2007).
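To illustrate how per-post topic proportions follow from word-level topic assignments, the sketch below assumes that a fitted topic model has already labeled every word token in a post with one topic. It does not fit LDA itself; the function name and the 25-word example post are hypothetical, chosen only to reproduce shares like the 52% and 32% in the example above.

```python
from collections import Counter


def topic_proportions(assignments):
    """Map each topic to its share of a post's word tokens.

    `assignments` holds one topic label per word token, as a fitted
    topic model might produce them. An empty post yields an empty dict.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


# Hypothetical post: 13 replication words, 8 statistics words, 4 others.
post = ["replication"] * 13 + ["statistics"] * 8 + ["other"] * 4
shares = topic_proportions(post)
print(shares["replication"], shares["statistics"])  # 0.52 0.32
```

The same counts, before division by post length, give the number of words per topic that the text describes.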
Tests to decide the number of topics (see Supplemental Material) suggested 22 topics. Next, two of the authors independently labeled the topics provided by the 22-topic LDA model. To arrive at a label, the authors looked at several indicators, including (a) the top words within each topic, (b) the distinctive words within each topic (e.g., “Stapel” occurred exclusively in the topic labeled fraud), and (c) example posts with high topic proportions. Nonetheless, topic labels may not fully represent some topics, as multiple labels may sensibly describe topics: Labeling prioritized issues of a priori interest (such as replication and fraud).
Topics interpreted as highly overlapping were combined, resulting in a grouping of 12 final topics, as shown in Table 1 (and detailed further in Table S1 in the Supplemental Material). For example, for each post, we added the number of words allocated to topics such as Bayesian statistics, frequentist statistics, regression, visualization, and other related topics, resulting in an overarching statistics topic word count.
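The combination step can be sketched as a simple sum of word counts over a topic-to-group mapping. The mapping and counts below are hypothetical; the actual grouping of the 22 topics into 12 is the one reported in Table S1, which this sketch does not reproduce.

```python
def combine_topics(word_counts, groups):
    """Sum per-topic word counts into broader topics.

    `groups` maps a fine-grained topic to its broader label; topics that
    are missing from `groups` keep their own name.
    """
    combined = {}
    for topic, n in word_counts.items():
        label = groups.get(topic, topic)
        combined[label] = combined.get(label, 0) + n
    return combined


# Hypothetical grouping and word counts for a single post.
groups = {
    "bayesian statistics": "statistics",
    "frequentist statistics": "statistics",
    "regression": "statistics",
    "visualization": "statistics",
}
post_counts = {"bayesian statistics": 40, "regression": 25,
               "visualization": 5, "replication": 30}
print(combine_topics(post_counts, groups))
# {'statistics': 70, 'replication': 30}
```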
Table 1. Topic Proportions
Note: The topics and means and standard deviations of their share of words per post are shown at the blog level and were averaged across each blog’s posts.
We conducted further analyses on four topics of interest: statistics (which included top words such as “model,” “data,” and “probability”), research findings (e.g., “difference,” “control,” and “claim”), replication (e.g., “replication,” “effect,” and “power,” words related to issues in the replicability crisis), and fraud (e.g., “fraud,” “share,” and “report”; this topic also distinctively included words about retraction, misconduct, and ethics). Statistics and replication were the most common topics, but we included research findings and fraud for their potential relevance to other variables (e.g., individual researcher mentions) and to methods improvement.
Researcher mentions: whom?
To provide an initial exploration of bloggers’ discussions of individual researchers, we identified and tagged people’s names in the data by using an automated method (named entity recognition, essentially a mechanism to locate, classify, and count proper nouns; see Supplemental Material). However, because these names could refer to anyone, from researchers to statistics named after people (e.g., Cohen’s d), we also obtained a list of salient names from judges who have weighed in publicly on the replication crisis, but not as bloggers. All had spontaneously contacted the senior author, implicitly indicating their interest and availability to discuss this topic (in response to Fiske, 2016). Admittedly, judge selection was ad hoc, not blind, and nonrandom. Automatically isolating researcher last names from statistics named after people, statisticians, celebrities, first names, and fragments was not possible, and we did not want to trust only our own judgment.
The judges were asked to nominate up to 12 psychological researchers who have been under critical focus, allowing us to obtain a more specific sample of names to compare with general discussions that involve names. Of 16 judges contacted, 13 provided usable responses; 9 judges were male. The judges generated a list of 38 researchers (20 were male). Nominated researchers have been the focus for multiple reasons, and thus this list simply represents salient names that have received postpublication critiques; we make no assessment about the appropriateness of any individual criticism. But these analyses could inform further discussions about postpublication review and provide some correlates of individual mentions.
For analyses on relationships between variables, we matched the names nominated by judges to the named-entity-recognition output. Although this is less reliable than matching to all data, it made an appropriate comparison to nonnominated names (also identified through named entity recognition): When using name mentions as predictors, we simultaneously enter variables on whether a post mentioned a judge-nominated researcher or not, and whether a post mentioned a nonnominated researcher or not. However, for accuracy, we provide descriptive statistics by matching the nominated researchers’ last names to the entire corpus of data. Note that some name mentions could possibly refer to another person who shares a name with a nominated researcher (some of these were deleted if identified).
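The two post-level indicators described above (nominated researcher mentioned or not; nonnominated name mentioned or not) could be derived from named-entity output roughly as follows. This is a sketch under stated assumptions: entities are taken to be plain strings already tagged as persons, and matching on the lowercased last token is our simplification, not necessarily the matching rule the authors used.

```python
def mention_indicators(person_entities, nominated_last_names):
    """Return (mentions_nominated, mentions_nonnominated) for one post.

    `person_entities` is a list of person-name strings tagged in the post;
    a name counts as nominated when its last token matches a nominated
    last name (case-insensitive).
    """
    nominated = {name.lower() for name in nominated_last_names}
    has_nominated = False
    has_other = False
    for entity in person_entities:
        tokens = entity.split()
        if not tokens:
            continue  # skip empty fragments
        if tokens[-1].lower() in nominated:
            has_nominated = True
        else:
            has_other = True
    return has_nominated, has_other


# "Bem" and "Stapel" appear in the text; the entity lists are hypothetical.
nominated = ["Bem", "Stapel"]
print(mention_indicators(["Daryl Bem", "Cohen"], nominated))  # (True, True)
print(mention_indicators(["Bayes"], nominated))               # (False, True)
```

Entering both indicators in the same model, as the text describes, lets mentions of nominated researchers be compared against mentions of any other tagged name.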
Engagement and citations: what effects?
As measures of active engagement, we created variables indicating the number of comments and commenters (i.e., number of comments posted by distinct authors) for each post. In addition, we collected information on posts’ social-media impact (including Facebook reactions such as likes, comments, and shares, as well as the equivalent of likes on Pinterest and StumbleUpon; these variables were aggregated; data from Twitter were not available). As a measure of passive engagement, we requested post views from bloggers, which indicate how many times each post had been accessed. However, as a result of nonresponses, denials to share, or technical issues, we obtained post views for only 59% of the blogs. Thus, an alternative interpretation for all comparisons with post views is that differences are due to missingness. Furthermore, views may be noisy (e.g., because they may indicate either total or unique views). Finally, both post views and social-media impact variables were collected between May and June 2018, after most other time-sensitive variables, so care is needed when comparing with other metrics.
As a measure of more long-term impact, we collected data on post citations. We obtained citations by searching the minimal URL for accessing the blog (e.g., by stripping “https://”) in Google Scholar. Citations were recorded from September 16 to September 21, 2017. We also collected the number of blogger academic publications and citations in Google Scholar for comparison.
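One way to derive a minimal search string from a blog address is sketched below. The text specifies only stripping the scheme (e.g., “https://”); also removing a leading “www.” and a trailing slash is an assumption of this sketch, and example.org is a placeholder domain.

```python
from urllib.parse import urlsplit


def minimal_url(url: str) -> str:
    """Strip the scheme, a leading 'www.', and any trailing slash."""
    # urlsplit only fills in the host when a scheme or '//' is present.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: 'www.' is not needed for the search
    return (host + parts.path).rstrip("/")


print(minimal_url("https://www.example.org/blog/"))  # example.org/blog
```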
Other variables used in analyses are the date and length of the post and the length of the comments to the post. More information about additional variables (e.g., date and sentiment analyses) is offered in the Supplemental Material.
Data analysis
Data were analyzed using the R software environment (Version 3.5.0; R Development Core Team, 2017). We first present univariate descriptions of the data and then present some relationships between variables. Model details appear in the Supplemental Material. The Open Science Framework online repository (https://osf.io/qf8zv/) provides figures for most results to display the raw data beyond our models; we also provide the data and code for further exploration.
More than 60% of our data came from a single blog (for information on blog samples, see Table S2 in the Supplemental Material), an influence that we attempted to balance by conducting analyses at the post level, accounting for the multilevel structure (i.e., blog as a random factor), or at the blog level (i.e., with each blog being an observation). Thus, each blog is relatively equally weighted, and most results describe properties of the average blog, not the average post. For simplicity, we exclude interactions. A maximal random structure (Barr, Levy, Scheepers, & Tily, 2013) was used when models converged.
In general, models allow for outlier influence, but in some cases, we also explored (mostly graphically; see https://osf.io/qf8zv/) how the data look without these outliers. Most results are directionally robust to outlier exclusions except when indicated.
As mentioned, all analyses at the post level used mixed models to account for nonindependence (i.e., posts are nested within blogs). Continuous predictors were standardized (rather than just centered) for convergence and interpretability. Continuous variables with post-level variation used as predictors (mainly posts’ topic proportions) were transformed into two variables. The first was a post-level variable standardized within each blog (e.g., each post’s proportion of statistics words, standardized using only the data for its corresponding blog). The second was a standardized blog-level variable (e.g., each blog’s mean proportion of statistics words across its posts, standardized using all data). This partition allowed us to explore both between-posts and between-blogs variation, controlling for each other (see, e.g., Enders & Tofighi, 2007).
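The partition into within-blog and between-blog predictors can be sketched in Python as follows. This is an illustration rather than the authors' R code: it assumes the sample standard deviation is used, that the blog-level variable is standardized across blog means, and that constant or single-post blogs receive a within-blog score of 0. All names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscores(values):
    """Standardize with the sample SD; constant or single inputs map to 0."""
    if len(values) < 2:
        return [0.0 for _ in values]
    m, s = mean(values), stdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def partition_predictor(rows):
    """Split a post-level predictor into within- and between-blog parts.

    `rows` is a list of (blog_id, value) pairs, one per post. Returns one
    (within_blog_z, between_blog_z) pair per row, in input order.
    """
    by_blog = defaultdict(list)
    for blog, value in rows:
        by_blog[blog].append(value)
    blogs = list(by_blog)
    # Between-blog part: each blog's mean, standardized across blogs.
    between = dict(zip(blogs, zscores([mean(by_blog[b]) for b in blogs])))
    # Within-blog part: each post standardized using only its own blog.
    within = {b: iter(zscores(by_blog[b])) for b in blogs}
    return [(next(within[blog]), between[blog]) for blog, _ in rows]


# Hypothetical proportions of statistics words for posts in two blogs.
rows = [("A", 0.2), ("A", 0.4), ("B", 0.6), ("B", 0.6), ("B", 0.9)]
for (blog, value), (w, b) in zip(rows, partition_predictor(rows)):
    print(blog, value, round(w, 3), round(b, 3))
```

Each post thus carries two predictors: how unusual it is relative to its own blog, and how its blog compares with other blogs.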
For analyzing date, with blog properties as outcomes, we collapsed across blogs to analyze not only the number of posts by date but also the number of blogs and the posting rate as a function of the number of blogs. In these analyses, the date predictor was coded with months as units, with higher numbers indicating more recent months (for further date analyses, see Table S7 in the Supplemental Material).
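A month-unit date predictor of this kind can be computed as a count of calendar months from a reference month, so that larger values indicate more recent dates. The reference date below is a hypothetical placeholder, not the creation date of any blog in the sample.

```python
from datetime import date


def month_index(d: date, origin: date) -> int:
    """Whole calendar months from the origin's month to d's month."""
    return (d.year - origin.year) * 12 + (d.month - origin.month)


# Hypothetical origin; April 11, 2017 is the collection date from the text.
print(month_index(date(2017, 4, 11), date(2004, 10, 1)))  # 150
```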
Gender and career stage were always entered simultaneously in models with post-level outcomes. Models that additionally control for date, for engagement and citation outcomes, are reported in Table S8 in the Supplemental Material along with other supplementary results. When these models’ interpretations differ from the ones presented here (which do not control for date), we report this difference.
Results
Our exploration addresses several questions: Who? (Does a particular generation or gender contribute most?) What? (Do blogs focus narrowly on methods, or do they cover a range of issues?) Whom? (How do they discuss individual researchers?) Effects? (How do readers respond? What are some impacts?)
Bloggers: who posts?
Most bloggers were male (73% of n = 52). For a comparison base rate, women now make up half or more of American doctoral psychologists (Burrelli, 2008; National Science Foundation, 2015). However, not all bloggers are academics, psychologists, or American, and these figures include clinical training, so they are perhaps not the most appropriate base rates for research methods.
Despite blogs being part of the new media, which might suggest younger authors, most bloggers were not early-career researchers (ECRs; 67% of n = 54 were non-ECRs). Moreover, only 26% of male bloggers were ECRs, whereas 50% of female bloggers were ECRs. This might forecast a generational shift in blogger gender ratios. Bloggers were mostly associated with American universities (59% of n = 56; 13% were from United Kingdom universities, and 11% were from Dutch universities). A previous report on PsychBrief’s feed coded most of the lead bloggers as White (93%; PsychBrief, 2017). Two of the bloggers were entirely anonymous. In short, most bloggers were established, male, and associated with American institutions.
Blog properties: who posts how often, and for how long?
For current context and the historical record, documenting blogs’ basic properties is useful (for full details on descriptive statistics, see Table S3). Most blogs had a relatively small number of posts in the 12.5 years since the first blog began (Mdn = 43), although one prolific blog had 7,211 posts at the time of collection (62.5% of the 11,539 posts analyzed here). Blogs tended to be approximately 3 to 4 years old. Blogs by female authors were newer (Mdn = 126.5 weeks, M = 168.64 weeks, SD = 124.78) than male-led blogs (Mdn = 165.29 weeks, M = 205.74 weeks, SD = 144.66).
The posting rate (i.e., number of posts divided by weeks since creation) was relatively low for most blogs, with a median of a little over 1 post per month (but with outliers averaging up to 11.1 posts per week). Male-led blogs had a higher mean post rate (M = 0.74 posts per week, SD = 2.08) than female-led blogs (M = 0.64 posts per week, SD = 0.66). However, for medians, because of one male-led outlier, female-led blogs posted more per week (Mdn = .52 posts per week) than male-led blogs (Mdn = .28 posts per week). More established (non-ECR) bloggers posted at a higher rate (M = 0.95 posts per week, SD = 2.26, Mdn = 0.43 posts per week) than ECRs (M = 0.3 posts per week, SD = 0.21, Mdn = 0.22 posts per week).
Time trends indicate that both the overall posting rate, rate ratio (RR) = 1.079 (95% CI = [1.05, 1.11]), and the blog creation rate, RR = 1.33 (95% CI = [0.97, 1.84]), have increased over the months included. However, the rate of posts per blog decreased over time (RR = 0.453, 95% CI = [0.43, 0.48]).
The percentage of new blogs led by women per month also grew over the months (RR = 1.46, 95% CI = [0.77, 3.02]). Thus, overall the trends indicate more robust participation by women. In sum, most blogs were 3 to 4 years old and posting about monthly, with increases in both new blogs and posts per month. However, the posting rate per blog decreased over time.
Individual researcher mentions: blogging about whom?
One controversy concerns the rate of posts that mention individual researchers. Across the 38 individual researchers nominated by judges as frequent targets of critical focus, 11% of the analyzed posts mentioned at least one such researcher. Six blogs mentioned none of the nominated researchers, whereas one blog mentioned at least one such researcher in 34% of its posts. The percentage of posts mentioning judge-nominated researchers was higher than the proportion of comment sections that did. The most common nominated researcher mentioned differed between blogs, so we collapsed across blogs to look at individual mention patterns. This approach indicated that the most common nominated researcher (even when accounting for date of first mention) was Daryl Bem, who published an article on precognition (Bem, 2011) that garnered widespread attention (e.g., Engber, 2017). Bem was mentioned in 154 posts and 130 comment sections. Generally, looked at individually (rather than summing across all names), nominated researchers appeared in a median of 11 posts and 10.5 comment sections (for more details, see Tables S4 and S5 in the Supplemental Material).
Part of the controversy also touches on power dynamics: With bloggers being mostly male, are female researchers mentioned most? Female researchers (M = 3.00, SD = 3.38) and male researchers (M = 2.95, SD = 2.39) were mentioned by a similar number of judges on average. However, on average, nominated male researchers were mentioned in more than twice as many posts (M = 28.65, SD = 39.23, Mdn = 17) as female researchers (M = 10.61, SD = 11.16, Mdn = 5). For comments, the difference between male mentions (M = 24.7 posts, SD = 34.49, Mdn = 13 posts) and female mentions (M = 16.27 posts, SD = 19.33, Mdn = 10 posts) was notable but smaller.
The pattern appears stable over time. Male first mentions are on average 28.9 days more recent than female first mentions. As a rate, differences between male- and female-nominated researchers remain, with more mentions per week for men (M = 0.12, SD = 0.13, Mdn = .07) than women (M = 0.06, SD = 0.04, Mdn = .05).
Men’s posts mentioned nominated researchers more often than did women’s posts (OR = 1.49, 95% CI = [0.89, 2.48]). Men also mentioned nonnominated named entities such as statisticians and celebrities (OR = 2.42, 95% CI = [1.32, 4.45]) more often than women did. Non-ECR versus ECR blogs showed slighter differences—nominated: OR = 1.14, 95% CI = [0.65, 2.01]; nonnominated: OR = 1.49, 95% CI = [0.81, 2.74].
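To make the odds ratios and intervals above easier to read, the following sketch computes an odds ratio with a Wald 95% confidence interval from a simple 2 × 2 table. It is only an illustration: the counts are hypothetical, and the reported estimates may well come from regression models with additional predictors rather than from raw tables.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2 x 2 table.

    a: posts by group 1 that mention a name
    b: posts by group 1 that do not
    c: posts by group 0 that mention a name
    d: posts by group 0 that do not
    z: normal quantile (1.96 for a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to illustrate the calculation.
estimate, lower, upper = odds_ratio_ci(30, 170, 10, 90)
print(f"OR = {estimate:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

An interval that includes 1, as in this hypothetical case and in the interval reported for nominated researchers (0.89 to 2.48), is compatible with no difference between groups; an interval that excludes 1, as for nonnominated names (1.32 to 4.45), is not.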
Overall, judge-nominated names appeared rarely. Equal numbers of men and women were nominated, but nominated men were mentioned in posts more often. Male bloggers mentioned names more often than female bloggers.
Topics: posts what?
Regarding blogs’ focus on improving methods, topic analysis suggests that blogs’ average posts do include a large proportion of words related to statistics, replication, research findings, and science communication more broadly (see Table 1). In terms of disciplinary content, none was a particular focus: Blogs covered clinical, neuroscience, and social-science topics to similar degrees. Some of these broad topics can subdivide with differing levels of specificity (down to the 22 topics provided by the LDA model), revealing, for example, that discussions of statistics often revolved around Bayesian statistics. In addition, the distribution of terms for each topic (see Table S1 in the Supplemental Material) suggests that statistical power is the most discussed issue within replication-related posts, whereas for the fraud topic, salient words refer to data sharing and included one nominated researcher’s name (Stapel). For results see Table 2.
Topic Analyses
Note: Outcomes are the proportions of words per post referring to each topic (more specifically, the number of topic words per post with the log of the number of words in the post as an offset). Sixty-nine posts were excluded from these analyses because the post length was zero. Gender and career-stage predictors control for each other. Baseline predictors are women, early-career researchers, and name not mentioned.
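The offset described in this note can be illustrated with a small sketch. It assumes a Poisson-type count model with a log link, which the note does not state explicitly, and it ignores the other predictors and any blog-level structure. Under that assumption, a model with a single binary predictor and log(post length) as an offset has a closed-form estimate: the rate ratio is the pooled topic-word rate in one group divided by the pooled rate in the baseline group. All counts below are hypothetical.

```python
def topic_rate(posts):
    """Pooled topic-word rate for (topic_words, total_words) pairs.

    Zero-length posts are dropped because log(0) is undefined, so they
    cannot carry an offset (the note reports excluding such posts).
    """
    usable = [(topic, total) for topic, total in posts if total > 0]
    return sum(topic for topic, _ in usable) / sum(total for _, total in usable)


def rate_ratio(group_posts, baseline_posts):
    """Rate ratio for a binary predictor in a log-link count model with offset."""
    return topic_rate(group_posts) / topic_rate(baseline_posts)


# Hypothetical (topic_words, total_words) pairs per post.
baseline_posts = [(10, 1000), (30, 1500), (0, 0)]  # 40 topic words in 2,500 words
group_posts = [(24, 800), (40, 1200)]              # 64 topic words in 2,000 words

ratio = rate_ratio(group_posts, baseline_posts)
```

Here the group uses topic words at twice the baseline rate (0.032 vs. 0.016 per word), so the ratio is 2. The offset is what makes long and short posts comparable: the model compares topic words per word written, not raw counts.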
Which bloggers?
Men talked more about replication, fraud, and research findings than did women. ECRs talked more about statistics and research findings than did non-ECRs.
Mentioning individual researchers?
Posts mentioning names, both nominated and nonnominated (e.g., statisticians’ names), were compared with those mentioning no names. The posts mentioning names had more talk about replication and research findings. Posts mentioning nominated researchers had a smaller proportion of statistics words, whereas those mentioning nonnominated names had a slightly larger proportion. Posts mentioning nominated researchers had more fraud-related words, whereas posts mentioning nonnominated names had fewer fraud-related words.
In sum, the replication and research-findings topics were more common in posts by men or those that mentioned nominated researchers. Discussions about fraud were more common in posts by men or those that mentioned nominated researchers. Statistics was covered more by ECRs and less in posts that mentioned nominated researchers.
Engagement: with what immediate effects?
In terms of immediate impact through comment engagement, blogs received a median of 3.34 comments per post (2.10 unique commenters), but a prominent blog could receive up to 16.40 comments per post (9.63 from different commenters). Note that these counts include comments by the blog authors themselves; five of the blogs did not allow comments.
For social-media impact, blogs received a median of 16.48 interactions (including reactions such as likes, comments, and shares). Numbers for views were larger, with a median of 1,959.81 (range = 64.6–9,354.15). Many people are visiting, but few are engaging.
Which bloggers?
More established and male bloggers captured more engagement (for results, see Tables 3 and 4). This was the case for all indicators, including the number of comments, commenters, views, and social-media impact; on most indicators, men's values were at least twice as large as women's. Non-ECRs also tended to have at least double the impact of ECRs (but when controlling for the date of the post, ECRs had slightly more comments than non-ECRs).
Comment Analyses
Note: Gender and career-stage predictors control for each other. Categorical baseline predictors are women, early-career researchers, and name not mentioned.
a. These post-level standardized predictors (i.e., those indicating average differences in topic proportions between posts within a blog) control for corresponding blog-level predictors.
b. These blog-level standardized predictors (i.e., those indicating differences between the average topic proportions between blogs) control for corresponding post-level predictors.
Social Media, Views, and Post-Citation Analyses
Note: Social media impact refers to the number of social-media likes, comments, or shares. Gender and career-stage predictors control for each other. Categorical baseline predictors are women, early-career researchers, and name not mentioned.
a. These post-level standardized predictors (i.e., those indicating average differences in topic proportions between posts within a blog) control for corresponding blog-level predictors.
b. These blog-level standardized predictors (i.e., those indicating differences between the average topic proportions between blogs) control for corresponding post-level predictors.
Mentioning individual researchers?
Whether a post mentioned names did relate to engagement: The number of comments in posts that mentioned judge-nominated researchers was almost twice as large as in posts with no mentions, and it was also somewhat larger in posts mentioning nonnominated names (vs. no mention). The case was similar, but to a smaller degree, for the number of commenters. Social-media impact and views were also higher for posts mentioning names (but unlike for the other indicators, nonnominated names had a larger social-media impact than nominated names).
Which topics?
Increased replication talk at the post level was associated with increases across all indicators. Blogs that discussed replication more often received more views and had more social-media impact than those that discussed the topic less, but the difference was very small for the number of comments and commenters.
The relationship with comments and commenters was small for differences in the proportion of statistics coverage. On the other hand, social-media impact was lower for blogs (and even less so for posts) that discussed statistics more often, whereas post views were slightly higher (except at the blog level when controlling for date). Thus, statistics posts may be passively engaging but not as actively discussed or shared. Fraud-related words were slightly related to lower engagement across most metrics. Engagement with research-finding talk depended on the medium, and effects were small.
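The distinction between post-level and blog-level predictors used in these analyses can be sketched as a within/between decomposition: each post's topic proportion is split into its blog's average (the blog-level part) and the post's deviation from that average (the post-level part). The code below is a minimal illustration with hypothetical blogs and proportions; it omits the standardization step and does not reproduce the models themselves.

```python
from statistics import mean


def split_within_between(proportions_by_blog):
    """Split each post's topic proportion into blog-level and post-level parts.

    Returns one (blog, blog_mean, deviation) tuple per post, where
    blog_mean is the blog's average proportion (between-blog predictor)
    and deviation is the post's difference from it (within-blog predictor).
    """
    rows = []
    for blog, proportions in proportions_by_blog.items():
        blog_mean = mean(proportions)
        for proportion in proportions:
            rows.append((blog, blog_mean, proportion - blog_mean))
    return rows


# Hypothetical proportions of replication words per post.
replication_share = {
    "blog_a": [0.10, 0.20, 0.30],
    "blog_b": [0.40, 0.60],
}

rows = split_within_between(replication_share)
for blog, blog_mean, deviation in rows:
    print(f"{blog}: blog-level = {blog_mean:.2f}, post-level = {deviation:+.2f}")
```

Because the deviations sum to zero within each blog, the two parts carry separate information: the post-level part asks whether a post with more replication talk than is usual for its blog gets more engagement, and the blog-level part asks whether blogs that cover replication more on average get more engagement.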
Overall, views were high, but engagement via comment was relatively low. Engagement was higher when specific names or replication appeared. Male and established bloggers elicited more engagement.
Citations: with what longer-term effect?
A more formal measure of impact is the number of blog citations in Google Scholar: The same pattern of low numbers of total citations emerged for most blogs (Mdn = 3, M = 22.8, SD = 51.4), with exceptions going up to 308 citations. In terms of the bloggers, Google Scholar profiles include journal articles, chapters, and books but not blog posts, with very few exceptions. All bloggers had some citations (minimum = 3, Mdn = 3,199.5, M = 9,651.3, SD = 15,966.8), and this varied up to bloggers with citation counts indicating very influential publications (maximum = 72,332). The median number of blog citations was 0.07 per post (M = 0.16, SD = 0.28, maximum = 1.71); the median rate of Google Scholar citations was 43.5 per publication (M = 58.24, SD = 66.87).
Which bloggers?
Men had more post citations than women. Non-ECRs had more post citations than ECRs. These results are in line with shorter-term impact.
Mentioning specific researchers?
Post citations were positively related to name mentions, particularly nominated names, which almost tripled the number of citations.
Which topics?
In terms of content, replication coverage was again the strongest predictor, increasing the post-level citations by more than 50%. Post citations were slightly positively related to statistics coverage. Research findings’ post-level relationship with citations was positive. At the blog level, fraud talk was negatively correlated with post citations.
In sum, long-term impact reflected the same variables as did short-term impact: Established male bloggers and those discussing specific researchers and replication elicited more engagement.
Discussion
Online channels of scientific communication are changing our field’s feedback systems. Research-methods blogs’ data suggest an increasing number of blogs over the period included. Studying blogs’ properties and influence provides avenues to understanding how scientific communication is shifting, with opportunities to encourage positive impact and reduce drawbacks. We used text analysis to describe some patterns.
Blogs with what effect?
Analyses point to a small number of citations for most blogs, but some outliers have more than 25 times as many citations per post as the median blog. For a coarse comparison, the median blogger’s journal and book citations per publication was 669 times the number of post citations per post. Likewise, most blogs engaged only approximately two commenters per post (but with the potential to get 144 commenters in a post). Patterns were similar (but larger) for engagement through other social media, such as Facebook. Post views, on the other hand, were much larger, with a median of almost 2,000 for blogs with data (but noise, missing data, and differing collection times require caution when comparing across metrics). Thus, influence may be more passive: Readers engage by viewing or sharing the posts but not necessarily joining a discussion or citing the posts.
We note that some indicators of the effects of blogging, such as post views (including both distinct and recurrent views), identities of guest authors, and in some cases even the exact time and date of posting, were often unavailable. We exhort bloggers to make as much of this and other information available as possible for future examinations.
Blogs what?
Our results also highlight the diversity of topics covered by methods blogs. Discussions of replication were dominated by words such as “power” and “sample size”; other top words, such as “bias” and “report,” might indicate that publication bias and incomplete reporting were also in the conversation. Other topics suggested to remedy the replication crisis, such as meta-analyses or preregistration, seem to be less discussed. The top words of the fraud topic suggest it is closely linked to specific cases (in particular, Stapel) and data sharing. Replication talk was related to more impact and was covered more often by men than women. Relationships for other topics were smaller and less robust, but discussions of the fraud topic (mostly covered by men) were associated with less impact. Statistics (more often covered by ECRs than non-ECRs) related to less social-media impact but more passive impact. Thus, less technical discussions of replication-relevant issues seem to draw more attention than topics related to purely statistical discussions.
As an imperfect comparison, we applied the same topic model to 1,039 response articles in the journal Behavioral and Brain Sciences from 2014 through 2017 (for a summary of methods related to analyses of this journal, see the Supplemental Material). These are open peer commentaries to target articles and may thus be closer to blog posts and peer review than other traditional outlets. From the topics covered by the blogs, the journal articles covered mostly replication (13.9%), science communication (13.7%), and theory/research findings (10.4%). These percentages do not represent the actual coverage of these topics and should be interpreted only in relative terms. However, they do suggest differences, such as blogs’ higher focus on statistics, as well as similarities, such as shared interest in replication. Creating new LDA models for these data also suggested a larger focus on theoretical content (vs. statistics) for the commentaries.
Finally, for another comparison, we reviewed a human-coded analysis of 402 actual reviews from several American Psychological Association journals (note that the categories are not entirely comparable, and the era is earlier; D. W. Fiske & Fogg, 1990): The most frequent comments concerned interpretations and conclusions (16.1%) and preexecution conceptualization (15.2%)—statistical analyses (8.5%) and measurement (7.3%) issues were half as frequent. Together with the emphasis in the 1970s on theoretical critiques (see opening paragraph), this suggests that blogs focus more on methods/statistics and less on conceptualization than some other forms of peer review.
Blogs about whom?
In an initial look at some of the criticisms that have been raised against current social-media practices, we explored the frequency of individual-researcher mentions. Blogs mentioned at least one of the judge-nominated researchers (viewed as having been discussed critically) in an average of 11% of posts, and specific nominated researchers were mentioned in up to 154 posts in a little more than 6 years. Posts with person mentions received more views, social-media impact, and comments and citations, and these results were generally larger for judge-nominated researcher mentions (vs. nonnominated names).
Compared with posts with no nominated researcher mentions, those with mentions talked less about statistics and fraud but more about replication and research findings. This might suggest that posts mentioning nominated researchers tended to focus on high-level discussions of replication, power, and experimental design—but less on more technical statistical details.
Implications for controversies, equity, methods, and history
This exploration has several potential implications. Given that online social-media readers who report feeling personally attacked and seeing others being attacked also report feeling disengaged from the field (West & Skitka, 2017), the kinds of examinations just summarized may aid understanding how to most effectively communicate scientific criticism through blogs. For example, some have suggested that target gender may play a role in online criticism (e.g., Elsesser, 2018). Speaking to this issue, our data show no large discrepancies in the gender of nominated researchers, and in fact male researchers were mentioned by (particularly male) bloggers more often than were female researchers. This does not necessarily speak to the presence or absence of bias, which would require more complex analyses of, for example, base rates (e.g., gender distributions, publication outcomes), but these numbers are a first step in this direction. Future analyses can consider a variety of information and explore details such as the nature of individual researcher mentions (see the Supplemental Material for a limited attempt). Proposed civility norms include focusing on ideas, not the person; promoting inclusivity by attending to power differences; and fostering cooperation, openness, curiosity, and learning (Ledgerwood, 2017).
Our results may also speak to other discussions on how to improve online discussion. Despite blogs being able to provide an open-access platform for diverse voices, we have found discrepancies in who uses them and to what effect. For example, the larger number of men with blogs in our sample, despite estimates of a gender-balanced or female-majority psychology during the data-production period (e.g., Burrelli, 2008; National Science Foundation, 2015), is in line with previous reports (e.g., Weinstein & Sumeracki, 2017) suggesting that men use social media more actively (in this case by having blogs more often) than women do. As with the topic analysis, we looked at the gender of 1,039 responses in the journal Behavioral and Brain Sciences using an automated gender-classification algorithm (thus, estimates are noisy; see the Supplemental Material). Results suggested that, in line with the results from the blogs, most authors in this more traditional format of open peer review were male (63.3% male vs. 31.7% female). However, in our blog data, the women who did have blogs posted more often (measured by medians) than men did. Nonetheless, male-led blogs had more citations, social-media impact, and comments per post, a finding that replicates the pattern in scientific journals (see Eagly & Miller, 2016). In addition, established researchers had more posts, and their posts had more impact. Thus, incentives and priorities associated with who speaks in social media may be a function of career stage and demographics.
In general terms, the questions we deal with here have implications for issues such as whether blog posts should count as legitimate citations and otherwise be regarded similarly to traditional peer-reviewed publications, or the role that blogging could play in hiring and promotion decisions. For example, our results suggest that indeed some are already citing blogs in outlets indexed by Google Scholar, and in turn these citation counts may inform career-status decisions. Our data speak in other ways to these issues, but mostly, we provide an initial step in opening up the use of more exploratory methods to inform discussions about the role of social media in science communication.
Limitations and future directions
To be sure, this report has some limitations. For example, the absence of directly applicable base rates or direct comparison groups makes it difficult to extrapolate some of our findings to several current discussions surrounding online forums (e.g., diverse representation). The presented gender base rates might not be the most applicable (e.g., some bloggers were not psychologists), and using bloggers’ journal and book citations as a comparison for blog citations is not an ideal anchor. Appropriate comparison groups could range from conference discussions to editorial boards to peer review. Although we have attempted to review or provide data on some of these comparison groups, many are not amenable to data collection. The issues discussed here about newer scientific outlets may also appear in more traditional forums. The ability to conduct this exploration, which can examine properties of the communication and medium, is in itself a strength of open media, such as blogs, as a format of scientific communication. Nonetheless, even if the base rates or comparison groups provided do not cover all of the appropriate options, estimating demographics or impact is valuable and provides a historical record for future examinations of how scientific communication in the field continues to evolve.
In addition to these issues, we present correlational data and try to keep our models simple, allowing for many plausible explanations that can be further explored from the data. Moreover, our findings explore a subset of psychology-methods blogs that we believe represents most such blogs, independently selected, but might be biased toward salient blogs. Hence, we refrain from making strong inferences beyond describing the data and simply present potential interpretations. Nevertheless, we believe this is a step in the direction of using resources such as text analysis to ground discussions about science communication in psychology. Future research could expand on this approach by incorporating qualitative work and by drawing additional relevant comparisons with other traditional publication practices. Follow-up work could explore the content and impact of other relevant social media, including Facebook discussion groups and Twitter. Each of these platforms has unique attributes (e.g., tweets have a limit of 280 characters vs. the average post length of 1,116 words; Facebook groups have multiple posters and sometimes moderators); these could make them prone to specific content and differential reach.
As scientific forms of communication change, tracking these adjustments will help monitor how they impact scientific discussion. In current and future crises in psychology and other fields, understanding the conversation may help us more effectively improve the science.
Supplemental Material
Supplemental material, Nicolas_Supplemental_Material for Exploring Research-Methods Blogs in Psychology: Who Posts What About Whom, and With What Effect? by Gandalf Nicolas, Xuechunzi Bai and Susan T. Fiske in Perspectives on Psychological Science
Footnotes
Action Editor
Brad Bushman served as action editor for this article.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
