Abstract

In this issue of the Journal of Visual Impairment & Blindness (JVIB), in the article “The relationship between time-related characteristics of visual impairment and psychological symptoms in adults who are blind,” by Soyoung Choi, the author uses a wide range of statistical tests. This article illustrates one reason why it is important for readers of research articles to have at least a passing familiarity with statistics and statistical tests. In this article, the author reports results for independent t-tests, analysis of variance (ANOVA) tests, Wilcoxon–Mann–Whitney tests, chi-square tests, multiple linear regressions, and ordinal logistic regressions. It is as though the reader is offered a short course on reading statistical results in a single article. However, given the limit that authors have on article length, this array of statistical results does not allow the author to give a lot of explanation to the reader regarding the tests themselves. Luckily, long-time readers of these Statistical Sidebars will have had an introduction to all of these statistical topics in the past, and they can feel a little more confident in how they wade through the results section of this article.
For example, when readers are told that the statistical tests reported in the article are evaluated against a significance level of .05, they will recognize that this figure is a standard level used in the behavioral sciences, and they will know that if the p value of a test is less than .05, the result will be considered “statistically significant.” What this essentially means is that, if there were truly no difference or relationship in the population, results at least as extreme as those under examination would be expected less than 5% of the time by chance alone. A level that is established before the statistical tests are run is called an a priori significance level.
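To make the meaning of a p value concrete, the following is a minimal sketch of an exact permutation test, which is not one of the tests reported in the article but shows the logic directly: it counts how often a difference at least as large as the observed one arises when group labels are reshuffled. The scores and the function name `permutation_p_value` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import mean

ALPHA = 0.05  # a priori significance level, fixed before looking at the data


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p value for a difference in group means."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    as_extreme = 0
    total = 0
    # Try every possible way of relabeling the pooled scores into two groups.
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        relabeled_a = [pooled[i] for i in chosen]
        relabeled_b = [pooled[i] for i in indices if i not in chosen_set]
        difference = abs(mean(relabeled_a) - mean(relabeled_b))
        if difference >= observed - 1e-12:  # small tolerance for rounding
            as_extreme += 1
        total += 1
    return as_extreme / total


# Hypothetical scores for two small groups (not data from the article).
scores_a = [1, 2, 3, 4, 5]
scores_b = [6, 7, 8, 9, 10]

p_value = permutation_p_value(scores_a, scores_b)
is_significant = p_value < ALPHA
print(f"p = {p_value:.4f}; significant at {ALPHA}: {is_significant}")
```

With these made-up scores, only 2 of the 252 possible relabelings produce a difference as large as the observed one, so the p value falls well below .05.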
Choi conveniently explains that continuous variables can be explored using independent t-tests, ANOVA, and Wilcoxon–Mann–Whitney tests. These tests require that the variable being compared across groups is continuous, which means that it can theoretically assume any value along a continuum. The independent t-test is used to compare the means of two groups, the ANOVA is used to compare the means of three or more groups, and the Wilcoxon–Mann–Whitney test is used to compare the medians of two groups (which means that a researcher uses it for a purpose similar to that of a t-test).
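As a rough illustration of how the t and F statistics are built from group means, here is a minimal sketch using only the Python standard library. It assumes the equal-variance (pooled) form of the independent t-test and a one-way ANOVA; the group scores and function names are hypothetical, and p values are omitted because they would normally come from statistical software.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Independent t statistic, assuming the two groups share a common variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def anova_f_statistic(*groups):
    """One-way ANOVA F statistic: between-group over within-group variability."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores for three groups (not data from the article).
group_1 = [12, 15, 11, 14, 13]
group_2 = [16, 18, 15, 17, 19]
group_3 = [20, 22, 19, 23, 21]

t_value = pooled_t_statistic(group_1, group_2)
f_two_groups = anova_f_statistic(group_1, group_2)
f_three_groups = anova_f_statistic(group_1, group_2, group_3)
print(f"t = {t_value:.2f}, F (two groups) = {f_two_groups:.2f}")
print(f"F (three groups) = {f_three_groups:.2f}")
```

When only two groups are compared, the F value equals the square of the pooled t value, which is one way to see that the ANOVA extends the t-test to three or more groups.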
Because the Wilcoxon–Mann–Whitney test works with the rank order of the scores rather than with their means, it is not unduly influenced by outliers in the dataset; thus, it is often used when the distribution of scores for a variable does not follow a normal distribution. For variables that were not continuous, but were categorical (which means the data were in categories), the researcher used chi-square tests.
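The resistance to outliers described above can be seen in a small sketch. The code below computes a Wilcoxon–Mann–Whitney U statistic from ranks and a chi-square statistic from a table of counts, using only the Python standard library. All data and function names are hypothetical, and reporting the smaller of the two possible U values is one common convention; statistical packages may differ.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """U statistic (the smaller of the two possible values), based on ranks."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    return min(u_a, n_a * n_b - u_a)


def chi_square_statistic(table):
    """Chi-square statistic for a table of observed counts (rows by columns)."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical scores; the second version of group B contains an extreme outlier.
group_a = [3, 5, 4, 6, 9]
group_b = [8, 7, 10, 11, 12]
group_b_with_outlier = [8, 7, 10, 11, 500]

u_value = mann_whitney_u(group_a, group_b)
u_value_with_outlier = mann_whitney_u(group_a, group_b_with_outlier)

# Hypothetical 2 x 2 table of counts for two categorical variables.
chi_square_value = chi_square_statistic([[10, 20], [20, 10]])
print(u_value, u_value_with_outlier, round(chi_square_value, 2))
```

Replacing the largest score with an extreme value leaves U unchanged, because the rank of that score stays the same; a mean-based statistic would shift considerably.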
All of the tests mentioned in the previous paragraphs are tests that readers of these sidebars have been introduced to and that, I hope, should not at this point cause their attention to glaze over. A t-test reports a t value, an ANOVA reports an F value, a Wilcoxon–Mann–Whitney test reports a U value, and a chi-square test reports a chi-square (χ²) value. Each of these test results is accompanied by a p value, which is compared to the a priori level of .05 to see whether it is lower.
I hope that readers are, at this point, feeling fairly comfortable in their reading of the results section of an article published in JVIB. Choi does not, however, allow readers to rest in comfort, but challenges them with results of correlations, linear regression, and logistic regression. Again, long-time readers of these sidebars will have had a passing introduction to these topics as well. Linear regression also relies on the outcome variable under scrutiny being continuous, whereas logistic regression is used when the outcome variable is categorical (with ordered categories, in the case of ordinal logistic regression).
The difference between the previously mentioned tests and correlations, linear regression, and logistic regression is that the linear and logistic regression analyses create a mathematical model in which a selection of predictor variables is used to explain the distribution of an outcome variable. In the case of this article, variables such as onset of blindness, gender, and age are used to predict the values of the outcome variables of CESD score (a measure of depression) or a person's probability of having major depressive symptoms (MDD).
This article illustrates why it is useful for readers to have at least a basic understanding of statistical tests. The author explains how the data were treated so that the statistical tests would work properly and take into consideration idiosyncrasies in the dataset. Reporting these elements of the analysis improves clarity, but for many readers it may only serve to confuse and confound, since it brings into the discussion many topics of statistical analysis and statistical language that take the reader beyond a basic understanding of the tests being used. Without that basic understanding, a reader might easily become lost and adrift in the results section. However, armed with a knowledge of how the different tests are to be used and what they represent in terms of analysis, readers will be more likely to weather this additional content without being buffeted too fiercely.
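The idea of a regression model, in which predictors are used to explain an outcome, can be sketched in a few lines. This is a simplified illustration with a single predictor, whereas the article uses several; the data, the coefficients passed to the logistic function, and the variable names are all hypothetical and are not taken from the article.

```python
from math import exp
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Least-squares intercept and slope for one predictor and one outcome."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    intercept = y_bar - slope * x_bar
    return intercept, slope


def logistic_probability(intercept, slope, x):
    """Probability of the outcome category implied by a logistic model."""
    return 1 / (1 + exp(-(intercept + slope * x)))


# Hypothetical predictor (years since onset) and continuous outcome (a symptom score).
years_since_onset = [1, 2, 3, 4, 5]
symptom_score = [20, 17, 16, 13, 9]

intercept, slope = fit_simple_linear_regression(years_since_onset, symptom_score)
predicted_score = intercept + slope * 3  # model's prediction at 3 years

# Hypothetical logistic coefficients, chosen only to show the shape of the model.
probability = logistic_probability(intercept=-1.0, slope=0.5, x=2)
print(f"score = {intercept:.1f} + ({slope:.1f} x years); p = {probability:.2f}")
```

Linear regression returns a predicted value on the outcome's own scale, while the logistic model converts the same kind of weighted sum into a probability between 0 and 1.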
When the author rescales variables in the regression analyses so that their scores can only be between 0 and 1 (a process called normalization), she is simply putting the data into a form that allows for a more intuitive understanding of the results. It is like putting all of the variables into the same framework so that a reader can more easily appreciate how the level of one variable relates to the level of another because they are on the same scale. The author also reports on how well the data in the article satisfy regression assumptions. These are characteristics of data that must be in place in order for the use of a regression analysis to be valid. Rather than trying to understand the mathematical background of these assumptions, it is sufficient that readers understand that all statistical tests have a set of assumptions for their use and that researchers should check that the data satisfy these assumptions before using a test. This author peels back the curtain and allows the reader to see which assumptions she tested and that the assumptions were satisfied.
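One common way to rescale scores to the range 0 to 1 is min-max rescaling, sketched below. Whether the article used exactly this formula is an assumption on my part, and the variables and values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values so the smallest becomes 0 and the largest becomes 1."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot rescale a variable whose values are all identical")
    return [(value - low) / (high - low) for value in values]


# Hypothetical variables measured on very different scales.
ages = [20, 35, 50, 65, 80]
symptom_scores = [0, 15, 30, 45, 60]

normalized_ages = min_max_normalize(ages)
normalized_scores = min_max_normalize(symptom_scores)
print(normalized_ages)
print(normalized_scores)
```

After rescaling, both made-up variables run from 0 to 1, so a one-unit change means "from the lowest observed value to the highest" for each of them.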
Finally, this author mentions the concepts of sample size, effect size, and power; in one sentence, she alludes to how these three statistical concepts are interrelated. These topics have been covered in previous sidebars, but let me summarize by saying that effect size (which can have different symbols, depending on the test being used) is a measure of how large the difference or relationship between variables is. The larger the effect size that exists between variables, the smaller the sample size required to demonstrate that difference mathematically with a given level of assurance (power). In the behavioral sciences, we generally strive to have a statistical power of at least 0.8. Power is the probability that a given statistical test will detect a difference between variables when one actually exists. You can see how larger effect sizes and larger sample sizes would lead to greater power. And you can also see how this author has conveniently offered to readers a short course on reading statistical tests and results, covering most of the primary concepts a reader should keep in mind when reading results sections.
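The relationship among effect size, sample size, and power can be sketched with a normal approximation to a two-sided, two-group comparison. This is an approximation rather than the exact calculation (which statistical software bases on the noncentral t distribution), and the function name and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test (normal approximation).

    effect_size is a standardized mean difference (such as Cohen's d).
    """
    standard_normal = NormalDist()
    critical_z = standard_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return standard_normal.cdf(shift - critical_z) + standard_normal.cdf(
        -shift - critical_z
    )


# A medium effect with 64 people per group gives power close to 0.8.
power_medium = approximate_power(effect_size=0.5, n_per_group=64)

# Larger samples or larger effects both increase power.
power_more_people = approximate_power(effect_size=0.5, n_per_group=100)
power_larger_effect = approximate_power(effect_size=0.8, n_per_group=64)
print(round(power_medium, 2), round(power_more_people, 2), round(power_larger_effect, 2))
```

Holding the significance level fixed, power rises when either the effect size or the number of participants per group increases, which is the interrelationship the author alludes to.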
