So there is a difference, but how big is it? Measuring the effect size for numerical outcomes using the mean and median

Introduction

In the previous issue of Phlebology, we considered how to summarize the strength of an association between two variables for a binary outcome.¹ We were interested in investigating not only whether differences between groups of patients were statistically significant, but also whether they were clinically relevant. We used an example of a randomized controlled trial, but these methods can apply equally to cross-sectional studies and longitudinal cohort studies. In this short report, we shall consider how we might simply summarize data in studies with a numerical (quantitative) outcome.

Distribution of data

When considering numerical data, one of our first considerations should be the distribution, or ‘shape,’ of the data we are analysing. Choosing the correct summary measure and the correct statistical test frequently relies on knowing the data distribution (see other references for further details, e.g.^2,3). A common and easy way to examine the shape of data is by plotting a histogram. Some examples of the common types of data encountered in medical research are shown in Figure 1.

Figure 1

(a–d) Examples of distributions of numerical data: (a) symmetrical bell-shaped normal distribution; (b) positive skewness (skewed to right); (c) negative skewness (skewed to left); and (d) other distribution

Figure 1a shows data that follow a normal (or Gaussian) distribution. In general, normally distributed data are desirable in medical research because several methods of analysis, such as correlation, regression, t-tests and analysis of variance, make assumptions about normality.⁴

In contrast, Figures 1b and c show data that follow a skewed distribution. We look at where the ‘tail’ of the distribution lies to determine whether our data are right (positively) skewed, such as in Figure 1b, or left (negatively) skewed, such as in Figure 1c. Finally, the data included in Figure 1d do not follow any of these distributions.

Other ways to assess whether data follow a normal distribution include using a statistical test, drawing a Q–Q plot and calculating simple summary measures.^5,6
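As a minimal sketch of the "simple summary measures" approach, the Python snippet below compares the mean with the median and computes a moment-based skewness coefficient. The helper name `sample_skewness` and both data-sets are hypothetical and chosen only for illustration; the coefficient shown is one common definition among several, not a method taken from the cited references.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness: the average cubed standardized deviation.

    Roughly 0 for symmetric data, positive for a right (positive) skew,
    negative for a left (negative) skew.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population form (divides by n)
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)


# Hypothetical data-sets (illustrative values only)
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [1, 1, 2, 2, 3, 4, 20]

skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_skewed)

# In right-skewed data the long upper tail pulls the mean above the median
mean_right = statistics.fmean(right_skewed)
median_right = statistics.median(right_skewed)

print(skew_symmetric, skew_right, mean_right, median_right)
```

A mean that sits well above the median, together with a clearly positive coefficient, points towards the right-skewed shape of Figure 1b; a histogram remains the most direct check.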

Mean and differences in means

The mean is obtained by summing all of the values in our sample, and then dividing by the total number of observations.⁷ Thus, it provides an overall summary of the values of all observations included in our data-set. We usually use the mean to summarize data that come from a normal distribution.

If we have two groups that we wish to compare, we can calculate the difference between the means by subtracting the mean in one group from the mean in the other. This difference gives us an estimate of the effect size, expressed in the same units as the outcome.
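The calculation can be sketched in a few lines of Python. The two groups and their values below are hypothetical and serve only to show the arithmetic.

```python
import statistics

# Hypothetical outcome scores for two groups (illustrative values only)
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [9.0, 10.0, 8.0, 11.0, 12.0]

# Mean: sum of the values divided by the number of observations
mean_a = statistics.fmean(group_a)
mean_b = statistics.fmean(group_b)

# Effect size as a difference in means, in the units of the outcome
difference = mean_a - mean_b

print(mean_a, mean_b, difference)
```

The sign of the difference depends only on which group is subtracted from which, so it is worth stating the direction explicitly when reporting it.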

Median

To calculate the median, we must first rank all our data in order, from the smallest value to the largest value. The median is then simply the value that lies in the middle of this list.⁸ For example, if there were 39 observations in our data-set, the median would be the 20th value once the data were listed from the smallest to largest. With an even number of observations, the median is the average of the two middle values. The median therefore depends on the ordering of the data, but does not incorporate every value into the calculation, so it is less affected by particularly large or particularly small values (outliers). This measure is particularly useful for summarizing data that are not normally distributed, especially when there is a small sample size.⁸
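The 39-observation example can be checked with a short Python sketch. The observations are hypothetical (the whole numbers 1 to 39 in a scrambled order), and the outlier value of 1000 is an arbitrary choice made to show how the mean and the median react differently.

```python
import statistics

# Hypothetical data: the values 1..39 in a scrambled order
observations = [(7 * k) % 39 + 1 for k in range(39)]

ranked = sorted(observations)
n = len(ranked)

# For an odd n, the median is the value at rank (n + 1) / 2: here the 20th
middle_rank = (n + 1) // 2
median_by_rank = ranked[middle_rank - 1]  # list positions start at 0
median_builtin = statistics.median(observations)

# Replace the largest value with an extreme outlier
with_outlier = ranked[:-1] + [1000]
median_with_outlier = statistics.median(with_outlier)
mean_before = statistics.fmean(ranked)
mean_with_outlier = statistics.fmean(with_outlier)

print(middle_rank, median_by_rank, median_with_outlier)
print(mean_before, mean_with_outlier)
```

The median is the same with and without the outlier, whereas the mean more than doubles, which is the sense in which the median is less affected by extreme values.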

Geometric mean

As normally distributed data give us scope to perform more analyses, we can sometimes apply a transformation to our data so that they then follow a normal distribution.^6,7,9 As an example, consider a set of data that are positively skewed. We frequently find that if we take the logarithm of each of our observations, then the distribution of our data on this log scale approximates a normal distribution.⁹ We can then confidently calculate the mean of the logged values. This mean is on the log scale, however, and is therefore difficult to interpret. If we take the anti-log (exponential) of this mean, we obtain a summary measure that is in the same units as our original data. We call this summary measure the geometric mean. Readers interested in further information on this should consult Refs.^6,7,9
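The log, average, anti-log sequence described above can be sketched as follows. The data values are hypothetical; `statistics.geometric_mean` is a Python standard-library function (version 3.8 onwards) used here only to cross-check the manual calculation.

```python
import math
import statistics

# Hypothetical positively skewed data (illustrative values only)
values = [1.0, 10.0, 100.0]

# Step 1: take the natural logarithm of each observation
logged = [math.log(v) for v in values]

# Step 2: calculate the mean on the log scale
mean_of_logs = statistics.fmean(logged)

# Step 3: take the anti-log to return to the original units
geometric_mean = math.exp(mean_of_logs)

# Cross-check against the standard-library implementation
geometric_mean_builtin = statistics.geometric_mean(values)

arithmetic_mean = statistics.fmean(values)

print(geometric_mean, geometric_mean_builtin, arithmetic_mean)
```

For these values the geometric mean is approximately 10, well below the arithmetic mean of 37, because the log transformation reduces the influence of the largest observation. Any base of logarithm gives the same result, provided the anti-log uses the same base.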

Measures of variability and confidence intervals

We can also give an indication of how variable our data are by presenting a standard deviation (SD) or a range along with our estimate of the mean.^6,7 Similarly, a median is often accompanied by a range or an interquartile range (IQR). These measures of variability provide us with some information regarding how spread out our data are.
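A minimal Python sketch of these measures is given below, using hypothetical data. Note one assumption: quartiles can be computed under several slightly different conventions, and `statistics.quantiles` with its default "exclusive" method is only one of them, so other software may report marginally different IQR limits for the same data.

```python
import statistics

# Hypothetical measurements (illustrative values only)
data = [2, 4, 4, 5, 6, 7, 9, 10, 12, 15]

# Mean with its usual companions: the SD and the range
mean = statistics.fmean(data)
sd = statistics.stdev(data)  # sample SD (divides by n - 1)
data_range = (min(data), max(data))

# Median with the interquartile range (lower and upper quartiles)
median = statistics.median(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # three quartile cut points
iqr = (q1, q3)

print(f"mean {mean:.1f} (SD {sd:.1f}), range {data_range}")
print(f"median {median} (IQR {iqr[0]} to {iqr[1]})")
```

The IQR is reported here as the pair of quartiles, which is a common presentation in medical papers; some authors instead report the single width, that is the upper quartile minus the lower quartile.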

Furthermore, as for most statistics, we can calculate a confidence interval for both a mean and a difference between means. This will provide further information on the certainty of our findings.⁶ We can also calculate confidence intervals for medians, although these depend on extra assumptions,^6,10 which Altman and Bland state are almost as strong as those for methods which rely on the assumption of normally distributed data.²
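As a rough illustration, a confidence interval for a mean, or for a difference between two means, can be sketched as the estimate plus or minus a multiplier times its standard error. The function names and data below are hypothetical. The sketch uses a normal (z) multiplier and an unpooled standard error for the difference, which is a large-sample approximation; with samples as small as these, a multiplier from the t-distribution would usually be preferred and would give a wider interval.

```python
import math
from statistics import NormalDist, fmean, stdev, variance


def mean_ci(values, level=0.95):
    """Approximate confidence interval for a mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    estimate = fmean(values)
    standard_error = stdev(values) / math.sqrt(len(values))
    return estimate - z * standard_error, estimate + z * standard_error


def mean_difference_ci(group_a, group_b, level=0.95):
    """Approximate confidence interval for mean(a) - mean(b).

    Uses an unpooled standard error (the two variances are not
    assumed to be equal).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    estimate = fmean(group_a) - fmean(group_b)
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return estimate - z * standard_error, estimate + z * standard_error


# Hypothetical outcome scores (illustrative values only)
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [9.0, 10.0, 8.0, 11.0, 12.0]

ci_mean_a = mean_ci(group_a)
ci_difference = mean_difference_ci(group_a, group_b)

print(ci_mean_a)
print(ci_difference)
```

If the interval for the difference excludes zero, the data are not compatible with "no difference" at the chosen level; the width of the interval indicates how precisely the size of the effect has been estimated.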

Example

In a recent article in Phlebology, Carradice et al.¹¹ investigated the treatment response of individuals following surgery and endovenous laser ablation in a retrospective cohort study. The authors wished to compare patients with great saphenous vein and small saphenous vein incompetence. The authors state in the methods that continuous data were first tested for normality, and quoted as mean (95% confidence interval or SD) for normally distributed data, and as median (IQR) for non-normally distributed data.

Summary

In this short report, we have considered using the mean and median to summarize numerical data in medical research. When we wish to compare numerical data between different groups, merely showing that there is a ‘statistically significant’ difference between groups (through use of a hypothesis test and calculating a P value) is unlikely to be sufficient. We also need to consider the likely magnitude of any differences between groups, and whether we think this difference is likely to be clinically important. We can do so by choosing the correct summary measure, based on the distribution and sample size of the data in our study.

References

1. Smith. So there's a difference, but how big is it? Measuring the effect size for binary outcomes. Phlebology 2012;27:38–40

2. Altman, Bland. Parametric v. non-parametric methods for data analysis. BMJ 2009;338:a3167

3. Smith, Fox. The use and abuse of hypothesis tests: how to present P values. Phlebology 2010;25:107–12

4. Altman, Bland. Statistics notes: the normal distribution. BMJ 1995;310:298

5. Altman, Bland. Detecting skewness from summary information. BMJ 1996;313:1200

6. Petrie, Sabin. Medical Statistics at a Glance. Chapter 6. 3rd edn. Chichester, West Sussex, UK: John Wiley and Sons, Ltd, 2009

7. Kirkwood, Sterne JAC. Essential Medical Statistics. 2nd edn. Oxford, UK: Blackwell Science

8. Altman, Bland. Quartiles, quintiles, centiles, and other quantiles. BMJ 1994;309:996

9. Bland, Altman. The use of transformation when comparing two means. BMJ 1996;312:1153

10. Campbell, Gardner. Medians and their differences. In: Altman, Machin, Bryant, Gardner, eds. Statistics with Confidence. 2nd edn. London: BMJ Books, 2000:36–44

11. Carradice, Samuel, Wallace, Mazari FAK, Hatfield, Chetter. Comparing the treatment response of great saphenous and small saphenous vein incompetence following surgery and endovenous laser ablation: a retrospective cohort study. Phlebology 2011 [Epub ahead of Print]