Abstract

Introduction
In the previous two Research Design and Statistics reports in Phlebology, we have considered potential ways of measuring the effect size for continuous and binary outcomes. 1,2 In discussing these issues, we have also briefly mentioned confidence intervals. In this report, we shall consider the definition of confidence intervals and their interpretation in more detail.
We shall use the example previously presented in ‘So there's a difference, but how big is it? Measuring the effect size for binary outcomes’. 1 Here, we wished to investigate patient satisfaction with a new treatment for varicose veins, compared with the gold standard of traditional surgery. 1 The two interventions were compared in a randomized controlled trial. We found that 475/501 (94.8%) of individuals who received the new technique, and 446/503 (88.7%) who received the standard of care, were satisfied with the outcome of their treatment. There was very strong evidence that the new treatment is associated with increased patient satisfaction (P < 0.0001; chi-squared test). The observed relative risk (RR) was 1.07 (=94.8%/88.7%).
Samples and populations
We have discussed samples and populations in a previous publication in Phlebology. 3 In brief, when performing our studies, we are usually interested in answering a research question in a complete population. However, it is nearly always impossible to obtain the relevant information on every single person in our population. We therefore usually take a representative sample of individuals from our population of interest to include in our study. In our example, our study population was all adults requiring treatment for varicose veins. Our sample may perhaps have been 1004 individuals recruited from hospitals across the UK.
Therefore, the results we obtain (i.e. our RR of 1.07) describe the effectiveness of the new treatment in our study sample, not in the entire population. So, how do we use the results of our study to discover the effectiveness of the new intervention in the population?
Sample statistics and population estimates
We wish to estimate the RR for our population – i.e. if our intervention was given to all individuals requiring treatment for varicose veins, how would patient satisfaction compare with that achieved using the gold standard? What is our ‘best guess’ of the true efficacy of our treatment? One logical choice is to use the RR obtained in our sample of 1004 individuals to estimate the RR in the population. Thus, we could conclude that our estimate of the population treatment effect is 1.07.
However, it is clear that we cannot definitively state that the true effect of our intervention is 1.07, as we have only estimated the population RR using our sample RR. Our estimate is therefore subject to random (sampling) error. If a different sample of individuals had been included in our study, we would probably have obtained a slightly different result by chance. How sure are we of the accuracy of the estimate of 1.07? Further information is gained by calculating a confidence interval.
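The effect of chance on the sample RR can be illustrated with a small simulation. The sketch below is only an illustration: it assumes, hypothetically, that the true population proportions of satisfied patients equal the proportions observed in our sample, and the function name `sample_rr` and the seed are our own choices.

```python
import random

# Hypothetical "true" population proportions, set equal to the observed
# sample proportions purely for illustration (an assumption).
P_NEW = 475 / 501   # satisfied after the new technique
P_STD = 446 / 503   # satisfied after standard surgery
N_NEW, N_STD = 501, 503


def sample_rr(rng):
    """Simulate one trial of the same size and return its sample RR."""
    satisfied_new = sum(rng.random() < P_NEW for _ in range(N_NEW))
    satisfied_std = sum(rng.random() < P_STD for _ in range(N_STD))
    return (satisfied_new / N_NEW) / (satisfied_std / N_STD)


rng = random.Random(2024)  # fixed seed so that the run is repeatable
sample_rrs = [sample_rr(rng) for _ in range(10)]
for i, rr in enumerate(sample_rrs, start=1):
    print(f"sample {i:2d}: RR = {rr:.3f}")
```

Each simulated sample gives a slightly different RR, scattered around the assumed population value, even though nothing about the population has changed.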
Definition and interpretation of a 95% confidence interval
The formal definition of a 95% confidence interval is that, ‘if we were to draw several independent, random samples from the same population and calculate 95% confidence intervals for each of them, then on average 95% of such confidence intervals would contain the true population estimate’ (quote taken from ref. 4 , see also ref. 5 ). As we have previously seen, the 95% confidence interval for the RR in our hypothetical study was 1.03–1.11.
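The repeated-sampling idea in this definition can be checked by simulation. The following sketch assumes hypothetical population proportions of 0.948 and 0.887 (so that the true RR is known) and uses a common large-sample method that works on the log scale; this is an assumption on our part and not necessarily the method used to obtain the interval quoted above.

```python
import math
import random

# Assumed (hypothetical) population proportions, so the true RR is known.
P_NEW, P_STD = 0.948, 0.887
N_NEW, N_STD = 501, 503
TRUE_RR = P_NEW / P_STD
Z_95 = 1.96  # standard normal multiplier for a 95% interval


def rr_confidence_interval(a, n1, b, n2, z=Z_95):
    """Large-sample interval for a relative risk, computed on the log scale."""
    rr = (a / n1) / (b / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    return rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


def coverage(n_trials, rng):
    """Proportion of simulated trials whose 95% interval contains TRUE_RR."""
    hits = 0
    for _ in range(n_trials):
        a = sum(rng.random() < P_NEW for _ in range(N_NEW))
        b = sum(rng.random() < P_STD for _ in range(N_STD))
        lower, upper = rr_confidence_interval(a, N_NEW, b, N_STD)
        hits += lower <= TRUE_RR <= upper
    return hits / n_trials


rng = random.Random(1)  # fixed seed so that the run is repeatable
observed_coverage = coverage(2000, rng)
print(f"Intervals containing the true RR: {observed_coverage:.1%}")
```

Under these assumptions, the proportion of intervals that contain the true RR should be close to, but not exactly, 95%, because only a finite number of samples is simulated.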
So, how do we interpret this confidence interval in practice? Although not strictly correct, many interpret the 95% confidence interval as the range of values in which we are 95% sure that the true population RR lies. Thus, they would say that we are 95% sure that the true effect of the new treatment is somewhere between 1.03 and 1.11. Alternatively (and perhaps preferably), we can think of this interval as the range of plausible values for the population RR. Thus, we would say that we believe that the population RR could plausibly take any value in the range of 1.03–1.11, but we believe that values outside of this range are not plausible (as they are unlikely to be the true effect).
The width of the confidence interval gives us some idea about how uncertain we are about the population RR. A common approach is to look at the lower limit and upper limit of the confidence interval in turn, and imagine what our conclusions would be based on these values. If our conclusions would be the same at both limits, then we can state that our confidence interval is precise; if they would differ, the interval is too wide for us to draw a firm conclusion.
How to calculate the confidence interval
In practice, confidence intervals are usually calculated using statistical software rather than by hand. As this is a practical review of confidence intervals, readers are encouraged to refer to other sources if they are interested in the formulae used to calculate confidence intervals. 6 However, it is important to note that it is possible to calculate a confidence interval for nearly every sample estimate. 7 Furthermore, the two factors that primarily determine the width of the confidence interval are the sample size of the study and, for continuous variables, how variable the measure is. 8 The reasons for this are clear: the larger the number of individuals in a study, the more certain we will be that the result is accurate, and we will therefore have a narrower confidence interval. Similarly, the less variable a continuous variable is, the more certain we will be of the accuracy of the findings, and we will again obtain a narrower confidence interval.
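For readers who would like to see such a calculation, the sketch below applies one widely used large-sample method for a relative risk, in which the interval is computed on the log scale and then transformed back. The choice of method and the function name `rr_with_ci` are our assumptions; the tenfold larger trial is hypothetical and is included only to show the effect of sample size.

```python
import math


def rr_with_ci(a, n1, b, n2, z=1.96):
    """Return the relative risk and an approximate 95% confidence interval.

    a of n1 individuals had the outcome in group 1, b of n2 in group 2.
    The standard error is calculated on the log scale.
    """
    rr = (a / n1) / (b / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Observed data: 475/501 satisfied (new technique), 446/503 (standard care).
rr, lower, upper = rr_with_ci(475, 501, 446, 503)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# prints: RR = 1.07, 95% CI 1.03 to 1.11

# Hypothetical trial with the same proportions but ten times as many patients.
rr_big, lower_big, upper_big = rr_with_ci(4750, 5010, 4460, 5030)
print(f"RR = {rr_big:.2f}, 95% CI {lower_big:.2f} to {upper_big:.2f}")
# prints: RR = 1.07, 95% CI 1.06 to 1.08
```

With the observed data, this method gives the same interval as that quoted for our example, and the hypothetical larger trial gives a narrower interval around the same RR.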
Other confidence intervals
Here we have considered 95% confidence intervals, as these are typically used in the medical literature. There is no magical reason why 95% confidence intervals are presented, other than convention and the fact that this is probably a reasonable level of confidence. It would be equally valid to present 90% or 99% confidence intervals. For the same data, a 90% confidence interval will be narrower than the 95% confidence interval, and a 99% confidence interval will be wider.
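The relationship between the confidence level and the width of the interval can be seen by recalculating the interval for our example at each level. As a sketch only, the code below again assumes the large-sample log-scale method for a relative risk; the function name `rr_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def rr_ci(a, n1, b, n2, level):
    """Approximate confidence interval for a relative risk at a given level."""
    # Two-sided multiplier from the standard normal distribution,
    # e.g. about 1.96 when level is 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_rr = math.log((a / n1) / (b / n2))
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    return math.exp(log_rr - z * se_log), math.exp(log_rr + z * se_log)


# Same observed data at three confidence levels.
intervals = {level: rr_ci(475, 501, 446, 503, level)
             for level in (0.90, 0.95, 0.99)}
for level, (lower, upper) in intervals.items():
    print(f"{level:.0%} CI: {lower:.3f} to {upper:.3f} "
          f"(width {upper - lower:.3f})")
```

The interval widens as the confidence level increases, because a wider range of values is needed for us to be more confident that it contains the population RR.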
Conclusion
Confidence intervals provide additional information about the certainty of a study's results and about the likely effect size of any intervention or risk factor. Guidelines for reporting the results of randomized controlled trials and observational studies now recommend that confidence intervals are always presented in medical studies, and therefore understanding how they should be interpreted is vital. 9,10
