Confidence Intervals: Maths and Stats


Fast facts

  • The confidence intervals in this guide are based on the normal distribution
  • The confidence level and significance level are related: confidence level = 1 - α
  • You can calculate the confidence interval for any confidence level you like: the most common is 95%.

Statistical Confidence

When we gather real-life data, the sample we take often does not perfectly represent the population it comes from. For example, the sample mean is usually not exactly equal to the population mean, so using the sample mean alone to represent the population risks being inaccurate, simply because of the variation that naturally exists between different samples.

To account for this, we construct an interval: a range of values within which the population mean is likely to lie, based on the information in the sample.

Statistical intervals include, but are not limited to, the confidence interval, the prediction interval, and the tolerance interval. A confidence interval is a range of values, calculated from our sample, that we can be quite sure contains a certain parameter, such as the population mean. Statistical confidence is typically given as a percentage, which is calculated by:

Confidence = 100%(1 - α).

α is usually taken to be .01, .05 or .1, so we commonly see 99%, 95% or 90% confidence levels.

 


Central Limit Theorem

A sample can be taken of a population in many, many different ways, and it is possible that each different sample will produce a different sample mean. If we (hypothetically) repeatedly take different samples (of the same size!) of the same population and calculate the sample mean, we can form a distribution of sample means. 

As long as the sample size is large (generally, n > 30), the Central Limit Theorem states that this distribution of sample means will have the following characteristics:

  • The distribution will be approximately normal (as in, it will have the smooth bell-shaped curve to it)
  • Its mean is equal to the population mean
  • Its standard deviation, called the standard error, is equal to the population standard deviation divided by the square root of the sample size n
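The three characteristics above can be checked with a small simulation. The following is a minimal sketch, not part of the original guide: the population (exponentially distributed values with a mean of about 20), the sample size of 50 and the number of repeated samples are all assumptions chosen purely for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical, deliberately non-normal population (assumption for illustration)
population = [random.expovariate(1 / 20) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 50               # sample size (large, n > 30)
num_samples = 5_000  # how many different samples we draw

# Draw many samples of the same size and record each sample mean
sample_means = [
    statistics.fmean(random.sample(population, n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = pop_sd / n ** 0.5  # standard deviation / sqrt(n)

print(f"population mean:          {pop_mean:.2f}")
print(f"mean of the sample means: {mean_of_means:.2f}")
print(f"theoretical standard error: {theoretical_se:.2f}")
print(f"observed SD of sample means: {sd_of_means:.2f}")
```

If the theorem holds for this setup, the mean of the sample means should sit close to the population mean, and the spread of the sample means should sit close to the theoretical standard error, even though the population itself is far from bell-shaped.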

When these conditions are met, the formula for the confidence interval is:

Confidence interval = point estimate ± (critical value × standard error)

or, in other words:

Confidence interval = point estimate ± (critical value × standard deviation / √n)

We use the z-distribution to calculate the critical value. Note that for small sample sizes (n ≤ 30), the t-distribution should be used instead.
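As a rough illustration of where z critical values come from, the sketch below uses `NormalDist` from Python's standard `statistics` module. The helper name `z_critical` is hypothetical, and the code assumes a two-sided interval, so α is split equally between the two tails.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Two-sided z critical value for a confidence level such as 0.95."""
    alpha = 1 - confidence_level
    # Leave alpha / 2 in each tail of the standard normal distribution
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

Rounded to two decimal places, the values for the 95% and 99% levels are 1.96 and 2.58.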

 


95% Confidence Interval Formula

In life, we can never be 100% sure of anything...but being, say, 95% sure of something is - in most cases - good enough. The 95% confidence interval is a range of values, calculated from the sample, that we can be 95% sure contains the population mean.

Here, 95% is called the confidence level, and is related to the significance level. In fact, since 

Confidence level = 1 - α

we therefore have, for a 95% confidence interval, the significance level α = .05.

Recall that, under the normal distribution, 95% of values lie within 1.96 standard deviations of the mean. Since under the Central Limit Theorem the distribution of sample means is approximately normal, 95% of sample means lie within 1.96 standard errors of the population mean. Therefore, the formula for the 95% confidence interval (CI) is:

95% CI = x̄ ± 1.96 × s/√n

where:

  • x̄ is the sample mean
  • s is the standard deviation (so s/√n is the standard error)
  • n is the sample size.

 

Example

Let's say we take the following sample of ages of children and young people who signed up to participate in a clinical trial:

21, 20, 15, 15, 20, 21, 20, 14, 25, 26, 18, 25, 22, 19, 21, 23, 19, 21, 21, 21, 18, 19, 21, 19, 22, 23, 21, 15, 17, 26, 12, 15, 21, 14, 20

With only this sample, what would we expect the population mean of ages in the clinical trial to be, with 95% certainty?

The mean of this sample, x̄, is 19.71, and the sample size n is 35. Let's assume that we are given the standard deviation of the population, s, to be 3.99.

With the formula, we calculate:

95% CI = 19.71 ± 1.96(3.99/√35)

95% CI = 19.71 ± 1.32

95% CI = [18.39, 21.03]

Hence, we are 95% certain that the mean age of participants in the clinical trial lies between 18.39 and 21.03.

We were not able to say for definite what the population mean age is, but that is okay! Having a range of values it could be in is sufficient.
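The arithmetic in this example can be reproduced in a few lines of Python. This is only a sketch of the calculation: the variable names are illustrative, and the value 3.99 is taken as given, exactly as in the example, rather than computed from the sample.

```python
import math
import statistics

# Ages from the example above
ages = [21, 20, 15, 15, 20, 21, 20, 14, 25, 26, 18, 25, 22, 19, 21, 23, 19, 21,
        21, 21, 18, 19, 21, 19, 22, 23, 21, 15, 17, 26, 12, 15, 21, 14, 20]

n = len(ages)
x_bar = statistics.fmean(ages)  # sample mean
sigma = 3.99                    # standard deviation, taken as given in the example
z = 1.96                        # critical value for a 95% confidence level

standard_error = sigma / math.sqrt(n)
margin = z * standard_error
lower, upper = x_bar - margin, x_bar + margin

print(f"n = {n}, sample mean = {x_bar:.2f}")
print(f"margin of error = {margin:.2f}")
print(f"95% CI = [{lower:.2f}, {upper:.2f}]")
```

Because the code keeps full precision until the final step, an endpoint can differ from the hand calculation by 0.01, since the hand calculation rounds the mean and the margin before combining them.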


99% Confidence Interval Formula

There are some cases where we need to be more than 95% sure that our interval contains the population mean: for example, we could require a confidence level of 99%, and therefore require the 99% confidence interval.

Recall that confidence level and significance level are related, so for a confidence level of 99%, we need a significance level of α = .01.

Under the normal distribution, 99% of values lie within 2.58 standard deviations of the mean, and so the 99% confidence interval formula is given by:

99% CI = x̄ ± 2.58 × s/√n

where:

  • x̄ is the sample mean
  • s is the standard deviation (so s/√n is the standard error)
  • n is the sample size.

 

Example

Let's revisit the above example, with the sample of ages of children and young people who signed up to participate in a clinical trial:

21, 20, 15, 15, 20, 21, 20, 14, 25, 26, 18, 25, 22, 19, 21, 23, 19, 21, 21, 21, 18, 19, 21, 19, 22, 23, 21, 15, 17, 26, 12, 15, 21, 14, 20

With only this sample, what would we expect the population mean of ages in the clinical trial to be, with 99% certainty?

Once again, the sample mean, x̄, is 19.71, and the sample size n is 35. Let's assume again that we are given the standard deviation of the population, s, to be 3.99. Therefore, we have:

99% CI = 19.71 ± 2.58(3.99/√35)

99% CI = 19.71 ± 1.74

99% CI = [17.97, 21.45]

Hence, we are 99% certain that the mean age of participants in the clinical trial lies between 17.97 and 21.45.
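To see how the interval changes with the confidence level, the calculation can be wrapped in a small function. This is a sketch under the same assumptions as the example (a standard deviation of 3.99 taken as given, and a z critical value); the function name `mean_confidence_interval` is hypothetical, and the critical value is computed with `NormalDist` from the standard `statistics` module rather than looked up in a table.

```python
import math
from statistics import NormalDist, fmean


def mean_confidence_interval(data, sigma, confidence_level):
    """Return (lower, upper) for the mean, given a known standard deviation."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    margin = z * sigma / math.sqrt(len(data))
    centre = fmean(data)
    return centre - margin, centre + margin


# Ages from the example above
ages = [21, 20, 15, 15, 20, 21, 20, 14, 25, 26, 18, 25, 22, 19, 21, 23, 19, 21,
        21, 21, 18, 19, 21, 19, 22, 23, 21, 15, 17, 26, 12, 15, 21, 14, 20]

intervals = {
    level: mean_confidence_interval(ages, 3.99, level)
    for level in (0.90, 0.95, 0.99)
}

for level, (lower, upper) in intervals.items():
    print(f"{level:.0%} CI = [{lower:.2f}, {upper:.2f}], width = {upper - lower:.2f}")
```

The interval widens as the confidence level rises: being more certain that the interval contains the population mean means accepting a wider range of values.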