Measures of central tendency can help describe and summarise quantitative data, which in turn helps us understand it better. The mean, median and mode are all examples of measures of central tendency.
The arithmetic mean is the arithmetic average of all the values in the data set, and is calculated by adding all the values in the data set together and then dividing by the total number of values.
The Greek letter µ (pronounced 'mew') is used to denote the mean of a population, whereas $\bar{x}$ (pronounced 'x bar') is used to denote the mean of a sample. Therefore, we can write
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
and
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
for the formulas of the population and sample mean respectively, where $x_i$ is the i-th data point, N is the number of data points in the population and n is the total number of data points in the sample.
Observe the following data set:
7, 12, 5, 18, 5, 9, 10, 9, 12, 8, 12, 16
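which has twelve values (that is, n=12). The mean value is therefore calculated to be:
$$\bar{x} = \frac{7 + 12 + 5 + 18 + 5 + 9 + 10 + 9 + 12 + 8 + 12 + 16}{12} = \frac{123}{12} = 10.25$$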
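The same calculation can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` is an illustrative choice, not something from the text, and the result is cross-checked against `statistics.mean` from the standard library.

```python
from statistics import mean

# The twelve values from the example above.
data = [7, 12, 5, 18, 5, 9, 10, 9, 12, 8, 12, 16]

def sample_mean(values):
    """Add all the values together, then divide by how many there are."""
    return sum(values) / len(values)

n = len(data)          # number of data points
total = sum(data)      # sum of all the values
x_bar = sample_mean(data)

print(n, total, x_bar)  # 12 123 10.25

# Cross-check against the standard library implementation.
assert x_bar == mean(data)
```

Note that `sample_mean` assumes a non-empty list; an empty one would cause a division by zero.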
The median is the middle value of the data set, provided that the set has been ordered numerically. In general, for a data set of size n, the median is the $\frac{n+1}{2}$-th value.
For example, observe the data set we had before:
7, 12, 5, 18, 5, 9, 10, 9, 12, 8, 12, 16
To find the median, we need to order it:
5, 5, 7, 8, 9, 9, 10, 12, 12, 12, 16, 18
Now, since there are twelve values (remember, n=12), we will need to find the $\frac{12+1}{2}$-th value, or the 6.5th value, which is simply the number halfway between the 6th and 7th values. Here, we have the 6th value as 9 and the 7th as 10, and therefore the median value is:
$$\frac{9 + 10}{2} = 9.5$$
Notice that, as with the mean, this median value does not actually appear anywhere in our data set. That's perfectly fine; sometimes this happens.
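The procedure of ordering the data and then locating the middle position can be sketched in Python as follows. The helper name `median_by_position` is an assumption made for illustration; the standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median

data = [7, 12, 5, 18, 5, 9, 10, 9, 12, 8, 12, 16]

def median_by_position(values):
    """Order the values, then take the (n+1)/2-th one (counting from 1)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: (n+1)/2 is a whole number, which is index n//2 counting from 0.
        return ordered[middle]
    # Even n: (n+1)/2 falls halfway between two positions,
    # so take the number halfway between the two neighbouring values.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(sorted(data))              # [5, 5, 7, 8, 9, 9, 10, 12, 12, 12, 16, 18]
print(median_by_position(data))  # 9.5

# Cross-check against the standard library implementation.
assert median_by_position(data) == median(data)
```

For a data set with an odd number of values the function returns a value that does appear in the data; only the even case can produce an in-between number such as 9.5.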
The mode is the value which appears the most frequently in the data set. Because several values can be tied for the highest frequency, it is possible for a data set to have more than one mode. If there exist two modes in a data set, then it is said to be 'bimodal'; if there exist more than two, then it is said to be 'multi-modal'.
For example, consider the same data set once more:
7, 12, 5, 18, 5, 9, 10, 9, 12, 8, 12, 16
Here, we can see that the mode of this data set is 12, as it occurs the most frequently (three times, whereas no other value occurs more than twice).
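Counting how often each value occurs is easy to sketch in Python with `collections.Counter`. The function name `modes` and the second, bimodal list are made up for illustration and do not come from the text; the function returns a list so that tied modes are all reported.

```python
from collections import Counter

data = [7, 12, 5, 18, 5, 9, 10, 9, 12, 8, 12, 16]

def modes(values):
    """Return every value that is tied for the highest frequency."""
    counts = Counter(values)          # maps each value to how often it occurs
    highest = max(counts.values())    # the largest frequency in the data
    return sorted(v for v, c in counts.items() if c == highest)

print(Counter(data)[12])  # 3
print(modes(data))        # [12]

# A hypothetical bimodal data set: 1 and 2 both occur twice.
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```

The sketch assumes a non-empty list, since `max` raises an error on an empty sequence.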