“By visualizing information, we turn it into a landscape that you can explore with your eyes. A sort of information map. And when you’re lost in information, an information map is kind of useful.”
-David McCandless, data journalist
It may be helpful to include figures in your projects, dissertations and research papers to engage with the reader. Figures can be used to emphasise your point and support your arguments, which can simplify complexity in your findings, especially if your topics are highly specialised! Your figures must be clearly labelled, and you must also always explain in writing what they mean.
Both graphs and charts are used to display data in a visual format in order to convey complex information in an easier way than just reporting numbers. Representing data visually can help you identify patterns and trends in the data you collect, which you can then explain to your reader. This is more convincing than simply writing that a pattern occurs and expecting your readers to take your word for it.
Strictly speaking, the two terms 'graph' and 'chart' mean different things, although they are often used interchangeably. Generally, a graph is used to display numerical data, to help you understand the shape and distribution your data has, whereas a chart is used to display non-numerical data. What you must remember though is that if you are including any graph or chart in a piece of academic writing such as a dissertation, you must only refer to them as figures!
There are many ways to present data in an analysis, and some are more suitable for particular types of data than others. In order to choose which format to display your data in, you need to consider what variables you have and what information your audience need to get from them.
The tabs of this guide will support you in understanding and utilising various graphs and charts to display and visualise your data. It is recommended to look at a variety of graphs and charts to see which is the most appropriate for displaying the data you have. The sections are organised as follows: bar charts, boxplots, heat maps, histograms, line graphs, pie charts and scatterplots. Each tab provides an explanation of that graph or chart, with step-by-step guides to creating it using various software packages.
Think about how you would like to present your data, whether it be visually through a graph or chart, or in a table. Figures can also include photographs and images, or diagrams to illustrate your point, but these options will not be discussed here.
Bar charts visually compare the values of different categories or groups using vertical (or horizontal, depending on the orientation of your chart) bars of equal width but varying length. They are structured and organised, and can be used to easily display frequencies or percentages of categorical variables.
Bar charts with more than one bar next to each other per category are called 'clustered bar charts', whereas those with more than one bar on top of each other are called 'stacked bar charts'.
The x-axis of a bar chart typically displays the categories or groups of the variable, whereas the y-axis is used to display either the values or percentage values the categories or groups hold. Sometimes a bar chart will switch the x- and y-axes around: when this happens the chart is called a 'horizontal bar chart'. The length of each bar in the chart is proportional to the number or percentage size of its category.
Bar charts can display data horizontally or vertically, and really there is no way which is better than the other.
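As a rough illustration of what a bar chart of a categorical variable actually displays, the following Python sketch counts how often each category occurs and converts the counts to percentages. The survey responses and the function name category_percentages are made up for this example; any of the packages covered in this guide would do this counting for you.

```python
from collections import Counter


def category_percentages(values):
    """Return {category: (count, percentage)} for a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


# Hypothetical survey responses, invented purely for illustration.
responses = ["bus", "car", "bus", "walk", "car", "bus", "cycle", "bus"]
summary = category_percentages(responses)

for category, (count, percent) in summary.items():
    # Each '#' stands for one observation, giving a rough horizontal bar.
    print(f"{category:<6} {'#' * count} {count} ({percent:.1f}%)")
```

The length of each printed bar is proportional to the frequency of its category, which is exactly the quantity a bar chart encodes.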
To create a bar chart using Excel, you need to highlight your data and then go to the Insert tab at the top. In the Charts group, select Insert Column or Bar Chart, which is the first one listed. From there you can choose to create a 2D or 3D bar chart, whether it should be clustered or not, or if it should be stacked or not.
The way to create bar charts in MATLAB is to use the 'bar' function and specify which sets of values you wish to display. Let's assume you have defined a and b as your variables such that a is categorical and b is numerical.
bar(b)
will create a simple bar chart, with one bar for each element of b whose height is the value of that element.
bar(a, b)
will create a bar chart of the values in b at the locations specified by a.
bar(b, 'stacked')
will create a stacked bar chart, where b is a matrix and each row of b forms one stack.
bar(a, b, 'stacked')
will create a stacked bar chart of the rows of b at the locations specified by a.
bar(b, 'grouped')
will create a clustered bar chart, where b is a matrix and each row of b forms one cluster.
bar(a, b, 'grouped')
will create a clustered bar chart of the rows of b at the locations specified by a.
The way to create bar charts in R is similar to MATLAB, but this time the function you need is 'barplot'. Let's assume you have created a dataset in R named dataset where two of its categorical variables are called a1 and a2, and one of its numerical variables is called b.
barplot(dataset$b)
will create a bar chart of the values of b.
barplot(dataset$b, horiz = TRUE)
will create a horizontal bar chart of b.
If you have ggplot2 installed and loaded, you can use 'geom_bar'. Remember that in ggplot2 the geom functions determine how the data is drawn, and the aes function maps variables onto the axes, colours and other visual properties.
ggplot(dataset) + geom_bar(aes(x = a1))
will create a bar chart of the number of observations in each category of a1.
ggplot(dataset) + geom_bar(aes(x = a1, fill = a2))
will create a stacked bar chart of a1, with each bar divided according to the categories of a2.
ggplot(dataset) + geom_bar(aes(x = a1, fill = a2), position = position_dodge(preserve = 'single'))
will create a clustered bar chart of a1, with one bar for each category of a2 side by side.
Creating bar charts in SAS makes use of the 'SGPLOT' procedure, which involves the 'DATA' option, the 'VBAR' and 'HBAR' statements, and the 'GROUP' and 'GROUPDISPLAY' options. Let's assume you have a dataset ready named dataset with categorical variables a1, a2 and numerical variable b.
proc sgplot data = dataset; vbar a1; run;
will create a bar chart of the frequency of each category in a1.
proc sgplot data = dataset; hbar a1; run;
will create a horizontal bar chart of the frequency of each category in a1.
proc sgplot data = dataset; vbar a1 / group = a2; run;
will create a stacked bar chart of a1, with each bar divided according to the categories of a2.
proc sgplot data = dataset; vbar a1 / group = a2 groupdisplay = cluster; run;
will create a clustered bar chart of a1, with one bar for each category of a2 side by side.
To create a bar chart in SPSS, go to Graphs and then Chart Builder.
In the new open window choose Bar in the Gallery pane and click and drag the relevant one you wish to create into the main 'Chart Builder' dialogue box. Then, click and drag the relevant variable(s) into the relevant box(es) and click OK.
Creating bar charts in STATA requires the 'graph bar' syntax. Let's assume you have a dataset named dataset with categorical variable a and numerical variable b.
graph bar, over(a)
will create a bar chart with the groups of a on the x-axis.
graph hbar, over(a)
will create a horizontal bar chart, with the groups of a on the y-axis.
Boxplots, also known as box and whisker diagrams, are graphs which easily display the minimum non-outlier value, first quartile, median, third quartile and the maximum non-outlier value in a dataset, as well as any outliers if present. With these display features, they are most appropriate for continuous data.
By displaying more than one boxplot together on the same graph you can compare distributions between groups, or even compare two datasets. Boxplots can be drawn horizontally or vertically; the choice does not affect their interpretation.
Boxplots consist of a box and two 'whiskers' (lines) extending either side to visually represent a numeric variable. The box extends from the first quartile (sometimes referred to as the 'lower quartile') to the third quartile (sometimes referred to as the 'upper quartile'), with a line in the middle where the median is. 50% of the values are contained in this box, and the other 50% lie outside on the whiskers (and beyond if necessary).
The whiskers extend to the minimum and maximum non-outlier values. Outliers are data points which lie more than 1.5 box lengths (that is, 1.5 times the interquartile range) beyond either end of the box, and any outliers present are represented by dots lying outside the reach of the whiskers.
We can detect skew in a dataset by investigating the location of the median line in the box plot: if it is closer to the first quartile than the third, we can say that the data is right-skewed and if the median is closer to the third quartile than the first, we can determine that the data is left-skewed.
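To make the construction above concrete, here is a minimal Python sketch that computes the numbers a boxplot is drawn from: the quartiles, the 1.5 box-length fences, the whisker ends and any outliers. The data values and the function name boxplot_summary are invented for illustration, and the 'inclusive' quartile method is only one of several conventions, so the packages in this guide may give slightly different quartiles for the same data.

```python
from statistics import quantiles


def boxplot_summary(data):
    """Return the values a boxplot displays for a list of numbers."""
    # Quartile conventions differ between packages; 'inclusive' is an assumption here.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1  # the interquartile range
    lower_fence = q1 - 1.5 * box_length
    upper_fence = q3 + 1.5 * box_length
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # minimum non-outlier value
        "whisker_high": max(inside),  # maximum non-outlier value
        "outliers": sorted(x for x in data if x < lower_fence or x > upper_fence),
    }


# Hypothetical measurements, invented purely for illustration.
values = [2, 4, 5, 6, 6, 8, 9, 10, 30]
summary = boxplot_summary(values)
print(summary)

# The median sits closer to the first quartile than the third,
# which points towards right-skewed data.
print(summary["median"] - summary["q1"] < summary["q3"] - summary["median"])
```

Here the value 30 lies beyond the upper fence, so it would be drawn as a dot rather than being reached by the whisker.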
To create a boxplot in Excel, highlight your data and then go to the Insert tab at the top. In the Charts group, select Insert Statistic Chart and choose the Box and Whisker option. The boxplot will automatically appear.
Boxplots in MATLAB are created using the 'boxchart' function. Suppose you have a numeric variable b and categorical variable a.
boxchart(b)
will create a boxplot displaying b.
boxchart(a, b)
will create a boxplot displaying b according to the groups of a.
Creating boxplots in RStudio requires the 'boxplot' function. Suppose we have a dataset named dataset which consists of numeric variables b1 and b2.
boxplot(dataset$b1)
will create a boxplot of b1.
boxplot(dataset$b1, horizontal = TRUE)
will create a horizontal boxplot of b1.
boxplot(dataset$b1, dataset$b2)
will create boxplots of variables b1 and b2 next to each other in the same graph.
If your data is instead stored in long format, with a numeric column named values and a categorical column named group,
boxplot(values ~ group, data = dataset)
will create one boxplot of values for each category of group.
If you have ggplot2 installed and loaded, you can alternatively create boxplots using 'geom_boxplot'. Remember that in ggplot2 the geom functions determine how the data is drawn, and the aes function maps variables onto the axes, colours and other visual properties.
ggplot(dataset, aes(x = group, y = values, fill = group)) +
geom_boxplot()
Boxplots in SAS are created using the 'SGPLOT' procedure, which makes use of the 'DATA' option and the 'VBOX' statement. By default, boxplots in SAS will also show the mean, which is indicated by a diamond. Let's assume we have a dataset ready in SAS named 'dataset' with continuous variable b which we wish to display. We would then write:
proc sgplot data = dataset;
vbox b;
run;
To create a boxplot in SPSS, go to Analyse and then Descriptive Statistics and then Explore.
In the new open window click and drag the relevant variable (or variables, if you wish to create more than one boxplot side-by-side) into the 'Dependent List' box.
Select Plots at the bottom of the window (not the option on the right-hand side!) and click OK.
Boxplots in STATA use the 'graph box' syntax. Let's assume you have categorical variables a1 and a2 with numeric variable b.
graph box b
will create a boxplot of b.
graph hbox b
will create a horizontal boxplot of b.
graph box b, over(a1)
will create boxplots next to each other of b according to the groups of a1.
graph box b, over(a2) over(a1)
will create boxplots next to each other of b for each combination of a1 and a2, with the groups of a2 nested within the groups of a1.
Heat maps are popular charts which use variations in colour, saturation or luminance to show the magnitude of individual data values in an area or category, which makes them very easy to read. They can be used with matrices of data or on literal two-dimensional maps (although the latter are more likely to be 2D density plots or choropleths).
Heat maps traditionally display bivariate data, where one variable lies on the x-axis and the other on the y-axis, like a line graph or scatterplot would. Each axis is divided into a consecutive, non-overlapping series of intervals, most likely of equal length, to make a grid. The number of observations in each cell is calculated and given a corresponding shade of colour from a designated colour gradient.
This type of chart is not the best one to use to evaluate detailed patterns, but it can provide an overview of overall trends.
Cell colour can correspond to many different things, from frequencies to non-numeric groupings such as low/medium/high. It is important to observe the colour gradient key, which should always appear with the heat map, in order to make sense of the colourings. The general rule is that, when using a colour gradient, paler colours are used to denote lower frequencies and bolder, brighter colours are used to denote higher frequencies.
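The grid construction described above can be sketched in a few lines of Python: each axis is cut into intervals, the observations falling in each cell are counted, and each count is then scaled to a shade between 0 (palest) and 1 (boldest). The data points, the interval edges and the function names are all hypothetical, and real packages offer many more colour options than this simple scaling.

```python
def find_interval(value, edges):
    """Index of the interval containing value, or None if it lies outside the grid.

    Intervals are [edges[i], edges[i + 1]), except the last, which also
    includes its right-hand edge.
    """
    for i in range(len(edges) - 1):
        is_last = i == len(edges) - 2
        if edges[i] <= value < edges[i + 1] or (is_last and value == edges[i + 1]):
            return i
    return None


def grid_counts(xs, ys, x_edges, y_edges):
    """Count observations per cell; rows follow the y intervals, columns the x intervals."""
    counts = [[0] * (len(x_edges) - 1) for _ in range(len(y_edges) - 1)]
    for x, y in zip(xs, ys):
        col = find_interval(x, x_edges)
        row = find_interval(y, y_edges)
        if col is not None and row is not None:
            counts[row][col] += 1
    return counts


# Hypothetical bivariate data, invented purely for illustration.
xs = [1, 2, 2, 6, 7, 9, 3, 8]
ys = [1, 1, 8, 2, 9, 9, 4, 7]
counts = grid_counts(xs, ys, x_edges=[0, 5, 10], y_edges=[0, 5, 10])

# Scale each count against the largest one to get a shade from 0 to 1.
peak = max(max(row) for row in counts)
shades = [[count / peak for count in row] for row in counts]
print(counts)
print(shades)
```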
You can create a heat map in Excel using Conditional Formatting: highlight the numbers you wish to colour, go to the Home tab, select Conditional Formatting, then Colour Scales, and choose the colour scale you would like to use.
That's it! The selection of numbers you have highlighted will then be coloured according to the colour scale you chose. Be aware that the conditional formatting is dependent on the data you have in your dataset, so if any value were to change, the formatting would be recalculated as a result, which may change the look of the whole thing!
Heat maps in MATLAB are created using the 'heatmap' function, which is used with tabular or matrix data. Let's suppose we have some tabular data tabdata, which contains categorical variables a1 and a2 and numerical variable b.
heatmap(tabdata, 'a1', 'a2', 'ColorVariable', 'b')
will create a heat map using tabdata where a1 lies on the x-axis, a2 lies on the y-axis and the cells are coloured and labelled according to the mean of b.
Let's suppose now we have some (numerical) matrix data matrixdata.
heatmap(matrixdata)
will create a heat map with the values in the matrix coloured accordingly.
Create heat maps in RStudio using the syntax 'heatmap'. Let's assume we have a dataset 'dataset' in which lies a numeric matrix M.
heatmap(M, scale = "row")
If your dataset does not contain a matrix M to use, one can be created using the 'data.matrix' function.
Heat maps in SAS can be created for tables with the syntax 'HEATMAP' (or alternatively 'HEATMAPPARM') for the variables V1 and V2.
proc sgplot data = dataset;
heatmap x = V1 y = V2;
keylegend / title = "Heat Map";
run;
Heat maps in SPSS can only be created in version 28 or higher, so if yours is older than this you would not be able to create this using this software.
First, create a crosstable by going to Analyze, then Descriptive Statistics and then Crosstabs. Move the relevant variables you would like to display into the row and column boxes and click OK. Your crosstable will then be generated in the Output window. Double-click this to open the 'Pivot Table' pop-up window.
In this new window highlight all the cells to be coloured in the heat map, right-click and select 'Colour Scales' to open the Colour Scales pop-up. You can choose which colours to display for your low and high values: a good idea is to choose a paler colour for low values and a brighter, bolder shade for the high values. Then, click OK to exit the 'Colour Scales' pop-up and then click the X in the top corner to exit the 'Pivot Table' pop-up.
A histogram is a type of frequency distribution graph, as it effectively summarises the distribution of quantitative interval data. Histograms are large-sample tools, which means that they are best suited to displaying large samples of data (over 100 data points).
Histograms are constructed by dividing the x-axis into a series of consecutive, non-overlapping intervals called 'bins', usually of equal length, and drawing a rectangle over each bin whose area is proportional to the number of data points in that bin. Each rectangle in the histogram is consecutive, meaning that they lie next to each other with no gaps in between, and the shape of the histogram is deduced by the overall shape the rectangles provide. In this way, you can easily see the central tendency, spread, skewness and kurtosis a frequency distribution has.
A histogram has an x-axis labelled with the variable or data being represented, and a y-axis labelled 'frequency' or 'relative frequency'. The area of each rectangle is proportional to the frequency of the corresponding interval in the frequency table, which means that you can clearly see the overall shape the dataset has by observing the overall shape the rectangles give the graph.
You can observe the overall central tendency a dataset has by observing where the histogram's peak lies, as this is what represents where the concentration of points lies.
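The binning step behind a histogram can be sketched in Python as follows: the x-axis is divided into equal-width bins and the data points falling into each bin are counted. The exam marks, the bin settings and the function name histogram_counts are made up for illustration, and statistical packages usually choose the bins automatically.

```python
def histogram_counts(data, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Bin i covers [start + i * width, start + (i + 1) * width); the final bin
    also includes its right-hand edge. Values outside the bins are ignored.
    """
    counts = [0] * n_bins
    end = start + width * n_bins
    for x in data:
        if x == end:
            index = n_bins - 1
        else:
            index = int((x - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Hypothetical exam marks, invented purely for illustration.
marks = [42, 47, 51, 53, 55, 58, 61, 62, 64, 65, 67, 68, 71, 73, 78, 84]
counts = histogram_counts(marks, start=40, width=10, n_bins=5)

for i, count in enumerate(counts):
    low = 40 + 10 * i
    # Adjacent rows with no gaps mirror the touching rectangles of a histogram.
    print(f"{low}-{low + 10}: {'#' * count}")
```

With equal-width bins the height of each rectangle is proportional to its area, so the peak in the 60-70 bin shows where the marks are concentrated.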
Histograms and bar charts can potentially look very similar, but there are a few key differences between them. In a bar chart, the categories on the x-axis can be put in any order, as there is no set order for them to go in, which means that you cannot say how the data is distributed based on the shape, as it will change every time you re-order the groups! In a histogram, the bins follow the numerical order of the variable, so the overall shape is meaningful.
To create a histogram in Excel, highlight the relevant data to display and then go to the Insert tab at the top. In the Charts group, select Insert Statistic Chart and select Histogram to generate the graph.
Excel will automatically format the histogram in a certain way, but you may need to change some settings, for example the number of bins. To make changes, right-click the chart axis and select Format Axis, then modify the options as necessary in the pane which appears.
For MATLAB you will only need to use the 'histogram' syntax, so for the continuous variable b we write:
histogram(b)
We can create a histogram in R using the built-in 'hist' function, which by default will create a frequency histogram. For a dataset named 'dataset' with continuous variable b we simply write:
hist(dataset$b)
Otherwise, if ggplot2 is installed, we can instead write:
ggplot(dataset, aes(x = b)) +
geom_histogram()
Histograms in SAS are created using the 'SGPLOT' procedure, which makes use of the 'DATA' option and the 'HISTOGRAM' statement. For the dataset 'dataset' and continuous variable b, we create a histogram with the following:
proc sgplot data = dataset;
histogram b;
run;
To create a histogram in SPSS, go to Graphs, Legacy Dialogs and then Histogram. In the new open window choose the variable you wish to display from the list on the left and click-and-drag it over into the Variable box.
If you do not wish to have the normal curve plotted on the graph make sure the box 'Display normal curve' is unticked. Then, click OK to have the graph generated.
Histograms in STATA require the 'histogram' command, which can be shortened to 'hist'. Let's assume we have a continuous variable b that we wish to display. We would then type:
hist b, freq
to generate the histogram.
Line graphs use lines to connect data points, and are useful to show continuous changes with respect to another variable (over time, for example). The data points are joined together with a line to easily show trends and patterns...if they exist!
The x-axis is reserved for the independent variable whereas the y-axis is used for the dependent variable. As the independent variable varies, you will easily see how the dependent variable changes as a result, and therefore identify any trends/patterns which may be present. Most commonly, the independent variable will be time, so that you can track how the dependent variable changes as time goes on. Line graphs are suitable for datasets of all sizes.
Line graphs have an x- and y-axis, with one variable each plotted against one axis. If the line graph is being used to show changes over time, then the time variable would lie on the x-axis with the other variable on the other. The line on a line graph continuously joins together the data points. You can easily identify changes, trends and patterns in the data by observing the shape of this line. By identifying patterns in past data, you can predict what may happen in the future.
We can contextualise and interpret trends in the line graph to give us information on the data being presented. If a trendline is fitted to the data, the closer its R^2 value is to 1, the more closely the trendline fits the data.
Trend Type | Explanation |
Linear | A linear trend is one in which the line on the line graph is relatively straight, suggesting a constant, incremental rate of change over time. This trend can either be positive, where the data values are consistently increasing over time; negative, where the data values are consistently decreasing over time; or stable, where the data values are relatively constant over time. The equation of a linear trend is given by y = mx + c |
Polynomial | A polynomial trend deviates from a straight line by having curves and/or fluctuations, and is more useful than a linear trend in accommodating fluctuations which may arise due to 'noise' in the data. Polynomial trends take the form y = m1x + m2x^2 + ... + c, which allows them to follow non-linear patterns and may provide a more useful and accurate data representation. Polynomial trends appear in stock market analysis, climate change analysis and product lifecycle analysis. |
Logarithmic | A logarithmic trend is one where the data points on a graph show an increasing or decreasing trend which over time becomes shallower, and which may appear to plateau, suggesting that the rate of change is decelerating over time. This trend is so called because it follows the shape of the logarithmic curve, and the equation for this is given by: y = m ln(x) + c |
Exponential | An exponential trend is one where the data points on a graph show a rapid increasing or decreasing trend which over time becomes steeper, suggesting that the rate of change is accelerating over time. The equation of the exponential line is y = m e^(nx) + c. Exponential and logarithmic trends are commonly found in economics, for example in compound interest. |
Power | A power trend is one where y changes in proportion to a fixed power of x, so the data increases or decreases at a rate determined by that power. Power trends are modelled by the equation y = mx^(n). Notice that there is no constant, so y depends only on the changing values of x. |
Periodic | A periodic trend is where the data points on the graph repeat in a cyclical pattern over time, suggesting that there is some seasonality to the data. |
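As a worked example of the linear trend y = mx + c and its R^2 value, the Python sketch below fits a least-squares straight line to a small set of points and reports how closely the line fits. The data and the function name linear_trend are invented for illustration; the trendline tools in the packages covered here perform the same kind of calculation for you.

```python
def linear_trend(xs, ys):
    """Fit y = m*x + c by least squares and return (m, c, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx                # gradient of the trendline
    c = mean_y - m * mean_x      # intercept of the trendline
    # R^2 compares the scatter left around the line with the total scatter in y.
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return m, c, 1 - ss_res / ss_tot


# Hypothetical measurements taken at five time points, invented for illustration.
time = [1, 2, 3, 4, 5]
value = [2.1, 3.9, 6.2, 7.8, 10.1]
m, c, r_squared = linear_trend(time, value)
print(f"y = {m:.2f}x + {c:.2f}, R^2 = {r_squared:.4f}")
```

For these made-up points the gradient is positive and R^2 is very close to 1, so a positive linear trend describes them well.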
This type of graph can contain more than one line - each represents another variable being displayed, making for an easy comparison between how each variable changes over time.
To create a line graph in Excel, highlight your data and then go to the Insert tab at the top. In the Charts group, select Insert Line or Area Chart and choose the Line option. The chart will automatically appear.
Note that your data needs to be in a tabular format in order for the line chart to form properly.
Line graphs are the easiest thing to plot in MATLAB as the syntax used is simply 'plot'! So, for two variables m and n, we just need to write
plot (m, n)
Here, m would end up being plotted against the x-axis and n would be against the y-axis.
Similarly to MATLAB, the function for R is simply 'plot', so for two variables m and n contained in the dataset 'dataset' we can write:
plot(dataset$m, dataset$n, type = "l")
Or, utilising ggplot2, we can instead write:
ggplot(dataset, aes(x = m, y = n)) +
geom_line()
To draw a separate coloured line for each group, add col = followed by the name of a grouping variable inside aes.
Line graphs in SAS make use of the syntax 'proc sgplot', so for a dataset 'dataset' and variables m and n we type:
proc sgplot data = dataset;
series x = m y = n;
run;
To create a line graph in SPSS, go to Graphs and then Chart Builder.
In the new open window choose Line in the Gallery pane and click and drag the relevant graphic you wish to create into the main Chart Builder dialogue box. Then, click and drag the relevant variable(s) into the relevant box(es) and click OK.
You can create a line graph using the STATA menus by going to Graphics and then Twoway graph (scatter, line, etc.). In the pop-up window select 'Basic plots' and then 'Line' in the 'Basic plots: (select type)' menu. Then, select the relevant X and Y variables from the drop-down menus in the 'Plot type: (line plot)', click Accept and then Submit.
So called because they look rather like a pie, pie charts are used to depict how a dataset is made up, using 'pie slices' to show relative sizes. Together the segments add up to the total number of the population (or 100% if the chart is being used to show percentages), and each segment is sized according to its percentage proportion. This is because pie charts show proportions of a whole, as opposed to the differences between groups.
Each segment must be appropriately labelled and coloured so that the reader can easily understand the information being displayed.
If precision is paramount in the display of data, a pie chart may not be best to use: instead, consider using a bar chart.
The categories in a pie chart are displayed by wedges of a circle proportional to the percentage size that category has.
Both bar and pie charts can display categories of data, the proportions of which are graphically represented by the size of the bars or slices of the pie. However, a pie chart can only be used to show the breakdown of the whole, rather than just the variability between categories, which means that a pie chart can be used with fewer
If the total proportions of the data do not add up to 100% — whether it be greater than or less than 100% — then we cannot use a pie chart to display the data. Pie charts always show the breakdown of the whole, so it does not make sense if the proportions do not make up 100%. In this case, a bar chart would be more appropriate to display this data.
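The arithmetic behind a pie chart can be sketched in Python: each category's frequency is converted into a percentage of the whole and into the angle of its slice. The fruit counts are the ones used in this guide's Excel example; the function names and the multiple-choice percentages are assumptions made for illustration.

```python
def pie_slices(frequencies):
    """Return {label: (percentage, angle_in_degrees)} for a dict of frequencies."""
    total = sum(frequencies.values())
    return {
        label: (100 * n / total, 360 * n / total)
        for label, n in frequencies.items()
    }


def makes_a_whole(percentages, tolerance=1e-9):
    """Check that a set of percentages adds up to 100%."""
    return abs(sum(percentages) - 100) <= tolerance


fruit = {"Apples": 18, "Bananas": 24, "Kiwi fruits": 15, "Oranges": 20, "Peaches": 14}
slices = pie_slices(fruit)
for label, (percent, angle) in slices.items():
    print(f"{label:<12} {percent:5.1f}%  {angle:6.1f} degrees")

# Hypothetical multiple-choice survey where respondents could tick several options:
# these percentages do not make up a whole, so a bar chart would suit them better.
print(makes_a_whole([60, 55, 30]))
```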
To create a pie chart in Excel, highlight your data and then go to the Insert tab at the top. In the Charts group, select Insert Pie or Doughnut Chart and choose the pie chart option. The chart will automatically appear.
Be aware that your data needs to be in a certain format in order for the pie chart to form properly: like with line data, it is best to create a table with the headings being the segments of the pie and the frequencies underneath:
Apples | Bananas | Kiwi fruits | Oranges | Peaches |
18 | 24 | 15 | 20 | 14 |
as opposed to listing, for example, 'apples' 18 times in a column, 'bananas' 24 times, and so on.
If you have a vector V saved in MATLAB then you can easily create a pie chart using the syntax
pie(V)
which will create a pie chart depicting the segments of the vector as proportions of the whole.
If the sum of the entries in V is greater than 1 then the slices of the pie chart will be scaled so that they are proportional to the entries. However, if the sum is less than 1 then the pie chart will be incomplete, so be aware!
For the data set 'dataset' containing the categorical variable a and continuous variable b, you can either use the built-in syntax
pie(dataset$b, labels = dataset$a, main = "Pie Chart")
to create a pie chart in R, or alternatively can use the 'ggplot2' package with the syntax
ggplot(dataset, aes(x = "", y = b, fill = a)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar("y") +
theme_minimal() +
theme(axis.text = element_blank(),
axis.title = element_blank(),
legend.position = "bottom")
If you have a dataset 'dataset' containing a categorical variable a and continuous variable b you can create a pie chart in SAS using the following:
proc gchart data = dataset;
pie a / sumvar = b;
run;
quit;
To create a pie chart in SPSS, go to Graphs and then Chart Builder.
In the new open window choose Pie/Polar in the Gallery pane and click and drag the relevant graphic you wish to create into the main Chart Builder dialogue box. Then, click and drag the relevant variable(s) into the relevant box(es) and click OK.
Alternatively, you can create a pie chart by going to Analyze, Descriptive Statistics and then Frequencies, and then dragging the relevant variable into the 'Variable(s)' box in the window which pops up. Click on Charts and make sure 'Pie Charts' is selected, then click Continue and then OK.
In STATA you will need to use the syntax 'graph pie' to create a pie chart. If you have a dataset with continuous variable b and categorical variable a, then you can write:
graph pie b, over(a) plabel(_all name, size(*1.5) color(white))
Scatterplots, also known as scatter graphs or scatter charts, visualise numerical data: more specifically, they show the potential relationships (called 'correlations') between two quantitative variables. They contain an x- and y-axis, and each observation is displayed as a dot or point on the graph at the position given by its two values. Drawing a line of best fit through the data points can emphasise the correlation.
With one variable plotted along the x-axis and the other along the y-axis, these graphs are useful for showing whether a relationship exists between the two by observing the pattern the points on the graph make. These patterns can show us whether the relationship is linear or non-linear, positive or negative (increasing or decreasing), and how strong it is.
The graph can be fitted with a line of best fit to emphasise this relationship: the more closely the points cluster around this line, the stronger the relationship. It is also very easy to spot anomalies in your data using a scatter graph.
For more information on interpreting the line of best fit, have a look at the table of trend lines on the Line Graphs tab.
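One common way to put a number on the strength and direction of a linear relationship seen in a scatterplot is Pearson's correlation coefficient r, which ranges from -1 to 1. The Python sketch below computes it from first principles; the data points and the function name pearson_r are invented for illustration, and every package in this guide has its own built-in way of doing this.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations, invented purely for illustration.
b1 = [1, 2, 3, 4, 5]
b2 = [2, 4, 5, 4, 5]
r = pearson_r(b1, b2)
print(f"r = {r:.3f}")  # positive, so b2 tends to increase as b1 increases

# Points lying exactly on a straight line give r = 1 (or -1 for a falling line).
print(round(pearson_r([1, 2, 3], [2, 4, 6]), 6))
```

Values of r near 1 or -1 indicate points lying close to a straight line, while values near 0 indicate little or no linear relationship.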
To create a scatterplot in Excel, highlight the relevant data to display and then go to the Insert tab at the top. In the Charts group, select Insert Scatter (X, Y or Bubble Chart) and choose the Scatter option. The graph will automatically appear.
To create a scatterplot in MATLAB with two continuous variables b1 and b2, use the syntax
scatter(b1, b2)
Creating such a graph in R can be done with the built-in function 'plot', which will automatically create a scatterplot. Suppose we have a dataset named dataset which contains two continuous variables b1 and b2. Then, all that is needed to create the scatter graph is writing:
plot(dataset$b1, dataset$b2)
Otherwise, if you have the 'ggplot2' package installed and loaded, you can instead write:
ggplot(data = dataset, aes(x = b1, y = b2)) +
geom_point()
To create a scatterplot of the continuous variables b1 and b2 which exist in the dataset 'dataset' in SAS you can write:
proc sgplot data = dataset;
scatter x = b1 y = b2;
run;
To create a scatterplot in SPSS, go to Graphs and then Chart Builder. In the new open window choose Scatterplot in the Gallery pane and click and drag the relevant plot you wish to create into the main 'Chart Builder' box. Click and drag the relevant variable into the relevant x- and y-axis boxes and click OK.
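A scatterplot in STATA can be created for two continuous variables b1 and b2 with the syntax 'twoway scatter', which can be shortened to 'scatter'. The first variable listed is plotted on the y-axis and the second on the x-axis.
scatter b1 b2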
Alternatively, scatterplots can be created using the Graphics menu by going to Graphics and then Twoway graph (scatter, line, etc.) and Create. In the pop-up window, select Basic plots from the list on the left, and then select Scatter from the list on the right. Then, under the Plot type: (scatterplot) section select the relevant X and Y variables under the relevant drop-downs. Finally, click Accept and Submit.