Graphs and Charts: Maths and Stats

Overview of Graphs and Charts

“By visualizing information, we turn it into a landscape that you can explore with your eyes. A sort of information map. And when you’re lost in information, an information map is kind of useful.”

-David McCandless, data journalist

It may be helpful to include figures in your projects, dissertations and research papers to engage with the reader. Figures can be used to emphasise your point and support your arguments, which can simplify complexity in your findings, especially if your topics are highly specialised! Your figures must be clearly labelled, and you must also always explain in writing what they mean.

Both graphs and charts display data visually in order to convey complex information more easily than reporting numbers alone. Representing data visually can help you identify patterns and trends in the data you collect, which you can then explain to your reader. A figure is also more convincing than simply stating that a pattern occurs and asking your readers to take your word for it.

Strictly speaking, the terms 'graph' and 'chart' mean different things, although they are often used interchangeably. Generally, a graph is used to display numerical data, to help you understand the shape and distribution your data has, whereas a chart is used to display non-numerical data. What you must remember though is that if you are including any graph or chart in a piece of academic writing such as a dissertation, you must only refer to them as figures!

There are many ways to present data in an analysis, and some are more suitable for particular types of data than others. In order to choose which format to display your data in, you need to consider what variables you have and what information your audience need to get from them.

Guide contents

The tabs of this guide will support you in understanding and utilising various graphs and charts to display and visualise your data. It is recommended to look at a variety of graphs and charts to see which is the most appropriate for displaying the data you have. The sections are organised as follows:

• Bar Charts
• Boxplots
• Heat Maps
• Histograms
• Line Graphs
• Pie Charts
• Scatterplots

and each tab will provide an explanation of each graph and chart, with step-by-step guides to create them using various software packages.

Think about how you would like to present your data, whether it be visually through a graph or chart, or in a table. Figures can also include photographs and images, or diagrams to illustrate your point, but these options will not be discussed here.

Bar Charts

What are they?

Bar charts compare the values of different categories or groups using vertical (or horizontal, depending on the orientation of your chart) bars of equal width but varying length. They are structured and organised, and can be used to easily display frequencies or percentages of categorical variables.

Bar charts with more than one bar next to each other per category are called 'clustered bar charts', whereas those with more than one bar on top of each other are called 'stacked bar charts'.

How to Read a Bar Chart

The x-axis of a bar chart typically displays the categories or groups of the variable, whereas the y-axis is used to display either the values or percentage values the categories or groups hold. Sometimes a bar chart will switch the x- and y-axis around: when this happens the chart is then called a 'horizontal bar chart'. The length of each bar is proportional to the count or percentage of its category.

Bar charts can display data horizontally or vertically, and neither orientation is inherently better than the other.
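As a rough illustration of what a bar chart is built from, the sketch below tallies the frequency and percentage of each category and prints a crude text bar for each. It is written in Python, which is not one of the packages covered in this guide, and the function name `category_summary` and the survey responses are made up purely for illustration.

```python
from collections import Counter

def category_summary(values):
    """Return {category: (count, percentage)} for a list of category labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Hypothetical survey responses (made-up data for illustration only).
responses = ["bus", "car", "bus", "walk", "car", "bus", "cycle", "bus"]
summary = category_summary(responses)

for cat, (n, pct) in sorted(summary.items()):
    # Crude horizontal bar: one '#' per observation, so length is proportional to count.
    print(f"{cat:<6} {'#' * n:<5} {n} ({pct:.1f}%)")
```

Either the counts or the percentages could be placed on the value axis; the relative lengths of the bars are the same in both cases.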

Creating Bar Charts Using Software

Excel

To create a bar chart using Excel, you need to highlight your data and then go to the Insert tab at the top. In the Charts group, select Insert Column or Bar Chart, which is the first one listed. From there you can choose to create a 2D or 3D bar chart, whether it should be clustered or not, or if it should be stacked or not.

MATLAB

Bar charts in MATLAB are created with the 'bar' function, passing in the values you wish to display. Let's assume you have defined a and b as your variables such that a is categorical and b is numerical.

• `bar(b)` will create a simple bar chart, with one bar for each element of b whose height is the value of that element.
• `bar(a, b)` will create the same bar chart, with the bars drawn at the locations specified by a.

To create stacked or clustered bar charts, b needs to be a matrix: each row then forms one group of bars and each column one series.

• `bar(b, 'stacked')` will create a stacked bar chart, with one stack for each row of b.
• `bar(a, b, 'stacked')` will create a stacked bar chart, with the stacks drawn at the locations specified by a.
• `bar(b, 'grouped')` will create a clustered bar chart, with one cluster for each row of b.
• `bar(a, b, 'grouped')` will create a clustered bar chart, with the clusters drawn at the locations specified by a.

R

The way to create bar charts in R is similar to MATLAB, but this time the function you need is 'barplot'. Let's assume you have created a dataset in R named dataset where two of its categorical variables are called a1 and a2, and one of its numerical variables is called b.

• `barplot(dataset$b)` will create a bar chart of the values of b.
• `barplot(dataset$b, horiz = TRUE)` will create a horizontal bar chart of the values of b.

If you have the ggplot2 package installed and loaded, you can use 'geom_bar'. Remember that in ggplot2 the geom functions specify how the data points are drawn, and the aes function maps variables in the dataset onto visual properties such as axis position and colour.

• `ggplot(dataset) + geom_bar(aes(x = a1))` will create a bar chart of the number of observations in each category of a1.
• `ggplot(dataset) + geom_bar(aes(x = a1, fill = a2))` will create a stacked bar chart of a1, with each bar split according to the groups of a2.
• `ggplot(dataset) + geom_bar(aes(x = a1, fill = a2), position = position_dodge(preserve = 'single'))` will create a clustered bar chart of a1, with one bar per group of a2 in each cluster.

SAS

Bar charts in SAS are created with the 'SGPLOT' procedure, using its 'DATA=' option, the 'VBAR' (or 'HBAR') statement, and the 'GROUP=' and 'GROUPDISPLAY=' options. Let's assume you have a dataset ready named dataset with categorical variables a1, a2 and numerical variable b.

• `proc sgplot data = dataset; vbar a1; run;` will create a bar chart of the frequency of each category of a1.
• `proc sgplot data = dataset; hbar a1; run;` will create a horizontal bar chart of the frequency of each category of a1.
• `proc sgplot data = dataset; vbar a1 / group = a2; run;` will create a stacked bar chart of a1, with each bar split according to the groups of a2.
• `proc sgplot data = dataset; vbar a1 / group = a2 groupdisplay = cluster; run;` will create a clustered bar chart of a1, with one bar per group of a2 in each cluster.

SPSS

To create a bar chart in SPSS, go to Graphs and then Chart Builder.

In the new open window choose Bar in the Gallery pane and click and drag the relevant one you wish to create into the main 'Chart Builder' dialogue box. Then, click and drag the relevant variable(s) into the relevant box(es) and click OK.

STATA

Creating bar charts in STATA requires the 'graph bar' syntax. Let's assume you have a dataset named dataset with categorical variable a and numerical variable b.

• `graph bar, over(a)` will create a bar chart with the groups of a on the x-axis.
• `graph hbar, over(a)` will create a horizontal bar chart, with the groups of a on the y-axis.

Boxplots

What are they?

Boxplots, also known as box and whisker diagrams, are graphs which easily display the minimum non-outlier value, first quartile, median, third quartile and the maximum non-outlier value in a dataset, as well as any outliers if present. With these display features, they are most appropriate for continuous data.

By displaying more than one boxplot together on the same graph you can compare distributions between groups, or even compare two datasets. Boxplots can be drawn horizontally or vertically; the orientation does not change how they are read.

Boxplots consist of a box and two 'whiskers' (lines) extending either side to visually represent a numeric variable. The box extends from the first quartile (sometimes referred to as the 'lower quartile') to the third quartile (sometimes referred to as the 'upper quartile'), with a line in the middle where the median is. 50% of the values are contained in this box, and the other 50% lie outside on the whiskers (and beyond if necessary).

The whiskers extend to the minimum and maximum non-outlier values. Outliers are data points which lie more than 1.5 box lengths (that is, 1.5 times the interquartile range) beyond either end of the box, and any outliers present are represented by dots lying outside the reach of the whiskers.

We can detect skew in a dataset by investigating the location of the median line in the box plot: if it is closer to the first quartile than the third, we can say that the data is right-skewed and if the median is closer to the third quartile than the first, we can determine that the data is left-skewed.
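The quantities a boxplot displays can be worked out by hand, and the sketch below does this in Python (not one of the packages covered in this guide) as an illustration. The function name `boxplot_summary` and the scores are made up, and the quartiles use the 'inclusive' interpolation method from Python's standard library as an assumption: other software may use slightly different quartile rules and so give slightly different boxes.

```python
import statistics

def boxplot_summary(data):
    """Compute the values a boxplot displays, using the 1.5 x IQR outlier rule."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                      # the 'box length'
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    return {
        "whisker_low": min(inside),    # minimum non-outlier value
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),   # maximum non-outlier value
        "outliers": outliers,
    }

# Made-up test scores for illustration only.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = boxplot_summary(scores)
print(summary)
```

Comparing `median - q1` with `q3 - median` in the result gives a quick numerical version of the skew check described above.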

Creating Boxplots Using Software

Excel

To create a boxplot in Excel, highlight your data and then go to the Insert tab at the top. In the Charts group, select Insert Statistic Chart and choose the Box and Whisker option. The boxplot will automatically appear.

MATLAB

Boxplots in MATLAB are created using the 'boxchart' syntax. Suppose you have a numeric variable b and categorical variable a.

• `boxchart(b)` will create a boxplot displaying b.
• `boxchart(a, b)` will create a boxplot displaying b according to the groups of a.

R

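Creating boxplots in R requires the 'boxplot' function. Suppose we have a dataset named dataset which consists of numeric variables b1 and b2.

• `boxplot(dataset$b1)` will create a boxplot of b1.
• `boxplot(dataset$b1, horizontal = TRUE)` will create a horizontal boxplot of b1.
• `boxplot(dataset$b1, dataset$b2, names = c("b1", "b2"))` will create boxplots of variables b1 and b2 next to each other in the same graph. If your data is instead stored in long format, with the values in one column and a grouping variable in another (called values and group here), `boxplot(values ~ group, data = dataset)` will create one boxplot per group.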

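If you have ggplot2 installed and loaded, you can alternatively create boxplots using 'geom_boxplot'. Remember that in ggplot2 the geom functions specify how the data points are drawn, and the aes function maps variables in the dataset onto visual properties such as axis position and colour. For example, assuming the dataset also contains a categorical variable called group, the following will create one boxplot of b1 for each group:

`ggplot(dataset, aes(x = group, y = b1, fill = group)) + `

`geom_boxplot()`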

SAS

Boxplots in SAS are created using the 'SGPLOT' procedure, which makes use of the 'DATA' option and the 'VBOX' statement. By default, boxplots in SAS also will show the mean, which is indicated by a diamond. Let's assume we have a dataset ready in SAS named 'dataset' with continuous variable b which we wish to display. We would then write:

`proc sgplot data = dataset;`

`vbox b;`

`run;`

SPSS

To create a boxplot in SPSS, go to Analyze, then Descriptive Statistics and then Explore.

In the window that opens, click and drag the relevant variable (or variables, if you wish to create more than one boxplot side-by-side) into the 'Dependent List' box.

Select Plots at the bottom of the window (not the option on the right-hand side!) and click OK.

STATA

Boxplots in STATA use the 'graph box' syntax. Let's assume you have categorical variables a1 and a2 with numeric variable b.

• `graph box b` will create a boxplot of b.
• `graph hbox b` will create a horizontal boxplot of b.
• `graph box b, over(a1)` will create boxplots next to each other of b according to the groups of a1.
• `graph box b, over(a2) over(a1)` will create boxplots next to each other of b according to the groups of a1, with each group separated further by the groups of a2.

Heat Maps

What are they?

Heat maps are popular charts which use variations in colour, saturation or luminance to show the magnitude of individual data values in an area or category, and therefore are very easy to read. They can be used with matrices of data or on literal two-dimensional maps (although the latter are more likely to be 2D density plots or choropleths).

Heat maps traditionally display bivariate data, where one variable lies on the x-axis and the other on the y-axis, like a line graph or scatterplot would. Each axis is divided into a consecutive, non-overlapping series of intervals, most likely of equal length, to make a grid. The number of observations in each cell is calculated and given a corresponding shade of colour from a designated colour gradient.
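The gridding step described above can be sketched as follows. This is an illustrative Python example only (Python is not one of the packages covered in this guide); the function name `grid_counts`, the interval edges and the data are all made up, and text characters stand in for the colour gradient.

```python
def grid_counts(xs, ys, x_edges, y_edges):
    """Count (x, y) points per grid cell; counts[row][col] uses y for rows, x for columns."""
    def bin_index(value, edges):
        # Intervals are [edge_k, edge_k+1); the last interval also includes its upper edge.
        for k in range(len(edges) - 1):
            last = k == len(edges) - 2
            if edges[k] <= value < edges[k + 1] or (last and value == edges[k + 1]):
                return k
        return None  # value lies outside the grid

    counts = [[0] * (len(x_edges) - 1) for _ in range(len(y_edges) - 1)]
    for x, y in zip(xs, ys):
        col, row = bin_index(x, x_edges), bin_index(y, y_edges)
        if col is not None and row is not None:
            counts[row][col] += 1
    return counts

# Made-up data: hours of revision and exam marks for ten students.
hours = [1, 2, 2, 3, 5, 6, 7, 8, 9, 9]
marks = [35, 40, 55, 45, 60, 65, 70, 85, 90, 95]
counts = grid_counts(hours, marks, x_edges=[0, 5, 10], y_edges=[0, 50, 100])

shades = " .:*#"  # stand-in for a colour gradient, from pale to bold
peak = max(max(row) for row in counts)
for row in reversed(counts):  # print the highest y interval first, as on a plot
    print("".join(shades[round(c / peak * (len(shades) - 1))] for c in row))
```

Each cell's count is scaled against the largest count so that the fullest cell always receives the boldest shade.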

This type of chart is not the best choice for evaluating patterns in detail, but it can provide an overview of overall trends.

How to Read a Heat Map

Cell colour can correspond to many different things, from frequencies to non-numeric groupings such as low/medium/high. It is important to observe the colour gradient key, which should always appear with the heat map, in order to make sense of the colourings. The general rule is that, when using a colour gradient, paler colours are used to denote lower frequencies and bolder, brighter colours are used to denote higher frequencies.

Creating Heat Maps Using Software

Excel

You can create a heat map in Excel using Conditional Formatting:

• Highlight all numeric data in your spreadsheet to include in the heat map (don't include column or row headings)
• Click the Conditional Formatting button located on the Home tab
• In the drop-down menu which appears, click Colour Scales and then choose a subtype you would like to use.

That's it! The selection of numbers you have highlighted will then be coloured according to the colour scale you chose. Be aware that the conditional formatting depends on the data in your dataset, so if any value were to change, the formatting would be recalculated and the look of the whole heat map may change!

MATLAB

Heat maps in MATLAB are created using the 'heatmap' function, which works with tabular or matrix data. Let's suppose we have some tabular data tabdata, which contains categorical variables a1 and a2 and numerical variable b.

• `heatmap(tabdata, 'a1', 'a2', 'ColorVariable', 'b')` will create a heat map using tabdata where a1 lies on the x-axis, a2 lies on the y-axis and the cells are coloured and labelled according to the mean of b.

Let's suppose now we have some (numerical) matrix data matrixdata.

• `heatmap(matrixdata)` will create a heat map with the values in the matrix coloured accordingly.

R

Heat maps in R are created using the 'heatmap' function. Let's assume the data to display is held in a numeric matrix M.

`heatmap(M, scale = "row")`

If your dataset does not contain a matrix M to use, one can be created using the 'data.matrix' function.

SAS

Heat maps in SAS can be created for tables with the syntax 'HEATMAP' (or alternatively 'HEATMAPPARM') for the variables V1 and V2.

`proc sgplot data = dataset;`

`heatmap x = V1 y = V2;`

`keylegend / title = "Heat Map";`

`run;`

SPSS

Heat maps in SPSS can only be created in version 28 or higher, so if yours is older than this you would not be able to create this using this software.

First, create a crosstable by going to Analyze, then Descriptive Statistics and then Crosstabs. Move the relevant variables you would like to display into the row and column boxes and click OK. Your crosstable will then be generated in the Output window. Double-click this to open the 'Pivot Table' pop-up window.

In this new window highlight all the cells to be coloured in the heatmap, right-click and select 'Colour Scales' to open the Colour Scales pop-up. You can choose which colours to display for your low and high values - a good idea is to choose a paler colour for low values and a more bright and bold shade for the high values. Then, click OK to exit the 'Colour Scales' pop-up and then click the X in the top corner to exit the 'Pivot Table' pop-up.

Histograms

What are they?

A histogram is a type of frequency distribution graph: it summarises the distribution of quantitative interval data. Histograms are large-sample tools, which means that they are best suited to displaying large samples of data (over 100 data points).

Histograms are constructed by dividing the x-axis into a series of consecutive, non-overlapping intervals called 'bins', usually of equal length, and drawing a rectangle over each bin whose area is proportional to the number of data points in that bin. Each rectangle in the histogram is consecutive, meaning that they lie next to each other with no gaps in between, and the shape of the histogram is deduced by the overall shape the rectangles provide. In this way, you can easily see the central tendency, spread, skewness and kurtosis a frequency distribution has.

A histogram has an x-axis labelled with the variable or data being represented, and a y-axis labelled 'frequency' or 'relative frequency'. The area of each rectangle is proportional to the frequency of its bin in the frequency table, which means that you can clearly see the overall shape the dataset has by observing the overall shape the rectangles give the graph.

You can observe the overall central tendency a dataset has by observing where the histogram's peak lies, as this is what represents where the concentration of points lies.
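To make the binning step concrete, the sketch below counts how many data points fall into each equal-width bin and prints a text histogram. It is an illustration in Python only (Python is not one of the packages covered in this guide); the function name `histogram_counts`, the bin settings and the heights are all made up, and the sample is far smaller than you would normally use for a histogram.

```python
def histogram_counts(data, start, width, n_bins):
    """Count data points in n_bins consecutive bins of equal width, beginning at start."""
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)   # index of the bin [start + k*width, start + (k+1)*width)
        if x == start + width * n_bins:
            k = n_bins - 1              # the very top edge is placed in the last bin
        if 0 <= k < n_bins:             # values outside the range are ignored
            counts[k] += 1
    return counts

# Made-up heights in cm, for illustration only.
heights = [150, 152, 155, 158, 160, 161, 163, 165, 165, 168, 170, 171, 174, 178, 185]
START, WIDTH, N_BINS = 150, 10, 4
counts = histogram_counts(heights, START, WIDTH, N_BINS)
relative = [c / len(heights) for c in counts]  # relative frequencies sum to 1

for k, c in enumerate(counts):
    low = START + k * WIDTH
    print(f"{low}-{low + WIDTH}: {'#' * c}")
```

Because the bins here have equal width, the height of each rectangle is proportional to its area, so either can be read as the frequency.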

The Difference Between a Histogram and Bar Chart

Histograms and bar charts can potentially look very similar, but there are a few key differences between them. In a bar chart, the categories on the x-axis can be put in any order, as there is no set order for them to go in, which means that you cannot say how the data is distributed based on the shape, as it will change every time you re-order the groups! In a histogram, the x-axis is a numerical scale, so the bins have a fixed order and the overall shape is meaningful.

Creating Histograms Using Software

Excel

To create a histogram in Excel, highlight the relevant data to display and then go to the Insert tab at the top. In the Charts group, select Insert Statistics Chart and select Histogram to generate the graph.

Excel will automatically format the histogram, but you may need to change some settings, for example the number of bins. To make changes, right-click the chart axis and select Format Axis, then modify the options as necessary in the pane which appears.

MATLAB

For MATLAB you will only need to use the 'histogram' syntax, so for the continuous variable b we write:

`histogram(b)`

R

We can create a histogram in R using the built-in 'hist' function, which by default will create a frequency histogram. For a dataset named 'dataset' with continuous variable b we simply write:

`hist(dataset$b)`

Otherwise, if ggplot2 is installed, we can instead write:

`ggplot(dataset, aes(x = b)) + `

`geom_histogram()`

SAS

Histograms in SAS are created using the 'SGPLOT' procedure, which makes use of the 'DATA' option and the 'HISTOGRAM' statement. For the dataset 'dataset' and continuous variable b, we create a histogram with the following:

`proc sgplot data = dataset;`

`histogram b;`

`run;`

SPSS

To create a histogram in SPSS, go to Graphs, then Legacy Dialogs and then Histogram. In the window that opens, choose the variable you wish to display from the list on the left and click-and-drag it over into the Variable box.

If you do not wish to have the normal curve plotted on the graph make sure the box 'Display normal curve' is unticked. Then, click OK to have the graph generated.

STATA

Histograms in STATA require the 'hist' command (short for 'histogram'). Let's assume we have a continuous variable b that we wish to display. We would then type:

`hist b, freq`

to generate the histogram.

Line Graphs

What are they?

Line graphs use lines to connect data points, and are useful to show continuous changes with respect to another variable (over time, for example). The data points are joined together with a line to easily show trends and patterns...if they exist!

The x-axis is reserved for the independent variable whereas the y-axis is used for the dependent variable. As the independent variable varies, you will easily see how the dependent variable changes as a result, and therefore identify any trends/patterns which may be present. Most commonly, the independent variable will be time, so that you can track how the dependent variable changes as time goes on. Line graphs are suitable for datasets of all sizes.

How to Read a Line Graph

Line graphs have an x- and y-axis, with one variable each plotted against one axis. If the line graph is being used to show changes over time, then the time variable would lie on the x-axis with the other variable on the other. The line on a line graph continuously joins together the data points. You can easily identify changes, trends and patterns in the data by observing the shape of this line. By identifying patterns in past data, you can predict what may happen in the future.

We can contextualise and interpret trends in the line graph to give us information on the data being presented. When a trendline is fitted to the data, the closer its R^2 value is to 1, the closer the trendline fits the data.

| Trend Type | Explanation |
| --- | --- |
| Linear | A linear trend is one in which the line on the line graph is relatively straight, suggesting a constant, incremental rate of change over time. This trend can either be positive, where the data values are consistently increasing over time; negative, where the data values are consistently decreasing over time; or stable, where the data values are relatively constant over time. The equation of a linear trend is given by y = mx + c |
| Polynomial | A polynomial trend deviates from a straight line by having curves and/or fluctuations, and is more useful than a linear trend in accommodating fluctuations which may arise due to 'noise' in the data. Polynomial trends take the form y = m1 x + m2 x^2 + ... + c, which makes them more accurate in displaying non-linear patterns, and they may provide a more useful and accurate data representation. Polynomial trends appear in stock market analysis, climate change analysis and product lifecycles. |
| Logarithmic | A logarithmic trend is one where the data points on a graph show an increasing or decreasing trend which over time becomes shallower, and which may eventually plateau, suggesting that the rate of change is decelerating over time until there is no more growth. This trend is so-called as it follows the shape of the logarithmic curve, and the equation for this is given by y = m ln(x) + c |
| Exponential | An exponential trend is one where the data points on a graph show a rapid increasing or decreasing trend which over time becomes steeper, suggesting that the rate of change is accelerating over time. The equation of the exponential line is y = m e^(nx) + c. Exponential and logarithmic trends are commonly found in economics, for example compound interest. |
| Power | A power trend is one where y changes in proportion to a power of x, giving a curved line whose steepness changes steadily. It is modelled by the equation y = mx^n. Notice that there is no constant, so the value of y depends entirely on the value of x. |
| Periodic | A periodic trend is where the data points on the graph repeat in a cyclical pattern over time, suggesting that there is some seasonality to the data. |
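As a worked example of a linear trend and its R^2 value, the sketch below fits y = mx + c by ordinary least squares and reports how closely the line fits. It is an illustration in Python only (Python is not one of the packages covered in this guide); the function name `linear_trend` and the sales figures are made up, and least squares is assumed as the fitting method since that is what trendline tools typically use.

```python
def linear_trend(xs, ys):
    """Fit y = m*x + c by least squares and return (m, c, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx                       # gradient of the trendline
    c = mean_y - m * mean_x             # intercept of the trendline
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))  # variation left unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                   # total variation in y
    return m, c, 1 - ss_res / ss_tot

# Made-up yearly sales figures, for illustration only.
years = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 20, 24]
m, c, r_squared = linear_trend(years, sales)
print(f"y = {m:.2f}x + {c:.2f}, R^2 = {r_squared:.3f}")
```

An R^2 close to 1 here indicates that a straight line describes these points well; for curved data, one of the other trend types in the table would be a better candidate.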

This type of graph can contain more than one line - each represents another variable being displayed, making for an easy comparison between how each variable changes over time.

Creating Line Graphs Using Software

Excel

To create a line graph in Excel, highlight your data and then go to the Insert tab at the top. In the Charts group, select Insert Line or Area Chart and choose the Line option. The chart will automatically appear.

Note that your data needs to be in a tabular format in order for the line chart to form properly.

MATLAB

Line graphs are the easiest thing to plot in MATLAB as the syntax used is simply 'plot'! So, for two variables m and n, we just need to write

`plot (m, n)`

Here, m would end up being plotted against the x-axis and n would be against the y-axis.

R

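Similarly to MATLAB, the function for R is simply 'plot', with the argument type = "l" (a lower-case letter L, for 'line') telling R to join the points. So for two variables m and n contained in the dataset 'dataset' we can write:

`plot(dataset$m, dataset$n, type = "l")`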

Or, utilising ggplot2, we can instead write:

`ggplot(dataset, aes(x = m, y = n)) + `

`geom_line()`

SAS

Line graphs in SAS make use of the syntax 'proc sgplot', so for a dataset 'dataset' and variables m and n we type:

`proc sgplot data = dataset;`

`series x = m y = n;`

`run;`

SPSS

To create a line graph in SPSS, go to Graphs and then Chart Builder.

In the new open window choose Line in the Gallery pane and click and drag the relevant graphic you wish to create into the main Chart Builder dialogue box. Then, click and drag the relevant variable(s) into the relevant box(es) and click OK.

STATA

You can create a line graph using the STATA menus by going to Graphics and then Twoway graph (scatter, line, etc.). In the pop-up window select 'Basic plots' and then 'Line' in the 'Basic plots: (select type)' menu. Then, select the relevant X and Y variables from the drop-down menus in the 'Plot type: (line plot)', click Accept and then Submit.

Pie Charts

What are they?

So called because they look rather like a pie, pie charts are used to depict how a dataset is made up using 'pie slices' to show relative sizes. Together, the segments add up to the total number in the population (or 100% if percentages are being shown), and each segment is sized according to its proportion of that total. This is because pie charts show proportions of a whole, as opposed to the differences between groups.

Each segment must be appropriately labelled and coloured so that the reader can easily understand the information being displayed.

If precision is paramount in the display of data, a pie chart may not be best to use: instead, consider using a bar chart.

How to Read a Pie Chart

The categories in a pie chart are displayed by wedges of a circle proportional to the percentage size that category has.

The Difference Between a Pie Chart and a Bar Chart

Both bar and pie charts can display categories of data, the proportions of which are graphically represented by the size of the bars or slices of the pie. However, a pie chart can only be used to show the breakdown of the whole, rather than just the variability between categories.

If the total proportions of the data do not add up to 100% — whether it be greater than or less than 100% — then we cannot use a pie chart to display the data. Pie charts always show the breakdown of the whole, so it does not make sense if the proportions do not make up 100%. In this case, a bar chart would be more appropriate to display this data.
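The arithmetic behind a pie chart can be sketched as below: each category's count is converted to a percentage of the whole and to the angle of its slice, and a separate check tests whether a set of reported percentages really adds up to 100%. This is an illustration in Python only (Python is not one of the packages covered in this guide); the function names `pie_slices` and `is_whole`, the rounding tolerance and the fruit counts are assumptions made for the example.

```python
def pie_slices(counts):
    """Return {label: (percentage, angle_in_degrees)} for a dict of category counts."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    return {label: (100 * n / total, 360 * n / total) for label, n in counts.items()}

def is_whole(percentages, tolerance=0.5):
    """True if reported percentages add up to 100% (allowing for rounding)."""
    return abs(sum(percentages) - 100) <= tolerance

# Made-up fruit counts, for illustration only.
fruit = {"Apples": 18, "Bananas": 24, "Kiwi fruits": 15, "Oranges": 20, "Peaches": 14}
slices = pie_slices(fruit)
for label, (pct, angle) in slices.items():
    print(f"{label:<12} {pct:5.1f}%  {angle:6.1f} degrees")

print(is_whole([45, 35, 20]))   # these percentages form a whole
print(is_whole([60, 55, 30]))   # these do not, so a bar chart would be more appropriate
```

Percentages that fail the check often come from questions where respondents could pick more than one option, which is a common reason to choose a bar chart instead.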

Creating Pie Charts Using Software

Excel

To create a pie chart in Excel, highlight your data and then go to the Insert tab at the top. In the Charts group, select Insert Pie or Doughnut Chart and choose the pie chart option. The chart will automatically appear.

Be aware that your data needs to be in a certain format in order for the pie chart to form properly: like with line data, it is best to create a table with the headings being the segments of the pie and the frequencies underneath:

| Apples | Bananas | Kiwi fruits | Oranges | Peaches |
| --- | --- | --- | --- | --- |
| 18 | 24 | 15 | 20 | 14 |

as opposed to listing, for example, 'apples' 18 times in a column, 'bananas' 24 times, and so on.

MATLAB

If you have a vector V saved in MATLAB then you can easily create a pie chart using the syntax

`pie(V)`

which will create a pie chart depicting the segments of the vector as proportions of the whole.

If the sum of the entries in V is greater than 1, the entries are scaled so that each segment shows its proportion of the total. However, if the sum is less than 1, the entries are treated as proportions directly and the pie chart will be incomplete, so be aware!

R

For the data set 'dataset' containing the categorical variable a and continuous variable b, you can either use the built-in syntax

`pie(dataset$b, labels = dataset$a, main = "Pie Chart")`

to create a pie chart in R, or alternatively can use the 'ggplot2' package with the syntax

`ggplot(dataset, aes(x = "", y = b, fill = a)) + `

`geom_bar(stat = "identity", width = 1, color = "white") + `

`coord_polar("y") + `

`theme_minimal() + `

`theme(axis.text = element_blank(),`

`axis.title = element_blank(),`

`legend.position = "bottom")`

SAS

If you have a dataset 'dataset' containing a categorical variable a and continuous variable b  you can create a pie chart in SAS using the following:

`proc gchart data = dataset;`

`pie a / sumvar = b;`

`run;`

`quit;`

SPSS

To create a pie chart in SPSS, go to Graphs and then Chart Builder.

In the new open window choose Pie/Polar in the Gallery pane and click and drag the relevant graphic you wish to create into the main Chart Builder dialogue box. Then, click and drag the relevant variable(s) into the relevant box(es) and click OK.

Alternatively, you can create a pie chart by going to Analyze, then Descriptive Statistics and then Frequencies, and then dragging the relevant variable into the 'Variable(s)' box in the window which pops up. Click on Charts on the left and make sure 'Pie Charts' is selected, then click Continue and then OK.

STATA

In STATA you will need to use the syntax 'graph pie' to create a pie chart. If you have a dataset with continuous variable b and categorical variable a, then you can write:

`graph pie b, over(a) plabel(_all name, size(*1.5) color(white))`

Scatterplots

What are they?

Scatterplots, also known as scatter graphs or scatter charts, visualise numerical data: more specifically, they show the potential relationships (called 'correlations') between two quantitative variables. They contain an x- and y-axis, and each observation is displayed as a dot or point positioned according to its values of the two variables. Drawing a line of best fit through the data points can emphasise the strength of the correlation.

With one variable plotted along the x-axis and the other along the y-axis, these graphs are useful to show if a non-/linear relationship exists between the two by observing the pattern the points on the graph make. These patterns can show us if the relationships between variables display linearity or non-linearity, positivity or negativity (increasing or decreasing), and also the strength.

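The graph can be fitted with a line of best fit to emphasise this relationship. The more closely the points cluster around this line, the stronger the relationship; the steepness of the line depends on the units and scales of the axes, so it does not indicate strength. It is also very easy to spot anomalies in your data using a scatter graph.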

For more information on interpreting the line of best fit, have a look at the table of trend lines on the Line Graphs tab.
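The strength of a linear relationship is usually summarised by the correlation coefficient r, which measures how tightly the points cluster around a straight line rather than how steep that line is. The sketch below, in Python (not one of the packages covered in this guide), computes Pearson's r and the gradient of the line of best fit for two made-up datasets; the function names `pearson_r` and `slope` and all of the data are assumptions for illustration.

```python
from math import sqrt

def _sums(xs, ys):
    """Return the sums of squares and cross-products about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def pearson_r(xs, ys):
    """Pearson correlation coefficient: between -1 and 1, with 0 meaning no linear relationship."""
    sxx, syy, sxy = _sums(xs, ys)
    return sxy / sqrt(sxx * syy)

def slope(xs, ys):
    """Gradient of the least-squares line of best fit."""
    sxx, _, sxy = _sums(xs, ys)
    return sxy / sxx

# Two made-up datasets sharing the same x values.
x = [1, 2, 3, 4, 5]
tight_shallow = [1.0, 1.1, 1.2, 1.3, 1.4]   # points lie exactly on a shallow line
noisy_steep = [2, 1, 5, 3, 6]               # points scattered around a steeper line

for name, y in [("tight, shallow", tight_shallow), ("noisy, steep", noisy_steep)]:
    print(f"{name}: gradient = {slope(x, y):.2f}, r = {pearson_r(x, y):.2f}")
```

In this example the shallow line has the stronger correlation, because its points sit exactly on the line, whereas the steeper line has points scattered further from it.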

Creating Scatterplots Using Software

Excel

To create a scatterplot in Excel, highlight the relevant data to display and then go to the Insert tab at the top. In the Charts group, select Insert Scatter (X, Y) or Bubble Chart and choose the Scatter option. The graph will automatically appear.

MATLAB

To create a scatterplot in MATLAB with two continuous variables b1 and b2, use the syntax

`scatter(b1, b2)`

R

Creating such a graph in R can be done with the built-in function 'plot', which will automatically create a scatter plot when given two numeric variables. Suppose we have a dataset named dataset which contains two continuous variables b1 and b2. Then, all that is needed to create the scatter graph is writing:

`plot(dataset$b1, dataset$b2)`

Otherwise, if you have the 'ggplot2' package installed and loaded, you can instead write:

`ggplot(data = dataset, aes(x = b1, y = b2)) + `

`geom_point()`

SAS

To create a scatterplot of the continuous variables b1 and b2 which exist in the dataset 'dataset' in SAS you can write:

`proc sgplot data = dataset;`

`scatter x = b1 y = b2;`

`run;`

SPSS

To create a scatterplot in SPSS, go to Graphs and then Chart Builder. In the new open window choose Scatterplot in the Gallery pane and click and drag the relevant plot you wish to create into the main 'Chart Builder' box. Click and drag the relevant variable into the relevant x- and y-axis boxes and click OK.

STATA

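A scatterplot in STATA can be created for two continuous variables b1 and b2 with the 'scatter' command, which is shorthand for 'twoway (scatter b1 b2)'. The first variable listed is plotted on the y-axis and the second on the x-axis.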

`scatter b1 b2`

Alternatively, scatterplots can be created using the Graphics menu by going to Graphics and then Twoway graph (scatter, line, etc.) and Create. In the pop-up window, select Basic plots from the list on the left, and then select Scatter from the list on the right. Then, under the Plot type: (scatterplot) section select the relevant X and Y variables under the relevant drop-downs. Finally, click Accept and Submit.