# 14.4: Using Excel and R for an Analysis of Variance

Although the calculations for an analysis of variance are relatively straight-forward, they become tedious when working with large data sets. Both Excel and R include functions for completing an analysis of variance. In addition, R provides a function for identifying the source(s) of significant differences within the data set.

## Excel

Excel’s Analysis ToolPak includes a tool to help you complete an analysis of variance. Let’s use the ToolPak to complete an analysis of variance on the data in Table 14.3.1. Enter the data from Table 14.3.1 into a spreadsheet as shown in Figure $$\PageIndex{1}$$.

 A B C D E 1 replicate analyst A analyst B analyst C analyst D 2 1 94.09 99.55 95.14 93.88 3 2 94.64 98.24 94.62 94.23 4 3 95.08 101.1 95.28 96.05 5 4 94.54 100.4 94.59 93.89 6 5 95.38 100.1 94.24 94.59 7 6 93.62 95.49

Figure $$\PageIndex{1}$$. Portion of a spreadsheet containing the data from Table 14.3.1.

To complete the analysis of variance select Data Analysis... from the Tools menu, which opens a window entitled “Data Analysis.” Scroll through the window, select Analysis: Single Factor from the available options and click OK. Place the cursor in the box for the “Input range” and then click and drag over the cells B1:E7. Select the radio button for “Grouped by: columns” and check the box for “Labels in the first row.” In the box for “Alpha” enter 0.05 for $$\alpha$$. Select the radio button for “Output range,” place the cursor in the box and click on an empty cell; this is where Excel will place the results. Clicking OK generates the information shown in Figure $$\PageIndex{2}$$. The small value of $$3.05 \times 10^{-9}$$ for falsely rejecting the null hypothesis indicates that there is a significant source of variation between the analysts.

## R

To complete an analysis of variance for the data in Table 14.3.1 using R, we first need to create several objects. The first object contains each result from Table 14.3.1.

> results = c(94.090, 94.640, 95.008, 94.540, 95.380, 93.620, 99.550, 98.240, 101.100, 100.400, 100.100, 95.140, 94.620, 95.280, 94.590, 94.240, 93.880, 94.230, 96.050, 93.890, 94.950, 95.490)

The second object contains labels that identify the source of each entry in the first object. The following code creates this object.

> analyst = c(rep(“a”,6), rep(“b”,5), rep(“c”,5), rep(“d”,6))

Next, we combine the two objects into a table with two columns, one that contains the data (results) and one that contains the labels (analyst).

> df= data.frame(results, labels= factor(analyst))

The command factor indicates that the object analyst contains the categorical factors for the analysis of variance. The command for an analysis of variance takes the following form

anova(lm(data ~ factors), data = data.frame)

where data and factors are the columns that contain the data and the categorical factors, and data.frame is the name we assigned to the data table. Figure $$\PageIndex{3}$$ shows the resulting output. The small value of $$3.05 \times 10^{-9}$$ for falsely rejecting the null hypothesis indicates that there is a significant source of variation between the analysts.

Having found a significant difference between the analysts, we want to identify the source of this difference. R does not include Fisher’s least significant difference test, but it does include a function for a related method called Tukey’s honest significant difference test. The command for this test takes the following form

> TukeyHSD(aov(lm(data ~ factors), data = data.frame), conf. level = 0.5)

where data and factors are the columns that contain the data and the categorical factors, and data.frame is the name we assigned to the data table. Figure $$\PageIndex{4}$$ shows the output of this command and its interpretation. The small probability values when comparing analyst B to each of the other analysts indicates that this is the source of the significant difference identified in the analysis of variance.