3.8: Using Excel and R to Analyze Data

Although the calculations in this chapter are relatively straightforward, it can be tedious to work problems using nothing more than a calculator. Both Excel and R include functions for many common statistical calculations. In addition, R provides useful functions for visualizing your data.

Excel

Excel has built-in functions that we can use to complete many of the statistical calculations covered in this chapter, including reporting descriptive statistics, such as means and variances, predicting the probability of obtaining a given outcome from a binomial distribution or a normal distribution, and carrying out significance tests. Table 4.8.1 provides the syntax for many of these functions; you can find information on functions not included here by using Excel’s Help menu.

Table 4.8.1 : Excel Functions for Statistics Calculations
Parameter	Excel Function
Descriptive Statistics
mean	= average(data)
median	= median(data)
standard deviation for sample	= stdev.s(data)
standard deviation for population	= stdev.p(data)
variance for sample	= var.s(data)
variance for population	= var.p(data)
maximum value	= max(data)
minimum value	= min(data)

Probability Distributions
binomial distribution	= binom.dist(X, N, p, TRUE or FALSE)
normal distribution	= norm.dist(x, \(\mu\), \(\sigma\), TRUE or FALSE)

Significance Tests
F-test	= f.test(data set 1, data set 2)
t-test	= t.test(data set 1, data set 2, tails = 1 or 2, type of t-test: 1 = paired; 2 = unpaired with equal variances; or 3 = unpaired with unequal variances)

Descriptive Statistics

Let’s use Excel to provide a statistical summary of the data in Table 4.1.1. Enter the data into a spreadsheet, as shown in Figure 4.8.1 . To calculate the sample’s mean, for example, click on any empty cell, enter the formula

= average(b2:b8)

and press Return or Enter to replace the cell’s content with Excel’s calculation of the mean (3.117285714), which we round to 3.117. Excel does not have a function for the range, but we can use the functions that report the maximum value and the minimum value to calculate the range; thus

= max(b2:b8) - min(b2:b8)

returns 0.142 as an answer.

In order of B2 to B8, the data in the table is 3.080, 3.094, 3.107, 3.056, 3.112, 3.174, and 3.198. — Figure 4.8.1 : Portion of a spreadsheet containing data from Table 4.1.1.
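As a cross-check on the spreadsheet results, the same summary can be sketched in Python using only its standard library. This is not part of the Excel workflow; the variable names are our own, and the data are the seven values listed for cells B2 to B8 in Figure 4.8.1.

```python
import statistics

# Masses from cells B2:B8 of Figure 4.8.1
masses = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]

mean_mass = statistics.mean(masses)      # Excel: = average(b2:b8)
range_mass = max(masses) - min(masses)   # Excel: = max(b2:b8) - min(b2:b8)
sample_sd = statistics.stdev(masses)     # Excel: = stdev.s(b2:b8)

print(f"mean  = {mean_mass:.3f}")   # 3.117
print(f"range = {range_mass:.3f}")  # 0.142
print(f"s     = {sample_sd:.4f}")
```

Note that statistics.stdev divides by n - 1, as does stdev.s; the population form, stdev.p, corresponds to statistics.pstdev.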

Probability Distributions

In Example 4.4.2 we showed that 91.10% of a manufacturer’s analgesic tablets contained between 243 and 262 mg of aspirin. We arrived at this result by calculating the deviation, z, of each limit from the population’s expected mean, \(\mu\), of 250 mg in terms of the population’s expected standard deviation, \(\sigma\), of 5 mg. After we calculated values for z, we used the table in Appendix 3 to find the area under the normal distribution curve between these two limits.

We can complete this calculation in Excel using the norm.dist function. As shown in Figure 4.8.2 , the function calculates the probability of obtaining a result less than x from a normal distribution with a mean of \(\mu\) and a standard deviation of \(\sigma\). To solve Example 4.4.2 using Excel, enter the following formulas into separate cells

= norm.dist(243, 250, 5, TRUE)

= norm.dist(262, 250, 5, TRUE)

obtaining results of 0.080756659 and 0.991802464. Subtracting the smaller value from the larger value and adjusting to the correct number of significant figures gives the probability as 0.9110, or 91.10%.

Figure 4.8.2 : Shown in blue is the area returned by the function norm.dist(x, \(\mu\), \(\sigma\), TRUE). The last parameter, TRUE, returns the cumulative distribution from \(- \infty \text{ to } x\); entering FALSE instead returns the value of the probability density function at x, which is not itself a probability. For our purposes, we want to use TRUE.
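The subtraction of the two cumulative probabilities can be verified outside of Excel. The following is a minimal sketch in Python, assuming Python 3.8 or later for statistics.NormalDist; the variable names are illustrative only.

```python
from statistics import NormalDist

# Population of tablets: mean of 250 mg aspirin, standard deviation of 5 mg
aspirin = NormalDist(mu=250, sigma=5)

p_low = aspirin.cdf(243)    # Excel: = norm.dist(243, 250, 5, TRUE)
p_high = aspirin.cdf(262)   # Excel: = norm.dist(262, 250, 5, TRUE)

# Area under the curve between the two limits
p_between = p_high - p_low

print(f"{p_low:.9f}")      # 0.080756659
print(f"{p_high:.9f}")     # 0.991802464
print(f"{p_between:.4f}")  # 0.9110
```

The difference, 0.9110, matches the 91.10% reported in Example 4.4.2.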

Excel also includes a function for working with binomial distributions. The function’s syntax is

= binom.dist(X, N, p, TRUE or FALSE)

where X is the number of times a particular outcome occurs in N trials, and p is the probability that the outcome occurs in a single trial. Setting the function’s last term to TRUE gives the total probability for any result up to and including X, and setting it to FALSE gives the probability for exactly X. Using Example 4.4.1 to test this function, we use the formula

= binom.dist(0, 27, 0.0111, FALSE)

to find the probability of finding no atoms of ¹³C in a molecule of cholesterol, C₂₇H₄₆O, which returns a value of 0.740 after adjusting for significant figures. Using the formula

= binom.dist(2, 27, 0.0111, TRUE)

we find that 99.7% of cholesterol molecules contain two or fewer atoms of ¹³C.
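To see what the TRUE and FALSE options compute, we can write out the binomial distribution directly. The sketch below uses Python's standard library; the helper names binom_pmf and binom_cdf are our own and are not Excel or library functions.

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x occurrences in n trials (FALSE in Excel)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """Probability of x or fewer occurrences in n trials (TRUE in Excel)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# 27 carbon atoms, each with a probability of 0.0111 of being carbon-13
p_none = binom_pmf(0, 27, 0.0111)          # = binom.dist(0, 27, 0.0111, FALSE)
p_two_or_fewer = binom_cdf(2, 27, 0.0111)  # = binom.dist(2, 27, 0.0111, TRUE)

print(f"{p_none:.3f}")          # 0.740
print(f"{p_two_or_fewer:.3f}")  # 0.997
```

The cumulative result is simply the sum of the individual probabilities for 0, 1, and 2 atoms of ¹³C.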

Significance Tests

As shown in Table 4.8.1 , Excel includes functions for the following significance tests covered in this chapter:

an F-test of variances
an unpaired t-test of sample means assuming equal variances
an unpaired t-test of sample means assuming unequal variances
a paired t-test of sample means

Let’s use these functions to complete a t-test on the data in Table 4.4.1, which contains results for two experiments to determine the mass of a circulating U.S. penny. Enter the data from Table 4.4.1 into a spreadsheet as shown in Figure 4.8.3 .

In order of C2 to C6, the data reads 3.052, 3.141, 3.083, 3.083, and 3.048. — Figure 4.8.3 : Portion of a spreadsheet containing the data in Table 4.4.1.

Because the data in this case are unpaired, we will use Excel to complete an unpaired t-test. Before we can complete the t-test, we use an F-test to determine whether the variances for the two data sets are equal or unequal.

To complete the F-test, we click on any empty cell, enter the formula

= f.test(b2:b8, c2:c6)

and press Return or Enter, which replaces the cell’s content with the value of \(\alpha\) for which we can reject the null hypothesis of equal variances. In this case, Excel returns an \(\alpha\) of 0.56610503; because this value is not less than 0.05, we retain the null hypothesis that the variances are equal. Excel’s F-test is two-tailed; for a one-tailed F-test, we use the same function, but divide the result by two; thus

= f.test(b2:b8, c2:c6)/2

Having found no evidence to suggest unequal variances, we next complete an unpaired t-test assuming equal variances, entering into any empty cell the formula

= t.test(b2:b8, c2:c6, 2, 2)

where the first 2 indicates that this is a two-tailed t-test, and the second 2 indicates that this is an unpaired t-test with equal variances. Pressing Return or Enter replaces the cell’s content with the value of \(\alpha\) for which we can reject the null hypothesis of equal means. In this case, Excel returns an \(\alpha\) of 0.211627646; because this value is not less than 0.05, we retain the null hypothesis that the means are equal.

See Example 4.6.3 and Example 4.6.4 for our earlier solutions to this problem.
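Excel's f.test and t.test report probabilities rather than the experimental test statistics. As a rough sketch, the statistics themselves can be computed in Python's standard library using the usual ratio of sample variances and the pooled standard deviation. Two assumptions are made here: the first data set (cells B2 to B8) is taken to be the seven values shown in Figure 4.8.1, and the names f_exp, s_pool, and t_exp are our own labels.

```python
import math
import statistics

# Assumption: B2:B8 holds the same seven values shown in Figure 4.8.1
set_1 = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]
# C2:C6 as listed for Figure 4.8.3
set_2 = [3.052, 3.141, 3.083, 3.083, 3.048]

var_1 = statistics.variance(set_1)  # sample variance, as in var.s
var_2 = statistics.variance(set_2)

# F statistic: larger sample variance divided by smaller sample variance
f_exp = max(var_1, var_2) / min(var_1, var_2)

# Unpaired t statistic assuming equal variances (pooled standard deviation)
n_1, n_2 = len(set_1), len(set_2)
s_pool = math.sqrt(((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2))
mean_diff = abs(statistics.mean(set_1) - statistics.mean(set_2))
t_exp = mean_diff / (s_pool * math.sqrt(1 / n_1 + 1 / n_2))

print(f"F = {f_exp:.2f}")
print(f"t = {t_exp:.2f}")
```

Converting these statistics into the probabilities that Excel returns requires the F and t distributions, which Python's standard library does not provide; a statistics package is needed for that step.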

The other significance tests in Excel work in the same format. The following practice exercise provides you with an opportunity to test yourself.

Exercise 4.8.1

Rework Example 4.6.5 and Example 4.6.6 using Excel.

Answer: You will find small differences between the values you obtain using Excel’s built-in functions and the worked solutions in the chapter. These differences arise because Excel does not round off the results of intermediate calculations.

R

R is a programming environment that provides powerful capabilities for analyzing data. There are many functions built into R’s standard installation, and additional packages of functions are available from the R web site (www.r-project.org). Commands in R are not available from pull-down menus. Instead, you interact with R by typing in commands.

You can download the current version of R from www.r-project.org. Click on the link for Download: CRAN and find a local mirror site. Click on the link for the mirror site and then use the link for Linux, MacOS X, or Windows under the heading “Download and Install R.”