# 4.1: Characterizing Measurements and Results

Let’s begin by choosing a simple quantitative problem requiring a single measurement—What is the mass of a penny? As you consider this question, you probably recognize that it is too broad. Are we interested in the mass of a United States penny or of a Canadian penny, or is the difference relevant? Because a penny’s composition and size may differ from country to country, let’s limit our problem to pennies from the United States.

There are other concerns we might consider. For example, the United States Mint currently produces pennies at two locations (Figure 4.1). Because it seems unlikely that a penny’s mass depends on where it is minted, we will ignore this concern. Another concern is whether the mass of a newly minted penny differs from the mass of a circulating penny. Because the answer this time is not obvious, let’s narrow our question to: What is the mass of a circulating United States penny?

Figure 4.1 An uncirculated 2005 Lincoln head penny. The “D” below the date indicates that this penny was produced at the United States Mint at Denver, Colorado. Pennies produced at the Philadelphia Mint do not have a letter below the date. Source: United States Mint image (www.usmint.gov).

A good way to begin our analysis is to examine some preliminary data. Table 4.1 shows masses for seven pennies from my change jar. In examining this data it is immediately apparent that our question does not have a simple answer. That is, we cannot use the mass of a single penny to draw a specific conclusion about the mass of any other penny (although we might conclude that all pennies weigh at least 3 g). We can, however, characterize this data by reporting the spread of individual measurements around a central value.

Table 4.1 Masses of Seven Circulating U. S. Pennies

| Penny | Mass (g) |
|:-----:|:--------:|
| 1 | 3.080 |
| 2 | 3.094 |
| 3 | 3.107 |
| 4 | 3.056 |
| 5 | 3.112 |
| 6 | 3.174 |
| 7 | 3.198 |

#### 4.1.1 Measures of Central Tendency

One way to characterize the data in Table 4.1 is to assume that the masses are randomly scattered around a central value that provides the best estimate of a penny’s expected, or “true” mass. There are two common ways to estimate central tendency: the mean and the median.

##### Mean

The mean, $\overline{X}$, is the numerical average for a data set. We calculate the mean by dividing the sum of the individual values by the size of the data set:

$\overline{X}=\frac{\sum_{i}^{ }X_i}{n}$

where $X_i$ is the $i$th measurement, and $n$ is the size of the data set.

Example 4.1

What is the mean for the data in Table 4.1?

Solution

To calculate the mean we add together the results for all measurements

$\mathrm{3.080 + 3.094 + 3.107 + 3.056 + 3.112 + 3.174 + 3.198 = 21.821\: g}$

and divide by the number of measurements

$\overline{X} = \mathrm{\dfrac{21.821\: g}{7}=3.117\:g}$
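
The same arithmetic can be checked in a few lines of Python. This is a minimal sketch: the list `masses` simply holds the values from Table 4.1, and `mean` is a helper written here for illustration (Python's standard `statistics.mean` gives the same result).

```python
# Masses (g) of the seven circulating pennies in Table 4.1
masses = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]


def mean(values):
    """Arithmetic mean: the sum of the values divided by n."""
    return sum(values) / len(values)


total = sum(masses)
x_bar = mean(masses)
print(f"sum = {total:.3f} g, mean = {x_bar:.3f} g")  # sum = 21.821 g, mean = 3.117 g
```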

The mean is the most common estimator of central tendency. It is not a robust estimator, however, because an extreme value (one much larger or much smaller than the remainder of the data) strongly influences the mean’s value.<sup>1</sup> For example, if we mistakenly record the third penny’s mass as 31.07 g instead of 3.107 g, the mean changes from 3.117 g to 7.112 g!
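
A quick numerical check of this sensitivity, assuming the seven masses from Table 4.1; `with_error` is a hypothetical copy of the data set that contains the transcription error, and `statistics.mean` is Python's standard-library mean.

```python
from statistics import mean

masses = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]

# Hypothetical copy of the data with the third mass mistyped as 31.07 g
with_error = masses.copy()
with_error[2] = 31.07

print(f"correct data: mean = {mean(masses):.3f} g")      # 3.117 g
print(f"with error:   mean = {mean(with_error):.3f} g")  # 7.112 g
```

A single mistyped value shifts the mean by roughly 4 g, far more than the spread of the genuine measurements.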

Note

An estimator is robust if its value is not affected too much by an unusually large or unusually small measurement.

##### Median

The median, $\widetilde{X}$, is the middle value when we order our data from the smallest to the largest value. When the data set includes an odd number of entries, the median is the middle value. For an even number of entries, the median is the average of the $(n/2)$th and the $(n/2 + 1)$th values, where $n$ is the size of the data set.

Note

When n = 5, the median is the third value in the ordered data set; for n = 6, the median is the average of the third and fourth members of the ordered data set.
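
The odd/even rule translates directly into code. The sketch below uses a hypothetical helper, `median`, written for illustration; because Python lists are zero-based, the $(n/2)$th and $(n/2 + 1)$th values sit at indices `n // 2 - 1` and `n // 2`. The standard library's `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the ordered data; for an even number of values,
    the average of the (n/2)th and (n/2 + 1)th values (1-based positions)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # n = 5: mid = 2, i.e. the third value
    # n = 6: indices 2 and 3 hold the third and fourth values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 4, 2, 3]))     # n = 5, third value: 3
print(median([6, 1, 5, 2, 4, 3]))  # n = 6, (3 + 4) / 2: 3.5
```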

Example 4.2

What is the median for the data in Table 4.1?

Solution

To determine the median we order the measurements from the smallest to the largest value

3.056   3.080   3.094   3.107   3.112   3.174   3.198

Because there are seven measurements, the median is the fourth value in the ordered data set; thus, the median is 3.107 g.

As shown by Examples 4.1 and 4.2, the mean and the median provide similar estimates of central tendency when all measurements are comparable in magnitude. The median, however, provides a more robust estimate of central tendency because it is less sensitive to measurements with extreme values. For example, introducing the transcription error discussed earlier for the mean changes the median’s value from 3.107 g to 3.112 g.
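
Python's `statistics.median` orders the data and applies the same odd/even rule, so it reproduces Example 4.2. In this sketch `with_error` is a hypothetical copy of the Table 4.1 data in which the third mass is mistyped as 31.07 g.

```python
from statistics import median

masses = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]
with_error = masses.copy()
with_error[2] = 31.07  # third penny mistakenly recorded as 31.07 g

print(f"median, correct data: {median(masses):.3f} g")      # 3.107 g
print(f"median, with error:   {median(with_error):.3f} g")  # 3.112 g
```

The error moves the median by only 0.005 g, because the extreme value merely becomes the largest entry in the ordered list.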

#### 4.1.2 Measures of Spread

If the mean or median provides an estimate of a penny’s expected mass, then the spread of individual measurements provides an estimate of the variation in mass among pennies or of the uncertainty in measuring mass with a balance. Although we often define spread relative to a specific measure of central tendency, its magnitude is independent of the central value. Adding the same constant to every measurement, or subtracting it from every measurement, changes the mean and the median but does not change the spread. (Problem 12 at the end of the chapter asks you to show that this is true.) There are three common measures of spread: the range, the standard deviation, and the variance.
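
A numerical check (not a proof, which is what Problem 12 asks for) that shifting every measurement by a constant moves the mean and median but leaves the spread unchanged. The shift of 1.000 g is an arbitrary, hypothetical value; `data_range` is a helper written here for illustration, and `statistics.stdev` is Python's sample standard deviation, which divides by $n - 1$.

```python
from statistics import mean, median, stdev

masses = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]
shift = 1.000  # hypothetical constant added to every measurement (g)
shifted = [m + shift for m in masses]


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


for label, data in (("original", masses), ("shifted", shifted)):
    print(
        f"{label:8s} mean = {mean(data):.3f} g, median = {median(data):.3f} g, "
        f"range = {data_range(data):.3f} g, s = {stdev(data):.3f} g"
    )
```

The mean and median each increase by exactly the shift, while the range and standard deviation agree to within floating-point rounding.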

##### Range

The range, w, is the difference between a data set’s largest and smallest values.

$w = X_\text{largest} - X_\text{smallest}$

The range provides information about the total variability in the data set, but does not provide any information about the distribution of individual values. The range for the data in Table 4.1 is

$w = \mathrm{3.198\: g - 3.056\: g = 0.142\: g}$
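
To illustrate that the range says nothing about how the individual values are distributed, the sketch below compares Table 4.1 with a hypothetical data set, invented purely for illustration, that shares the same smallest and largest values but has most of its values bunched near 3.06 g. The ranges agree while the medians do not.

```python
from statistics import median

table_4_1 = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]
# Hypothetical values (not measurements): same extremes, bunched near 3.06 g
hypothetical = [3.056, 3.058, 3.060, 3.061, 3.063, 3.065, 3.198]

for label, data in (("Table 4.1", table_4_1), ("hypothetical", hypothetical)):
    w = max(data) - min(data)  # range: largest value minus smallest value
    print(f"{label:12s} w = {w:.3f} g, median = {median(data):.3f} g")
```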

##### Standard Deviation

The standard deviation, s, describes the spread of a data set’s individual values about its mean, and is given as

$s=\sqrt{\frac{\sum_{i}^{ }(X_i-\overline{X})^2}{n-1}} \tag{4.1}$

where $X_i$ is one of $n$ individual values in the data set, and $\overline{X}$ is the data set’s mean value. Frequently, the relative standard deviation, $s_r$, is reported.

$s_r =\frac{s}{\overline{X}}$

The percent relative standard deviation, $\%s_r$, is $s_r \times 100$.

Example 4.3

What are the standard deviation, the relative standard deviation, and the percent relative standard deviation for the data in Table 4.1?

Solution

To calculate the standard deviation we first calculate the difference between each measurement and the mean value (3.117), square the resulting differences, and add them together to give the numerator of equation 4.1.

\begin{align} (3.080-3.117)^2 = (-0.037)^2 = 0.001369\\ (3.094-3.117)^2 = (-0.023)^2 = 0.000529\\ (3.107-3.117)^2 = (-0.010)^2 = 0.000100\\ (3.056-3.117)^2 = (-0.061)^2 = 0.003721\\ (3.112-3.117)^2 = (-0.005)^2 = 0.000025\\ (3.174-3.117)^2 = (+0.057)^2 = 0.003249\\ (3.198-3.117)^2 = (+0.081)^2 = \underline{0.006561}\\ 0.015554 \end{align}

For obvious reasons, the numerator of equation 4.1 is called a sum of squares. Next, we divide this sum of the squares by n – 1, where n is the number of measurements, and take the square root.

$s = \sqrt{\frac{0.015554}{7-1}}=\mathrm{0.051\:g}$

Finally, the relative standard deviation and percent relative standard deviation are

$\mathrm{\mathit{s}_r= \dfrac{0.051\: g}{3.117\: g} = 0.016 \hspace{20px} \%\mathit{s}_r = (0.016) × 100\% = 1.6\%}$
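
Equation 4.1 can be followed step by step in Python as a check on the hand calculation. This sketch uses the unrounded mean (about 3.11729 g), so its sum of squares (about 0.015553) differs from the hand value of 0.015554 in the last digit; after rounding, $s$, $s_r$, and $\%s_r$ agree with the values above. Python's `statistics.stdev`, which also divides by $n - 1$, is used only as a cross-check.

```python
import math
from statistics import stdev

masses = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]
n = len(masses)
x_bar = sum(masses) / n

# Numerator of equation 4.1: the sum of squared deviations from the mean
sum_of_squares = sum((x - x_bar) ** 2 for x in masses)

s = math.sqrt(sum_of_squares / (n - 1))  # standard deviation, equation 4.1
s_r = s / x_bar                          # relative standard deviation
pct_s_r = 100 * s_r                      # percent relative standard deviation

print(f"sum of squares = {sum_of_squares:.6f}")
print(f"s = {s:.3f} g, s_r = {s_r:.3f}, %s_r = {pct_s_r:.1f}%")
print(f"matches statistics.stdev: {math.isclose(s, stdev(masses))}")
```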

It is much easier to determine the standard deviation using a scientific calculator with built-in statistical functions.

Note

Many scientific calculators include two keys for calculating the standard deviation. One key calculates the standard deviation for a data set of n samples drawn from a larger collection of possible samples, which corresponds to equation 4.1. The other key calculates the standard deviation for all possible samples. The latter is known as the population’s standard deviation, which we will cover later in this chapter. Your calculator’s manual will help you determine which key is which.
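
Python's standard `statistics` module draws the same distinction as the two calculator keys: `statistics.stdev` divides the sum of squares by $n - 1$, as in equation 4.1, while `statistics.pstdev` divides by $n$ and gives the population standard deviation. A minimal comparison using the Table 4.1 masses:

```python
import math
from statistics import pstdev, stdev

masses = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]
n = len(masses)

s_sample = stdev(masses)       # sum of squares divided by n - 1 (equation 4.1)
s_population = pstdev(masses)  # sum of squares divided by n

print(f"sample: {s_sample:.3f} g, population: {s_population:.3f} g")
# The two results differ by a factor of sqrt(n / (n - 1))
print(math.isclose(s_sample / s_population, math.sqrt(n / (n - 1))))
```

For a data set as small as seven pennies the two keys give noticeably different answers (0.051 g versus 0.047 g), so choosing the right one matters.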

##### Variance

Another common measure of spread is the square of the standard deviation, or the variance. We usually report a data set’s standard deviation, rather than its variance, because the mean value and the standard deviation have the same unit. As we will see shortly, the variance is a useful measure of spread because the variances of independent sources of uncertainty are additive.

Example 4.4

What is the variance for the data in Table 4.1?

Solution

The variance is the square of the absolute standard deviation. Using the standard deviation from Example 4.3 gives the variance as

$s^2 = \mathrm{(0.051\:g)^2 = 0.0026\:g^2}$
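
As a check, Python's `statistics.variance` returns the sample variance directly (the sum of squares divided by $n - 1$). Because it works from unrounded values it gives about 0.00259 $\mathrm{g^2}$ rather than the 0.002601 obtained by squaring the rounded 0.051 g; both round to 0.0026 $\mathrm{g^2}$.

```python
from statistics import stdev, variance

masses = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]

s2 = variance(masses)  # sample variance: sum of squares / (n - 1)
print(f"s^2 = {s2:.4f} g^2")  # s^2 = 0.0026 g^2
print(f"s^2 = {s2:.6f} g^2 (unrounded)")
```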

Practice Exercise 4.1

The following data were collected as part of a quality control study for the analysis of sodium in serum; results are concentrations of Na$^+$ in mmol/L.

140   143   141   137   132   157   143   149   118   145

Report the mean, the median, the range, the standard deviation, and the variance for this data. This data is a portion of a larger data set from Andrews, D. F.; Herzberg, A. M. Data: A Collection of Problems from Many Fields for the Student and Research Worker; Springer-Verlag: New York, 1985, pp. 151–155.