7.1: The Importance of Sampling

When a manufacturer lists a chemical as ACS Reagent Grade, they must demonstrate that it conforms to specifications set by the American Chemical Society (ACS). For example, the ACS specifications for NaBr require that the concentration of iron be ≤5 ppm. To verify that a production lot meets this standard, the manufacturer collects and analyzes several samples, reporting the average result on the product’s label (Figure 7.1).

Figure 7.1 Certificate of analysis for a production lot of NaBr. The result for iron meets the ACS specifications, but the result for potassium does not.

If the individual samples do not accurately represent the population from which they are drawn—what we call the target population—then even a careful analysis must yield an inaccurate result. Extrapolating this result from a sample to its target population introduces a determinate sampling error. To minimize this determinate sampling error, we must collect the right sample.

Even if we collect the right sample, indeterminate sampling errors may limit the usefulness of our analysis. Equation 7.1 shows that the width of a confidence interval about the mean, $\bar X$, is proportional to the standard deviation, $s$, of the analysis

$\mu=\bar X\pm \dfrac{ts}{\sqrt n}\tag{7.1}$

where n is the number of samples and t is a statistical factor that accounts for the probability that the confidence interval contains the true value, μ.

Note

Equation 7.1 should be familiar to you. See Chapter 4 to review confidence intervals and see Appendix 4 for values of t.
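As a minimal sketch of equation 7.1, the following Python function computes the mean and the half-width of the confidence interval for a set of replicate results. The function name and the five replicate values are hypothetical and serve only as an illustration; the value t = 2.776 is the tabulated two-tailed value for a 95% confidence level with four degrees of freedom.

```python
import math
import statistics

def confidence_interval(values, t):
    """Return (mean, half_width) for equation 7.1: mean ± t*s/sqrt(n).

    `t` must be looked up for n - 1 degrees of freedom at the chosen
    confidence level; it is not computed here.
    """
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    half_width = t * s / math.sqrt(n)
    return mean, half_width

# Hypothetical replicate results in ppm, for illustration only
results = [12.1, 12.9, 12.4, 13.0, 12.6]
mean, half_width = confidence_interval(results, t=2.776)
print(f"mean = {mean:.2f} ppm, 95% CI = ±{half_width:.2f} ppm")
```

Because the half-width scales directly with s, anything that inflates the standard deviation, including sampling, widens the interval by the same factor.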

Each step of an analysis contributes random error that affects the overall standard deviation. For convenience, let’s divide an analysis into two steps, collecting the samples and analyzing the samples, each characterized by a standard deviation. Using a propagation of uncertainty, the relationship between the overall variance, $s^2$, and the variances due to sampling, $s_\textrm{samp}^2$, and the analytical method, $s_\textrm{meth}^2$, is

$s^2=s_\textrm{samp}^2+s_\textrm{meth}^2\tag{7.2}$

Note

For a review of the propagation of uncertainty, see Chapter 4C and Appendix 2.

Although equation 7.1 is written in terms of a standard deviation, $s$, a propagation of uncertainty is written in terms of variances, $s^2$. In this section, and those that follow, we will use both standard deviations and variances to discuss sampling uncertainty.

Equation 7.2 shows that the overall variance for an analysis may be limited by either the analytical method or the collection of samples. Unfortunately, analysts often try to minimize the overall variance by improving only the method’s precision. This is a futile effort, however, if the standard deviation for sampling is more than three times that for the method [1]. Figure 7.2 shows how the ratio $s_\textrm{samp}/s_\textrm{meth}$ affects the method’s contribution to the overall variance. As shown by the dashed line, if the sampling standard deviation is 3× the method’s standard deviation, then indeterminate method errors explain only 10% of the overall variance. If indeterminate sampling errors are significant, decreasing $s_\textrm{meth}$ provides only a nominal change in the overall precision.

Figure 7.2 The blue curve shows the method’s contribution to the overall variance, $s^2$, as a function of the relative magnitude of the standard deviation in sampling, $s_\textrm{samp}$, and the method’s standard deviation, $s_\textrm{meth}$. The dashed red line shows that the method accounts for only 10% of the overall variance when $s_\textrm{samp} = 3 \times s_\textrm{meth}$. Understanding the relative importance of potential sources of indeterminate error is important when considering how to improve the overall precision of the analysis.
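The relationship plotted in Figure 7.2 follows directly from equation 7.2: dividing the method’s variance by the overall variance gives a fraction of 1/(1 + r²), where r is the ratio of the sampling standard deviation to the method’s standard deviation. The short Python sketch below tabulates this fraction for a few ratios; the function name and the chosen ratios are illustrative assumptions.

```python
def method_fraction(ratio):
    """Fraction of the overall variance due to the method.

    `ratio` is s_samp / s_meth. From equation 7.2:
    s_meth^2 / (s_samp^2 + s_meth^2) = 1 / (1 + ratio^2).
    """
    return 1.0 / (1.0 + ratio ** 2)

# Illustrative ratios; a ratio of 3 corresponds to the dashed line in the figure
for ratio in (0.5, 1, 2, 3, 5):
    share = method_fraction(ratio)
    print(f"s_samp/s_meth = {ratio}: method accounts for {share:.1%} of s^2")
```

At a ratio of 3 the function returns 0.10, which is the 10% contribution marked by the dashed line.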

Example 7.1

A quantitative analysis gives a mean concentration of 12.6 ppm for an analyte. The method’s standard deviation is 1.1 ppm and the standard deviation for sampling is 2.1 ppm. (a) What is the overall variance for the analysis? (b) By how much does the overall variance change if we improve $s_\textrm{meth}$ by 10% to 0.99 ppm? (c) By how much does the overall variance change if we improve $s_\textrm{samp}$ by 10% to 1.89 ppm?

Solution

(a) The overall variance is

$s^2=s_\textrm{samp}^2+s_\textrm{meth}^2=\mathrm{(2.1\;ppm)^2+(1.1\;ppm)^2=5.6\;ppm^2}$

(b) Improving the method’s standard deviation changes the overall variance to

$s^2=\mathrm{(2.1\;ppm)^2+(0.99\;ppm)^2=5.4\;ppm^2}$

Improving the method’s standard deviation by 10% improves the overall variance by approximately 4%.

(c) Changing the standard deviation for sampling

$s^2=\mathrm{(1.89\;ppm)^2+(1.1\;ppm)^2=4.8\;ppm^2}$

improves the overall variance by almost 15%. As expected, because $s_\textrm{samp}$ is larger than $s_\textrm{meth}$, we obtain a bigger improvement in the overall variance when we focus our attention on sampling problems.
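The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the calculation using the values given in the problem statement; the function and variable names are hypothetical.

```python
def overall_variance(s_samp, s_meth):
    """Equation 7.2: overall variance from the two standard deviations."""
    return s_samp ** 2 + s_meth ** 2

def percent_decrease(new, old):
    """Relative decrease of `new` with respect to `old`, in percent."""
    return 100.0 * (old - new) / old

base = overall_variance(2.1, 1.1)              # part (a)
better_method = overall_variance(2.1, 0.99)    # part (b): s_meth improved by 10%
better_sampling = overall_variance(1.89, 1.1)  # part (c): s_samp improved by 10%

print(f"(a) s^2 = {base:.2f} ppm^2")
print(f"(b) s^2 = {better_method:.2f} ppm^2, "
      f"decrease = {percent_decrease(better_method, base):.1f}%")
print(f"(c) s^2 = {better_sampling:.2f} ppm^2, "
      f"decrease = {percent_decrease(better_sampling, base):.1f}%")
```

The same 10% improvement in a standard deviation removes a larger share of the overall variance when it is applied to the larger of the two terms.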

Practice Exercise 7.1

Suppose you wish to reduce the overall variance in Example 7.1 to 5.0 ppm$^2$. If you focus on the method, by what percentage do you need to reduce $s_\textrm{meth}$? If you focus on the sampling, by what percentage do you need to reduce $s_\textrm{samp}$?

To determine which step has the greatest effect on the overall variance, we need to measure both $s_\textrm{samp}$ and $s_\textrm{meth}$. The analysis of replicate samples provides an estimate of the overall variance. To determine the method’s variance we analyze samples under conditions where we may assume that the sampling variance is negligible. The sampling variance is then determined by difference, subtracting the method’s variance from the overall variance.

Note

There are several ways to minimize the standard deviation for sampling. Here are two examples. One approach is to use a standard reference material (SRM) that has been carefully prepared to minimize indeterminate sampling errors. When the sample is homogeneous—as is the case, for example, with aqueous samples—a useful approach is to conduct replicate analyses on a single sample.

Example 7.2

The following data were collected as part of a study to determine the effect of sampling variance on the analysis of drug-animal feed formulations [2].

% Drug (w/w)                                 % Drug (w/w)
0.0114      0.0099      0.0105           0.0105      0.0109      0.0107
0.0102      0.0106      0.0087           0.0103      0.0103      0.0104
0.0100      0.0095      0.0098           0.0101      0.0101      0.0103
0.0105      0.0095      0.0097

The data on the left were obtained under conditions where both $s_\textrm{samp}$ and $s_\textrm{meth}$ contribute to the overall variance. The data on the right were obtained under conditions where $s_\textrm{samp}$ is known to be insignificant. Determine the overall variance, and the standard deviations due to sampling and the analytical method. To which factor, sampling or the method, should you turn your attention if you want to improve the precision of the analysis?

Solution

Using the data on the left, the overall variance, $s^2$, is $4.71 \times 10^{-7}$. (See Chapter 4 for a review of how to calculate the variance.) To find the method’s contribution to the overall variance, $s_\textrm{meth}^2$, we use the data on the right, obtaining a value of $7.00 \times 10^{-8}$. The variance due to sampling, $s_\textrm{samp}^2$, is

$s_\textrm{samp}^2=s^2-s_\textrm{meth}^2=4.71\times10^{-7}-7.00\times10^{-8}=4.01\times10^{-7}$

Converting variances to standard deviations gives $s_\textrm{samp}$ as $6.33 \times 10^{-4}$ and $s_\textrm{meth}$ as $2.65 \times 10^{-4}$. Because $s_\textrm{samp}$ is more than twice as large as $s_\textrm{meth}$, improving the precision of the sampling process has the greatest impact on the overall precision.
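The calculation in this example can be reproduced from the tabulated data with Python’s standard `statistics` module. The sketch below is one way to do so; the variable names are hypothetical, and the values are the twelve results on the left and the nine results on the right of the table.

```python
import math
import statistics

# Left-hand data: both sampling and method contribute to the variance
both = [0.0114, 0.0099, 0.0105,
        0.0102, 0.0106, 0.0087,
        0.0100, 0.0095, 0.0098,
        0.0105, 0.0095, 0.0097]

# Right-hand data: sampling variance is known to be insignificant
method_only = [0.0105, 0.0109, 0.0107,
               0.0103, 0.0103, 0.0104,
               0.0101, 0.0101, 0.0103]

s2_overall = statistics.variance(both)      # sample variance, n - 1 denominator
s2_meth = statistics.variance(method_only)
s2_samp = s2_overall - s2_meth              # sampling variance by difference

s_samp = math.sqrt(s2_samp)
s_meth = math.sqrt(s2_meth)

print(f"overall variance  = {s2_overall:.3g}")
print(f"method variance   = {s2_meth:.3g}")
print(f"sampling variance = {s2_samp:.3g}")
print(f"s_samp = {s_samp:.3g}, s_meth = {s_meth:.3g}, ratio = {s_samp / s_meth:.2f}")
```

Note that the subtraction gives a meaningful sampling variance only when the overall variance exceeds the method’s variance, as it does here.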

Practice Exercise 7.2

A polymer’s density provides a measure of its crystallinity. The standard deviation for the determination of density using a single sample of a polymer is $1.96 \times 10^{-3}$ g/cm$^3$. The standard deviation when using different samples of the polymer is $3.65 \times 10^{-2}$ g/cm$^3$. Determine the standard deviations due to sampling and the analytical method.