# Statistical Treatment of Data

Many times during the Chemistry 105 laboratory you will be asked to report an average, a relative deviation, and a standard deviation. You may also have to analyze multiple trials to decide whether a certain piece of data should be discarded. This section describes these procedures.

### Average and Standard Deviation

The **average** or **mean** of the data set, \(\bar{x}\), is defined by:

\(\bar{x} = \dfrac{\sum_{i=1}^N x_i}{N}\)

where \(x_i\) is the result of the \(i\)th measurement, \(i = 1, \ldots, N\). The standard deviation, σ, measures how closely values are clustered about the mean. The standard deviation for small samples is defined by:

\( \sigma = \sqrt{\dfrac{\sum_{i=1}^N (x_i-\bar{x})^2}{N-1}} \)

The smaller the value of σ, the more closely packed the data are about the mean, and we say that the measurements are **precise**. In contrast, the measurements have high **accuracy** if the mean is close to the true value (presuming we know that information). It is easy to tell if your measurements are precise, but it is often difficult to tell if they are accurate.
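As a rough sketch of the mean and standard deviation defined above, the following Python snippet computes both for one set of trials. The function names, the `ddof` parameter, and the sample readings are illustrative assumptions, not part of the course material. The divisor is left as a parameter because both N and N − 1 appear in practice; N − 1 is the usual choice for a small sample.

```python
import math

def mean(values):
    """Arithmetic mean: sum of the measurements divided by their count N."""
    return sum(values) / len(values)

def std_dev(values, ddof=1):
    """Standard deviation about the mean.

    ddof is a hypothetical parameter name (borrowed from NumPy's convention):
    the sum of squared deviations is divided by N - ddof, so ddof=1 gives
    the N - 1 divisor and ddof=0 gives the N divisor.
    """
    n = len(values)
    x_bar = mean(values)
    sum_sq = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(sum_sq / (n - ddof))

# Made-up triplicate readings (e.g. titration volumes in mL), for illustration only.
trials = [10.0, 10.2, 10.4]
print(f"mean            = {mean(trials):.3f}")
print(f"std dev (N - 1) = {std_dev(trials):.3f}")
print(f"std dev (N)     = {std_dev(trials, ddof=0):.3f}")
```

With only three trials the two divisors give noticeably different results, which is why it matters to state which one you used.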

### Relative Deviation

The relative average deviation, d, like the standard deviation, is useful to determine how data are clustered about a mean. The advantage of a relative deviation is that it is scaled by the magnitude of the average, so the spread of data sets with different means can be compared directly. The relative average deviation, d, is calculated in the following way.

- Calculate the average, \(\bar{x}\), with all data that are of high quality.
- Calculate the deviation, \(d = |x_i - \bar{x}|\), of each datum.
- Calculate the average of these deviations.
- Divide that average of the deviations by the mean of the data. This number is generally expressed as parts per thousand (ppt). You can do this by simply multiplying by 1000.
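
The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the list passed in already contains only the high-quality data; the function name and the sample readings are made up for the example.

```python
def relative_average_deviation_ppt(values):
    """Relative average deviation in parts per thousand (ppt)."""
    # Step 1: average of the data.
    x_bar = sum(values) / len(values)
    # Step 2: absolute deviation of each datum from the average.
    deviations = [abs(x - x_bar) for x in values]
    # Step 3: average of those deviations.
    avg_dev = sum(deviations) / len(deviations)
    # Step 4: divide by the mean and multiply by 1000 to express as ppt.
    return avg_dev / x_bar * 1000

# Made-up triplicate readings, for illustration only.
trials = [10.0, 10.2, 10.4]
rad = relative_average_deviation_ppt(trials)
print(f"relative average deviation = {rad:.1f} ppt")
```

Because the result is divided by the mean, it has no units, so the same function works whether the readings are volumes, masses, or concentrations.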

Report the relative average deviation (ppt) in addition to the standard deviation in all experiments.

### Analysis of Poor Data

Sometimes a single piece of data is inconsistent with other data. You need a method to determine, or test, if the datum in question is so poor that it should be excluded from your calculations. Many tests have been developed for this purpose. One of the most common is what is known as the Q test (section 4-3). While it is very popular, it is not particularly useful for the small samples you will have (you will generally only do triplicate trials). Instead you will use what is commonly known as the 4d test. To use this test you need to follow the procedure outlined below.

- Calculate the average, \(\bar{x}\).
- Calculate the deviation, d, of each datum.
- Calculate the average of these deviations.
- Calculate the deviation of the "suspect" datum from the mean you calculated above, \(d_s\). If this deviation is greater than 4 times the average deviation, then you should discard this datum. (If \(d_s > 4d\), then discard.)
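
The 4d procedure can be sketched as follows. One assumption is made loudly here: the mean and average deviation are computed from the remaining data, leaving out the suspect value, which is how the 4d rule is commonly stated. The text above does not spell this out, so check with your instructor. The function name, its arguments, and the sample readings are hypothetical.

```python
def four_d_test(values, suspect_index):
    """Return True if the suspect datum should be discarded under the 4d rule.

    ASSUMPTION: the mean and average deviation are taken over the other
    data only, excluding the suspect value.
    """
    others = [x for i, x in enumerate(values) if i != suspect_index]
    # Average of the remaining data.
    x_bar = sum(others) / len(others)
    # Average absolute deviation of the remaining data.
    avg_dev = sum(abs(x - x_bar) for x in others) / len(others)
    # Deviation of the suspect datum from that average.
    d_s = abs(values[suspect_index] - x_bar)
    return d_s > 4 * avg_dev

# Made-up readings, for illustration only; the last value is the suspect.
with_outlier = [10.10, 10.12, 10.14, 10.60]
without_outlier = [10.10, 10.12, 10.14, 10.16]
print(four_d_test(with_outlier, 3))
print(four_d_test(without_outlier, 3))
```

The reason for excluding the suspect value in this sketch is that including it inflates the average deviation. With three trials, for example, a suspect included in its own average could never exceed four times the average deviation.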

Keep in mind that you also always have the right to discard a piece of data that you are sure is of low quality, that is, when you know the measurement was collected poorly. However, beware of discarding data that the 4d test does not reject. You may be discarding your most accurate determination!

### Other Important Concepts and Procedures

Associated topics you should be familiar with from your Chem 105 class:

- Normal error curve: a histogram of an infinitely large number of good measurements usually follows a Gaussian distribution
- Confidence limit (95%)
- Linear least squares fit
- Residual sum of squares
- Correlation coefficient