Normal Distribution and Statistics
 Page ID
 279680
Classroom Exercise #5: Normal Distribution and Statistics
There is an Excel spreadsheet and instructions at the end of this document. You will need to use your computer to do the statistics to complete the questions.
 Is my red blood cell count high today?
The excel sheet gives normal readings of red blood cell counts (all are *10^{6 }cells)
Calculate the mean and standard deviation.
Today my red blood count is 5.6*10^{6}? How certain (statistically) are you that value is abnormally high? Use the normal distribution function to figure it out.
 Pre and Post test.
The second Excel sheet has test scores for some students before and after they took a course. Compute the mean, standard deviation, and 95% confidence interval for each group. Do the confidence intervals overlap?
Compute the ttest if the results are not paired. Is the result statistically different at the 95% confidence level?
Compute the ttest if the results were paired (i.e. each row represents the same students). What is the statistical advantage of performing the paired ttest?
 Absorbance dependence on two different ions.
 The excel spreadsheet gives UVVis absorbance data for absorbance of mixtures of NO_{3} and S_{2}O_{4} ions. Using ANOVA, which ions does the absorbance depends on? How did you reach that conclusion?
 Look at the data yourself. Are you surprised by the statistical results? Does the absorbance seem to be linearly dependent on both variables?
 The excel spreadsheet gives UVVis absorbance data for absorbance of mixtures of NO_{3} and S_{2}O_{4} ions. Using ANOVA, which ions does the absorbance depends on? How did you reach that conclusion?
 Look at the following overlapping distributions for patients with and without a disease. The y axis is number of patients. The mean for the patients without disease is 17 units, with a standard deviation of 5.5. The mean of the patients with disease is 33 units, with a standard deviation of 5.5.
Data from: https://library.med.utah.edu/WebPath.../BIOSTATS.html

If point A is used as the cut off for disease, what percentage of healthy patients will have a false positive?
 If point B is used as the cutoff for disease, what percentage of diseased patients will have a false negative?
 If you used 25 instead as the cutoff, how many false positives and false negatives would you expect?
 You could label point A and point B with the labels, point of maximum sensitivity and point of maximum selectivity. Explain which point would have each label. Discuss the tradeoffs in choosing a testing point with overlapping distributions.

Excel Syntax
CONFIDENCE.NORM(alpha,standard_dev,size)
The CONFIDENCE.NORM function syntax has the following arguments:
 Alpha Required. The significance level used to compute the confidence level. The confidence level equals 100*(1  alpha)%, or in other words, an alpha of 0.05 indicates a 95 percent confidence level.
 Standard_dev Required. The population standard deviation for the data range and is assumed to be known.
 Size Required. The sample size.
NORM.DIST(x,mean,standard_dev,cumulative)
The NORM.DIST function syntax has the following arguments:
 X Required. The value for which you want the distribution.
 Mean Required. The arithmetic mean of the distribution.
 Standard_dev Required. The standard deviation of the distribution.
 Cumulative Required. A logical value that determines the form of the function. If cumulative is TRUE, NORM.DIST returns the cumulative distribution function; if FALSE, it returns the probability mass function.
Notes: When cumulative is true, it gives the area under the curve from –infinity until that value.
AVERAGE(number1, [number2], ...)
STDEV.S(number1,[number2],...)
Gives the sample standard deviation
STDEV.P(number1,[number2],...)
Gives the population standard deviation
T.TEST(array1,array2,tails,type)
Gives a t test
 Array 1: First range of values to compare
 Array 2: Second range of values to compare
 Tails: Either 1 or 2. Generally you do 2 tails, unless you know your values cannot vary in one direction (then it is one tail)
 Type: 1=Paired, 2=Unpaired (note for Paired, the arrays must have the same number of data points)
For ANOVA:
Hit data analysis (under data):
Choose it off the menu. If you have 1 variable, it is a 1 way ANOVA. If you have 2 variables, it is a 2way ANOVA. If the data are paired (i.e. from the same sample) there is a repeated measures test, if not choose without replication.
Example steps for ANOVA
Similar for 2way without replication, just choose that instead if you have 2 variables.
Note
You must have the analysis toolpak installed. Go to File, options, and addins. If you click Manage Addins at the bottom, you can install the analysis toolpak. Then the rest of these instructions will work.
To perform a single factor ANOVA, execute the following steps.
 On the Data tab, click Data Analysis.
Note: can't find the Data Analysis button? Click here to load the Analysis ToolPak addin.
 Select Anova: Single Factor and click OK.
 Click in the Input Range box and select the range A2:C10.
 Click in the Output Range box and select cell E1.
 Click OK.
Result:
Conclusion: if F > F crit, we reject the null hypothesis. This is the case, 15.196 > 3.443. Therefore, we reject the null hypothesis. The means of the three populations are not all equal. At least one of the means is different. However, the ANOVA does not tell you where the difference lies. You need other posthoc comparison tests for that.
Data
Contributors and Attributions
 Jill Venton, University of Virginia (bjv2n@eservices.virginia.edu)