Skip to main content
Chemistry LibreTexts

Normal Distribution and Statistics

  • Page ID
    279680
  • Classroom Exercise #5: Normal Distribution and Statistics

    There is an Excel spreadsheet and instructions at the end of this document.  You will need to use your computer to do the statistics to complete the questions.

    1. Is my red blood cell count high today?

      The excel sheet gives normal readings of red blood cell counts (all are *106 cells)

      Calculate the mean and standard deviation. 

       

       

       

       

      Today my red blood count is 5.6*106?  How certain (statistically) are you that value is abnormally high?  Use the normal distribution function to figure it out. 

       

       

       

       

    1. Pre and Post test.

      The second Excel sheet has test scores for some students before and after they took a course.  Compute the mean, standard deviation, and 95% confidence interval for each group.  Do the confidence intervals overlap?

       

       

       

      Compute the t-test if the results are not paired.  Is the result statistically different at the 95% confidence level?

       

       

       

      Compute the t-test if the results were paired (i.e. each row represents the same students).  What is the statistical advantage of performing the paired t-test? 

       

       

       

       

       

       

    1. Absorbance dependence on two different ions.
      1. The excel spreadsheet gives UV-Vis absorbance data for absorbance of mixtures of NO3 and S2O4 ions. Using ANOVA, which ions does the absorbance depends on? How did you reach that conclusion? 

         

         

         

         

         

         

      2. Look at the data yourself. Are you surprised by the statistical results?  Does the absorbance seem to be linearly dependent on both variables?

     

     

    1. Look at the following overlapping distributions for patients with and without a disease. The y axis is number of patients. The mean for the patients without disease is 17 units, with a standard deviation of 5.5.  The mean of the patients with disease is 33 units, with a standard deviation of 5.5. 

      overlapping_distributions.png

      Data from: https://library.med.utah.edu/WebPath.../BIOSTATS.html

      1. If point A is used as the cut off for disease, what percentage of healthy patients will have a false positive?

         

         

         

         

         

      2. If point B is used as the cut-off for disease, what percentage of diseased patients will have a false negative?

         

         

         

         

         

         

      3. If you used 25 instead as the cut-off, how many false positives and false negatives would you expect?

         

         

         

         

         

         

         

      4. You could label point A and point B with the labels, point of maximum sensitivity and point of maximum selectivity. Explain which point would have each label. Discuss the trade-offs in choosing a testing point with overlapping distributions. 

         

         

         

         

         

         

         

    Excel Syntax

    CONFIDENCE.NORM(alpha,standard_dev,size)

    The CONFIDENCE.NORM function syntax has the following arguments:

    • Alpha     Required. The significance level used to compute the confidence level. The confidence level equals 100*(1 - alpha)%, or in other words, an alpha of 0.05 indicates a 95 percent confidence level.
    • Standard_dev     Required. The population standard deviation for the data range and is assumed to be known.
    • Size     Required. The sample size.

    NORM.DIST(x,mean,standard_dev,cumulative)

    The NORM.DIST function syntax has the following arguments:

    • X     Required. The value for which you want the distribution.
    • Mean     Required. The arithmetic mean of the distribution.
    • Standard_dev     Required. The standard deviation of the distribution.
    • Cumulative     Required. A logical value that determines the form of the function. If cumulative is TRUE, NORM.DIST returns the cumulative distribution function; if FALSE, it returns the probability mass function.

    Notes: When cumulative is true, it gives the area under the curve from –infinity until that value.

    AVERAGE(number1, [number2], ...)

    STDEV.S(number1,[number2],...) 

    Gives the sample standard deviation

    STDEV.P(number1,[number2],...) 

    Gives the population standard deviation

    T.TEST(array1,array2,tails,type)

    Gives a t test

    • Array 1: First range of values to compare
    • Array 2: Second range of values to compare
    • Tails: Either 1 or 2.  Generally you do 2 tails, unless you know your values cannot vary in one direction (then it is one tail)
    • Type: 1=Paired, 2=Unpaired (note for Paired, the arrays must have the same number of data points)

    For ANOVA:

    Hit data analysis (under data):

    Choose it off the menu.  If you have 1 variable, it is a 1 way ANOVA.  If you have 2 variables, it is a 2-way ANOVA.  If the data are paired (i.e. from the same sample) there is a repeated measures test, if not choose without replication.

    Example steps for ANOVA

    Similar for 2-way without replication, just choose that instead if you have 2 variables.

    Note

    You must have the analysis tool-pak installed.  Go to File, options, and add-ins.  If you click Manage Add-ins at the bottom, you can install the analysis tool-pak.  Then the rest of these instructions will work.

    To perform a single factor ANOVA, execute the following steps.

    1. On the Data tab, click Data Analysis.

    data_tab.png

    Note: can't find the Data Analysis button? Click here to load the Analysis ToolPak add-in.

    1. Select Anova: Single Factor and click OK.

    analysis_tools_list.png

    1. Click in the Input Range box and select the range A2:C10.
    2. Click in the Output Range box and select cell E1.

    anova_single_factor_options.png

    1. Click OK.

    Result:

    anova_result.png

    Conclusion: if F > F crit, we reject the null hypothesis. This is the case, 15.196 > 3.443. Therefore, we reject the null hypothesis. The means of the three populations are not all equal. At least one of the means is different. However, the ANOVA does not tell you where the difference lies. You need other post-hoc comparison tests for that.

    Contributors and Attributions

    • Was this article helpful?