# Penny Statistics with Data Set

Experiment #2: Penny Statistics1

Chem 331, Analytical Chemistry, Fall 2019

## Introduction

Not all pennies are created equal! In 1982, the way pennies were minted was completely changed. U.S. pennies minted after 1982 have a zinc core with an over-layer of copper, but before 1982 pennies were made of solid brass, with a uniform composition throughout (95 wt% copper and 5 wt% zinc). In 1982, both the brass coins and the copper-coated zinc coins were produced. In addition, pennies are minted at two different mints, in Denver and in Philadelphia.

In this experiment, we will perform statistical analyses of masses of a large number of coins to answer questions such as: Is the distribution of masses described by a Gaussian function? Can we clearly distinguish the copper-coated pennies from the brass ones? Is the mass of a penny consistent from year to year? Do pennies from different mints have the same mass? But ultimately, we’ll learn how to use Excel to manipulate large sets of data and apply statistical analyses!

## Data Analysis

### 1. Sorting Data

Open the data file and go to File: Save As… and save the file with a new name. (That way you will preserve the original list in case you make an irreversible mistake later on!) Then sort the entire list by year and then by mass. To do that, first select all three columns of data (year, mass, and mint). Go to the Data tab, and click on the Sort icon.

### 2. Distribution of Masses

1. Using the sorted list of all coins, find the coins from 1983 and later, and copy and paste the data into a new worksheet. Sort the data by mass, and find the mean and the standard deviation for all coins 1983 and later.
2. Generate a histogram of the coin masses from 1983 and later, using 3-milligram intervals to bin the data. To make a histogram, you’ll need to use the FREQUENCY function in Excel. (Note: the output of this function is an array). To make a histogram, follow these instructions exactly:
1. First, generate the values for your x-axis (the “bins”). Take the mass of lightest coin in the series, and round down to the nearest hundredth of a gram. Type that mass into a cell. Select the cell you just typed in, and under the Home tab, click the Fill icon and select “Series…” from the pull-down menu. In the new window, click the Columns radio button, and enter the desired Step value. For the Stop Value, take the mass of the heaviest coin, and round up to the nearest hundredth of a gram. Click the OK button. This should automatically fill in the column of numbers as desired!
2. Now, create the y-values for the histogram. Select all the cells immediately adjacent to the x-values you just created. Then, in the formula bar, type:

=FREQUENCY(

Now click and drag to select the masses of all coins you want to include in your histogram. Then type a comma, and then select all the x-values you generated above. Then close the parenthesis. And finally, on type “Control-Shift-Enter.”

3. To generate the graph of your histogram, select only the y-values you just calculated using FREQUENCY. From the Insert tab, click the Column icon, and choose “Clustered Column” from the pull-down menu. To finish, you’ll then need to set the x-axis manually. Select the chart, and then under the Chart Tools tab select the Design tab (just called Chart Design tab on a Mac). Then click on the Select Data icon. Under where it says “Horizontal (Category) Axis Labels,” click the “Edit” button (or icon in the text box on a Mac), and click and drag to select all the cells that contain the 3-mg bins you generated in step (i) above.

QUESTION #1: OF HISTOGRAMS AND GAUSSIANS

1. (3 pts) Graph a frequency histogram for the mass all coins from 1983 and later using 3-mg bins for the data. Indicate the two 3-standard-deviation limits on the plot.
2. (2 pts) How many coins are outside of ± 3s for this data? What percent of coins is that? How does that percentage compare to what we would expect for a perfect Gaussian distribution? Does this support or refute the hypothesis that the data fits to a Gaussian? Explain.

Bonus (+2): Using data from years ≥1983, make a new plot of the histogram from part (a). For direct comparison of the penny data and a truly random distribution, on this same plot now add in a perfect Gaussian curve, using the mean and standard deviation from your data. Make sure the x-axes match exactly for the two graphs. How do the shapes match up? Discuss what this plot shows/means.

### 3. Least Squares Analysis

To determine if there is a consistent trend in the mass of the pennies over time, plot the mass of the pennies for 1983 and later (on the y-axis) vs. the year (on the x-axis). If the mass of the pennies increases systematically from year to year, then the least-squares line through the data will have a positive slope; if the mass decreases, the slope will be negative; and if the mass is constant, the slope will be exactly zero. However, even if the mass is in fact constant, your selection of coins will not give a slope of exactly zero. The question is if it will be significantly different than zero given the uncertainty in the slope.

1. Go back to the list of sorted data, and select all data from 1983 and later. Copy and paste it into a new worksheet. In this new worksheet, select all data in the year and mass columns. Generate a “Scatter with only Markers” (called “Scatter” on a Mac) plot under the Insert tab from the pull-down menu on the Scatter icon.
2. Use the LINEST function (see Error summary handout) to determine the best-fit equation of the line through the data you plotted above, as well as the associated error.

QUESTION #2: STATIC OR DYNAMIC?

1. (2 pts) Graph a scatter-plot of masses for all pennies ≥1983.
2. (3 pts) What is the confidence interval for the slope and for the intercept at the 95% confidence level? Is the slope of the line statistically different from zero at the 95% confidence level? What about at the 99% confidence level? What does this mean about the mass of pennies over time? Can we say they are changing? Can we say they are the same? What exactly can we say? Explain your logic. (Hint: What is the “null hypothesis”? Think carefully about this and be very specific!)

### 4. Using the t Test

1. Compare pennies made in 2000 and in 2001, years in which zinc coins were made. Compare the means for each of those two years using the t Test at the 95% confidence level. To do this, copy and paste the data from those two years into a new worksheet in your file. First use the F test, and then calculate the tcalc by the appropriate method (Case A or Case B). Compare tcalc with ttable at the 95% confidence level. Use the T.INV.2T function in Excel to get appropriate value of ttable.

QUESTION #3: THE t TEST FOR 2000 vs 2001

(3 pts) Provide the average mass and its standard deviation for coins from 2000 and from 2001. Report the 95% confidence interval for each. Are the means for each year different at the 95% confidence level? Report critical values of Fcalc, Ftable, tcalc and ttable, and values of spooled and DoFs for Case A or Case B respectively.

1. Now go back to the sorted list of all the data in a single column. Copy and paste this into a new worksheet, and separate the data for coins ≤1981 and coins ≥1983 into two separate columns. Delete the data from 1982.

QUESTION #4: BRASS vs. ZINC

(3 pts) Provide average mass and the standard deviation for all coins ≤1981 and for all coins ≥1983. Report the 95% confidence interval for each. Are the two means different at the 95% confidence level? Are the means different with 99.9999% certainty? For both tests, report critical values of Fcalc, Ftable, tcalc and ttable, and values of spooled and DoFs for Case A or Case B respectively. Just for fun: Can you find a % confidence level where the outcome of the test changes?

1. Copy and paste the data for the year 2000 into a new worksheet, and using the sort function, separate coins minted in Philadelphia from those from Denver. Then find the mean and standard deviation of the masses for coins from each mint. Apply the t Test. Now do the same thing for the 2001 coins, and finally do it for all years after 1982.

QUESTION #5: PHILLY vs. DENVER

1. (2 pts) Provide average mass and the standard deviation the coins from each of the two mints in the year 2000. Report the 95% confidence interval for each. Are the two means different at the 95% confidence level? Report critical values of Fcalc, Ftable, tcalc and ttable, and values of spooled and DoFs for Case A or Case B respectively.
2. (2 pts) Repeat the same calculations for coins from 2001.
3. (2 pts) Repeat the same calculations for all coins for years ≥1983.
4. (3 pts) What do you conclude by comparing answers from a, b, and c? Assuming the manufacturing process was identical for all years ≥1983, why do you get a different result for part c than for a and b?

### 5. Making the Case

How could you prove that in 1982 both types of coins were made? Using the data provided, give a convincing argument based on the evidence to show that the type of coins made in 1981 and before, as well as the type made in 1983 and later, were both made in 1982.

QUESTION #6: ANALYZING 1982

(5 pts) Using the data from 1982, provide a convincing evidence-based argument that the type of coins made in 1981 and before, as well as the type made in 1983 and after, were both manufactured in 1982. Support your claim with evidence: you may use graphical analysis (a revealing plot of the data), or statistical analysis (calculate a pertinent statistical value), or both. Choose the best way to present the data. Be creative! There is no single right answer, but some are more effective and convincing than others!  Be sure you provide an argument—that’s more than just facts!

## The Lab Report

For this lab, you will simply enter all your final answers into a template in an Excel file I have provide for you. Upload that file to Drive using the correct naming convention. Nothing else needs to be turned in.

Every numerical value, figure, or short answer has a designated place for you to enter the answer. Where asked, provide professional quality figures with captions. Any discussion or explanation should be in complete sentences.

For numbers that have units, include those units in your answer. For answers that need to be rounded (anything with an associated error), use the 3-to-30 rule.

For any values that have no associated error (for example, all critical values from the F and t tests), report the completely unrounded values directly from your Excel calculations (that should be 9 decimal places!). How to do this correctly is described in the Excel template file instructions, which should be read carefully!