Statistics: Excel Exercise
- Page ID
- 279713
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)
( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\id}{\mathrm{id}}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\kernel}{\mathrm{null}\,}\)
\( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\)
\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\)
\( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)
\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)
\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)
\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vectorC}[1]{\textbf{#1}} \)
\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)
\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)
\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)*Note for Students:
You may complete this assignment in collaboration with a partner or small group. However, each student must submit INDIVIDUAL work. Students will submit this handout with written answers and will EMAIL their OWN EXCEL file to the instructor.
Please download the Excel file “PennyDataSet_Stats_StudentData.”
This data set represents the masses of pennies collected by students at Hobart and William Smith College under the direction of Dr. Elana Stennett. The pennies massed represent average coins in circulation during the year 2016.
This Excel file has 7 sheets (or tabs) in it:
- “Original Data”
- This is the unaltered data set as it was sent to King’s College.
- Data for each penny includes the year of production, the mass (in grams), and the mint where it was produced. Pennies minted in Denver have a “D” beneath the year printed on the penny; these coins are noted as Mint “D” in the data set. Pennies minted in Philadelphia do not have a mark or letter printed beneath the year on the penny and are noted as Mint “P” in the data set. A subtotal of pennies per year of production are shown in the “n” row.
- “Column Data”
- This data includes all of the data from “Original Data,” however it has been sorted into columns “Mass (g),” “Year,” and “Mint.”
- Labeling EACH penny with appropriate data (including mass, year, and mint) and organizing the data into rows makes this data set much easier to sort and analyze.
- “#1 - #5 (with description)”
- All other sheets or tabs will be used to analyze the data set and solve each of the problems below.
Solve the problems below by working with a partner or small group. Remember, EACH student must submit INDIVIDUAL work, including submitting written responses to these questions AND email their own Excel file showing their work, and thought process.
Please remember to include UNITS where appropriate. Make sure to LABEL your graphs, including an appropriate TITLE and AXES. LABEL CELLS next to your calculated values – a random number will mean nothing to you the next time you open this spreadsheet to review your work. However, a cell with a formula that is clearly labeled will be very useful for you next time you need to do statistical analysis (perhaps for lab reports!). If you are going through the work to set up a spreadsheet for analysis, make it useful for the future and re-use it!
1. Histogram Year
- Copy the “Column Data” data into the “1. Histogram Year” sheet.
- Create a histogram showing the distribution of pennies massed by year, with “Year” on the x-axis as a categorical variable, and “Number of Pennies” massed on the y-axis as a numerical variable.
- Give the graph an appropriate title and label all axes. Make sure the font is at least size 11 (easily read-able)
- The y-axis should have a minimum of 0, maximum of 70, with major and minor tickmarks on the outside. The axis should be black (not grey, it is hard to see).
- The graph should NOT have gridlines and should have a think black border to help it stand out.
- Make the bars on the graph grey in color, with a thin (0.5 pt) black outline.
Helpful Hints:
- Use the “Insert” “Chart” “Histogram” option (the icon with all BLUE bars). Note: Do not confuse a histogram with a bar chart!
- Right click on the chart and use the “Select Data” option: to ensure “Year” is on the x-axis as a categorical variable, and “Number of Pennies” massed on the y-axis as a numerical variable.
- Right click on the x-axis and select “Format Axis.” The “Bins” are currently dictating how wide your bars are. The default is “Automatic,” but that’s usually not the best option. Try changing the bins to “By Category,” then changing the bins width, number of bins, etc. What happens?
- You can also right click on the data series itself and select “Format Data Series” to change the appearance of data.
- “Chart Tools,” “Design” tab will help you add axes labels, titles, change the color and format of your histogram.
Questions:
1a. Is the distribution of penny years Gaussian? Why or why not?
1b. Do expect all pennies in circulation to have a Gaussian distribution when sorted by year? Why or why not?
2. Histogram Mass
- Copy the “Mass,” “Year,” and “Mint” data only from “Column Data” data into the “2. Histogram Mass” sheet.
- Using “Sort and Filter” tool and “Custom Sort”, sort the data from low-to-high mass, followed by old-to-recent year, and A-Z mint.
- Create a histogram showing the distribution of penny masses, with “Mass” on the x-axis as a categorical variable, and “Number of Pennies” massed on the y-axis as a numerical variable.
- Make sure the graph is labeled appropriately and that the data is clear and easy to read.
Use the “Format Axis” and “Axis Options” for the x-axis to change the bins and bin widths while answering the questions below:
Questions:
2a. Using an “Automatic” bin setting, how many bins of data are presented? What is the bin width?
2b. Set the number of bins to “20,” and bin width to “0.04.” What happens?
2c. Set the number of bins to “41,” and bin width to “0.02.” What happens?
2d. Set the bin width to “0.01.” What happens? How many bins are there?
2e. Using the histogram with a bin width of 0.01g, does the mass of pennies as a whole follow a Gaussian distribution? Why or why not?
2f. What may cause the separation between low-mass pennies and high-mass pennies? (Cite any sources you use to answer this question!) Does your data support this hypothesis?
2g. Do the groups of low-mass pennies and high-mass pennies exhibit a Gaussian distribution? Why or why not? Is this what you would expect?
3. Outliers
- Copy data only from years 1975 and 1983 from “Column Data” into “3. Outliers”.
Questions:
3a. Is there an outlier in the 1975 data set? Support your answer.
3b. Is there an outlier in the 1983 data set? Support your answer.
3c. Is there a reasonable explanation for why the two data points questioned (one in 1975, one in 1983) appear different from the other masses in that given year?
3d. What should be done to the data set as a whole before performing additional analyses?
4. Pre-Post 1983
- Copy the “Column Data” data into the “4. Pre-Post 1983” sheet.
Questions:
4a. Is the mass of a penny produced from 1960-1982 different from the mass of a penny produced from 1983 to the present? Support your answer.
5. “Known” Mass
- Copy data only from years 1983 to 2015 from “Column Data” into “5. Known Mass”.
Questions:
5a. According to the United States Mint1, the mass of a penny is 2.5000g. Do the masses of pennies in this data set agree with the known? Support your answer.
References
- United States Mint. Coin Specifications. https://www.usmint.gov/learn/coin-an...specifications. (accessed September 2, 2018).
Contributors and Attributions
- Dr. Elise M. Heiss, Kings College (EliseHeiss@kings.edu)
- Data set from Dr. Elana Stennett, Assistant Professor of Chemsitry, Hobart and William Smith Colleges, Geneva, NY
- Sourced from the Analytical Sciences Digital Library