# Central Limit Theorem

The Central Limit Theorem states that if many samples of size n are drawn from a population, the distribution of the sample means approaches a normal distribution as n approaches infinity. It is extremely useful because it tells us that if n is large enough, the distribution of sample means can be treated as normal (Gaussian). We may not know much about a given population's distribution, but we know a great deal about Gaussians, and the theorem lets us apply that knowledge broadly: even when the population is not normal, the distribution of the sample means is approximately normal once the sample size n is sufficiently large.
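The theorem is easy to see empirically. The following sketch (population and sample sizes chosen arbitrarily for illustration) draws many samples from a heavily skewed exponential population and shows that their means cluster tightly around the population mean:

```python
import random
import statistics

random.seed(0)

# Population: Expo(1), a heavily skewed distribution with mean 1.0.
population_mean = 1.0

n = 50          # sample size
trials = 2000   # number of samples drawn

# Distribution of the sample means: one mean per sample of size n.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

# Even though the underlying population is far from normal, the sample
# means pile up symmetrically around the population mean.
print(statistics.mean(sample_means))   # close to 1.0
```

Plotting a histogram of `sample_means` would show the familiar bell shape, despite the strong skew of the exponential population.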

### Distributions

In statistics, sampling distributions of the sample mean can take many different forms. Especially with scientific data, one rarely has control over the shape this distribution takes. Distributions can resemble Gaussian functions, Lorentzian functions, linear functions, parabolic functions, hyperbolic functions, and many other types of functions, and many of these shapes are difficult to extract useful information from. One type of sampling distribution that is easy to work with is the Gaussian. The Central Limit Theorem is remarkable because, if the sample size is large, the sampling distribution of the mean will be approximately normal regardless of the shape of the underlying population. We can therefore treat the resulting distribution of the sample means as a Gaussian distribution. This is the fundamental advantage of the Central Limit Theorem.

### Statements of the Central Limit Theorem

Let \(X\) represent random data points from a population. Suppose the population has mean \(\mu\) and standard deviation \(\sigma\). Let \(\bar{X}\) represent the sample mean for any random sample of size \(n\). Then, \(\bar{X}\) has the following properties:

\[\mu_{\bar{X}} = \mu \tag{1}\]

\[\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \tag{2}\]

\[\bar{X} \text{ is approximately normal when the sample size } n \text{ is sufficiently large} \tag{3}\]

That is:

- The mean of the distribution of the sample means is equal to the mean of the sampled population.
- The standard deviation of the distribution of sample means (often called the standard error of the mean) is equal to the standard deviation of the sampled population, divided by the square root of the sample size.
- As the sample size approaches infinity, the distribution of the sample means approaches a normal distribution.
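Properties (1) and (2) can be checked numerically. This is a minimal sketch, assuming a uniform population on [0, 10] (so \(\mu = 5\) and \(\sigma = 10/\sqrt{12}\)) with an arbitrarily chosen sample size:

```python
import random
import statistics

random.seed(1)

# Population: uniform on [0, 10]; mu = 5, sigma = 10 / sqrt(12).
mu = 5.0
sigma = 10.0 / 12 ** 0.5

n = 25
trials = 5000

sample_means = [
    statistics.mean(random.uniform(0, 10) for _ in range(n))
    for _ in range(trials)
]

# Equation (1): the mean of the sample means matches the population mean.
print(statistics.mean(sample_means))               # close to 5.0

# Equation (2): their standard deviation (the standard error of the
# mean) matches sigma / sqrt(n).
print(statistics.stdev(sample_means), sigma / n ** 0.5)
```

Increasing `n` shrinks the spread of the sample means by a factor of \(\sqrt{n}\), exactly as equation (2) predicts.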

### Analysis

A reasonable question is how large n must be to give a "normal enough" distribution of the means. As you may surmise, the answer is subjective. If more data are available, the samples should be as large as possible. Various textbook and online sources suggest that when n is greater than or equal to 30, the sampling distribution is normal to a first approximation. However, as with any limit, true normality is reached only as n approaches infinity. The result is that the larger n can be, the better the approximation will be.
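One rough way to see that n = 30 is already a reasonable working threshold is to check a normal-distribution benchmark: about 68.3% of a normal distribution lies within one standard deviation of its mean. The sketch below (population and trial counts are illustrative choices, not prescribed by the text) samples from a skewed exponential population and checks how often the sample mean lands within one standard error of the population mean:

```python
import random
import statistics

random.seed(2)

def frac_within_one_se(n, trials=4000):
    """Fraction of sample means within one standard error of the
    population mean, sampling from Expo(1) (mu = sigma = 1)."""
    se = 1.0 / n ** 0.5   # standard error of the mean, equation (2)
    hits = 0
    for _ in range(trials):
        xbar = statistics.mean(random.expovariate(1.0) for _ in range(n))
        if abs(xbar - 1.0) <= se:
            hits += 1
    return hits / trials

# For an exactly normal distribution this fraction would be about 0.683;
# n = 30 already lands near that benchmark despite the skewed population.
print(frac_within_one_se(30))
```

Trying smaller values such as `frac_within_one_se(3)` shows a noticeably worse match, consistent with the advice to use the largest n available.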

Why is it so valuable to have a distribution that is approximately normal? This is, after all, the main advantage of the Central Limit Theorem. Normal distributions are Gaussian functions, whose properties are thoroughly understood and extensively tabulated. If the sample size is large enough (greater than 30, for instance), the distribution of the sample means can be treated like a Gaussian function.

One property of Gaussian functions that makes them particularly easy to work with is their symmetry about the mean (the vertical axis, for a standardized Gaussian): the area under the curve to the left of the center equals the area to the right. To see why this is useful, suppose we wish to calculate the probability of some region of our approximately normal distribution. The area under the entire Gaussian is 1, so the area on either side of the center is 0.5. If it is easier to calculate the area of the region we do not need, that area can simply be subtracted from 0.5. In general, by identifying which region of the Gaussian we want the probability for, we obtain that probability as the area under that portion of the Gaussian.
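As a small worked example of the subtract-from-0.5 argument, suppose a z-table gives the area between the center and z = 1.00 as 0.3413 (the numbers here are standard tabulated values):

```python
# Tabulated area under the standard normal curve between 0 and z = 1.00.
area_0_to_z = 0.3413

# By symmetry, each half of the curve holds area 0.5.
half_area = 0.5

# Probability that the standardized value exceeds 1.00: subtract the
# region we do not need from the half-area.
p_upper_tail = half_area - area_0_to_z
print(p_upper_tail)          # ≈ 0.1587

# Symmetry gives the lower tail for free: P(Z < -1.00) is the same.
p_lower_tail = p_upper_tail
```

The same subtraction logic extends to any region: combine tabulated center-to-z areas, half-areas, and symmetry to build the probability you need.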

### Using the Central Limit Theorem

The values of \(\bar{X}\) lie along the horizontal axis of its distribution. When dealing with Gaussian functions, it is useful to calibrate the horizontal axis in terms of z values rather than units of \(\bar{X}\). This standardizes any Gaussian function regardless of the range of \(\bar{X}\), which would be vastly different for different sets of data. The transformed Gaussian is centered at zero, and the area under the curve is 1.

We have a way to calculate a z value for any value of \(\bar{X}\). If we know the population mean \(\mu\) and the standard deviation of all the sample means \(\sigma_{\bar{X}}\), we can easily calculate the z value using the equation \(z = \dfrac{\bar{X}-\mu_{\bar{X}}}{\sigma_{\bar{X}}}\).
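A quick numerical sketch of this standardization, using hypothetical numbers (a population mean of 100, a standard deviation of 15, and a sample of size 36, none of which come from the text):

```python
# Hypothetical population parameters, for illustration only.
mu = 100.0      # population mean; by equation (1), also mu_xbar
sigma = 15.0    # population standard deviation
n = 36          # sample size

# Standard error of the mean, equation (2): sigma / sqrt(n).
sigma_xbar = sigma / n ** 0.5

# An observed sample mean, standardized into a z value.
xbar = 105.0
z = (xbar - mu) / sigma_xbar
print(z)        # (105 - 100) / 2.5 = 2.0
```

A z of 2.0 means this sample mean sits two standard errors above the population mean, which can then be looked up directly in a z-table.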

Tables of z values can easily be found. In the following table, each entry is the area under the standard normal curve between 0 and z; the row gives the first decimal of z and the column the second. It was obtained from statsoft.com and can also be found as an outside link:

z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09
---|---|---|---|---|---|---|---|---|---|---
0.0 | 0.0000 | 0.0040 | 0.0080 | 0.0120 | 0.0160 | 0.0199 | 0.0239 | 0.0279 | 0.0319 | 0.0359
0.1 | 0.0398 | 0.0438 | 0.0478 | 0.0517 | 0.0557 | 0.0596 | 0.0636 | 0.0675 | 0.0714 | 0.0753
0.2 | 0.0793 | 0.0832 | 0.0871 | 0.0910 | 0.0948 | 0.0987 | 0.1026 | 0.1064 | 0.1103 | 0.1141
0.3 | 0.1179 | 0.1217 | 0.1255 | 0.1293 | 0.1331 | 0.1368 | 0.1406 | 0.1443 | 0.1480 | 0.1517
0.4 | 0.1554 | 0.1591 | 0.1628 | 0.1664 | 0.1700 | 0.1736 | 0.1772 | 0.1808 | 0.1844 | 0.1879
0.5 | 0.1915 | 0.1950 | 0.1985 | 0.2019 | 0.2054 | 0.2088 | 0.2123 | 0.2157 | 0.2190 | 0.2224
0.6 | 0.2257 | 0.2291 | 0.2324 | 0.2357 | 0.2389 | 0.2422 | 0.2454 | 0.2486 | 0.2517 | 0.2549
0.7 | 0.2580 | 0.2611 | 0.2642 | 0.2673 | 0.2704 | 0.2734 | 0.2764 | 0.2794 | 0.2823 | 0.2852
0.8 | 0.2881 | 0.2910 | 0.2939 | 0.2967 | 0.2995 | 0.3023 | 0.3051 | 0.3078 | 0.3106 | 0.3133
0.9 | 0.3159 | 0.3186 | 0.3212 | 0.3238 | 0.3264 | 0.3289 | 0.3315 | 0.3340 | 0.3365 | 0.3389
1.0 | 0.3413 | 0.3438 | 0.3461 | 0.3485 | 0.3508 | 0.3531 | 0.3554 | 0.3577 | 0.3599 | 0.3621
1.1 | 0.3643 | 0.3665 | 0.3686 | 0.3708 | 0.3729 | 0.3749 | 0.3770 | 0.3790 | 0.3810 | 0.3830
1.2 | 0.3849 | 0.3869 | 0.3888 | 0.3907 | 0.3925 | 0.3944 | 0.3962 | 0.3980 | 0.3997 | 0.4015
1.3 | 0.4032 | 0.4049 | 0.4066 | 0.4082 | 0.4099 | 0.4115 | 0.4131 | 0.4147 | 0.4162 | 0.4177
1.4 | 0.4192 | 0.4207 | 0.4222 | 0.4236 | 0.4251 | 0.4265 | 0.4279 | 0.4292 | 0.4306 | 0.4319
1.5 | 0.4332 | 0.4345 | 0.4357 | 0.4370 | 0.4382 | 0.4394 | 0.4406 | 0.4418 | 0.4429 | 0.4441
1.6 | 0.4452 | 0.4463 | 0.4474 | 0.4484 | 0.4495 | 0.4505 | 0.4515 | 0.4525 | 0.4535 | 0.4545
1.7 | 0.4554 | 0.4564 | 0.4573 | 0.4582 | 0.4591 | 0.4599 | 0.4608 | 0.4616 | 0.4625 | 0.4633
1.8 | 0.4641 | 0.4649 | 0.4656 | 0.4664 | 0.4671 | 0.4678 | 0.4686 | 0.4693 | 0.4699 | 0.4706
1.9 | 0.4713 | 0.4719 | 0.4726 | 0.4732 | 0.4738 | 0.4744 | 0.4750 | 0.4756 | 0.4761 | 0.4767
2.0 | 0.4772 | 0.4778 | 0.4783 | 0.4788 | 0.4793 | 0.4798 | 0.4803 | 0.4808 | 0.4812 | 0.4817
2.1 | 0.4821 | 0.4826 | 0.4830 | 0.4834 | 0.4838 | 0.4842 | 0.4846 | 0.4850 | 0.4854 | 0.4857
2.2 | 0.4861 | 0.4864 | 0.4868 | 0.4871 | 0.4875 | 0.4878 | 0.4881 | 0.4884 | 0.4887 | 0.4890
2.3 | 0.4893 | 0.4896 | 0.4898 | 0.4901 | 0.4904 | 0.4906 | 0.4909 | 0.4911 | 0.4913 | 0.4916
2.4 | 0.4918 | 0.4920 | 0.4922 | 0.4925 | 0.4927 | 0.4929 | 0.4931 | 0.4932 | 0.4934 | 0.4936
2.5 | 0.4938 | 0.4940 | 0.4941 | 0.4943 | 0.4945 | 0.4946 | 0.4948 | 0.4949 | 0.4951 | 0.4952
2.6 | 0.4953 | 0.4955 | 0.4956 | 0.4957 | 0.4959 | 0.4960 | 0.4961 | 0.4962 | 0.4963 | 0.4964
2.7 | 0.4965 | 0.4966 | 0.4967 | 0.4968 | 0.4969 | 0.4970 | 0.4971 | 0.4972 | 0.4973 | 0.4974
2.8 | 0.4974 | 0.4975 | 0.4976 | 0.4977 | 0.4977 | 0.4978 | 0.4979 | 0.4979 | 0.4980 | 0.4981
2.9 | 0.4981 | 0.4982 | 0.4982 | 0.4983 | 0.4984 | 0.4984 | 0.4985 | 0.4985 | 0.4986 | 0.4986
3.0 | 0.4987 | 0.4987 | 0.4987 | 0.4988 | 0.4988 | 0.4989 | 0.4989 | 0.4989 | 0.4990 | 0.4990
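In code, such a table is rarely needed: the tabulated center-to-z areas follow directly from the standard normal cumulative distribution, which Python's standard library exposes through the error function `math.erf`. A quick sketch that spot-checks a few table entries:

```python
import math

def z_table_entry(z):
    """Area under the standard normal curve between 0 and z,
    i.e. the values tabulated in standard z-tables:
    Phi(z) - 0.5 = erf(z / sqrt(2)) / 2."""
    return 0.5 * math.erf(z / math.sqrt(2))

# Spot-check a few values against the table entries.
print(z_table_entry(1.00))   # table entry: 0.3413
print(z_table_entry(1.96))   # table entry: 0.4750
print(z_table_entry(3.00))   # table entry: 0.4987
```

Python 3.8+ also provides `statistics.NormalDist().cdf(z)`, from which the same area is `cdf(z) - 0.5`.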

### References

- Miller, J. C. *Statistics for Analytical Chemistry*, Third Edition; Ellis Horwood: Chichester, England, 1993; 41, 142.
- Hamilton, L. C. *Modern Data Analysis: A First Course in Applied Statistics*; Wadsworth: Belmont, CA, 1990; 228-231, 241-242, 313.
- Wasserman, L. W. *All of Statistics: A Concise Course in Statistical Inference*; Springer-Verlag: New York, NY, 2004; 77-79.
- Chase, W.; Bown, F. *General Statistics*, Third Edition; John Wiley & Sons: New York, NY, 1997; 300-310.
- McClave, J. T.; Sincich, T. *A First Course in Statistics*, Fifth Edition; Prentice Hall: Englewood Cliffs, NJ, 1995; 248, 250.

### Outside Links

- Central Limit Theorem Tutorial (Wadsworth)
- Central Limit Theorem Applet (University of Athens)
- Central Limit Theorem Applet (University of South Carolina)
- Central Limit Theorem (Wikipedia)
- Sampling Distribution (Wikipedia)
- Table of z values (Statsoft.com)

### Contributors

- Henry Wedler (University of California, Davis)