# 3.9: Random Variables, Expected Values, and Population Sets



When we sample a particular distribution, the value that we obtain depends on chance and on the nature of the distribution described by the function \(f\left(u\right)\). The probability that any given trial will produce \(u\) in the interval \(a<u<b\) is equal to \(f\left(b\right)-f\left(a\right)\). We often find situations in which a second function of \(u\), call it \(g\left(u\right)\), is also of interest. If we sample the distribution and obtain a value of the random variable, \(u_k\), then the value of \(g\) associated with that trial is \(g\left(u_k\right)\). The question arises: Given \(g(u)\) and the distribution function \(f(u)\), what should we expect the value of \(g\left(u_k\right)\) to be? That is, if we get a value of \(u\) from the distribution and then find \(g\left(u\right)\), what value should we expect to find for \(g\left(u\right)\)? While this seems like a reasonable question, it is obvious that we can give a meaningful answer only when we can define more precisely just what we mean by “expect.”

To understand our definition of the **expected value** (sometimes called the **expectation**) of \(g\left(u\right)\), let us consider a game of chance. Suppose that we have a needle that rotates freely on a central axis. When spun, the needle describes a circular path, and its point eventually comes to rest at some point on this path. The location at which the needle stops is completely random. Imagine that we divide the circular path into six equal segments, which we number from one to six. When we spin the needle, it is equally likely to stop over any of these segments. Now, let us suppose that we conduct a lottery by selling six tickets, also numbered from one to six. We decide the winner of the lottery by spinning the needle. The holder of the ticket whose number matches the number on which the needle stops receives a payoff of $6000. After the spin, one ticket is worth $6000, and the other five are valueless. We ask: Before the spin, what is any one of the lottery tickets worth?

In this context, it is reasonable to define the expected value of a ticket as the amount that we should be willing to pay to buy a ticket. If we buy them all, we receive $6000 when the winning ticket is selected. If we pay $1000 per ticket to buy them all, we get our money back. If we buy all the tickets, the expected value of each ticket is $1000. What if we buy only one ticket? Is it reasonable to continue to say that its expected value is $1000? We argue that it is. One argument is that the expected value of a ticket should not depend on who owns the ticket; so, it should not depend on whether we buy one, two, or all of them. A more general argument supposes that repeated lotteries are held under the same rules. If we spend $1000 to buy one ticket in each of a very large number of such lotteries, we expect that we will eventually “break even.” Since the needle comes to rest at each number with equal probability, we reason that

\[\begin{array}{l} \text{Expected value of a ticket} \\ =\$6000\left(\text{fraction of times our ticket would be selected}\right) \\ =\$6000\left({1}/{6}\right) \\ =\$1000 \end{array} \nonumber \]

Since we assume that the fraction of times our ticket would be selected in a long series of identical lotteries is the same thing as the probability that our ticket will be selected in any given drawing, we can also express the expected value as

\[ \begin{array}{l} \text{Expected value of a ticket} \\ =\$6000\left(\text{probability that our ticket will be selected}\right) \\ =\$6000\left({1}/{6}\right) \\ =\$1000 \end{array} \nonumber \]

Clearly, the ticket is superfluous. The game depends on obtaining a value of a random variable from a distribution. The trial is a spin of the needle; the random variable is the location at which the needle comes to rest. We can conduct essentially the same game by allowing any number of participants to bet that the needle will come to rest on any of the six equally probable segments of the circle. If an individual repeatedly bets on the same segment in many repetitions of this game, the total of his winnings eventually matches the total amount that he has wagered. (More precisely, the total of his winnings divided by the total amount he has wagered becomes arbitrarily close to one.)
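The break-even argument can be checked numerically. The sketch below is an illustrative addition, not part of the original game description; the function name and the trial count are arbitrary. It simulates many spins of the six-segment needle for a player who always bets on the same segment:

```python
import random

# Monte Carlo sketch of the six-segment spinner lottery described above.
# The $6000 payoff and the equal-probability segments come from the text;
# the helper name and trial count are illustrative choices.
def average_payoff(n_trials, seed=0):
    rng = random.Random(seed)
    payoff_total = 0
    for _ in range(n_trials):
        segment = rng.randint(1, 6)   # needle is equally likely to stop on 1..6
        if segment == 1:              # we always bet on segment 1
            payoff_total += 6000
    return payoff_total / n_trials

print(average_payoff(100_000))  # close to the expected $1000 per game
```

The average payoff per game approaches $1000 as the number of games grows, in line with the break-even reasoning above.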

Suppose now that we change the rules. Under the new rules, we designate segment \(1\) of the circle as the payoff segment. Participants pay a fixed sum to be eligible for the payoff for a particular game. Each game is decided by a spin of the needle. If the needle lands in segment \(1\), everyone who paid to participate in that game receives $6000. Evidently, the new rules have no effect on the value of participation. Over the long haul, a participant in a large number of games wins $6000 in one-sixth of these games. We take this to be equivalent to saying that he has a probability of one-sixth of winning $6000 in a given game in which he participates. His expected payoff is

\[ \begin{array}{l} \text{Expected value of game} \\ =\$6000\left(\text{probability of winning }\$6000\right) \\ =\$6000\left({1}/{6}\right) \\ =\$1000 \end{array} \nonumber \]

Let us change the game again. We sub-divide segment \(2\) into equal-size segments \(2A\) and \(2B\). The probability that the needle lands in \(2A\) is \({1}/{12}\), as is the probability that it lands in \(2B\). In this new game, the payoff is $6000 when the needle lands in either segment \(1\) or segment \(2A\). We can use any of the arguments that we have made previously to see that the expected payoff of this game is now \(\$6000\left({1}/{4}\right)=\$1500\). However, the analysis that is most readily generalized recognizes that the expected payoff from this game is just the sum of the expected payoff from the previous game plus the expected payoff from a game in which the sole payoff is $6000 whenever the needle lands in segment \(2A\). For the new game, we have

\[ \begin{array}{l} \text{Expected value of a game} \\ =\$6000\times P\left(\text{segment 1}\right)+\$6000\times P\left(\text{segment 2A}\right) \\ =\$6000\left({1}/{6}\right)+\$6000\left({1}/{12}\right) \\ =\$1500 \end{array} \nonumber \]

We can devise any number of new games by dividing the needle’s circular path into \(\Omega\) non-overlapping segments. Each segment is a possible outcome. We number the possible outcomes \(1\), \(2\), …, \(i\), …, \(\Omega\), label these outcomes \(u_1\), \(u_2\),…, \(u_i\),…, \(u_{\textrm{Ω}}\), and denote their probabilities as \(P\left(u_1\right)\), \(P\left(u_2\right)\),…, \(P\left(u_i\right)\),…, \(P\left(u_{\textrm{Ω}}\right)\). We say that the probability of outcome \(u_i\), \(P\left(u_i\right)\), is the **expected frequency** of outcome \(u_i\). We denote the respective payoffs as \(g\left(u_1\right)\), \(g\left(u_2\right)\),…, \(g\left(u_i\right)\),…, \(g\left(u_{\textrm{Ω}}\right)\). Straightforward generalization of our last analysis shows that the expected value for participation in any game of this type is

\[\sum^{\textrm{Ω}}_{i=1}{g\left(u_i\right)\times P\left(u_i\right)} \nonumber \]

Moreover, the spinner is representative of any distribution, so it is reasonable to generalize further. We can say that the expected value of the outcome of a single trial is always the probability-weighted sum, over all possible outcomes, of the value of each outcome. A common notation uses angle brackets to denote the expected value of a function of the random variable; the expected value of \(g\left(u\right)\) is \(\left\langle g\left(u\right)\right\rangle\). For a discrete distribution with \(\textrm{Ω}\) exhaustive, mutually exclusive outcomes \(u_i\), probabilities \(P\left(u_i\right)\), and outcome values (payoffs) \(g\left(u_i\right)\), we define the **expected value** of \(g\left(u\right)\) to be

\[\left\langle g\left(u\right)\right\rangle \ =\sum^{\textrm{Ω}}_{i=1}{g\left(u_i\right)}\times P\left(u_i\right) \nonumber \]
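The defining sum translates directly into code. In the sketch below, the function name is ours, and the two checks reuse the payoff games from the text:

```python
# A minimal implementation of <g(u)> = sum_i g(u_i) * P(u_i) for a discrete
# distribution; the function name is illustrative.
def expected_value(g_values, probabilities):
    # The outcomes must be exhaustive and mutually exclusive,
    # so their probabilities must sum to one.
    assert abs(sum(probabilities) - 1.0) < 1e-9
    return sum(g * p for g, p in zip(g_values, probabilities))

# Original lottery: a $6000 payoff on one of six equally likely segments.
print(expected_value([6000, 0, 0, 0, 0, 0], [1/6] * 6))   # ~1000

# Modified game: payoffs on segment 1 (P = 1/6) and segment 2A (P = 1/12).
print(expected_value([6000, 6000, 0], [1/6, 1/12, 3/4]))  # ~1500
```

Both results agree with the hand calculations above.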

Now, let us examine the expected value of \(g\left(u\right)\) from a slightly different perspective. Let the number of times that each of the various outcomes is observed in a particular sample of \(N\) observations be \(N_1,\ N_2,\dots ,N_i,\dots ,N_{\textrm{Ω}}\). We have \(N=N_1+\ N_2+\dots +N_i+\dots +N_{\textrm{Ω}}\). The set \(\{N_1,\ N_2,\dots ,N_i,\dots ,N_{\textrm{Ω}}\}\) specifies the way that the possible outcomes are populated in this particular series of \(N\) observations. We call \(\{N_1,\ N_2,\dots ,N_i,\dots ,N_{\textrm{Ω}}\}\) a **population set**. If we make a second series of \(N\) observations, we obtain a second population set. We infer that the best forecast we can make for the number of occurrences of outcome \(u_i\) in any future series of \(N\) observations is \(N\times P\left(u_i\right)\). We call \(N\times P\left(u_i\right)\) the **expected number** of observations of outcome \(u_i\) in a sample of size \(N\).

In a particular series of \(N\) trials, the number of occurrences of outcome \(u_i\), and hence of \(g\left(u_i\right)\), is \(N_i\). For the population set \(\{N_1,\ N_2,\dots ,N_i,\dots ,N_{\textrm{Ω}}\}\), the average value of \(g\left(u\right)\) is

\[\overline{g\left(u\right)}=\frac{1}{N}\sum^{\textrm{Ω}}_{i=1}{g\left(u_i\right)\times N_i} \nonumber \]

Collecting a second sample of \(N\) observations produces a second estimate of \(\overline{g\left(u\right)}\). If \(N\) is small, successive estimates of \(\overline{g\left(u\right)}\) may differ significantly from one another. If we make a series of \(N\) observations multiple times, we obtain multiple population sets. In general, the population set from one series of \(N\) observations is different from the population set for a second series of \(N\) observations. If \(N\gg \mathit{\Omega}\), collecting samples of \(N\) observations a sufficiently large number of times must produce some population sets more than once, and among those that are observed more than once, one must occur more often than any other. We call it the **most probable** population set. Let the elements of the most probable population set be \(\{N_1,N_2,\dots ,N_i,\dots ,N_{\textrm{Ω}}\}\). We infer that the most probable population set is the best forecast we can make about the outcomes of any future sample of \(N\) observations from this distribution. Moreover, we infer that the best estimate we can make of \(N_i\) is that it equals the expected number of observations of outcome \(u_i\); that is,

\[N_i\approx N\times P\left(u_i\right) \nonumber \]
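A small simulation makes the most probable population set concrete. This sketch is our own illustration: the fair six-outcome spinner, the sample size \(N=12\), and the function name are all assumptions chosen for the example. It tallies the population sets produced by many samples of \(N\) observations and reports the one that occurs most often:

```python
import random
from collections import Counter

# Repeatedly draw samples of N observations from a discrete distribution
# and tally the resulting population sets {N_1, ..., N_Omega}.
def most_probable_population_set(n_obs, probs, n_samples, seed=1):
    rng = random.Random(seed)
    outcomes = range(len(probs))
    tally = Counter()
    for _ in range(n_samples):
        sample = rng.choices(outcomes, weights=probs, k=n_obs)
        counts = Counter(sample)
        pop_set = tuple(counts.get(i, 0) for i in outcomes)
        tally[pop_set] += 1
    return tally.most_common(1)[0][0]

probs = [1/6] * 6
print(most_probable_population_set(12, probs, 200_000))
# The most frequently observed population set should match the expected
# numbers N x P(u_i) = 12 x 1/6 = 2 for every outcome: (2, 2, 2, 2, 2, 2)
```

Note that this works well only because \(N\gg \mathit{\Omega}\) here (12 observations, 6 outcomes); the opposite regime is discussed next.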

Now, each \(N_i\) must be a non-negative integer, while \(N\times P\left(u_i\right)\) need only be real. In particular, we can have \(0<N\times P\left(u_i\right)<1\), but \(N_i\) must be \(0\) or \(1\) (or some higher integer). This is a situation of practical importance, because circumstances may limit the sample size to a number, \(N\), that is much less than the number of possible outcomes, \(\mathit{\Omega}\). (We encounter this situation in our discussion of statistical thermodynamics in Chapter 21. We find that the number of molecules in a system can be much smaller than the number of outcomes, the observable energy levels, available to any given molecule.)

If many more than \(N\) outcomes have about the same probability, repeated collection of samples of \(N\) observations can produce a series of population sets (each population set different from all of the others) in each of which every element is either zero or one. When this occurs, it may be that no single population set is significantly more probable than any of many others. Nevertheless, every outcome occurs with a well-defined probability. We infer that the set \(\left\{N\times P\left(u_1\right),N\times P\left(u_2\right),\dots ,N\times P\left(u_i\right),\dots ,N\times P\left(u_{\textrm{Ω}}\right)\right\}\) is always an adequate proxy for calculating the expected value for the most probable population set.

To illustrate this kind of distribution, suppose that there are \(3000\) possible outcomes, of which the first and last thousand have probabilities that are so low that they can be taken as zero, while the middle \(1000\) outcomes have approximately equal probabilities. Then \(P\left(u_i\right)\approx 0\) for \(1\le i\le 1000\) and for \(2001\le i\le 3000\), while \(P\left(u_i\right)\approx {10}^{-3}\) for \(1001\le i\le 2000\). We are illustrating the situation in which the number of outcomes we can observe, \(N\), is much less than the number of outcomes that have appreciable probability, which is \(1000\). So let us take the number of trials to be \(N=4\). If the value of \(g\left(u\right)\) is the same for each of the \(1000\) middle outcomes, say \(g\left(u_i\right)=100\) for \(1001\le i\le 2000\), then our calculation of the expected value of \(g\left(u\right)\) will be

\[\left\langle g\left(u\right)\right\rangle \ =\frac{1}{4}\sum^{3000}_{i=1}{g\left(u_i\right)\times N_i}=\frac{1}{4}\sum^{2000}_{i=1001}{100\times N_i}=\frac{400}{4}=100 \nonumber \]

regardless of which population set results from the four trials. That is, because all of the population sets that have a significant chance to be observed have \(N_i=1\) and \(g\left(u_i\right)=100\) for exactly four values of \(i\) in the range \(1001\le i\le 2000\), all of the population sets that have a significant chance to be observed give rise to the same expected value.
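A quick simulation of this four-trial example confirms the point; the seeded helper function is our own illustrative construction:

```python
import random

# Sketch of the 3000-outcome example above: only outcomes 1001..2000 have
# appreciable (and equal) probability, g(u_i) = 100 on each of them, and the
# sample size is N = 4.
def sample_average(seed):
    rng = random.Random(seed)
    # A draw from the distribution effectively always lands in 1001..2000.
    sample = [rng.randint(1001, 2000) for _ in range(4)]
    g = lambda i: 100   # g(u_i) = 100 for every appreciably probable outcome
    return sum(g(i) for i in sample) / len(sample)

# Different seeds give different population sets, but the same average:
print([sample_average(s) for s in range(5)])  # 100.0 for every sample
```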

Let us compute the arithmetic average, \(\overline{g\left(u\right)}\), using the most probable population set for a sample of \(N\) trials. In this case, the number of observations of outcome \(u_i\) is \(N_i=N\times P\left(u_i\right)\):

\[\overline{g\left(u\right)}=\frac{1}{N}\sum^{\textrm{Ω}}_{i=1}{g\left(u_i\right)\times N_i}=\frac{1}{N}\sum^{\textrm{Ω}}_{i=1}{g\left(u_i\right)\times N\times P\left(u_i\right)}=\sum^{\textrm{Ω}}_{i=1}{g\left(u_i\right)\times P\left(u_i\right)}=\left\langle g\left(u\right)\right\rangle \nonumber \]

For a discrete distribution, \(\left\langle g\left(u\right)\right\rangle\) is the value of \(\overline{g\left(u\right)}\) that we calculate from the most probable population set, \(\left\{N_1,N_2,\dots ,N_i,\dots ,N_{\textrm{Ω}}\right\}\), or its proxy \(\left\{N\times P\left(u_1\right),N\times P\left(u_2\right),\dots ,N\times P\left(u_i\right),\dots ,N\times P\left(u_{\textrm{Ω}}\right)\right\}\).

We can extend the definition of the expected value, \(\left\langle g\left(u\right)\right\rangle\), to cases in which the cumulative probability distribution function, \(f\left(u\right)\), and the outcome-value function, \(g\left(u\right)\), are continuous in the domain of the random variable, \(u_{min}<u<u_{max}\). To do so, we divide this domain into a finite number, \(\mathit{\Omega}\), of intervals, \(\Delta u_i\). We let \(u_i\) be the lower limit of \(u\) in the interval \(\Delta u_i\). Then the probability that a given trial yields a value of the random variable in the interval \(\Delta u_i\) is \(P\left({\Delta u}_i\right)=f\left(u_i+\Delta u_i\right)-f\left(u_i\right)\), and we can approximate the expected value of \(g\left(u\right)\) for the continuous distribution by the finite sum

\[\left\langle g\left(u\right)\right\rangle \ =\sum^{\textrm{Ω}}_{i=1}{g\left(u_i\right)\times P\left(\Delta u_i\right)}=\sum^{\textrm{Ω}}_{i=1}{g\left(u_i\right)}\times \left[f\left(u_i+\Delta u_i\right)-f\left(u_i\right)\right]=\sum^{\textrm{Ω}}_{i=1}{g\left(u_i\right)\times \left[\frac{f\left(u_i+\Delta u_i\right)-f\left(u_i\right)}{\Delta u_i}\right]}\times \Delta u_i \nonumber \]

In the limit as \(\mathit{\Omega}\) becomes arbitrarily large and all of the intervals \(\Delta u_i\) become arbitrarily small, the expected value of \(g\left(u\right)\) for a continuous distribution becomes

\[\left\langle g\left(u\right)\right\rangle \ =\int^{\infty }_{-\infty }{g\left(u\right)\left[\frac{df\left(u\right)}{du}\right]du} \nonumber \]
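The limiting process can be imitated numerically by keeping \(\mathit{\Omega}\) finite and evaluating the finite sum from the construction above. In this sketch, the distribution function \(f\left(u\right)=u\) (uniform on \(0<u<1\)) and the outcome function \(g\left(u\right)=u^2\) are assumptions chosen only because the exact answer, \(\int^1_0{u^2\,du}={1}/{3}\), is easy to check:

```python
# Approximate <g(u)> by the finite sum over Omega intervals, exactly as in
# the derivation above; the function name is illustrative.
def expected_value_continuous(g, f, u_min, u_max, n_intervals):
    du = (u_max - u_min) / n_intervals
    total = 0.0
    for i in range(n_intervals):
        u_i = u_min + i * du                       # lower limit of interval i
        total += g(u_i) * (f(u_i + du) - f(u_i))   # g(u_i) * P(Delta u_i)
    return total

approx = expected_value_continuous(lambda u: u**2, lambda u: u, 0.0, 1.0, 10_000)
print(approx)  # approaches 1/3 as the number of intervals grows
```

Increasing `n_intervals` plays the role of letting \(\mathit{\Omega}\) become arbitrarily large while every \(\Delta u_i\) becomes arbitrarily small.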

This integral is the value of \(\left\langle g\left(u\right)\right\rangle\), where \({df\left(u\right)}/{du}\) is the probability density function for the distribution. If *c* is a constant, we have

\[\left\langle cg\left(u\right)\right\rangle =c\left\langle g\left(u\right)\right\rangle \nonumber \]

If \(h\left(u\right)\) is a second function of the random variable, we have

\[\left\langle g\left(u\right)+h\left(u\right)\right\rangle =\left\langle g\left(u\right)\right\rangle +\left\langle h\left(u\right)\right\rangle \nonumber \]
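Both linearity properties are easy to verify numerically. The sketch below uses an arbitrary three-outcome distribution and arbitrary functions \(g\) and \(h\); all of these choices are illustrative, not from the text:

```python
# Numerical check of <c g(u)> = c <g(u)> and <g(u) + h(u)> = <g(u)> + <h(u)>
# for an arbitrary illustrative discrete distribution.
probs = [0.2, 0.5, 0.3]
u_vals = [1.0, 2.0, 3.0]

def ev(func):
    """Probability-weighted sum of func over the outcomes."""
    return sum(func(u) * p for u, p in zip(u_vals, probs))

g = lambda u: u ** 2
h = lambda u: 1.0 / u
c = 4.0

assert abs(ev(lambda u: c * g(u)) - c * ev(g)) < 1e-12
assert abs(ev(lambda u: g(u) + h(u)) - (ev(g) + ev(h))) < 1e-12
print("linearity checks pass")
```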