We also need to introduce the idea that a function that successfully models the results of past experiments can be used to predict some of the characteristics of future results.

We reason as follows: We have results from drawing many samples of a random variable from some distribution. We suppose that a mathematical representation has been found that adequately summarizes the results of these experiments. If the underlying distribution (the physical system in scientific applications) remains the same, we expect that a long series of future results would give rise to essentially the same mathematical representation. If 25% of many previous results have had a particular characteristic, we expect that 25% of a large number of future trials will have the same characteristic. We also say that there is one chance in four that the next individual result will have this characteristic; when we say this, we mean that 25% of a large number of future trials will have this characteristic, and the next trial has as good a chance as any other to be among those that do. *The probability that an outcome will occur in the future is equal to the frequency with which that outcome has occurred in the past.*
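The following short simulation is one way to illustrate this reasoning. It is only a sketch: the four-letter system, the seed, and the names `draw` and `relative_frequency` are assumptions made for the illustration. We draw a large set of "past" results and a large set of "future" results from the same unchanging distribution and compare the frequency of one characteristic in each.

```python
import random

rng = random.Random(0)  # fixed seed so that the run is repeatable


def draw(n):
    """Draw n results from a hypothetical system with four equally likely results."""
    return [rng.choice("ABCD") for _ in range(n)]


def relative_frequency(results, has_characteristic):
    """Fraction of the results for which has_characteristic(result) is true."""
    return sum(1 for r in results if has_characteristic(r)) / len(results)


past = draw(100_000)    # experiments already performed
future = draw(100_000)  # later experiments on the same, unchanged system

past_freq = relative_frequency(past, lambda r: r == "A")
future_freq = relative_frequency(future, lambda r: r == "A")

print(f"past frequency of A:   {past_freq:.3f}")
print(f"future frequency of A: {future_freq:.3f}")
```

Both frequencies should lie close to 0.25, and close to each other, because the distribution that generates the results does not change between the two sets of trials.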

Given a distribution, the possible outcomes must be mutually exclusive; in any given trial, the random variable can have only one of its possible values. Consequently, a discrete distribution is completely described when the probability of each of its outcomes is specified. Many distributions consist of a finite set of *N* mutually exclusive possible outcomes. If each of these outcomes is equally likely, the probability that we will observe any particular outcome in the next trial is \(1/N\).

We often find it convenient to group the set of possible outcomes into subsets in such a way that each outcome is in one and only one of the subsets. We say that such assignments of outcomes to subsets are *exhaustive*, because every possible outcome is assigned to some subset; we say that such assignments are *mutually exclusive*, because no outcome belongs to more than one subset. We call each such subset an *event*. When we partition the possible outcomes into exhaustive and mutually exclusive events, we can say the same things about the probabilities of events that we can say about the probabilities of outcomes. In our discussions, the term “events” will always refer to an exhaustive and mutually exclusive partitioning of the possible outcomes. Distinguishing between outcomes and events simply gives us a vocabulary for creating alternative groupings of the same set of real-world observations.

Suppose that we define a particular event to be a subset of outcomes that we denote as *U*. If in a large number of trials, the fraction of outcomes that belong to this subset is *F*, we say that the probability is *F* that the outcome of the next trial will belong to this event. To express this in more mathematical notation, we write \(P\left(U\right)=F\). When we do so, we mean that the fraction of a large number of future trials that belong to this subset will be *F*, and the next trial has as good a chance as any other to be among those that do. In a sample comprising *M* observations, the best forecast we can make of the number of occurrences of *U* is \(M\times P(U)\), and we call this the *expected number of occurrences* of *U* in a sample of size *M*.
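As a small worked illustration of these definitions, the sketch below estimates \(P(U)\) from past trials and then computes the expected number of occurrences \(M\times P(U)\). The counts (250 occurrences in 1000 past trials, and a future sample of \(M=60\)) and the function names are hypothetical values chosen only for the example.

```python
from fractions import Fraction


def estimate_probability(occurrences, trials):
    """F: the fraction of past trials whose outcome belonged to the event U."""
    return Fraction(occurrences, trials)


def expected_occurrences(probability, sample_size):
    """M * P(U): the best forecast of the number of occurrences in M observations."""
    return sample_size * probability


# Hypothetical record: U occurred in 250 of 1000 past trials.
p_u = estimate_probability(250, 1000)

# Forecast for a future sample of M = 60 observations.
expected = expected_occurrences(p_u, 60)

print(p_u, expected)  # prints: 1/4 15
```

Using exact fractions rather than floating-point numbers keeps the arithmetic identical to what we would do by hand.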

The idea of grouping real world observations into either outcomes or events is easy to remember if we keep in mind the example of tossing a die. The die has six faces, which are labeled with 1, 2, 3, 4, 5, or 6 dots. The dots distinguish one face from another. On any given toss, one face of the die must land on top. Therefore, there are six possible outcomes. Since each face has as good a chance as any other of landing on top, the six possible outcomes are equally probable. The probability of any given outcome is \({1}/{6}\). If we ask about the probability that the next toss will result in one of the even-numbered faces landing on top, we are asking about the probability of an event—the event that the next toss will have the characteristic that an even-numbered face lands on top. Let us call this event \(X\). That is, event \(X\) occurs if the outcome is a 2, a 4, or a 6. These are three of the six equally likely outcomes. Evidently, the probability of this event is \({3}/{6}={1}/{2}\).
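The die example can be written out as a short calculation. This is a minimal sketch that assumes a fair die, so that each outcome has probability \(1/6\); the names `outcomes`, `event_x`, and `probability` are chosen for the illustration.

```python
from fractions import Fraction

# The six possible outcomes of one toss of the die.
outcomes = [1, 2, 3, 4, 5, 6]

# Assumption: the die is fair, so each outcome has probability 1/N.
p_outcome = Fraction(1, len(outcomes))


def probability(event):
    """Probability of an event: the sum of the probabilities of its outcomes."""
    return sum((p_outcome for o in outcomes if o in event), Fraction(0))


# Event X: an even-numbered face lands on top.
event_x = {2, 4, 6}

print(p_outcome)             # prints: 1/6
print(probability(event_x))  # prints: 1/2
```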

Having defined event \(X\) as the event that the outcome is an even number, we still have several alternative ways to assign the odd-number outcomes to events. One assignment would be to say that all of the odd-number outcomes belong to a second event, namely the event that the outcome is odd. The events “even outcome” and “odd outcome” are exhaustive and mutually exclusive. We could create another set of events by assigning the outcomes 1 and 3 to event \(Y\), and the outcome 5 to event \(Z\). Events \(X\), \(Y\), and \(Z\) are also exhaustive and mutually exclusive.
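We can check these two partitionings mechanically. The sketch below again assumes a fair die; the helper functions and the deliberately faulty assignment `overlapping` (with an invented event `W`) are hypothetical, introduced only to show what a failed check looks like.

```python
from fractions import Fraction

OUTCOMES = {1, 2, 3, 4, 5, 6}


def is_exhaustive(events):
    """True if every possible outcome is assigned to some event."""
    return set().union(*events.values()) == OUTCOMES


def is_mutually_exclusive(events):
    """True if no outcome is assigned to more than one event."""
    total_assigned = sum(len(members) for members in events.values())
    return total_assigned == len(set().union(*events.values()))


def event_probabilities(events):
    """Probability of each event, assuming six equally likely outcomes."""
    return {name: Fraction(len(members), len(OUTCOMES))
            for name, members in events.items()}


even_odd = {"even": {2, 4, 6}, "odd": {1, 3, 5}}
xyz = {"X": {2, 4, 6}, "Y": {1, 3}, "Z": {5}}
overlapping = {"X": {2, 4, 6}, "W": {1, 3, 5, 6}}  # faulty: 6 is assigned twice

for label, events in [("even/odd", even_odd), ("X/Y/Z", xyz), ("X/W", overlapping)]:
    total = sum(event_probabilities(events).values())
    print(label, is_exhaustive(events), is_mutually_exclusive(events), total)
```

For the two valid partitionings, both checks succeed and the event probabilities sum to 1. For the faulty assignment, the mutual-exclusivity check fails and the probabilities sum to more than 1, which is why the laws of probability cannot be applied to it directly.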

We have a great deal of latitude in the way we assign the possible outcomes to events. If it suits our purposes, we can create many different exhaustive and mutually exclusive partitionings of the outcomes of a given distribution. We require that each partitioning of outcomes into events be exhaustive and mutually exclusive, because we want to apply the laws of probability to events.