In the story of Gauss's problem of adding up the numbers from 1 to 100, one interpretation of the result, 5,050, is that the average of all the numbers from 1 to 100 is 50.5. This is the ordinary definition of an average: add up all the things you have, and divide by the number of things. (The result in this example makes sense, because half the numbers are from 1 to 50, and half are from 51 to 100, so the average is half-way between 50 and 51.)
Similarly, a definite integral can also be thought of as a kind of average. In general, if y is a function of x, then the average, or mean, value of y on the interval from x=a to b can be defined as
\[ \bar{y} = \frac{1}{b-a}\int_a^b y\,dx . \]
In the continuous case, dividing by b-a accomplishes the same thing as dividing by the number of things in the discrete case.
◊ Show that the definition of the average makes sense in the case where the function is a constant.
◊ If y is a constant, then we can take it outside of the integral, so
\[ \bar{y} = \frac{1}{b-a}\,y\int_a^b dx = \frac{y\,(b-a)}{b-a} = y . \]
Example 7◊ Find the average value of the function $y=x^2$ for values of x ranging from 0 to 1.
◊
\[ \bar{y} = \frac{1}{1-0}\int_0^1 x^2\,dx = \left.\frac{x^3}{3}\right|_0^1 = \frac{1}{3} \]
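The definition of the average value can also be checked numerically. The sketch below is only an illustration: the helper name `average_value` and the choice of a midpoint-rule sum are assumptions made here, not something taken from the text. It approximates the integral and divides by b-a, both for $y=x^2$ on the interval from 0 to 1 and for a constant function.

```python
def average_value(y, a, b, n=100_000):
    """Approximate (1/(b-a)) * integral of y from a to b using the midpoint rule.

    `average_value` is a hypothetical helper written for this illustration.
    """
    dx = (b - a) / n
    total = sum(y(a + (i + 0.5) * dx) for i in range(n))
    return total * dx / (b - a)


# Average of y = x^2 on [0, 1]; the exact answer is 1/3.
avg_square = average_value(lambda x: x**2, 0.0, 1.0)

# Average of a constant function; it should simply return the constant.
avg_const = average_value(lambda x: 7.0, 2.0, 5.0)

print(avg_square)
print(avg_const)
```

Dividing by b-a in the last line of the function plays the same role as dividing by the number of items in a discrete average.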
The mean value theorem: If the continuous function y(x) has the average value $\bar{y}$ on the interval from x=a to b, then y attains its average value at least once in that interval, i.e., there exists ξ with a<ξ<b such that $y(\xi)=\bar{y}$.
The mean value theorem is proved on page 161. The special case in which $\bar{y}=0$ is known as Rolle's theorem.
◊ Verify the mean value theorem for $y=x^2$ on the interval from 0 to 1.
◊ The mean value is 1/3, as shown in example 56. This value is achieved at $x=\sqrt{1/3}=1/\sqrt{3}$, which lies between 0 and 1.
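As a rough numerical illustration of the theorem, one can search for the point ξ at which $y=x^2$ equals its average value 1/3. The bisection routine below is a sketch written for this purpose; the name `bisect_root` is hypothetical, and the method assumes only that the function is continuous and changes sign across the interval.

```python
def bisect_root(f, lo, hi, tol=1e-12):
    """Locate a root of f between lo and hi by bisection.

    Assumes f is continuous and that f(lo) and f(hi) have opposite signs.
    """
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)


y_bar = 1.0 / 3.0                       # average value of x^2 on [0, 1]
xi = bisect_root(lambda x: x**2 - y_bar, 0.0, 1.0)

print(xi)            # should agree with 1/sqrt(3)
print(3 ** -0.5)
```

The search succeeds because $x^2 - 1/3$ is negative at x=0 and positive at x=1, so a continuous function must cross zero somewhere in between.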
In physics, work is a measure of the amount of energy transferred by a force; for example, if a horse sets a wagon in motion, the horse's force on the wagon is putting some energy of motion into the wagon. When a force F acts on an object that moves in the direction of the force by an infinitesimal distance dx, the infinitesimal work done is $dW=F\,dx$. Integrating both sides, we have $W=\int_a^b F\,dx$, where the force may depend on x, and a and b represent the initial and final positions of the object.
◊ A spring compressed by an amount x relative to its relaxed length provides a force F=kx. Find the amount of work that must be done in order to compress the spring from x=0 to x=a. (This is the amount of energy stored in the spring, and that energy will later be released into the toy bullet.)
◊
\[ W = \int_0^a kx\,dx = \frac{1}{2}ka^2 \]
The reason W grows like $a^2$, not just like a, is that the force itself grows as the spring is compressed, so each additional bit of compression requires more effort than the last.
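The quadratic growth of the work can be seen numerically. In the sketch below, the spring constant `k = 200.0` (in newtons per meter) and the two compression distances are made-up values chosen purely for illustration, and `work` is a hypothetical helper that sums F dx with the midpoint rule.

```python
def work(force, a, b, n=100_000):
    """Approximate the integral of force(x) dx from a to b (midpoint rule)."""
    dx = (b - a) / n
    return sum(force(a + (i + 0.5) * dx) for i in range(n)) * dx


k = 200.0  # hypothetical spring constant, N/m (not from the text)


def spring_work(compression):
    """Work needed to compress the spring from 0 to `compression` meters."""
    return work(lambda x: k * x, 0.0, compression)


w1 = spring_work(0.05)   # compare with (1/2) k a^2 = 0.25 J
w2 = spring_work(0.10)   # compare with (1/2) k a^2 = 1.0 J

print(w1, w2, w2 / w1)   # doubling the compression should quadruple the work
```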
Mathematically, the probability that something will happen can be specified with a number ranging from 0 to 1, with 0 representing impossibility and 1 representing certainty. If you flip a coin, heads and tails both have probabilities of 1/2. The probabilities of all the possible outcomes have to add up to 1. This is called normalization.
So far we've discussed random processes having only two possible outcomes: yes or no, win or lose, on or off. More generally, a random process could have a result that is a number. Some processes yield integers, as when you roll a die and get a result from one to six, but some are not restricted to whole numbers, e.g., the height of a human being, or the amount of time that a uranium-238 atom will exist before undergoing radioactive decay. The key to handling these continuous random variables is the concept of the area under a curve, i.e., an integral.
Consider a throw of a die. If the die is “honest,” then we expect all six values to be equally likely. Since all six probabilities must add up to 1, the probability of any particular value coming up must be 1/6. We can summarize this in a graph, f. Areas under the curve can be interpreted as total probabilities. For instance, the area under the curve from 1 to 3 is 1/6+1/6+1/6=1/2, so the probability of getting a result from 1 to 3 is 1/2. The function shown on the graph is called the probability distribution.
Figure g shows the probabilities of various results obtained by rolling two dice and adding them together, as in the game of craps. The probabilities are not all the same. There is a small probability of getting a two, for example, because there is only one way to do it, by rolling a one and then another one. The probability of rolling a seven is high because there are six different ways to do it: 1+6, 2+5, etc.
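The two discrete distributions just described can be tabulated by brute-force enumeration. The following sketch (variable names such as `one_die` and `two_dice` are chosen here for illustration) uses exact fractions so that normalization can be checked without rounding error.

```python
from collections import Counter
from fractions import Fraction

# One honest die: six equally likely outcomes.
one_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Two dice added together: count the ways of making each total out of 36.
ways = Counter(a + b for a in range(1, 7) for b in range(1, 7))
two_dice = {total: Fraction(count, 36) for total, count in sorted(ways.items())}

# Probability of rolling 1, 2, or 3 with a single die.
p_one_to_three = sum(one_die[face] for face in (1, 2, 3))

print(p_one_to_three)
for total, p in two_dice.items():
    print(total, p)
print(sum(two_dice.values()))  # normalization: the probabilities add up to 1
```

The printed table rises from the total 2, which can be made in only one way, to the total 7, which can be made in six ways, and then falls symmetrically.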
If the number of possible outcomes is large but finite, for example the number of hairs on a dog, the graph would start to look like a smooth curve rather than a ziggurat.
What about probability distributions for random numbers that are not integers? We can no longer make a graph with probability on the y axis, because the probability of getting a given exact number is typically zero. For instance, there is zero probability that a person will be exactly 200 cm tall, since there are infinitely many possible results that are close to 200 but not exactly 200, for example 199.99999999687687658766. It doesn't usually make sense, therefore, to talk about the probability of a single numerical result, but it does make sense to talk about the probability of a certain range of results. For instance, the probability that a randomly chosen person will be more than 170 cm and less than 200 cm in height is a perfectly reasonable thing to discuss. We can still summarize the probability information on a graph, and we can still interpret areas under the curve as probabilities.
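But the y axis can no longer be a unitless probability scale. In the example of human height, we want the x axis to have units of centimeters, and we want areas under the curve to be unitless probabilities. The area of a single square on the graph paper is then
\[ (\text{unitless area of a square}) = (\text{width of square, with distance units}) \times (\text{height of square}) . \]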
If the units are to cancel out, then the height of the square must evidently be a quantity with units of inverse centimeters. In other words, the y axis of the graph is to be interpreted as probability per unit height, not probability.
Another way of looking at it is that the y axis on the graph gives a derivative, dP/dx, where dP is the infinitesimally small probability that x will lie in the infinitesimally small range covered by dx.
Example 10◊ A computer language will typically have a built-in subroutine that produces a fairly random number that is equally likely to take on any value in the range from 0 to 1. If you take the absolute value of the difference between two such numbers, the probability distribution is of the form dP/dx=k(1-x). Find the value of the constant k that is required by normalization.
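Normalization requires the total area under dP/dx from 0 to 1 to equal 1, and since the area under 1-x on that interval is 1/2, the constant must be k=2. The sketch below checks this in two ways, and both are illustrations rather than part of the original example: a midpoint-rule integral fixes k, and a seeded simulation of the absolute difference of two uniform random numbers compares the fraction of results below 0.5 with the area under k(1-x) from 0 to 0.5. The helper `integrate`, the seed, and the sample size are arbitrary choices.

```python
import random


def integrate(f, a, b, n=100_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


# Normalization: k times the integral of (1 - x) from 0 to 1 must equal 1.
unnormalized_area = integrate(lambda x: 1.0 - x, 0.0, 1.0)
k = 1.0 / unnormalized_area

# Predicted probability that the result is below 0.5.
predicted_below_half = integrate(lambda x: k * (1.0 - x), 0.0, 0.5)

# Simulated check with a fixed seed (an arbitrary choice, for repeatability).
rng = random.Random(0)
n_samples = 200_000
hits = sum(abs(rng.random() - rng.random()) < 0.5 for _ in range(n_samples))
simulated_below_half = hits / n_samples

print(k)
print(predicted_below_half, simulated_below_half)
```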
Compare the number of people with heights in the range of 130-135 cm to the number in the range 135-140. (answer in the back of the PDF version of the book)
When one random variable is related to another in some mathematical way, the chain rule can be used to relate their probability distributions.
Example 11◊ A laser is placed one meter away from a wall, and spun on the ground to give it a random direction, but if the angle u shown in figure j doesn't come out in the range from 0 to π/2, the laser is spun again until an angle in the desired range is obtained. Find the probability distribution of the distance x shown in the figure. The derivative $d(\tan^{-1}z)/dz=1/(1+z^2)$ will be required (see example 66, page 88).
◊ Since any angle between 0 and π/2 is equally likely, the probability distribution dP/du must be a constant, and normalization tells us that the constant must be dP/du=2/π.
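The laser is one meter from the wall, so the distance x, measured in meters, is given by x=tan u. For the probability distribution of x, we have
\[ \frac{dP}{dx} = \frac{dP}{du}\cdot\frac{du}{dx} = \frac{2}{\pi}\cdot\frac{d(\tan^{-1}x)}{dx} = \frac{2}{\pi(1+x^2)} . \]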
Note that the range of possible values of x theoretically extends from 0 to infinity. Problem 7 on page 104 deals with this.
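The chain-rule result, $dP/dx = 2/[\pi(1+x^2)]$, can be compared against a simulation of the spinning laser. The sketch below is only a plausibility check under stated assumptions: the seed, the sample size, and the helper names are arbitrary, and the predicted probability for a range of x comes from integrating the distribution, which gives $(2/\pi)(\tan^{-1}x_2-\tan^{-1}x_1)$.

```python
import math
import random

rng = random.Random(1)          # fixed seed, chosen arbitrarily
n_samples = 200_000

# Spin the laser: u is uniform between 0 and pi/2, and x = tan(u) in meters.
xs = [math.tan(rng.uniform(0.0, math.pi / 2)) for _ in range(n_samples)]


def predicted_probability(x1, x2):
    """Integral of 2/(pi*(1+x^2)) from x1 to x2."""
    return (2.0 / math.pi) * (math.atan(x2) - math.atan(x1))


def simulated_probability(x1, x2):
    """Fraction of simulated spins that land with x1 <= x < x2."""
    return sum(x1 <= x < x2 for x in xs) / n_samples


for x1, x2 in [(0.0, 1.0), (1.0, 3.0), (3.0, 10.0)]:
    print(x1, x2, predicted_probability(x1, x2), simulated_probability(x1, x2))
```

Half of the spins land within one meter of the point directly in front of the laser, while the rest are spread over the unbounded range beyond it.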
If the next Martian you meet asks you, “How tall is an adult human?,” you will probably reply with a statement about the average human height, such as “Oh, about 5 feet 6 inches.” If you wanted to explain a little more, you could say, “But that's only an average. Most people are somewhere between 5 feet and 6 feet tall.” Without bothering to draw the relevant bell curve for your new extraterrestrial acquaintance, you've summarized the relevant information by giving an average and a typical range of variation. The average of a probability distribution can be defined geometrically as the horizontal position at which it could be balanced if it were constructed out of cardboard, i. This is a different way of working with averages than the one we used earlier. Before, when we had a graph of y versus x, we implicitly assumed that all values of x were equally likely, and we found an average value of y. In this new method using probability distributions, the variable we're averaging is on the x axis, and the y axis tells us the relative probabilities of the various x values.
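For a discrete-valued variable with n possible values, the average would be
\[ \bar{x} = \sum_{i=1}^{n} x_i P_i , \]
where $P_i$ is the probability of the value $x_i$, and in the case of a continuous variable, this becomes an integral,
\[ \bar{x} = \int_a^b x\,\frac{dP}{dx}\,dx . \]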
◊ For the situation described in example 59, find the average value of x.
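A numerical check of this average is sketched below. It assumes that the situation referred to is the absolute difference of two random numbers between 0 and 1, with the normalized distribution $dP/dx = 2(1-x)$; under that assumption the integral of $x\,dP/dx$ from 0 to 1 is $2(1/2-1/3)=1/3$. The helper `integrate` is a hypothetical midpoint-rule routine.

```python
def integrate(f, a, b, n=100_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def density(x):
    """Assumed distribution dP/dx = 2(1 - x) on the interval from 0 to 1."""
    return 2.0 * (1.0 - x)


total_probability = integrate(density, 0.0, 1.0)           # normalization
mean_x = integrate(lambda x: x * density(x), 0.0, 1.0)     # average of x

print(total_probability)
print(mean_x)   # compare with 1/3
```

The average lies well below the midpoint 0.5 because the distribution is weighted toward small differences.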
Sometimes we don't just want to know the average value of a certain variable; we also want to have some idea of the amount of variation above and below the average. The most common way of measuring this is the standard deviation, defined by
\[ \sigma = \sqrt{\int_a^b (x-\bar{x})^2\,\frac{dP}{dx}\,dx} . \]
The idea here is that if there was no variation at all above or below the average, then the quantity $(x-\bar{x})$ would be zero whenever dP/dx was nonzero, and the standard deviation would be zero. The reason for taking the square root of the whole thing is so that the result will have the same units as x.
◊ For the situation described in example 59, find the standard deviation of x.
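◊ The square of the standard deviation is
\[ \sigma^2 = \int_0^1 (x-\bar{x})^2\,\frac{dP}{dx}\,dx = \int_0^1 \left(x-\frac{1}{3}\right)^2 2(1-x)\,dx = \frac{1}{18} , \]
using $dP/dx=2(1-x)$ and $\bar{x}=1/3$, so the standard deviation is $1/\sqrt{18}\approx 0.236$.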
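A simulation gives an independent check on the size of the standard deviation. The sketch below again assumes that the situation in question is the absolute difference of two random numbers between 0 and 1, for which integrating with $dP/dx=2(1-x)$ gives $\sigma^2 = 1/18$. The seed and the sample size are arbitrary choices made for this illustration.

```python
import math
import random

rng = random.Random(42)      # fixed seed, chosen arbitrarily
n_samples = 200_000

# Each sample is the absolute difference of two uniform random numbers.
samples = [abs(rng.random() - rng.random()) for _ in range(n_samples)]

# Sample average and standard deviation, computed directly from the definitions.
sample_mean = sum(samples) / n_samples
sample_variance = sum((x - sample_mean) ** 2 for x in samples) / n_samples
sample_sigma = math.sqrt(sample_variance)

exact_sigma = math.sqrt(1.0 / 18.0)

print(sample_mean)                  # compare with 1/3
print(sample_sigma, exact_sigma)    # the two should agree to a couple of digits
```

Because the simulated values are random, the agreement is only approximate, and it improves slowly as the number of samples grows.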