Chapter 3. Conditional Probability
The probability P(A) of an event A is a measure of the likelihood that the event will occur on any trial. Sometimes partial information determines that an event C has occurred. Given this information, it may be necessary to reassign the likelihood for each event A. This leads to the notion of conditional probability. For a fixed conditioning event C, this assignment to all events constitutes a new probability measure which has all the properties of the original probability measure. In addition, because of the way it is derived from the original, the conditional probability measure has a number of special properties which are important in applications.
The original or prior probability measure utilizes all available information to make probability assignments $P(A)$, $P(B)$, etc., subject to the defining conditions (P1), (P2), and (P3). The probability P(A) indicates the likelihood that event A will occur on any trial.
Frequently, new information is received which leads to a reassessment of the likelihood of event A. For example:
An applicant for a job as a manager of a service department is being interviewed. His résumé shows adequate experience and other qualifications. He conducts himself with ease and is quite articulate in his interview. He is considered a prospect highly likely to succeed. The interview is followed by an extensive background check. His credit rating, because of bad debts, is found to be quite low. With this information, the likelihood that he is a satisfactory candidate changes radically.
A young woman is seeking to purchase a used car. She finds one that appears to be an excellent buy. It looks “clean,” has reasonable mileage, and is a dependable model of a well-known make. Before buying, she has a mechanic friend look at it. He finds evidence that the car has been wrecked with possible frame damage that has been repaired. The likelihood the car will be satisfactory is thus reduced considerably.
A physician is conducting a routine physical examination on a patient in her seventies. She is somewhat overweight. He suspects that she may be prone to heart problems. Then he discovers that she exercises regularly, eats a low fat, high fiber, variegated diet, and comes from a family in which survival well into their nineties is common. On the basis of this new information, he reassesses the likelihood of heart problems.
New, but partial, information determines a conditioning event C, which may call for reassessing the likelihood of event A. For one thing, this means that A occurs iff the event AC occurs. Effectively, this makes C a new basic space. The new unit of probability mass is P(C). How should the new probability assignments be made? One possibility is to make the new assignment to A proportional to the probability P(AC). These considerations and experience with the classical case suggest the following procedure for reassignment. Although such a reassignment is not logically necessary, subsequent developments give substantial evidence that this is the appropriate procedure.
Definition. If C is an event having positive probability, the conditional probability of A, given C is
$$P(A|C) = \frac{P(AC)}{P(C)}$$
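For a fixed conditioning event C, we have a new likelihood assignment to the event A. Now
$$P(A|C) \ge 0, \qquad P(\Omega|C) = \frac{P(\Omega C)}{P(C)} = 1, \qquad P\Big(\bigvee_j A_j \Big| C\Big) = \frac{P\big(\bigvee_j A_jC\big)}{P(C)} = \sum_j \frac{P(A_jC)}{P(C)} = \sum_j P(A_j|C)$$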
Thus, the new function satisfies the three defining properties (P1), (P2), and (P3) for probability, so that for fixed C, we have a new probability measure, with all the properties of an ordinary probability measure.
Remark. When we write P(A|C) we are evaluating the likelihood of event A when it is known that event C has occurred. This is not the probability of a conditional event A|C. Conditional events have no meaning in the model we are developing.
A survey of student opinion on a proposed national health care program included 250 students, of whom 150 were undergraduates and 100 were graduate students. Their responses were categorized Y (affirmative), N (negative), and D (uncertain or no opinion). Results are tabulated below.
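Suppose the sample is representative, so the results can be taken as typical of the student body. A student is picked at random. Let Y be the event he or she is favorable to the plan, N be the event he or she is unfavorable, and D be the event of no opinion (or uncertain). Let U be the event the student is an undergraduate and G be the event he or she is a graduate student. The data may reasonably be interpreted as
$$P(G) = \frac{100}{250}, \qquad P(U) = \frac{150}{250}, \qquad P(Y|U) = \frac{P(YU)}{P(U)} = \frac{\text{number of undergraduates responding Y}}{150}$$
with P(Y), P(YU), etc. taken as the corresponding fractions of the 250 students.
Similarly, we can calculate the other conditional probabilities, such as P(N|U), P(D|U), and P(Y|G).
We may also calculate directly conditional probabilities in the other direction, such as P(U|Y) and P(G|N).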
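A minimal Python sketch of the definition P(A|C) = P(AC)/P(C) applied to a two-way table of counts. The survey table itself is not reproduced in this text, so the cell counts below are hypothetical, chosen only to match the stated totals of 150 undergraduates and 100 graduate students; the names `counts`, `prob`, and `cond` are illustrative.

```python
from fractions import Fraction

# Hypothetical cell counts (assumption): only the row totals 150 and 100
# come from the text. Rows are U/G, columns are responses Y/N/D.
counts = {
    "U": {"Y": 60, "N": 40, "D": 50},
    "G": {"Y": 70, "N": 20, "D": 10},
}
total = sum(sum(row.values()) for row in counts.values())  # 250 students


def prob(group=None, response=None):
    """Relative frequency of {group and response}; None means 'any value'."""
    n = sum(
        c
        for g, row in counts.items()
        for r, c in row.items()
        if (group is None or g == group) and (response is None or r == response)
    )
    return Fraction(n, total)


def cond(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(**event, **given) / prob(**given)


p_Y_given_U = cond({"response": "Y"}, {"group": "U"})  # P(Y|U)
p_U_given_Y = cond({"group": "U"}, {"response": "Y"})  # P(U|Y)
print(p_Y_given_U, p_U_given_Y)
```

Swapping the arguments of `cond` reverses the direction of conditioning, which is exactly the distinction between P(Y|U) and P(U|Y).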
Conditional probability often provides a natural way to deal with compound trials carried out in several steps.
An aircraft has two jet engines. It will fly with only one engine operating. Let F1 be the event one engine fails on a long-distance flight, and F2 the event the second fails. Experience supplies a value for P(F1). Once the first engine fails, added load is placed on the second, so the relevant quantity for the second engine is the conditional probability P(F2|F1). Now the second engine can fail only if the other has already failed. Thus F2⊂F1 so that
$$P(F_2) = P(F_1F_2) = P(F_1)P(F_2|F_1)$$
Thus, even if the reliability of any one engine is less than satisfactory, the overall reliability may be quite high.
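A numerical illustration of the product P(F2) = P(F1)P(F2|F1). The two input probabilities are hypothetical values chosen only to show the order of magnitude involved; they are not data stated in this section.

```python
# Hypothetical inputs (assumptions, not data from the text).
p_f1 = 0.0003          # P(F1): one engine fails during the flight
p_f2_given_f1 = 0.001  # P(F2|F1): the remaining engine fails under added load

# F2 is contained in F1, so P(F2) = P(F1 F2) = P(F1) P(F2|F1).
p_f2 = p_f1 * p_f2_given_f1
print(f"P(F1) = {p_f1:.1e}, P(F2) = {p_f2:.1e}")
```

With these values, a per-flight engine failure probability of 3e-4 gives a probability of about 3e-7 that both engines fail.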
The following example is taken from the UMAP Module 576, by Paul Mullenix, reprinted in UMAP Journal, vol 2, no. 4. More extensive treatment of the problem is given there.
In a survey, if answering “yes” to a question may tend to incriminate or otherwise embarrass the subject, the response given may be incorrect or misleading. Nonetheless, it may be desirable to obtain correct responses for purposes of social analysis. The following device for dealing with this problem is attributed to B. G. Greenberg. By a chance process, each subject is instructed to do one of three things:
Respond with an honest answer to the question.
Respond “yes” to the question, regardless of the truth in the matter.
Respond “no” regardless of the true answer.
Let A be the event the subject is told to reply honestly, B be the event the subject is instructed to reply “yes,” and C be the event the answer is to be “no.” The probabilities P(A), P(B), and P(C) are determined by a chance mechanism (i.e., a fraction P(A) selected randomly are told to answer honestly, etc.). Let E be the event the reply is “yes.” We wish to calculate P(E|A), the probability the answer is “yes” given the response is honest.
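Since E=EA⋁B, we have
$$P(E) = P(EA) + P(B) = P(E|A)P(A) + P(B)$$
which may be solved algebraically to give
$$P(E|A) = \frac{P(E) - P(B)}{P(A)}$$
Suppose there are 250 subjects. The chance mechanism is such that P(A)=0.7, P(B)=0.14 and P(C)=0.16. There are 62 responses “yes,” which we take to mean P(E)=62/250. According to the pattern above
$$P(E|A) = \frac{62/250 - 14/100}{70/100} = \frac{27}{175} \approx 0.154$$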
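The same calculation as a small Python function using exact fractions; the function name `honest_yes_rate` is an illustrative choice, not part of the original module.

```python
from fractions import Fraction


def honest_yes_rate(yes_count, n, p_honest, p_forced_yes):
    """Solve P(E) = P(E|A)P(A) + P(B) for P(E|A).

    P(E) is estimated by the observed fraction of "yes" replies.
    P(C), the probability of a forced "no", is not needed.
    """
    p_yes = Fraction(yes_count, n)
    return (p_yes - p_forced_yes) / p_honest


est = honest_yes_rate(62, 250, Fraction(7, 10), Fraction(14, 100))
print(est, round(float(est), 4))
```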
The formulation of conditional probability assumes the conditioning event C is well defined. Sometimes there are subtle difficulties. It may not be entirely clear from the problem description what the conditioning event is. This is usually due to some ambiguity or misunderstanding of the information provided.
Five equally qualified candidates for a job, Jim, Paul, Richard, Barry, and Evan, are identified on the basis of interviews and told that they are finalists. Three of these are to be selected at random, with results to be posted the next day. One of them, Jim, has a friend in the personnel office. Jim asks the friend to tell him the name of one of those selected (other than himself). The friend tells Jim that Richard has been selected. Jim analyzes the problem as follows.
Let Ai,1≤i≤5 be the event the ith of these is hired (A1 is the event Jim is hired, A3 is the event Richard is hired, etc.). Now
$$P(A_i) = \frac{\binom{4}{2}}{\binom{5}{3}} = \frac{6}{10}$$
(for each i) is the probability that finalist i is in one of the combinations of three from five. Thus, Jim's probability of being hired, before receiving the information about Richard, is
$$P(A_1) = \frac{6}{10} = P(A_i), \quad 1 \le i \le 5$$
The information that Richard is one of those hired is information that the event A3 has occurred. Also, for any pair i≠j the number of combinations of three from five including these two is just the number of ways of picking one from the remaining three. Hence,
$$P(A_iA_j) = \frac{\binom{3}{1}}{\binom{5}{3}} = \frac{3}{10}, \quad i \ne j$$
The conditional probability
$$P(A_1|A_3) = \frac{P(A_1A_3)}{P(A_3)} = \frac{3/10}{6/10} = \frac{1}{2}$$
This is consistent with the fact that if Jim knows that Richard is hired, then there are two to be selected from the four remaining finalists, so that
$$P(A_1|A_3) = \frac{\binom{3}{1}}{\binom{4}{2}} = \frac{3}{6} = \frac{1}{2}$$
Although this solution seems straightforward, it has been challenged as being incomplete. Many feel that there must be information about how the friend chose to name Richard. Many would make an assumption somewhat as follows. The friend took the three names selected: if Jim was one of them, Jim's name was removed and an equally likely choice among the other two was made; otherwise, the friend selected on an equally likely basis one of the three to be hired. Under this assumption, the information assumed is an event B3 which is not the same as A3. In fact, computation (see Example 5, below) shows
$$P(A_1|B_3) = \frac{6}{10} = P(A_1) \ne P(A_1|A_3)$$
Both results are mathematically correct. The difference is in the conditioning event, which corresponds to the difference in the information given (or assumed).
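An exhaustive enumeration in Python makes the difference between the two conditioning events concrete. The friend model coded in the hypothetical helper `p_names_richard` is the assumption described above (drop Jim's name if he was selected, then choose equally likely among the remaining hires); it is one possible model of the friend's behavior, not a fact about the problem.

```python
from fractions import Fraction
from itertools import combinations

# Finalists 1..5; 1 is Jim and 3 is Richard, matching A1 and A3 in the text.
hires = list(combinations(range(1, 6), 3))   # 10 equally likely hiring outcomes
w = Fraction(1, len(hires))

p_A1 = sum(w for h in hires if 1 in h)
p_A3 = sum(w for h in hires if 3 in h)
p_A1A3 = sum(w for h in hires if 1 in h and 3 in h)


def p_names_richard(h):
    """Assumed friend model: name a hire other than Jim, equally likely."""
    others = [x for x in h if x != 1]
    return Fraction(1, len(others)) if 3 in others else Fraction(0)


p_B3 = sum(w * p_names_richard(h) for h in hires)          # P(B3)
p_A1B3 = sum(w * p_names_richard(h) for h in hires if 1 in h)  # P(A1 B3)

print("P(A1) =", p_A1)
print("P(A1|A3) =", p_A1A3 / p_A3)
print("P(A1|B3) =", p_A1B3 / p_B3)
```

Conditioning on A3 lowers Jim's probability to 1/2, while conditioning on B3 leaves it at 3/5.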
In addition to its properties as a probability measure, conditional probability has special properties which are consequences of the way it is related to the original probability measure P(⋅). The following are easily derived from the definition of conditional probability and basic properties of the prior probability measure, and prove useful in a variety of problem situations.
(CP1) Product rule If P(ABCD)>0, then P(ABCD)=P(A)P(B|A)P(C|AB)P(D|ABC).
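The defining expression may be written in product form: P(AB)=P(A)P(B|A). Likewise
$$P(ABC) = P(A)\cdot\frac{P(AB)}{P(A)}\cdot\frac{P(ABC)}{P(AB)} = P(A)P(B|A)P(C|AB)$$
and
$$P(ABCD) = P(A)\cdot\frac{P(AB)}{P(A)}\cdot\frac{P(ABC)}{P(AB)}\cdot\frac{P(ABCD)}{P(ABC)} = P(A)P(B|A)P(C|AB)P(D|ABC)$$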
This pattern may be extended to the intersection of any finite number of events. Also, the events may be taken in any order.
An electronics store has ten items of a given type in stock. One is defective. Four successive customers purchase one of the items. Each time, the selection is on an equally likely basis from those remaining. What is the probability that all four customers get good items?
Let Ei be the event the ith customer receives a good item. Then the first chooses one of the nine good ones out of ten, the second chooses one of the eight good ones out of the nine remaining, etc., so that
$$P(E_1E_2E_3E_4) = P(E_1)P(E_2|E_1)P(E_3|E_1E_2)P(E_4|E_1E_2E_3) = \frac{9}{10}\cdot\frac{8}{9}\cdot\frac{7}{8}\cdot\frac{6}{7} = \frac{6}{10}$$
Note that this result could be determined by a combinatorial argument: under the assumptions, each combination of four of ten is equally likely; the number of combinations of four good ones is the number of combinations of four of the nine. Hence
$$P(E_1E_2E_3E_4) = \frac{\binom{9}{4}}{\binom{10}{4}} = \frac{126}{210} = \frac{6}{10}$$
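Both routes to the answer can be checked in a few lines of Python with exact fractions: the product-rule chain and the ratio of binomial coefficients. This is a sketch for checking the arithmetic, not an m-procedure from the text.

```python
from fractions import Fraction
from math import comb, prod

# Product rule: P(E1E2E3E4) = P(E1) P(E2|E1) P(E3|E1E2) P(E4|E1E2E3).
# With 9 good items among 10, the factor for customer k+1 is (9 - k)/(10 - k).
p_product = prod(Fraction(9 - k, 10 - k) for k in range(4))

# Combinatorial check: choices of 4 of the 9 good items over all choices of 4 of 10.
p_comb = Fraction(comb(9, 4), comb(10, 4))

print(p_product, p_comb)
```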
Three items are to be selected (on an equally likely basis at each step) from ten, two of which are defective. Determine the probability that the first and third selected are good.
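Let Gi,1≤i≤3 be the event the ith unit selected is good. Then $G_1G_3 = G_1G_2G_3 \vee G_1G_2^cG_3$. By the product rule
$$P(G_1G_3) = P(G_1)P(G_2|G_1)P(G_3|G_1G_2) + P(G_1)P(G_2^c|G_1)P(G_3|G_1G_2^c) = \frac{8}{10}\cdot\frac{7}{9}\cdot\frac{6}{8} + \frac{8}{10}\cdot\frac{2}{9}\cdot\frac{7}{8} = \frac{28}{45} \approx 0.62$$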
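A brute-force check of this result in Python, enumerating all equally likely ordered draws of three items from ten and comparing with the product-rule sum. Labeling items 8 and 9 as the defective ones is an arbitrary choice for the sketch.

```python
from fractions import Fraction
from itertools import permutations

# Items 0..9; items 8 and 9 are the two defective ones (arbitrary labeling).
good = [i < 8 for i in range(10)]
draws = list(permutations(range(10), 3))   # 720 equally likely ordered draws
hits = sum(1 for d in draws if good[d[0]] and good[d[2]])
p_enum = Fraction(hits, len(draws))

# Product rule on the split G1G3 = G1G2G3 v G1 G2c G3.
p_rule = (Fraction(8, 10) * Fraction(7, 9) * Fraction(6, 8)
          + Fraction(8, 10) * Fraction(2, 9) * Fraction(7, 8))
print(p_enum, p_rule)
```

The answer equals P(G1G2) = (8/10)(7/9), as symmetry of the equally likely draws suggests.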
(CP2) Law of total probability Suppose the class $\{A_i : 1 \le i \le n\}$ of events is mutually exclusive and every outcome in E is in one of these events. Thus, E=A1E⋁A2E⋁⋯⋁AnE, a disjoint union. Then
$$P(E) = P(E|A_1)P(A_1) + P(E|A_2)P(A_2) + \cdots + P(E|A_n)P(A_n)$$
Five cards are numbered one through five. A two-step selection procedure is carried out as follows.
Three cards are selected without replacement, on an equally likely basis.
If card 1 is drawn, the other two are put in a box.
If card 1 is not drawn, all three are put in a box.
One of the cards in the box is drawn on an equally likely basis (from either two or three).
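Let Ai be the event the ith card is drawn on the first selection and let Bi be the event the card numbered i is drawn on the second selection (from the box). Determine P(B5), P(A1B5), and P(A1|B5).
From Example 3.4, we have $P(A_i) = 6/10$ and $P(A_iA_j) = 3/10$. This implies
$$P(A_iA_j^c) = P(A_i) - P(A_iA_j) = \frac{3}{10}$$
Now we can draw card five on the second selection only if it is selected on the first drawing, so that B5⊂A5. Also $A_5 = A_1A_5 \vee A_1^cA_5$. We therefore have $B_5 = B_5A_5 = B_5A_1A_5 \vee B_5A_1^cA_5$. By the law of total probability (CP2),
$$P(B_5) = P(B_5|A_1A_5)P(A_1A_5) + P(B_5|A_1^cA_5)P(A_1^cA_5) = \frac{1}{2}\cdot\frac{3}{10} + \frac{1}{3}\cdot\frac{3}{10} = \frac{1}{4}$$
Also, since A1B5=A1A5B5,
$$P(A_1B_5) = P(A_1A_5B_5) = P(A_1A_5)P(B_5|A_1A_5) = \frac{3}{10}\cdot\frac{1}{2} = \frac{3}{20}$$
We thus have
$$P(A_1|B_5) = \frac{3/20}{5/20} = \frac{6}{10} = P(A_1)$$
Occurrence of event B5 has no effect on the likelihood of the occurrence of A1. This condition is examined more thoroughly in the chapter on "Independence of Events".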
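The two-step procedure is small enough to enumerate exactly. The Python sketch below weights each of the ten equally likely first selections and then the equally likely draw from the box; it reproduces P(B5) = 1/4, P(A1B5) = 3/20, and P(A1|B5) = 3/5 = P(A1).

```python
from fractions import Fraction
from itertools import combinations

first_draws = list(combinations(range(1, 6), 3))   # 10 equally likely first selections
w = Fraction(1, len(first_draws))

p_B5 = Fraction(0)     # P(B5): card 5 drawn from the box
p_A1B5 = Fraction(0)   # P(A1 B5): card 1 drawn first, card 5 drawn second
for first in first_draws:
    box = [c for c in first if c != 1]   # card 1, if drawn, is not put in the box
    if 5 in box:
        p = w * Fraction(1, len(box))    # equally likely draw from the box
        p_B5 += p
        if 1 in first:
            p_A1B5 += p

print(p_B5, p_A1B5, p_A1B5 / p_B5)
```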
Often in applications data lead to conditioning with respect to an event but the problem calls for “conditioning in the opposite direction.”
Students in a freshman mathematics class come from three different high schools. Their mathematical preparation varies. In order to group them appropriately in class sections, they are given a diagnostic test. Let Hi be the event that a student tested is from high school i, 1≤i≤3. Let F be the event the student fails the test. Suppose data indicate
$$P(H_1) = 0.2, \quad P(H_2) = 0.5, \quad P(H_3) = 0.3, \quad P(F|H_1) = 0.10, \quad P(F|H_2) = 0.02, \quad P(F|H_3) = 0.06$$
A student passes the exam. Determine for each i the conditional probability that the student is from high school i.
The basic pattern utilized in the reversal is the following.
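(CP3) Bayes' rule If $E \subset A_1 \vee A_2 \vee \cdots \vee A_n$ (as in the law of total probability), then
$$P(A_i|E) = \frac{P(A_iE)}{P(E)} = \frac{P(E|A_i)P(A_i)}{P(E)}, \quad 1 \le i \le n$$
where P(E) is given by the law of total probability.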
Such reversals are desirable in a variety of practical situations.
Begin with items in two lots:
Lot 1: three items, one defective.
Lot 2: four items, one defective.
One item is selected from lot 1 (on an equally likely basis); this item is added to lot 2; a selection is then made from lot 2 (also on an equally likely basis). This second item is good. What is the probability the item selected from lot 1 was good?
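Let G1 be the event the first item (from lot 1) was good, and G2 be the event the second item (from the augmented lot 2) is good. We want to determine P(G1|G2). Now the data are interpreted as
$$P(G_1) = \frac{2}{3}, \qquad P(G_2|G_1) = \frac{4}{5}, \qquad P(G_2|G_1^c) = \frac{3}{5}$$
By the law of total probability (CP2),
$$P(G_2) = P(G_1)P(G_2|G_1) + P(G_1^c)P(G_2|G_1^c) = \frac{2}{3}\cdot\frac{4}{5} + \frac{1}{3}\cdot\frac{3}{5} = \frac{11}{15}$$
By Bayes' rule (CP3),
$$P(G_1|G_2) = \frac{P(G_2|G_1)P(G_1)}{P(G_2)} = \frac{8/15}{11/15} = \frac{8}{11} \approx 0.73$$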
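The same two steps, (CP2) followed by (CP3), written out in Python with exact fractions. The variable names are illustrative; the input values follow directly from the lot sizes in the problem statement.

```python
from fractions import Fraction

p_g1 = Fraction(2, 3)            # lot 1: three items, one defective
p_g2_given_g1 = Fraction(4, 5)   # good item added: lot 2 has 5 items, 4 good
p_g2_given_g1c = Fraction(3, 5)  # defective added: lot 2 has 5 items, 3 good

p_g2 = p_g1 * p_g2_given_g1 + (1 - p_g1) * p_g2_given_g1c   # (CP2)
p_g1_given_g2 = p_g2_given_g1 * p_g1 / p_g2                # (CP3)
print(p_g2, p_g1_given_g2)
```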
Medical tests. Suppose D is the event a patient has a certain disease and T is the event a test for the disease is positive. Data are usually of the form: prior probability P(D) (or prior odds $P(D)/P(D^c)$), probability $P(T|D^c)$ of a false positive, and probability $P(T^c|D)$ of a false negative. The desired probabilities are P(D|T) and $P(D^c|T^c)$.
Safety alarm. If D is the event a dangerous condition exists (say a steam pressure is too high) and T is the event the safety alarm operates, then data are usually of the form P(D), $P(T|D^c)$, and $P(T^c|D)$, or equivalently (e.g., $P(T^c|D^c)$ and P(T|D)). Again, the desired probabilities are that the safety alarm signals correctly, P(D|T) and $P(D^c|T^c)$.
Job success. If H is the event of success on a job, and E is the event that an individual interviewed has certain desirable characteristics, the data are usually prior P(H) and reliability of the characteristics as predictors in the form P(E|H) and $P(E|H^c)$. The desired probability is P(H|E).
Presence of oil. If H is the event of the presence of oil at a proposed well site, and E is the event of certain geological structure (salt dome or fault), the data are usually P(H) (or the odds), P(E|H), and $P(E|H^c)$. The desired probability is P(H|E).
Market condition. Before launching a new product on the national market, a firm usually examines the condition of a test market as an indicator of the national market. If H is the event the national market is favorable and E is the event the test market is favorable, data are a prior estimate P(H) of the likelihood the national market is sound, and data P(E|H) and $P(E|H^c)$ indicating the reliability of the test market. What is desired is P(H|E), the likelihood the national market is favorable, given the test market is favorable.
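All five situations share one computation: Bayes' rule for the two-event partition {H, H^c}. A minimal Python sketch follows; the helper name `posterior` and the medical-test numbers are hypothetical, chosen only to illustrate the pattern.

```python
def posterior(prior, p_e_given_h, p_e_given_hc):
    """P(H|E) by Bayes' rule for the partition {H, Hc}."""
    num = p_e_given_h * prior
    return num / (num + p_e_given_hc * (1 - prior))


# Hypothetical medical-test data (not from the text):
# P(D) = 0.01, false positive P(T|Dc) = 0.05, false negative P(Tc|D) = 0.02.
p_d, fp, fn = 0.01, 0.05, 0.02
p_d_given_t = posterior(p_d, 1 - fn, fp)          # P(D|T)
p_dc_given_tc = posterior(1 - p_d, 1 - fp, fn)    # P(Dc|Tc): swap roles of D and Dc
print(round(p_d_given_t, 4), round(p_dc_given_tc, 6))
```

With a small prior P(D), even a fairly accurate test gives a modest P(D|T) (about 0.17 here), while P(Dc|Tc) is very close to one.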
The calculations, as in Example 3.8, are simple but can be tedious. We have an m-procedure called bayes to perform the calculations easily. The probabilities $P(A_i)$ are put into a matrix PA and the conditional probabilities $P(E|A_i)$ are put into matrix PEA. The desired probabilities $P(A_i|E)$ and $P(A_i|E^c)$ are calculated and displayed.
>> PEA = [0.10 0.02 0.06];
>> PA = [0.2 0.5 0.3];
>> bayes
Requires input PEA = [P(E|A1) P(E|A2) ... P(E|An)]
 and PA = [P(A1) P(A2) ... P(An)]
Determines PAE = [P(A1|E) P(A2|E) ... P(An|E)]
 and PAEc = [P(A1|Ec) P(A2|Ec) ... P(An|Ec)]
Enter matrix PEA of conditional probabilities  PEA
Enter matrix PA of probabilities  PA
P(E) = 0.048
   P(E|Ai)   P(Ai)     P(Ai|E)   P(Ai|Ec)
   0.1000    0.2000    0.4167    0.1891
   0.0200    0.5000    0.2083    0.5147
   0.0600    0.3000    0.3750    0.2962
Various quantities are in the matrices PEA, PA, PAE, PAEc, named above
The procedure displays the results in tabular form, as shown. In addition, the various quantities are in the workspace in the matrices named, so that they may be used in further calculations without recopying.
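For readers without the m-procedures, a rough Python analogue of bayes is sketched below. It is not the author's code; the function name and the order of the returned values are assumptions. Given the PEA and PA values from the session above, it reproduces P(E) and the last two columns of the table.

```python
def bayes(pea, pa):
    """Return P(E), [P(Ai|E)], and [P(Ai|Ec)] from P(E|Ai) and P(Ai).

    The Ai are assumed to form a partition, as in (CP2) and (CP3).
    """
    p_e = sum(e * a for e, a in zip(pea, pa))                    # (CP2)
    pae = [e * a / p_e for e, a in zip(pea, pa)]                 # (CP3), given E
    paec = [(1 - e) * a / (1 - p_e) for e, a in zip(pea, pa)]    # (CP3), given Ec
    return p_e, pae, paec


PEA = [0.10, 0.02, 0.06]   # P(E|Ai), as entered in the session above
PA = [0.2, 0.5, 0.3]       # P(Ai)
p_e, pae, paec = bayes(PEA, PA)
print(f"P(E) = {p_e:.3f}")
for row in zip(PEA, PA, pae, paec):
    print("  ".join(f"{x:.4f}" for x in row))
```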
The following variation of Bayes' rule is applicable in many practical situations.
(CP3*) Ratio form of Bayes' rule
$$\frac{P(A|C)}{P(B|C)} = \frac{P(AC)}{P(BC)} = \frac{P(C|A)}{P(C|B)}\cdot\frac{P(A)}{P(B)}$$
The left-hand member is called the posterior odds, which is the odds after knowledge of the occurrence of the conditioning event. The second fraction in the right-hand member is the prior odds, which is the odds before knowledge of the occurrence of the conditioning event C. The first fraction in the right-hand member is known as the likelihood ratio. It is the ratio of the probabilities (or likelihoods) of C for the two different probability measures $P(\cdot|A)$ and $P(\cdot|B)$.
As a part of a routine maintenance procedure, a computer is given a performance test. The machine seems to be operating so well that the prior odds it is satisfactory are taken to be ten to one. The test has probability 0.05 of a false positive and 0.01 of a false negative. A test is performed. The result is positive. What are the posterior odds the device is operating properly?
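Let S be the event the computer is operating satisfactorily and let T be the event the test is favorable. The data are $P(S)/P(S^c) = 10$, $P(T|S^c) = 0.05$, and $P(T^c|S) = 0.01$. Then by the ratio form of Bayes' rule
$$\frac{P(S|T)}{P(S^c|T)} = \frac{P(T|S)}{P(T|S^c)}\cdot\frac{P(S)}{P(S^c)} = \frac{0.99}{0.05}\cdot 10 = 198, \quad\text{so that}\quad P(S|T) = \frac{198}{199} \approx 0.9950$$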
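The ratio form is a one-line computation: multiply the prior odds by the likelihood ratio. The Python sketch below also converts the posterior odds to a probability, which is valid here because S and S^c are complementary. Both helper names are illustrative.

```python
def posterior_odds(prior_odds, p_c_given_a, p_c_given_b):
    """(CP3*): posterior odds = likelihood ratio * prior odds."""
    return (p_c_given_a / p_c_given_b) * prior_odds


def odds_to_prob(odds):
    """Odds x for an event against its complement give probability x / (1 + x)."""
    return odds / (1 + odds)


odds = posterior_odds(10, 1 - 0.01, 0.05)   # P(T|S) = 0.99, P(T|Sc) = 0.05
print(round(odds, 6), round(odds_to_prob(odds), 4))
```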
The following property serves to establish in the chapters on "Independence of Events" and "Conditional Independence" a number of important properties for the concept of independence and of conditional independence of events.
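(CP4) Some equivalent conditions If 0<P(A)<1 and 0<P(B)<1, then
$$P(A|B) * P(A) \iff P(B|A) * P(B) \iff P(AB) * P(A)P(B)$$
and
$$P(AB) * P(A)P(B) \iff P(A^cB^c) * P(A^c)P(B^c) \iff P(AB^c) \diamond P(A)P(B^c)$$
where * is <, ≤, =, ≥, or > and ⋄ is >, ≥, =, ≤, or <, respectively.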
Because of the role of this property in the theory of independence and conditional independence, we examine the derivation of these results.
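VERIFICATION of (CP4)
(a) P(AB)*P(A)P(B) iff P(A|B)*P(A) (divide by P(B); may exchange A and $A^c$)
(b) P(AB)*P(A)P(B) iff P(B|A)*P(B) (divide by P(A); may exchange B and $B^c$)
(c) $P(AB) * P(A)P(B) \iff [P(A) - P(AB^c)] * P(A)[1 - P(B^c)] \iff -P(AB^c) * -P(A)P(B^c) \iff P(AB^c) \diamond P(A)P(B^c)$
We may use (c) to get $P(AB) * P(A)P(B) \iff P(AB^c) \diamond P(A)P(B^c) \iff P(A^cB^c) * P(A^c)P(B^c)$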
A number of important and useful propositions may be derived from these.
$P(A^c|B) = 1 - P(A|B)$, but, in general, $P(A|B^c) \ne 1 - P(A|B)$.
$P(A|B) > P(A)$ iff $P(A|B^c) < P(A)$.
$P(A|B) > P(A)$ iff $P(A^c|B^c) > P(A^c)$.
VERIFICATION: Exercises (see problem set)
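A quick exact spot-check of (CP4) in Python: for random models given by four positive minterm probabilities, the differences P(A|B) - P(A), P(B|A) - P(B), P(AB) - P(A)P(B), P(A^cB^c) - P(A^c)P(B^c), and the negative of P(AB^c) - P(A)P(B^c) always share a sign. This is a numerical illustration under randomly chosen weights, not a substitute for the verification above.

```python
import random
from fractions import Fraction


def cp4_differences(p_ab, p_abc, p_acb, p_acbc):
    """Differences whose common sign is asserted by (CP4).

    Arguments are the minterm probabilities P(AB), P(ABc), P(AcB), P(AcBc).
    The last entry is negated because the relation for P(ABc) is reversed.
    """
    p_a, p_b = p_ab + p_abc, p_ab + p_acb
    return [
        p_ab / p_b - p_a,                  # P(A|B) - P(A)
        p_ab / p_a - p_b,                  # P(B|A) - P(B)
        p_ab - p_a * p_b,                  # P(AB) - P(A)P(B)
        p_acbc - (1 - p_a) * (1 - p_b),    # P(AcBc) - P(Ac)P(Bc)
        -(p_abc - p_a * (1 - p_b)),        # -(P(ABc) - P(A)P(Bc))
    ]


def sign(x):
    return (x > 0) - (x < 0)


def check(trials, seed=0):
    """Count random exact models in which all five differences share a sign."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(trials):
        w = [rng.randint(1, 9) for _ in range(4)]   # positive weights give 0 < P(A), P(B) < 1
        probs = [Fraction(x, sum(w)) for x in w]
        if len({sign(d) for d in cp4_differences(*probs)}) == 1:
            agree += 1
    return agree


print(check(1000))
```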
Suppose conditioning by the event C has occurred. Additional information is then received that event D has occurred. We have a new conditioning event CD. There are two possibilities:
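Reassign the conditional probabilities. $P_C(A)$ becomes
$$P_C(A|D) = \frac{P_C(AD)}{P_C(D)} = \frac{P(ACD)}{P(CD)}$$
Reassign the total probabilities: P(A) becomes
$$P_D(A|C) = \frac{P_D(AC)}{P_D(C)} = \frac{P(ACD)}{P(CD)}$$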
Basic result: $P_C(A|D) = P(A|CD) = P_D(A|C)$. Thus repeated conditioning by two events may be done in any order, or may be done in one step. This result extends easily to repeated conditioning by any finite number of events. This result is important in extending the concept of "Independence of Events" to "Conditional Independence". These conditions are important for many problems of probable inference.
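The basic result can be checked on a small equally likely space. In the Python sketch below, the space is two fair dice and the events A, C, D are hypothetical choices made only for illustration; conditioning by C then D, by D then C, and by CD in one step all give the same value.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two fair dice, equally likely


def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def both(e, f):
    """Intersection of two events given as predicates."""
    return lambda w: e(w) and f(w)


def given(h):
    """The conditional measure P_H(.) = P(. H) / P(H)."""
    return lambda event: P(both(event, h)) / P(h)


# Illustrative events (assumptions, not from the text).
A = lambda w: w[0] == 6           # first die shows six
C = lambda w: w[0] + w[1] >= 9    # sum is at least nine
D = lambda w: w[0] != w[1]        # the two dice differ

P_C, P_D = given(C), given(D)
step_C_then_D = P_C(both(A, D)) / P_C(D)             # P_C(A|D)
one_step = P(both(A, both(C, D))) / P(both(C, D))    # P(A|CD)
step_D_then_C = P_D(both(A, C)) / P_D(C)             # P_D(A|C)
print(step_C_then_D, one_step, step_D_then_C)
```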
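Given the following data:
$$P(A \cup A^c) = 1, \quad P(A) = 0.55, \quad P(AB) = 0.30, \quad P(BC) = 0.20, \quad P(A^c \cup BC) = 0.55, \quad P(A^cBC^c) = 0.15$$
Determine, if possible, the conditional probability $P(A^c|B) = P(A^cB)/P(B)$.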
% file npr03_01.m
% Data for Exercise 1.
minvec3
DV = [A|Ac; A; A&B; B&C; Ac|(B&C); Ac&B&Cc];
DP = [ 1 0.55 0.30 0.20 0.55 0.15 ];
TV = [Ac&B; B];
disp('Call for mincalc')

npr03_01
Variables are A, B, C, Ac, Bc, Cc
They may be renamed, if desired.
Call for mincalc
mincalc
Data vectors are linearly independent
 Computable target probabilities
    1.0000    0.2500
    2.0000    0.5500
The number of minterms is 8
The number of available minterms is 4
- - - - - - - - - - - -
P = 0.25/0.55
P = 0.4545
In Exercise 11 from "Problems on Minterm Analysis," we have the following data: A survey of a representative group of students yields the following information:
52 percent are male
85 percent live on campus
78 percent are male or are active in intramural sports (or both)
30 percent live on campus but are not active in sports
32 percent are male, live on campus, and are active in sports
8 percent are male and live off campus
17 percent are male students inactive in sports
Let A = male, B = on campus, C = active in sports.
(a) A student is selected at random. He is male and lives on campus. What is the (conditional) probability that he is active in sports?
(b) A student selected is active in sports. What is the (conditional) probability that she is a female who lives on campus?
npr02_11
- - - - - - - - - - - -
mincalc
- - - - - - - - - - - -
mincalct
Enter matrix of target Boolean combinations  [A&B&C; A&B; Ac&B&C; C]
 Computable target probabilities
    1.0000    0.3200
    2.0000    0.4400
    3.0000    0.2300
    4.0000    0.6100
PC_AB = 0.32/0.44
PC_AB = 0.7273
PAcB_C = 0.23/0.61
PAcB_C = 0.3770
In a certain population, the probability a woman lives to at least seventy years is 0.70 and is 0.55 that she will live to at least eighty years. If a woman is seventy years old, what is the conditional probability she will survive to eighty years? Note that if A⊂B then P(AB)=P(A).
From 100 cards numbered 00, 01, 02, ⋯, 99, one card is drawn. Suppose Ai is the event the sum of the two digits on a card is i, 0≤i≤18, and Bj is the event the product of the two digits is j. Determine for each possible i.
Two fair dice are rolled.
What is the (conditional) probability that one turns up two spots, given they show different numbers?
What is the (conditional) probability that the first turns up six, given that the sum is k, for each k from two through 12?
What is the (conditional) probability that at least one turns up six, given that the sum is k, for each k from two through 12?
There are 6×5 ways the two dice can show different numbers. There are 2×5 ways that they are different and one turns up two spots. The conditional probability is 2/6.
Let A6 = event first is a six and Sk = event the sum is k. Now A6Sk=∅ for k≤6. A table of sums shows $P(A_6S_k) = 1/36$ and $P(S_k) = 6/36, 5/36, 4/36, 3/36, 2/36, 1/36$ for k=7 through 12, respectively. Hence $P(A_6|S_k) = 1/6, 1/5, 1/4, 1/3, 1/2, 1$, respectively.
If AB6 is the event at least one is a six, then P(AB6 Sk) = 2/36 for k=7 through 11 and P(AB6 S12) = 1/36. Thus, the conditional probabilities are 2/6, 2/5, 2/4, 2/3, 1, 1, respectively.
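All three dice answers can be confirmed by enumerating the 36 equally likely outcomes. The helper `cond` in this Python sketch simply counts outcomes inside the conditioning event; its name is illustrative.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes


def cond(event, given):
    """P(event | given) by counting equally likely outcomes."""
    hits = [r for r in rolls if given(r)]
    return Fraction(sum(1 for r in hits if event(r)), len(hits))


# One die shows two, given the dice show different numbers.
p_two = cond(lambda r: 2 in r, lambda r: r[0] != r[1])
# First die is a six, given the sum is k, for k = 7, ..., 12.
p_first6 = [cond(lambda r: r[0] == 6, lambda r, k=k: sum(r) == k) for k in range(7, 13)]
# At least one six, given the sum is k, for k = 7, ..., 12.
p_any6 = [cond(lambda r: 6 in r, lambda r, k=k: sum(r) == k) for k in range(7, 13)]
print(p_two, p_first6, p_any6)
```

For k from 2 through 6 both six-related probabilities are zero, since the sum cannot be that small when a six appears.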
Four persons are to be selected from a group of 12 people, 7 of whom are women.
What is the probability that the first and third selected are women?
What is the probability that three of those selected are women?
What is the (conditional) probability that the first and third selected are women, given that three of those selected are women?
Twenty percent of the paintings in a gallery are not originals. A collector buys a painting. He has probability 0.10 of buying a fake for an original but never rejects an original as a fake. What is the (conditional) probability the painting he purchases is an original?
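Let B= the event the collector buys, and G= the event the painting is original. Assume P(B|G)=1 and $P(B|G^c)=0.1$. If P(G)=0.8, then
$$P(G|B) = \frac{P(B|G)P(G)}{P(B|G)P(G) + P(B|G^c)P(G^c)} = \frac{0.8}{0.8 + 0.1\cdot 0.2} = \frac{40}{41} \approx 0.9756$$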
Five percent of the units of a certain type of equipment brought in for service have a common defect. Experience shows that 93 percent of the units with this defect exhibit a certain behavioral characteristic, while only two percent of the units which do not have this defect exhibit that characteristic. A unit is examined and found to have the characteristic symptom. What is the conditional probability that the unit has the defect, given this behavior?
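Let D= the event the unit is defective and C= the event it has the characteristic. Then P(D)=0.05, P(C|D)=0.93, and $P(C|D^c)=0.02$, so that
$$P(D|C) = \frac{P(C|D)P(D)}{P(C|D)P(D) + P(C|D^c)P(D^c)} = \frac{0.93\cdot 0.05}{0.93\cdot 0.05 + 0.02\cdot 0.95} = \frac{93}{131} \approx 0.7099$$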
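A shipment of 1000 electronic units is received. It is equally likely that there are 0, 1, 2, or 3 defective units in the lot. If one is selected at random and found to be good, what is the probability of no defective units in the lot?
Let Dk= the event of k defective and G be the event a good one is chosen. Then $P(D_k) = 1/4$ and $P(G|D_k) = (1000 - k)/1000$, so that
$$P(D_0|G) = \frac{P(G|D_0)P(D_0)}{\sum_{k=0}^{3} P(G|D_k)P(D_k)} = \frac{1000}{1000 + 999 + 998 + 997} = \frac{1000}{3994} \approx 0.2504$$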
Data on incomes and salary ranges for a certain population are analyzed as follows. S1= event annual income is less than $25,000; S2= event annual income is between $25,000 and $100,000; S3= event annual income is greater than $100,000. E1= event did not complete college education; E2= event of completion of bachelor's degree; E3= event of completion of graduate or professional degree program. Data may be tabulated as follows: , and .
Suppose a person has a university education (no graduate study). What is the (conditional) probability that he or she will make $25,000 or more?
Find the total probability that a person's income category is at least as high as his or her educational level.
$$p = (0.85 + 0.10 + 0.05)\cdot 0.65 + (0.80 + 0.10)\cdot 0.30 + 0.45\cdot 0.05 = 0.9425$$
In a survey, 85 percent of the employees say they favor a certain company policy. Previous experience indicates that 20 percent of those who do not favor the policy say that they do, out of fear of reprisal. What is the probability that an employee picked at random really does favor the company policy? It is reasonable to assume that all who favor say so.
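Let F = the event the employee favors the policy and S = the event the employee says so. Then P(S)=0.85 and $P(S|F^c)=0.20$. Also, it is reasonable to assume P(S|F)=1. By the law of total probability,
$$0.85 = P(S) = P(S|F)P(F) + P(S|F^c)[1 - P(F)] = 0.20 + 0.80\,P(F), \quad\text{so that}\quad P(F) = \frac{0.65}{0.80} = 0.8125$$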
A quality control group is designing an automatic test procedure for compact disk players coming from a production line. Experience shows that one percent of the units produced are defective. The automatic test procedure has probability 0.05 of giving a false positive indication and probability 0.02 of giving a false negative. That is, if D is the event a unit tested is defective, and T is the event that it tests satisfactory, then P(T|D)=0.05 and $P(T^c|D^c)=0.02$. Determine the probability that a unit which tests good is, in fact, free of defects.
$$\frac{P(D^c|T)}{P(D|T)} = \frac{P(T|D^c)P(D^c)}{P(T|D)P(D)} = \frac{0.98\cdot 0.99}{0.05\cdot 0.01} = \frac{9702}{5}, \quad\text{so that}\quad P(D^c|T) = \frac{9702}{9707} \approx 0.9995$$
Five boxes of random access memory chips have 100 units per box. They have respectively one, two, three, four, and five defective units. A box is selected at random, on an equally likely basis, and a unit is selected at random therefrom. It is defective. What are the (conditional) probabilities the unit was selected from each of the boxes?
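Hi= the event from box i, and D = the event the unit is defective. $P(H_i) = 1/5$ and $P(D|H_i) = i/100$. Hence
$$P(H_i|D) = \frac{P(D|H_i)P(H_i)}{\sum_{j=1}^{5} P(D|H_j)P(H_j)} = \frac{i}{15}, \quad 1 \le i \le 5$$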
Two percent of the units received at a warehouse are defective. A nondestructive test procedure gives two percent false positive indications and five percent false negative. Units which fail to pass the inspection are sold to a salvage firm. This firm applies a corrective procedure which does not affect any good unit and which corrects 90 percent of the defective units. A customer buys a unit from the salvage firm. It is good. What is the (conditional) probability the unit was originally defective?
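Let T= event test indicates defective, D= event initially defective, and G= event unit purchased is good. Data are
$$P(D) = 0.02, \quad P(T^c|D) = 0.05, \quad P(T|D^c) = 0.02, \quad P(G|TD) = 0.90, \quad P(G|TD^c) = 1$$
The salvage firm receives only units for which T occurs, so the desired probability is
$$P(D|TG) = \frac{P(D)P(T|D)P(G|TD)}{P(D)P(T|D)P(G|TD) + P(D^c)P(T|D^c)P(G|TD^c)} = \frac{0.02\cdot 0.95\cdot 0.90}{0.02\cdot 0.95\cdot 0.90 + 0.98\cdot 0.02\cdot 1} = \frac{171}{367} \approx 0.4659$$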
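At a certain stage in a trial, the judge feels the odds are two to one the defendant is guilty. It is determined that the defendant is left-handed. An investigator convinces the judge this is six times more likely if the defendant is guilty than if he were not. What is the likelihood, given this evidence, that the defendant is guilty?
Let G= event the defendant is guilty, L= the event the defendant is left-handed. Prior odds: $P(G)/P(G^c) = 2$. Result of testimony: $P(L|G)/P(L|G^c) = 6$. By the ratio form of Bayes' rule,
$$\frac{P(G|L)}{P(G^c|L)} = \frac{P(L|G)}{P(L|G^c)}\cdot\frac{P(G)}{P(G^c)} = 6\cdot 2 = 12, \quad\text{so that}\quad P(G|L) = \frac{12}{13} \approx 0.923$$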
Show that if P(A|C)>P(B|C) and $P(A|C^c) > P(B|C^c)$, then P(A)>P(B). Is the converse true? Prove or give a counterexample.
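The converse is not true. Consider $P(C) = P(C^c) = 0.5$, $P(A|C) = 1/4$, $P(A|C^c) = 3/4$, $P(B|C) = 1/2$, and $P(B|C^c) = 1/4$. Then
$$P(A) = \tfrac{1}{2}\left(\tfrac{1}{4} + \tfrac{3}{4}\right) = \tfrac{1}{2} > \tfrac{3}{8} = \tfrac{1}{2}\left(\tfrac{1}{2} + \tfrac{1}{4}\right) = P(B)$$
but P(A|C) < P(B|C).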
Since P(·|B) is a probability measure for a given B, we must have $P(A|B) + P(A^c|B) = 1$. Construct an example to show that in general $P(A|B) + P(A|B^c) \ne 1$.
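Use property (CP4) to show
(a) $P(A|B) > P(A) \iff P(AB) > P(A)P(B) \iff P(AB^c) < P(A)P(B^c) \iff P(A|B^c) < P(A)$
(b) $P(A^c|B) > P(A^c) \iff P(A^cB) > P(A^c)P(B) \iff P(AB) < P(A)P(B) \iff P(A|B) < P(A)$
(c) $P(A|B) > P(A) \iff P(AB) > P(A)P(B) \iff P(A^cB^c) > P(A^c)P(B^c) \iff P(A^c|B^c) > P(A^c)$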
Show that $P(A|B) \ge (P(A) + P(B) - 1)/P(B)$.
Show that .
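An individual is to select from among n alternatives in an attempt to obtain a particular one. This might be selection from answers on a multiple choice question, when only one is correct. Let A be the event he makes a correct selection, and B be the event he knows which is correct before making the selection. We suppose P(B)=p and $P(A|B^c) = 1/n$. Determine P(B|A); show that P(B|A)≥P(B) and P(B|A) increases with n for fixed p.
$P(A|B) = 1$, $P(A|B^c) = 1/n$, $P(B) = p$, so that
$$P(B|A) = \frac{P(A|B)P(B)}{P(A|B)P(B) + P(A|B^c)P(B^c)} = \frac{p}{p + (1-p)/n} = \frac{np}{(n-1)p + 1}$$
Since $(n-1)p + 1 \le n$, we have $P(B|A) \ge p$; and since $(1-p)/n$ decreases as n increases, P(B|A) increases with n.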
Polya's urn scheme for a contagious disease. An urn contains initially b black balls and r red balls (r+b=n). A ball is drawn on an equally likely basis from among those in the urn, then replaced along with c additional balls of the same color. The process is repeated. There are n balls on the first choice, n+c balls on the second choice, etc. Let Bk be the event of a black ball on the kth draw and Rk be the event of a red ball on the kth draw. Determine
with . Using (c), we have(3.62)
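A minimal Python sketch of the first two draws of Polya's scheme, using (CP2) for P(R2) and (CP3) for P(B1|R2) with exact fractions. The function name and the choice of these two quantities are illustrative. For any b, r, c the results simplify to P(R2) = r/n and P(B1|R2) = b/(n+c).

```python
from fractions import Fraction


def polya_two_draws(b, r, c):
    """Return (P(R2), P(B1|R2)) for Polya's urn with b black, r red, c added."""
    n = b + r
    p_b1, p_r1 = Fraction(b, n), Fraction(r, n)
    p_r2_given_b1 = Fraction(r, n + c)        # black drawn: c black balls added
    p_r2_given_r1 = Fraction(r + c, n + c)    # red drawn: c red balls added
    p_r2 = p_r2_given_r1 * p_r1 + p_r2_given_b1 * p_b1   # (CP2)
    p_b1_given_r2 = p_r2_given_b1 * p_b1 / p_r2          # (CP3)
    return p_r2, p_b1_given_r2


print(polya_two_draws(b=3, r=2, c=4))
```

With c = 0 the scheme reduces to sampling with replacement, and the sketch returns (r/n, b/n) as expected.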