Lecture 7
Probability
ABD 3e Chapter 5
Learning Objectives
By the end of this lecture, you should be able to:
- Explain probability as long-run relative frequency and how it links samples to populations.
- Distinguish between discrete probability distributions and continuous probability densities.
- Apply the addition rule and multiplication rule of probability.
- Determine whether events are mutually exclusive, independent, or dependent.
- Compute and interpret conditional probabilities, including simple multi-step processes.
- Use probability trees to calculate probabilities across sequential events.
Probability theory: the foundation of statistical thinking
- Probability is the true relative frequency of an event
- It is the proportion of times the event would occur if the same process were repeated many times
- Probability values range from 0 to 1
- Written as Pr(event)
Probability theory and statistics
- Probability theory links sample estimates to population parameters
- Samples:
  - Come from data
  - Vary by chance
- Populations:
  - Usually unobserved
  - Have fixed values
- Probability explains how chance affects what we observe
Venn diagrams and sample space
- A Venn diagram represents all possible outcomes of a random trial
- The entire diagram corresponds to the sample space
- Events are shown as regions within the sample space
- Larger areas represent events with higher probability
Example: \(\operatorname{Pr}[A] > \operatorname{Pr}[B]\)
Mutually exclusive vs. non-exclusive events
Mutually exclusive events
\[
\operatorname{Pr}[A \text{ and } B] = 0
\]
Not mutually exclusive events
\[
\operatorname{Pr}[A \text{ and } B] > 0
\]
Probability distributions
- The true relative frequency of all possible values of a random variable
- Some probability distributions:
  - Can be described mathematically
  - Are simply a list of possible outcomes with their probabilities
Example: the outcomes from rolling a fair six-sided die
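As a minimal sketch, the distribution of a fair die can be written as a list of outcomes with their probabilities. The variable names (`die`, `total`, `pr_at_least_5`) are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

# Probability distribution of one fair six-sided die:
# each face is assumed to have probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

total = sum(die.values())          # probabilities over all outcomes
pr_at_least_5 = die[5] + die[6]    # Pr[5 or 6]
print(total, pr_at_least_5)
```

Exact fractions are used so that the total is exactly 1 rather than a rounded decimal.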
Probability: distributions vs. densities
Probability distributions describe discrete variables
- All discrete outcomes have finite probabilities
- Probabilities across all outcomes sum to 1
Probability densities describe continuous variables
- The probability of any exact value is infinitesimal
- Probabilities are defined over ranges, not single values
- The total area under the curve integrates to 1
Proportions
- A proportion is the number of times an event occurs divided by the number of trials
- Proportions range from 0 to 1
- A proportion can be viewed as a realized sample from a probability distribution
- With more trials, proportions tend to get closer to the underlying probability
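The last point can be explored with a small simulation. This is only a sketch: the function name `proportion_of_sixes` and the fixed seed are assumptions made for illustration, and the exact proportions depend on the random number generator.

```python
import random

def proportion_of_sixes(n_trials, seed=1):
    """Roll a fair die n_trials times and return the proportion of sixes."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sixes = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return sixes / n_trials

# The underlying probability is 1/6 (about 0.167).
for n in (10, 100, 10_000, 100_000):
    print(n, round(proportion_of_sixes(n), 4))
```

With few trials the proportion can be far from 1/6; with many trials it tends to settle near it.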
The General Addition Principle
- The probability of A or B includes all outcomes in A, all outcomes in B, and avoids double-counting outcomes in both
- When two events overlap, their shared outcomes must be subtracted once
\[
\operatorname{Pr}[A \text{ or } B]
=
\operatorname{Pr}[A]
+
\operatorname{Pr}[B]
-
\operatorname{Pr}[A \text{ and } B]
\]
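A quick check of the general addition principle on one fair die. The events chosen here (A: an even roll, B: a roll of at least 4) are hypothetical examples picked because they overlap.

```python
from fractions import Fraction

sample_space = set(range(1, 7))                # faces of one fair die
A = {x for x in sample_space if x % 2 == 0}    # even: {2, 4, 6}
B = {x for x in sample_space if x >= 4}        # at least 4: {4, 5, 6}

def pr(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

direct = pr(A | B)                     # count outcomes in A or B directly
by_rule = pr(A) + pr(B) - pr(A & B)    # general addition principle
print(direct, by_rule)
```

Both routes give 2/3; without subtracting the overlap {4, 6}, the sum would be 1, which double-counts.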
A special case of the addition principle
- When two events are mutually exclusive, they cannot occur at the same time
- There is no overlap between events
\[
\operatorname{Pr}[A \text{ or } B]
=
\operatorname{Pr}[A]
+
\operatorname{Pr}[B]
\]
Example: probability of a range
- Consider the sum of two fair six-sided dice
- Possible sums range from 2 to 12
- We want the probability that the sum is between 6 and 8, inclusive
\[
\operatorname{Pr}[6 \text{ or } 7 \text{ or } 8]
=\\
\operatorname{Pr}[6] + \operatorname{Pr}[7]
+ \operatorname{Pr}[8]
\]
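The three probabilities can be obtained by counting the 36 equally likely outcomes of two dice. The sketch below does this by brute force; the helper name `pr_sum` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 equally likely (die 1, die 2) pairs

def pr_sum(s):
    """Probability that the two dice add up to s."""
    return Fraction(sum(1 for a, b in rolls if a + b == s), len(rolls))

# Sums of 6, 7 and 8 are mutually exclusive, so their probabilities add.
pr_range = pr_sum(6) + pr_sum(7) + pr_sum(8)
print(pr_sum(6), pr_sum(7), pr_sum(8), pr_range)
```

This gives 5/36 + 6/36 + 5/36 = 16/36, or about 0.44.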
Example: probabilities sum to 1
- Consider a single fair six-sided die
- Group outcomes into mutually exclusive events
- Together, these events include all possible outcomes
- The probabilities of all such events must sum to 1
- Event A: rolling 1–4
- Event B: rolling 5–6
\[
\operatorname{Pr}[1\text{–}4 \text{ or } 5\text{–}6]
=\\
\operatorname{Pr}[1\text{–}4]
+
\operatorname{Pr}[5\text{–}6]
= \\ \frac{4}{6} + \frac{2}{6} = \\ 1
\]
The General Multiplication Principle
- The probability that A and B both occur depends on whether the events are independent
- In general, the probability of A and B equals the probability of A, multiplied by the probability of B given A
\[
\operatorname{Pr}[A \text{ and } B]
=
\operatorname{Pr}[A] \times \operatorname{Pr}[B \mid A]
\]
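To illustrate the general multiplication principle with dependent events, here is a sketch using a hypothetical example that is not from the lecture: drawing two aces in a row from a standard 52-card deck without replacement. The rule is checked against a brute-force count.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical deck: 4 aces ("A") and 48 other cards ("x").
deck = ["A"] * 4 + ["x"] * 48

# Multiplication principle: Pr[first is ace] * Pr[second is ace | first is ace]
by_rule = Fraction(4, 52) * Fraction(3, 51)

# Check by counting all ordered pairs of two different cards.
pairs = list(permutations(range(len(deck)), 2))
both_aces = sum(1 for i, j in pairs if deck[i] == "A" and deck[j] == "A")
by_counting = Fraction(both_aces, len(pairs))
print(by_rule, by_counting)
```

The second factor is 3/51 rather than 4/52 because removing one ace changes the probability of drawing another.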
Independence
- Two events are independent if the occurrence of one does not change the probability of the other
- When events are independent:
\[
\operatorname{Pr}[A \mid B] = \operatorname{Pr}[A]
\]
- In this case, \(\operatorname{Pr}[B \mid A] = \operatorname{Pr}[B]\) as well, so the multiplication rule simplifies to the product of the two probabilities
Probabilities for independent events
- When two events are independent, one does not affect the probability of the other
- The probability that both events occur is the product of their individual probabilities
- This is a special case of the general multiplication principle
\[
\operatorname{Pr}[A \text{ and } B]
=
\operatorname{Pr}[A] \times \operatorname{Pr}[B]
\]
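A small sketch that checks this product rule for two fair dice, assuming the two rolls do not influence each other. The events (a six on each die) and the helper `pr` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # (first die, second die)

def pr(condition):
    """Probability that a roll satisfies condition, all 36 rolls equally likely."""
    return Fraction(sum(1 for r in rolls if condition(r)), len(rolls))

pr_a = pr(lambda r: r[0] == 6)                        # first die shows 6
pr_b = pr(lambda r: r[1] == 6)                        # second die shows 6
pr_a_and_b = pr(lambda r: r[0] == 6 and r[1] == 6)    # both show 6
print(pr_a_and_b, pr_a * pr_b)
```

Both values are 1/36, as expected for independent events.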
Example: Oguchi disease
- Oguchi disease is an autosomal recessive condition
- The disease is expressed only if an individual inherits:
  - One mutant allele from mom
  - One mutant allele from dad
- Parents who carry one mutant allele typically do not show symptoms
Question: Child of Two Oguchi Carriers
Question
If both parents have one copy of the disease allele, what is the probability that a given child will have Oguchi disease?
Thinking
A child has a \(\frac{1}{2}\) chance of inheriting mom’s affected chromosome AND a \(\frac{1}{2}\) chance of inheriting dad’s affected chromosome.
Answer
The probability that a given child of heterozygotes has the disease is \(\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}\).
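The same answer can be reached by listing the four equally likely allele combinations. In this sketch the labels "A" (normal allele) and "a" (mutant allele) are assumed notation, not taken from the slides.

```python
from fractions import Fraction
from itertools import product

# Each carrier parent passes on "A" (normal) or "a" (mutant) with probability 1/2.
parent_alleles = ["A", "a"]
genotypes = list(product(parent_alleles, parent_alleles))  # (from mom, from dad)

# The child is affected only with a mutant allele from both parents.
affected = [g for g in genotypes if g == ("a", "a")]
pr_affected = Fraction(len(affected), len(genotypes))
print(pr_affected)
```

One of the four combinations is ("a", "a"), which matches \(\frac{1}{2} \times \frac{1}{2}\).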
Visualization: two children of Oguchi carriers
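Answer: assuming the two births are independent, the probability that both children (A and B) are affected is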
\[
\operatorname{Pr}[\text{A affected and B affected}]= \\ \operatorname{Pr}[\text{A affected}] \times \operatorname{Pr}[\text{B affected}]= \\ \frac{1}{4} \times \frac{1}{4}=\frac{1}{16}
\]
Probability trees
- Probability tree: a diagram that can be used to calculate the probabilities of combinations of events resulting from multiple random trials
Let’s revisit the example of two children of Oguchi carriers
Probability Tree Step 1: Write down all possible outcomes for event one, event two, etc., and connect them
Probability Tree Step 2: Write down the probability of each outcome, conditional on its path
Probability Tree Step 3: Multiply the probabilities along each path
Probability Tree Step 4: Sum paths that lead to the same destination
\[
\operatorname{Pr}[2 \text{ affected}] = \frac{1}{16}
\]
\[
\operatorname{Pr}[1 \text{ affected}] = \frac{6}{16}
\]
\[
\operatorname{Pr}[0 \text{ affected}] = \frac{9}{16}
\]
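The tree for two children can be sketched in code by enumerating every path, multiplying along it, and summing paths with the same number of affected children. The dictionary names are illustrative, and the sketch assumes each child is affected with probability 1/4 independently of the other.

```python
from fractions import Fraction
from itertools import product

pr_child = {"affected": Fraction(1, 4), "unaffected": Fraction(3, 4)}

# Each path through the tree is one (child 1, child 2) combination.
pr_n_affected = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
for first, second in product(pr_child, repeat=2):
    path_probability = pr_child[first] * pr_child[second]   # multiply along the path
    n_affected = [first, second].count("affected")
    pr_n_affected[n_affected] += path_probability            # sum paths with the same result

print(pr_n_affected)
```

Two different paths lead to exactly one affected child (3/16 each), which is why that probability is 6/16.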
Dependent events
- Two events are dependent if the occurrence of one changes the probability of the other
- Knowing that one event occurred provides information about the other
- In this case, probabilities cannot be multiplied directly
\[
\operatorname{Pr}[A \text{ and } B]
\neq
\operatorname{Pr}[A] \times \operatorname{Pr}[B]
\]
Example: surviving the Titanic
- Of the \(2092\) adults on the Titanic:
  - \(319\) (approximately \(0.152\)) traveled in first class (more expensive)
  - \(654\) (approximately \(0.312\)) survived
- If survival and traveling in first class were independent:
  - We would expect about \(0.152 \times 0.312 \times 2092 \approx 100\) first-class adults to survive
  - We would expect about \(0.848 \times 0.312 \times 2092 \approx 554\) other adults to survive
Surviving the Titanic depends on class
- More first-class passengers survived than expected:
  - \(197\) of the \(319\) adults in first class survived
  - This is much higher than the \(\approx 100\) survivors expected under independence
- Fewer other passengers survived than expected:
  - \(457\) of the \(1773\) other adults survived
  - This is much lower than the \(\approx 554\) survivors expected under independence
- Survival was therefore not independent of passenger class
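The comparison between expected and observed survivors can be reproduced from the counts given above. This is a sketch with illustrative variable names; it uses unrounded proportions, so intermediate values differ slightly from the hand calculation.

```python
total_adults = 2092
first_class = 319
survivors = 654
first_class_survivors = 197

pr_first = first_class / total_adults
pr_survive = survivors / total_adults

# Expected survivors in each group if class and survival were independent.
expected_first = pr_first * pr_survive * total_adults
expected_other = (1 - pr_first) * pr_survive * total_adults

# Each line shows: expected under independence, then observed.
print(round(expected_first), first_class_survivors)
print(round(expected_other), survivors - first_class_survivors)
```

The large gaps between expected and observed counts are what indicate dependence.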
Conditional probability
- The conditional probability of an event is the probability that the event occurs given that a condition is met
- Read the symbol \(\mid\) as “given”
- \(\operatorname{Pr}[X \mid Y]\) means the probability of X, given that Y is true
Surviving the Titanic was conditional on class
- The probability of survival depends on passenger class
- These probabilities are calculated by conditioning on class membership
\[
\operatorname{Pr}[\text{survive} \mid \text{adult in first class}]
=
\frac{197}{319}
\approx
0.62
\]
\[
\operatorname{Pr}[\text{survive} \mid \text{adult not in first class}]
=
\frac{457}{1773}
\approx
0.26
\]
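Both conditional probabilities are simple ratios of counts within the conditioning group. A minimal sketch, with `conditional_probability` as a hypothetical helper name:

```python
def conditional_probability(n_both, n_condition):
    """Pr[X | Y] as (count with both X and Y) / (count with Y)."""
    return n_both / n_condition

pr_survive_given_first = conditional_probability(197, 319)    # survivors among first class
pr_survive_given_other = conditional_probability(457, 1773)   # survivors among other adults
print(round(pr_survive_given_first, 2), round(pr_survive_given_other, 2))
```

Note that the denominator is the size of the group being conditioned on, not the total number of adults.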
The Law of Total Probability
- The total probability of an event can be calculated by summing over all possible conditions
- Each term is a conditional probability, weighted by how common that condition is
\[
\operatorname{Pr}[X]
=
\sum_i \operatorname{Pr}[X \mid Y_i] \times \operatorname{Pr}[Y_i]
\]
The total probability of surviving the Titanic
- Applying this to survival on the Titanic:
\[
\operatorname{Pr}[\text{survive}]
=
\sum_i
\operatorname{Pr}[\text{survive} \mid \text{class}_i]
\times
\operatorname{Pr}[\text{class}_i]
\]
\[
\begin{aligned}
\operatorname{Pr}[\text{survive}]
=
&\operatorname{Pr}[\text{survive} \mid \text{1st class}] \times \operatorname{Pr}[\text{1st class}]
+\\
&\operatorname{Pr}[\text{survive} \mid \text{not 1st class}] \times \operatorname{Pr}[\text{not 1st class}]
\end{aligned}
\]
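\[
\operatorname{Pr}[\text{survive}]
\approx
0.62 \times 0.152 + 0.26 \times 0.848
\approx
0.31
\]
- This agrees with the observed proportion of survivors, \(654/2092 \approx 0.313\); small differences in the third decimal place come from rounding the inputs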
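The weighted sum can be checked with exact fractions built from the counts in the Titanic example. This is a sketch; the dictionary names are illustrative.

```python
from fractions import Fraction

# Pr[class] and Pr[survive | class] from the adult passenger counts.
pr_class = {"first": Fraction(319, 2092), "other": Fraction(1773, 2092)}
pr_survive_given = {"first": Fraction(197, 319), "other": Fraction(457, 1773)}

# Law of total probability: sum of Pr[survive | class] * Pr[class] over classes.
pr_survive = sum(pr_survive_given[c] * pr_class[c] for c in pr_class)
print(pr_survive, round(float(pr_survive), 3))
```

With unrounded inputs the result equals 654/2092 exactly (about 0.313), so a hand calculation with rounded inputs can differ slightly in the third decimal place.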
Probability trees for conditional probabilities
- Apply the probability tree to the Titanic survival example
- Multiply along each path to get path probabilities
- Add the survival paths to get the overall probability of survival
\[
\operatorname{Pr}[\text{Survive}]
=
0.094 + 0.220
=
0.314
\]
Summary: the addition principle
- Use the addition principle when calculating the probability of A or B
- Add probabilities of events that can occur instead of one another
- Subtract any overlap to avoid double counting
\[
\operatorname{Pr}[A \text{ or } B]
=
\operatorname{Pr}[A]
+
\operatorname{Pr}[B]
-
\operatorname{Pr}[A \text{ and } B]
\]
- If events are mutually exclusive, the overlap term is \(0\)
Summary: the multiplication principle
- Use the multiplication principle when calculating the probability of A and B
- Multiply probabilities of events that occur together
- Conditional probability is required when events are dependent
\[
\operatorname{Pr}[A \text{ and } B]
=
\operatorname{Pr}[A] \times \operatorname{Pr}[B \mid A]
\]
- If events are independent, this simplifies to
\(\operatorname{Pr}[A \text{ and } B] = \operatorname{Pr}[A] \times \operatorname{Pr}[B]\)
Bayes’ theorem
- Bayes’ theorem allows us to reverse a conditional probability
- It tells us how to find the probability of A given B using:
  - The probability of B given A
  - How common A is
  - The overall probability of B
\[
\operatorname{Pr}[A \mid B]
=
\frac{\operatorname{Pr}[B \mid A] \times \operatorname{Pr}[A]}
{\operatorname{Pr}[B]}
\]
- Bayes’ theorem is not a new rule, but a rearrangement of ideas you already know
Applying Bayes’ theorem: the Titanic
- Find the probability that an adult survivor was in first class
- Of the \(2092\) adults on the Titanic:
  - \(319\) were in first class
  - \(197\) of the \(319\) first-class adults survived
  - \(457\) of the \(1773\) other adults survived
\[
\operatorname{Pr}[\text{1st class} \mid \text{survive}]
=
\frac{
\operatorname{Pr}[\text{survive} \mid \text{1st class}]
\times
\operatorname{Pr}[\text{1st class}]
}{
\operatorname{Pr}[\text{survive}]
}
\]
\[
\operatorname{Pr}[\text{1st class} \mid \text{survive}]
=
\frac{
\left(\frac{197}{319}\right)
\times
\left(\frac{319}{2092}\right)
}{
\left(\frac{197 + 457}{2092}\right)
}
=
0.301
\]
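The calculation above can be reproduced with a short function. This is a sketch; `bayes` is a hypothetical helper name, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def bayes(pr_b_given_a, pr_a, pr_b):
    """Return Pr[A | B] = Pr[B | A] * Pr[A] / Pr[B]."""
    return pr_b_given_a * pr_a / pr_b

pr_first_given_survive = bayes(
    Fraction(197, 319),         # Pr[survive | 1st class]
    Fraction(319, 2092),        # Pr[1st class]
    Fraction(197 + 457, 2092),  # Pr[survive]
)
print(pr_first_given_survive, round(float(pr_first_given_survive), 3))
```

The result simplifies to 197/654: the first-class survivors divided by all adult survivors.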
When Bayes’ theorem applies
- Bayes’ theorem is used to reverse a conditional probability
- It applies when:
  - You know \(\operatorname{Pr}[B \mid A]\), and
  - You want \(\operatorname{Pr}[A \mid B]\)
- The condition you observe is not the condition you care about
Use Bayes’ theorem when the probability you want is reversed from the probability you know.
Bayes’ theorem: examples and takeaway
Examples where Bayes applies
- Medical testing:
  Known \(\operatorname{Pr}[\text{positive} \mid \text{disease}]\) →
  Want \(\operatorname{Pr}[\text{disease} \mid \text{positive}]\)
- Titanic:
  Known \(\operatorname{Pr}[\text{survive} \mid \text{class}]\) →
  Want \(\operatorname{Pr}[\text{class} \mid \text{survive}]\)
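A sketch of the medical testing case. All three input values below are hypothetical, chosen only for illustration; they do not describe any real test or disease.

```python
from fractions import Fraction

# Hypothetical values, chosen only for illustration.
pr_disease = Fraction(1, 100)              # prevalence, Pr[disease]
pr_pos_given_disease = Fraction(9, 10)     # Pr[positive | disease]
pr_pos_given_healthy = Fraction(5, 100)    # Pr[positive | no disease]

# The law of total probability gives the denominator, Pr[positive].
pr_positive = (pr_pos_given_disease * pr_disease
               + pr_pos_given_healthy * (1 - pr_disease))

# Bayes' theorem reverses the conditional probability.
pr_disease_given_pos = pr_pos_given_disease * pr_disease / pr_positive
print(pr_disease_given_pos, round(float(pr_disease_given_pos), 3))
```

Under these assumed inputs, only about 15% of positive results come from people with the disease, even though the test detects 90% of true cases, because the disease is rare.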
Example where Bayes is not needed
- If you already have \(\operatorname{Pr}[A \mid B]\) and that is what you want
What to know:
- Bayes’ theorem helps compute the probability of a cause given an observed outcome
- It combines conditional probability, prior probability, and total probability
- You should be able to recognize when Bayes’ theorem applies and interpret a worked example