Lecture 7
Probability

ABD 3e Chapter 5

Chris Merkord

Learning Objectives

By the end of this lecture, you should be able to:

  • Explain probability as long-run relative frequency and how it links samples to populations.

  • Distinguish between discrete probability distributions and continuous probability densities.

  • Apply the addition rule and multiplication rule of probability.

  • Determine whether events are mutually exclusive, independent, or dependent.

  • Compute and interpret conditional probabilities, including simple multi-step processes.

  • Use probability trees to calculate probabilities across sequential events.

Probability theory: the foundation of statistical thinking

  • Probability is the true relative frequency of an event
  • It is the proportion of times the event would occur if the same process were repeated many times
  • Probability values range from 0 to 1
  • Written as Pr(event)
A horizontal probability scale labeled from 0 (impossible) to 1 (certain), with intermediate labels unlikely, even chance, and likely, and example icons illustrating a 1-in-6 chance, a fair coin toss, and a 4-in-5 chance.
Figure 1: Conceptual probability scale from impossible to certain, showing that probability values range from 0 to 1, with examples including a 1-in-6 chance, an even chance, and a 4-in-5 chance. Source: Illustration generated by ChatGPT.

Probability theory and statistics

  • Probability theory links sample estimates to population parameters
  • Samples:
    • Come from data
    • Vary by chance
  • Populations:
    • Usually unobserved
    • Have fixed values
  • Probability explains how chance affects what we observe

Venn diagrams and sample space

  • A Venn diagram represents all possible outcomes of a random trial
  • The entire diagram corresponds to the sample space
  • Events are shown as regions within the sample space
  • Larger areas represent events with higher probability

Example: \(\operatorname{Pr}[A] > \operatorname{Pr}[B]\)

A Venn diagram with two non-overlapping circles labeled A and B. Circle A is larger than circle B, indicating a higher probability for event A than event B. The circles are different colors and partially transparent on a transparent background.
Figure 2: Venn diagram representing the sample space of a random trial, with two mutually exclusive events where event A has a larger area than event B.

Mutually exclusive vs. non-exclusive events

Mutually exclusive events

  • Cannot occur at the same time

  • Outcomes belong to only one event

\[ \operatorname{Pr}[A \text{ and } B] = 0 \]

A Venn diagram with two non-overlapping circles labeled A and B, indicating mutually exclusive events with no shared outcomes.
Figure 3: Venn diagram of two mutually exclusive events, where events A and B do not overlap.

Not mutually exclusive events

  • Can occur at the same time

  • Outcomes may belong to both events

\[ \operatorname{Pr}[A \text{ and } B] > 0 \]

A Venn diagram with two overlapping circles labeled A and B, showing a shared region that represents outcomes common to both events.
Figure 4: Venn diagram of two non-exclusive events, where events A and B overlap.

Probability distributions

  • The true relative frequency of all possible values of a random variable
  • Some probability distributions:
    • Can be described mathematically
    • Are simply a list of possible outcomes with their probabilities

Example: the outcomes from rolling a fair six-sided die

A probability distribution showing six discrete outcomes from a fair die, with each outcome assigned the same probability value.
Figure 5: Probability distribution for the outcomes of a fair six-sided die, where each possible outcome has equal probability.

Probability: distributions vs. densities

Probability distributions describe discrete variables

  • All discrete outcomes have finite probabilities
  • Probabilities across all outcomes sum to 1

Probability densities describe continuous variables

  • The probability of any exact value is infinitesimal
  • Probabilities are defined over ranges, not single values
  • The total area under the curve integrates to 1
Two vertically stacked plots based on a normal distribution. The top plot shows a discrete probability distribution from binning a normal distribution into intervals, with bars labeled by the bin ranges. The bottom plot shows a smooth normal probability density curve over the same range.
Figure 6: Comparison of a probability distribution and a probability density based on the same normal distribution. The top panel shows a discretized probability distribution, while the bottom panel shows the corresponding continuous probability density.
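
As a minimal Python sketch of this contrast (the variable names are illustrative, and the standard normal curve is an assumed example rather than the exact curve in Figure 6), the discrete probabilities of a fair die sum to exactly 1, while for a continuous variable a probability is an area under the density curve over a range:

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: a fair die, each of the six outcomes has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
total_discrete = sum(die.values())        # probabilities across outcomes sum to 1

# Continuous: a standard normal density (mean 0, sd 1), assumed for illustration.
z = NormalDist(mu=0, sigma=1)
pr_within_one_sd = z.cdf(1) - z.cdf(-1)   # area under the curve between -1 and 1
density_at_zero = z.pdf(0)                # height of the curve, not a probability

print(total_discrete)
print(round(pr_within_one_sd, 3), round(density_at_zero, 3))
```

The value returned by `pdf` is a density (a curve height), which is why probabilities for continuous variables are read from ranges rather than single values.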

Proportions

  • A proportion is the number of times an event occurs divided by the number of trials
  • Proportions range from 0 to 1
  • A proportion can be viewed as a realized sample from a probability distribution
  • With more trials, proportions tend to get closer to the underlying probability
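
The last point can be explored with a small simulation. This is only a sketch: the function name `proportion_of_sixes` and the fixed seed are assumptions made for illustration. It rolls a simulated fair die and reports the proportion of sixes for increasing numbers of trials.

```python
import random

def proportion_of_sixes(n_trials, seed=1):
    """Proportion of simulated fair-die rolls that come up 6."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rolls
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return hits / n_trials

# The underlying probability is 1/6 (about 0.1667).
for n in (10, 100, 10_000, 100_000):
    print(n, round(proportion_of_sixes(n), 4))
```

With few trials the proportion can land far from 1/6; with many trials it should settle close to it.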

The General Addition Principle

  • The probability of A or B counts every outcome in A and every outcome in B, but counts outcomes that belong to both only once
  • When two events overlap, their shared outcomes must be subtracted once

\[ \operatorname{Pr}[A \text{ or } B] = \operatorname{Pr}[A] + \operatorname{Pr}[B] - \operatorname{Pr}[A \text{ and } B] \]

A Venn diagram with two overlapping circles labeled A and B. The diagram shows that the combined area representing A or B equals the area of A plus the area of B, with the overlapping region subtracted once to avoid double counting.
Figure 7: Visual illustration of the General Addition Principle, showing that the probability of A or B equals the probability of A plus the probability of B minus the probability of their overlap.
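
A short check of the General Addition Principle, using one fair die and two hypothetical overlapping events chosen only for illustration (A: the roll is even; B: the roll is greater than 3):

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # faces of one fair die
A = {2, 4, 6}                     # hypothetical event: roll is even
B = {4, 5, 6}                     # hypothetical event: roll is greater than 3

def pr(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

direct = pr(A | B)                    # count the outcomes in A or B directly
by_rule = pr(A) + pr(B) - pr(A & B)   # add, then subtract the overlap once

print(direct, by_rule)
```

Both routes give the same value because the overlap (rolls of 4 and 6) is subtracted exactly once.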

A special case of the addition principle

  • When two events are mutually exclusive, they cannot occur at the same time
  • There is no overlap between events

\[ \operatorname{Pr}[A \text{ or } B] = \operatorname{Pr}[A] + \operatorname{Pr}[B] \]

A diagram of ABO and Rh blood types organized into non-overlapping regions, showing that each blood type category is mutually exclusive with the others.
Figure 8: ABO and Rh blood types arranged into mutually exclusive categories, illustrating a case where events have no overlapping outcomes.

Example: probability of a range

  • Consider the sum of two fair six-sided dice
  • Possible sums range from 2 to 12
  • We want the probability that the sum is between 6 and 8, inclusive

\[ \operatorname{Pr}[6 \text{ or } 7 \text{ or } 8] =\\ \operatorname{Pr}[6] + \operatorname{Pr}[7] + \operatorname{Pr}[8] \]

A bar chart showing the probability distribution for the sum of two dice from 2 to 12. Bars corresponding to sums of 6, 7, and 8 are highlighted, while other sums are shown in a lighter color.
Figure 9: Probability distribution of the sum of two fair six-sided dice, with outcomes 6 through 8 highlighted to illustrate calculating the probability of a range.
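
The same calculation can be sketched by enumerating all 36 equally likely rolls of two dice (variable names here are illustrative). Because the sums 6, 7, and 8 are mutually exclusive, their probabilities are simply added:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # all 36 ordered pairs
counts = Counter(a + b for a, b in rolls)      # how many pairs give each sum
pr_sum = {s: Fraction(c, len(rolls)) for s, c in counts.items()}

# Mutually exclusive outcomes: add the three probabilities.
pr_6_to_8 = pr_sum[6] + pr_sum[7] + pr_sum[8]
print(pr_sum[6], pr_sum[7], pr_sum[8], pr_6_to_8)
```

This gives 5/36 + 6/36 + 5/36 = 16/36, which `Fraction` reduces to 4/9.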

Example: probabilities sum to 1

  • Consider a single fair six-sided die
  • Group outcomes into mutually exclusive events
  • Together, these events include all possible outcomes
  • The probabilities of all such events must sum to 1
  • Event A: rolling 1–4
  • Event B: rolling 5–6

\[ \operatorname{Pr}[1\text{–}4 \text{ or } 5\text{–}6] =\\ \operatorname{Pr}[1\text{–}4] + \operatorname{Pr}[5\text{–}6] = \\ \frac{4}{6} + \frac{2}{6} = \\1 \]

The General Multiplication Principle

  • The probability that A and B both occur depends on whether the events are independent
  • In general, the probability of A and B equals the probability of A, multiplied by the probability of B given A

\[ \operatorname{Pr}[A \text{ and } B] = \operatorname{Pr}[A] \times \operatorname{Pr}[B \mid A] \]

  • Read “|” as “given”

Independence

  • Two events are independent if the occurrence of one does not change the probability of the other
  • When events are independent:

\[ \operatorname{Pr}[A \mid B] = \operatorname{Pr}[A] \]

  • In this case, the multiplication rule simplifies, because \(\operatorname{Pr}[B \mid A]\) can likewise be replaced by \(\operatorname{Pr}[B]\)
A 6 by 6 grid showing all possible outcomes of rolling two dice, labeled as ordered pairs. A highlighted row shows outcomes where the first roll is 3, and a highlighted column shows outcomes where the second roll is 3, illustrating that each has probability one sixth and that the events are independent.
Figure 10: Sample space for rolling two fair dice, illustrating independence between the first and second roll. The probability of rolling a 3 on one roll remains 1/6 regardless of the outcome of the other roll.

Probabilities for independent events

  • When two events are independent, one does not affect the probability of the other
  • The probability that both events occur is the product of their individual probabilities
  • This is a special case of the general multiplication principle

\[ \operatorname{Pr}[A \text{ and } B] = \operatorname{Pr}[A] \times \operatorname{Pr}[B] \]
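
For two fair dice, independence can be checked by enumeration. The sketch below uses illustrative helper names and takes A as "first roll is 3" and B as "second roll is 3"; it compares the conditional probability with the unconditional one and verifies the product rule.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely ordered pairs

def pr(event):
    """Probability of an event given as a predicate over (first, second) rolls."""
    favourable = sum(1 for roll in outcomes if event(roll))
    return Fraction(favourable, len(outcomes))

def first_is_3(roll):
    return roll[0] == 3

def second_is_3(roll):
    return roll[1] == 3

def both_are_3(roll):
    return first_is_3(roll) and second_is_3(roll)

# Pr[A | B] = Pr[A and B] / Pr[B]; for independent events this equals Pr[A].
pr_first_given_second = pr(both_are_3) / pr(second_is_3)
print(pr_first_given_second, pr(first_is_3))
print(pr(both_are_3), pr(first_is_3) * pr(second_is_3))
```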

Example: Oguchi disease

  • Oguchi disease is an autosomal recessive condition
  • The disease is expressed only if an individual inherits:
    • One mutant allele from mom
    • One mutant allele from dad
  • Parents who carry one mutant allele typically do not show symptoms
Side-by-side fundus photographs of the retina showing abnormal retinal coloration and vascular patterns associated with Oguchi disease and its progression to retinitis pigmentosa, with visible changes in retinal structure and blood vessels over long-term disease progression.
Figure 11: Fundus photographs illustrating retinal changes associated with Oguchi disease and its progression to retinitis pigmentosa, showing characteristic alterations in retinal appearance and vasculature after long-term disease progression (Nishiguchi et al. 2020).

Question: Child of Two Oguchi Carriers

Question

If both parents have one copy of the disease allele, what is the probability that a given child will have Oguchi disease?

Thinking

A child has a \(\frac{1}{2}\) chance of inheriting mom’s affected chromosome AND a \(\frac{1}{2}\) chance of inheriting dad’s affected chromosome.

Answer

The probability that a given child of heterozygotes has the disease is \(\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}\).
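
The same answer can be reached by listing the four equally likely allele combinations. In this sketch the allele labels `"A"` (normal) and `"a"` (mutant) are assumptions chosen for illustration:

```python
from fractions import Fraction
from itertools import product

# Each carrier parent passes on one of two alleles with equal probability.
mom = ("A", "a")   # "A" = normal allele, "a" = mutant allele (illustrative labels)
dad = ("A", "a")

genotypes = list(product(mom, dad))   # 4 equally likely (from mom, from dad) pairs
affected = [g for g in genotypes if g == ("a", "a")]   # needs a mutant allele from both

pr_affected = Fraction(len(affected), len(genotypes))
print(genotypes, pr_affected)
```

Only one of the four combinations carries two mutant alleles, matching the product of the two independent 1/2 chances.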

Visualization: two children of Oguchi carriers

  • If both parents have an affected chromosome but no disease:

    • What’s the probability that both of their children will have Oguchi disease?

Answer:

\[ \operatorname{Pr}[\text{A affected and B affected}]= \\ \operatorname{Pr}[\text{A affected}] \times \operatorname{Pr}[\text{B affected}]= \\ \frac{1}{4} \times \frac{1}{4}=\frac{1}{16} \]

A 2×2 grid representing outcomes for two children of Oguchi disease carriers. The horizontal axis indicates whether child two is affected (No, Yes), and the vertical axis indicates whether child one is affected (No, Yes). The cells are labeled with the number of affected children (0, 1, or 2), with darker shading indicating outcomes with more affected children.
Figure 12: Outcome grid showing the possible numbers of children affected by Oguchi disease when two children are born to heterozygous carrier parents, illustrating the probabilities of zero, one, or two affected children.

Probability trees

  • Probability tree: a diagram that can be used to calculate the probabilities of combinations of events resulting from multiple random trials

Let’s revisit the example of two children of Oguchi carriers

Probability Tree Step 1: Write down all possible outcomes for event one, event two, etc., and connect them

Probability Tree Step 2: Write down the probability of each outcome, conditional on their path

Probability Tree Step 3: Multiply the probabilities along each path to get the probability of that path

Probability Tree Step 4: Sum paths that lead to the same destination

\[ \operatorname{Pr}[2 \text{ affected}] = \frac{1}{16} \]

\[ \operatorname{Pr}[1 \text{ affected}] = \frac{6}{16} \]

\[ \operatorname{Pr}[0 \text{ affected}] = \frac{9}{16} \]
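
The tree for two children can be sketched in code under the assumption, used above, that each child is affected independently with probability 1/4 (variable names are illustrative). Each path is a pair of branches; branch probabilities are multiplied along the path, and paths ending at the same number of affected children are summed.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

p_affected = Fraction(1, 4)
branch = {True: p_affected, False: 1 - p_affected}   # affected / not affected

pr_n_affected = defaultdict(Fraction)   # missing keys start at Fraction(0)
for child1, child2 in product(branch, repeat=2):
    path_prob = branch[child1] * branch[child2]   # multiply along the path
    n_affected = child1 + child2                  # True counts as 1, False as 0
    pr_n_affected[n_affected] += path_prob        # sum paths with the same destination

for n in (2, 1, 0):
    print(n, pr_n_affected[n])
```

Note that `Fraction` prints reduced values, so 6/16 appears as 3/8. Two paths lead to exactly one affected child, which is why that probability is 2 × (1/4 × 3/4).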

Dependent events

  • Two events are dependent if the occurrence of one changes the probability of the other
  • Knowing that one event occurred provides information about the other
  • In this case, the unconditional probabilities cannot simply be multiplied; a conditional probability is needed

\[ \operatorname{Pr}[A \text{ and } B] \neq \operatorname{Pr}[A] \times \operatorname{Pr}[B] \]

Example: surviving the Titanic

  • Of the \(2092\) adults on the Titanic:
    • \(319\) (approximately \(0.152\)) traveled in first class (more expensive)
    • \(654\) (approximately \(0.312\)) survived
  • If survival and traveling in first class were independent:
    • We would expect about \(\frac{319}{2092} \times \frac{654}{2092} \times 2092 \approx 100\) first-class adults to survive
    • We would expect about \(\frac{1773}{2092} \times \frac{654}{2092} \times 2092 \approx 554\) other adults to survive

Survivors of the RMS Titanic aboard a lifeboat, illustrating unequal survival outcomes during the disaster. Source: Public domain, via Wikimedia Commons.

Surviving the Titanic depends on class

  • More first-class passengers survived than expected:
    • \(197\) of the \(319\) adults in first class survived
    • This is much higher than the \(\approx 100\) survivors expected under independence
  • Fewer other passengers survived than expected:
    • \(457\) of the \(1773\) other adults survived
    • This is much lower than the \(\approx 554\) survivors expected under independence
  • Survival was therefore not independent of passenger class
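
The comparison between expected and observed survivors can be reproduced from the counts given above. This is a sketch with illustrative variable names; it uses the unrounded proportions, so the expected counts come out close to 100 and 554.

```python
n_adults = 2092
n_first = 319
n_survived = 654
observed_first_survivors = 197
observed_other_survivors = 457

pr_first = n_first / n_adults
pr_survive = n_survived / n_adults

# Under independence, Pr[class and survive] = Pr[class] x Pr[survive].
expected_first = pr_first * pr_survive * n_adults
expected_other = (1 - pr_first) * pr_survive * n_adults

print(round(expected_first, 1), observed_first_survivors)
print(round(expected_other, 1), observed_other_survivors)
```

The observed counts differ from the expected ones in opposite directions for the two groups, which is the pattern described in the bullets above.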

Conditional probability

  • The conditional probability of an event is the probability that the event occurs given that a condition is met
  • Read the symbol | as “given”
  • \(\operatorname{Pr}[X \mid Y]\) means the probability of X, given that Y is true

Surviving the Titanic was conditional on class

  • The probability of survival depends on passenger class
  • These probabilities are calculated by conditioning on class membership

\[ \operatorname{Pr}[\text{survive} \mid \text{adult in first class}] = \frac{197}{319} \approx 0.62 \]

\[ \operatorname{Pr}[\text{survive} \mid \text{adult not in first class}] = \frac{457}{1773} \approx 0.26 \]

The Law of Total Probability

  • The total probability of an event can be calculated by summing over all possible conditions
  • Each term is a conditional probability, weighted by how common that condition is

\[ \operatorname{Pr}[X] = \sum_i \operatorname{Pr}[X \mid Y_i] \times \operatorname{Pr}[Y_i] \]

The total probability of surviving the Titanic

  • Applying this to survival on the Titanic:

\[ \operatorname{Pr}[\text{survive}] = \sum_i \operatorname{Pr}[\text{survive} \mid \text{class}_i] \times \operatorname{Pr}[\text{class}_i] \]

\[ \begin{aligned} \operatorname{Pr}[\text{survive}] = &\operatorname{Pr}[\text{survive} \mid \text{1st class}] \times \operatorname{Pr}[\text{1st class}] +\\ &\operatorname{Pr}[\text{survive} \mid \text{not 1st class}] \times \operatorname{Pr}[\text{not 1st class}] \end{aligned} \]

\[ \operatorname{Pr}[\text{survive}] = 0.62 \times 0.152 + 0.26 \times 0.848 \approx 0.094 + 0.220 = 0.314 \]
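
As a sketch of the same calculation with exact fractions (the helper name `total_probability` is hypothetical), the weighted sum recovers the overall proportion of adult survivors, 654 of 2092. The exact value is about 0.313; the 0.314 above differs slightly because it is built from rounded inputs.

```python
from fractions import Fraction

def total_probability(terms):
    """Sum Pr[X | Y_i] * Pr[Y_i] over (conditional, weight) pairs."""
    weights = [weight for _, weight in terms]
    # The conditions Y_i must be mutually exclusive and cover every case.
    assert sum(weights) == 1, "condition probabilities must sum to 1"
    return sum(conditional * weight for conditional, weight in terms)

n_adults = 2092
terms = [
    (Fraction(197, 319), Fraction(319, n_adults)),     # first class
    (Fraction(457, 1773), Fraction(1773, n_adults)),   # not first class
]

pr_survive = total_probability(terms)
print(pr_survive, round(float(pr_survive), 3))
```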

Probability trees for conditional probabilities

  • Apply the probability tree to the Titanic survival example
  • Multiply along each path to get path probabilities
  • Add the survival paths to get the overall probability of survival

\[ \operatorname{Pr}[\text{Survive}] = 0.094 + 0.220 = 0.314 \]

A probability tree diagram for adult Titanic passengers, split first by class and then by survival outcome. Each complete path shows the combined probability, and the two survival paths are summed to obtain the total probability of surviving.
Figure 13: Probability tree showing survival on the Titanic by passenger class, with branch probabilities for first class versus not first class and conditional survival outcomes.

Summary: the addition principle

  • Use the addition principle when calculating the probability of A or B
  • Add the probabilities of alternative events, where either one may occur
  • Subtract any overlap to avoid double counting

\[ \operatorname{Pr}[A \text{ or } B] = \operatorname{Pr}[A] + \operatorname{Pr}[B] - \operatorname{Pr}[A \text{ and } B] \]

  • If events are mutually exclusive, the overlap term is \(0\)

Summary: the multiplication principle

  • Use the multiplication principle when calculating the probability of A and B
  • Multiply probabilities of events that occur together
  • Conditional probability is required when events are dependent

\[ \operatorname{Pr}[A \text{ and } B] = \operatorname{Pr}[A] \times \operatorname{Pr}[B \mid A] \]

  • If events are independent, this simplifies to
    \(\operatorname{Pr}[A \text{ and } B] = \operatorname{Pr}[A] \times \operatorname{Pr}[B]\)

Bayes’ theorem

  • Bayes’ theorem allows us to reverse a conditional probability
  • It tells us how to find the probability of A given B using:
    • The probability of B given A
    • How common A is
    • The overall probability of B

\[ \operatorname{Pr}[A \mid B] = \frac{\operatorname{Pr}[B \mid A] \times \operatorname{Pr}[A]} {\operatorname{Pr}[B]} \]

  • Bayes’ theorem is not a new rule, but a rearrangement of ideas you already know

Applying Bayes’ theorem: the Titanic

  • Find the probability that an adult survivor was in first class
  • Of the \(2092\) adults on the Titanic:
    • \(319\) were in first class
    • \(197\) of the \(319\) first-class adults survived
    • \(457\) of the other adults survived

\[ \operatorname{Pr}[\text{1st class} \mid \text{survive}] = \frac{ \operatorname{Pr}[\text{survive} \mid \text{1st class}] \times \operatorname{Pr}[\text{1st class}] }{ \operatorname{Pr}[\text{survive}] } \]

\[ \operatorname{Pr}[\text{1st class} \mid \text{survive}] = \frac{ \left(\frac{197}{319}\right) \times \left(\frac{319}{2092}\right) }{ \left(\frac{197 + 457}{2092}\right) } = \frac{197}{654} \approx 0.301 \]
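
A minimal sketch of this calculation in Python, using a hypothetical helper named `bayes` and the counts listed above. Working in exact fractions shows that the answer reduces to 197 first-class survivors out of 654 survivors in total.

```python
from fractions import Fraction

def bayes(pr_b_given_a, pr_a, pr_b):
    """Return Pr[A | B] from Pr[B | A], Pr[A], and Pr[B]."""
    return pr_b_given_a * pr_a / pr_b

pr_survive_given_first = Fraction(197, 319)   # Pr[survive | 1st class]
pr_first = Fraction(319, 2092)                # Pr[1st class]
pr_survive = Fraction(197 + 457, 2092)        # Pr[survive], all adults

pr_first_given_survive = bayes(pr_survive_given_first, pr_first, pr_survive)
print(pr_first_given_survive, round(float(pr_first_given_survive), 3))
```

The two conditional probabilities are quite different: about 0.62 of first-class adults survived, but only about 0.30 of adult survivors were in first class, because first-class passengers were a small share of all adults.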

When Bayes’ theorem applies

  • Bayes’ theorem is used to reverse a conditional probability
  • It applies when:
    • You know \(\operatorname{Pr}[B \mid A]\), and
    • You want \(\operatorname{Pr}[A \mid B]\)
  • The condition you observe is not the condition you care about

Use Bayes’ theorem when the probability you want is reversed from the probability you know.

Bayes’ theorem: examples and takeaway

Examples where Bayes applies

  • Medical testing:
    Known \(\operatorname{Pr}[\text{positive} \mid \text{disease}]\)
    Want \(\operatorname{Pr}[\text{disease} \mid \text{positive}]\)
  • Titanic:
    Known \(\operatorname{Pr}[\text{survive} \mid \text{class}]\)
    Want \(\operatorname{Pr}[\text{class} \mid \text{survive}]\)

Example where Bayes is not needed

  • If you already have \(\operatorname{Pr}[A \mid B]\) and that is what you want

What to know:

  • Bayes’ theorem helps compute the probability of a cause given an observed outcome
  • It combines conditional probability, prior probability, and total probability
  • You should be able to recognize when Bayes’ theorem applies and interpret a worked example