| Survival | Men | Women | Total |
|---|---|---|---|
| Survived | 338 | 316 | 654 |
| Died | 1329 | 109 | 1438 |
| Total | 1667 | 425 | 2092 |
ABD 3e Chapter 9
Examples:
- Treatment (Aspirin vs Placebo) × Cancer (Yes/No)
- Sex (Men/Women) × Survival (Yes/No)
- Smoking (Yes/No) × Lung Disease (Yes/No)
Core Question:
Does the distribution of one variable depend on the other?
If yes → association
If no → independence
How strong is the relationship?
Relative risk and the odds ratio quantify the magnitude of association.
Primarily defined for 2×2 tables (binary exposure × binary outcome)
Is the relationship statistically detectable?
The \(\chi^2\) contingency test evaluates evidence for association.
Applicable to any r × c contingency table (two categorical variables with two or more categories each)
A large randomized study investigated whether regular aspirin use reduces cancer risk.
Two categorical variables: treatment (aspirin or placebo) and outcome (cancer or no cancer).
Each participant falls into exactly one cell of a 2 × 2 contingency table.
Our goal:
Does cancer risk differ between the aspirin and placebo groups?
A risk is a conditional probability:
\[ \text{Risk} = \operatorname{Pr}(\text{Cancer} \mid \text{Group}) \]
We compute risk as a proportion by dividing:
\[ p=\frac{\text{number with cancer in group}}{\text{total in group}} \]
Risk of cancer in the aspirin group:
\[ \operatorname{Pr}(\text{Cancer} \mid \text{Aspirin}) \]
\[ \hat{p}_1=\frac{1438}{1438+18496}=0.0721 \]
Risk of cancer in the placebo group:
\[ \operatorname{Pr}(\text{Cancer} \mid \text{Placebo}) \]
\[ \hat{p}_2=\frac{1427}{1427+18515}=0.0716 \]
Now we compare risks across groups.
Relative risk of aspirin:
\[ RR = \frac{ \operatorname{Pr}(\text{Cancer} \mid \text{Aspirin}) }{ \operatorname{Pr}(\text{Cancer} \mid \text{Placebo}) } \]
\[ \hat{RR}=\frac{\hat{p}_1}{\hat{p}_2} \]
\[ =\frac{0.0721}{0.0716}=1.007 \]
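Interpretation:
Relative risk measures the magnitude of association between treatment and outcome. Here \(\hat{RR} \approx 1\), so the estimated cancer risk is nearly the same in the aspirin and placebo groups.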
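The risk and relative risk calculations above can be sketched in a few lines of Python. This is a minimal illustration using the counts from the aspirin example; the variable and function names are assumptions made for this sketch. Because it works from the unrounded proportions, the relative risk comes out near 1.008 rather than the 1.007 obtained from the rounded values.

```python
# Counts from the aspirin example (names are illustrative assumptions).
cancer_aspirin, no_cancer_aspirin = 1438, 18496
cancer_placebo, no_cancer_placebo = 1427, 18515


def risk(events, non_events):
    """Proportion of a group that experienced the outcome."""
    return events / (events + non_events)


p1 = risk(cancer_aspirin, no_cancer_aspirin)  # Pr(Cancer | Aspirin)
p2 = risk(cancer_placebo, no_cancer_placebo)  # Pr(Cancer | Placebo)
rr = p1 / p2                                  # relative risk of aspirin

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, RR = {rr:.3f}")
```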
Risk compares outcome to total.
Odds compare outcome to non-outcome.
Odds are not bounded between 0 and 1.
When to use: 2 variables, each with 2 categories
Risk (probability):
\[ \operatorname{Pr}(\text{Cancer}) \]
Odds:
\[ \frac{\operatorname{Pr}(\text{Cancer})} {\operatorname{Pr}(\text{No Cancer})} \]
\[ =\frac{\operatorname{Pr}(\text{Cancer})} {1 - \operatorname{Pr}(\text{Cancer})} \]
The odds ratio compares two odds:
\[ \hat{OR} = \frac{\text{Odds}(\text{Cancer} \mid \text{Aspirin})} {\text{Odds}(\text{Cancer} \mid \text{Placebo})} \]
Interpretation:
For the aspirin study:
\[ \hat{OR} = \frac{0.0777}{0.0771} = 1.008 \]
The odds of cancer are essentially the same in the aspirin and placebo groups.
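As a rough sketch, the odds and odds ratio for the aspirin study can be computed directly from the cell counts. The names below are assumptions made for illustration. Working from unrounded odds gives a value near 1.009, slightly different from the figure obtained with rounded odds.

```python
# Counts from the aspirin example (names are illustrative assumptions).
cancer_aspirin, no_cancer_aspirin = 1438, 18496
cancer_placebo, no_cancer_placebo = 1427, 18515


def odds(events, non_events):
    """Odds compare the outcome to the non-outcome, not to the total."""
    return events / non_events


odds_aspirin = odds(cancer_aspirin, no_cancer_aspirin)
odds_placebo = odds(cancer_placebo, no_cancer_placebo)
odds_ratio = odds_aspirin / odds_placebo

# The same odds can be recovered from the risk p as p / (1 - p).
p_aspirin = cancer_aspirin / (cancer_aspirin + no_cancer_aspirin)
odds_from_risk = p_aspirin / (1 - p_aspirin)

print(f"odds (aspirin) = {odds_aspirin:.4f}")
print(f"odds (placebo) = {odds_placebo:.4f}")
print(f"OR = {odds_ratio:.3f}")
```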
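The following shortcut formula uses the observed frequencies in the cells of the contingency table: \(a\) and \(b\) are the counts with the outcome in the treatment and control groups, and \(c\) and \(d\) are the corresponding counts without the outcome: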
\[ \hat{OR}=\frac{a/c}{b/d}=\frac{ad}{bc} \]
\[ \operatorname{SE}[\operatorname{ln}(\hat{OR})]=\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}} \]
For the aspirin example:
\[ \operatorname{SE}[\operatorname{ln}(\hat{OR})]=\sqrt{\frac{1}{1438}+\frac{1}{1427}+\frac{1}{18496}+\frac{1}{18515}}=0.03878 \]
\[ \operatorname{ln}(\hat{OR}) \pm Z \times \operatorname{SE}[\operatorname{ln}(\hat{OR})] \]
where \(Z=1.96\) for a 95% CI and \(Z=2.58\) for a 99% CI
To get the CI for the odds ratio itself, take the antilog of the lower and upper bounds. If \(x\) and \(y\) are the lower and upper bounds of the CI for \(\operatorname{ln}(OR)\):
\[ e^x < \operatorname{OR} < e^y \]
If OR=1 there is no association between exposure and outcome
If 95% CI includes 1, results are not statistically significant
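The confidence interval steps above can be sketched as follows, using only Python's standard `math` module. The variable names are assumptions for this sketch; the counts and \(Z=1.96\) come from the aspirin example.

```python
import math

# Cell counts from the aspirin example.
a, b = 1438, 1427      # cancer: aspirin, placebo
c, d = 18496, 18515    # no cancer: aspirin, placebo

odds_ratio = (a * d) / (b * c)
log_or = math.log(odds_ratio)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

z = 1.96  # multiplier for a 95% CI

# Build the interval on the log scale, then take the antilog of each bound.
lower = math.exp(log_or - z * se_log_or)
upper = math.exp(log_or + z * se_log_or)

print(f"OR = {odds_ratio:.3f}, SE[ln(OR)] = {se_log_or:.5f}")
print(f"95% CI: {lower:.2f} < OR < {upper:.2f}")
```

The resulting interval spans 1, which is consistent with no statistically detectable association between aspirin and cancer in these data.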
Relative Risk (RR)
Best used when
Odds Ratio (OR)
Best used when
If the outcome is rare, \(OR \approx RR\)
A case-control study investigates associations between an exposure and an outcome.
Start with individuals who already have the outcome (cases)
Select a comparison group without the outcome (controls)
Look backward to determine prior exposure status
The total population at risk is unknown, so risk cannot be estimated directly. Use odds ratio instead.
The \(\chi^2\) contingency test tests the goodness of fit to the data of the null model of independence of variables.
RR and OR allow us to estimate magnitude of association, but do not test whether an association may be caused by chance alone.
Researchers have observed that infected fish spend excessive time near the water surface
They may be more vulnerable to bird predation, which would benefit the worm
Lafferty and Morris (1996) tested whether bird predation varies with severity of infection
Fish placed into outdoor pens open to bird predation, with fish of varying infection intensity:
- highly infected
- lightly infected
- not infected
Goal: calculate the \(\chi^2\) from the data to see how different the observed frequencies are from the expected frequencies
Start with: contingency table of observed frequencies
Next step: 2a. Calculate row, column, and grand totals
Table 1. Observed Frequencies.
| | Not Infected | Lightly Infected | Highly Infected |
|---|---|---|---|
| Eaten by birds | 1 | 10 | 37 |
| Not eaten by birds | 49 | 35 | 9 |
Sum the values in each row to get row totals
Sum the values in each column to get column totals
Sum the row or column totals to get the grand total
Table 1. Observed Frequencies.
| | Not Infected | Lightly Infected | Highly Infected | Row total |
|---|---|---|---|---|
| Eaten by birds | 1 | 10 | 37 | 48 |
| Not eaten by birds | 49 | 35 | 9 | 93 |
| Column Total | 50 | 45 | 46 | 141 |
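The totals in Table 1 can be reproduced with a short Python sketch. The nested-list layout and variable names are assumptions chosen for illustration; rows are predation outcomes and columns are infection levels.

```python
# Observed frequencies from Table 1.
# Columns: not infected, lightly infected, highly infected.
observed = [
    [1, 10, 37],   # eaten by birds
    [49, 35, 9],   # not eaten by birds
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]  # zip(*) transposes rows to columns
grand_total = sum(row_totals)

print("Row totals:   ", row_totals)
print("Column totals:", col_totals)
print("Grand total:  ", grand_total)
```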
Table 2. Expected Proportions.
| | Not Infected | Lightly Infected | Highly Infected | Proportion |
|---|---|---|---|---|
| Eaten by birds | | | | |
| Not eaten by birds | | | | |
| Proportion | | | | |
\[ \hat{\operatorname{Pr}}[\text{Not infected}] = \frac{50}{141} = 0.3546 \]
Table 2. Expected Proportions.
| | Not Infected | Lightly Infected | Highly Infected | Proportion |
|---|---|---|---|---|
| Eaten by birds | | | | |
| Not eaten by birds | | | | |
| Proportion | 0.3546 | | | |
\[ \hat{\operatorname{Pr}}[\text{Eaten by birds}] = \frac{48}{141} = 0.3404 \]
Table 2. Expected Proportions.
| | Not Infected | Lightly Infected | Highly Infected | Proportion |
|---|---|---|---|---|
| Eaten by birds | | | | 0.3404 |
| Not eaten by birds | | | | |
| Proportion | 0.3546 | | | |
Table 2. Expected Proportions.
| | Not Infected | Lightly Infected | Highly Infected | Proportion |
|---|---|---|---|---|
| Eaten by birds | | | | 0.3404 |
| Not eaten by birds | | | | 0.6596 |
| Proportion | 0.3546 | 0.3192 | 0.3262 | |
Use multiplication rule:
If two events are independent (null hypothesis), probability of both occurring is probability of one times probability of the other
\[ \begin{aligned} \operatorname{Pr}[\text{not infected and eaten}] &= \operatorname{Pr}[\text{not infected}]\times\operatorname{Pr}[\text{eaten}] \\ &= 0.3546 \times 0.3404 = 0.1207 \end{aligned} \]
Table 2. Expected Proportions.
| | Not Infected | Lightly Infected | Highly Infected | Proportion |
|---|---|---|---|---|
| Eaten by birds | 0.1207 | | | 0.3404 |
| Not eaten by birds | | | | 0.6596 |
| Proportion | 0.3546 | 0.3192 | 0.3262 | |
Table 2. Expected Proportions.
| | Not Infected | Lightly Infected | Highly Infected | Proportion |
|---|---|---|---|---|
| Eaten by birds | 0.1207 | 0.1087 | 0.1110 | 0.3404 |
| Not eaten by birds | 0.2339 | 0.2105 | 0.2152 | 0.6596 |
| Proportion | 0.3546 | 0.3192 | 0.3262 | |
\[ \begin{aligned} \operatorname{Expected}[\text{not infected and eaten}] &= \operatorname{Pr}[\text{not infected and eaten}]\times G \\ &= 0.1207 \times 141 = 17.0 \end{aligned} \]
where \(G\) is the grand total.
Repeat for each cell
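Repeating the calculation for every cell is easy to sketch in Python. This is an illustration under the null hypothesis of independence: each expected frequency is the row proportion times the column proportion times the grand total. Variable names are assumptions made for the sketch.

```python
# Observed frequencies from Table 1.
# Columns: not infected, lightly infected, highly infected.
observed = [
    [1, 10, 37],   # eaten by birds
    [49, 35, 9],   # not eaten by birds
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Multiplication rule under independence, then scale by the grand total.
expected = [
    [(r / grand_total) * (c / grand_total) * grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print([round(e, 1) for e in row])
```

Rounded to one decimal place, the first row is 17.0, 15.3, 15.7 and the second is 33.0, 29.7, 30.3. Algebraically, each cell simplifies to row total × column total ÷ grand total.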
Table 3. Expected Frequencies
| | Not Infected | Lightly Infected | Highly Infected | Total |
|---|---|---|---|---|
| Eaten by birds | 17.0 | | | |
| Not eaten by birds | | | | |
| Total | | | | 141 |
Use the tables for Observed and Expected frequencies to calculate the test statistic
\[ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
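Where \(O_{ij}\) and \(E_{ij}\) are the observed and expected frequencies in row \(i\) and column \(j\), and \(r\) and \(c\) are the numbers of rows and columns.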
Table 3. Expected Frequencies
| | Not Infected | Lightly Infected | Highly Infected |
|---|---|---|---|
| Eaten by birds | 17.0 | 15.3 | 15.7 |
| Not eaten by birds | 33.0 | 29.7 | 30.3 |
For the fish data:
\[ \begin{aligned} \chi^2 &= \frac{(1-17)^2}{17} + \frac{(49-33)^2}{33} + \frac{(10-15.3)^2}{15.3} + \frac{(35-29.7)^2}{29.7} \\ &\quad + \frac{(37-15.7)^2}{15.7} + \frac{(9-30.3)^2}{30.3} \\ &= 69.5 \end{aligned} \]
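The test statistic can be checked with a short Python sketch that sums \((O - E)^2 / E\) over all cells. Names are assumptions for illustration. Note that this version keeps the expected frequencies unrounded, so the statistic comes out near 69.8 rather than the 69.5 obtained from expected values rounded to one decimal place.

```python
# Observed frequencies from Table 1.
observed = [
    [1, 10, 37],   # eaten by birds
    [49, 35, 9],   # not eaten by birds
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum the squared deviations, each scaled by its expected frequency.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(f"chi-squared = {chi2:.2f}")
```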
Method 1 - exact P-value
Method 2 - statistical table
Calculate degrees of freedom
\(\operatorname{df}=(r-1)(c-1)\); for the fish data, \(\operatorname{df}=(2-1)(3-1)=2\)
Look up critical value in a table
Decision rule: if \(\chi^2 > \text{critical value}\), reject \(H_0\)
\[ \chi^2 = 69.5 \]
\[ \text{critical value} = \chi^2_{0.05,\,2} = 5.991 \]
Therefore, reject \(H_0\) and conclude that parasite infection level and being eaten are not independent.
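Both methods can be sketched with the standard library alone, but only because this table has 2 degrees of freedom. For \(\operatorname{df}=2\), the \(\chi^2\) distribution has the closed-form upper tail \(\Pr[\chi^2 > x] = e^{-x/2}\). That shortcut is an assumption specific to this example and does not hold for other degrees of freedom, where a statistics library would be needed. The significance level of 0.05 is also an assumption of this sketch.

```python
import math

chi2 = 69.5    # test statistic from the fish example
alpha = 0.05   # significance level (assumed for this sketch)

# df = 2 ONLY: the chi-squared upper tail is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

# Inverting the same tail formula gives the critical value for df = 2.
critical_value = -2 * math.log(alpha)

print(f"P-value = {p_value:.2e}")
print(f"critical value = {critical_value:.3f}")
print("Reject H0" if chi2 > critical_value else "Fail to reject H0")
```

The critical value comes out near 5.99, and the P-value is far below 0.05, so both methods lead to the same decision.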

BIOL 275 Biostatistics | Spring 2026