ABD 3e Chapter 6
By the end of this presentation, students will be able to:
Estimation
What is the value of a population parameter?
How uncertain is that estimate?
How large is the effect?
Hypothesis testing
Is the population parameter consistent with a specific value?
Is the observed effect plausibly due to random variation alone?
Estimation emphasizes magnitude and uncertainty; hypothesis testing emphasizes compatibility with a specific model.
Null model
Describes how data would vary if only random variation were operating
Baseline probability model against which observed results can be compared
Generates a distribution of plausible sampling outcomes (null distribution)
A null model defines what “random variation alone” would look like.
Hypothesis testing
Compares observed data to null distribution
If observed data fall in the extreme tail of that distribution, the data are unlikely under the null model
The null hypothesis (\(H_0\))
Is a specific claim about a population parameter
That claim defines a reference model for comparison.
A good null hypothesis is one that, if rejected, would meaningfully change our understanding of the system.
The alternative hypothesis (\(H_A\))
Includes all feasible values of the parameter other than the one stated by \(H_0\)
A null hypothesis specifies a value for a population parameter
That specification defines a probabilistic data-generating process
Repeating that process produces a null distribution
Example: Two heterozygous parents reproduce (Aa × Aa)
Data-generating process: meiosis produces gametes A and a with \(p=0.5\) each, fertilization pairs gametes at random
Expected offspring genotype proportions: AA = 0.25, Aa = 0.50, aa = 0.25
These expected proportions define the null distribution for genotype outcomes in offspring
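The gamete-pairing argument above can be checked by enumerating the pairings directly. A minimal sketch in Python (using `fractions` so the arithmetic stays exact):

```python
from fractions import Fraction
from collections import Counter

# Each heterozygous parent (Aa) transmits A or a with probability 1/2.
gametes = {"A": Fraction(1, 2), "a": Fraction(1, 2)}

# Random fertilization: multiply gamete probabilities over every pairing.
offspring = Counter()
for g1, p1 in gametes.items():
    for g2, p2 in gametes.items():
        genotype = "".join(sorted(g1 + g2))  # "Aa" and "aA" are the same genotype
        offspring[genotype] += p1 * p2

for genotype, prob in sorted(offspring.items()):
    print(genotype, prob)
```

The enumeration recovers the expected proportions AA = 1/4, Aa = 1/2, aa = 1/4.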
| Step | Question being answered | What we do | Statistical meaning |
|---|---|---|---|
| 1 | What claim are we testing? | State the hypotheses \(H_0\) and \(H_A\) | Specify a model that defines \(\operatorname{Pr}[\text{data} \mid H_0]\) |
| 2 | How far do the data depart from the null? | Compute a test statistic | Measure distance from what is expected under \(H_0\) |
| 3 | How unusual is this result? | Compute the \(P\)-value | \(\operatorname{Pr}[\text{data as or more extreme} \mid H_0]\) |
| 4 | What do we conclude? | Translate statistical evidence into a biological conclusion | Make a decision about \(H_0\) and state the conclusion in biological terms |
Humans are predominantly right-handed.
Do other animals exhibit consistent forelimb bias?
Bisazza et al. (1996) tested for handedness in European toads (*Bufo bufo*) by observing forelimb use in 18 wild-caught individuals.
This example comes from the textbook (Whitlock & Schluter, *The Analysis of Biological Data*, 3e).
In the lab, a balloon was wrapped around each individual’s head.
For each individual, researchers recorded whether the right or left forelimb was used to remove it.
The response variable was forelimb choice (right vs. left).
\[ \hat{p} = \frac{14}{18} \approx 0.78 \]
Is 0.78 a sufficiently unusual result under the null model to reject the null hypothesis that toads do not exhibit handedness?
\(\hat{p} = 0.78\) is the sample proportion of right-handed toads.
Let \(p\) denote the true proportion of right-handed toads in the population.
Hypothesis testing is about \(p\), not \(\hat{p}\).
Is \(p = 0.50\), or is \(p \neq 0.50\)?
We want to test whether right- and left-handed toads occur with equal frequency in the population.
The null hypothesis represents no handedness:
\[ H_0: p = 0.50 \]
\[ H_A: p \neq 0.50 \]
The test statistic is a number calculated from the data that summarizes how far the observed result departs from what is expected under \(H_0\).
In this study, the test statistic is the number of right-handed toads in the sample.
\[ \text{Test statistic} = 14 \]
Under \(H_0\), the expected number of right-handed toads is
\[ 0.50 \times 18 = 9 \]
If \(H_0\) is true, would we always observe exactly 9 right-handed toads?
Even if \(H_0: p = 0.50\) is true, we would not expect to observe exactly 9 right-handed toads every time we sample 18.
Instead, repeated samples would produce a distribution of possible outcomes.
Under \(H_0\), the number of right-handed toads in a sample of 18 follows a binomial distribution:
\[ X \sim \text{Binomial}(n = 18, p = 0.50) \]
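The idea that repeated sampling generates a distribution of outcomes can be made concrete by simulation. A sketch (the seed and number of replicates are arbitrary choices):

```python
import random
from collections import Counter

random.seed(1)
n, p, reps = 18, 0.50, 100_000

# Each replicate: "sample" 18 toads under H0 and count right-forelimb users.
counts = Counter(
    sum(random.random() < p for _ in range(n)) for _ in range(reps)
)

# The simulated frequencies approximate the Binomial(18, 0.5) null distribution.
for k in range(n + 1):
    print(f"{k:2d}: {counts[k] / reps:.4f}")
```

The most frequent outcome is 9 (the expected count), but values near 9 occur often, and extreme values like 14 or more are rare.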
The \(P\)-value measures how unusual the observed result is under \(H_0\).
It is the probability of observing data as extreme as or more extreme than what we observed, assuming \(H_0\) is true.
\[ P = \operatorname{Pr}[\text{data as or more extreme} \mid H_0] \]
Our observed test statistic is 14 right-handed toads.
Probability of 14 or more right-handed toads:
\[ \operatorname{Pr}[X \ge 14] = \operatorname{Pr}[X = 14]+\operatorname{Pr}[X = 15]+\operatorname{Pr}[X = 16]+\operatorname{Pr}[X = 17]+\operatorname{Pr}[X = 18] \approx 0.0154 \]
\[ P = 2 \times \operatorname{Pr}[X \ge 14] = 0.031 \]
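This calculation can be reproduced exactly with the binomial probability formula. A pure-Python sketch using `math.comb`:

```python
from math import comb

n, p, observed = 18, 0.50, 14

def binom_pmf(k, n, p):
    """Pr[X = k] for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Upper-tail probability: Pr[X >= 14] under H0
upper_tail = sum(binom_pmf(k, n, p) for k in range(observed, n + 1))

# Two-sided P-value: the null distribution is symmetric, so double the tail
p_value = 2 * upper_tail

print(f"Pr[X >= 14] = {upper_tail:.4f}")  # 0.0154
print(f"P = {p_value:.3f}")               # 0.031
```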
The significance level, \(\alpha\), is a probability threshold used to decide whether to reject \(H_0\).
It represents the probability of rejecting \(H_0\) when \(H_0\) is actually true (Type I error).
A common choice is
\[ \alpha = 0.05 \]
Decision rule: reject \(H_0\) if \(P < \alpha\).
\[ P = 0.031 \]
\[ 0.031 < 0.05 \]
We reject \(H_0\).
There is statistical evidence that right- and left-handed toads do not occur with equal frequency in the population.
The data suggest a forelimb bias in European toads.
At minimum, report the test statistic, the sample size, and the \(P\)-value.
In addition, report an estimate of the parameter and its uncertainty:
\[ \hat{p} = 0.78 \]
Example:
In a sample of 18 European toads, 14 (78%) used their right forelimb. A two-sided test indicated that this differed significantly from 0.50 (\(P = 0.031\)), suggesting a forelimb bias in the population.
A Type I error occurs when we reject \(H_0\) even though \(H_0\) is true.
A Type II error occurs when we fail to reject \(H_0\) even though \(H_0\) is false.
Power is the probability of correctly rejecting \(H_0\) when it is false.
There is a tradeoff: decreasing \(\alpha\) generally increases \(\beta\) (the probability of a Type II error), unless sample size increases.
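The tradeoff can be illustrated numerically for the toad design (n = 18). A sketch; the true proportion \(p = 0.75\) is a hypothetical alternative chosen purely for illustration:

```python
from math import comb

def pmf(k, n, p):
    """Pr[X = k] for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p(k, n):
    """Exact two-sided P-value under H0: p = 0.5 (symmetric, so double the smaller tail)."""
    lower = sum(pmf(j, n, 0.5) for j in range(0, k + 1))
    upper = sum(pmf(j, n, 0.5) for j in range(k, n + 1))
    return min(1.0, 2 * min(lower, upper))

n, p_true = 18, 0.75  # p_true is a hypothetical alternative (assumption)

betas = {}
for alpha in (0.05, 0.01):
    # Rejection region: outcomes whose two-sided P-value is at or below alpha
    reject = {k for k in range(n + 1) if two_sided_p(k, n) <= alpha}
    # beta = probability of landing in the acceptance region when p_true is the truth
    betas[alpha] = sum(pmf(k, n, p_true) for k in range(n + 1) if k not in reject)
    print(f"alpha = {alpha}: beta = {betas[alpha]:.3f}")
```

Tightening \(\alpha\) from 0.05 to 0.01 shrinks the rejection region, so \(\beta\) rises: fewer false positives are bought at the price of more missed effects.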
Table summarizing possible combinations of reality (was \(H_0\) true or not?) versus the conclusion of a statistical hypothesis test (did you reject \(H_0\) or not?)
| Conclusion | Reality: \(H_0\) true | Reality: \(H_0\) false |
|---|---|---|
| Reject \(H_0\) | Type I error | Correct (power) |
| Do not reject \(H_0\) | Correct | Type II error |
If \(P > \alpha\), we fail to reject \(H_0\).
This is sometimes described as a nonsignificant result.
We do not conclude that \(H_0\) is true or that \(H_A\) is false.
We conclude only that the data are compatible with \(H_0\).
A nonsignificant result may occur because \(H_0\) is true, or because \(H_0\) is false but the test lacked the power to detect the effect.
A one-sided test specifies the direction of departure in advance:
\[ H_A: p > p_0 \quad \text{or} \quad H_A: p < p_0 \]
We reject \(H_0\) only if the data depart from \(H_0\) in that specified direction.
A one-sided test should be used only if the direction is justified before examining the data.
In most scientific studies, two-sided tests are preferred because effects could plausibly occur in either direction.
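For the toad data, the relationship between one- and two-sided \(P\)-values is easy to verify. A sketch; note that the one-sided test here would be legitimate only if a rightward bias had been predicted before seeing the data:

```python
from math import comb

n = 18
# Exact binomial upper tail under H0: p = 0.5
pr_ge_14 = sum(comb(n, k) for k in range(14, n + 1)) / 2**n

one_sided_p = pr_ge_14      # H_A: p > 0.5
two_sided_p = 2 * pr_ge_14  # H_A: p != 0.5 (null distribution is symmetric)

print(f"one-sided P = {one_sided_p:.4f}")
print(f"two-sided P = {two_sided_p:.4f}")
```

With a symmetric null distribution the two-sided \(P\)-value is exactly twice the one-sided value, which is why choosing a one-sided test after seeing the data inflates the apparent evidence.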
Example: A one-sided test in toxicology
Question: Does exposure to a pollutant increase amphibian mortality?
Let \(p\) denote the mortality rate in exposed individuals, and \(p_0\) the mortality rate in controls.
\[ H_0: p = p_0 \] \[ H_A: p > p_0 \]
The alternative is one-sided because the biological concern is an increase in mortality.
A decrease in mortality would not support the claim of toxicity.
A 95% confidence interval provides an estimate of a parameter and a measure of uncertainty.
For a two-sided test at \(\alpha = 0.05\): if the 95% confidence interval excludes the null value, the test rejects \(H_0\); if the interval includes it, the test does not (at least approximately, for discrete data).
Thus, confidence intervals and hypothesis tests often lead to the same conclusion.
Hypothesis tests are useful when the goal is to evaluate a specific scientific claim (e.g., \(p = 0.50\)), while confidence intervals emphasize estimation and effect size.
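The agreement can be checked for the toad data with an approximate 95% confidence interval. This sketch uses the Wilson score interval (the textbook favors the closely related Agresti-Coull interval; Wilson is used here purely for illustration):

```python
from math import sqrt

def wilson_ci(x, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p_hat = x / n
    center = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return center - half, center + half

lo, hi = wilson_ci(14, 18)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")

# The interval excludes 0.50, matching the rejection of H0 at alpha = 0.05.
print("excludes 0.50:", not (lo <= 0.50 <= hi))
```

The interval lies entirely above 0.50, consistent with the two-sided test's rejection of \(H_0: p = 0.50\).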

BIOL 275 Biostatistics | Spring 2026