ABD 3e Chapter 4
By the end of this lecture, you should be able to:
Explain why sample-based estimates vary by chance and may differ from true population parameters
Describe how the sampling distribution quantifies uncertainty in an estimate
Define the standard error as the standard deviation of a sampling distribution and a measure of estimation uncertainty
Interpret a confidence interval as a plausible range of values for a population parameter that reflects uncertainty in the estimate
Repeated samples drawn from the same population will differ from one another due to random chance
As a result, estimates calculated from samples (e.g., means, proportions) will vary even when the underlying population is unchanged
This variability is called sampling error, but it does not imply a mistake or bias — it is an inherent consequence of sampling
Draw a random sample of size \(n\) from a population
Calculate a statistic of interest (e.g., the sample mean)
Repeat this process many times, each time taking a new random sample
Collect the statistic from each sample
The distribution of those statistics is the sampling distribution
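The steps above can be sketched as a short simulation. This is a minimal illustration, not the code behind the lecture figures: the population values are simulated (a hypothetical normal population of 344 body masses), and the seed, sample size, and number of repeats are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 344 simulated body masses (g), not real measurements
population = [random.gauss(4200, 800) for _ in range(344)]

def sampling_distribution(population, n, reps):
    """Return the sample mean from each of `reps` random samples of size n."""
    means = []
    for _ in range(reps):
        sample = random.sample(population, n)  # one random sample, without replacement
        means.append(statistics.mean(sample))  # the statistic of interest
    return means

# The collected means approximate the sampling distribution of the mean
sample_means = sampling_distribution(population, n=25, reps=2000)

print("Population mean:     ", round(statistics.mean(population)))
print("Mean of sample means:", round(statistics.mean(sample_means)))
print("SD of sample means:  ", round(statistics.stdev(sample_means)))
```

The sample means should centre on the population mean, and their spread is what the standard error estimates.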
Interactive app (open in browser):
https://www.zoology.ubc.ca/~whitlock/Kingfisher/SamplingNormal.htm
What the app shows
Repeated random samples drawn from the same population
The distribution of a sample statistic (here, the mean) across many samples
Try this
The standard error of the mean (SE) quantifies uncertainty in a sample mean
It describes how much the sample mean would vary by chance across repeated samples
SE reflects precision, not variability among individuals
SE decreases as sample size increases, even if the underlying variability in the population does not
SE is an estimate of the spread of the sampling distribution
\[ SE_{\bar{x}} = \frac{s}{\sqrt{n}} \]
\[ \displaystyle SE_{\bar{x}} \approx \text{SD of } \bar{x} \text{ across repeated samples} \]
where \(s\) is the sample standard deviation and \(n\) is the sample size
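As a rough check that \(s/\sqrt{n}\) from a single sample tracks the spread of \(\bar{x}\) across repeated samples, the sketch below compares the two. The normal population (mean 4200 g, SD 800 g), the sample size, and the number of repeats are assumptions chosen only for illustration.

```python
import math
import random
import statistics

random.seed(2)

N = 25       # sample size (assumed)
REPS = 2000  # number of repeated samples (assumed)

def draw_sample(n):
    # Hypothetical population: normal, mean 4200 g, SD 800 g
    return [random.gauss(4200, 800) for _ in range(n)]

means = []
estimated_ses = []
for _ in range(REPS):
    sample = draw_sample(N)
    means.append(statistics.mean(sample))
    estimated_ses.append(statistics.stdev(sample) / math.sqrt(N))  # s / sqrt(n)

empirical_sd = statistics.stdev(means)       # SD of x-bar across repeated samples
typical_se = statistics.mean(estimated_ses)  # average of the single-sample SEs

print(f"SD of sample means: {empirical_sd:.1f} g")
print(f"Average s/sqrt(n):  {typical_se:.1f} g")
```

Under these assumptions the two numbers should be close, which is why one sample is enough to estimate the SE.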
For this example, we treat the 344 penguins in the dataset as the population
The population distribution describes variation in body mass among individual penguins
A single random sample of penguins includes only some individuals from the population
As a result, the sample does not fully represent the population distribution, and its summary statistics vary by chance
Each panel shows a different random sample
Vertical lines are the sample means from each sample (they vary by chance)
Each sample is drawn from the same population
All samples estimate the same population mean
Smaller samples produce more variable sample means (wider sampling distributions)
Larger samples produce more similar sample means (narrower sampling distributions)
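The effect of sample size on the spread of sample means can be sketched with a small simulation. The population here is hypothetical (normal, mean 4200 g, SD 800 g), and the two sample sizes are arbitrary.

```python
import random
import statistics

random.seed(3)

def sd_of_sample_means(n, reps=2000):
    """SD of the sample mean across `reps` random samples of size n."""
    means = []
    for _ in range(reps):
        # Hypothetical population: normal, mean 4200 g, SD 800 g
        sample = [random.gauss(4200, 800) for _ in range(n)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)

spread_small = sd_of_sample_means(n=10)
spread_large = sd_of_sample_means(n=100)

print(f"SD of sample means, n = 10:  {spread_small:.0f} g")
print(f"SD of sample means, n = 100: {spread_large:.0f} g")
```

Because the SE scales with \(1/\sqrt{n}\), a tenfold increase in sample size should shrink the spread by a factor of roughly \(\sqrt{10} \approx 3.2\).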
A confidence interval (CI) is a range of plausible values for a population parameter
It is constructed from a sample estimate and a measure of its uncertainty
Values inside the interval are more plausible than values outside it, given the data and the chosen level of confidence
Wider intervals indicate greater uncertainty; narrower intervals indicate greater precision
A CI reflects the reliability of the method, not certainty about a single interval
The confidence level describes how the method performs under repeated sampling
If we repeatedly sample from the same population and compute a 95% CI each time:
About 95% of those intervals will contain the true parameter value
Whether any one interval contains the true value is unknown
Each confidence interval is unique—width varies depending on which individuals happen to be included in the sample
Even at a fixed sample size, interval width also depends on the chosen confidence level
Higher confidence requires a wider interval (95% CI is wider than 90% CI for the same sample)
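To see how the confidence level changes interval width, the sketch below computes the half-width (multiplier × SE) at three levels. It uses a normal approximation for the multiplier, which is an assumption that suits larger samples; small samples would normally call for a t-based multiplier. The SE value is hypothetical.

```python
from statistics import NormalDist

def normal_multiplier(confidence):
    """Two-sided critical value z such that P(-z < Z < z) = confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

se = 100.0  # hypothetical standard error, for illustration only

# Half-width of the interval at each confidence level
half_widths = {level: normal_multiplier(level) * se for level in (0.90, 0.95, 0.99)}

for level, half_width in half_widths.items():
    print(f"{level:.0%} CI: estimate ± {half_width:.1f}")
```

The multiplier for 95% is about 1.96, which is where the "2 × SE" shortcut comes from.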
Simple way to construct a confidence interval: estimate ± 2 × SE
Usually close to a 95% confidence interval
Works well when sample sizes are not very small
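One way to check the rule of thumb is to simulate repeated sampling and count how often "estimate ± 2 × SE" contains the true mean. This is a sketch under assumed conditions: a hypothetical normal population with known mean and SD, and two arbitrary sample sizes.

```python
import math
import random
import statistics

random.seed(4)

TRUE_MEAN = 4200.0  # hypothetical population mean (g)
TRUE_SD = 800.0     # hypothetical population SD (g)

def coverage_of_2se_rule(n, reps=4000):
    """Fraction of 'estimate ± 2 SE' intervals that contain TRUE_MEAN."""
    hits = 0
    for _ in range(reps):
        sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
        xbar = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if xbar - 2 * se <= TRUE_MEAN <= xbar + 2 * se:
            hits += 1
    return hits / reps

coverage_small = coverage_of_2se_rule(n=5)
coverage_large = coverage_of_2se_rule(n=50)

print(f"Coverage with n = 5:  {coverage_small:.3f}")
print(f"Coverage with n = 50: {coverage_large:.3f}")
```

Under these assumptions, coverage should land near 95% for n = 50 and noticeably lower for n = 5, which is why the shortcut is not recommended for very small samples.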
For one random sample of n = 25 penguins, the sample mean body mass was
\[ \bar{x} = 4389 \text{ g} \]
with sample standard deviation
\[ s = 801.5 \text{ g} \]
The standard error of the mean is
\[
SE = \frac{s}{\sqrt{n}} = \frac{801.5}{\sqrt{25}} = 160.3 \text{ g}
\]
Using the rule of thumb, an approximate 95% confidence interval is
\[
\begin{aligned}95\%\ \text{CI} \;&=\; 4389 \pm 2 \times 160.3 \\ &=\; 4389 \pm 320.6 \\ &=\; 4068 \text{ to } 4710 \text{ g}\end{aligned}
\]
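The same arithmetic can be reproduced in a few lines. The inputs are the values quoted above; the variable names are illustrative only.

```python
import math

n = 25         # sample size
xbar = 4389.0  # sample mean body mass (g)
s = 801.5      # sample standard deviation (g)

se = s / math.sqrt(n)  # standard error of the mean
margin = 2 * se        # half-width from the 2 x SE rule of thumb
lower, upper = xbar - margin, xbar + margin

print(f"SE = {se:.1f} g")                                    # SE = 160.3 g
print(f"Approximate 95% CI: {lower:.0f} to {upper:.0f} g")   # 4068 to 4710 g
```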
Error bars extend from a sample estimate to show uncertainty in the estimated parameter
They may represent ±1 SE, ±2 SE, or an exact confidence interval (e.g., 95% CI)
Always specify what the error bars represent in the figure legend
Correct
In repeated sampling, 95% of 95% confidence intervals will contain the true population mean
We are 95% confident that the population mean lies within this interval
Not correct
Interactive app (open in browser):
https://www.zoology.ubc.ca/~whitlock/Kingfisher/CIMean.htm
What the app shows
Repeated random samples drawn from the same population
The estimated mean and confidence interval for each sample
Try this
Report the estimate, the confidence interval, the confidence level, and the sample size
Use clear units and avoid unnecessary precision
State the statistic in words the first time it appears
Example
Mean penguin body mass was 4389 g (95% CI: 4068–4710 g, n = 25).
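If results are reported from code, a small formatting helper can keep the estimate, interval, confidence level, and sample size together and round away unnecessary precision. The function name and arguments below are hypothetical, chosen for this sketch.

```python
def report_mean(label, mean, lower, upper, n, unit="g", level=95):
    """Format a mean with its confidence interval, confidence level, and sample size."""
    return (f"{label} was {mean:.0f} {unit} "
            f"({level}% CI: {lower:.0f}–{upper:.0f} {unit}, n = {n}).")

# Unrounded interval limits from the worked example (4389 ± 320.6)
sentence = report_mean("Mean penguin body mass", 4389, 4068.4, 4709.6, 25)
print(sentence)
```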
Pseudoreplication occurs when observations are not independent, but are treated as if they are
This often happens when multiple measurements come from the same experimental unit (e.g., repeated measures, subsamples, clustered data)
Pseudoreplication inflates the apparent sample size, leading to confidence intervals that are too narrow
The result is a false sense of precision about the population parameter
You are interested in estimating the average pulse rate of mountain climbers.
Mountain climbers are hard to find, so you take 10 pulse measurements from each climber
You study 6 climbers, giving 60 total measurements
Question: What is the sample size (\(n\))?
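Answer: \(n = 6\), because the climbers, not the individual pulse measurements, are the independently sampled units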
Treating the 60 measurements as independent observations would be pseudoreplication, leading to overly narrow confidence intervals and misleading conclusions.
Identify the experimental unit: the unit that is independently sampled (e.g., individual climbers, plots, animals)
Do not treat repeated measurements or subsamples from the same unit as independent data points
When you have multiple measurements per unit:
Always report and base inference on the true sample size, not the number of raw measurements
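The climber example can be sketched in code to show how pseudoreplication shrinks the SE. All data here are simulated and hypothetical (6 climbers, 10 readings each), and averaging to one value per climber is one common way to work at the level of the experimental unit, not necessarily the only appropriate analysis.

```python
import math
import random
import statistics

random.seed(5)

# Hypothetical data: 6 climbers, 10 pulse readings each (beats per minute).
# Climbers differ from one another (SD 10); readings within a climber vary less (SD 3).
climbers = []
for _ in range(6):
    climber_mean = random.gauss(70, 10)
    climbers.append([random.gauss(climber_mean, 3) for _ in range(10)])

# Pseudoreplicated analysis: treat all 60 readings as independent (n = 60)
all_readings = [x for readings in climbers for x in readings]
se_naive = statistics.stdev(all_readings) / math.sqrt(len(all_readings))

# Analysis at the level of the experimental unit: one mean per climber (n = 6)
climber_means = [statistics.mean(readings) for readings in climbers]
se_by_climber = statistics.stdev(climber_means) / math.sqrt(len(climber_means))

print(f"SE treating readings as independent (n = 60): {se_naive:.2f}")
print(f"SE using one mean per climber (n = 6):        {se_by_climber:.2f}")
```

The naive SE is smaller only because the sample size has been inflated; the per-climber SE reflects the actual number of independent units.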
Sample estimates vary by chance; this variability is described by the sampling distribution
The standard error (SE) measures uncertainty in an estimate and decreases as sample size increases
A useful approximation for uncertainty is the 2 × SE rule, which gives an interval close to a 95% confidence interval
Confidence intervals express a range of plausible values for a population parameter and must be interpreted using repeated sampling logic
Interval width depends on both sample size and confidence level
Pseudoreplication violates independence, inflates the apparent sample size, and produces misleadingly narrow confidence intervals

BIOL 275 Biostatistics | Spring 2026