BIOL 275 Biostatistics

Learning Objectives

By the end of this lecture, you should be able to:

Explain why sample-based estimates vary by chance and may differ from true population parameters
Describe how the sampling distribution quantifies uncertainty in an estimate
Define the standard error as the standard deviation of a sampling distribution and a measure of estimation uncertainty
Interpret a confidence interval as a plausible range of values for a population parameter that reflects uncertainty in the estimate

Samples Vary by Chance

Repeated samples drawn from the same population will differ from one another due to random chance
As a result, estimates calculated from samples (e.g., means, proportions) will vary even when the underlying population is unchanged
This variability is called sampling error; it does not imply a mistake or bias, but is an inherent consequence of sampling

Figure 1: **Sampling variability.** Different random samples from the same population produce different sample estimates by chance.

Sampling distribution

The distribution of an estimate (e.g., a sample mean) across many repeated random samples of the same size from the same population

Figure 2: Sampling distribution of the sample mean. Repeated random samples from the same population produce different sample means; the distribution of those means forms the sampling distribution. *Source: GeeksforGeeks, “Sampling Distribution”*.

Generating a Sampling Distribution

Draw a random sample of size \(n\) from a population
Calculate a statistic of interest (e.g., the sample mean)
Repeat this process many times, each time taking a new random sample
Collect the statistic from each sample
The distribution of those statistics is the sampling distribution
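The five steps above can be sketched in Python. This is a minimal illustration, not the code used to make the lecture figures: the population is a hypothetical list of 1,000 normally distributed values (mean 4200 and SD 800 are arbitrary assumptions), and each sample is drawn without replacement with `random.Random.sample`, mirroring how the penguin examples treat the dataset as a finite population.

```python
import random
import statistics

def sampling_distribution(population, n, reps=1000, seed=None):
    """Collect the sample mean from `reps` random samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):                            # step 3: repeat many times
        sample = rng.sample(population, n)           # step 1: random sample of size n (no replacement)
        estimates.append(statistics.fmean(sample))   # steps 2 and 4: compute and collect the mean
    return estimates                                 # step 5: the sampling distribution of the mean

# Hypothetical population: 1,000 values from a normal distribution (assumed mean 4200, SD 800)
pop_rng = random.Random(1)
population = [pop_rng.gauss(4200, 800) for _ in range(1000)]

means = sampling_distribution(population, n=15, reps=2000, seed=42)
print(f"population: mean {statistics.fmean(population):.0f}, SD {statistics.stdev(population):.0f}")
print(f"sample means (n = 15): mean {statistics.fmean(means):.0f}, SD {statistics.stdev(means):.0f}")
```

The mean of the collected sample means lands close to the population mean, while their standard deviation is much smaller than the spread of individual values.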

Sampling Distribution: Book’s Web App

Interactive app (open in browser):

https://www.zoology.ubc.ca/~whitlock/Kingfisher/SamplingNormal.htm

Figure 3: Sampling distribution simulation. Interactive web app showing repeated random samples drawn from a population and the resulting distribution of sample means, illustrating how sampling variability leads to uncertainty in estimates. Source: Whitlock and Schluter, The Analysis of Biological Data (3rd ed.) web app.

What the app shows

Repeated random samples drawn from the same population
The distribution of a sample statistic (here, the mean) across many samples

Try this

Generate a sampling distribution using sample size n = 10
Repeat using sample size n = 100
Compare the results:
- Which sampling distribution is wider?
- Which is more concentrated around the population mean?

Standard Error of the Mean

The standard error of the mean (SE) quantifies uncertainty in a sample mean
It describes how much the sample mean would vary by chance across repeated samples
SE reflects precision, not variability among individuals
SE decreases as sample size increases, even though the variability among individuals in the population stays the same
SE is an estimate of the spread of the sampling distribution

\[ SE_{\bar{x}} = \frac{s}{\sqrt{n}} \]

\[ \displaystyle SE_{\bar{x}} \approx \text{SD of } \bar{x} \text{ across repeated samples} \]

where

\(SE_{\bar{x}}\) = estimated standard deviation of the sampling distribution
\(s\) = sample standard deviation
\(n\) = sample size
\(\bar{x}\) = sample mean
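A minimal sketch of the formula using Python's standard `statistics` module. `statistics.stdev` divides by \(n - 1\), which matches the sample standard deviation \(s\). The five body masses are invented for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: SE = s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample SD with n - 1 in the denominator
    n = len(sample)
    return s / math.sqrt(n)

# Hypothetical body masses (g), invented for illustration
sample = [3800, 4200, 4650, 3950, 4400]
print(round(standard_error(sample), 1))  # 152.5
```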

From Individuals to Samples (Penguin Body Mass)

For this example, we treat the 344 penguins in the dataset as the population
The population distribution describes variation in body mass among individual penguins
A single random sample of penguins includes only some individuals from the population
As a result, the sample does not fully represent the population distribution, and its summary statistics vary by chance

Figure 4: Distribution of body mass measurements for all penguins in the dataset, treated here as the population.

From Sample Means to a Sampling Distribution

Each panel shows a different random sample
Vertical lines are the sample means from each sample (they vary by chance)

Figure 5: **Multiple random samples of penguin body masses (n = 15 per sample).** Each panel shows a different random sample drawn from the same population; vertical lines indicate the sample mean body mass, illustrating how estimates of the mean vary by chance across samples.

The histogram shows the distribution of those same sample means
This distribution describes sampling variability

Figure 6: Sampling distribution of the mean penguin body mass based on repeated random samples of size n = 15. Each value represents the mean body mass from one random sample.

Larger sample sizes produce less variability in the sample means

Each sample is drawn from the same population
All samples estimate the same mean
Smaller samples produce more variable sample means (wider sampling distributions)
Larger samples produce more similar sample means (narrower sampling distributions)

Figure 7: Sampling distributions of the mean penguin body mass for two sample sizes. Smaller samples (n = 10) produce more variable sample means than larger samples (n = 100).
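To check the pattern in Figure 7 numerically, the sketch below draws many samples of size 10 and 100 from a hypothetical normal population (mean 4200 g and SD 800 g are assumed values, not the penguin data) and compares the standard deviation of the sample means with \(\sigma/\sqrt{n}\).

```python
import random
import statistics

def sd_of_sample_means(n, reps=5000, mu=4200.0, sigma=800.0, seed=0):
    """SD of sample means from a hypothetical normal population (mu, sigma in grams)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

for n in (10, 100):
    simulated = sd_of_sample_means(n)
    predicted = 800 / n ** 0.5  # sigma / sqrt(n)
    print(f"n = {n}: simulated SD of means {simulated:.0f} g, sigma/sqrt(n) = {predicted:.0f} g")
```

Drawing directly from a normal distribution treats the population as effectively infinite; sampling without replacement from a small finite population such as the 344 penguins would make the n = 100 spread slightly smaller still.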

Population Distribution vs Sampling Distribution

The sampling distribution is centered at the population mean
Its spread is much smaller than the spread of individual values: with n = 25, about \(1/\sqrt{25} = 1/5\) as large

Figure 8: Population distribution of penguin body mass (top) compared to the sampling distribution of the sample mean for samples of size n = 25 (bottom). Vertical lines show the mean of each distribution.

Confidence Intervals

A confidence interval (CI) is a range of plausible values for a population parameter
It is constructed from a sample estimate and a measure of its uncertainty
Values inside the interval are more plausible than values outside it, given the data and the chosen confidence level
Wider intervals indicate greater uncertainty; narrower intervals indicate greater precision

Confidence interval as a range of plausible values. The shaded band represents a confidence interval, indicating the range of population mean values that are plausible given the data; values outside the band are less plausible. Any single interval may or may not contain the true population mean.

What a 95% Confidence Interval Means

A CI reflects the reliability of the method, not certainty about a single interval
The confidence level describes how the method performs under repeated sampling
If we repeatedly sample from the same population and compute a 95% CI each time:
- About 95% of intervals will include the true population parameter
- About 5% will miss it
Whether any one interval contains the true value is unknown
Each confidence interval is different: its position and width vary depending on which individuals happen to be included in the sample

Figure 9: Repeated 95% confidence intervals for a population mean. Each horizontal line is a confidence interval from one sample; the vertical red line shows the true population mean. Most intervals include the true value, but some do not.
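The repeated-sampling interpretation can be checked by simulation. This sketch assumes a hypothetical normal population with known mean (4200 g) and SD (800 g), builds the interval \(\bar{x} \pm 2\,SE\) from each sample of n = 25 (the 2SE rule of thumb from later in this lecture, used here in place of an exact 95% CI), and counts how often the interval contains the true mean.

```python
import math
import random
import statistics

def ci_coverage(n=25, reps=2000, mu=4200.0, sigma=800.0, multiplier=2.0, seed=7):
    """Fraction of intervals xbar +/- multiplier * SE that contain the true mean mu.

    mu and sigma describe a hypothetical normal population (assumed values).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        lower, upper = xbar - multiplier * se, xbar + multiplier * se
        if lower <= mu <= upper:  # did this interval capture the true mean?
            hits += 1
    return hits / reps

print(f"Coverage of xbar +/- 2 SE: {ci_coverage():.1%}")
```

With n = 25 the coverage typically comes out slightly below 95% (roughly 94%), because 2 is a little smaller than the multiplier an exact 95% CI uses at this sample size.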

Increasing sample size produces narrower confidence intervals

Larger samples reduce uncertainty even though the underlying population variability remains the same

Figure 10: Repeated 95% confidence intervals for a population mean at two sample sizes (n = 10 and n = 25). Each horizontal line is one CI; the vertical red line is the true population mean. Larger samples produce narrower intervals. Data Source: Palmer Penguins.

Confidence Level Affects CI Width

Even at a fixed sample size, interval width also depends on the chosen confidence level
Higher confidence requires a wider interval (95% CI is wider than 90% CI for the same sample)

Figure 11: Confidence level affects confidence interval width. For the same set of random samples (n = 25), 95% confidence intervals are wider than 90% confidence intervals.
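The link between confidence level and width comes from the multiplier applied to the SE. As a simplified sketch, this uses standard normal (z) critical values from `statistics.NormalDist`; exact intervals for a mean from a small sample use somewhat larger multipliers, so treat these numbers as approximations.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided critical value from the standard normal distribution (approximation)."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

se = 160.3  # SE (g) from the n = 25 penguin example in this lecture
for level in (0.90, 0.95, 0.99):
    z = z_critical(level)
    print(f"{level:.0%} CI: xbar +/- {z:.3f} x SE = xbar +/- {z * se:.1f} g")
```

Higher confidence means a larger multiplier, and therefore a wider interval around the same estimate.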

The 2SE Rule of Thumb for Confidence Intervals

Simple way to construct a confidence interval: estimate ± 2 × SE
Usually close to a 95% confidence interval
Works well when sample sizes are not very small

Example: 2SE Rule of Thumb with penguin body mass

For one random sample of n = 25 penguins, the sample mean body mass was

\[ \bar{x} = 4389 \text{ g} \]

with sample standard deviation

\[ s = 801.5 \text{ g} \]

The standard error of the mean is
\[ SE = \frac{s}{\sqrt{n}} = \frac{801.5}{\sqrt{25}} = 160.3 \text{ g} \]

Using the rule of thumb, an approximate 95% confidence interval is
\[ \begin{aligned}\text{approx. 95\% CI} \;&=\; 4389 \pm 2 \times 160.3 \\ &=\; 4389 \pm 320.6 \\ &=\; 4068 \text{ to } 4710 \text{ g}\end{aligned} \]

Figure 12: Illustration of the 2 × SE rule. The point shows the sample mean (\(\bar{x}\)); the inner and outer bands represent \(\bar{x} \pm 1\,SE\) and \(\bar{x} \pm 2\,SE\).
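The arithmetic in this example can be reproduced in a few lines. For comparison, the sketch also computes an exact 95% CI for the mean, which uses a t critical value instead of 2; the value 2.064 (for n - 1 = 24 degrees of freedom) is taken from a standard t table and is an addition not given in this lecture.

```python
import math

# Values from the n = 25 penguin sample in the example above
xbar = 4389.0   # sample mean body mass (g)
s = 801.5       # sample standard deviation (g)
n = 25

se = s / math.sqrt(n)  # 801.5 / 5 = 160.3 g

# 2SE rule of thumb
lower_2se, upper_2se = xbar - 2 * se, xbar + 2 * se

# Exact 95% CI: tabulated t critical value for df = 24 (assumed from a t table, not computed here)
t_crit = 2.064
lower_t, upper_t = xbar - t_crit * se, xbar + t_crit * se

print(f"SE = {se:.1f} g")
print(f"2SE rule: {lower_2se:.0f} to {upper_2se:.0f} g")  # 4068 to 4710 g
print(f"t-based:  {lower_t:.0f} to {upper_t:.0f} g")      # 4058 to 4720 g
```

The two intervals differ by only about 10 g at each end, which is why the 2SE rule is a reasonable shortcut when the sample size is not very small.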

Error bars illustrate uncertainty

Error bars extend from a sample estimate to show uncertainty in the estimated parameter
They may represent ±1 SE, ±2 SE, or an exact confidence interval (e.g., 95% CI)
Always specify what the error bars represent in the figure legend

Figure 13: Penguin body mass by species. Points show individual penguins; symbols and vertical bars indicate the species mean and 90% confidence interval.

Interpreting Confidence Intervals

Correct

In repeated sampling, about 95% of 95% confidence intervals will contain the true population mean
We are 95% confident that the population mean lies within this interval

Not correct

There is a 95% probability that the population mean lies within this specific interval

Confidence Intervals: Book’s Web App

Interactive app (open in browser):

https://www.zoology.ubc.ca/~whitlock/Kingfisher/CIMean.htm

Figure 14: Confidence intervals for the mean simulation. Interactive web app showing repeated random samples drawn from a population and the resulting confidence intervals, illustrating how sampling variability leads to uncertainty in estimates. Source: Whitlock and Schluter, The Analysis of Biological Data (3rd ed.) web app.

What the app shows

Repeated random samples drawn from the same population
The estimated mean and confidence interval for each sample

Try this

Generate confidence intervals using sample size n = 10 and standard deviation σ = 30
While samples are being generated, move the sliders left and right
Questions:
- How do the CIs change as n changes?
- How do they change as the standard deviation changes?

Reporting Means and Confidence Intervals in Writing

Report the estimate, the confidence interval, the confidence level, and the sample size
Use clear units and avoid unnecessary precision
State the statistic in words the first time it appears

Example

Mean penguin body mass was 4389 g (95% CI: 4068–4710 g, n = 25).

Reporting standards vary

Different journals and fields have different expectations for:
- whether CIs are required or optional
- how many decimal places to report
- whether to include SE, SD, or CI
To determine standards:
- check the journal’s author guidelines
- look at recent papers in that journal or field
- follow instructions provided by your instructor or lab manual

Pseudoreplication: When Confidence Is Undeserved

Pseudoreplication occurs when observations are not independent, but are treated as if they are
This often happens when multiple measurements come from the same experimental unit (e.g., repeated measures, subsamples, clustered data)
Pseudoreplication inflates the apparent sample size, leading to confidence intervals that are too narrow
The result is a false sense of precision about the population parameter

Pseudoreplication: Pulse Rate Example

You are interested in estimating the average pulse rate of mountain climbers.

Mountain climbers are hard to find, so you take 10 pulse measurements from each climber
You study 6 climbers, giving 60 total measurements

Question: What is the sample size (\(n\))?

Answer:

The experimental units are the climbers
The correct sample size is \(n = 6\), not 60

Treating the 60 measurements as independent observations would be pseudoreplication, leading to overly narrow confidence intervals and misleading conclusions.

How to Avoid Pseudoreplication

Identify the experimental unit: the unit that is independently sampled (e.g., individual climbers, plots, animals)
Do not treat repeated measurements or subsamples from the same unit as independent data points
When you have multiple measurements per unit:
- Average within each unit before analysis, or
- Use a statistical approach that accounts for non-independence (e.g., paired or repeated-measures designs)
Always report and base inference on the true sample size, not the number of raw measurements
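A small simulation makes the pulse-rate example concrete. The data are invented: six hypothetical climbers whose typical pulse rates differ (assumed SD 8 bpm around 65 bpm), with ten readings per climber that vary only a little (assumed SD 2 bpm). Pooling all 60 readings gives the same mean as averaging within climbers first, but a smaller, misleading SE.

```python
import math
import random
import statistics

def se_of_mean(values):
    """SE = s / sqrt(n), where n is the number of values passed in."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical simulated data (not real measurements): 6 climbers x 10 pulse readings (bpm)
rng = random.Random(3)
climbers = []
for _ in range(6):
    true_pulse = rng.gauss(65, 8)  # climbers differ from one another
    climbers.append([rng.gauss(true_pulse, 2) for _ in range(10)])  # repeat readings vary little

# Pseudoreplicated: all 60 readings treated as independent (n = 60)
all_readings = [x for readings in climbers for x in readings]
naive_se = se_of_mean(all_readings)

# Correct: average within each climber first, so n = 6 experimental units
climber_means = [statistics.fmean(readings) for readings in climbers]
correct_se = se_of_mean(climber_means)

print(f"n = {len(all_readings)} (pseudoreplicated): SE = {naive_se:.2f} bpm")
print(f"n = {len(climber_means)} (climbers):         SE = {correct_se:.2f} bpm")
```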

Lecture Summary: Estimating with Uncertainty

Sample estimates vary by chance; this variability is described by the sampling distribution
The standard error (SE) measures uncertainty in an estimate and decreases as sample size increases
The 2 × SE rule (estimate ± 2 SE) gives a quick approximation to a 95% confidence interval
Confidence intervals express a range of plausible values for a population parameter and must be interpreted using repeated sampling logic
Interval width depends on both sample size and confidence level
Pseudoreplication violates independence, inflates sample size, and produces misleadingly narrow confidence intervals

Lecture 6 Estimating with Uncertainty
