Lecture 14
Inference for Population Means:
Confidence Intervals and t-Tests

ABD 3e Chapter 11

Chris Merkord

Learning Objectives

By the end of this lecture, you should be able to:

  • Explain why the t-distribution is used instead of the normal distribution when the population standard deviation is unknown
  • Describe how degrees of freedom influence the shape of the t-distribution
  • Construct and interpret a confidence interval for a population mean
  • State null and alternative hypotheses for a one-sample t-test
  • Interpret the results of a t-test using t, degrees of freedom, p-value, and confidence interval

From Probability to Statistical Inference

  • Previous lectures introduced probability distributions and hypothesis testing for categorical data and proportions.
  • Today we apply the same inferential framework to means of numerical variables.
  • The goal is to estimate and test hypotheses about a population mean using sample data.
Diagram illustrating statistical inference. A large circle in the upper left contains many colored dots representing a population. An arrow labeled “Sample” points to a smaller square in the upper right containing a subset of dots representing a sample. A downward arrow labeled “Parameter” leads from the population to a graph labeled “Population Parameter,” which shows a single point marking the population mean. Another downward arrow labeled “Estimation” leads from the sample to a graph labeled “Sample Statistic,” which shows a point with a vertical line range indicating uncertainty in the estimated mean. Both graphs have a vertical axis labeled “Mean (μ)” and no horizontal axis.
Figure 1: Conceptual diagram showing how a population mean (parameter) and a sample mean (statistic) are obtained from a population and a sample. Source: Illustration generated by ChatGPT.

Sampling Distributions (Review)

  • Statistics such as the sample mean vary from sample to sample due to sampling variability.
  • The sampling distribution of the mean describes the distribution of possible sample means.
  • If the population is normal, the sampling distribution of the mean is also normal; by the central limit theorem, it is approximately normal for large samples even when the population is not.
Diagram showing a population of 5th-grade students, many random samples drawn from that population, the sample mean from each sample, and a histogram of those sample means, illustrating the sampling distribution of the mean.
Figure 2: Sampling distribution of the sample mean. Repeated random samples from the same population produce different sample means; the distribution of those means forms the sampling distribution. Source: GeeksforGeeks, “Sampling Distribution.”

Standard Error of the Mean (Review)

  • The spread of the sampling distribution is measured by the standard error (SE) of the mean.
  • Standard error quantifies how far sample means tend to vary from the population mean.
  • Larger sample sizes reduce the standard error and improve precision of the estimate.
Figure 3: Sampling distributions of the mean penguin body mass for two sample sizes. Smaller samples (n = 10) produce more variable sample means than larger samples (n = 100).

When the Population SD Is Unknown

Population standard error

  • The standard error of the mean (SE) describes the expected variability of sample means around the population mean.

  • If the population standard deviation (SD) is known, the population SE is calculated using \(\sigma\):

\[ SE = \frac{\sigma}{\sqrt{n}} \]

  • This formula describes the theoretical variability of the sample mean when the population variability is known.

Estimated standard error

  • In practice, the population standard deviation \(\sigma\) is almost never known.

  • Instead, we estimate variability using the sample standard deviation \(s\) calculated from the observed data.

  • Replacing \(\sigma\) with \(s\) gives the estimated standard error of the mean:

\[ SE = \frac{s}{\sqrt{n}} \]
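As a quick numerical check (not part of the lecture slides; a minimal Python sketch, where the body-mass values in grams are illustrative rather than an assigned dataset):

```python
# Estimated standard error of the mean: SE = s / sqrt(n).
# The body-mass values (grams) below are illustrative, not an assigned dataset.
import math
import statistics

masses = [3750, 3800, 3250, 3450, 3650, 3625, 4675, 3475, 4250, 3300]
n = len(masses)
s = statistics.stdev(masses)   # sample SD (denominator n - 1)
se = s / math.sqrt(n)          # estimated standard error of the mean

print(f"n = {n}, s = {s:.1f} g, SE = {se:.1f} g")
```

Because \(n\) appears under a square root, quadrupling the sample size only halves the standard error.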

Standardizing the Mean (Review: Z statistic)

  • In statistics, we often want to test:

    Does a sample provide evidence that a population mean differs from a hypothesized value?

  • To evaluate how unusual a sample mean is, we measure its distance from the population mean in units of standard error.

  • If the population standard deviation \(\sigma\) were known, we could use the Z statistic.

\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

  • The Z statistic measures how many standard errors the sample mean is from the population mean.

The Z distribution

Bell-shaped standard normal distribution centered at zero, with the horizontal axis labeled Z and the vertical axis labeled probability density.
Figure 4: Standard normal (Z) distribution used to measure how far a sample mean is from a hypothesized population mean in standard error units.

Z Scores vs Z Statistics

  • The previous equation may look unfamiliar.

  • In earlier lectures, we standardized individual observations using the population standard deviation \(\sigma\).

  • When working with sample means, we instead standardize using the standard error of the mean.

  • Sample means vary less than individual observations because they are averages of multiple values.

  • The variability of sample means decreases as sample size increases, which is why the denominator includes \(\sqrt{n}\).

Situation              | Formula                                       | Term        | Interpretation
Individual observation | \(Z = \frac{x - \mu}{\sigma}\)                | Z score     | Distance of an observation from the population mean, in standard deviations
Sample mean            | \(Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\) | Z statistic | Distance of the sample mean from the population mean, in standard errors
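A two-line sketch of the distinction (the numbers here are hypothetical, chosen for illustration):

```python
# Z score (one observation) vs. Z statistic (a sample mean).
# mu, sigma, x, xbar, and n are hypothetical values for illustration.
import math

mu, sigma = 120.0, 10.0        # population mean and SD (assumed known here)
x = 130.0                      # a single observation
xbar, n = 123.0, 25            # a sample mean based on n = 25 observations

z_score = (x - mu) / sigma                      # distance in standard deviations
z_stat = (xbar - mu) / (sigma / math.sqrt(n))   # distance in standard errors

print(z_score, z_stat)  # → 1.0 1.5
```

The sample mean sits only 3 units from \(\mu\) (versus 10 for the single observation), yet its Z statistic is larger, because means vary far less than individual values.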

Replacing \(\sigma\) with \(s\) → the t statistic

  • In practice, the population standard deviation \(\sigma\) is rarely known.

  • Instead, we estimate population variability using the sample standard deviation \(s\).

  • Substituting \(s\) for \(\sigma\) produces the t statistic.

\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]

  • Because \(s\) varies from sample to sample, this introduces extra uncertainty in the standardized mean.

  • As a result, the statistic follows a t distribution rather than the normal distribution.
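In code, the only change from the Z statistic is the denominator (a minimal sketch; the body-mass values are illustrative and \(\mu_0 = 3700\) is a hypothetical null value):

```python
# t statistic: replace the unknown sigma with the sample SD s.
import math
import statistics

masses = [3750, 3800, 3250, 3450, 3650, 3625, 4675, 3475, 4250, 3300]
mu0 = 3700.0                    # hypothesized population mean (hypothetical value)
n = len(masses)
xbar = statistics.mean(masses)
s = statistics.stdev(masses)
t_stat = (xbar - mu0) / (s / math.sqrt(n))

print(f"t = {t_stat:.2f} with df = {n - 1}")
```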

The t distribution

Bell-shaped t distribution centered at zero, with the horizontal axis labeled t and the vertical axis labeled probability density.
Figure 5: Student’s t distribution for a small sample size, showing the distribution of the standardized mean when the population standard deviation is estimated from the sample.

The t distribution

  • When the standard deviation is estimated from the sample, the standardized mean follows a t distribution.
  • The t distribution is similar to the normal distribution but has heavier tails.
  • Heavier tails reflect the additional uncertainty introduced when estimating \(\sigma\) with \(s\).
  • The shape of the t distribution depends on the degrees of freedom.
  • As sample size increases, the t distribution approaches the normal distribution.

Comparing the Z and t distributions

Line graph showing two bell-shaped curves centered at zero, one labeled Z and one labeled t. The t curve is slightly lower at the peak and higher in the tails than the Z curve.
Figure 6: Comparison of the standard normal and Student’s t distributions. The t distribution has heavier tails because estimating the population standard deviation adds uncertainty. Figure generated using R.

Degrees of Freedom

  1. The exact shape of the t distribution depends on the number of degrees of freedom.
  2. For a sample mean the degrees of freedom are \(df = n - 1\).
  3. Larger degrees of freedom produce a distribution closer to normal.
Line graph showing several bell-shaped curves centered at zero representing t distributions with different degrees of freedom, gradually becoming narrower and more similar to a normal distribution as degrees of freedom increase.
Figure 7: Student’s t distributions with different degrees of freedom. As degrees of freedom increase, the t distribution approaches the normal distribution. Figure generated using R.
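One way to see the convergence numerically (a sketch assuming SciPy is available): the two-tailed 5% critical value of the t distribution shrinks toward the normal critical value of about 1.96 as the degrees of freedom grow.

```python
# Two-tailed 5% critical values of t approach the normal critical value
# (about 1.96) as degrees of freedom increase.
from scipy.stats import norm, t

for df in (4, 9, 29, 99, 999):
    print(df, round(t.ppf(0.975, df), 3))   # df = 4 gives about 2.78
print("normal:", round(norm.ppf(0.975), 3))
```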

What Are We Actually Estimating?

  • The quantity of interest is the population mean \(\mu\).
  • The sample mean \(\bar{x}\) provides an estimate of this parameter.
  • Because of sampling variability, our estimate always contains uncertainty.
Density curve representing a population distribution with a vertical line marking the population mean and a point with a horizontal confidence interval showing an estimate of the mean from a sample.
Figure 8: A sample mean provides an estimate of the population mean. The confidence interval shows the uncertainty in that estimate.

Confidence Intervals for the Mean

  • A confidence interval provides a range of plausible values for the population mean.
  • The interval accounts for uncertainty caused by sampling variability.
  • It is centered on the sample mean.
  • It extends outward by a margin of error determined by the t distribution.
  • Wider intervals indicate greater uncertainty in the estimate.
Bell-shaped sampling distribution centered at a sample mean, with vertical lines marking the lower and upper bounds of a confidence interval and the sample mean labeled in the center.
Figure 9: A confidence interval for a population mean constructed from a sample. The interval is centered on the sample mean and extends outward by a margin of error determined by the t distribution.

Calculating a Confidence Interval

  • Confidence intervals for means are calculated using the t distribution.
  • The interval combines the sample mean, the standard error, and a critical t value.
    • The critical value comes from the t distribution and depends on the chosen significance level ( \(\alpha\) ) and the degrees of freedom ( \(df = n - 1\) ) for the sample.
  • The formula describes how far the interval extends from the sample mean.

\[ \bar{x} \pm t_{\alpha/2,df}\,SE \]

Equation for a confidence interval with labels identifying the sample mean, critical t value, and standard error.
Figure 10: Annotated diagram showing components of the confidence interval formula. Source: Whitlock & Schluter 3e.
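Putting the pieces together in code (a sketch assuming SciPy; the body-mass values are illustrative):

```python
# 95% CI for a mean: xbar ± t_{alpha/2, df} * SE, with df = n - 1.
import math
import statistics
from scipy.stats import t

masses = [3750, 3800, 3250, 3450, 3650, 3625, 4675, 3475, 4250, 3300]
n = len(masses)
xbar = statistics.mean(masses)
se = statistics.stdev(masses) / math.sqrt(n)
t_crit = t.ppf(0.975, df=n - 1)          # alpha = 0.05, so 0.025 in each tail
lower, upper = xbar - t_crit * se, xbar + t_crit * se

print(f"mean = {xbar:.1f} g, 95% CI: ({lower:.1f}, {upper:.1f})")
```

Note that the critical value uses 0.975, not 0.95, because \(\alpha = 0.05\) is split between the two tails.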

Interpreting a Confidence Interval

  • In repeated sampling, 95% of 95% confidence intervals constructed this way will capture the true population mean.
  • The interval represents uncertainty in estimating the population mean.
  • Narrow intervals indicate more precise estimates.
Figure 11: Repeated 95% confidence intervals for a population mean. Each horizontal line is a confidence interval from one sample; the vertical red line shows the true population mean. Most intervals include the true value, but some do not.

Hypothesis Testing (Reminder)

  • Earlier lectures introduced the general framework of hypothesis testing.
  • We compare observed data with expectations under a null hypothesis.
  • If the result is sufficiently unlikely under the null hypothesis, we reject it.
Histogram-like probability distribution for the number of right-handed toads, centered near the expected value of 9 under the null hypothesis. The observed value of 14 is marked on the right side, and red bars in both tails highlight outcomes as extreme or more extreme than the observation, representing the p-value.
Figure 12: Probability distribution of the number of right-handed toads expected under the null hypothesis. The red tail regions represent outcomes at least as extreme as the observed value, which together form the p-value. Whitlock & Schluter 3e.

The one-sample t test

  • The one-sample t test evaluates whether a sample mean differs from a hypothesized mean \(\mu_0\).
  • Null hypothesis: \(H_0 : \mu = \mu_0\)
  • Alternative hypothesis: \(H_A : \mu \ne \mu_0\)
  • The difference between \(\bar{x}\) and \(\mu_0\) is scaled by the standard error.
  • The resulting t statistic measures how extreme the observed difference is.

\[ t = \frac{\bar{x}-\mu_0}{SE} \]

Bell-shaped t distribution with four degrees of freedom centered at zero. The horizontal axis ranges from −4 to 4 and the vertical axis shows probability density. The outer tails beyond −2.78 and 2.78 are shaded red, each labeled 2.5%, indicating the rejection regions for a two-tailed test at the 5% significance level.
Figure 13: Student’s t distribution with 4 degrees of freedom showing the rejection regions for a two-tailed test at α = 0.05. Each tail contains 2.5% of the probability, with critical values at t = ±2.78.
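In practice the test is one function call (a sketch assuming SciPy; same illustrative data, with \(\mu_0 = 3700\) as a hypothetical null value):

```python
# One-sample t test of H0: mu = 3700 against HA: mu != 3700.
import math
import statistics
from scipy import stats

masses = [3750, 3800, 3250, 3450, 3650, 3625, 4675, 3475, 4250, 3300]
mu0 = 3700.0
res = stats.ttest_1samp(masses, popmean=mu0)   # two-sided by default

# The same t statistic computed by hand, for comparison:
n = len(masses)
t_manual = (statistics.mean(masses) - mu0) / (statistics.stdev(masses) / math.sqrt(n))

print(f"t = {res.statistic:.2f}, df = {n - 1}, p = {res.pvalue:.3f}")
```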

One-Sided and Two-Sided Tests

  • A two-sided test evaluates whether the mean differs in either direction from \(\mu_0\).
  • A one-sided test evaluates whether the mean is greater than or less than \(\mu_0\).
  • Two-sided tests are typically used unless a directional hypothesis is justified.
(a) Two-tailed t test showing rejection regions in both tails (α = 0.05, 2.5% in each tail).
(b) One-tailed t test showing the rejection region in a single tail (α = 0.05).
Figure 14: Comparison of rejection regions for two-tailed and one-tailed t tests. In a two-tailed test, the significance level (α = 0.05) is split between both tails of the distribution, producing two rejection regions of 2.5% each. In a one-tailed test, the entire significance level is placed in a single tail, producing a single rejection region of 5%.
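With SciPy, the `alternative` argument selects the sidedness (a sketch; same illustrative data as above):

```python
# Two-sided vs. one-sided one-sample t tests.
from scipy import stats

masses = [3750, 3800, 3250, 3450, 3650, 3625, 4675, 3475, 4250, 3300]
mu0 = 3700.0

two = stats.ttest_1samp(masses, mu0)                             # HA: mu != mu0
less = stats.ttest_1samp(masses, mu0, alternative="less")        # HA: mu <  mu0
greater = stats.ttest_1samp(masses, mu0, alternative="greater")  # HA: mu >  mu0

# Because the t distribution is symmetric, the two-sided p-value is twice
# the smaller of the two one-sided p-values.
print(two.pvalue, 2 * min(less.pvalue, greater.pvalue))
```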

Assumptions of t Tests

  • Data should represent a random sample from the population.
  • The variable should be approximately normally distributed.
  • Moderate deviations from normality are often tolerated with larger samples.
Panel of small histograms with blue density curves showing the distributions of several stream chemistry variables. The shapes vary widely across panels, including roughly symmetric, right-skewed, and irregular distributions, illustrating that real-world variables often deviate from normality.
Figure 15: Distributions of stream chemistry variables from the Luquillo LTER dataset. Some variables are approximately symmetric, while others are strongly skewed, illustrating the diversity of real-world data distributions.

Sample Size and Interpretation

  • Larger samples reduce the standard error and narrow confidence intervals.
  • With large samples even small differences may become statistically significant.
  • Statistical significance does not always imply biological importance.
    • Example: Average systolic blood pressure in a clinic sample (121 mmHg) differs significantly from the recommended value of 120 mmHg, but a 1 mmHg difference is unlikely to have meaningful consequences for patient health.
    • This is why we always report effect sizes (means, differences in means), not just p-values.
Figure 16: Repeated 95% confidence intervals for a population mean at two sample sizes (n = 10 and n = 25). Each horizontal line is one CI; the vertical red line is the true population mean. Larger samples produce narrower intervals. Data Source: Palmer Penguins.

Reporting Results

  • For a one-sample t-test:
    • Report the sample mean, standard deviation, and confidence interval.
    • Provide the t statistic, degrees of freedom, and p value.
  • For an estimated mean:
    • Report the mean and confidence interval.
  • Results should be interpreted in the biological context of the study.

Example:

I tested whether the average systolic blood pressure in my sample differed from the clinical reference value of 120 mmHg and found a statistically significant result (one-sample t-test, \(t_{39} = -2.45\), two-sided \(p = 0.019\)).

The mean blood pressure was 117.2 mmHg, a difference of −2.8 mmHg from the reference value, with a 95% confidence interval of −5.1 to −0.5 mmHg, indicating a small effect that is statistically detectable but unlikely to be clinically meaningful for individual patients.
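The reported values can be checked from summary statistics alone (a sketch assuming SciPy; n = 40 is implied by df = 39, but the sample SD of 7.23 mmHg is an assumed value, chosen here only to make the arithmetic concrete):

```python
# Checking the reported one-sample t test from summary statistics.
import math
from scipy.stats import t

n, xbar, mu0 = 40, 117.2, 120.0   # n implied by df = 39 in the report
s = 7.23                          # sample SD (mmHg): an assumed value, not reported

se = s / math.sqrt(n)
t_stat = (xbar - mu0) / se
p_two = 2 * t.sf(abs(t_stat), df=n - 1)                    # two-sided p-value
t_crit = t.ppf(0.975, df=n - 1)
ci = (xbar - mu0 - t_crit * se, xbar - mu0 + t_crit * se)  # CI for the difference

print(f"t = {t_stat:.2f}, p = {p_two:.3f}, 95% CI for the difference: "
      f"({ci[0]:.1f}, {ci[1]:.1f}) mmHg")
```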