Lecture 19
Comparing means of more than two groups

ABD 3e Chapter 15

Chris Merkord

Learning Objectives

  • Explain why multiple pairwise tests inflate Type I error and why ANOVA is needed
  • Describe how total variation is partitioned into between-group and within-group components
  • Interpret sum of squares and mean squares as measures of variation
  • Explain the \(F\)-statistic as a ratio of between-group to within-group variation
  • State and interpret the hypotheses tested in a one-way ANOVA
  • Interpret ANOVA output, including the meaning of a significant \(F\)-test
  • Identify assumptions of ANOVA and assess when they may be violated
  • Describe alternatives to ANOVA, including data transformation and the Kruskal–Wallis test

Many biological questions involve more than two groups

  • Experiments often include multiple treatments, not just two
  • Example: two medications and a placebo control
  • This allows us to ask richer questions:
    • Are both treatments better than the control?
    • Is one treatment better than the other?
    • How large are these differences?
Figure 1: Image: La Trobe University (CC BY-NC-SA 4.0).

Comparing groups two at a time inflates false positives

  • One approach is to run multiple two-sample tests:
    • Group 1 vs 2
    • Group 2 vs 3
    • Group 1 vs 3
  • This seems reasonable, but the number of tests grows quickly: \(k\) groups require \(k(k-1)/2\) pairwise tests
Three treatment groups labeled A. Placebo, B. Moderate dose, and C. High dose, each shown with a differently colored bug icon. Above them are three comparison brackets labeled A-B, B-C, and A-C.
Figure 2: Pairwise comparisons among three treatment groups. Each bracket represents one of the three two-group comparisons that would be made if the groups were analyzed two at a time.

Multiple tests inflate Type I error

  • Problem:
    • Each test has a chance of a Type I error (false positive)
    • Multiple tests increase the chance of at least one false positive
  • Example:
    • 5 groups → 10 pairwise tests
    • Roughly a 40% chance of at least one false positive (\(1 - 0.95^{10} \approx 0.40\)) if all nulls are true, treating the tests as independent with \(\alpha = 0.05\)
Line plot showing probability of at least one Type I error on the y-axis and number of pairwise comparisons on the x-axis. The curve increases from near 0 to close to 1 as the number of comparisons increases. A dashed horizontal line marks alpha = 0.05.
Figure 3: Probability of at least one Type I error as the number of pairwise comparisons increases (\(\alpha\) = 0.05). Assuming independent tests, the probability rises rapidly with more comparisons.
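
The arithmetic behind this curve can be sketched in a few lines of Python. This is a minimal sketch that assumes the tests are independent (the same simplification used in Figure 3); the function names are illustrative and not part of the lecture.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of distinct two-group comparisons among k groups."""
    return comb(k, 2)


def familywise_error(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one Type I error in m independent tests,
    each run at significance level alpha, when every null is true."""
    return 1 - (1 - alpha) ** m


# Show how the error rate grows with the number of groups
for k in (3, 5, 10):
    m = n_pairwise(k)
    print(f"{k} groups -> {m} tests -> P(at least one false positive) = "
          f"{familywise_error(m):.3f}")
```

Real pairwise tests share data and so are not fully independent, which means the exact probability differs somewhat from this calculation.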

We need a single test for all groups

  • Goal:
    • Test for differences across all groups at once
  • Avoid:
    • Repeated testing
    • Inflated Type I error

Analysis of Variance (ANOVA)

ANOVA compares all group means simultaneously

  • Analysis of variance (ANOVA) tests for differences among multiple means
  • Uses a single overall test
  • Based on:
    • Comparing variation among groups to variation within groups
  • Tests:
    • Are individuals from different groups, on average, more different than individuals from the same group?

ANOVA tests variation to detect differences in means

  • The name (analysis of variance) can be misleading:
    • We are interested in means, not variances
  • Key idea:
    • If population means differ → sample means vary among groups more than chance alone would produce
  • Therefore:
    • Testing whether variation among groups exceeds what chance would produce tells us whether means differ

One-way ANOVA analyzes one explanatory variable

  • One-way ANOVA:
    • One explanatory variable (factor)
    • Multiple groups defined by that factor
  • Examples:
    • Treatment type
    • Habitat type
    • Species

Case Study: Does light exposure affect circadian phase shift?

  • Study of how light exposure shifts the body’s internal clock (circadian rhythm)

  • 22 participants randomly assigned to one of three treatments:

    • No light (control)
    • Light to knees
    • Light to eyes
  • Each person received a single 3-hour light exposure

  • Researchers measured how much each person’s internal clock shifted

Dot plot showing individual phase shift values for three groups: control, knees, and eyes. Each group has several open circles representing participants. Filled dots indicate group means with vertical error bars. The eyes group shows more negative values (greater delays), while control and knees groups are closer to zero.
Figure 4: Phase shift in circadian rhythm (melatonin production) for participants exposed to different light treatments (control, knees, eyes). Open circles show individual participants; filled points with error bars show group means ± standard error. Whitlock & Schluter, The Analysis of Biological Data, 3e © 2020 W. H. Freeman and Company

ANOVA asks: where does the variation come from?

  • Data vary across individuals, even within the same group
  • Variation in the data comes from two sources:
    • real differences among groups
    • random variation within groups
  • Goal: Compare between-group variation to within-group variation

Total variation can be partitioned

  • Total variation: how much all observations vary around the overall mean

  • This can be split into two parts:

    • Between-group variation: differences among group means
    • Within-group variation: variation among individuals within groups
  • ANOVA works by comparing these two sources of variation

Partitioning variation in a real dataset

  • Same data shown three ways
    • Total: differences between each observation and the overall mean
    • Groups: differences between each group mean and the overall mean
    • Error: differences between observations and their group mean
  • Total variation = Groups + Error
  • ANOVA asks whether:
    • variation among group means is large relative to variation within groups

Three-panel plot labeled Total, Groups, and Error showing the same data for control, knees, and eyes treatments. Points represent individual observations. The Groups panel shows differences among group means, while the Error panel shows variation within each group around its mean.

Partitioning total variation into between-group (Groups) and within-group (Error) components using circadian phase shift data for three light treatments. Whitlock & Schluter, The Analysis of Biological Data, 3e © 2020 W. H. Freeman and Company.
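
The partition can be checked numerically. The sketch below uses made-up phase-shift values (hypothetical, NOT the study's data) and computes each component as a sum of squared deviations, confirming that Total = Groups + Error. Variable names are illustrative.

```python
from statistics import mean

# Hypothetical measurements for illustration only (not the real dataset)
groups = {
    "control": [0.5, 0.1, -0.3, -0.6, 0.2],
    "knees":   [0.3, -0.4, -0.2, 0.1, -0.5],
    "eyes":    [-1.2, -1.8, -0.9, -2.0, -1.4],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_values)

# Total: each observation vs. the overall mean
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

# Groups: each group mean vs. the overall mean, weighted by group size
ss_groups = sum(len(ys) * (mean(ys) - grand_mean) ** 2
                for ys in groups.values())

# Error: each observation vs. its own group mean
ss_error = sum((y - mean(ys)) ** 2
               for ys in groups.values() for y in ys)

print(f"SS_total           = {ss_total:.4f}")
print(f"SS_groups+SS_error = {ss_groups + ss_error:.4f}")
```

The two printed values agree up to floating-point rounding, whatever numbers are placed in `groups`.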

Mean squares summarize variation

  • ANOVA uses mean squares (MS) to measure variation
  • Two key quantities:
    • \(MS_{groups}\): variation among group means
    • \(MS_{error}\): variation within groups
  • Each is:
    • a measure of variability
    • converted to an average amount of variation per degree of freedom
  • Larger values = more variation
  • Start with sum of squares (SS):
    • sum of squared deviations from a mean
  • Convert to mean squares: \(MS = \frac{SS}{df}\)
  • So:
    • \(MS_{groups} = \frac{SS_{groups}}{df_{groups}}\)
    • \(MS_{error} = \frac{SS_{error}}{df_{error}}\)
  • Dividing by \(df\) puts both on the same scale so they can be compared
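
For reference, the standard one-way ANOVA definitions behind these quantities can be written out as follows. The notation is an assumption, since the slides do not define symbols: \(k\) groups, \(n_i\) observations in group \(i\), \(N\) observations in total, \(\bar{Y}_i\) the mean of group \(i\), and \(\bar{Y}\) the overall mean.

\[ SS_{groups} = \sum_{i=1}^{k} n_i \left(\bar{Y}_i - \bar{Y}\right)^2, \qquad df_{groups} = k - 1 \]

\[ SS_{error} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_i\right)^2, \qquad df_{error} = N - k \]

In the light-exposure case study, \(k = 3\) and \(N = 22\), so \(df_{groups} = 2\) and \(df_{error} = 19\).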

The \(F\)-statistic compares two sources of variation

  • ANOVA test statistic:

\[ F = \frac{MS_{groups}}{MS_{error}} \]

  • Interpretation:
    • \(F \approx 1\) : variation among group means is about what chance alone would produce
    • \(F \gg 1\) : group means differ more than expected by chance
  • The larger the ratio:
    • the stronger the evidence for differences among groups
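
To see how the ratio behaves, here is a minimal sketch that computes \(F\) from scratch for two hypothetical datasets (invented for illustration). The second dataset is the first with 3 added to every value in the third group, so within-group variation is unchanged while between-group variation grows. The function name `f_statistic` is an assumption, not something from the lecture.

```python
from statistics import mean


def f_statistic(groups):
    """Return (MS_groups, MS_error, F) for a list of groups,
    where each group is a list of numeric observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([y for g in groups for y in g])

    ss_groups = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)

    ms_groups = ss_groups / (k - 1)       # df_groups = k - 1
    ms_error = ss_error / (n_total - k)   # df_error  = N - k
    return ms_groups, ms_error, ms_groups / ms_error


# Hypothetical data: three groups with nearly equal means
similar = [[4.1, 5.0, 5.9, 4.6], [5.2, 4.4, 5.7, 4.9], [4.8, 5.5, 4.3, 5.6]]
# Same data, but every value in the third group is shifted up by 3
shifted = [[4.1, 5.0, 5.9, 4.6], [5.2, 4.4, 5.7, 4.9], [7.8, 8.5, 7.3, 8.6]]

for label, data in (("similar", similar), ("shifted", shifted)):
    ms_g, ms_e, f = f_statistic(data)
    print(f"{label}: MS_groups={ms_g:.3f}  MS_error={ms_e:.3f}  F={f:.2f}")
```

Only \(MS_{groups}\) changes between the two datasets, which is why \(F\) rises sharply for the shifted one.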

ANOVA hypotheses

  • Null hypothesis:

\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_k \]

  • Alternative hypothesis:
    • Not all group means are equal
  • Important:
    • ANOVA tests for any difference, not which groups differ

Interpreting the ANOVA result

  • If \(F\) is large → small p-value:
    • Reject \(H_0\)
    • Evidence that at least one group mean differs
  • If \(F\) is near 1:
    • Fail to reject \(H_0\)
    • Differences are consistent with random variation
  • Next step (if significant):
    • Determine which groups differ
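
The link between a large \(F\) and a small p-value can be illustrated by simulation. The sketch below does not use the theoretical \(F\) distribution. Instead it approximates it by repeatedly drawing every group from the same normal population (so \(H_0\) is true) and counting how often the simulated \(F\) is at least as large as an observed value. The group sizes (8, 7, 7) are an assumed split of the 22 participants, and the observed \(F\) values are hypothetical.

```python
import random
from statistics import mean


def f_statistic(groups):
    """F = MS_groups / MS_error for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([y for g in groups for y in g])
    ss_groups = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_groups / (k - 1)) / (ss_error / (n_total - k))


def simulated_p_value(f_observed, group_sizes, n_sim=5000, seed=1):
    """Approximate P(F >= f_observed) when the null hypothesis is true,
    assuming normal data with equal variance in every group."""
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(n_sim):
        # All groups come from the same population, so H0 holds
        null_groups = [[rng.gauss(0, 1) for _ in range(n)]
                       for n in group_sizes]
        if f_statistic(null_groups) >= f_observed:
            at_least_as_large += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0
    return (at_least_as_large + 1) / (n_sim + 1)


sizes = (8, 7, 7)  # assumed split of 22 participants (hypothetical)
for f_obs in (1.0, 10.0):  # hypothetical observed F values
    print(f"F = {f_obs}: approximate p = {simulated_p_value(f_obs, sizes):.4f}")
```

An \(F\) near 1 is exceeded often under the null hypothesis, while a large \(F\) is exceeded rarely. In practice the p-value would come from statistical software using the \(F\) distribution rather than from a simulation like this.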