Lecture 11
Using the Poisson Distribution
to Test Randomness in Count Data

ABD 3e Chapter 8, Section 4

Chris Merkord

Learning Objectives

By the end of this lecture, you should be able to:

  • Identify when count data are appropriately modeled by a Poisson process.
  • State the assumptions of the Poisson model.
  • Explain how expected frequencies are generated under a Poisson null model.
  • Conduct and interpret a chi-square goodness-of-fit test for count data.
  • Use the variance-to-mean ratio to assess clumping or dispersion.

The Poisson Distribution models random counts in time or space

  • The Poisson distribution models the number of events that occur in fixed blocks of time or space
    • events occur independently of one another
    • events occur at a constant average rate
    • events are equally likely at any instant in time or point in space
  • If these assumptions hold, event counts follow a Poisson distribution.

Clumped and dispersed patterns violate Poisson assumptions

If events are not independent or the rate is not constant, counts will not follow a Poisson distribution.

Instead, events may appear:

  • Clumped (distance between points within clumps is less than distance between points in different clumps)
  • Dispersed (points are near each other less often than you would expect by chance)

The Poisson distribution therefore represents a model of random spatial or temporal structure.

The Poisson Probability Formula Defines Event Likelihood

Suppose events are randomly distributed in space:

  • Divide the area into equal grid cells

  • Count events in each cell

  • Tabulate the frequencies of counts

If events are random, the distribution of counts per cell will follow a Poisson distribution.

Once we assume a Poisson process, we can compute exact probabilities for observing different counts:

\[ \operatorname{Pr}[X \text{ successes}] = \frac{e^{-\mu}\mu^{X}}{X!} \]

  • \(\mu\) is the mean number of independent successes in time or space (expressed as a count per unit time or a count per unit space, which makes it a rate)

  • \(e\) is the base of the natural log, a constant ≈ 2.718
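As a minimal sketch, the formula above can be evaluated directly with Python's standard library. The function name `poisson_pmf` and the value \(\mu = 2\) are illustrative choices, not part of the lecture.

```python
import math

def poisson_pmf(x, mu):
    """Pr[X = x] for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

# Probabilities of 0 to 4 events when the mean is 2 events per unit
# (mu = 2 is an arbitrary illustrative value).
probs = [poisson_pmf(x, 2.0) for x in range(5)]
print([round(p, 4) for p in probs])
```

Summing the probabilities over all possible counts should give a value very close to 1, which is a quick sanity check on any implementation.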

Random spatial counts produce a characteristic Poisson shape

  • Assume points are distributed randomly

  • Divide area into a grid

  • Count points per square

  • Frequency of counts would follow a Poisson probability distribution
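The steps above can be tried out with a small simulation. This is only a sketch: the number of points, the grid size, and the random seed are hypothetical choices, and with a fixed total number of points the counts per cell are only approximately Poisson.

```python
import random
from collections import Counter

random.seed(1)      # fixed seed so the sketch is reproducible

n_points = 400      # hypothetical number of events
grid_size = 10      # 10 x 10 grid of equal cells on the unit square

# Place points uniformly at random and record which cell each one falls in.
cell_counts = Counter()
for _ in range(n_points):
    x, y = random.random(), random.random()
    cell = (int(x * grid_size), int(y * grid_size))
    cell_counts[cell] += 1

# Tabulate how many cells contain 0, 1, 2, ... points (empty cells included).
all_cells = [(i, j) for i in range(grid_size) for j in range(grid_size)]
frequency = Counter(cell_counts[c] for c in all_cells)

for k in sorted(frequency):
    print(k, frequency[k])
```

With 400 points in 100 cells, the mean is 4 points per cell, so the printed frequencies should be roughly centered near 4.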

Plot of three Poisson probability mass functions with large colored points connected by lines. Lambda equals 1 is concentrated near zero, lambda equals 4 peaks around four, and lambda equals 10 peaks around ten and appears more symmetric.
Figure 1: Poisson probability distribution for λ = 1, 4, and 10, where λ is the same mean rate written as \(\mu\) elsewhere in this lecture. Points show \(\operatorname{Pr}[X = k]\) for integer values of \(k\), connected by lines to highlight distribution shape. As \(\lambda\) increases, the distribution shifts right and becomes more symmetric.

We use the Poisson distribution to test for randomness

So far, we have described a model.

Now we use the Poisson distribution as a null hypothesis:

  • \(H_0\): Events occur randomly in time or space

  • Observed counts follow a Poisson distribution

This is a Poisson Goodness-of-Fit Test

Workflow when testing whether counts follow a Poisson distribution:

  1. State hypotheses
  2. Estimate the rate parameter (μ) from the data
  3. Compute expected probabilities under the Poisson model
  4. Convert probabilities to expected frequencies
  5. Check chi-square assumptions
  6. Compute chi-square statistic
  7. Determine degrees of freedom
  8. Make a statistical decision
  9. Interpret biologically

Assumptions of the Poisson Model

For counts to follow a Poisson distribution:

  • Events occur independently
  • The average rate ( \(\mu\) ) is constant
  • Events are rare relative to the interval size
  • Two events cannot occur at exactly the same instant

If these assumptions fail, counts may be clumped or dispersed.

Case Study: Are Mass Extinctions Random Through Time?

Biological question:

Do extinctions occur randomly through geological time, or are there intervals with unusually high extinction rates?

If extinctions are random, counts per time interval should follow a Poisson distribution.

Summary of major extinction events through time, highlighting the newly identified Carnian Pluvial Episode at 233 million years ago. D. Bonadonna/MUSE, Trento/EurekAlert!, CNN


Raup and Sepkoski (1982)

  • Raup and Sepkoski (1982) measured the extinction rate of marine invertebrates in 76 blocks of time

Figure from Raup DM, Sepkoski JT. 1982. Mass extinctions in the marine fossil record. Science 215: 1501-1503.


Observed Extinction Counts Across 76 Time Intervals

These are the observed counts we will compare to Poisson expectations.

Raup DM, Sepkoski JT. 1982. Mass extinctions in the marine fossil record. Science 215: 1501-1503.

Formulating hypotheses about random extinction rates

If extinction events are random in time:

  • Counts per interval should follow a Poisson distribution

Therefore:

  • \(H_0\): Extinctions per interval follow a Poisson distribution
  • \(H_A\): Extinctions per interval do not follow a Poisson distribution

This is a goodness-of-fit test.

Estimate the mean rate ( \(\mu\) ) from the observed data

Step 1: Estimate the Poisson rate parameter ( \(\mu\) ) under \(H_0\).

We estimate \(\mu\) using the sample mean number of extinctions per interval.

Number of extinctions   Frequency
0                       0
1                       13
2                       15
3                       16
4                       7
5                       10
6                       4
etc.

\[ \bar{X} = \frac{(0\times0)+(1\times13)+(2\times15)+\dots}{76} = 4.21 \]

We can use this sample mean \(\bar{X}\) in place of \(\mu\) in the formula for the Poisson distribution to generate the expected frequencies.
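The weighted-mean calculation can be sketched in a few lines of Python. The frequency table below is hypothetical and made up for illustration; it is not the extinction data, which are only partly listed above.

```python
# Hypothetical frequency table: {events per interval: number of intervals}.
# These values are invented for illustration, not taken from the lecture.
freq_table = {0: 10, 1: 20, 2: 15, 3: 5}

n_intervals = sum(freq_table.values())
total_events = sum(count * freq for count, freq in freq_table.items())

# Sample mean number of events per interval, used as the estimate of mu.
mean_rate = total_events / n_intervals

print(n_intervals, total_events, mean_rate)
```

Each count is weighted by how many intervals had that count, which is exactly what the numerator of the formula for \(\bar{X}\) does.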

Use the Poisson Model to Generate Expected Frequencies

\[ \operatorname{Pr}[X \text{ successes}] = \frac{e^{-\mu}\mu^{X}}{X!} \]

\[ \operatorname{Pr}[3 \text{ extinctions}] = \frac{e^{-4.21} \cdot 4.21^{3}}{3!} \]

\[ \operatorname{Pr}[3 \text{ extinctions}] = 0.1846 \]

\[ \operatorname{E}[3 \text{ extinctions}] = 76 \times 0.1846 = 14.03 \]

  • Step 2: Use the estimated μ to compute expected probabilities.
    • Example: The probability that a block of time has exactly 3 extinctions is 0.1846
  • Step 3: Convert probabilities into expected frequencies.
    • Example: Therefore the expected number of blocks containing 3 extinctions is 14.03
  • Now repeat these steps for each number of extinctions to obtain all expected frequencies
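Repeating the calculation for each count is tedious by hand, so here is a minimal sketch of the loop. It uses the lecture's values \(\mu = 4.21\) and 76 intervals; the names `poisson_pmf`, `mu_hat`, and `expected` are illustrative, and only counts 0 through 6 are shown.

```python
import math

def poisson_pmf(x, mu):
    """Pr[X = x] for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

mu_hat = 4.21       # sample mean from the lecture
n_intervals = 76    # number of time blocks from the lecture

# Expected number of intervals with exactly x extinctions, for x = 0..6.
expected = {x: n_intervals * poisson_pmf(x, mu_hat) for x in range(7)}

for x, e in expected.items():
    print(x, round(poisson_pmf(x, mu_hat), 4), round(e, 2))
```

The row for 3 extinctions should agree with the worked example above (probability 0.1846, expected frequency 14.03).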

Comparing Observed and Expected Extinction Frequencies

Step 4: Compare observed counts to expected counts.

If differences are large relative to sampling variation, we reject \(H_0\).

The frequency distribution of the number of extinctions (histogram) compared with the frequencies expected from the Poisson distribution having the same mean (curve).
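The comparison is summarized by the standard chi-square statistic, the sum over categories of (observed − expected)² / expected. A minimal sketch follows; the observed and expected values are hypothetical numbers chosen to keep the arithmetic easy, not the extinction data.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical observed and expected frequencies for three categories.
observed = [10, 20, 30]
expected = [15.0, 15.0, 30.0]

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 3))
```

Categories where the observed and expected frequencies agree contribute nothing to the statistic; large discrepancies relative to the expected frequency contribute the most.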

Chi-Square Assumptions Are Violated with Sparse Expected Counts

Before computing the chi-square statistic, we must check assumptions:

  • No expected value < 1

  • No more than 20% of expected counts < 5

These data violate those assumptions.
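The two rules can be checked mechanically. The sketch below applies them to Poisson expected frequencies with \(\mu = 4.21\) and 76 intervals; the cutoff at 12 extinctions is an assumption made for illustration, and the function names are hypothetical.

```python
import math

def poisson_pmf(x, mu):
    """Pr[X = x] for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

def check_chi_square_assumptions(expected):
    """True if no expected value is < 1 and at most 20% are < 5."""
    n_below_1 = sum(e < 1 for e in expected)
    frac_below_5 = sum(e < 5 for e in expected) / len(expected)
    return n_below_1 == 0 and frac_below_5 <= 0.20

# Expected frequencies for counts 0..12 (upper cutoff is an assumption).
expected = [76 * poisson_pmf(x, 4.21) for x in range(13)]

print([round(e, 2) for e in expected])
print(check_chi_square_assumptions(expected))
```

The expected frequencies in the upper tail fall well below 1, so the check fails for ungrouped categories.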

Grouping Categories Resolves Chi-Square Assumption Violations

  • To satisfy chi-square assumptions, we combine adjacent categories.

  • This reduces small expected counts and allows valid inference.
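One way to combine categories is to pool the sparse tails into "at most" and "at least" bins. The sketch below assumes the bins are "0 or 1", 2 through 7, and "8 or more", which gives eight categories; the lecture text does not state which categories were combined, so this grouping is an assumption.

```python
import math

def poisson_pmf(x, mu):
    """Pr[X = x] for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

def grouped_expected(mu, n, low, high):
    """Expected frequencies for bins '<= low', low+1, ..., high-1, '>= high'."""
    p_low = sum(poisson_pmf(x, mu) for x in range(low + 1))
    middle = [poisson_pmf(x, mu) for x in range(low + 1, high)]
    p_high = 1.0 - p_low - sum(middle)   # upper tail gets the remaining probability
    return [n * p for p in [p_low] + middle + [p_high]]

# Assumed grouping: '0 or 1', 2, 3, 4, 5, 6, 7, '8 or more'.
expected = grouped_expected(mu=4.21, n=76, low=1, high=8)
print([round(e, 2) for e in expected])
```

The observed frequencies must be pooled into the same bins before computing the chi-square statistic.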

Degrees of Freedom Determine the Chi-Square Decision

  • We subtract one degree of freedom for each of the following:

    • Probabilities must sum to 1

    • We estimated one parameter (μ) from the data

  • This reduces the available independent information.

\[ df = (\text{number of categories}) - 1 - (\text{number of parameters}) \]

\[ df = 8 - 1 - 1 = 6 \]

  • Critical value for chi-square with df = 6 at α = 0.05: 12.59

  • Our chi-square test statistic: 23.93

  • Do we reject \(H_0\) or not?

Rejecting \(H_0\) suggests extinction is not random

  • Test statistic: 23.93

  • Critical value (df = 6, α = 0.05): 12.59

  • Because 23.93 > 12.59:

    • We reject \(H_0\)
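The degrees-of-freedom calculation and the decision rule can be written out as a short sketch. The test statistic and critical value are the lecture's numbers; the function and variable names are illustrative.

```python
def degrees_of_freedom(n_categories, n_estimated_params):
    """df = categories - 1 - number of parameters estimated from the data."""
    return n_categories - 1 - n_estimated_params

df = degrees_of_freedom(n_categories=8, n_estimated_params=1)

critical_value = 12.59   # from the lecture: df = 6, alpha = 0.05
test_statistic = 23.93   # from the lecture

# Reject H0 when the test statistic exceeds the critical value.
reject_h0 = test_statistic > critical_value
print(df, reject_h0)
```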

Statistical conclusion:

  • Extinction counts do not fit a Poisson distribution.
  • We do not know the exact alternative distribution, only that a Poisson model is inadequate.

Biological interpretation:

  • Extinctions are not occurring at a constant random rate.

  • There are intervals with unusually high extinction rates.

  • This pattern is consistent with episodic mass extinction events.

Variance-to-Mean Ratio Reveals Clumping or Dispersion

  • An alternative diagnostic for randomness:

    Compare the variance to the mean.

    • Variance ≈ Mean → Consistent with Poisson

    • Variance > Mean → Clumped pattern

    • Variance < Mean → Dispersed pattern

  • This provides a quick check before formal testing.
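The ratio is easy to compute from raw counts. In this sketch both data sets are hypothetical, invented to show one clearly clumped and one clearly dispersed pattern; `statistics.variance` computes the sample variance (dividing by n − 1).

```python
import statistics

def variance_to_mean_ratio(counts):
    """Sample variance divided by sample mean of the counts."""
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical counts per cell: most cells empty, a few crowded.
clumped = [0, 0, 0, 0, 8, 0, 0, 9, 0, 0]
# Hypothetical counts per cell: nearly the same count everywhere.
dispersed = [2, 2, 2, 2, 2, 2, 3, 2, 2, 2]

print(round(variance_to_mean_ratio(clumped), 2))
print(round(variance_to_mean_ratio(dispersed), 2))
```

A ratio well above 1 points toward clumping and a ratio well below 1 toward dispersion, but the ratio alone is a diagnostic, not a formal test.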

Where the Poisson Fits in Our Probability Toolkit

Binomial distribution:

  • Fixed number of trials
  • Probability of success per trial

Poisson distribution:

  • Counting events in time or space
  • No fixed number of trials
  • Models rates rather than proportions