ABD 3e Chapter 13
Many methods assume approximate normality of the data, especially for small sample sizes.
Often frequency distributions aren’t normal, and variances aren’t equal.
What to do?
Graphical methods (eyeball test)
Plot the data
Histograms of variables
Split into categories if you have them
Make a normal quantile plot
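A minimal sketch of the plotting step in base R. The data frame `dat`, its columns `group` and `y`, and the simulated values are all hypothetical, made up here only to have something to plot.

```r
# Simulated, right-skewed example data (illustrative only, not from the course)
set.seed(1)
dat <- data.frame(
  group = rep(c("A", "B"), each = 50),
  y     = c(rlnorm(50, meanlog = 0,   sdlog = 0.5),
            rlnorm(50, meanlog = 0.5, sdlog = 0.5))
)

# One histogram per category, stacked so the shapes can be compared
op <- par(mfrow = c(2, 1))
for (g in unique(dat$group)) {
  hist(dat$y[dat$group == g], main = paste("Group", g), xlab = "y")
}
par(op)  # restore the previous plot layout
```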
Not all data follow a symmetric, bell-shaped distribution
Common deviations:
These deviations can affect statistical methods that assume normality
Next: how to detect and handle these situations
Each point = one observation
X-axis: expected values if the data were normal with the same mean and standard deviation
Y-axis: observed values, sorted from smallest to largest
Interpretation:
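Points that fall close to a straight line are consistent with a normal distribution; systematic curvature suggests skew or heavy tails. A minimal sketch in base R, assuming a simulated sample `y` (hypothetical data). Note that `qqnorm()` puts standard normal quantiles on the x-axis rather than rescaling them to the sample mean and standard deviation, which changes the axis labels but not the shape of the plot.

```r
set.seed(2)
y <- rlnorm(40)   # simulated right-skewed data (illustrative only)

qqnorm(y)         # x: theoretical normal quantiles, y: observed values
qqline(y)         # reference line through the first and third quartiles
```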
Halpern (2003) calculated biomass ratio as the total mass of all marine plants and animals per unit area of reserve divided by the same quantity in an unprotected control.
A Shapiro-Wilk test evaluates the goodness-of-fit of a normal distribution to a sample.
Hypotheses:
\(H_0\): The data are sampled from a population having a normal distribution.
\(H_A\): The data are sampled from a population not having a normal distribution.
Warnings:
A small sample might not yield enough power to reject the null hypothesis even when the data are non-normal.
With large samples, even small deviations from normality can lead to rejection.
But as sample size increases, the assumption of normality becomes less important for methods based on sample means (central limit theorem).
Take-home: plot the data and use common sense
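The Shapiro-Wilk test is available in base R as `shapiro.test()`. A minimal sketch, assuming a simulated skewed sample `y` (hypothetical data, not from the course):

```r
set.seed(3)
y <- rlnorm(30)           # simulated skewed sample (illustrative only)

res <- shapiro.test(y)    # H0: the data come from a normal population
res$statistic             # W statistic
res$p.value               # a small P-value is evidence against normality

shapiro.test(log(y))      # the same test after a log transformation
```

The warnings above still apply: treat the P-value as one piece of evidence alongside the plots.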
A method is robust if its answer is not sensitive to violations of its assumptions (some methods are robust, others are not; learn which)
Not unduly affected by outliers
Provide good performance even with small departures from normal distributions
The \(F\)-test for comparing two variances is not robust to departures from normality
Levene’s test is more robust
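A sketch comparing the two approaches in base R. `var.test()` is the built-in \(F\)-test. Base R has no Levene function, so the version below is a hand-rolled assumption: a one-way ANOVA on absolute deviations from each group's median (the Brown-Forsythe variant of Levene's test). Add-on packages such as `car` provide a ready-made `leveneTest()`. The data are simulated.

```r
set.seed(4)
y <- rexp(50, rate = 1)                      # simulated non-normal data, equal variances
g <- factor(rep(c("A", "B"), each = 25))     # two groups of 25

# F-test of equal variances (sensitive to non-normality)
f_res <- var.test(y ~ g)

# Levene-type test by hand: ANOVA on absolute deviations
# from each group's median (Brown-Forsythe variant)
abs_dev <- abs(y - ave(y, g, FUN = median))
lev_res <- anova(lm(abs_dev ~ g))

f_res$p.value                # P-value from the F-test
lev_res[["Pr(>F)"]][1]       # P-value from the Levene-type test
```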
Sometimes we can take data and transform it so it better meets assumptions
A data transformation changes each measurement by the same mathematical formula
Use a “prime” mark ( \(\prime\) ) to denote transformed data e.g. \(Y^\prime\)
The log transformation is the most common transformation in biology
Procedure:
\[ Y^\prime = \operatorname{ln}[Y] \]
R code:
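A minimal sketch, assuming a numeric vector `y` of positive measurements (the values are made up for illustration):

```r
y <- c(1.2, 3.5, 10, 48, 250)   # made-up positive measurements
y_prime <- log(y)               # log() is the natural log (ln) by default
y_prime
```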
Most likely to be useful when:
Measurements are ratios or products of variables
Frequency distribution of data is skewed right (long tail on right)
The group having the larger mean (when comparing two groups) also has the higher standard deviation
The data span several orders of magnitude
Commonly used for:
Measurements such as body size and body mass
Count data (e.g. number of individual organisms in an area)
If the data contain zeros, the log is undefined for those values. Sometimes a small constant (e.g., +1) is added to every measurement first, but this choice should be justified.
\[ Y^\prime = \operatorname{ln}[Y+1] \]
R code:
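A minimal sketch, assuming a vector `counts` that contains a zero (made-up values):

```r
counts <- c(0, 1, 4, 12, 57)       # made-up counts containing a zero
counts_prime <- log(counts + 1)    # ln[Y + 1]; defined at Y = 0
log1p(counts)                      # base R shortcut for log(1 + x)
```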
\[ p^\prime = \operatorname{arcsin}[\sqrt{p}] \]
\[ Y^\prime = \sqrt{Y} \]
\[ Y^\prime = Y^2 \]
\[ Y^\prime = e^Y \]
\[ Y^\prime = \frac{1}{Y} \]
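The five formulas above map directly onto base R functions. A minimal sketch, assuming made-up vectors `p` (proportions between 0 and 1) and `y` (positive measurements):

```r
p <- c(0.05, 0.20, 0.50, 0.90)   # made-up proportions in [0, 1]
y <- c(1, 4, 9, 16)              # made-up positive measurements

p_prime  <- asin(sqrt(p))        # arcsine square root (for proportions)
y_sqrt   <- sqrt(y)              # square root
y_square <- y^2                  # square
y_exp    <- exp(y)               # antilog
y_recip  <- 1 / y                # reciprocal
```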
Back-transformations
| Transform | Back-transform |
|---|---|
| Log | Antilog |
| Arcsine square root | Sine squared \((\operatorname{sin}[p^\prime])^2\) |
| Square root | Square |
| Square | Square root |
| Antilog | Log |
| Reciprocal | Reciprocal |
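
A sketch of back-transforming a log-scale estimate to the original units, using made-up data. The back-transformed mean of the logs is the geometric mean, which is generally smaller than the arithmetic mean of the original values.

```r
y <- c(1.2, 3.5, 10, 48, 250)         # made-up positive measurements
y_prime <- log(y)

ci_log <- t.test(y_prime)$conf.int    # 95% CI for the mean on the log scale
exp(mean(y_prime))                    # back-transformed mean (geometric mean)
exp(ci_log)                           # back-transformed CI limits
```

Back-transform the confidence limits themselves, not the standard error.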
Interpretability can suffer (values no longer in original units, harder to explain to others)
Some transforms do not work well with zero, small numbers, or negative numbers
Avoid multiple testing
Do not just try different transformations until you find one that gives a \(p\)-value smaller than 0.05
This increases your chance of a Type I error
Instead, decide a priori (ahead of time) which transformation best yields data that meet the assumptions of the statistical method
Parametric methods
Nonparametric methods
Make fewer assumptions about the distribution of the variables
Usually based on ranks of the data points, not their actual values
Use these when data are not approximately normal (for one-sample or paired comparisons).
Sign test
Wilcoxon signed-rank test
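A sketch of both tests on the same paired data in base R. The vectors `before` and `after` are made up for illustration. The sign test is carried out here with `binom.test()` on the number of positive differences, which is one common way to run it.

```r
before <- c(12.1, 9.8, 11.4, 10.2, 13.5, 9.1, 10.9, 12.7)   # made-up paired data
after  <- c(11.0, 9.9, 10.1,  9.0, 12.0, 8.7, 10.0, 11.1)
d <- before - after

# Sign test: count positive differences, compare with Binomial(n, 0.5)
n_pos <- sum(d > 0)
n     <- sum(d != 0)                  # zero differences are dropped
sign_res <- binom.test(n_pos, n, p = 0.5)

# Wilcoxon signed-rank test on the same pairs
wsr_res <- wilcox.test(before, after, paired = TRUE)

sign_res$p.value
wsr_res$p.value
```

The sign test uses only the direction of each difference, while the signed-rank test also uses the ranks of their magnitudes.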
The Wilcoxon rank-sum test (aka Mann-Whitney \(U\)-test) compares the distributions of two independent groups
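In base R the rank-sum test uses the same `wilcox.test()` function, without `paired = TRUE`. A minimal sketch with two made-up samples `a` and `b`:

```r
a <- c(3.1, 4.7, 5.2, 6.0, 7.8, 9.4)   # made-up group A
b <- c(1.2, 2.5, 2.9, 3.6, 4.1, 5.0)   # made-up group B

rs_res <- wilcox.test(a, b)            # two independent samples
rs_res$statistic                       # W: number of (a, b) pairs with a > b
rs_res$p.value
```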
Permutation tests
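A sketch of one common permutation test, a two-sample comparison of means, written in base R. The data, the choice of test statistic, and the number of shuffles (`n_perm`) are all assumptions made for illustration.

```r
set.seed(5)
a <- c(3.1, 4.7, 5.2, 6.0, 7.8, 9.4)   # made-up group A
b <- c(1.2, 2.5, 2.9, 3.6, 4.1, 5.0)   # made-up group B
y <- c(a, b)
obs_diff <- mean(a) - mean(b)          # observed difference in means

# Shuffle group membership many times and recompute the difference
n_perm <- 5000
perm_diff <- replicate(n_perm, {
  idx <- sample(length(y), length(a))  # random indices assigned to "group A"
  mean(y[idx]) - mean(y[-idx])
})

# Two-sided P-value: share of shuffles at least as extreme as observed
p_perm <- mean(abs(perm_diff) >= abs(obs_diff))
p_perm
```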
Bootstrap
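A sketch of a basic bootstrap in base R, here for the standard error and a percentile interval of a sample median. The data, the statistic, and the number of resamples are assumptions chosen for illustration.

```r
set.seed(6)
y <- c(1.2, 3.5, 10, 48, 250, 7.7, 15, 2.1)   # made-up skewed sample

# Resample with replacement and recompute the statistic each time
boot_medians <- replicate(5000, median(sample(y, replace = TRUE)))

sd(boot_medians)                               # bootstrap standard error
quantile(boot_medians, c(0.025, 0.975))        # percentile 95% interval
```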
Some types of data require different models:
These are examples of generalized linear models (GLMs)
Avoid forcing normality with transformations when a suitable model exists
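As one illustration, count data can be modeled directly with a Poisson GLM instead of being log-transformed. A minimal sketch using base R's `glm()`; the predictor `x`, the response `counts`, and the simulated relationship between them are hypothetical.

```r
set.seed(7)
x <- runif(60, 0, 2)                                # simulated predictor
counts <- rpois(60, lambda = exp(0.5 + 0.8 * x))    # simulated count response

fit <- glm(counts ~ x, family = poisson)            # log link by default
coef(fit)                                           # estimates on the log scale
```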

BIOL 275 Biostatistics | Spring 2026