Lecture 18
Designing Experiments

ABD 3e Chapter 14

Chris Merkord

Learning Objectives

  • Distinguish between association and causation and explain why observational studies cannot establish causation
  • Define confounding and explain how it biases inference
  • Describe how controls, random assignment, and blinding reduce bias in experiments
  • Identify experimental units and distinguish true replication from pseudoreplication
  • Explain how replication, balance, and blocking reduce sampling error and improve precision
  • Interpret factorial designs and explain how interactions between factors arise
  • Describe how matching and adjustment reduce confounding in observational studies
  • Explain how sample size is chosen to achieve desired precision or statistical power

Experiments

Observations Are Not Enough

  • Observational studies
    • Reveal patterns in real-world data
    • Researchers observe and measure variables as they naturally occur
  • But patterns alone do not tell us why they occur
  • Association: two variables change together
  • Causation: a change in one variable directly produces a change in another
  • Multiple explanations can produce the same association
  • Association does not imply causation
Figure 1: Correlation. Image: XKCD (CC BY-NC 2.5)

Why Causal Inference Is Hard

  • Confounding: a third variable influences both the explanatory and response variables
  • Directionality: cause and effect can be reversed
  • These problems cannot be resolved with more data alone
  • We need a way to isolate the effect of a single factor → experiments
Triangle diagram showing “In the summer…” at the top with arrows pointing to “ice cream” and “drowning deaths.” A crossed-out arrow between ice cream and drowning deaths indicates no direct causal relationship.
Figure 2: Confounding example: a third factor (summer conditions) increases both ice cream consumption and drowning deaths, creating a misleading association without a direct causal link.

Example: When Observational Studies Mislead

  • Observational studies suggested that hormone replacement therapy (HRT) reduced heart disease risk in women (Stampfer et al. 1991 New England J Med)

  • Women taking HRT had lower rates of heart disease

  • Conclusion (at the time): HRT protects against heart disease

Black-and-white table of cardiovascular disease outcomes by hormone use with colored annotations. A magenta box surrounds the "RR (95% CI)" column header. Blue boxes highlight RR values of 1.0 for the no hormone use group. Orange boxes highlight lower RR values for current hormone users across outcomes, indicating an apparent protective association.
Figure 3: Annotated table from Postmenopausal estrogen therapy and cardiovascular disease showing relative risk (RR) estimates for cardiovascular outcomes by hormone use. The magenta box highlights the RR column label, blue boxes mark the reference group (no hormone use, RR = 1.0), and orange boxes highlight reduced RR estimates among current hormone users in the observational data (Stampfer et al. 1991 New England J Med).

What Was the Problem?

  • Women who chose HRT differed in important ways:
    • Higher socioeconomic status
    • Better access to healthcare
    • Healthier lifestyles overall
  • These differences (confounders) also reduce heart disease risk

What Happened in an Experiment?

  • A randomized trial (Women’s Health Initiative) assigned HRT randomly

  • Result: HRT did not reduce heart disease risk (and increased some risks)

  • The original association was due to confounding, not causation

Figure 4: Kaplan–Meier Estimates of Cumulative Hazard Rates of CHD (Manson et al. 2003 New England J Med) showing similar rates of coronary heart disease (CHD) among control and treatment groups.

What Is an Experiment?

  • A study where researchers actively impose conditions on a system
  • Researchers assign different conditions (treatments) to experimental units
  • Outcomes are then compared across those conditions
  • This allows us to isolate the effect of a specific factor

Why Experiments Work

  • Experiments are designed to support causal inference
  • By controlling how conditions are assigned, we reduce alternative explanations
  • Properly designed experiments break the link between confounders and treatment
  • Differences in outcomes can be attributed to the treatment

Key Idea

  • Observational studies measure what already exists
  • Experiments create conditions for comparison
  • This is what allows us to move from association → causation

Eliminating Bias

Be Wary of Bias in Your Design

  • Biased experiments produce biased conclusions
  • They tell you about your design, not the real world
  • Bias must be addressed before data are collected
    • It cannot be fixed afterward
  • Large \(n\) does not solve bias
    • It can make biased results more convincing
Two target diagrams side by side. In both, individual points represent sample estimates and the bullseye represents the true population value. The first target shows points widely scattered around the center, indicating sampling error or imprecision. The second shows a tight cluster of points located away from the center, indicating systematic error or bias.
Figure 5: Comparison of sampling error and systematic error: each point represents a sample estimate, and the center of the target represents the true population value. Imprecision produces a wide spread of sample estimates centered on the truth, whereas bias produces tightly clustered estimates that are consistently offset from the truth.

Design Features That Reduce Bias

  • Three core strategies:
    1. Controls
      • Provide a baseline for comparison
    2. Random assignment
      • Breaks the link between confounders and treatment
    3. Blinding
      • Prevents subjects and researchers from influencing outcomes

Eliminating Bias: Strategy 1 — Use Controls

  • A control group provides a baseline for comparison
  • Control units are treated as similarly as possible to treatment units
    • Except for the treatment itself
  • This allows us to isolate the effect of the treatment
  • Without a control, we cannot determine cause and effect

What Makes a Good Control?

  • A good control matches the treatment group in all relevant ways
  • The only systematic difference should be the treatment
  • Poor controls introduce new differences (new confounding)
  • Doing nothing is not always an appropriate control

Control Example: Placebo

  • Outcomes can change simply because a treatment is given
  • A placebo mimics the treatment without an active ingredient
  • Good placebo: indistinguishable from the treatment
  • Bad placebo: differs in noticeable ways (e.g., taste, side effects)

Side-by-side realistic images of the same man smiling while being given a pill. In one panel he receives a plain white pill (placebo), and in the other a colored capsule (drug). His expression is similarly positive in both, indicating a comparable response regardless of treatment.
Figure 6: Illustration of the placebo effect: a participant shows a positive response whether receiving an active drug or a placebo, demonstrating how expectations can produce similar perceived and reported outcomes in both groups.

Why Controls? Example: Independent Recovery

  • People often seek treatment at their worst.

  • By the time treatment begins, many are therefore already on their way to recovery on their own

  • To measure the effects of a new therapy, we need a comparable control group.

Eliminating Bias: Strategy 2 — Random Assignment

  • Random assignment: assign treatments to units by chance
  • This breaks the link between confounders and treatment
  • Known and unknown differences are balanced on average across groups
  • Prevents systematic differences between groups

Why Random Assignment Matters

  • Without randomization, group differences can reflect confounding
  • Non-random assignment can create bias
  • Example: assign treatment by last name
    • May group family members or cultural backgrounds together
    • Creates systematic differences between groups
  • Randomization prevents these patterns

How to randomly assign

  • Identify the experimental units
  • Assign treatments using a random process
  • Goal: each unit has an equal chance of receiving each treatment
  • In practice:
    • Use a random number generator
    • Use software (e.g., R)
library(tidyverse)

# Randomly assign each of 10 units to control or treatment
set.seed(42)  # makes the random assignment reproducible
tibble(id = 1:10) |>
  mutate(
    treatment = sample(
      x = c("control", "treatment"),
      size = n(),
      replace = TRUE
    )
  )
  1. Create a tibble (data frame) with one variable, id, with values 1 through 10
  2. Modify the variables in the tibble
  3. Add a new treatment variable using the sample function, which randomly draws values
  4. The values drawn are either "control" or "treatment"
  5. n() matches the number of values drawn to the number of rows in the tibble
  6. replace = TRUE makes each draw independent, which is true random assignment (so group sizes may be unequal)
Figure 7: Illustration of the results of random assignment.

Results of random assignment

  • Randomization does not eliminate confounders
  • It removes systematic bias by breaking their association with treatment
  • Remaining differences are due to chance (sampling error)
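When equal group sizes are desired (a balanced design), a common alternative to independent draws is to shuffle a vector containing each label the same number of times. A minimal R sketch:

```r
# Balanced random assignment: shuffle a vector with each treatment
# label repeated equally (5 control, 5 treatment for 10 units)
set.seed(42)
assignments <- sample(rep(c("control", "treatment"), each = 5))
table(assignments)  # always 5 of each, regardless of seed
```

This still assigns treatments by chance, but guarantees equal counts in each group.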

Eliminating Bias: Strategy 3 — Blinding

  • Blinding: keeping participants and/or researchers unaware of treatment assignment
  • Prevents expectations from influencing outcomes
    • Participants may respond differently if they know their treatment
    • Researchers may treat or measure subjects differently
  • Reduces bias introduced during data collection

Types of Blinding

  • Single-blind: participants do not know their treatment
    • Prevents subject expectations from influencing outcomes
  • Double-blind: neither participants nor researchers know
    • Prevents both subject and researcher bias
  • Stronger blinding → less opportunity for bias
A professional illustration showing a female researcher with black hair in a ponytail and a male participant seated at a table with pill and placebo bottles. Above, two labeled panels compare designs: under “Single Blind,” only the participant is blindfolded; under “Double Blind,” both the participant and researcher are blindfolded.
Figure 8: Comparison of single-blind and double-blind study designs: in a single-blind study, the participant is unaware of treatment assignment, whereas in a double-blind study, both the participant and the researcher are unaware.

Why Blinding Matters

  • Knowledge of treatment can influence:
    • Behavior
    • Reporting of symptoms
    • Measurement of outcomes
  • Unblinded studies often show larger effects
    • These may reflect bias, not true treatment effects
A realistic image of a woman with a frustrated expression, looking off to the side. A thought bubble above her head reads “No wonder I feel bad…” with a placebo symbol, indicating she believes she received a placebo and is interpreting and reporting her symptoms accordingly.
Figure 9: Illustration of lack of blinding: a participant believes they received a placebo and attributes their symptoms to it, showing how expectations can influence both perceived and reported outcomes.

Blinding in Practice

  • Blinding requires careful design

    • Treatments must be indistinguishable
  • Placebos are often used to maintain blinding

  • If blinding fails, bias can re-enter the study

Figure 10: The limitations of blind trials are apparent when treatments are not indistinguishable. Image: XKCD (CC BY-NC 2.5)

Sampling Error

Reducing sampling error improves precision and power

  • Even unbiased experiments have variability among individuals (“noise”)
  • This variability creates sampling error in estimates
  • Sampling error reduces:
    • Precision of estimates
    • Power to detect treatment effects

Holding conditions constant reduces noise but limits generality

  • Reduce noise by keeping conditions constant:
    • Environment (e.g., temperature, humidity)
    • Participant characteristics (e.g., age, sex, genotype)
  • Tradeoff:
    • More control → less variability
    • But results may not generalize broadly

Overly narrow conditions can create bias in applicability

  • Restricting study populations limits who results apply to
  • Example:
    • Many clinical trials historically included only men
    • Results were applied broadly, including to women
  • Design decisions affect external validity

Key design strategies reduce sampling error

  • Key design strategies:

    1. Replication

    2. Balance

    3. Blocking

    4. Using extreme treatments

  • Goal: reduce noise without sacrificing generality

Replication is essential to separate signal from noise

  • Replication: applying each treatment to multiple experimental units
  • Without replication:
    • Cannot distinguish treatment effects from random variation
  • More replication:
    • More information
    • Better estimates
    • Higher power to detect real effects

Replication depends on independent experimental units

  • Replicates must be independent units
  • Experimental unit:
    • The unit assigned a treatment independently
  • Examples:
    • Individual organism (if assigned independently)
    • Group units: plot, cage, household, petri dish
  • Key rule:
    • Individuals within the same unit are not independent

Replication is not just “more individuals”

  • Multiple organisms ≠ multiple replicates
  • If organisms share the same environment:
    • They are more similar to each other
    • They count as one replicate
  • Must identify the correct experimental unit:
    • Critical for design and analysis

Example: Which designs are truly replicated?

  • Two growth chambers:
    • Control vs Treatment (different light)
  • Multiple plants per chamber
    • Share the same environment
  • Experimental unit = chamber, not plant
  • One chamber per treatment → no replication
  • Lighting differs between chambers
    • Cannot separate treatment from chamber effect
Illustration of three experimental setups using potted plants with two fertilizer treatments shown by different pot colors. The top row shows one plant per treatment (no replication). The middle row shows multiple plants per treatment grouped into separate chambers (not independent, still unreplicated). The bottom row shows individual plants randomly assigned to treatments and interspersed, representing proper replication with independent experimental units.
Figure 11: Two growth chambers comparing control and treatment conditions, each containing multiple Brassica rapa plants. Because all plants within a chamber share the same environment—and the chambers differ in light intensity—the chamber, not the plant, is the experimental unit. With only one chamber per treatment, this design is unreplicated.

Interspersion signals proper replication

  • Proper replication shows interspersion:
    • Treatments mixed across units
    • Result of random assignment
  • Lack of interspersion:
    • Warning sign of design problems
    • Likely non-independence
Diagram comparing experimental layouts using black and white squares to represent two treatments. The top section labeled “Good design” shows treatments evenly mixed across units (completely randomized, randomized block, and systematic). The bottom section labeled “Poor design” shows treatments grouped or separated, including simple segregation, clumped segregation, isolation in separate chambers, interdependent replicates, and no replication.
Figure 12: Examples of good and poor experimental designs illustrating interspersion of treatments. Top rows show properly interspersed designs (completely randomized, randomized block, and systematic), while bottom rows show problematic designs where treatments are segregated, clumped, isolated, interdependent, or lack replication. Figure from Hurlbert (1984).

Pseudoreplication leads to false precision

  • Treating non-independent units as independent = pseudoreplication
  • Examples:
    • Treating plants within a chamber as separate replicates
    • Repeated measurements on same individual
  • Consequences:
    • Standard errors too small
    • Overconfidence in results

Why replication reduces standard error

\[ \operatorname{SE}_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]

  • \(n_1\), \(n_2\) = number of independent replicates per treatment
  • Increasing sample size ↓ standard error
  • Lower standard error → clearer detection of differences
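The effect of replication on the standard error can be checked numerically. A small R sketch, assuming a pooled variance of 4 (an invented value for illustration):

```r
# Standard error of a difference in means, given the pooled
# variance sp2 and the number of independent replicates per group
se_diff <- function(sp2, n1, n2) sqrt(sp2 * (1 / n1 + 1 / n2))

se_diff(4, n1 = 5, n2 = 5)    # fewer replicates -> larger SE
se_diff(4, n1 = 20, n2 = 20)  # quadrupling n halves the SE
```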

Replication has practical limits

  • Increasing sample size improves inference
  • But comes with costs:
    • Time
    • Money
    • Ethical considerations (e.g., animal use)
  • Goal:
    • Sufficient replication to detect meaningful effects

Balanced designs minimize sampling error

  • Balanced design = equal sample size in each treatment

  • Unbalanced design = unequal sample sizes

  • For a fixed total sample size:

    • Standard error is smallest when group sizes are equal
    • Balance optimizes precision of comparisons
Diagram comparing balanced and unbalanced experimental designs. The balanced design shows equal numbers of units in control (circles) and treatment (squares). The unbalanced design shows many control units and few treatment units, illustrating unequal sample sizes across treatments.
Figure 13: Balanced and unbalanced experimental designs illustrating allocation of sample size across treatments. In the balanced design, control and treatment groups have equal numbers of independent experimental units (n = 6 each). In the unbalanced design, most units are assigned to the control group (n = 10) and few to the treatment group (n = 2). Circles represent control units and squares represent treatment units.

Why balance improves precision

\[ \operatorname{SE}_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]

  • For fixed \(n_1 + n_2\):
    • SE minimized when \(n_1 = n_2\)
  • Example (total \(n = 20\)):
    • Balanced: \(n_1=10\), \(n_2=10\) → smaller SE
    • Unbalanced: \(n_1=19\), \(n_2=1\) → much larger SE
  • Estimating a difference requires:
    • Precise estimate of both means
  • Unbalanced design:
    • One group well estimated
    • Other poorly estimated → weak comparison
  • Balance allocates effort efficiently
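The balanced vs unbalanced comparison above can be verified directly from the formula (taking \(s_p^2 = 1\) for simplicity):

```r
# Fixed total n = 20, pooled variance set to 1 for illustration
sqrt(1 * (1 / 10 + 1 / 10))  # balanced 10 + 10: smallest SE
sqrt(1 * (1 / 19 + 1 / 1))   # unbalanced 19 + 1: much larger SE
```

The unbalanced split more than doubles the standard error even though the total sample size is identical.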

Balance is optimal but not strictly required

  • Increasing sample size improves precision:
    • Even if added to only one group
  • But for a fixed total sample size:
    • Equal allocation is optimal
  • Additional benefit:
    • Statistical methods are more robust
    • Especially when variances differ between groups

Blocking reduces noise from known sources of variation

  • Blocking: group similar experimental units into blocks

  • Units within a block:

    • Share location or other characteristics
    • Are more similar to each other than to units in other blocks
  • Goal:

    • Remove variation not caused by the treatment
    • Increase precision and power

How blocking works

  • Within each block:

    • Assign treatments randomly
    • Treatments are interspersed within the block
  • Analyze differences within blocks, not across all units

  • Conceptually:

    • Repeat the same experiment in each block
    • Compare treatments under similar conditions
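Random assignment within blocks can be sketched with a grouped mutate, extending the earlier tidyverse example. The block sizes here (two blocks of four units) are invented for illustration:

```r
library(tidyverse)

# Randomized block design: 8 units in 2 blocks of 4; treatments
# are shuffled separately within each block, so every block
# contains both treatments (interspersion within blocks)
set.seed(7)
tibble(block = rep(c("A", "B"), each = 4)) |>
  mutate(unit = row_number()) |>
  group_by(block) |>
  mutate(treatment = sample(rep(c("control", "treatment"), each = 2))) |>
  ungroup()
```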
Diagram comparing no blocking and blocking designs. In the no blocking design, all individuals are randomly divided into treatment and control. In the blocking design, individuals are first grouped into blocks, then treatments are randomly assigned within each block.
Figure 14: Comparison of experimental designs with and without blocking. Blocking groups similar individuals before random assignment, allowing treatment comparisons within blocks and reducing variation among groups. Image: JHK111 (CC0 1.0 Universal)

When is blocking useful?

  • Use blocking when:
    • Units differ due to known factors (e.g., location, time, group)
  • Examples of blocks:
    • Field plots in the same area
    • Animals from the same litter
    • Patients from the same clinic
    • Experiments run on the same day
  • Key condition:
    • Units within blocks are similar
    • Blocks differ from each other
Scatterplot of weight loss by individuals without blocking. Blue points represent placebo and green points represent diet pills, with substantial overlap between groups making treatment differences difficult to distinguish.
(a) Without blocking: diet pills vs placebo on weight loss. Individuals are not grouped, so variation among individuals obscures differences between treatments. Image: JHK111 (CC0 1.0 Universal).
Scatterplot of weight loss grouped into two blocks labeled females and males. Within each block, blue points represent placebo and green points represent diet pills, showing clearer separation between treatments compared to the unblocked design.
(b) With blocking: diet pills vs placebo on weight loss. Individuals are grouped into blocks (females and males), and treatment differences are clearer within each block. Image: JHK111 (CC0 1.0 Universal).
Figure 15: Comparison of experimental designs with and without blocking. Without blocking, variation among individuals obscures treatment differences. With blocking, individuals are grouped into more similar subsets, making treatment effects easier to detect within each block.

Example: Extreme treatments reveal nitrogen effects

  • Clark and Tilman (2008) studied whether nitrogen addition reduces plant diversity

  • Typical (background) N deposition: ~1–10 kg N ha⁻¹ yr⁻¹

  • Experimental treatments: Up to 100 kg N ha⁻¹ yr⁻¹ (extreme)

  • Why use extreme levels?

    • Amplify the response
    • Make treatment effects easier to detect
  • Result: Clear decline in species richness with higher N

Scatterplot showing species loss (proportion) versus nitrogen input rate. Points are scattered across increasing nitrogen levels, with fitted lines indicating an upward trend in species loss as nitrogen input increases, especially at higher application rates.
Figure 16: Relationship between nitrogen input and plant species loss in grassland ecosystems. Species loss increases with nitrogen addition, with stronger effects at higher (extreme) nitrogen levels. Points represent observations and lines show fitted trends. Adapted from Clark and Tilman (2008).

Extreme treatments make effects easier to detect

  • Treatment effects are easiest to detect when they are large

  • Small differences:

    • Hard to distinguish from random variation
    • Require larger sample sizes
  • Large differences:

    • Stand out against noise
    • Increase power to detect an effect
  • Strategy:

    • Include extreme treatment levels

Extreme treatments increase power, but with tradeoffs

  • Stronger treatments → larger response differences
    • Higher probability of detecting an effect
  • Useful as a first step:
    • Does this variable affect the response at all?
  • Caution:
    • Effects may not scale linearly
    • Extreme treatments may not reflect realistic conditions
  • Balance:
    • Detection vs realism

Experiments with More Than One Factor

Experiments can include more than one factor

  • A factor:
    • A single treatment variable of interest
  • Many experiments include multiple factors
    • More efficient:
      • Answer multiple questions at once
      • Use time, materials, and effort more effectively
  • Example idea:
    • Temperature + nutrients
    • Light + water
A 2×2 grid showing plant growth under combinations of low and high light and water. Plants are smallest under low light and low water, larger with either factor increased, and largest when both light and water are high.
Figure 17: Factorial design illustrating the combined effects of light and water on plant growth. Each panel represents a different combination of low and high levels of both factors, showing how growth depends on their interaction.

Factorial designs test combinations of factors

  • Factorial design:
    • Includes all combinations of treatment levels
  • Example structure (2 factors):
    • Factor A: A₁, A₂
    • Factor B: B₁, B₂
  • Treatments:
    • A₁B₁, A₁B₂, A₂B₁, A₂B₂
  • Key advantage:
    • Can test interactions between factors
Factorial design: two factors with two levels each

                  Variable B
                  B₁        B₂
Variable A   A₁   A₁B₁      A₁B₂
             A₂   A₂B₁      A₂B₂
Figure 18: Factorial design with two variables (A and B), each with two levels. Rows represent levels of Variable A and columns represent levels of Variable B. Each cell shows a treatment combination (e.g., A₁B₁), representing one unique combination of factor levels included in the experiment.

Interactions: when effects depend on each other

  • Interaction: Effect of one factor depends on another factor
  • Without interaction:
    • Effects are independent and additive
  • With interaction:
    • Combined effect differs from separate effects
  • Only detectable with factorial design
  • Examples of types of interactions (Duda et al. 2023)
Line graph showing response versus Variable B (B₁, B₂) with separate lines for Variable A (A₁, A₂). The line for A₁ increases from B₁ to B₂, while the line for A₂ decreases, illustrating a non-parallel interaction effect.
Figure 19: Hypothetical interaction between Variable A and Variable B. The effect of Variable B on the response differs depending on the level of Variable A, as shown by the non-parallel lines.

Example: 4-factor factorial experiment (smoking reduction)

  • Study in Cook et al. (2015) Addiction

  • Outcome: % reduction in cigarettes/day

  • 4 factors (2 levels each: yes vs no):

    • Nicotine patch
    • Nicotine gum
    • Motivational interviewing
    • Behavioral reduction
  • Design: 2 × 2 × 2 × 2 = 16 combinations

  • Key idea: Effects depend on combinations of treatments (interactions)
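The full set of treatment combinations in a factorial design can be enumerated with base R's expand.grid; for the four yes/no factors above:

```r
# All 2 x 2 x 2 x 2 = 16 combinations of the four treatments
design <- expand.grid(
  patch        = c("yes", "no"),
  gum          = c("yes", "no"),
  interviewing = c("yes", "no"),
  behavioral   = c("yes", "no")
)
nrow(design)  # 16
```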

Multi-panel bar chart showing mean percent reduction in cigarettes per day across combinations of four treatments: nicotine patch, nicotine gum, motivational interviewing, and behavioral reduction. Each panel corresponds to patch and motivational interviewing conditions, with bars representing gum and behavioral reduction combinations. Differences among bars indicate that treatment effects depend on combinations of factors.
Figure 20: Mean percent reduction in cigarettes per day for all combinations of four treatments (nicotine patch, nicotine gum, motivational interviewing, and behavioral reduction). Each panel represents a different combination of patch and motivational interviewing, with bars showing gum and behavioral reduction combinations. Results illustrate how treatment effects vary across combinations, indicating interactions among factors. Adapted from Cook et al. (2015).

What if You Can’t Do an Experiment?

When experiments are not possible

  • Use observational studies
    • Researcher does not assign treatments
    • Subjects come as they are
  • Strengths:
    • Detect real-world patterns
    • Generate hypotheses
  • Limitation:
    • Cannot use randomization
    • → greater risk of bias

Observational studies still use good design principles

  • Apply as many experimental design features as possible:
    • Controls
    • Blinding (when possible)
    • Replication, balance, blocking
  • Key missing feature:
    • Randomization
  • Biggest challenge:
    • Confounding variables

Strategy 1: Matching

  • Matching: Pair each treated individual with a similar control
  • Match on known confounders:
    • Age, sex, weight, background, etc.
  • Common in: Case–control studies
  • Benefits:
    • Reduces bias
    • Reduces sampling error (like blocking)
  • Limitation:
    • Only controls known confounders
Grid of human icons arranged in pairs connected by arrows. Each pair matches individuals with the same sex (male or female), race (light or dark shading), and age (with or without a cane), illustrating one-to-one matching on multiple confounding variables.
Figure 21: Illustration of matching in a case–control study. Individuals are paired so that cases and controls have the same characteristics (sex, race, and age), reducing confounding. From Dey et al. (2020) Chest J.

Strategy 2: Adjustment

  • Adjustment: Use statistical methods to control for confounders
  • Example approach:
    • Compare groups at the same value of a confounder (e.g., age)
  • Methods:
    • Regression
    • Analysis of covariance (ANCOVA)
  • Key requirement:
    • Groups must overlap in confounder values
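Adjustment by regression can be sketched with simulated data (all numbers below are invented for illustration): the outcome depends on a confounder and on group membership, and lm estimates the group effect at equal confounder values:

```r
# Simulated example: groups differ in a confounder x, and the
# outcome y depends on both x and group membership
set.seed(1)
n <- 100
group <- rep(c("A", "B"), each = n / 2)
x <- rnorm(n, mean = ifelse(group == "A", 10, 12))  # groups differ in x
y <- 2 * x + ifelse(group == "B", 3, 0) + rnorm(n)  # true group effect = 3

fit <- lm(y ~ x + group)  # adjusts the group comparison for x
coef(fit)["groupB"]       # estimated group effect, holding x constant
```

A naive comparison of group means would mix the group effect with the difference in x; the adjusted coefficient recovers the group effect alone.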
Scatterplot of body mass versus flipper length for Adelie penguins, colored by sex, with separate regression lines for males and females. A dashed vertical line marks a common flipper length, and points on each line at that position indicate predicted body mass for each sex, showing adjusted comparison controlling for flipper length.
Figure 22: Relationship between flipper length and body mass in Adelie penguins, with separate regression lines for males and females. Points show individual observations, and lines show fitted values from a model including sex and flipper length. The vertical line marks a common flipper length, and highlighted points show predicted (adjusted) body mass for each sex at that value, illustrating comparison after accounting for flipper length.

Limits of observational studies

  • Observational studies can reveal important patterns
  • But without randomization:
    • Confounding cannot be fully eliminated
  • Strongest inference:
    • Experiments > observational studies
  • Best use:
    • Identify relationships
    • Generate hypotheses for experiments

Choosing a Sample Size

Choosing a sample size matters

  • Goal: Choose enough samples to get useful results
  • Too small:
    • Cannot detect effects
    • Very wide confidence intervals
  • Too large:
    • Wastes time, money, and resources
    • May raise ethical concerns
  • Key question: How many replicates per treatment?

Two ways to plan sample size

  • Plan for precision
    • Want a narrow confidence interval
  • Plan for power
    • Want a high probability of detecting a real effect
  • Focus here:
    • Comparing two means

Planning for precision

  • Goal: Estimate the difference in means:

\[ \mu_1 - \mu_2 \]

  • Use sample estimate:

\[ \bar{Y}_1 - \bar{Y}_2 \]

  • Want a 95% confidence interval with small width

Margin of error drives sample size

  • Confidence interval form:

\[ (\bar{Y}_1 - \bar{Y}_2) \pm \text{margin of error} \]

  • Margin of error ≈

\[ 2 \times \operatorname{SE} \]

  • Standard error:

\[ \operatorname{SE} = \sqrt{\frac{2\sigma^2}{n}} \]

  • Larger \(n\) → smaller SE → narrower interval

Sample size formula (approximate)

  • Solve for sample size per group:

\[ n \approx \frac{8\sigma^2}{(\text{margin of error})^2} \]

  • Interpretation:
    • Larger variability (\(\sigma\)) → need larger \(n\)
    • Higher precision (smaller margin) → need larger \(n\)
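Plugging in numbers makes the formula concrete; here σ = 2 and a desired margin of error of 1 are assumed values for illustration:

```r
# Approximate per-group sample size for a target margin of error
sigma  <- 2   # assumed standard deviation (from a pilot study)
margin <- 1   # desired margin of error for the 95% CI
n <- 8 * sigma^2 / margin^2
ceiling(n)  # 32 replicates per group
```

Halving the desired margin would quadruple the required n, since n scales with 1/margin².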

Practical challenge

  • \(\sigma\) is unknown
    • Use:
      • Pilot studies
      • Previous research
      • Educated guess
  • Result:
    • Sample size planning is approximate

Choosing sample size for power

  • Goal: Choose \(n\) so you can detect a meaningful effect
  • You must specify:
    • Effect size (what difference matters biologically)
    • Variability (\(\sigma\); from pilot data or past studies)
    • Significance level (\(\alpha\), usually 0.05)
    • Desired power (commonly 80%)
      • 80% chance of rejecting a false null
      • 20% chance of missing a real effect (Type II error)
  • How you do it:
    • Use software (e.g., R, online calculators)
    • Input these values → solve for required \(n\)
  • Key idea:
    • Larger effect → smaller \(n\)
    • Higher variability → larger \(n\)
    • Higher power → larger \(n\)
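Base R's power.t.test performs this calculation; a sketch assuming the smallest effect of interest is half a standard deviation:

```r
# Per-group n to detect a difference of 0.5*sigma with 80% power
# at alpha = 0.05, using a two-sample t-test
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
# gives n of about 64 per group
```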

More data improves precision

  • Small sample size:
    • Very wide confidence intervals
  • Increasing \(n\):
    • Rapid improvement at first
  • Large \(n\):
    • Diminishing returns
Line graph showing expected margin of error divided by sigma versus sample size per treatment. The curve decreases steeply from small sample sizes and then levels off, indicating that increases in sample size lead to diminishing improvements in precision.
Figure 23: Relationship between sample size per treatment and expected precision of a 95% confidence interval for the difference in means. Precision is expressed as the margin of error divided by σ. Precision improves rapidly at small sample sizes and then more slowly, illustrating diminishing returns.

Diminishing returns

  • Precision improves quickly at first:
    • e.g., \(n = 2 \rightarrow 5\)
  • Then slows:
    • e.g., \(n = 15 \rightarrow 20\)
  • Each additional replicate adds less new information
  • Tradeoff:
    • Precision vs cost

Summary of Considerations in Experimental Design

Summary: Designing studies and making inference

  • Experiments assign treatments and enable causal inference

  • Bias is reduced through controls, randomization, and blinding

  • Randomization balances confounding variables on average

  • Observational studies lack randomization and have weaker inference

  • Confounding in observational studies is reduced by matching and adjustment

  • Sampling error is reduced by replication, balance, and blocking

  • Extreme treatments increase the ability to detect effects

  • Factorial designs test multiple factors and their interactions

  • Sample size is planned for precision or power

  • Study design involves tradeoffs among precision, cost, and feasibility