Lecture 1
Statistics and Samples

ABD 3e Chapter 1

Chris Merkord

Learning Objectives

  1. Understand the major goal of statistics

  2. Distinguish between a sample and a population

  3. Distinguish between an estimate and a parameter

  4. Identify why estimates from samples may deviate from parameters of populations

  5. Identify the properties of a good sample

What’s the point of statistics?

  • Goal: We want to learn about the world.

  • Challenge: We can’t look at the whole world.

  • Solution: Take a sample and generalize outward.

  • New Challenge: Samples deviate from populations through bad luck (sampling error) or unrepresentative sampling (sampling bias).

Statistics’ fundamental obsession

Question:

How do we make inferences about the WORLD from our finite observations?

Answer:

Make models to account for the process of sampling and the associated hazards

Populations and samples

  • A population contains all the individual units of interest

  • A sample is a subset of units taken from the population, which we collect and analyze to learn about that population

Sample Units can be almost anything

Units are often individual organisms, but not always

Could be:

  • Groups of organisms (e.g. families)

  • Similar parts on a single organism (e.g. trichomes on a leaf)

  • Single parts across multiple organisms (e.g. beaks on birds)

  • Areas of land (e.g. vegetation quadrats)

  • Etc.

Parameters versus Estimates

Parameters

  • Describe populations
  • Only known if you measure entire population
  • Parameters are fixed
  • Parameters are the world
  • Parameters are the truth

Estimates (sample statistics)

  • Estimates of parameters
  • Inferred from samples
  • Are random variables
  • Change from one sample to the next
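
The contrast between a fixed parameter and a varying estimate can be seen in a small simulation. This is only a sketch: the population below is simulated, and its size, mean, and spread are made-up values chosen for illustration, not data from the course.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated body lengths (mm).
population = [random.gauss(50, 10) for _ in range(10_000)]

# The parameter is fixed. We only know it because we simulated
# (and can therefore measure) the entire population.
parameter = statistics.mean(population)

# Each random sample of 25 units gives its own estimate of the mean.
estimates = [statistics.mean(random.sample(population, 25)) for _ in range(5)]

print(f"parameter (population mean): {parameter:.2f}")
for i, est in enumerate(estimates, start=1):
    print(f"estimate from sample {i}:      {est:.2f}")
```

The parameter is printed once and never changes, while each of the five samples yields a somewhat different estimate.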

Sampling Bias and Sampling Error

Sampling bias

Systematic difference between estimates and parameters

Sampling error

Undirected deviation of estimates away from parameters.

Precision

How close estimates are to each other

Colored points are estimates from a sample. The red X is the true parameter value.

Sampling bias

  • Systematic difference between a parameter and its estimate

  • Estimates from multiple samples tend to deviate from parameters in the same direction

  • Arises when samples aren’t representative

Source: Kevin C at https://skepticalscience.com/print.php?n=1366

Volunteer Bias

Volunteers for a study are likely to be different, on average, from the population.

Examples:

  • Volunteers for sex surveys are more likely to be open about sex.

  • Volunteers for medical studies may be sicker than the general population.

  • Animals that are caught may be slower or more docile than those that are not.
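
The capture example can be sketched as a simulation. The capture rule here is an assumption made purely for illustration: only the slowest half of a simulated population can be caught. The speeds are invented values.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical population: 5,000 simulated running speeds (m/s).
speeds = [random.gauss(10, 2) for _ in range(5_000)]
true_mean = statistics.mean(speeds)

# Assumed capture rule: only the slowest half of the animals can be caught.
catchable = sorted(speeds)[: len(speeds) // 2]

# 200 estimates from random samples versus 200 from "catchable" samples.
random_estimates = [statistics.mean(random.sample(speeds, 30)) for _ in range(200)]
biased_estimates = [statistics.mean(random.sample(catchable, 30)) for _ in range(200)]

# Fraction of biased estimates that fall below the true mean.
share_low = sum(e < true_mean for e in biased_estimates) / len(biased_estimates)

print(f"true mean speed:          {true_mean:.2f}")
print(f"mean of random estimates: {statistics.mean(random_estimates):.2f}")
print(f"mean of biased estimates: {statistics.mean(biased_estimates):.2f}")
print(f"share of biased estimates below the true mean: {share_low:.0%}")
```

Estimates from the random samples scatter on both sides of the true mean, while estimates from the catchable animals miss in the same direction every time. Taking more biased samples does not remove that shift.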

Taking random samples is hard and requires effort

A random sample is a good sample

A sample whose units are chosen at random from the population is called a random sample

Random sampling minimizes bias and makes it possible to estimate the size of the sampling error

Rules:

  • Each unit should have an equal chance of being included in a sample

  • Selection of units must be independent

All the statistical methods we use assume a random sample

Convenience sample: easy to collect but likely biased; units are not chosen at random and are often not independent

How to get a random sample

Carefully characterize a population and use computer code (e.g. the sample() function in R) to select participants randomly.
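
The slide mentions R's sample(). As a rough analog, here is a minimal Python sketch using random.sample() from the standard library. The list of plot IDs is a hypothetical sampling frame invented for this example.

```python
import random

# Hypothetical sampling frame: one ID for every unit in the population.
frame = [f"plot-{i:03d}" for i in range(1, 201)]

# A dedicated generator with a fixed seed, so the selection can be repeated.
rng = random.Random(42)

# Draw 20 units without replacement; each unit in the frame has the
# same chance of ending up in the sample.
chosen = rng.sample(frame, k=20)

print(sorted(chosen))
```

The important step happens before any code runs: the frame has to list every unit in the population, because a unit missing from the list can never be selected.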

Sampling Error

Sampling Error:

  • The chance difference between an estimate and the true value of the parameter it estimates.

Even if you sample perfectly, by the book, your estimates will differ from the true parameter by chance.

Estimates are random variables

Because an estimate is a random variable, the value of an estimate is influenced by chance

Sampling error declines with sample size

  • Larger sample -> smaller sampling error
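
A short simulation makes the pattern visible. This is a sketch under stated assumptions: the population is simulated with made-up values, and the helper mean_abs_error() is a name introduced here for illustration.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

# Hypothetical population: 20,000 simulated measurements.
population = [random.gauss(100, 15) for _ in range(20_000)]
mu = statistics.mean(population)  # the parameter


def mean_abs_error(n, reps=500):
    """Average distance between the sample mean and mu over many random samples of size n."""
    errors = [
        abs(statistics.mean(random.sample(population, n)) - mu)
        for _ in range(reps)
    ]
    return statistics.mean(errors)


# Typical sampling error at four sample sizes.
results = {n: mean_abs_error(n) for n in (5, 20, 80, 320)}

for n, err in results.items():
    print(f"n = {n:>3}: typical sampling error = {err:.2f}")
```

For a sample mean, the standard error is the population standard deviation divided by the square root of n, so each fourfold increase in sample size should roughly halve the typical error.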