Lecture 1
Statistics and Samples

ABD 3e Chapter 1

Chris Merkord

Learning Objectives

Understand the major goal of statistics
Distinguish between a sample and a population
Distinguish between an estimate and a parameter
Identify why estimates from samples may deviate from population parameters
Identify the properties of a good sample

What’s the point of statistics?

Goal: We want to learn about the world.
Challenge: We can’t look at the whole world.
Solution: Take a sample and generalize outward.
New challenge: Samples can deviate from the population through bad luck (sampling error) or unrepresentative sampling (sampling bias)

Statistics’ fundamental obsession

Question:

How do we make inferences about the WORLD from our finite observations?

Answer:

Make models to account for the process of sampling and the associated hazards

Populations and samples

A population contains all the individual units of interest
A sample is a subset of units taken from the population that we collect and analyze to learn about the population

Sample Units can be almost anything

Units are often individual organisms, but not always

Could be:

Groups of organisms (e.g. families)
Similar parts on a single organism (e.g. trichomes on a leaf)
Single parts across multiple organisms (e.g. beaks on birds)
Areas of land (e.g. vegetation quadrats)
Etc.

Parameters versus Estimates

Parameters

Describe populations
Only known if you measure the entire population
Parameters are fixed
Parameters are the world
Parameters are the truth

Statistics

Estimates of parameters
Inferred from samples
Are random variables
Change from one sample to the next
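A minimal Python sketch of the difference, assuming a hypothetical population of 10,000 simulated body lengths (made-up numbers, not real data). The parameter is computed once from every unit, so it never changes. Each call to the helper `estimate_mean` (a name invented for this example) draws a new random sample and returns a different estimate.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated body lengths in mm (illustrative only).
rng = random.Random(1)
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

# Parameter: computed from every unit in the population, so it is fixed.
mu = statistics.mean(population)


def estimate_mean(pop, n, rng):
    """Estimate: the mean of a random sample of n units drawn without replacement."""
    sample = rng.sample(pop, n)
    return statistics.mean(sample)


# Three random samples give three different estimates of the same fixed parameter.
estimates = [estimate_mean(population, 25, rng) for _ in range(3)]

print(f"parameter mu = {mu:.2f}")
for i, est in enumerate(estimates, start=1):
    print(f"estimate from sample {i} = {est:.2f}")
```

Rerunning the `estimates` line gives three new estimates, but `mu` stays the same.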

Sampling Bias and Sampling Error

Sampling bias

Systematic difference between estimates and parameters

Sampling error

Undirected deviation of estimates away from parameters.

Precision

How close repeated estimates are to each other

Figure: Each colored point is an estimate from a sample. The red X is the true parameter value.
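One way to put numbers on the target picture, sketched in Python. Given repeated estimates of the same parameter, bias can be measured as the average signed deviation from the parameter, and precision as the spread (standard deviation) of the estimates. The function name `summarize_estimates` and the estimate values are hypothetical.

```python
import statistics


def summarize_estimates(estimates, parameter):
    """Summarize repeated estimates of one parameter.

    bias: average signed deviation from the parameter (systematic error).
    spread: standard deviation of the estimates (small spread = high precision).
    """
    deviations = [e - parameter for e in estimates]
    bias = statistics.mean(deviations)
    spread = statistics.stdev(estimates)
    return bias, spread


true_value = 10.0
# Hypothetical estimates from two sampling procedures (illustrative numbers).
unbiased_imprecise = [8.0, 12.0, 9.0, 11.0]
biased_precise = [12.9, 13.1, 13.0, 13.0]

print(summarize_estimates(unbiased_imprecise, true_value))
print(summarize_estimates(biased_precise, true_value))
```

The first set is unbiased (average deviation 0) but imprecise. The second is precise but lands about 3 units above the true value every time, which is the signature of sampling bias.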

Sampling bias

Systematic difference between a parameter and its estimate
Estimates from multiple samples tend to deviate from parameters in the same direction
Arises when samples aren’t representative

Source: Kevin C at https://skepticalscience.com/print.php?n=1366

Volunteer Bias

Volunteers for a study are likely to be different, on average, from the population.

Examples:

Volunteers for sex surveys are more likely to be open about sex.
Volunteers for medical studies may be sicker than the general population.
Animals that are caught may be slower or more docile than those that are not.

Taking random samples is hard and requires effort
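A small Python simulation of the trapping example, under an assumed capture model in which slower animals are more likely to be caught. The speeds and capture probabilities are invented for illustration. Because inclusion depends on the very trait being measured, the trapped sample underestimates mean speed, while a random sample of the same size lands close to the true mean.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: running speeds (m/s) of 5,000 animals (illustrative only).
speeds = [rng.uniform(1.0, 10.0) for _ in range(5_000)]
true_mean = statistics.mean(speeds)


def is_caught(speed, rng):
    """Assumed capture model: catch probability falls linearly
    from 0.9 at 1 m/s to 0.1 at 10 m/s, so slow animals are caught more often."""
    p = 0.9 - 0.8 * (speed - 1.0) / 9.0
    return rng.random() < p


# Trapped sample: inclusion depends on the trait being measured.
trapped = [s for s in speeds if is_caught(s, rng)]

# Random sample of the same size: every animal has the same chance of inclusion.
random_sample = rng.sample(speeds, len(trapped))

print(f"true mean speed:     {true_mean:.2f}")
print(f"trapped-sample mean: {statistics.mean(trapped):.2f}")
print(f"random-sample mean:  {statistics.mean(random_sample):.2f}")
```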

A random sample is a good sample

When units are chosen at random from a population, it is called a random sample

Random sampling minimizes bias and allows for estimation of sampling error

Rules:

Each unit should have an equal chance of being included in a sample
Selection of units must be independent

All the statistical methods we use assume a random sample

Convenience sample: easy to collect but prone to bias, because units are neither chosen at random nor independently
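The equal-chance rule can be checked by brute force. This Python sketch uses a hypothetical 100-unit sampling frame and, as an assumption, models a convenience sample as always taking the first 10 units in the list. It counts how often each unit is included over many repetitions. Under random sampling every unit should be included about n/N = 10% of the time; the convenience sample includes the same 10 units every time and never the other 90.

```python
import random
from collections import Counter

N, n, reps = 100, 10, 20_000
units = list(range(N))  # unit IDs in a hypothetical sampling frame
rng = random.Random(7)

# Random sampling: tally how often each unit is included over many samples.
counts = Counter()
for _ in range(reps):
    counts.update(rng.sample(units, n))
random_rates = [counts[u] / reps for u in units]

# Convenience sampling, modeled here as "always take the first n units in the list"
# (e.g., the plots nearest the road): the same units are chosen every time.
convenience_sample = units[:n]
convenience_rates = [1.0 if u in convenience_sample else 0.0 for u in units]

print(f"random sample: inclusion rates from {min(random_rates):.3f} "
      f"to {max(random_rates):.3f} (n/N = {n / N})")
print(f"convenience sample: {convenience_rates.count(1.0)} units always included, "
      f"{convenience_rates.count(0.0)} never included")
```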

How to get a random sample

Carefully characterize a population and use computer code (e.g. the sample() function in R) to select participants randomly.
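The slide's example uses R's `sample()`. Python's standard library offers the analogous `random.sample(population, k)`, which draws k distinct units without replacement. A minimal sketch, assuming the population has already been characterized as a list of hypothetical vegetation quadrats on a 20 x 20 grid:

```python
import random

# Step 1: characterize the population as an explicit sampling frame.
# Hypothetical example: a 20 x 20 grid of vegetation quadrats, labeled (row, col).
frame = [(row, col) for row in range(20) for col in range(20)]

# Step 2: let the computer pick units at random, without replacement.
rng = random.Random(2024)  # fixed seed so the draw can be reproduced and reported
chosen = rng.sample(frame, k=20)

for quadrat in sorted(chosen):
    print(quadrat)
```

Recording the seed lets someone else reproduce exactly which units were chosen. In R, calling `set.seed()` before `sample(frame, 20)` plays the same role.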

Sampling Error

Sampling Error:

The difference between the estimate and its true parameter value.

Even if you sample perfectly, by the book, your estimates will differ from the true parameter by chance.
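A Python sketch of sampling error under ideal conditions, assuming a hypothetical, right-skewed population of simulated seed masses. Every sample is a proper random sample, yet the estimates still miss the parameter. The errors scatter on both sides of zero and average out to roughly zero, which is what distinguishes sampling error from bias.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical, right-skewed population: 2,000 simulated seed masses in mg (illustrative only).
population = [rng.expovariate(1 / 20) for _ in range(2_000)]
mu = statistics.mean(population)  # the true parameter

# Draw many proper random samples; record each sampling error (estimate minus parameter).
errors = []
for _ in range(5_000):
    estimate = statistics.mean(rng.sample(population, 30))
    errors.append(estimate - mu)

print(f"average sampling error:     {statistics.mean(errors):+.3f}")
print(f"smallest and largest error: {min(errors):+.2f} and {max(errors):+.2f}")
```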

Estimates are random variables

Because an estimate is a random variable, its value depends on which units happen to be sampled, so it varies by chance from one sample to the next

Sampling error declines with sample size

Larger samples -> smaller sampling error, on average
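A Python sketch of this claim, assuming a hypothetical population with a standard deviation of about 10. For each sample size it draws many random samples and reports the root-mean-square sampling error of the sample mean (the helper name `typical_error` is invented here). For a mean, this error shrinks roughly in proportion to 1/√n, so a tenfold increase in n cuts it by about a factor of 3.

```python
import random
import statistics

rng = random.Random(11)

# Hypothetical population: 50,000 simulated values with SD about 10 (illustrative only).
population = [rng.gauss(100.0, 10.0) for _ in range(50_000)]
mu = statistics.fmean(population)


def typical_error(n, reps=1_000):
    """Root-mean-square sampling error of the sample mean over many random samples of size n."""
    squared_errors = []
    for _ in range(reps):
        estimate = statistics.fmean(rng.sample(population, n))
        squared_errors.append((estimate - mu) ** 2)
    return statistics.fmean(squared_errors) ** 0.5


errors_by_n = {n: typical_error(n) for n in (10, 100, 1_000)}
for n, err in errors_by_n.items():
    print(f"n = {n:>5}: typical sampling error = {err:.3f}")
```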
