What is statistical inference?

Statistical inference is the process of drawing conclusions about a population from sample data while accounting for uncertainty. In practice, it often means using hypothesis tests to decide whether the data provide enough evidence against a default assumption (the null hypothesis).

A common tool is the t-test (also called Student’s t-test), which can be used to test hypotheses about means:

  • One-sample t-test: compare the mean of one sample to a reference value.
  • Paired t-test: compare two measurements taken on the same units (before/after, matched pairs).

What is a t-test?

A t-test compares two competing hypotheses:

  • \(H_0\): the mean (or mean difference) equals a reference value.
  • \(H_1\): the mean (or mean difference) is different (two-sided) or greater/less (one-sided).

In R, the base function is t.test():

t.test(x, y = NULL, mu = 0, alternative = "two.sided", paired = FALSE, var.equal = FALSE)

Key arguments:

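  • x: numeric vector of data values.
  • y: optional second numeric vector (for two-sample and paired tests).
  • mu: reference mean (or reference mean difference for paired tests); the default is 0.
  • alternative: "two.sided" (default), "less", or "greater"; selects a two-sided or one-sided \(H_1\).
  • paired: set to TRUE for paired data.
  • var.equal: relevant only for two-sample unpaired tests; when TRUE the test uses a pooled variance estimate, and when FALSE (the default) it uses the Welch approximation.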
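
As a rough illustration of how var.equal changes an unpaired two-sample test, the sketch below runs the test both ways on simulated data. The vectors group_a and group_b, their sizes, and their parameters are made up for this example and do not come from the text.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(1)
group_a <- rnorm(10, mean = 100, sd = 5)
group_b <- rnorm(12, mean = 104, sd = 15)

# Default (var.equal = FALSE): Welch test, no equal-variance assumption
welch <- t.test(group_a, group_b)

# var.equal = TRUE: pooled-variance two-sample test
pooled <- t.test(group_a, group_b, var.equal = TRUE)

# The pooled test has n1 + n2 - 2 degrees of freedom;
# the Welch approximation gives at most that many
df_welch  <- unname(welch$parameter)
df_pooled <- unname(pooled$parameter)
c(welch = df_welch, pooled = df_pooled)
```

Both calls report the same sample means; only the standard error and the degrees of freedom, and therefore the p-value and confidence interval, differ.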

One-sample t-test

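A one-sample t-test compares the mean of a sample to a reference (hypothesized) mean \(\mu_0\), which is passed to t.test() as mu.

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]

where \(\bar{x}\) is the sample mean, \(s\) the sample standard deviation, and \(n\) the sample size. Under \(H_0\), and assuming independent and approximately normally distributed observations, \(t\) follows a Student’s t distribution with \(n - 1\) degrees of freedom.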
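
To make the formula concrete, the following sketch computes the statistic by hand and compares it with what t.test() returns. The data are simulated, and the reference value mu0 = 50 is an arbitrary choice for this example.

```r
# Simulated sample (hypothetical values, for illustration only)
set.seed(1)
x   <- rnorm(20, mean = 52, sd = 5)
mu0 <- 50                      # assumed reference mean

# t statistic computed directly from the formula
n        <- length(x)
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# Same test with the built-in function
res <- t.test(x, mu = mu0)

# Both comparisons should agree up to numerical tolerance
all.equal(t_manual, unname(res$statistic))
all.equal(p_manual, res$p.value)
```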

The result includes a p-value. With a significance level of 0.05:

  • p-value < 0.05: reject \(H_0\).
  • p-value ≥ 0.05: insufficient evidence to reject \(H_0\).
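
The decision rule above can be sketched in code by reading the p-value from the object that t.test() returns. The data, the reference mean of 10, and the names alpha and decision are assumptions made for this example.

```r
# Simulated sample (hypothetical values, for illustration only)
set.seed(42)
x     <- rnorm(15, mean = 10.5, sd = 2)
alpha <- 0.05                  # significance level

# One-sample, two-sided test against an assumed reference mean of 10
res <- t.test(x, mu = 10)

# t.test() returns a list of class "htest"; the p-value is in $p.value
decision <- if (res$p.value < alpha) {
  "reject H0"
} else {
  "insufficient evidence to reject H0"
}

cat("p-value:", signif(res$p.value, 3), "->", decision, "\n")
```

The same object also holds the statistic (res$statistic), the degrees of freedom (res$parameter), and the confidence interval (res$conf.int).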

Paired t-test

A paired t-test is used when the two samples are dependent (same units measured twice, or matched pairs). The test is equivalent to a one-sample t-test on the differences \(d = x - y\).

Example: sales before vs after a discount program

A company tracks daily sales for the same shop for 7 days before and 7 days after a discount program.

set.seed(123)

sales_before <- rnorm(7, mean = 50000, sd = 50)
sales_after  <- rnorm(7, mean = 50075, sd = 50)

t.test(sales_before, sales_after, paired = TRUE)
## 
##  Paired t-test
## 
## data:  sales_before and sales_after
## t = -2.6102, df = 6, p-value = 0.04011
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -97.618586  -3.152003
## sample estimates:
## mean difference 
##       -50.38529

Notes:

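  • Use paired = TRUE for paired data.
  • The reported difference is x minus y, here sales_before - sales_after, so the negative estimate (about -50.4) means that sales were higher after the program. Because the p-value (0.040) is below 0.05 and the 95% confidence interval excludes 0, \(H_0\) (mean difference equal to 0) is rejected at the 5% level.
  • var.equal has no effect in a paired test, because the test works on a single vector of differences rather than on two separate samples.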
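
The equivalence between a paired t-test and a one-sample t-test on the differences can be checked directly. The sketch below regenerates the simulated sales data with the same seed so that it runs on its own; the names paired_res, diff_res, and d are introduced only for this check.

```r
# Same simulated data as in the example
set.seed(123)
sales_before <- rnorm(7, mean = 50000, sd = 50)
sales_after  <- rnorm(7, mean = 50075, sd = 50)

# Paired test on the two vectors
paired_res <- t.test(sales_before, sales_after, paired = TRUE)

# One-sample test on the vector of differences d = x - y
d        <- sales_before - sales_after
diff_res <- t.test(d, mu = 0)

# Statistic, p-value, and confidence interval should all coincide
all.equal(unname(paired_res$statistic), unname(diff_res$statistic))
all.equal(paired_res$p.value, diff_res$p.value)
all.equal(as.numeric(paired_res$conf.int), as.numeric(diff_res$conf.int))
```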

Summary

The t-test is a fundamental tool in inferential statistics for testing hypotheses about means.

| Test | Hypothesis | Code |
|---|---|---|
| One-sample t-test | Mean of a sample differs from a reference value | t.test(x, mu = mu0) |
| Paired t-test | Mean difference between paired measurements differs from 0 | t.test(x, y, paired = TRUE) |
 

A work by Gianluca Sottile

gianluca.sottile@unipa.it