Hypothesis Testing
Hypothesis testing is a fundamental concept in statistics, used to make judgements about the characteristics of a population. It is a method that allows us to use sample data to decide between two competing hypotheses about a population parameter.
Introduction to hypothesis testing
At its core, hypothesis testing is a process by which we check whether a statement (hypothesis) about a population parameter is plausible given the sample data we have. The main elements of hypothesis testing are the null hypothesis, the alternative hypothesis, the test statistic, the rejection region, and the conclusion.
Null and alternative hypotheses
The null hypothesis (denoted as H₀) is a statement about the population parameter we want to test. It is usually a statement of no effect or no difference. In contrast, the alternative hypothesis (denoted as H₁ or Hₐ) is what we assume to be true if the null hypothesis is rejected. It represents an effect or a difference.
Null hypothesis (H₀): μ = μ₀
Alternative hypothesis (Hₐ): μ ≠ μ₀ (two-tailed)
Alternative hypothesis (Hₐ): μ > μ₀ (right-tailed)
Alternative hypothesis (Hₐ): μ < μ₀ (left-tailed)
Test statistic
A test statistic is a value calculated from sample data that is used to decide how compatible the data are with the null hypothesis. The choice of test statistic depends on the type of data and the hypothesis being tested. Common examples include the z-statistic, t-statistic, and F-statistic.
For example, if we want to test a hypothesis about a mean, we can calculate the z-score for the sample mean:
z = (x̄ - μ₀) / (σ/√n)
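As a minimal sketch, the same calculation can be written in Python; the numbers below (sample mean 1012, hypothesized mean 1000, σ = 50, n = 25) are purely illustrative.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """One-sample z statistic: (x̄ - μ0) / (σ / √n)."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Hypothetical numbers: x̄ = 1012, μ0 = 1000, σ = 50, n = 25.
print(z_statistic(1012, 1000, 50, 25))  # 12 / (50/5) = 1.2
```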
Rejection region
The rejection region is determined by the significance level, denoted by α, which is the probability of rejecting the null hypothesis when it is actually true. Common choices for α are 0.05, 0.01, and 0.10.
If the test statistic falls in the rejection region, we reject the null hypothesis in favor of the alternative hypothesis.
Conclusion
Based on the test statistic and the rejection region, we draw conclusions. If the test statistic falls in the rejection region, we reject the null hypothesis, suggesting that there is enough evidence to support the alternative hypothesis. If it does not fall in the rejection region, we fail to reject the null hypothesis.
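To make the decision rule concrete, here is a small Python sketch using scipy.stats.norm to find a two-tailed critical value; the observed z value is a made-up illustration.

```python
from scipy.stats import norm

alpha = 0.05
z_observed = 2.1                      # hypothetical test statistic

z_crit = norm.ppf(1 - alpha / 2)      # two-tailed critical value, ≈ 1.96
if abs(z_observed) > z_crit:
    print("Reject H0: the statistic falls in the rejection region.")
else:
    print("Fail to reject H0.")
```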
Types of hypothesis testing
Hypothesis tests can be classified into several types depending on the population parameters of interest and the data available.
One-sample z-test
The one-sample z-test is used when we want to compare a sample mean with a known population mean. This test assumes that the population is normally distributed and the population variance is known.
One-sample t-test
If the population variance is unknown, we use a one-sample t-test instead of a z-test. It is appropriate when the sample size is small and the population is assumed to be approximately normally distributed.
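As an illustrative sketch, scipy.stats.ttest_1samp performs this test; the sample values and the hypothesized mean of 50 below are made up.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 12 measurements; test H0: μ = 50 against a two-sided Ha.
sample = np.array([51.2, 49.8, 50.6, 52.1, 48.9, 50.3,
                   51.7, 49.5, 50.9, 52.4, 49.1, 50.8])
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# Reject H0 at α = 0.05 if p < 0.05.
```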
Two-sample t-test
The two-sample t-test compares the means of two independent samples. It tests whether the means of two groups are equal. This test uses the pooled standard deviation when the variances are assumed equal, or the individual sample variances (Welch's t-test) when they are not.
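A brief sketch with scipy.stats.ttest_ind; the two samples are invented, and setting equal_var=False requests Welch's version, which does not pool the variances.

```python
import numpy as np
from scipy import stats

# Hypothetical yields from two independent production lines.
group_a = np.array([20.1, 21.4, 19.8, 22.0, 20.7, 21.1, 19.5, 20.9])
group_b = np.array([18.9, 19.6, 20.2, 18.4, 19.1, 19.9, 18.7, 19.3])

# equal_var=False gives Welch's t-test; equal_var=True pools the variances.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```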
Example: hypothesis testing for the mean
Suppose a company manufactures light bulbs with an average life of 1000 hours. A researcher believes the true average life is less than 1000 hours and wants to test this hypothesis using a random sample of 30 light bulbs.
Let's define our hypotheses:
H₀: μ = 1000
Hₐ: μ < 1000
Significance level: α = 0.05.
Assuming that the standard deviation of the population is 100, the test statistic can be calculated as follows:
z = (x̄ - μ₀) / (σ/√n)
where x̄ is the sample mean, n is the sample size, σ is the population standard deviation, and μ₀ is the population mean under H₀.
If the calculated z value falls to the left of our critical value on the normal distribution curve (found in z-tables), we reject the null hypothesis.
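A possible Python sketch of this worked example follows; the sample mean of 960 hours is an assumed value, since the text does not report one.

```python
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, alpha = 1000, 100, 30, 0.05
x_bar = 960                                  # assumed sample mean for illustration

z = (x_bar - mu0) / (sigma / sqrt(n))        # test statistic
z_crit = norm.ppf(alpha)                     # left-tailed critical value ≈ -1.645
p_value = norm.cdf(z)                        # P(Z ≤ z) under H0

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p = {p_value:.4f}")
if z < z_crit:
    print("Reject H0: evidence that mean life is below 1000 hours.")
else:
    print("Fail to reject H0.")
```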
Real-life applications of hypothesis testing
- Medical research: Comparing the effectiveness of a new drug to a placebo.
- Manufacturing: Comparing the means of different production processes to determine which process is more efficient.
- Marketing: Evaluating the impact of a new campaign on customer sales or engagement compared to an old strategy.
- Education: Determining if a new teaching method works better than the traditional method.
Common mistakes in hypothesis testing
Type I and Type II errors
Two common mistakes in hypothesis testing are Type I and Type II errors:
- Type I error: rejecting the null hypothesis when it is true. The probability of a Type I error occurring is the significance level α.
- Type II error: failing to reject the null hypothesis when the alternative hypothesis is true. The probability of committing a Type II error is denoted as β.
Example of errors in hypothesis testing: medical testing
In a medical test for a disease, the null hypothesis might be that the person does not have the disease (H₀), and the alternative hypothesis that the person does have the disease (Hₐ).
- Type I error: The test shows that the person has the disease when in fact they do not. This can cause unnecessary stress and treatment.
- Type II error: The test fails to identify a disease when a person actually has it. This results in the disease not being treated.
The power of a test
The power of a test is the probability that it correctly rejects a false null hypothesis, equal to 1 - β. Higher power means a greater probability of detecting an effect when one truly exists, thereby reducing Type II errors.
Power can be increased in the following ways (a small power calculation is sketched after this list):
- Increasing the sample size.
- Selecting a higher significance level (which increases the probability of a Type I error).
- A larger effect size (the bigger the true difference, the easier it is to detect).
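For instance, here is a rough Python sketch of the power of a left-tailed one-sample z-test; the scenario (null mean 1000, true mean 980, σ = 100) is hypothetical and mirrors the light-bulb example above.

```python
from math import sqrt
from scipy.stats import norm

def power_left_tailed_z(mu0, mu_true, sigma, n, alpha):
    """Power of a left-tailed one-sample z-test when the true mean is mu_true."""
    z_crit = norm.ppf(alpha)                      # rejection threshold on the z scale
    shift = (mu_true - mu0) / (sigma / sqrt(n))   # where the true mean sits on that scale
    return norm.cdf(z_crit - shift)               # P(reject H0 | μ = mu_true)

# Hypothetical scenario: H0 mean 1000, true mean 980, σ = 100; power grows with n.
for n in (30, 60, 120):
    print(n, round(power_left_tailed_z(1000, 980, 100, n, 0.05), 3))
```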
Critical values and p-values
Critical values
Critical values are the threshold values that define the boundaries of the rejection region(s). For a z-test, the critical values are the z-scores beyond which the tail area(s) of the normal distribution equal the chosen significance level α.
P-value
The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so we reject H₀. A large p-value (> 0.05) indicates weak evidence against H₀, so we fail to reject it.
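As a small sketch, the p-value for a z statistic can be read off the standard normal distribution with scipy.stats.norm; the observed value of -2.19 is just an example.

```python
from scipy.stats import norm

z = -2.19                          # hypothetical observed z statistic

p_left  = norm.cdf(z)              # left-tailed p-value
p_right = norm.sf(z)               # right-tailed p-value
p_two   = 2 * norm.sf(abs(z))      # two-tailed p-value

print(f"left: {p_left:.4f}, right: {p_right:.4f}, two-sided: {p_two:.4f}")
```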
Example: using the p-value
In a study about average sleep hours among students, the null hypothesis states that students sleep an average of 7 hours per night. The sample data provide an average of 6.6 hours with a calculated p-value of 0.03.
Conclusion: since 0.03 < 0.05, we reject the null hypothesis and conclude that the average sleep time is different from 7 hours.
Conclusion
Hypothesis testing provides a systematic way to make decisions using data. Although it does not deliver definitive proof, it lets us weigh the evidence for or against statements about a population. Understanding the process, the types of tests, the possible errors, and how to apply these concepts effectively can significantly impact outcomes in fields such as medicine, business, and the social sciences.