
Bayesian Methods


Introduction

Bayesian methods play a central role in statistical inference, providing a coherent framework for reasoning about uncertainty. This approach is based on Bayes' theorem, named after the 18th-century statistician and theologian Reverend Thomas Bayes. Unlike frequentist statistics, which uses probability only to refer to long-run frequencies, Bayesian statistics allows probabilities to express one's degree of belief or certainty about an event.

Bayes' theorem

The basis of Bayesian inference is Bayes' theorem, which is expressed mathematically as follows:

P(H|E) = (P(E|H) * P(H)) / P(E)

This formula can be broken down into its components:

  • P(H|E): Posterior probability. The probability of the hypothesis H given the observed evidence E.
  • P(E|H): Likelihood. The probability of observing evidence E given that hypothesis H is true.
  • P(H): Prior probability. The initial degree of belief in the hypothesis H before observing E.
  • P(E): Marginal likelihood. The overall probability of the evidence under all possible hypotheses.
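
As a minimal sketch of how these components fit together in code (the function name and the sample numbers are hypothetical, not tied to any particular analysis):

def bayes_posterior(prior, likelihood, marginal):
    """Apply Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / marginal

# Hypothetical values: prior belief 0.5, likelihood 0.9, marginal likelihood 0.5
print(bayes_posterior(prior=0.5, likelihood=0.9, marginal=0.5))  # 0.9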

Basic example: tossing a coin

Consider a simple example where we want to know if a coin is biased towards heads. We observe ten coin tosses, seven of which are heads. We want to find the probability of the coin being biased using a Bayesian framework.

Example

Let H be the hypothesis that the coin is biased towards heads, and E be the evidence that heads occur seven times out of ten tosses. Now, we need to specify:

  • P(H): Our prior belief that the coin is biased. Suppose we believe the coin has a 50% chance of being biased, so P(H) = 0.5.
  • P(E|H): The probability of observing seven heads in ten tosses if the coin really is biased. For illustration, assume P(E|H) = 0.9.
  • P(E): The marginal likelihood, obtained by averaging over all hypotheses. For simplicity, let's say P(E) = 0.5.

Now apply Bayes' theorem:

P(H|E) = (0.9 * 0.5) / 0.5 = 0.9

Under these assumptions, the posterior probability that the coin is biased is 0.9, so the coin is very likely biased.
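
The 0.9 likelihood and 0.5 marginal above were simply assumed. A slightly more concrete sketch of the same calculation fixes a specific alternative for the biased coin (here P(heads) = 0.7, an arbitrary illustrative choice) and computes the binomial likelihoods explicitly:

from math import comb

def binom_likelihood(p, heads=7, tosses=10):
    """P(E|H): probability of seeing `heads` heads in `tosses` tosses if P(heads) = p."""
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

prior_biased = 0.5                     # P(H): prior belief that the coin is biased
like_biased = binom_likelihood(0.7)    # assumes "biased" means P(heads) = 0.7
like_fair = binom_likelihood(0.5)      # likelihood under the fair-coin alternative

# P(E): marginal likelihood over both hypotheses
marginal = like_biased * prior_biased + like_fair * (1 - prior_biased)

posterior_biased = like_biased * prior_biased / marginal
print(f"P(biased | 7 heads in 10) = {posterior_biased:.3f}")   # roughly 0.69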

Priors

The prior probability P(H) expresses the initial belief before the evidence is observed. In Bayesian analysis, the choice of prior can greatly affect the final result, especially when the sample size is small. Priors can be informative or non-informative.

Informative priors

An informative prior encodes specific, previously acquired knowledge about the parameter of interest. In the coin example, if earlier experiments showed that the coin lands heads about 70% of the time, this information would guide our choice of prior.

Non-informative priors

Non-informative (or weakly informative) priors provide little specific information about the hypothesis, often reflecting a state of relative ignorance. Common choices include uniform distributions, under which all outcomes are treated as equally likely a priori.
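
To see how much the choice of prior can matter when data are scarce, the sketch below reuses the binomial likelihoods from the coin example (again assuming, purely for illustration, that a biased coin has P(heads) = 0.7) and compares the posterior under three different priors:

from math import comb

def binom_likelihood(p, heads=7, tosses=10):
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

like_biased = binom_likelihood(0.7)    # assumes "biased" means P(heads) = 0.7
like_fair = binom_likelihood(0.5)

# Same data, three different priors on "the coin is biased"
for prior in (0.1, 0.5, 0.9):
    marginal = like_biased * prior + like_fair * (1 - prior)
    print(f"prior = {prior:.1f}  ->  posterior = {like_biased * prior / marginal:.3f}")

With only ten tosses, the posterior ranges from roughly 0.20 to roughly 0.95 depending on the prior; with more data, the likelihood would increasingly dominate.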

Posterior

Once the evidence is taken into account via Bayes' theorem, we obtain the posterior probability, P(H|E), which combines all of our information about the hypothesis: the prior and the data. The posterior is the central quantity of Bayesian inference, as it represents how our understanding of a hypothesis has been revised by the new data.

Likelihood

The likelihood is a core component of Bayesian calculations. It measures how probable the observed data are under different hypotheses. Mathematically, the likelihood, P(E|H), assesses the compatibility between the data and the hypothesis.

Marginal likelihood

The marginal likelihood, P(E), ensures that the posterior probabilities sum to 1. It is obtained by summing (or integrating) the likelihood, weighted by the prior, across all hypotheses. In practice, computing the marginal likelihood can be difficult, especially in models with many parameters.
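
When the hypothesis is a continuous parameter, such as the coin's head-probability, the sum becomes an integral, which is often approximated on a grid. A rough sketch of that idea, with the grid resolution chosen arbitrarily:

from math import comb

heads, tosses = 7, 10
n_grid = 101
grid = [i / (n_grid - 1) for i in range(n_grid)]     # candidate head-probabilities
prior = [1 / n_grid] * n_grid                        # uniform (non-informative) prior
likelihood = [comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads) for p in grid]

# P(E): likelihood weighted by the prior, summed over all hypotheses
marginal = sum(l * pr for l, pr in zip(likelihood, prior))
posterior = [l * pr / marginal for l, pr in zip(likelihood, prior)]

print(f"marginal likelihood P(E) ≈ {marginal:.4f}")
print(f"posterior sums to {sum(posterior):.3f}")     # 1.000 by construction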

Advanced example: disease testing

Suppose a medical test checks for a disease with the following characteristics:

  • The sensitivity of this test is 95%, meaning it correctly identifies 95% of patients suffering from the disease.
  • The specificity of this test is 90%, meaning it correctly identifies 90% of healthy patients.
  • 1% of the population has this disease.

Example

Let H be the event that the patient has the disease, and E the event of a positive test result.

  • P(H) = 0.01 (prior probability of having the disease)
  • P(E|H) = 0.95 (probability of testing positive if sick)

To calculate the overall probability of a positive test, P(E), consider both true and false positive results:

P(E) = P(E|H) * P(H) + P(E|H') * P(H')
P(E) = 0.95 * 0.01 + 0.10 * 0.99 = 0.1085

Finally, use Bayes' theorem to find the posterior:

P(H|E) = (0.95 * 0.01) / 0.1085 ≈ 0.088

Despite the positive test, the probability of actually having the disease is only about 8.8%, because the disease is rare and false positives from the large healthy population dominate.
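
The same arithmetic, written out as a short script using only the numbers given above:

sensitivity = 0.95    # P(E|H): positive test given disease
specificity = 0.90    # P(not E | H'): negative test given no disease
prevalence = 0.01     # P(H): prior probability of disease

# P(E): total probability of a positive test = true positives + false positives
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

posterior = sensitivity * prevalence / p_positive
print(f"P(disease | positive test) = {posterior:.3f}")   # about 0.088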

Updating beliefs

Bayesian inference is an iterative process. As more evidence is collected, you continuously update your beliefs using Bayes' theorem. Each new piece of evidence enters through the likelihood, turning the current prior into a posterior, which in turn serves as the prior for the next update. Over time, this process refines our understanding and improves decision making.
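
One way to picture the iteration: today's posterior becomes tomorrow's prior. The sketch below reuses the disease-testing numbers and assumes, purely for illustration, that repeated test results are conditionally independent given disease status:

def update(prior, sensitivity=0.95, specificity=0.90):
    """Posterior probability of disease after one positive test result."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

belief = 0.01                     # start from the population prevalence
for test in range(1, 4):          # three positive results in a row
    belief = update(belief)       # yesterday's posterior is today's prior
    print(f"after positive test {test}: P(disease) = {belief:.3f}")

After three positive results, the probability of disease climbs from 1% to roughly 90%.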

Conjugate priors

In many cases, choosing a conjugate prior simplifies the calculation. A conjugate prior is one that, when combined with the likelihood, yields a posterior distribution in the same family, making the analysis tractable in closed form. For example, with a binomial likelihood, a beta prior yields a beta posterior, so the distributional family stays the same.
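
For the coin-tossing setting, the conjugate update is simple arithmetic on the Beta parameters: a Beta(a, b) prior combined with h heads and t tails yields a Beta(a + h, b + t) posterior. A minimal sketch (the Beta(2, 2) starting point is an arbitrary choice):

a, b = 2, 2            # Beta(2, 2) prior: mild belief that the coin is roughly fair
heads, tails = 7, 3    # observed data from the coin example

a_post, b_post = a + heads, b + tails            # conjugacy: posterior is Beta(9, 5)
posterior_mean = a_post / (a_post + b_post)
print(f"posterior: Beta({a_post}, {b_post}), mean = {posterior_mean:.3f}")   # 0.643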

Application

Bayesian methods have extensive applications in a variety of fields. Some of the notable ones are:

  • Medicine: For diagnosing diseases, Bayesian methods balance prior information about the prevalence of a disease with diagnostic test evidence.
  • Finance: Bayesian models are used to predict stock prices, incorporating both historical data and expert forecasts.
  • Machine learning: Bayesian techniques power probabilistic models for tasks such as classification, clustering, and regression.
  • Natural language processing: Bayesian inference underpins models such as topic models, used to identify patterns in text data.

Challenges

While powerful, Bayesian methods also present challenges. Complex models often require significant computational resources. It can be difficult to calculate the posterior distribution analytically, requiring approximation techniques such as Markov Chain Monte Carlo (MCMC).
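
To give a flavour of what such an approximation looks like, here is a bare-bones random-walk Metropolis sampler for the coin's head-probability under a uniform prior. It is a sketch only; real analyses would normally rely on established tools such as PyMC or Stan rather than a hand-rolled sampler.

import random
from math import comb

heads, tosses = 7, 10

def unnormalised_posterior(p):
    """Uniform prior times binomial likelihood; the marginal P(E) cancels in the ratio."""
    if not 0 < p < 1:
        return 0.0
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

random.seed(0)
p, samples = 0.5, []
for _ in range(20000):
    proposal = p + random.gauss(0, 0.1)                  # random-walk proposal
    accept = unnormalised_posterior(proposal) / unnormalised_posterior(p)
    if random.random() < accept:
        p = proposal
    samples.append(p)

kept = samples[5000:]                                    # discard burn-in
print(f"posterior mean of head-probability ≈ {sum(kept) / len(kept):.3f}")
# should land close to the exact Beta(8, 4) posterior mean of 2/3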

Conclusion

Bayesian methods provide a flexible, consistent framework for statistical inference. By combining prior beliefs with new evidence, Bayesian inference refines understanding in a logical, intuitive way. Despite its computational challenges, its principles shine in a wide range of real-world applications, making it invaluable in the statistician's toolkit.

