Probability and Statistics


Probability and statistics are branches of mathematics that deal with the concept of uncertainty. They are used to analyze and predict outcomes and make decisions based on data. While probability provides the theoretical framework for measuring uncertainty, statistics uses that framework to collect, analyze, interpret, and present empirical data.

Understanding probability

Probability is a measure of the likelihood of an event occurring. It quantifies our expectation that the event will happen under given conditions or in a given experiment. The probability of any event is a number between 0 and 1, where 0 represents impossibility and 1 represents certainty. Events with high probability are more likely to occur than events with low probability.

Consider a simple example. If you roll a fair six-sided die, the probability of getting a particular number, say 3, is:

Probability of rolling a 3 = 1/6 ≈ 0.1667

This is because the die has six sides, each with an equal chance of landing face up.

Visual example: probability of a single die roll (each of the faces 1 through 6 has probability 1/6)
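
To see this empirically, here is a minimal Python sketch (assuming a standard Python 3 interpreter) that simulates a large number of die rolls; the relative frequency of threes settles near 1/6.

import random

random.seed(42)                      # fixed seed so the run is reproducible
rolls = [random.randint(1, 6) for _ in range(100_000)]
threes = sum(1 for r in rolls if r == 3)
print(f"Estimated P(rolling a 3): {threes / len(rolls):.4f}")   # close to 0.1667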

Basic concepts of probability

Random experiment

A random experiment is a procedure whose outcome cannot be predicted with certainty in advance, for example tossing a coin, rolling a die, or drawing a card from a shuffled deck. Although each individual outcome is random, the relative frequencies of the outcomes become stable and predictable over a long series of trials.

Sample spaces and events

The sample space of a random experiment, often denoted by S, is the set of all possible outcomes. Each possible outcome is called a sample point. An event is any subset of the sample space. For example, when rolling a die, the sample space S is {1, 2, 3, 4, 5, 6} and an event could be "rolling an even number", which includes the outcomes {2, 4, 6}.
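
As a concrete sketch (the names S and even below are illustrative), the sample space and an event can be modeled as Python sets, and the probability of the event under equally likely outcomes is simply the ratio of their sizes.

S = {1, 2, 3, 4, 5, 6}      # sample space of one die roll
even = {2, 4, 6}            # event: rolling an even number
assert even <= S            # an event is a subset of the sample space
print(len(even) / len(S))   # 0.5, since all outcomes are equally likely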

Combination of events

Two or more events in the sample space can be combined using set operations such as union, intersection, and complement. For example, if A and B are two events, then:

  • A ∪ B (union): the event occurs if A or B (or both) occur.
  • A ∩ B (intersection): the event occurs if both A and B occur.
  • A' (complement): the event occurs if A does not occur.
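
These operations map directly onto Python's built-in set type; the events A and B below are arbitrary examples chosen for illustration.

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}               # event: rolling an even number
B = {4, 5, 6}               # event: rolling at least 4
print(A | B)                # union: {2, 4, 5, 6}
print(A & B)                # intersection: {4, 6}
print(S - A)                # complement of A: {1, 3, 5}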

Laws of probability

Probability is governed by a set of rules that determine how probabilities can be assigned to events in a probability space. Chief among these are the axioms of probability:

  1. Non-negativity: The probability of any event A is greater than or equal to 0.
    P(A) ≥ 0
  2. Normalization: The entire sample space S has probability 1.
    P(S) = 1
  3. Additivity: For mutually exclusive events A and B, the probability of A or B occurring is the sum of their individual probabilities.
    P(A ∪ B) = P(A) + P(B)
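
Under the equally-likely model of a fair die, all three axioms can be checked numerically; the helper function prob below is a name introduced just for this sketch.

S = {1, 2, 3, 4, 5, 6}

def prob(event):
    # probability of an event under equally likely outcomes
    return len(event) / len(S)

A, B = {1, 2, 3}, {4, 5, 6}                # mutually exclusive: A ∩ B is empty
assert prob(A) >= 0                        # axiom 1: non-negativity
assert prob(S) == 1                        # axiom 2: normalization
assert prob(A | B) == prob(A) + prob(B)    # axiom 3: additivity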

For more complex scenarios, conditional probability and Bayes' theorem are essential concepts:

Conditional probability

The probability of event A given that event B has occurred is called the conditional probability of A given B, denoted by P(A|B). It is calculated as follows:

P(A|B) = P(A ∩ B) / P(B)
provided that P(B) > 0.
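
For example, when rolling a fair die, the probability of rolling a 4 given that the roll was even follows directly from the definition; the sketch below just counts outcomes under the equally-likely model.

S = {1, 2, 3, 4, 5, 6}
A = {4}                     # event: rolling a 4
B = {2, 4, 6}               # event: rolling an even number
p_B = len(B) / len(S)
p_A_and_B = len(A & B) / len(S)
print(p_A_and_B / p_B)      # P(A|B) = (1/6) / (1/2) = 1/3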

Bayes' theorem

Bayes' theorem deals with the conditional and marginal probabilities of random events. It is an important tool for updating probabilities based on new information. Bayes' theorem is expressed as:

P(A|B) = [P(B|A) * P(A)] / P(B)
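
A classic worked example is medical screening: given a test's sensitivity, its false-positive rate, and the prevalence of a disease, Bayes' theorem gives the probability of disease given a positive result. The numbers below are purely illustrative assumptions, not data from any real test.

p_disease = 0.01                 # prior P(A): prevalence of the disease
p_pos_given_disease = 0.99       # P(B|A): sensitivity of the test
p_pos_given_healthy = 0.05       # false-positive rate

# Total probability of a positive test, P(B), by the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior P(A|B) via Bayes' theorem
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")    # about 0.167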

Random variables and probability distributions

Random variables

A random variable is a variable that takes on different numerical values depending on the outcome of a random experiment. Random variables are classified into discrete and continuous types.

Discrete random variables: These take on a countable number of possible values. Examples include the number rolled on a die and the number of successes in a series of trials.

Continuous random variables: These can take any value within a given range, so the number of possible outcomes is uncountably infinite. Examples include the exact height of a person and the time taken to complete a task.
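
The distinction shows up directly when sampling. In the sketch below (values chosen for illustration), the standard random module draws a discrete value from a finite set and a continuous value from a range of real numbers.

import random

random.seed(0)
die = random.randint(1, 6)             # discrete: one of {1, 2, 3, 4, 5, 6}
height = random.gauss(170.0, 10.0)     # continuous: any real value (here, cm)
print(die, round(height, 2))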

Probability distributions

The probability distribution describes how probabilities are distributed over the values of a random variable. For a discrete random variable, this is known as the probability mass function (PMF):

P(X = x) = p(x)
For a continuous random variable, this is known as the probability density function (PDF), f(x). The area under the PDF curve over a given interval gives the probability that the random variable falls within that interval.
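
A minimal sketch of both ideas using only the standard library: the PMF of a fair die assigns 1/6 to each face, and statistics.NormalDist (available since Python 3.8) integrates a normal PDF over an interval via its CDF. The parameters 170 and 10 are illustrative.

from statistics import NormalDist

# Discrete: the PMF of a fair die
pmf = {x: 1 / 6 for x in range(1, 7)}
print(pmf[3])                          # P(X = 3) ≈ 0.1667

# Continuous: area under a normal PDF over the interval [160, 180]
X = NormalDist(mu=170, sigma=10)
print(X.cdf(180) - X.cdf(160))         # P(160 ≤ X ≤ 180) ≈ 0.683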

Visual example: a probability mass function (the height of each bar gives the probability of the corresponding value)

Common probability distributions

Binomial distribution

The binomial distribution is a discrete distribution that describes the number of successes in a fixed number n of independent Bernoulli trials, each with the same probability of success p. The probability of exactly k successes in n trials is given by:

P(X = k) = C(n, k) * p^k * (1-p)^(n-k)
where C(n, k) is the binomial coefficient.
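
This formula can be evaluated directly with math.comb; the sketch below computes the probability of exactly 3 heads in 10 fair coin flips (numbers chosen for illustration).

from math import comb

def binomial_pmf(k, n, p):
    # P(X = k) for a Binomial(n, p) random variable
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_pmf(3, 10, 0.5))        # ≈ 0.1172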

Normal distribution

The normal distribution, also known as the Gaussian distribution, is a continuous distribution that is symmetric about the mean. It is defined by its mean (µ) and standard deviation (σ) and is given by the probability density function:

f(x) = (1/(σ√(2π))) * e^(-(x-µ)²/(2σ²))
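
The density can be coded directly from this formula; as a sanity check, the sketch below compares it with statistics.NormalDist.pdf from the standard library.

from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    return (1 / (sigma * sqrt(2 * pi))) * exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(normal_pdf(0.0, 0.0, 1.0))       # ≈ 0.3989
print(NormalDist(0.0, 1.0).pdf(0.0))   # same value, from the standard library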

Visual example: the normal distribution, a symmetric bell-shaped curve centered at the mean

Introduction to statistics

Statistics is the discipline that deals with the collection, organization, analysis, interpretation, and presentation of data. It has two main branches: descriptive statistics and inferential statistics.

Descriptive statistics

Descriptive statistics summarize and describe the main characteristics of a dataset. This includes measures of central tendency, dispersion, and graphical representation.

Measures of central tendency: These measures describe the center of a data set. Common measures include the mean, median, and mode.

Measures of dispersion: Dispersion represents the spread of data points. Common measures include range, variance, and standard deviation.
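
Python's standard statistics module covers all of these measures; the small dataset below is made up for illustration.

import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative dataset

print(st.mean(data))                   # mean: 5.0
print(st.median(data))                 # median: 4.5
print(st.mode(data))                   # mode: 4
print(max(data) - min(data))           # range: 7
print(st.pvariance(data))              # population variance: 4.0
print(st.pstdev(data))                 # population standard deviation: 2.0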

Inferential statistics

Inferential statistics uses random samples of data taken from a population to describe and make inferences about the population. It includes hypothesis testing, estimation, and prediction.

Hypothesis testing

Hypothesis testing is a method of making decisions using data obtained from a scientific study. It involves testing an assumption or claim about a population parameter.

For example, you may want to test whether a new drug is more effective than an existing drug. You form two hypotheses:

  • H0 (null hypothesis): There is no difference in effectiveness.
  • H1 (alternative hypothesis): The new drug is more effective.
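
A common way to carry out such a test is a two-sample t-test. The sketch below assumes SciPy is installed and uses made-up effectiveness scores for the two drugs; a p-value below the chosen significance level would lead us to reject H0.

from scipy import stats

existing = [62, 65, 68, 70, 64, 66, 69, 63]    # illustrative scores, existing drug
new_drug = [70, 74, 69, 76, 72, 75, 71, 73]    # illustrative scores, new drug

# Welch's two-sample t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(new_drug, existing, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

alpha = 0.05
print("Reject H0" if p_value < alpha else "Fail to reject H0")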

Conclusion

Probability and statistics form the basis of data analysis and decision making in fields including science, engineering, and economics. Probability measures uncertainty and estimates the likelihood of outcomes in random experiments, while statistics draws conclusions about real-world phenomena from the collection and analysis of data. Understanding these concepts is essential for making informed decisions in an uncertain world.

