Probability and Statistics
Introduction
Probability and statistics are two closely intertwined branches of mathematics. Probability is the study of randomness and uncertainty. It quantifies how likely the various outcomes of an uncertain situation are. Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data. Together, these fields help us make predictions, test hypotheses, and make decisions based on data.
Understanding Probability
Probability measures how likely an event is to occur. It is a number between 0 and 1, where 0 means that the event cannot occur and 1 means that it is certain to occur. When all outcomes of an experiment are equally likely, the probability of an event can be computed as:
P(Event) = Number of favorable outcomes / Total number of possible outcomes
Example of Probability
Consider the simple example of tossing a fair coin. There are two possible outcomes: heads or tails. The probability of heads is:
P(Heads) = 1 / 2 = 0.5
Similarly, the probability of getting tails is also 0.5.
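Here is a minimal Python sketch of the counting rule. It assumes every outcome in the sample space is equally likely. The helper `classical_probability` is a hypothetical name chosen for this illustration. It takes the event as a predicate and returns an exact `Fraction`.

```python
from fractions import Fraction

def classical_probability(sample_space, event):
    """Probability of `event` (a predicate) when all outcomes are equally likely."""
    # Count the outcomes for which the event holds.
    favorable = sum(1 for outcome in sample_space if event(outcome))
    # Favorable outcomes divided by total outcomes, kept as an exact fraction.
    return Fraction(favorable, len(sample_space))

coin = ["heads", "tails"]
p_heads = classical_probability(coin, lambda o: o == "heads")
p_tails = classical_probability(coin, lambda o: o == "tails")
print(p_heads, float(p_heads))  # 1/2 0.5
print(p_tails, float(p_tails))  # 1/2 0.5
```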
Basic Probability Concepts
There are several fundamental concepts in probability that you should understand:
- Experiment: A process that leads to one or more outcomes. For example, throwing dice or drawing a card.
- Sample space: The set of all possible outcomes of an experiment. For example, {1, 2, 3, 4, 5, 6} for a six-sided die.
- Event: A subset of outcomes from the sample space. It can be a single outcome or multiple outcomes.
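- Complementary events: The complement of an event A, written A', consists of all outcomes in the sample space that are not in A. For example, when rolling a die, if event A is getting an even number, then its complement A' is getting an odd number. Because A and A' cannot both occur and together cover the whole sample space, their probabilities satisfy: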
P(A') = 1 - P(A)
Example of Complementary Events
If we roll a six-sided die, the probability of getting a number greater than 4 (i.e., 5 or 6) is:
P(Number > 4) = 2/6 = 1/3
Thus, the probability that the number rolled is not greater than 4 is:
P(Number ≤ 4) = 1 - P(Number > 4) = 1 - 1/3 = 2/3
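The same result can be checked in Python by counting outcomes on a fair die and applying the complement rule. The die is modelled here as `range(1, 7)`, which is an illustrative assumption.

```python
from fractions import Fraction

die = range(1, 7)  # sample space of a fair six-sided die

# P(Number > 4): the outcomes 5 and 6 are favorable.
p_greater_than_4 = Fraction(sum(1 for n in die if n > 4), len(die))

# Complement rule: P(A') = 1 - P(A).
p_at_most_4 = 1 - p_greater_than_4

# Cross-check by counting the complementary outcomes directly.
p_at_most_4_direct = Fraction(sum(1 for n in die if n <= 4), len(die))

print(p_greater_than_4, p_at_most_4)  # 1/3 2/3
```

Computing the complement directly and computing it as 1 - P(A) give the same answer, as the rule requires.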
Conditional Probability
Conditional probability is the probability that an event occurs given that another event has occurred. It is written P(A|B) and read as "the probability of A given B". For P(B) > 0, it is defined as:
P(A|B) = P(A ∩ B) / P(B)
Example of Conditional Probability
Suppose you draw a card from a standard 52-card deck and want the probability that it is a king, given that it is red. Let A be the event that the card is a king and B the event that it is red. Then:
P(King) = 4/52 = 1/13
P(Red) = 26/52 = 1/2
Since there are 2 kings among the 26 red cards, we have:
P(King ∩ Red) = 2/52 = 1/26
Thus, the conditional probability is:
P(King | Red) = P(King ∩ Red) / P(Red) = (1/26) / (1/2) = 2/26 = 1/13
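The card example can be verified by brute-force enumeration of the deck. This sketch assumes a standard deck in which hearts and diamonds are red and clubs and spades are black. The helper names `prob`, `is_king`, and `is_red` are hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}

# Build the 52-card deck as (rank, suit) pairs.
deck = list(product(ranks, suits))

def prob(event):
    """Probability of an event (a predicate) when every card is equally likely."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_king(card):
    return card[0] == "K"

def is_red(card):
    return suits[card[1]] == "red"

p_king_and_red = prob(lambda c: is_king(c) and is_red(c))  # 1/26
p_red = prob(is_red)                                         # 1/2

# Conditional probability: P(King | Red) = P(King ∩ Red) / P(Red)
p_king_given_red = p_king_and_red / p_red
print(p_king_given_red)  # 1/13
```

Here P(King | Red) equals P(King) = 1/13. For this deck, knowing that the card is red does not change the probability that it is a king.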
Law of Total Probability and Bayes' Theorem
Law of Total Probability
The law of total probability computes the probability of an event by considering every way in which the event can occur. It states that if B1, B2, ..., Bn form a partition of the sample space (that is, they are mutually exclusive and their union is the whole sample space), then:
P(A) = P(A ∩ B1) + P(A ∩ B2) + ... + P(A ∩ Bn)
Using conditional probability, this can be written as:
P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + ... + P(A|Bn)P(Bn)
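Here is a small Python sketch of the second form of the law. The function name `total_probability` is hypothetical. The example partition is an illustration that does not appear in the original text: a fair die split into {1, 2, 3} and {4, 5, 6}, with A = "even number".

```python
import math
from fractions import Fraction

def total_probability(conditionals, priors):
    """P(A) = sum of P(A|B_i) * P(B_i) over a partition B_1, ..., B_n.

    `conditionals` holds P(A|B_i) and `priors` holds P(B_i).
    The priors must sum to 1 for the B_i to form a partition.
    """
    if len(conditionals) != len(priors):
        raise ValueError("need one P(A|B_i) for each P(B_i)")
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("P(B_i) must sum to 1")
    return sum(p_a_given_b * p_b for p_a_given_b, p_b in zip(conditionals, priors))

# Die example: P(even | {1,2,3}) = 1/3 and P(even | {4,5,6}) = 2/3.
half = Fraction(1, 2)
p_even = total_probability([Fraction(1, 3), Fraction(2, 3)], [half, half])
print(p_even)  # 1/2
```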
Bayes' theorem
Bayes' theorem lets us reverse a conditional probability, expressing P(A|B) in terms of P(B|A). For P(B) > 0, it states:
P(A|B) = [P(B|A) * P(A)] / P(B)
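Example of Bayes' Theorem
Suppose 1% of a population has a particular disease, and a test for it is 99% accurate in the following sense: it returns a positive result for 99% of people who have the disease, and a negative result for 99% of people who do not.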
- P(disease) = 0.01 (1% have the disease)
- P(no disease) = 0.99
- P(positive test|disease) = 0.99
- P(positive test|no disease) = 0.01 (false positive rate)
To find the probability that a person actually has the disease given a positive test result, use Bayes' theorem:
P(Disease|Positive Test) = [P(Positive Test|Disease) * P(Disease)] / P(Positive Test)
Where:
P(Positive Test) = P(Positive Test|Disease) * P(Disease) + P(Positive Test|No Disease) * P(No Disease)
= 0.99 * 0.01 + 0.01 * 0.99
= 0.0099 + 0.0099
= 0.0198
Thus, the probability that a person actually has the disease after a positive test result is:
P(Disease|Positive Test) = [0.99 * 0.01] / 0.0198 = 0.0099 / 0.0198 = 0.5
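The calculation can be wrapped in a short Python function. The name `bayes_posterior` and its parameter names are hypothetical. The function assumes a binary test with a given sensitivity P(positive | disease) and false positive rate P(positive | no disease).

```python
def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(Disease | Positive) via Bayes' theorem for a binary test."""
    # Law of total probability over {disease, no disease}.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(D | +) = P(+ | D) * P(D) / P(+).
    return sensitivity * prior / p_positive

posterior = bayes_posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.01)
print(round(posterior, 4))  # 0.5
```

Consider a rarer disease with `prior=0.001` and the same test. The function then returns roughly 0.09. The lower the prior, the smaller the share of positive results that are true positives.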
Statistics overview
As we move from probability to statistics, we focus more on data collection, analysis, and interpretation. Some of the basic concepts in statistics are as follows:
Descriptive statistics
Descriptive statistics summarize the main characteristics of a data set, such as its center and its spread. Here are some key terms:
- Mean: The average of a data set.
- Median: The middle value when the data is sorted. With an even number of values, it is the average of the two middle values.
- Mode: The value that appears most often.
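- Variance: The average of the squared deviations of the values from the mean. It measures how far the values spread out around the mean. (For a sample, the sum of squared deviations is often divided by n - 1 instead of n.)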
- Standard deviation: The square root of the variance, which shows how spread out the values are around the mean.
Inferential statistics
Inferential statistics allow us to draw conclusions about a population from a sample of data. This includes estimating population parameters, testing hypotheses, and making predictions.
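As a sketch of estimation, the snippet below draws a random sample from a synthetic population whose mean is known. It then uses the sample mean as a point estimate of that mean. The population (the integers 1 to 10,000), the sample size of 200, and the seed are arbitrary assumptions chosen for illustration. The standard error s/√n gives a rough sense of how far the estimate is likely to fall from the true mean.

```python
import random
import statistics

# Hypothetical population whose true mean we know: the integers 1..10,000.
population = list(range(1, 10_001))
true_mean = statistics.fmean(population)  # 5000.5

rng = random.Random(42)                  # fixed seed for reproducibility
sample = rng.sample(population, k=200)   # simple random sample without replacement

# Point estimate of the population mean from the sample alone.
estimate = statistics.fmean(sample)
# Estimated standard error of the sample mean: s / sqrt(n).
std_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"true mean {true_mean}, estimate {estimate:.1f} ± {std_error:.1f} (1 SE)")
```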
Example
Suppose we have the following data set showing the test scores of a group of 10 students:
Test Scores: 82, 90, 76, 88, 95, 79, 84, 92, 78, 81
We can calculate the mean, median and mode as follows:
- Mean: The sum of the scores divided by the number of observations:
Mean = (82 + 90 + 76 + 88 + 95 + 79 + 84 + 92 + 78 + 81) / 10 = 84.5
- Median: The middle score when the data is arranged in ascending order:
Ordered Scores: 76, 78, 79, 81, 82, 84, 88, 90, 92, 95
Median = (82 + 84) / 2 = 83
- Mode: The most frequent score:
Mode = None (all scores appear only once)
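The same summaries can be computed with Python's standard `statistics` module. `statistics.mode` returns a value even when no score repeats. For that reason, this sketch counts frequencies with `collections.Counter` and reports no mode in that case, which is a convention assumed here. The sketch also computes the variance both ways: `pvariance` divides by n, and `variance` divides by n - 1.

```python
import statistics
from collections import Counter

scores = [82, 90, 76, 88, 95, 79, 84, 92, 78, 81]

mean = statistics.mean(scores)      # 84.5
median = statistics.median(scores)  # 83.0, average of the two middle values

# Treat the data as having a mode only if some score occurs more than once.
counts = Counter(scores)
top = max(counts.values())
modes = [s for s, c in counts.items() if c == top] if top > 1 else []

# Spread: population variance (divide by n) vs sample variance (divide by n - 1).
pop_var = statistics.pvariance(scores)
sample_var = statistics.variance(scores)
pop_sd = statistics.pstdev(scores)  # square root of the population variance

print(mean, median, modes)  # 84.5 83.0 []
```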
Probability distributions
Probability distributions describe how probability is spread over the possible outcomes in the sample space. Common distributions include:
Discrete distributions
- Binomial distribution: Describes the number of successes in a fixed number of independent Bernoulli trials (e.g., the number of heads in 10 coin tosses).
- Poisson distribution: Describes the number of events that occur in a fixed interval of time or space, when events occur independently at a constant average rate.
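Both probability mass functions are short enough to write directly with the `math` module. The function names and the example parameters (4 tosses of a fair coin, and an average of 3 events per interval) are illustrative assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), where lam is the average count per interval."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Probability of exactly 2 heads in 4 tosses of a fair coin: C(4, 2) / 16 = 6/16.
print(binomial_pmf(2, 4, 0.5))  # 0.375

# Probability of seeing no events when 3 are expected on average per interval.
print(poisson_pmf(0, 3.0))      # e^(-3), about 0.0498
```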
Continuous distributions
- Normal distribution: Also known as the Gaussian distribution, this is a bell-shaped curve that is symmetric about the mean (e.g., human height is approximately normally distributed).
- Exponential distribution: Describes the time between events in a Poisson process.
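For continuous distributions, probabilities are areas under a density curve and are usually obtained from the cumulative distribution function (CDF). This sketch uses the standard library's `statistics.NormalDist` for the normal case and writes the exponential CDF, 1 - e^(-λx), by hand. The helper name `exponential_cdf` and the rate of 2 events per unit time are hypothetical choices.

```python
import math
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Symmetric about the mean, so exactly half the probability lies below it.
print(standard_normal.cdf(0.0))  # 0.5

# Probability of falling within one standard deviation of the mean (about 0.683).
within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

def exponential_cdf(x, rate):
    """P(T <= x) for a waiting time T ~ Exponential(rate); rate = events per unit time."""
    if x < 0:
        return 0.0  # waiting times are never negative
    return 1.0 - math.exp(-rate * x)

# Hypothetical process averaging 2 events per unit time:
# probability that the next event arrives within half a unit of time.
p_wait = exponential_cdf(0.5, rate=2.0)  # 1 - e^(-1), about 0.632
```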
Conclusion
Probability and statistics together form a fundamental part of mathematics that helps us understand and deal with uncertainty. These fields range from predicting outcomes with probability models to analyzing real-world data with statistical techniques. They provide powerful decision-making tools in business, engineering, healthcare, and many other areas. With a grasp of basic concepts such as sample spaces, events, probability rules, and statistical measures, one can interpret data effectively and draw conclusions that guide action. As you study each topic in more depth, both the mathematical elegance and the practical value of probability and statistics will become apparent.