FALL 2019

Introduction to Probability

RANDOM PROCESSES

Random process: a process/situation where we can identify a set of possible events/outcomes that could occur, but we don’t know which event will happen

  • Examples include coin tosses, die rolls, daily stock price

  • Often it is helpful to model a process as random even when it is not truly random

PROBABILITY

Terminology

  • Random process: process where we know the set of all possible outcomes but do not know which outcome will occur
  • Sample space: set of all possible outcomes from a random process; denoted \(S\)
  • Event: an outcome or set of outcomes from \(S\); denoted by capital letters (e.g. \(A\), \(B\))
  • Probability of an event: proportion of times the outcome would occur if the random process could be observed an infinite number of times; denoted \(P(A)\) (read P of A)
    • \(0 \leq P(A) \leq 1\)

EXAMPLE

LAW OF LARGE NUMBERS

As the sample size grows (i.e. as more observations are recorded), the observed proportion of times an event occurs (\(\hat{p}_n\)) converges to the probability of that event, \(p\).

LAW OF LARGE NUMBERS

Example: flips of a fair coin; \(\hat{p}_n\) is the proportion of observed heads
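
The coin-flip example can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the function name `running_proportion_of_heads`, the seed, and the number of flips are arbitrary choices made for illustration.

```python
import random

def running_proportion_of_heads(n_flips, seed=140):
    """Simulate n_flips tosses of a fair coin; return p_hat_n for n = 1..n_flips."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is reproducible
    heads = 0
    proportions = []
    for n in range(1, n_flips + 1):
        heads += rng.random() < 0.5   # True counts as 1 head
        proportions.append(heads / n) # observed proportion after n flips
    return proportions

p_hat = running_proportion_of_heads(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: p_hat = {p_hat[n - 1]:.4f}")
```

For small \(n\) the printed proportions can be far from 0.5; as \(n\) grows they should settle near \(p=0.5\), which is the behavior the law of large numbers describes.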

LAW OF LARGE NUMBERS

Assume you are tossing a fair coin and you observe 10 heads in a row. What is the probability that the eleventh toss will also result in a head? Is it 0.5? Less than 0.5? More than 0.5?

It is still 0.5; \(P(11^{th} \text{ toss H})=0.5\).

  • The coin is not “due” for a tail.
  • A common misunderstanding of the LLN is that random processes are supposed to compensate for whatever happened in the past. This is not true; the mistaken belief is called the gambler’s fallacy (or the law of averages).
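
A simulation can make the point concrete. The sketch below is an illustration under stated assumptions: it uses streaks of 5 heads rather than 10 (so that enough streaks occur in a modest number of flips), and the function name, seed, and flip count are hypothetical choices.

```python
import random

def heads_after_streak(n_flips=200_000, streak=5, seed=2019):
    """Estimate P(next toss is heads) given the previous `streak` tosses were all heads."""
    rng = random.Random(seed)  # fixed seed (arbitrary) for reproducibility
    run = 0          # length of the current run of consecutive heads
    followups = []   # outcomes of tosses that immediately follow a streak
    for _ in range(n_flips):
        is_head = rng.random() < 0.5
        if run >= streak:
            followups.append(is_head)
        run = run + 1 if is_head else 0
    return sum(followups) / len(followups), len(followups)

proportion, n_cases = heads_after_streak()
print(f"{n_cases} tosses followed a streak; proportion of heads = {proportion:.3f}")
```

If the coin were "due" for a tail, the proportion would fall clearly below 0.5. Because the tosses are independent, it should stay close to 0.5.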

DISJOINT EVENTS

Two events/outcomes are disjoint if they cannot both happen. In other words, if you know one event happens, you also know that the other does not happen. Such events are also called mutually exclusive.

What are some examples of disjoint events?

  • If a coin comes up heads, it cannot come up tails on the same toss
  • If a die roll is odd, it cannot be even (on the same roll)

DISJOINT EVENTS - ADDITION RULE

If two events \(A\) and \(B\) are disjoint, then the probability that at least one of them occurs is

\[P(A \text{ or } B)=P(A)+P(B)\]

If there are \(k\) disjoint events \(A_1, \ldots, A_k\), then the probability that at least one of them occurs is

\[P(A_1 \text{ or } A_2 \text{ or } \cdots \text{ or } A_k)=P(A_1)+P(A_2)+\cdots+P(A_k)\]

PRACTICE

Let’s play a game. As a class, you can pick a card color (red/black) and then a suit corresponding to that color (diamonds/hearts if red; spades/clubs if black). Then, we will draw 10 cards from a standard, well shuffled deck (52 cards). The class gets a point for each card corresponding to their suit and color. I get a point otherwise.

As we play, think about whether this is a fair game.

What is the probability that I get a point on a draw?

Suppose the class picked hearts. Then,

\[P(\text{not heart})=P(\text{diamond})+P(\text{spade})+P(\text{club})\]

\[P(\text{not heart})=1/4+1/4+1/4=3/4\]
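
The same answer can be checked by listing the deck. This is a minimal sketch, assuming a single draw from the full 52-card deck; the helper `prob` and the card encoding are illustrative choices, not part of the original slides.

```python
from fractions import Fraction

suits = ["hearts", "diamonds", "spades", "clubs"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(rank, suit) for suit in suits for rank in ranks]  # 52 equally likely cards

def prob(event):
    """Probability of an event as (cards in the event) / (cards in the deck)."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

# Addition rule: the three non-heart suits are disjoint, so their probabilities add.
p_by_rule = sum(prob(lambda card, s=s: card[1] == s)
                for s in ("diamonds", "spades", "clubs"))

# Direct count of the cards that are not hearts.
p_by_count = prob(lambda card: card[1] != "hearts")

print(p_by_rule, p_by_count)
```

Both routes give the same fraction because a card cannot belong to two suits at once.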

PRACTICE

Consider the example about the Dow Jones. What is the probability that the Dow Jones does not go down tomorrow? Denote this using probability notation.

  • U = up
  • D = down
  • NC = no change

\[P(\text{not } D)=P(U \text{ or } NC)=P(U) +P(NC)\]

GENERAL ADDITION RULE

Let A and B be any two events (disjoint or not). Then, the probability at least one of them occurs is \[P(A \text{ or } B)=P(A)+P(B)-P(A \text{ and } B)\] where \(P(A \text{ and }B)\) is the probability that both events occur.

Why do we need to subtract \(P(A \text{ and }B)\) in this expression?

PRACTICE

Recall our smallpox example from last week.

          Result
Inoculated died lived  Sum
       no   844  5136 5980
       yes    6   238  244
       Sum  850  5374 6224

What is the probability that an individual was inoculated (V) or lived (L)?

\[P(V \text{ or } L)=P(V)+P(L)-P(V \text{ and } L)\]

\[P(V \text{ or } L)=244/6224+5374/6224-238/6224=5380/6224 \approx 0.864\]
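
The calculation can be checked against the table by counting directly. The sketch below is illustrative; the dictionary layout and variable names are assumptions, while the counts come from the smallpox table.

```python
from fractions import Fraction

# Counts from the smallpox table: (inoculated, result) -> number of people
counts = {
    ("no", "died"): 844, ("no", "lived"): 5136,
    ("yes", "died"): 6,  ("yes", "lived"): 238,
}
total = sum(counts.values())

p_v = Fraction(sum(c for (inoc, _), c in counts.items() if inoc == "yes"), total)
p_l = Fraction(sum(c for (_, res), c in counts.items() if res == "lived"), total)
p_v_and_l = Fraction(counts[("yes", "lived")], total)

# General addition rule
p_by_rule = p_v + p_l - p_v_and_l

# Direct count: everyone who was inoculated, lived, or both
p_by_count = Fraction(
    sum(c for (inoc, res), c in counts.items() if inoc == "yes" or res == "lived"),
    total,
)

print(float(p_by_rule), float(p_by_count))
```

The 238 people who were inoculated and lived are counted once in \(P(V)\) and again in \(P(L)\); subtracting \(P(V \text{ and } L)\) removes the double count, which is why the two results agree.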

PROBABILITY DISTRIBUTIONS

A probability distribution is a list of the possible outcomes/events with corresponding probabilities satisfying the following three rules:

  1. The outcomes listed must be disjoint.
  2. Each probability must be between 0 and 1.
  3. The probabilities must total 1.

PROBABILITY DISTRIBUTIONS

For discrete random variables, a probability distribution can be represented in a table of all disjoint outcomes and their associated probabilities.

PROBABILITY DISTRIBUTIONS

Example: Handedness

About 90% of people are right-handed and the remainder are left-handed. The probability distribution for handedness of a child is:

Handedness      R     L
Probability  0.90  0.10

PROBABILITY DISTRIBUTIONS

Example: STAT 140 class data

Class year probability distribution:

 2020  2021  2022  2023 
0.333 0.273 0.364 0.030 
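
The rules for a probability distribution can be checked mechanically. This is a minimal sketch: the function `is_valid_distribution` is a hypothetical helper, and it treats the dictionary keys as the disjoint outcomes (rule 1 is assumed, not tested).

```python
import math

def is_valid_distribution(probabilities, tol=1e-9):
    """Check that each probability is in [0, 1] and that they total 1.

    Outcomes are the dict keys and are assumed to be disjoint.
    """
    values = list(probabilities.values())
    in_range = all(0 <= p <= 1 for p in values)
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tol)
    return in_range and sums_to_one

handedness = {"R": 0.90, "L": 0.10}
class_year = {2020: 0.333, 2021: 0.273, 2022: 0.364, 2023: 0.030}
incomplete = {"Democrat": 0.52}  # other affiliations are missing

print(is_valid_distribution(handedness))
print(is_valid_distribution(class_year))
print(is_valid_distribution(incomplete))
```

Probabilities rounded for display may not sum to exactly 1, so a looser `tol` may be needed for other tables.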

PRACTICE

In a survey, 52% of respondents said they are Democrats. What is the probability that a randomly selected respondent from this sample is a Republican?

  1. 0.48
  2. more than 0.48
  3. less than 0.48
  4. not enough information to calculate

Not enough information to calculate (option 4). The answer depends on which other party affiliations are possible. We need to be able to list all the possible events and their probabilities (the probability distribution) to answer this question.

COMPLEMENT OF AN EVENT

The complement of an event \(A\) is denoted by \(A^c\), and \(A^c\) represents all the outcomes not in \(A\). \(A\) and \(A^c\) are related:

  • \(P(A)+P(A^c)=1\)
  • \(P(A)=1-P(A^c)\)

Examples of complements:

  • Passing and failing the same exam.
  • Attending class and missing class on Wednesday this week.

PRACTICE

Assume you have two fair dice. What is the probability that the sum of the dice is less than or equal to 10?

Let T denote the total (sum) of the two dice faces.

\(P(T \leq 10)=1-P(T=11 \text{ or } T=12)\)

\(P(T=11 \text{ or } T=12)=P(T=11)+P(T=12)\)

\(P(T=11)+P(T=12)=2/36+1/36=1/12\)

\(P(T \leq 10)=1-1/12=11/12\)
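
The complement calculation can be verified by enumerating all 36 equally likely rolls. This is an illustrative sketch; the helper `prob` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all (die 1, die 2) pairs

def prob(event):
    """Probability of an event as (rolls in the event) / (all rolls)."""
    return Fraction(sum(1 for roll in rolls if event(roll)), len(rolls))

p_direct = prob(lambda roll: sum(roll) <= 10)          # count T <= 10 directly
p_complement = 1 - prob(lambda roll: sum(roll) >= 11)  # 1 - P(T = 11 or T = 12)

print(p_direct, p_complement)
```

The complement route only requires counting three rolls, (5,6), (6,5), and (6,6), instead of thirty-three.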

PRACTICE

If two events are complements of each other, are they disjoint?

Yes, if one happens, then the other cannot happen.

Example: Flipping a coin - if the coin is heads, then it cannot be tails on that flip. The events are heads (\(H\)) and tails (\(T\)), which can also be denoted not heads (\(H^c\)).

Example: Residence hall - if I live in the Rockies (\(R\)), I cannot live in any other residence hall (\(R^c\)).

PRACTICE

If two events are disjoint, are they complements of each other?

Not necessarily. If \(B\) is the complement of \(A\) (i.e. \(B\) is \(A^c\)), then \(B\) is everything that \(A\) is not. \(A\) and \(B\) can be disjoint, however, without being complements. Consider the following:

Example: Flipping a coin - there are only two possible events here, and we know that they are disjoint. In this case, \(H\) and \(T\) are disjoint events, but they are also complements (\(T=H^c\)).

Example: Residence hall - living in the Rockies (\(R\)) and living in Wilder (\(W\)) are disjoint events. However, the complement of \(R\), \(R^c\), is not \(W\). Rather, \(R^c\) is all the residence halls excluding the Rockies.

INDEPENDENT EVENTS

Two processes are independent if knowing the outcome of one provides no information about the outcome of the other.

Examples:

  • Results of flipping many coins
  • Drawing cards with replacement from a well-shuffled deck
  • The handedness of two people

HANDEDNESS OF TWO PEOPLE

About 90% of people are right-handed and the remainder are left-handed. Knowing the handedness of one person does not give you any information about the handedness of another person.

The probability distribution for handedness of two people is:

Handedness   RR                 RL                 LR                 LL
Probability  (0.90)(0.90)=0.81  (0.90)(0.10)=0.09  (0.10)(0.90)=0.09  (0.10)(0.10)=0.01
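
The table can be generated from the single-person distribution by multiplying probabilities, which is valid here because the two people are treated as independent. A minimal sketch with illustrative variable names:

```python
from itertools import product

single = {"R": 0.90, "L": 0.10}  # handedness distribution for one person

# Joint distribution for two independent people: multiply the two probabilities.
pair = {a + b: single[a] * single[b] for a, b in product(single, repeat=2)}

for outcome, p in pair.items():
    print(f"{outcome}: {p:.2f}")
print(f"total: {sum(pair.values()):.2f}")
```

The four outcomes are disjoint and their probabilities total 1, so the result is itself a probability distribution.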

INDEPENDENT EVENTS - MULTIPLICATION RULE

Let A and B represent events from two different and independent processes. Then the probability that both A and B occur is the product of their separate probabilities:

\[ P(A \text{ and } B)=P(A)\times P(B).\]

If there are \(k\) events \(A_1,...,A_k\) from \(k\) independent processes, then the probability they all occur is

\[P(A_1 \text{ and } A_2 \text{ and } \cdots \text{ and } A_k)=P(A_1)\times P(A_2)\times\cdots\times P(A_k).\]

PRACTICE

When a certain telemarketer makes a call, they have a 10% chance of making a sale (Y) and a 90% chance of not making a sale (N).

If the telemarketer makes three calls, what is the probability of making exactly one sale?

\(P(\text{exactly one } Y)=P(\text{one } Y \text{ and } \text{two } N)\)

\(P(\text{one } Y \text{ and } \text{two } N)=P(YNN \text{ or } NYN \text{ or } NNY)\)

\(P(YNN \text{ or } NYN \text{ or } NNY)=P(Y)P(N)P(N)+P(N)P(Y)P(N)+P(N)P(N)P(Y)\)

\(P(Y)P(N)P(N)+P(N)P(Y)P(N)+P(N)P(N)P(Y)=3\times(0.1)(0.9)(0.9)=0.243\)

PRACTICE

When a certain telemarketer makes a call, they have a 10% chance of making a sale (Y) and a 90% chance of not making a sale (N).

If the telemarketer makes three calls, what is the probability of making at least one sale?

\(P(\text{at least one } Y)=1-P(\text{all } N)\)

\(1-P(\text{all } N)=1-P(N)P(N)P(N)=1-(0.9)^3=0.271\)
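
Both telemarketer answers can be checked by enumerating all eight sequences of three calls. This is a sketch that assumes the calls are independent, as the multiplication rule requires; `seq_prob` is a hypothetical helper name.

```python
from itertools import product
from math import prod

p = {"Y": 0.10, "N": 0.90}  # per-call probabilities of sale / no sale
sequences = ["".join(s) for s in product("YN", repeat=3)]  # YYY, YYN, ..., NNN

def seq_prob(seq):
    """Probability of one specific sequence of independent calls."""
    return prod(p[call] for call in seq)

# Exactly one sale: add the (disjoint) sequences YNN, NYN, NNY.
p_exactly_one = sum(seq_prob(s) for s in sequences if s.count("Y") == 1)

# At least one sale: complement of no sales, checked against a direct sum.
p_at_least_one = 1 - seq_prob("NNN")
p_at_least_one_direct = sum(seq_prob(s) for s in sequences if "Y" in s)

print(round(p_exactly_one, 3), round(p_at_least_one, 3))
```

The complement route needs one sequence (NNN), while the direct route has to add up seven.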

Conditional Probability

MOTIVATION

Let’s revisit our smallpox example one more time. At the time of the 1721 smallpox epidemic in Boston, there was a hypothesis that vaccinated people were more likely to survive the epidemic than non-vaccinated people.

Conditional probability allows us to explore the relationship between vaccination status (yes/no) and result (lived/died) to address that hypothesis.

MARGINAL PROBABILITY

Marginal Probability: a probability based on only one variable or process; example form:

\[ P(A) \]

Smallpox example

(\(V\)=vaccinated, \(V^c\)= not vaccinated, \(L\)=lived, \(L^c\)=died):

  • \(P(V)\)= probability vaccinated
  • \(P(V^c)\)= probability not vaccinated
  • \(P(L)\)= probability lived
  • \(P(L^c)\)= probability died

JOINT PROBABILITY

Joint Probability: a probability of outcomes for two or more variables or processes; example form:

\[ P(A \text{ and }B) \]

Smallpox example

  • \(P(V \text{ and } L)\)=probability vaccinated and lived
  • \(P(V^c \text{ and } L^c)\)=probability not vaccinated and died

CONDITIONAL PROBABILITY

Conditional Probability:

\[P(A|B)=\frac{P(A \text{ and }B)}{P(B)}\]

Components of conditional probability:

  • Outcome of interest: \(A\)
  • Condition: \(B\) - we are only looking at cases that satisfy this condition.

PRACTICE

Smallpox example:

What is the probability of a randomly chosen person not being vaccinated and living?

         \(L^c\)  \(L\)  Total
\(V^c\)      844   5136   5980
\(V\)          6    238    244
Total        850   5374   6224

Joint probability: \[P(V^c \text{ and } L)=5136/6224\]

PRACTICE

Smallpox example:

What is the probability of a randomly chosen person who is not vaccinated living?

         \(L^c\)  \(L\)  Total
\(V^c\)      844   5136   5980
\(V\)          6    238    244
Total        850   5374   6224

Conditional probability: \[P(L|V^c)=5136/5980\]
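
The difference between the joint and conditional probabilities comes down to the denominator. The sketch below uses the counts from the table; the variable names are illustrative.

```python
from fractions import Fraction

# Counts from the smallpox table
not_vacc_lived = 5136
not_vacc_total = 5980
vacc_lived = 238
vacc_total = 244
grand_total = 6224

# Joint: not vaccinated AND lived, out of everyone
p_joint = Fraction(not_vacc_lived, grand_total)

# Marginal: not vaccinated, out of everyone
p_not_vacc = Fraction(not_vacc_total, grand_total)

# Conditional: lived, out of the not-vaccinated group only
p_lived_given_not_vacc = p_joint / p_not_vacc

# For comparison: lived, out of the vaccinated group only
p_lived_given_vacc = Fraction(vacc_lived, vacc_total)

print(f"P(Vc and L) = {float(p_joint):.3f}")
print(f"P(L | Vc)   = {float(p_lived_given_not_vacc):.3f}")
print(f"P(L | V)    = {float(p_lived_given_vacc):.3f}")
```

Dividing the joint probability by \(P(V^c)\) cancels the grand total and leaves 5136/5980, the same value read directly from the \(V^c\) row. Comparing \(P(L|V^c)\) with \(P(L|V)\) is the kind of comparison the vaccination hypothesis calls for.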

GENERAL MULTIPLICATION RULE

In general, if \(A\) and \(B\) represent any two outcomes or events (either independent or dependent), then

\[P(A \text{ and } B)=P(A|B)\times P(B).\]

GENERAL VERSUS INDEPENDENT MULTIPLICATION RULE

It is always true that

\[P(A \text{ and } B)=P(A|B)\times P(B).\]

Recall for independent events \(A\) and \(B\),

\[P(A \text{ and } B)=P(A)\times P(B).\]

Why does \(P(A)=P(A|B)\) when \(A\) and \(B\) are independent?
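
One way to see this, assuming \(P(B)>0\) so that the conditional probability is defined, is to start from the definition of conditional probability and substitute the multiplication rule for independent events:

\[P(A|B)=\frac{P(A \text{ and } B)}{P(B)}=\frac{P(A)\times P(B)}{P(B)}=P(A)\]

In words, knowing that \(B\) occurred does not change the probability of \(A\), which matches the definition of independence.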

PET OWNERSHIP DEMOGRAPHICS

Millennials led US pet ownership to 84.6 million in 2016

  • Pet owner: Y or N
  • Demographic: Gen Y/millennial (M), Baby Boomer (B), Other (O)

Probability of (living in a household) owning a pet:

\[P(Y)=0.68\]

Sum of the demographic probabilities for pet owners (condition = pet ownership, denoted Y):

\[ P(M|Y)+P(B|Y)+P(O|Y)=0.35+0.32+0.33=1 \]

SUM OF CONDITIONAL PROBABILITIES

Let \(A_1, ..., A_k\) represent all the disjoint events for a variable or process. Then, if \(B\) is an event, possibly for another variable or process, we have:

\[P(A_1|B)+\cdots +P(A_k|B)=1 \]

PET OWNERSHIP DEMOGRAPHICS

Millennials led US pet ownership to 84.6 million in 2016

  • Pet owner: Y or N
  • Demographic: Gen Y/millennial (M), Baby Boomer (B), Other (O)

Probability of (living in a household) owning a pet:

\[P(Y)=0.68\]

Probability of not being a millennial given that they own a pet:

\[ P(M^c|Y)=1-P(M|Y)=1-0.35=0.65 \]

COMPLEMENTS - CONDITIONAL PROBABILITIES

The rule for complements also holds when an event and its complement are conditioned on the same information:

\[P(A|B)=1-P(A^c|B) \]
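
The pet ownership numbers can be used to exercise the conditional rules together. This is a minimal sketch with illustrative variable names; the joint probability in the last step is an extra calculation using the general multiplication rule and does not appear on the original slides.

```python
p_y = 0.68                                        # P(Y): owns a pet
p_demo_given_y = {"M": 0.35, "B": 0.32, "O": 0.33}  # P(demographic | Y)

# Sum of conditional probabilities over all disjoint demographics
total_given_y = sum(p_demo_given_y.values())

# Complement rule, conditioned on the same information Y
p_not_m_given_y = 1 - p_demo_given_y["M"]

# General multiplication rule: P(M and Y) = P(M | Y) * P(Y)
p_m_and_y = p_demo_given_y["M"] * p_y

print(round(total_given_y, 2), round(p_not_m_given_y, 2), round(p_m_and_y, 3))
```

Note that the conditional probabilities sum to 1 across demographics for a fixed condition; there is no similar rule for summing \(P(A|B)\) across different conditions \(B\).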

BAYES’ THEOREM - MOTIVATING EXAMPLE

Diagnostic tests are used to determine whether an individual has a particular disease. However, diagnostic tests are not perfect, so a positive test result does not guarantee that the tested individual has the disease. Generally, characteristics of the test are determined in a lab so we know how much we can trust the result. These are called sensitivity and specificity.

  • Notation: \(D=\text{ disease}\), \(D^c=\text{ no disease}\), \(T= \text{ positive test}\), \(T^c= \text{negative test}\)

Known quantities:

  • Sensitivity: \(P(T|D)\)
  • Specificity: \(P(T^c|D^c)\)
  • Prevalence: \(P(D)\)

Want to know:

  • \(P(D|T)\)
  • Use Bayes’ Theorem!

BAYES’ THEOREM

Allows us to invert probabilities. For events \(A\) and \(B\),

\[P(B|A)=\frac{P(A|B)P(B)}{P(A \text{ and } B)+P(A \text{ and } B^c)}=\frac{P(A\text{ and }B)}{P(A)}.\]

This rule sets the foundation for Bayesian statistics, which we will not cover in this class.
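
For the diagnostic testing example, Bayes’ Theorem can be written in terms of sensitivity, specificity, and prevalence. The sketch below is an illustration only: the function name is hypothetical, and the numbers (sensitivity 0.90, specificity 0.95, prevalence 0.01) are made up and do not describe any real test.

```python
def prob_disease_given_positive(sensitivity, specificity, prevalence):
    """P(D | T) from P(T | D), P(Tc | Dc), and P(D) using Bayes' Theorem."""
    p_t_and_d = sensitivity * prevalence                # P(T | D) P(D)
    # Conditional complement rule: P(T | Dc) = 1 - P(Tc | Dc)
    p_t_and_dc = (1 - specificity) * (1 - prevalence)   # P(T | Dc) P(Dc)
    return p_t_and_d / (p_t_and_d + p_t_and_dc)         # P(T and D) / P(T)

# Hypothetical values, chosen only for illustration
posterior = prob_disease_given_positive(
    sensitivity=0.90, specificity=0.95, prevalence=0.01
)
print(f"P(D | T) = {posterior:.3f}")
```

With a rare disease, most positive results in this hypothetical setup come from the large disease-free group, so \(P(D|T)\) ends up much smaller than the sensitivity \(P(T|D)\).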

REFERENCES

  • Diez et al. (2019) OpenIntro Statistics, Fourth Edition