FALL 2019
The normal distribution is the most common distribution in statistics. It has a mean \(\mu\) and a standard deviation \(\sigma\), which describe it entirely.
If we assume a random variable \(X\) is normally distributed (mean \(\mu\), SD \(\sigma\)), then we write \(X \sim N(\mu, \sigma)\).
The Z-score of an observation is the number of standard deviations it falls above or below the mean. The Z-score for an observation \(x\) that follows a distribution with mean \(\mu\) and standard deviation \(\sigma\) is computed as:
\[Z=\frac{x-\mu}{\sigma}\]
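As a small illustration, the formula can be evaluated directly in R. The values below (an observation of 72 inches, mean 70.25, SD 3.01) are borrowed from the height example later in these notes; treat this as a sketch, not part of the original slides.

```r
# Z-score: number of standard deviations an observation lies from the mean.
# Illustrative values taken from the height example in these notes.
x     <- 72      # observation (inches)
mu    <- 70.25   # mean
sigma <- 3.01    # standard deviation

z <- (x - mu) / sigma
z  # positive, so the observation lies above the mean
```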
The normal distribution with \(\mu=0\) and \(\sigma=1\) is called the standard normal distribution.
If \(X \sim N(\mu, \sigma)\), then \(X\) is a continuous random variable (recall end of Chapter 3 lecture).
What does probability mean for a continuous random variable?
How do we find the area under the curve (probability) if \(X \sim N(\mu,\sigma)\)?
For (2) and (3), we assume our process follows a normal distribution - this is a MODEL - it is NOT EXACT.
pnorm() gives us the probability that an observation falls below a certain value, given an appropriate mean and SD.
## P(X < 69):
pnorm(69, mean=mean(cdc_m$height), sd=sd(cdc_m$height))
## [1] 0.338728
## P(X < 72):
pnorm(72, mean=mean(cdc_m$height), sd=sd(cdc_m$height))
## [1] 0.7193796
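To connect pnorm() back to the idea of area under the curve, one possible check is to integrate the normal density numerically and compare the result with pnorm(). The mean and SD below are assumed summary values (70.25 and 3.01) standing in for mean(cdc_m$height) and sd(cdc_m$height), since the cdc_m data set is not reproduced here.

```r
# Assumed summary values (the slides compute these from cdc_m$height).
mu    <- 70.25
sigma <- 3.01

# Area under the normal density curve to the left of 69, by numerical integration.
area <- integrate(dnorm, lower = -Inf, upper = 69, mean = mu, sd = sigma)$value

# The same lower-tail probability from pnorm().
p <- pnorm(69, mean = mu, sd = sigma)

c(area = area, pnorm = p)  # the two values should agree closely
```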
According to the CDC database, the mean height of US men is 70.25 inches and the SD is 3.01 inches. If we model height (random variable \(X\)) as normal (with the previously stated mean and standard deviation), what is the probability a male is between 69 and 72 inches tall?
Step 1: Draw a picture to identify what you want
Step 2: Identify what information you can get
## P(X < 69)
pnorm(q=69, mean=mean(cdc_m$height), sd=sd(cdc_m$height))
## [1] 0.338728
## P(X < 72)
pnorm(q=72, mean=mean(cdc_m$height), sd=sd(cdc_m$height))
## [1] 0.7193796
Step 3: Connect what we have to what we want
Both pnorm() and the Z table give lower tail probabilities. To get what we want:
\[P(69 \leq X \leq 72)=P(X\leq 72)-P(X\leq 69)=0.7194-0.3387=0.3807\]
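The subtraction in Step 3 can also be done in one go in R. This sketch plugs in the stated mean (70.25) and SD (3.01) rather than the cdc_m data, so the result may differ from the data-based output in the later decimal places.

```r
# Stated model parameters for male height (inches).
mu    <- 70.25
sigma <- 3.01

# Two lower-tail probabilities.
p_below_72 <- pnorm(q = 72, mean = mu, sd = sigma)
p_below_69 <- pnorm(q = 69, mean = mu, sd = sigma)

# Probability of falling between 69 and 72 inches.
p_between <- p_below_72 - p_below_69
p_between  # about 0.38
```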
At the Heinz ketchup factory, the amounts that go into bottles of ketchup are supposed to be normally distributed with mean 36 oz. and standard deviation 0.11 oz. Once every 30 minutes a bottle is selected from the production line, and its contents are noted precisely. If the amount of ketchup in the bottle is below 35.8 oz. or above 36.2 oz., the bottle fails the quality control inspection. What percent of bottles have less than 35.8 ounces of ketchup?
\(P(X<35.8)=P(X\leq 35.8)=\) 0.0345182
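A short sketch of how this lower-tail probability could be obtained with pnorm(), using the mean and SD given in the problem:

```r
# P(X < 35.8) for X ~ N(mean = 36, sd = 0.11)
p_low <- pnorm(q = 35.8, mean = 36, sd = 0.11)
p_low

# Expressed as a percent of bottles.
100 * p_low
```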
What percent of bottles pass quality control inspection?
## P(X < 36.2)-P(X < 35.8)
pnorm(q=36.2, mean=36, sd=0.11)-pnorm(q=35.8, mean=36, sd=0.11)
## [1] 0.9309637
Body temperatures of healthy humans are distributed nearly normally with mean 98.2 F and standard deviation 0.73 F. What is the cutoff for the lowest 3% of human body temperatures?
qnorm(0.03)
## [1] -1.880794
Body temperatures of healthy humans are distributed nearly normally with mean 98.2 F and standard deviation 0.73 F. What is the cutoff for the lowest 3% of human body temperatures on the original scale?
co <- qnorm(0.03)
co
## [1] -1.880794
\(Z=\frac{x-\mu}{\sigma}=\frac{x-98.2}{0.73}=-1.88\)
\(x=-1.88\times 0.73+98.2=96.8\)
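The two steps above (find the Z cutoff, then convert back to the original scale) can be sketched in R. qnorm() also accepts mean and sd arguments, which gives the cutoff in degrees directly; both routes should agree.

```r
mu    <- 98.2   # mean body temperature (F)
sigma <- 0.73   # standard deviation (F)

# Route 1: Z cutoff on the standard normal scale, then un-standardize.
z_cut   <- qnorm(0.03)
cutoff1 <- z_cut * sigma + mu

# Route 2: ask qnorm() for the cutoff on the original scale.
cutoff2 <- qnorm(0.03, mean = mu, sd = sigma)

c(cutoff1, cutoff2)  # both are about 96.8 F
```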
Body temperatures of healthy humans are distributed nearly normally with mean 98.2 F and standard deviation 0.73 F. What is the cutoff for the highest 10% of human body temperatures (on the original scale)?
co <- qnorm(0.90)
co
## [1] 1.281552
\(Z=\frac{x-\mu}{\sigma}=\frac{x-98.2}{0.73}=1.28\)
\(x=1.28\times 0.73+98.2=99.1\)
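Because qnorm() works with lower-tail probabilities by default, the highest 10% corresponds to the 90th percentile. As an alternative sketch, the lower.tail = FALSE argument lets us state the upper-tail probability directly:

```r
mu    <- 98.2   # mean body temperature (F)
sigma <- 0.73   # standard deviation (F)

# 90th percentile via the lower tail (default).
cut_lower <- qnorm(0.90, mean = mu, sd = sigma)

# Same cutoff, stated as an upper-tail probability of 0.10.
cut_upper <- qnorm(0.10, mean = mu, sd = sigma, lower.tail = FALSE)

c(cut_lower, cut_upper)  # both are about 99.1 F
```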
Rule of thumb for the probability of falling within 1, 2, and 3 standard deviations of the mean in the normal distribution.
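The rule-of-thumb percentages can be recovered from the standard normal distribution. A minimal sketch using pnorm():

```r
# Probability of falling within k standard deviations of the mean, k = 1, 2, 3.
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)

round(within_k_sd, 4)  # approximately 0.6827, 0.9545, 0.9973
```

These round to the familiar 68%, 95%, and 99.7%.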
SAT scores are distributed nearly normally with mean 1500 and standard deviation 300.
Which of the following is false?
When a certain telemarketer makes a call, they have a 10% chance of making a sale (Y) and a 90% chance of not making a sale (N).
If the telemarketer makes three calls, what is the probability of making exactly one sale?
Scenario 1:
\[P(YNN)=P(Y)P(N)P(N)=0.1\times 0.9\times 0.9=0.081\]
Scenario 2:
\[P(NYN)=P(N)P(Y)P(N)=0.9\times 0.1\times 0.9=0.081\]
Scenario 3:
\[P(NNY)=P(N)P(N)P(Y)=0.9\times 0.9\times 0.1=0.081\]
Probability of exactly one sale: \(0.081+0.081+0.081=3\times 0.081=0.243\)
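The scenario-by-scenario calculation can be checked by brute force. The sketch below lists all eight possible outcomes of three calls, computes each outcome's probability assuming independent calls, and adds up those with exactly one sale. The variable names are illustrative only.

```r
# All 2^3 = 8 possible outcomes of three calls (Y = sale, N = no sale).
outcomes <- expand.grid(call1 = c("Y", "N"),
                        call2 = c("Y", "N"),
                        call3 = c("Y", "N"),
                        stringsAsFactors = FALSE)

# Probability of each result on a single call.
p <- c(Y = 0.1, N = 0.9)

# Independent calls: multiply the per-call probabilities.
outcomes$prob  <- p[outcomes$call1] * p[outcomes$call2] * p[outcomes$call3]
outcomes$sales <- rowSums(outcomes[, c("call1", "call2", "call3")] == "Y")

# Add up the scenarios with exactly one sale (YNN, NYN, NNY).
p_one_sale <- sum(outcomes$prob[outcomes$sales == 1])
p_one_sale  # 0.243
```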
The question on the previous slide asked for the probability of a given number of “successes”, \(k\), in a given number of independent trials, \(n\) (\(k=1\) success in \(n=3\) trials).
We calculated this probability as: \[\# \ scenarios \times P(single \ scenario)\]
Fortunately, there is a less tedious way to count the “number of scenarios”.
Writing out the scenarios is possible for small examples, like the telemarketer problem. For larger \(n\) and/or \(k\) different from 1 (e.g. \(n=9\), \(k=2\)), this gets much more tedious and error-prone (feel free to try by modifying the previous example).
The choose function is used to calculate the number of ways to choose \(k\) successes in \(n\) trials:
\[\left(\begin{matrix}n \\ k \end{matrix}\right)=\frac{n!}{k!(n-k)!}\]
Factorial: \(n!=n\times(n-1)\times\cdots\times 2\times 1\), with \(0!=1\).
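In R, the choose function is available as choose(), and factorial() computes \(n!\). A brief sketch for the two cases mentioned in these notes (\(n=3\), \(k=1\) and \(n=9\), \(k=2\)):

```r
# Number of ways to get k successes in n trials.
choose(3, 1)  # telemarketer example: 3 scenarios
choose(9, 2)  # 36 scenarios, far too many to list comfortably by hand

# The same count from the factorial formula n! / (k! (n - k)!).
n <- 9
k <- 2
by_formula <- factorial(n) / (factorial(k) * factorial(n - k))
by_formula
```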
Which of the following is false?
Binomial distribution: used to describe the number of successes \(k\) in a fixed number of independent trials \(n\)
\[P(X=k)=\underbrace{\left(\begin{matrix}n \\ k \end{matrix}\right)}_{\# \ scenarios}\underbrace{p^k(1-p)^{n-k}}_{P(single \ scenario)}=\frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}\]
The random variable \(X\) is binomial if the following conditions are met:
All four of these conditions must be satisfied to have a binomial distribution. You should know these conditions.
Revisiting our motivating example:
When a certain telemarketer makes a call, they have a 10% chance of making a sale (Y) and a 90% chance of not making a sale (N).
Question: Is this binomial?
How do we find probabilities when \(X \sim Binomial(n,p)\)?
For all options, we assume our process follows a binomial distribution - this is a MODEL
You can always calculate binomial probabilities using the formula, but it gets unwieldy quickly as \(n\) increases. For small enough \(n\), it is practical to use the formula:
\[P(X=k)=\left(\begin{matrix}n \\ k \end{matrix}\right)p^k(1-p)^{n-k}=\frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}\]
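As a sketch of using the formula directly, the telemarketer example (\(n=3\), \(k=1\), \(p=0.1\)) can be computed by hand in R and compared with the built-in dbinom():

```r
n <- 3    # number of calls
k <- 1    # number of sales
p <- 0.1  # probability of a sale on each call

# Binomial formula: (# scenarios) x P(single scenario)
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

# Built-in binomial probability.
by_dbinom <- dbinom(x = k, size = n, prob = p)

c(by_formula, by_dbinom)  # both 0.243
```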
dbinom() gives us the probability of observing exactly a certain number of successes \(k\) (x argument) given a fixed number of trials, \(n\) (size argument) and a probability of success, \(p\) (prob argument).
EXAMPLE: Suppose there is a quiz with 10 multiple choice questions, each having four possible answers. If a student guesses randomly on each question, what is the probability that they get exactly 8 questions correct?
## P(X=8), X ~ Binomial(10, 0.25)
dbinom(x=8, size=10, prob=0.25)
## [1] 0.0003862381
EXAMPLE: Suppose there is a quiz with 10 multiple choice questions, each having four possible answers. If a student guesses randomly on each question, what is the probability that they get at least 8 questions correct?
## P(X>=8), X ~ Binomial(10, 0.25)
(dbinom(x=8, size=10, prob=0.25)
 +dbinom(x=9, size=10, prob=0.25)
 +dbinom(x=10, size=10, prob=0.25))
## [1] 0.000415802
pbinom() gives us the probability of observing at most a certain number of successes \(k\) (q argument) given a fixed number of trials, \(n\) (size argument) and a probability of success, \(p\) (prob argument).
EXAMPLE: Suppose there is a quiz with 10 multiple choice questions, each having four possible answers. If a student guesses randomly on each question, what is the probability that they get at least 8 questions correct?
## P(X >= 8) = 1 - P(X <= 7), X ~ Binomial(10, 0.25)
1-pbinom(q=7, size=10, prob=0.25)
## [1] 0.000415802
The binomial distribution with probability of success, \(p\), is nearly normal when the sample size \(n\) is sufficiently large such that \(np\) and \(n(1-p)\) are both at least 10. The approximate normal distribution has parameters corresponding to the mean and standard deviation of the binomial distribution:
\[\mu=np \qquad \sigma=\sqrt{np(1-p)}\]
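A minimal sketch of how the approximation might be used. The values \(n=100\), \(p=0.25\), and the cutoff 30 are hypothetical, chosen only so that the success-failure condition holds; the normal parameters are the binomial mean \(np\) and standard deviation \(\sqrt{np(1-p)}\).

```r
# Hypothetical setting (not from the slides): 100 trials, success probability 0.25.
n <- 100
p <- 0.25

# Check the condition: expected successes and expected failures both at least 10.
expected_successes <- n * p
expected_failures  <- n * (1 - p)
c(expected_successes, expected_failures) >= 10

# Normal parameters matching the binomial mean and standard deviation.
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))

# P(X <= 30): exact binomial versus the normal approximation.
exact    <- pbinom(q = 30, size = n, prob = p)
approx_p <- pnorm(q = 30, mean = mu, sd = sigma)

c(exact = exact, normal_approx = approx_p)
```

The two probabilities are close but not identical, which reflects that the normal curve is only a model for the discrete binomial distribution.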
Steps