SPRING 2020

Normal Distribution

MOTIVATING EXAMPLE

  • Red line - true density
  • Blue dashed line - normal model with mean and SD from observed heights

NORMAL DISTRIBUTION

The normal distribution is the most widely used distribution in statistics. It is completely described by two parameters: its mean \(\mu\) and its standard deviation \(\sigma\).

If we assume a random variable \(X\) is normally distributed (mean \(\mu\), SD \(\sigma\)), then

  • \(X\sim N(\mu,\sigma)\)
  • Distribution of \(X\) is:
    • Unimodal
    • Symmetric
    • “Bell-shaped”
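
These shape properties can be checked numerically. The sketch below is only an illustration: the values of mu and sigma are assumptions borrowed from the men's height example used later in these slides, and any other values would behave the same way.

```r
# Illustrative parameters (assumed; taken from the height example)
mu <- 70.25
sigma <- 3.01

# Symmetric: the density is equal at the same distance on either side of the mean
left  <- dnorm(mu - 2, mean = mu, sd = sigma)
right <- dnorm(mu + 2, mean = mu, sd = sigma)
all.equal(left, right)

# Unimodal: on a fine grid, the density has its single peak at the mean
grid <- seq(mu - 4 * sigma, mu + 4 * sigma, by = 0.01)
peak <- grid[which.max(dnorm(grid, mean = mu, sd = sigma))]
peak
```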

HOW DO WE COMPARE NORMAL CURVES?

STANDARDIZING WITH Z-SCORES

The Z-score of an observation is the number of standard deviations it falls above or below the mean. The Z-score for an observation \(x\) that follows a distribution with mean \(\mu\) and standard deviation \(\sigma\) is computed as:

\[Z=\frac{x-\mu}{\sigma}\]
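
As a small sketch, the formula translates directly into R. The helper name z_score is made up for this illustration, and the mean (70.25 in) and SD (3.01 in) are the values these slides later quote for men's heights.

```r
# Z-score: number of standard deviations x lies above (+) or below (-) the mean
z_score <- function(x, mu, sigma) {
  (x - mu) / sigma
}

# Assumed illustration values: mean 70.25 in, SD 3.01 in
z_72 <- z_score(72, mu = 70.25, sigma = 3.01)  # above the mean, so positive
z_69 <- z_score(69, mu = 70.25, sigma = 3.01)  # below the mean, so negative
round(c(z_72, z_69), 2)  # roughly 0.58 and -0.42
```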

STANDARDIZED NORMAL CURVES - HEIGHTS

STANDARD NORMAL DISTRIBUTION

The normal distribution with \(\mu=0\) and \(\sigma=1\) is called the standard normal distribution.
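
One way to see why the standard normal is useful: a probability computed on the original scale matches the probability of the corresponding Z-score on the standard normal scale. The sketch below assumes the height parameters quoted later in the slides (mean 70.25, SD 3.01).

```r
mu <- 70.25    # assumed mean (height example)
sigma <- 3.01  # assumed SD (height example)

# Lower-tail probability on the original scale
p_original <- pnorm(72, mean = mu, sd = sigma)

# Same probability after standardizing; pnorm() defaults to mean = 0, sd = 1
z <- (72 - mu) / sigma
p_standard <- pnorm(z)

all.equal(p_original, p_standard)
```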

WHY STANDARDIZE?

  • Facilitate comparisons
    • Percentiles for standardized testing, SAT vs ACT
    • Percentiles for height and weight for toddlers in different populations
  • Z-scores have a special meaning
    • number of standard deviations above or below the mean
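
A sketch of the SAT vs ACT comparison follows. The SAT mean and SD are the ones used later in these slides; the ACT mean and SD and both students' scores are hypothetical values chosen only for illustration.

```r
# SAT parameters come from the slides; everything marked "hypothetical" is assumed
sat_mu <- 1500; sat_sd <- 300
act_mu <- 21;   act_sd <- 5    # hypothetical ACT mean and SD
sat_score <- 1800              # hypothetical student A
act_score <- 24                # hypothetical student B

# Z-scores put both students on the same scale
z_sat <- (sat_score - sat_mu) / sat_sd
z_act <- (act_score - act_mu) / act_sd

# Percentiles under the normal model
pct_sat <- pnorm(z_sat)
pct_act <- pnorm(z_act)
round(c(z_sat = z_sat, z_act = z_act, pct_sat = pct_sat, pct_act = pct_act), 2)
```

With these assumed numbers, the student with the larger Z-score did better relative to their own test's distribution, even though the raw scores cannot be compared directly.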

NORMAL DISTRIBUTION - PROBABILITY

If \(X \sim N(\mu, \sigma)\), then \(X\) is a continuous random variable (recall Section 3.5).

What does probability mean for a continuous random variable?

  • Probability is defined as the area under the density curve over an interval, e.g. \(P(69\leq X \leq 73)\). The probability of any single exact value is 0.
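
To make "area under the curve" concrete, the sketch below integrates the normal density numerically and compares the result with a difference of pnorm() values. The mean and SD are assumptions taken from the height example later in the slides, and height_density is a name invented for this illustration.

```r
mu <- 70.25    # assumed mean (height example)
sigma <- 3.01  # assumed SD (height example)

# Density curve of the assumed normal model
height_density <- function(x) dnorm(x, mean = mu, sd = sigma)

# Area under the curve between 69 and 73, by numerical integration
area <- integrate(height_density, lower = 69, upper = 73)$value

# The same probability from lower-tail areas
cdf_diff <- pnorm(73, mean = mu, sd = sigma) - pnorm(69, mean = mu, sd = sigma)

all.equal(area, cdf_diff, tolerance = 1e-6)
```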

FINDING TAIL AREAS

How do we find the area under the curve (probability) if \(X \sim N(\mu,\sigma)\)?

  1. Calculus (not covered in this course)
  2. R: pnorm() function
  3. Normal table

For (2) and (3), we assume our process follows a normal distribution - this is a MODEL - it is NOT EXACT.

FINDING TAIL AREAS: R

pnorm() returns the probability that an observation falls below a given value (the lower tail area) for the mean and SD you supply.

## P(X < 69):
pnorm(69, mean=mean(cdc_m$height), sd=sd(cdc_m$height))
## [1] 0.338728
## P(X < 72):
pnorm(72, mean=mean(cdc_m$height), sd=sd(cdc_m$height))
## [1] 0.7193796
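
pnorm() can also return the upper tail through its lower.tail argument. Since the cdc_m data are not reproduced here, this sketch uses the rounded summary values quoted later in the slides (mean 70.25, SD 3.01), so results may differ slightly from the output above.

```r
mu <- 70.25    # rounded mean from the slides (assumed stand-in for cdc_m)
sigma <- 3.01  # rounded SD from the slides

lower <- pnorm(72, mean = mu, sd = sigma)                      # P(X < 72)
upper <- pnorm(72, mean = mu, sd = sigma, lower.tail = FALSE)  # P(X > 72)

# The two tails together account for all of the probability
lower + upper
```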

FINDING TAIL AREAS: TABLE

PRACTICE

According to the CDC database, the mean height of US men is 70.25 inches and the SD is 3.01 inches. If we model height (random variable \(X\)) as normal (with the previously stated mean and SD), what is the probability that a randomly selected man is between 69 and 72 inches tall?

Step 1: Draw a picture to identify what you want

PRACTICE

Step 2: Identify what information you can get

## P(X < 69)
pnorm(q=69, 
      mean=mean(cdc_m$height), 
      sd=sd(cdc_m$height))
## [1] 0.338728

## P(X < 72)
pnorm(q=72, 
      mean=mean(cdc_m$height), 
      sd=sd(cdc_m$height))
## [1] 0.7193796

PRACTICE

Step 3: Connect what we have to what we want

Both pnorm() and the Z table give lower tail probabilities. To get what we want:

\[P(69 \leq X \leq 72)=P(X\leq 72)-P(X\leq 69)=0.7194-0.3387=0.3807\]
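
Step 3 can be carried out in R as one subtraction. This is a sketch using the rounded mean and SD stated in the problem rather than the raw cdc_m data, so the last digits may differ slightly from values computed from the data.

```r
mu <- 70.25    # mean stated in the problem
sigma <- 3.01  # SD stated in the problem

# P(69 <= X <= 72) = P(X <= 72) - P(X <= 69)
p_between <- pnorm(72, mean = mu, sd = sigma) - pnorm(69, mean = mu, sd = sigma)
round(p_between, 2)  # about 0.38
```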

PRACTICE

At a Heinz ketchup factory, the amount of ketchup dispensed into each bottle is supposed to be normally distributed with mean 36 oz. and standard deviation 0.11 oz. Once every 30 minutes a bottle is selected from the production line, and its contents are measured precisely. If the amount of ketchup in the bottle is below 35.8 oz. or above 36.2 oz., the bottle fails the quality control inspection. What percent of bottles have less than 35.8 ounces of ketchup?

\(P(X<35.8)=P(X\leq 35.8)=\) 0.0345182, or about 3.5% of bottles
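
The value above can be reproduced with pnorm(), either on the original scale or through the Z-score. A minimal sketch using the mean and SD from the problem statement:

```r
# Lower-tail probability on the original scale
p_under <- pnorm(35.8, mean = 36, sd = 0.11)

# Same calculation through the Z-score
z <- (35.8 - 36) / 0.11
p_under_z <- pnorm(z)

round(100 * p_under, 2)  # as a percent
```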

PRACTICE

What percent of bottles pass quality control inspection?

## P(X < 36.2)-P(X < 35.8)
pnorm(q=36.2, mean=36, sd=0.11)-pnorm(q=35.8, mean=36, sd=0.11)
## [1] 0.9309637
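
An equivalent route, sketched below, adds the two failing tails and subtracts from 1. Because the cutoffs 35.8 and 36.2 are the same distance from the mean, the two tails are equal.

```r
p_low  <- pnorm(35.8, mean = 36, sd = 0.11)                      # too little ketchup
p_high <- pnorm(36.2, mean = 36, sd = 0.11, lower.tail = FALSE)  # too much ketchup

p_fail <- p_low + p_high  # fails inspection
p_pass <- 1 - p_fail      # passes inspection
round(100 * c(pass = p_pass, fail = p_fail), 1)
```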

FINDING CUTOFF POINTS

Body temperatures of healthy humans are distributed nearly normally with mean 98.2 F and standard deviation 0.73 F. What is the cutoff for the lowest 3% of human body temperatures on the original scale?

  • Use R: qnorm() function (what does this return?)
co <- qnorm(0.03)
co
## [1] -1.880794
  • Use Z table (work backwards)

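qnorm(0.03) uses the default mean 0 and SD 1, so it returns a Z-score. Solving for \(x\) gives the cutoff on the original scale:

\(Z=\frac{x-\mu}{\sigma}=\frac{x-98.2}{0.73}=-1.88\)

\(x=-1.88\times 0.73+98.2\approx 96.8\) F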
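
qnorm() also accepts mean and sd arguments, so the conversion back to the original scale can be done in one call. The sketch below compares both routes and checks the answer with pnorm(); the variable names are chosen for this illustration.

```r
# Route 1: Z-score cutoff, then convert by hand
z_cut <- qnorm(0.03)
x_manual <- 98.2 + z_cut * 0.73

# Route 2: let qnorm() work on the original scale directly
x_direct <- qnorm(0.03, mean = 98.2, sd = 0.73)

round(c(x_manual, x_direct), 1)

# Sanity check: pnorm() undoes qnorm(), returning the 3% we started with
pnorm(x_direct, mean = 98.2, sd = 0.73)
```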

PRACTICE

Body temperatures of healthy humans are distributed nearly normally with mean 98.2 F and standard deviation 0.73 F. What is the cutoff for the highest 10% of human body temperatures (on the original scale)?

co <- qnorm(0.90)
co
## [1] 1.281552

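The highest 10% lies above the 90th percentile, so the cutoff comes from qnorm(0.90). Solving for \(x\):

\(Z=\frac{x-\mu}{\sigma}=\frac{x-98.2}{0.73}=1.28\)

\(x=1.28\times 0.73+98.2\approx 99.1\) F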
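
For an upper-tail cutoff, there are two equivalent ways to call qnorm(): ask for the 90th percentile, or ask for the top 10% with lower.tail = FALSE. A short sketch with the mean and SD from the problem:

```r
# 90th percentile on the original scale
x_hi <- qnorm(0.90, mean = 98.2, sd = 0.73)

# Same cutoff, specified as the upper 10%
x_hi_alt <- qnorm(0.10, mean = 98.2, sd = 0.73, lower.tail = FALSE)

round(c(x_hi, x_hi_alt), 1)
```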

68-95-99.7 RULE

Rule of thumb: in a normal distribution, about 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean, respectively.
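
The rule of thumb can be checked against pnorm() on the standard normal scale. A minimal sketch:

```r
k <- 1:3  # number of standard deviations from the mean

# P(-k < Z < k) for the standard normal distribution
within_k <- pnorm(k) - pnorm(-k)
round(100 * within_k, 1)  # 68.3 95.4 99.7
```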

USING THE 68-95-99.7 RULE

SAT scores are distributed nearly normally with mean 1500 and standard deviation 300.
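
Applying the rule to these SAT parameters gives the score ranges below. The table layout and column names are choices made for this sketch, and the percentages are the rule-of-thumb values rather than exact probabilities.

```r
mu <- 1500    # SAT mean from the slide
sigma <- 300  # SAT SD from the slide
k <- 1:3

# Score ranges within 1, 2, and 3 SDs of the mean
ranges <- data.frame(
  k          = k,
  lower      = mu - k * sigma,
  upper      = mu + k * sigma,
  approx_pct = c(68, 95, 99.7)
)
ranges
```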

PRACTICE

Which of the following is false?

  1. Majority of Z scores in a right skewed distribution are negative.
  2. In skewed distributions the Z score of the mean might be different than 0.
  3. For a normal distribution, IQR is less than 2 x SD.
  4. Z scores are helpful for determining how unusual a data point is compared to the rest of the data in the distribution.
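
Some of these statements can be probed numerically. The sketch below checks the IQR claim with qnorm() and examines Z-scores in simulated right-skewed data; the exponential sample and the seed are assumptions made only for illustration, not part of the original exercise.

```r
# Statement 3: IQR of a normal distribution, measured in SD units
iqr_in_sd <- qnorm(0.75) - qnorm(0.25)  # about 1.35
iqr_in_sd < 2

# Statements 1 and 2: simulated right-skewed data (assumed exponential sample)
set.seed(1)
x <- rexp(10000)
z <- (x - mean(x)) / sd(x)

share_negative <- mean(z < 0)                # fraction of negative Z-scores
z_of_mean <- (mean(x) - mean(x)) / sd(x)     # Z-score of the mean itself
c(share_negative = share_negative, z_of_mean = z_of_mean)
```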

REFERENCES

  • Diez et al. (2019) OpenIntro Statistics, Fourth Edition