Statistics 101
 Data Analysis and Statistical Inference

Answers to extra problems on normal distribution, expected values, and Central Limit Theorem
 


1.  Drinking and Driving

i)   The problem states that the breathalyzer measurement (in percent) for someone with blood alcohol level .095 follows a normal curve with mean .095 and standard deviation .004.

We want the probability of getting more than .10 as a measurement.  Let's standardize .10 by subtracting the mean and dividing by the standard deviation:

z = (.10-.095)/.004  = 1.25

We want the area under the normal curve to the right of 1.25. To get this using the table, we take half of (1 - .7887) to obtain .1056.  Or, you can get it directly from JMP.
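If you'd rather check the table lookup by computer, here is a minimal Python sketch of the same calculation. It uses the standard library's NormalDist in place of the table or JMP; the variable names are made up for illustration.

```python
from statistics import NormalDist

# Breathalyzer reading for a true level of .095 (mean and SD from the problem).
reading = NormalDist(mu=0.095, sigma=0.004)

# Standardize the cutoff: subtract the mean, divide by the SD.
z = (0.10 - reading.mean) / reading.stdev

# Area under the normal curve to the right of the cutoff.
p_over = 1 - reading.cdf(0.10)

print(round(z, 2), round(p_over, 4))
```

The result should agree with the table answer up to rounding.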

ii)  The breathalyzer measurement (in percent) for someone with blood alcohol level .15 follows a normal curve with mean .15 and standard deviation .004.

We want the chance that the measurement for this person is less than .10.   Standardizing, we get

z = (.10 - .15) / .004 = -12.5.

The chance of observing a z-value less than (to the left of) -12.5 is very, very small (and not even on the table).

iii)  For people with true levels of .10, the chance that any one individual will be booked is .50 (the median value for these people is .10).   Hence, the probability that any one individual will not be booked is also 0.50.

Now, we want Pr(at least one individual is booked).  Since the only possible outcomes are "at least one person is booked" and "no one is booked", it is true that:

Pr(at least one is booked) + Pr(no one is booked) = 1.   So, Pr(at least one is booked) = 1 - Pr(no one is booked).

Now,

Pr(no one is booked) = Pr(first person not booked, and second person not booked, and... etc..., and eighth person not booked, and ninth person not booked).

Since the test is done separately on each person, we can assume that readings are independent.

Hence,

Pr(first person not booked, and second person not booked, and..., etc..., and ninth person not booked)
=  Pr(first person not booked)  * Pr(second person not booked) * ... * Pr(ninth person not booked)
= 0.5 * 0.5 * ... * 0.5   (nine of these 0.5s)
= 0.5^9.

Finally, we have Pr(at least one person booked) = 1 - 0.5^9.
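The complement rule above is easy to evaluate numerically. This short Python sketch assumes, as in the solution, nine independent people, each with a 0.5 chance of not being booked; the variable names are illustrative.

```python
n_people = 9          # number of people tested
p_not_booked = 0.5    # chance any one person is not booked

# Independence lets us multiply the nine individual probabilities.
p_none_booked = p_not_booked ** n_people

# Complement rule: at least one booked = 1 - no one booked.
p_at_least_one = 1 - p_none_booked

print(p_at_least_one)  # 0.998046875
```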
 

2.  Let's win some money!!

Here are the probabilities associated with each possible outcome of Game 1:

outcome    Probability of outcome
------------------------------------------
    0                    .40
    1                    .10
    2                    .50

Here are the probabilities associated with each possible outcome of Game 2:

outcome    Probability of outcome
-----------------------------------------
    0                   .05
    1                   .80
    2                   .15
 

We can think of these games as two box models, each box containing 100 tickets.  In the box for Game 1, 40 tickets have a zero, 10 tickets have a one, and 50 tickets have a two.  In the box for Game 2, 5 tickets have a zero, 80 tickets have a one, and 15 tickets have a two.    Using the method for determining expected values, we get:

(i)    EV for Game 1 = (0 * 40 + 1 * 10 + 2 * 50)/100  =  1.10.
       EV for Game 2 = (0 *  5 + 1 *  80 + 2 * 15)/100  =  1.10.

Since the EVs are equal, you expect to earn the same amount playing either game!

(ii)  If you play over and over and over again, then the average amount of money you actually earn per play in Game 1 should approach the EV of Game 1.  Similarly, if you play over and over and over again, then the average amount you earn per play in Game 2 should approach the EV of Game 2.   Hence, in the long run you should get approximately the same average payoff in each game.  The right answer is (c).
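To see the long-run behavior in action, here is a small simulation sketch in Python. The number of plays (100,000) and the random seed are arbitrary choices made for illustration, not part of the problem, and average_payoff is a made-up helper.

```python
import random

def average_payoff(outcomes, probs, plays, seed=0):
    """Simulate `plays` independent plays and return the average payoff."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    draws = rng.choices(outcomes, weights=probs, k=plays)
    return sum(draws) / plays

# Outcome probabilities taken from the two tables above.
avg_game1 = average_payoff([0, 1, 2], [0.40, 0.10, 0.50], plays=100_000)
avg_game2 = average_payoff([0, 1, 2], [0.05, 0.80, 0.15], plays=100_000)

print(avg_game1, avg_game2)
```

Both averages should land close to 1.10, with Game 1's average bouncing around a bit more from run to run.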

(iii)   To save space, from now on I will use decimals in place of dividing by 100.  For example, I write .40 in place of multiplying by 40 and dividing by 100.

        SD of Game 1  = square root[ (0 - 1.1)*(0 - 1.1)  *  .40  +  (1 - 1.1)* (1 - 1.1)  *  .10  +  (2 - 1.1)*(2-1.1)  * .50 ]  =  square root [0.89]
        SD of Game 2 = square root[ (0 - 1.1)*(0-1.1)  *  .05  +  (1 - 1.1)*(1-1.1)  *  .80  +  (2 - 1.1)*(2-1.1)  * .15  ] =  square root [0.19]

Game 1 has larger variation in the amounts returned.
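The EV and SD calculations for both games can be checked with a few lines of Python. This is just a sketch of the box-model formulas; ev_and_sd is a made-up helper name.

```python
from math import sqrt

def ev_and_sd(dist):
    """dist maps each payoff to its probability; returns (EV, SD)."""
    ev = sum(x * p for x, p in dist.items())
    var = sum((x - ev) ** 2 * p for x, p in dist.items())
    return ev, sqrt(var)

# Payoff -> probability, from the two tables above.
game1 = {0: 0.40, 1: 0.10, 2: 0.50}
game2 = {0: 0.05, 1: 0.80, 2: 0.15}

ev1, sd1 = ev_and_sd(game1)
ev2, sd2 = ev_and_sd(game2)

print(round(ev1, 2), round(sd1, 3))
print(round(ev2, 2), round(sd2, 3))
```

The SDs come out to roughly 0.943 and 0.436, which are the square roots of 0.89 and 0.19.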

(iv)  Game 1:   Pr ( positive winnings) = Pr(win 1) + Pr(win 2) = .10 + .50 = .60.
       Game 2:   Pr (positive winnings) =  Pr(win 1) + Pr(win 2)  =  .80 + .15 = .95
 

If I wanted to maximize my chance of winning "big" money, I'd play Game 1 because it gives the best chance of getting 2 dollars (.50 versus .15).  On the other hand, there's a good chance that I will lose my money in Game 1, since Pr(win 0) = .40.  Game 1 would be my best strategy if I wanted to accumulate money in as few plays as possible and was willing to take some risks.   If I were more conservative, I'd be better off playing Game 2.  In Game 2, most of the time I will simply get back the dollar that I paid to play.   But I am more likely to get 2 dollars than 0 dollars, so if I play long enough I should come out ahead.
 

3.  Stock Returns

First, convert percentages to decimals, so that 20% is 0.20.  We'll make a box model with 100 tickets, 20 of which have .2, 30 of which have .5, and 50 of which have .3.

1). EV of returns   =  .20 * 0.2 + .50 * 0.3  + .30 * 0.5 = .34.

2). SD of returns =  square root [ (.20 - .34)^2 * .2  + (.50 - .34)^2 * .3    + (.30 - .34)^2 * .5 ]  = square root [0.0124], or about 0.111.
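Sums like these are easy to slip on by hand, so here is a Python sketch that evaluates them term by term. The returns and probabilities are the ones in the box model above; the variable names are illustrative.

```python
from math import sqrt

# Return (as a decimal) -> probability, matching the box model above.
returns = {0.20: 0.2, 0.50: 0.3, 0.30: 0.5}

# Expected value: each return weighted by its probability.
ev_return = sum(r * p for r, p in returns.items())

# Squared deviations from the EV, weighted by probability, one per ticket value.
terms = [(r - ev_return) ** 2 * p for r, p in returns.items()]
var_return = sum(terms)
sd_return = sqrt(var_return)

print(round(ev_return, 2))
print([round(t, 5) for t in terms], round(var_return, 4), round(sd_return, 3))
```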
 

4.  Hack-a-Shaq

1) For one regular shot, there are only two outcomes: miss or make.  If he makes it, he gets 2 points.  If he misses it, he gets 0 points.  Thus, the probabilities associated with each outcome for regular shots are:

Pr(0 points) = Pr(Shaq failed in making shot) = 1-57.2% = 42.8%,

Pr (2 points) = Pr(Shaq succeeded in making shot) = 57.2%.

So,  EV of points = 2 * .572 + 0 * .428 = 1.144.
 

2) For two foul shots, the points scored can equal 0, 1, or 2. And because the two foul shots are independent, the probabilities of each of these outcomes are:

Pr(0 points) = Pr(Shaq failed in both foul shots) = (1-.513) * (1-.513) = .237

Pr(1 point) = Pr(Shaq succeeded only one time)

= Pr(Shaq succeeded in the first time and failed in the second time) + Pr(Shaq failed in the first time and succeeded in the second time)

= .513 * (1-.513) + (1-.513) * .513 = .50.

Pr(2 points) = Pr(Shaq succeeded in both foul shots) = .513 * .513 = .263.

So, EV of points = 0 * .237 + 1 * .5 + 2 * .263 = 1.026.
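The same distribution can be built by listing every make/miss sequence for the two foul shots. Here is a Python sketch that assumes, as above, independent shots with success chance .513; the names are made up for illustration.

```python
from itertools import product

p_make = 0.513  # chance Shaq makes any one foul shot

# Probability of each total score, built up one make/miss sequence at a time.
points_dist = {0: 0.0, 1: 0.0, 2: 0.0}
for shots in product([0, 1], repeat=2):      # (0,0), (0,1), (1,0), (1,1)
    prob = 1.0
    for made in shots:
        prob *= p_make if made else 1 - p_make   # independence: multiply
    points_dist[sum(shots)] += prob

ev_foul_shots = sum(pts * prob for pts, prob in points_dist.items())

print({pts: round(prob, 3) for pts, prob in points_dist.items()})
print(round(ev_foul_shots, 3))
```

Note the EV also equals 2 * .513, since each foul shot is worth .513 points on average.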

3) The expected score of a regular shot for Shaq is 1.144, and the expected score of two foul shots for Shaq is 1.026.

Because 1.144 > 1.026, Shaq is expected to score fewer points if the hack-a-Shaq is employed.  Thus, it looks like the hack-a-Shaq should be adopted.  (Although, it didn't work when tried by some teams.  Shaq made 13 foul shots in a row, foiling the hack-a-Shaq and helping the Lakers to win the NBA Championship.)

5.   Problem on the Central Limit Theorem

According to the problem,  the contents of bottles of cola follow a normal curve with mean 298 and standard deviation 3.

a)    We want the chance that a bottle has less than 295.   Standardizing, we get

z = (295-298)/3 = -1.

We want the area under the normal curve to the left of -1, which we get by taking half of (1 - .6827), which equals .1587.

b)  Since the contents of individual bottles follow a normal curve and the amount of cola in each bottle is independent of the amounts in other bottles, the distribution of the sample average of the 6 bottles is normal with mean 298 and SD = 3 / square root (6)  =  1.2247.

We want  the chance of getting a sample average less than 295.  Standardizing by the mean and SD of the sample average,  we get

z=(295-298)/1.2247 = -2.45.

We want the area to the left of -2.45, which we get by taking half of (1 - .9857), which is about .0072.  JMP-IN's value is more precise because the table is limited.
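For comparison with the table, here is a Python sketch of the sample-average calculation. It takes the sample size to be 6, as implied by the square root of 6 in the SD above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 298, 3, 6    # bottle mean, bottle SD, sample size (assumed 6)

se = sigma / sqrt(n)        # SD of the sample average
sample_mean = NormalDist(mu, se)

z = (295 - mu) / se
p_mean_below_295 = sample_mean.cdf(295)   # area to the left of 295

print(round(se, 4), round(z, 2), round(p_mean_below_295, 4))
```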

c)  We want the chance that the proportion of bottles out of 100 that have less than 295 ml is less than .10.

Since n=100 is large, we can use the Central Limit Theorem for the distribution of the sample proportion.  Thus, the sample proportion approximately follows a normal curve with mean equal to p and SD = square root [ p(1-p)/100],  where p = the probability that a bottle has less than 295 ml.   By part a), p = .1587.   Hence,  the mean of the normal distribution for the sample proportion is .1587, and the SD is square root [ .1587(.8413)/100] = .0365.

Standardizing, we get

z = (.10 - .1587)/ .0365  = - 1.606
 

We want the area under the normal curve to the left of -1.606.  This equals .0541.
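The normal approximation for the sample proportion can be sketched in Python as follows. It recomputes p from the bottle distribution instead of using the rounded .1587, so the last digit may differ slightly; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 100

# p = chance a single bottle has less than 295 ml.
p = NormalDist(298, 3).cdf(295)

# CLT: sample proportion is roughly normal with mean p and this SD.
sd_prop = sqrt(p * (1 - p) / n)

z = (0.10 - p) / sd_prop
p_prop_below_10pct = NormalDist(p, sd_prop).cdf(0.10)

print(round(p, 4), round(sd_prop, 4), round(z, 3), round(p_prop_below_10pct, 4))
```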

d)  The 99th percentile is found by unwinding the value of the z statistic.  Let k = the value of milliliters that equals the 99th percentile for individual bottles.  Recall that individual bottles have mean 298 and standard deviation 3.

Then, looking at the normal table and interpolating between 2.3 and 2.4, we can determine that Pr( (k - 298)/3 < 2.33) = .99.

Solving for k, we get:

k = 298 + 3*2.33 = 304.99, or about 305.

Hence, 99% of all bottles are less than 305 ml.
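Going from a percentile back to a value is an inverse-CDF lookup. This Python sketch uses NormalDist.inv_cdf from the standard library in place of interpolating in the table; the variable names are illustrative.

```python
from statistics import NormalDist

bottles = NormalDist(mu=298, sigma=3)

# z-value with 99% of the standard normal curve to its left.
z_99 = NormalDist().inv_cdf(0.99)

# Unwind the standardization: k = mean + z * SD.
k = bottles.mean + z_99 * bottles.stdev

# Same thing in one step.
k_direct = bottles.inv_cdf(0.99)

print(round(z_99, 2), round(k, 1), round(k_direct, 1))
```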