Instructions for lab 4
Lab Objective
To gain experience with correlations and simple regressions.
Lab Procedures
This lab has several parts, so I recommend you start
this one at home. You should work with the applets in lab so
that you can get the TAs' feedback.
Unit 1: Predicting eruptions of Old Faithful
The geyser Old Faithful in Yellowstone National Park erupts
seemingly at random times. Or does it? Perhaps we can
predict when the next eruption will occur based on characteristics of
the previous eruption. Load the data set geyser.jmp
by clicking on the link. It contains two variables measured
on 21 eruptions of Old Faithful: the duration
of the previous eruption in minutes (LAST) and the number of minutes
between the previous and current eruptions (NEXT).
Questions:
1. Describe the distribution of waiting time until eruption
(NEXT). Specifically, answer the following four questions.
What is the value of a typical waiting time? What is the
value of a typical deviation from the average waiting time? Are
there any severe outliers in waiting time? Does a normal curve
describe the histogram of waiting time reasonably well (justify your
answer by referring to graphical displays)?
2. Is there a strong linear association between waiting time
and length of previous eruption? Provide one number that
summarizes the strength of the association, and give a brief
description of some relevant graph to justify that the number is an
appropriate summary.
3. What is the regression equation for predicting waiting time
until next eruption (Y) from length of previous eruption (X)?
To fit a regression line, go to Analyze - Fit Y by X. Select "NEXT" as the Y variable and "LAST" as the X variable. Once you see the scatter plot, go to the red arrow next to Bivariate Fit. Select Fit Line.
4. What is a typical value of the deviation of waiting time
from the fitted regression line?
5. Does the plot of residuals versus the predictor (LAST)
suggest any violations of the regression assumptions? Justify
your answer in at most two sentences.
To obtain the plot of residuals versus the predictor values, click
on the red arrow next to Linear Fit, which is just below the
scatter plot. Then, select Plot Residuals.
6. If the last eruption lasted 3.2 minutes, can you use the
regression equation to predict the wait until the next eruption?
If you think so, write down the estimated average wait until the
next eruption. If you think not, explain why not in at most one
sentence.
7. If the last eruption lasted 9.6 minutes, can you use the regression equation to predict the wait until the next eruption? If you think so, write down the estimated average wait until the next eruption. If you think not, explain why not in at most one sentence.
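If you want to check the JMP output by hand, the quantities behind questions 2 through 7 can be sketched in a few lines of Python. This is only a rough sketch and not part of the lab report: the numbers below are made-up placeholder values, not the contents of geyser.jmp, and the helper names `fit_line` and `predict_within_range` are hypothetical. The range check reflects the usual guideline that a regression line should only be used for X values inside the range that was actually observed.

```python
import math

# Made-up placeholder values for illustration only; NOT the geyser.jmp data.
last = [1.8, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]        # previous eruption length (min)
nxt = [50.0, 54.0, 58.0, 66.0, 71.0, 76.0, 80.0]  # wait until next eruption (min)


def fit_line(x, y):
    """Least-squares fit of y on x: returns (intercept, slope, r, rmse)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Typical deviation from the line; n - 2 is the usual divisor
    # for a simple regression with one predictor.
    rmse = math.sqrt(sse / (n - 2))
    return intercept, slope, r, rmse


def predict_within_range(x_new, x, intercept, slope):
    """Return the predicted y, or None if x_new is outside the observed x values."""
    if min(x) <= x_new <= max(x):
        return intercept + slope * x_new
    return None  # extrapolation: the data say nothing about the line out here


intercept, slope, r, rmse = fit_line(last, nxt)
inside = predict_within_range(3.0, last, intercept, slope)    # a number
outside = predict_within_range(12.0, last, intercept, slope)  # None
```

With the real data you would replace the two lists with the LAST and NEXT columns. Whether a particular eruption length counts as inside the range depends on those actual values, so check the scatter plot before you trust a prediction.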
Unit 2: Understanding regression better with applets
Applet 1: Drawing regression lines by eye
As discussed in class, the regression line is the line that yields
the smallest sum of the squared residuals. Just what does that
mean, exactly? Let's use some applets to illustrate this concept.
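Before you open the applets, here is a small Python sketch of the same idea, in case seeing the arithmetic helps. It uses five made-up points (not lab data) and two hypothetical helper functions, `sse` and `least_squares`. It computes the sum of squared residuals for the least-squares line and for a few other candidate lines, each of which should give a larger sum.

```python
# Made-up illustrative points; not taken from any lab data set.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sse(intercept, slope, x, y):
    """Sum of squared residuals of the points around the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


b0, b1 = least_squares(xs, ys)
best = sse(b0, b1, xs, ys)

# Other candidate lines: two nudged versions of the best line and two guesses.
candidates = [(b0 + 0.5, b1), (b0, b1 + 0.2), (0.0, 2.0), (1.0, 1.5)]
candidate_sse = [sse(a, b, xs, ys) for a, b in candidates]

for (a, b), value in zip(candidates, candidate_sse):
    print(f"intercept={a:.2f}, slope={b:.2f}: SSE={value:.3f} (best is {best:.3f})")
```

Try adding your own intercept and slope pairs to `candidates`. As long as the line differs from the least-squares line, its sum of squared residuals comes out larger.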
On the "Statistics on the web" page (click on this link), open the applet "Draw your
best guess at a regression line." Read the instructions on the
page, then hit the "Begin" button to try it out. Your goal is to
make the "MSE" as small as possible. MSE stands for mean square error,
the average squared deviation of the points from the line you draw; its
square root is the typical deviation around the line.
In addition to playing with the applet, try the following. You
don't have to write anything on the lab report for this part, but
you'll use the ideas underpinning this applet on exams.
a) Draw a line that clearly does not fit the data at all. Notice that the MSE is
relatively large.
b) Hit "Show minimum MSE" to get the value of the mean square
error for the actual regression line. Using that value as a target, keep drawing lines
to get as close as possible to the minimum MSE. Hint: Outliers in the horizontal or
vertical direction can pull the line towards them.
c) Compete against classmates or the TAs to see who gets the
closest line. Try to diagnose where your lines went wrong by
comparing the actual line to the ones you fit.
Applet 2: Seeing the effect of individual points on the regression line
On the "Statistics on the web" page (click on this link), open the applet "See how
individual points can affect regression lines." Read the
instructions on the page, and see how placing various points affects
where the best fitting line is located.
Try the following. You don't have to write anything on the lab
report for this part, but again, these concepts are important in
statistics and so are likely to appear on exams.
d) Add a point to make the slope decrease.
e) Add a point to make the slope increase.
f) Add a point to make the slope remain roughly the same but increase the intercept.
g) Add a point to make the slope and intercept remain roughly the same.
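If you would like to check your intuition away from the applet, the sketch below refits a least-squares line after adding one extra point. It is only an illustration under stated assumptions: the five starting points are made up, the helper names `least_squares` and `refit_with` are hypothetical, and the applet's own starting points will differ, so the exact placements that work there depend on the cloud you see on screen.

```python
# Made-up starting points; the applet's own points will differ.
base_x = [1.0, 2.0, 3.0, 4.0, 5.0]
base_y = [2.1, 3.9, 6.2, 7.8, 10.1]


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def refit_with(point):
    """Refit the line after adding one (x, y) point to the starting points."""
    px, py = point
    return least_squares(base_x + [px], base_y + [py])


b0, b1 = least_squares(base_x, base_y)

# Far to the right and well below the line: pulls the slope down.
low_right = refit_with((10.0, 5.0))
# Far to the right and well above the line: pulls the slope up.
high_right = refit_with((10.0, 30.0))
# At the mean of x but above the line: lifts the line without tilting it.
above_center = refit_with((3.0, 12.0))
# Exactly on the current line: the fit does not move.
on_line = refit_with((2.5, b0 + b1 * 2.5))

for label, (a, b) in [("original", (b0, b1)), ("low right", low_right),
                      ("high right", high_right), ("above center", above_center),
                      ("on the line", on_line)]:
    print(f"{label:>12}: intercept={a:.3f}, slope={b:.3f}")
```

The pattern to notice is that points far from the mean of X have the most leverage on the slope, while a point at the mean of X can only shift the line up or down.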