Instructions for lab 5
Lab Objective
To gain experience with correlations and simple regressions.
Lab Procedures
Write answers to all questions on your lab sheet, and
turn it in at the end of the lab period. This lab asks you to do
a lot, so definitely start it at home.
Unit 1: Predicting eruptions of Old Faithful
The geyser Old Faithful in Yellowstone National Park erupts
seemingly at random times. Or does it? Perhaps we can
predict when the next eruption will occur based on characteristics of
the previous eruption. Load the data set geyser.jmp
by clicking on the link. It consists of two variables measured
on 21 eruptions of Old Faithful: the duration
of the previous eruption in minutes (LAST) and the number of minutes
between the previous and current eruptions (NEXT).
Questions:
1. Describe the distribution of waiting time until eruption
(NEXT). Specifically, answer the following four questions.
What is the value of a typical waiting time? What is the
value of a typical deviation from the average waiting time? Are
there any severe outliers in waiting time? Does a normal curve
describe the histogram of waiting time reasonably well (justify your
answer by referring to graphical displays)?
2. Is there a strong linear association between waiting time
and length of previous eruption? Provide one number that
summarizes the strength of the association, and give a brief
description of some relevant graph to justify that the number is an
appropriate summary.
3. What is the regression equation for predicting waiting time
until next eruption (Y) from length of previous eruption (X)?
To fit a regression model, go to Analyze - Fit Y by X. Select "NEXT" as the Y variable and "LAST" as the X variable. Once you see the scatter plot, click on the red arrow next to Bivariate Fit and select Fit Line.
4. What is the typical deviation of waiting time
from the fitted regression line?
5. Does the plot of residuals versus the predictor (LAST)
suggest any violations of the regression assumptions? Justify
your answer in at most two sentences.
To obtain the plot of residuals versus the predictor values, click
on the red arrow next to Linear Fit, which is just below the
scatter plot. Then, select Plot Residuals.
6. If the last eruption lasted 3.2 minutes, can you use the
regression equation to predict the wait until the next eruption?
If you think so, write down the estimated average wait until the
next eruption. If you think not, explain why not in at most one
sentence.
7. If the last eruption lasted 9.6 minutes, can you use the
regression equation to predict the wait until the next eruption?
If you think so, write down the estimated average wait until the
next eruption. If you think not, explain why not in at most one
sentence.
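If you want to check by hand what JMP computes for questions 2 through 5, the quantities can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the five data points are made up and are not the values in geyser.jmp, and the names fit_line, last_demo, and next_demo are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    slope = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

# Made-up illustrative values, NOT the geyser.jmp data.
last_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # hypothetical LAST (minutes)
next_demo = np.array([46.0, 54.0, 66.0, 74.0, 85.0])  # hypothetical NEXT (minutes)

# Regression equation: NEXT = intercept + slope * LAST
intercept, slope = fit_line(last_demo, next_demo)

# Pearson correlation between the two variables
r = np.corrcoef(last_demo, next_demo)[0, 1]

# Residuals: observed NEXT minus the value predicted by the line
residuals = next_demo - (intercept + slope * last_demo)

# Typical deviation around the line, dividing by n - 2 because
# two coefficients (intercept and slope) were estimated
rmse = np.sqrt(np.sum(residuals ** 2) / (len(last_demo) - 2))

print(f"NEXT = {intercept:.2f} + {slope:.2f} * LAST")
print(f"correlation = {r:.4f}, root mean square error = {rmse:.4f}")
```

JMP labels the last quantity Root Mean Square Error in the Summary of Fit table. Plotting residuals against last_demo would give the analogue of the Plot Residuals display.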
Unit 2: Characteristics of mammals
Do mammals with bigger brains need more sleep? Does sleep vary with
the level of danger an animal faces? To answer these
questions, Allison and Cicchetti (1976) gathered information on 62
different mammals. Their data are in the file Sleeping
Animals.jmp. This data set is in the JMP data sets folder.
Click on File - Open, and select the folder JMP
In Data. Select Sleeping Animals.jmp, then
click on Open.
The variables in the data set are, in column order:
a) species;
b) average body weight of species in kg;
c) average weight of brain of species in grams;
d) average number of daily hours of non-dreaming sleep for species;
e) average number of daily hours of dreaming sleep for species;
f) average number of daily hours of total sleep for species;
g) average life span of species in years;
h) average number of weeks in gestation period;
i) an index of predation (ranges from 1 to 5, with 1 = unlikely to be
preyed upon and 5 = likely to be preyed upon);
j) an index of exposure (ranges from 1 to 5, with 1 = sleeps in a
well-protected den and 5 = worst exposure);
k) an index of overall danger based on a variety of factors (ranges from
1 to 5, with 1 = least danger from other animals and 5 = most danger
from other animals).
Again, there are missing data in the file. We'll ignore them
for simplicity, although that is not the approach I recommend.
Questions:
8. Using all data points, make a scatter plot of the relationship
between total sleep (Y-axis) and brain weight (X-axis). As
you'll see, it's hard to detect any patterns in this plot because the
horizontal axis gets stretched out so far by the animals with very heavy brains.
Instead, make a scatter plot using only the animals with
brains weighing less than 1000 grams. (JMP Hint: You'll
have to exclude the appropriate rows.)
Using this scatter plot, describe the relationship between
total sleep and brain weight. Items to include in your
description are the general trend of the relationship (e.g.,
positive and linear, negative and linear, some other pattern, no
clear pattern) and whether there are any outliers or points that
do not fit the pattern. Also, don't forget to
mention the mammals you excluded: do they generally follow the same
trends as those in the graph?
9. When computed using all the data, does the correlation between
total sleep and brain weight meaningfully summarize the relationship
between these two variables? Explain in at most two sentences.
10. Using all data points, describe the relationship between
total sleep and the danger index. Fit a regression line to help
your interpretations.
11. For the regression of total sleep on the danger index, what is a
typical deviation around the regression line?
12. For a mammal that has a danger index of 2, what is the
estimated average total sleep?
13. Plot the residuals versus the danger index. Does the
plot of residuals versus the predictor suggest any violations of the
regression assumptions? Justify your answer in at most two
sentences.
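The row exclusion in question 8, and the decision to ignore missing values, can be sketched outside JMP with a boolean mask. This is a minimal illustration under stated assumptions: the numbers are made up and are not taken from Sleeping Animals.jmp, and the names brain_g and total_sleep_h are hypothetical.

```python
import numpy as np

# Made-up illustrative values, NOT the Sleeping Animals.jmp data.
brain_g = np.array([10.0, 200.0, 1500.0, 50.0, 6000.0, 80.0])    # brain weight (grams)
total_sleep_h = np.array([12.0, 9.0, 8.0, np.nan, 3.0, 14.0])    # total sleep (hours/day)

# Keep rows with brain weight under 1000 grams and a recorded sleep value.
# np.nan marks a missing observation, which is simply dropped here.
keep = (brain_g < 1000) & ~np.isnan(total_sleep_h)

brain_kept = brain_g[keep]
sleep_kept = total_sleep_h[keep]

print(f"kept {keep.sum()} of {len(brain_g)} rows")
print(list(zip(brain_kept, sleep_kept)))
```

Dropping incomplete rows mirrors the simplification used in this lab. It is not a general recommendation for handling missing data.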