Statistics 101
Data Analysis and Statistical Inference
Instructions for Lab 5
Lab Objective
The purpose of the lab is to help you pull together what you have
learned about univariate and bivariate graphical and numerical summaries
in the context of a case study. The lab will also demonstrate
that in these four weeks you have acquired most of the statistical
tools that form the basis of a published scholarly work.
Lab Procedures
Before coming to lab, read the paper by Landrigan et
al. (1975), "Neuropsychological dysfunction in children with
chronic low-level lead absorption," The Lancet, March
29, pp. 708-712. The Lancet is one of the
leading journals in medical science. I recommend that you start this
lab before the lab period so that you have time to complete all of it.
In this lab, you will have access to the data presented in this
paper. While this paper was published in 1975, it still has an
impact today. The topic of lead exposure in children remains
under investigation and new research results appear in the news almost
every month. The topic is studied by interdisciplinary teams of medical
personnel, epidemiologists, social scientists, environmentalists, and
policy makers.
Open the data file lead.jmp by
clicking on this link. A description of all the variable names
in the data set (often called a Code Book) can be found at the end of
this lab.
The blood lead group variable in the data set has three categories.
The researchers suggest that two categories (below 40 micrograms/100 mL,
and 40 micrograms/100 mL or above) are adequate. So, let's make a new
variable that recodes the group variable to low ("L") and high ("H").
To do this, go to Columns-New Column. Give the new column the name
"group.recode". Select Data Type-Character to tell JMP that
we're inputting names. Now, click on New Property-Formula,
then Edit Formula. When the formula box pops open,
highlight "group" from the Table Columns list. Next,
holding down the Shift key, select Conditional and then Match
from the Functions box. The current response choices
(groups 1, 2, 3) are listed. Replace each [then clause] as follows:
Match(group) {
    1      "L"
    2      "H"
    3      "H"
    else   " "
}
Make sure to include quotes around the letters. Just enter a
space in the else condition.
This replaces the 1s with Ls and the 2s and 3s with Hs. After you
click OK, you should have a variable for high and low groups.
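
For anyone checking the recode outside JMP, the same mapping can be
sketched in Python. This is only an illustration and not part of the
lab: the name recode_group and the sample list are hypothetical, and
only the group codes 1, 2, 3 and the labels "L" and "H" come from the
instructions above.

```python
# Hypothetical helper mirroring the JMP Match formula above.
GROUP_TO_LABEL = {1: "L", 2: "H", 3: "H"}


def recode_group(group):
    """Return "L" for group 1, "H" for groups 2 and 3, else a single space."""
    return GROUP_TO_LABEL.get(group, " ")


# Made-up group codes for illustration only (not taken from lead.jmp).
sample_groups = [1, 2, 3, 1]
recoded = [recode_group(g) for g in sample_groups]
print(recoded)
```

The dictionary lookup with a default plays the role of the else
clause: any code other than 1, 2, or 3 becomes a single space.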
Questions:
1. This question is based on your reading of the article. You don't
need to use JMP for Question 1.
a) What are the experimental units (the subjects)?
b) What are the three treatment conditions?
c) Is this an observational study or a randomized
experiment?
d) Are there background characteristics (not
treatment or response variables) that differ substantially in the two
groups? If so, state which one(s).
e) Why is it important to have treated and control groups
with similar socioeconomic statuses?
2a) The main analysis compares mean performance IQ scores (W.I.S.C. +
W.P.P.S.I.) for the high lead and low lead groups. Let's make sure
we get the same results when we analyze the data. Do the means
and standard deviations for performance IQs (see code book at end
of lab for the variable name) for the high and low groups match the
means and SDs reported in the paper? Your answer need only be a short
sentence saying whether or not your calculated values equal (within
rounding error) their calculated values. JMP Tip: Put
"group.recode" in the By box to separate the data based on
"group.recode".
2b) In any analysis, it is important to check whether the means
and SDs are strongly influenced by individual data points. The low
lead group has three outliers on performance IQ, and the high lead group
has one outlier on performance IQ. Exclude these four
observations and compare the means and SDs to those in part 2a.
Are any of the changes big enough that the authors should have
mentioned the effect of the outliers in their article? Report the
two new means and SDs as part of your answer, as well as a
three-sentence-maximum explanation.
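
The comparison in Question 2 (group means and SDs with and without
selected observations) can be sketched as follows. This is a minimal
illustration, assuming made-up rows rather than the lead.jmp data; the
function name summarize_by_group, the row layout, and the excluded ID
are all hypothetical, and the observations to exclude are picked by
hand rather than detected automatically.

```python
from statistics import mean, stdev


def summarize_by_group(rows, exclude_ids=()):
    """Return {label: (n, mean, sample SD)} of scores, skipping excluded IDs."""
    by_group = {}
    for row_id, label, score in rows:
        if row_id in exclude_ids:
            continue
        by_group.setdefault(label, []).append(score)
    return {
        label: (len(scores), mean(scores), stdev(scores))
        for label, scores in by_group.items()
    }


# Made-up (id, group.recode, score) rows; NOT values from lead.jmp.
toy_rows = [
    (1, "L", 90), (2, "L", 100), (3, "L", 110), (4, "L", 150),
    (5, "H", 80), (6, "H", 90), (7, "H", 100),
]

with_all = summarize_by_group(toy_rows)
without_4 = summarize_by_group(toy_rows, exclude_ids={4})
print(with_all["L"], without_4["L"])
```

Here stdev is the sample standard deviation (divisor n - 1). Dropping
the one extreme toy score moves the "L" mean from 112.5 to 100, which
is the kind of shift Question 2b asks you to judge.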
3. A comparison of means and standard deviations might be
inadequate. For example, suppose one group has a right-skewed
distribution, and the other group has a left-skewed distribution.
Just reporting means and standard deviations does not inform the
reader about such structure. Compare the distributions of performance
IQ of the high and low lead groups. Describe any differences
between the two groups' distributions of performance IQ, e.g.,
compare locations of most of the data, the spreads of the distributions,
and whether there are outliers. Write at most three sentences.
4. The authors chose to categorize blood lead level rather than
use it as a continuous variable. Is there a strong linear
relationship between performance IQ and the blood lead level in 1972
measured on a continuous scale?
Data Analysis Tip: Researchers sometimes categorize continuous
variables to simplify analyses. However, when there are strong
linear relationships, categorization sacrifices information and can
lead to inaccurate results. Implicitly, splitting blood lead levels
at 40 micrograms/100 mL assumes that the average performance IQ of
all kids in the population with blood lead levels below 40 equals some
constant, i.e., their average performance IQ does not depend on the
actual blood lead levels. When categorizing, be sure to have a valid
scientific rationale for choosing the end points of the categories.
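
The point of the tip can be sketched numerically. The example below is
an illustration under stated assumptions: the data are made up (not
from lead.jmp), and the names pearson_r and split_means are
hypothetical. It computes the correlation coefficient on the
continuous scale and then the two group means that remain after
splitting at 40.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def split_means(xs, ys, cutoff):
    """Mean of y for x below the cutoff and for x at or above it."""
    low = [y for x, y in zip(xs, ys) if x < cutoff]
    high = [y for x, y in zip(xs, ys) if x >= cutoff]
    return sum(low) / len(low), sum(high) / len(high)


# Made-up values with a deliberately strong linear trend.
toy_lead = [20, 30, 35, 45, 50, 60]
toy_iq = [110, 105, 100, 95, 90, 85]

r = pearson_r(toy_lead, toy_iq)
low_mean, high_mean = split_means(toy_lead, toy_iq, cutoff=40)
print(round(r, 3), low_mean, high_mean)
```

In the toy data the scores fall steadily within each group, but the
two-category summary reports only one mean per group, so that
within-group trend is lost.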
5. Older kids typically have faster reflexes than younger kids.
Hence, when comparing finger-wrist tapping speeds for the high and
low lead groups, we want to make sure the two groups have similar
distributions of ages. Age is in funky units (e.g., 1011
means 10 years and 11 months), so I created a new variable with age in
months. This is Age mo, which is located in the last
column of the data set. Compare the distributions of age in
months for the high and low lead groups. Do your findings suggest
that a comparison of the groups' average finger-wrist tapping speeds
could be confounded by age? Explain in at most three sentences.
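
The conversion from the coded age to months can be sketched as below.
This assumes the code is always years times 100 plus months, as in the
1011 example; the function name age_to_months is hypothetical, and the
sketch is not necessarily how the Age mo column was actually built.

```python
def age_to_months(age_code):
    """Convert an age coded as years*100 + months (e.g. 1011) to months."""
    years, months = divmod(age_code, 100)
    if months >= 12:
        # A months part of 12 or more does not fit the assumed coding.
        raise ValueError(f"unexpected months part in age code {age_code}")
    return 12 * years + months


print(age_to_months(1011))  # 10 years, 11 months -> 131
```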
6. One of the key analyses is a regression of finger-wrist tapping
speed on age in months (bottom right corner of page 710). Let's
see if we can replicate their regression, as well as check its
validity. Use Analyze--Fit Y by X, putting right-hand
finger-wrist tapping in Y (be sure to use the correct
variable) and Age mo in X. In addition, put group.recode in
the By box. This results in a separate regression for
each group.
a) Do your regression equations match (within rounding) the
equations on the bottom of page 710? A simple sentence saying they
match or do not match will suffice.
b) Compare the typical deviations from the regression line for
the two regressions. Are they of roughly similar magnitude?
(For purposes of this question, compute the ratio of the larger
value to the smaller. If that ratio is near 1, i.e., less than 1.5,
say they are of similar magnitude.)
c) Examine the plots of residuals versus the predictors.
Are there any patterns that cause you to worry about the validity
of the regression assumptions, or do the models do a reasonable job of
fitting the data? Justify your answer in at most three
sentences.
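
The calculations behind Question 6 can be sketched in Python. This is
a minimal illustration on made-up numbers, not the lead.jmp data, and
the name fit_line is hypothetical. It fits a least-squares line for
each group, takes the typical deviation to be the root mean square
error with n - 2 in the denominator, and returns the residuals you
would plot against age.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares fit: return (intercept, slope, rmse, residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Typical deviation from the line: sqrt(SSE / (n - 2)).
    rmse = sqrt(sum(e * e for e in residuals) / (n - 2))
    return intercept, slope, rmse, residuals


# Made-up ages (months) and tapping counts; NOT values from lead.jmp.
ages = [60, 90, 120, 150]
taps_low = [30, 42, 48, 60]
taps_high = [25, 33, 45, 51]

b0_low, b1_low, rmse_low, res_low = fit_line(ages, taps_low)
b0_high, b1_high, rmse_high, res_high = fit_line(ages, taps_high)
ratio = max(rmse_low, rmse_high) / min(rmse_low, rmse_high)
print(round(b1_low, 3), round(b1_high, 3), round(ratio, 3))
```

A ratio below 1.5 would count as "similar magnitude" under the rule in
part b. For part c, look for curvature or a fan shape in the residuals
when they are plotted against age.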
7. For this question, assume any patterns you noticed in Question 6
arose by random chance, so that the regression model reasonably fits
the data. On page 711, Landrigan et al. state, "To
adjust these data for age, a regression of dominant-hand finger-wrist
tap data against age was plotted for each group (see figure); the
slopes of the resulting lines are nearly parallel."
a) Do your regressions verify that the lines are nearly parallel?
Justify your answer by reporting the intercept and slope for each
line given by JMP.
b) What is the difference in predicted average tapping speed
between a kid in the low lead group and a kid of the same age in the
high lead group? Assume the kids in question have ages within
the range of ages in the data.
c) If the lines were not parallel, would your answer to part 7b
change? Explain.
d) The difference in the average tapping speeds reported on the
top of page 711 equals 7.7 (54.1 - 46.1=7.7). Is this
difference adjusted for age? Explain why or why not.
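
The idea behind parts 7b and 7c can be sketched with two fitted lines.
The intercepts and slopes below are made up for illustration (they are
not the JMP output or the values in the paper), and the name
predicted_gap is hypothetical.

```python
def predicted_gap(line_a, line_b, age_months):
    """Predicted value on line_a minus line_b at one age.

    Each line is an (intercept, slope) pair.
    """
    (a0, a1), (b0, b1) = line_a, line_b
    return (a0 - b0) + (a1 - b1) * age_months


# Hypothetical parallel lines: same slope, different intercepts.
parallel_low, parallel_high = (12.0, 0.30), (7.0, 0.30)
# Hypothetical non-parallel lines: slopes differ slightly.
tilted_low, tilted_high = (12.0, 0.32), (7.0, 0.30)

for age in (60, 120):
    print(
        age,
        predicted_gap(parallel_low, parallel_high, age),
        round(predicted_gap(tilted_low, tilted_high, age), 2),
    )
```

With equal slopes the gap is just the difference in intercepts at
every age. With unequal slopes the gap changes with age, so a single
number no longer describes it.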
Code Book for lead.jmp
ID - person ID number
AREA - residence in Aug. 1972
    1 = 0-1 miles from smelter
    2 = 1-2.5 miles
    3 = 2.5-4.1 miles
AGE - 1011 = 10 years, 11 months
SEX - 1 = male, 2 = female
IQ TEST RESULTS
INFO - information subtest in WISC and WPPSI
COMP - comprehension subtest in WISC and WPPSI
AR - arithmetic subtest in WISC and WPPSI
DS - digit span subtest (WISC) and sentence completion (WPPSI)
V/RAW - raw score/verbal IQ
PC - picture completion subtest in WISC and WPPSI
BD - block design subtest in WISC and WPPSI
OA - object assembly subtest (WISC), animal house subtest (WPPSI)
COD - coding subtest (WISC), geometric design subtest (WPPSI)
P/RAW - raw score/performance subtest
HH/INDEX - Hollingshead index of social status
IQV - verbal IQ
IQP - performance IQ
IQF - full scale IQ (not sum or average of IQV and IQP)
TYPE OF IQ TEST
    1 = WISC (usually given to children GT 5 years)
    2 = WPPSI (usually given to children LE 5 years of age)
GROUP - blood lead level group
    1 = blood lead levels below 40 micrograms/100 mL in both 1972 and 1973
    2 = blood lead levels GE 40 micrograms/100 mL in both 1972 and 1973,
        or GE 40 micrograms/100 mL in 1973 alone (3 cases only)
    3 = blood lead levels GE 40 micrograms/100 mL in 1972 and LT 40 in 1973
LD72 - blood lead values in 1972 (micrograms/100 mL); MISSING=99
LD73 - blood lead values in 1973 (micrograms/100 mL)
FST2YRS - did child live for 1st 2 years within 1 mile of smelter
TOTYRS - total number of years spent within 4.1 miles of smelter
SYMPTOM DATA (AS REPORTED BY PARENTS) 1=Yes, 2=No
PICA
COLIC
CLUMSINESS
IRRITABILITY
CONVULSIONS
NEUROLOGICAL TEST DATA
Note: MISSING DATA (-1 or 99).
TAPS/RIGHT - # of taps for right hand in the 2-plate tapping test
    (# taps in one 10 second trial)
TAPS/LEFT - # of taps for left hand in the 2-plate tapping test
    (# taps in one 10 second trial)
REACTION/RIGHT - visual reaction time right hand (milliseconds)
REACTION/LEFT - visual reaction time left hand (milliseconds)
AUDITORY/RIGHT - auditory reaction time right hand (milliseconds)
AUDITORY/LEFT - auditory reaction time left hand (milliseconds)
FINGER/RIGHT - finger-wrist tapping test right hand
    (taps in one 10 second trial)
FINGER/LEFT - finger-wrist tapping test left hand
    (taps in one 10 second trial)
WWPS - Werry-Weiss-Peters Scale for hyperactivity
    0 = no activity . . . . 4 = severely hyperactive (parent reports)
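
The missing-data codes noted above matter for any summary you
compute, since a -1 or 99 left in place would be averaged as if it
were a real measurement. A minimal sketch of handling them, assuming
made-up values and the hypothetical names clean_missing and
mean_ignoring_missing:

```python
MISSING_CODES = {-1, 99}  # codes listed in the code book note


def clean_missing(values, missing_codes=MISSING_CODES):
    """Replace missing-data codes with None, leaving real values alone."""
    return [None if v in missing_codes else v for v in values]


def mean_ignoring_missing(values):
    """Mean of the values that are not None."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept)


# Made-up tapping counts with two missing codes; NOT from lead.jmp.
toy_taps = [50, -1, 99, 46]
cleaned = clean_missing(toy_taps)
print(cleaned, mean_ignoring_missing(cleaned))
```

Apply this only to the variables the code book flags as using these
codes. For other variables, such as IQ scores, 99 can be a legitimate
value.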