Statistics 101
Data Analysis and Statistical Inference
 

Instructions for lab 5


Lab Objective

The purpose of the lab is to help you pull together what you have learned about univariate and bivariate graphical and numerical summaries in the context of a case study.  The lab will also demonstrate that in these four weeks you have acquired most of the statistical tools that form the basis of a published scholarly work.

Lab Procedures


Before coming to lab, read the paper by Landrigan et al. (1975), "Neuropsychological dysfunction in children with chronic low-level lead absorption", The Lancet, March 29, pp. 708--712.  The Lancet is one of the leading journals in medical science.  I recommend that you start this lab before the lab period so that you have time to complete all of it.

In this lab, you will have access to the data presented in this paper.  While this paper was published in 1975, it still has an impact today.  The topic of lead exposure in children remains under investigation and new research results appear in the news almost every month. The topic is studied by interdisciplinary teams of medical personnel, epidemiologists, social scientists, environmentalists, and policy makers.

Open the data file lead.jmp by clicking on this link.  A description of all the variable names in the data set (often called a Code Book) can be found at the end of this lab.

The variable in the data set for the blood lead group has three categories.  The researchers suggest that two categories (below 40 micrograms/100ML and 40 micrograms/100ML or above) are adequate.  So, let's make a new variable that recodes the group variable to low ("L") and high ("H").  To do this, go to Columns-New Column.  Give the new column the name "group.recode".  Select Data Type-Character to tell JMP that we're inputting names.  Now, click on New Property-Formula, then Edit Formula.  When the formula box pops open, highlight "group" from the Table Columns list.  Next, holding down the Shift key, select Conditional and then Match from the Functions box.  The current response choices (groups 1, 2, 3) are listed.  Replace each [then clause] as follows:

Match(group){1       "L"
             2       "H"
             3       "H"
             else        }

Make sure to include quotes around the letters.  Just enter a space in the else  condition.  This replaces the 1s with Ls and the 2s and 3s with Hs.  After you click OK, you should have a variable for high and low groups.
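
For those who like to see the logic written out, here is a minimal Python sketch of the same recode.  It is not part of the JMP procedure, and the group codes in the list are made up for illustration rather than taken from lead.jmp.

```python
# Hypothetical GROUP codes for a handful of children (not the real data).
group = [1, 2, 3, 1, 2]

# Same mapping as the JMP Match formula: 1 -> "L", 2 and 3 -> "H".
recode = {1: "L", 2: "H", 3: "H"}

# Any other code falls through to a single space, like the else clause.
group_recode = [recode.get(g, " ") for g in group]

print(group_recode)  # ['L', 'H', 'H', 'L', 'H']
```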

Questions:


1. This question is based on your reading of the article. You don't need to use JMP for Question 1.
  a)  What are the experimental units (the subjects)?
  b)  What are the three treatment conditions?
  c)   Is this an observational study or a randomized experiment?
  d)  Are there background characteristics (not treatment or response variables) that differ substantially in the two groups?  If so, state which one(s).
  e)  Why is it important to have treated and control groups with similar socioeconomic statuses?

2a)  The main analysis compares mean performance IQ scores (W.I.S.C. + W.P.P.S.I.) for the high lead and low lead groups.  Let's make sure we get the same results when we analyze the data.  Do the means and standard deviations of performance IQ (see the code book at the end of the lab for the variable name) for the high and low groups match the means and SDs reported in the paper?  Your answer need only be a short sentence saying whether or not your calculated values equal (within rounding error) their calculated values.  JMP Tip:  Put "group.recode" in the By box to separate the data based on "group.recode".
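
The By box simply computes the summaries separately within each level of "group.recode".  As a rough sketch of that calculation outside JMP, assuming the data have been exported as (group, score) pairs, the Python below computes the count, mean, and sample SD for each group.  The scores shown are invented for illustration and are not values from the paper or from lead.jmp.

```python
from statistics import mean, stdev

# Hypothetical (group.recode, IQP) pairs, made up for illustration only.
rows = [("L", 100), ("L", 90), ("L", 110), ("H", 85), ("H", 95), ("H", 90)]

def summarize(rows):
    """Return {group: (n, mean, sample SD)} for the scores in each group."""
    by_group = {}
    for label, score in rows:
        by_group.setdefault(label, []).append(score)
    return {g: (len(v), mean(v), stdev(v)) for g, v in by_group.items()}

summary = summarize(rows)
for g, (n, m, s) in sorted(summary.items()):
    print(f"{g}: n={n} mean={m:.1f} SD={s:.1f}")
```

Note that stdev divides by n - 1, which is the usual sample SD.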

2b)  In any analysis, it is important to check whether the means and SDs are strongly influenced by individual data points.  The low lead group has three outliers on performance IQ, and the high lead group has one outlier on performance IQ.    Exclude these four observations and compare the means and SDs to those in part 2a.  Are any of the changes big enough that the authors should have mentioned the effect of the outliers in their article?  Report the two new means and SDs as part of your answer, as well as a three-sentence-maximum explanation.
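
One common way to flag outliers is the box plot rule: a point is an outlier if it lies more than 1.5 times the interquartile range beyond the quartiles.  The Python sketch below assumes that rule and uses made-up scores.  Quartile conventions differ slightly between programs, so the cutoffs it computes may not match JMP's exactly.

```python
from statistics import mean, quantiles

# Hypothetical performance IQ scores for one group (not the real data).
scores = [80, 85, 88, 90, 92, 95, 98, 100, 140]

# Quartiles and the 1.5 * IQR fences used by the box plot rule.
q1, _, q3 = quantiles(scores, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [s for s in scores if s < low_fence or s > high_fence]
kept = [s for s in scores if low_fence <= s <= high_fence]

print("outliers:", outliers)
print("mean with outliers:   ", round(mean(scores), 2))
print("mean without outliers:", round(mean(kept), 2))
```

Comparing the two means (and, in the same way, the two SDs) shows how much the flagged points move the summaries.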

3.   A comparison of means and standard deviations might be inadequate.  For example, suppose one group has a right-skewed distribution, and the other group has a left-skewed distribution.  Just reporting means and standard deviations does not inform the reader about such structure. Compare the distributions of performance IQ of the high and low lead groups.  Describe any differences between the two groups' distributions of performance IQ, e.g., compare locations of most of the data, the spreads of the distributions, and whether there are outliers.  Write at most three sentences.

4.  The authors chose to categorize blood lead level rather than use it as a continuous variable.  Is there a strong linear relationship between performance IQ and the blood lead level in 1972 measured on a continuous scale?

Data Analysis Tip:  Researchers sometimes categorize continuous variables to simplify analyses.  However, when there are strong linear relationships, categorization sacrifices information and can lead to inaccurate results.  Implicitly, splitting blood lead levels at 40 micrograms/100ML assumes that the average performance IQ of all kids in the population with blood lead levels below 40 equals some constant, i.e., their average performance IQ does not depend on the actual blood lead level.  When categorizing, be sure to have a valid scientific rationale for choosing the end points of the categories.
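
The strength of a linear relationship is usually summarized by the correlation coefficient r.  The Python sketch below shows how r is computed and how records with the missing-value code for LD72 (99, per the code book) would be dropped first.  The (LD72, IQP) pairs are invented for illustration, and the helper name pearson_r is my own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (LD72, IQP) pairs; 99 is the code book's missing code for LD72.
pairs = [(22, 104), (35, 96), (99, 101), (41, 99), (58, 88), (30, 110)]

# Drop records with a missing blood lead value before computing r.
complete = [(lead, iq) for lead, iq in pairs if lead != 99]
r = pearson_r([p[0] for p in complete], [p[1] for p in complete])

print(f"n = {len(complete)}, r = {r:.2f}")
```

If the 99s are left in, they are treated as real blood lead values and can distort both the scatterplot and r.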

5.  Older kids typically have faster reflexes than younger kids.  Hence, when comparing finger-wrist tapping speeds for the high and low lead groups, we want to make sure the two groups have similar distributions of ages.   Age is in funky units (e.g., 1011 means 10 years and 11 months), so I created a new variable with age in months.  This is Age mo, which is located in the last column of the data set.  Compare the distributions of age in months for the high and low lead groups.   Do your findings suggest that a comparison of the groups' average finger-wrist tapping speeds could be confounded by age?  Explain in at most three sentences.
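
For reference, the conversion from the AGE coding to months works as follows: the last two digits are months and the leading digits are years.  A short Python sketch (the function name is my own, not a JMP function):

```python
def age_in_months(age_code):
    """Convert an AGE code such as 1011 (10 years, 11 months) to months."""
    years, months = divmod(age_code, 100)
    return 12 * years + months

print(age_in_months(1011))  # 131, i.e. 10 * 12 + 11
print(age_in_months(511))   # 71, i.e. 5 * 12 + 11
```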


6.  One of the key analyses is a regression of finger-wrist tapping speed on age in months (bottom right corner of page 710).  Let's see if we can replicate their regression, as well as check its validity.  Use Analyze--Fit Y by X, putting right-hand finger-wrist tapping in Y (be sure to use the correct variable) and Age mo in X.  In addition, put group.recode in the By box.  This results in a separate regression for each group.

a)  Do your regression equations match (within rounding) the equations on the bottom of page 710?  A simple sentence saying they match or do not match will suffice.

b)  Compare the typical deviations from the regression line for the two regressions.  Are they of roughly similar magnitude?  (For purposes of this question, compute the ratio of the larger value to the smaller.  If that ratio is near 1, say less than 1.5, call them of similar magnitude.)

c)  Examine the plots of residuals versus the predictors.  Are there any patterns that cause you to worry about the validity of the regression assumptions, or do the models do a reasonable job of fitting the data?   Justify your answer in at most three sentences.
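
To see what the fitted line and its "typical deviation" (the root mean square error) are made of, here is a minimal Python sketch of a least-squares fit for one group.  It assumes the missing-data codes from the code book (-1 or 99) are removed first.  The (Age mo, FINGER/RIGHT) pairs are made up, and fit_line is a helper name of my own.

```python
from math import sqrt

MISSING = {-1, 99}  # missing-data codes listed in the code book

def fit_line(xs, ys):
    """Least-squares intercept, slope, and root mean square error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    rmse = sqrt(sse / (n - 2))  # n - 2 because two coefficients are estimated
    return intercept, slope, rmse

# Hypothetical (Age mo, FINGER/RIGHT) pairs for one group, for illustration.
rows = [(60, 30), (84, 40), (96, -1), (108, 50), (120, 58), (132, 99)]
kept = [(a, t) for a, t in rows if t not in MISSING]

intercept, slope, rmse = fit_line([a for a, _ in kept], [t for _, t in kept])
print(f"taps = {intercept:.2f} + {slope:.3f} * age, RMSE = {rmse:.2f}")
```

Running the same fit on each group gives the two RMSE values whose ratio part b) asks about.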


7.  For this question, assume any patterns you noticed in Question 6 arose by random chance, so that the regression model reasonably fits the data.  On page 711, Landrigan et al. state, "To adjust these data for age, a regression of dominant-hand finger-wrist tap data against age was plotted for each group (see figure); the slopes of the resulting lines are nearly parallel."

a)  Do your regressions verify that the lines are nearly parallel?  Justify your answer by reporting the intercept and slope for each line given by JMP.

b)  What is the difference in predicted average tapping speed between a kid in the low lead group and a kid of the same age in the high lead group?  Assume the kids in question have ages within the range of ages in the data.

c)  If the lines were not parallel, would your answer to part 7b change?  Explain.

d)  The difference in the average tapping speeds reported on the top of page 711 equals 7.7  (54.1 - 46.1=7.7).  Is this difference adjusted for age?  Explain why or why not.




Code Book for lead.jmp .

ID :   person ID number

AREA:  Residence on Aug. 1972

1= 0-1 miles from smelter

2= 1-2.5 miles

3= 2.5-4.1 miles

AGE :       1011=10 years, 11 months

SEX      1=male  2=female

IQ TEST RESULTS

INFO     - information subtest in  WISC and WPPSI

COMP   - comprehension subtest in WISC and WPPSI

AR         - arithmetic subtest in WISC and WPPSI

DS          - digit span subtest(WISC) and sentence completion(WPPSI)

V/RAW - raw score/verbal IQ

PC          - picture completion subtest in WISC and WPPSI

BD         - block design subtest in WISC and WPPSI

OA         - object assembly subtest(WISC), animal house subtest(WPPSI)

COD      - coding subtest(WISC), geometric design subtest(WPPSI)

P/RAW - raw score/performance IQ

HH/INDEX - Hollingshead index of social status

IQV        - verbal IQ

IQP        - performance IQ

IQF        - full scale IQ (not sum or average of IQV and IQP)

TYPE OF IQ TEST  1=WISC  (usually given to children GT 5 years) 2=WPPSI (usually given to children LE 5 years of age)


GROUP – Blood lead level group

1= blood lead levels below 40 micrograms/100ML in both 1972 and 1973

2= blood lead levels GE 40 micrograms/100ML in both 1972 and 1973, or GE 40 micrograms/100ML in 1973 alone (3 cases only)

3= blood lead levels GE 40 micrograms/100ML in 1972 and LT 40 in 1973

LD72 - blood lead values in 1972 (micrograms/100ML) MISSING=99

LD73 - blood lead values in 1973 (micrograms/100ML)

FST2YRS - did child live for 1st 2 years within 1 mile of smelter

TOTYRS - total number of years spent within 4.1 miles of smelter


SYMPTOM DATA (AS REPORTED BY PARENTS) 1=Yes, 2=No

PICA  

COLIC 

CLUMSINESS 

IRRITABILITY

CONVULSIONS 

NEUROLOGICAL TEST DATA     Note:  MISSING DATA ( -1 or 99).

TAPS/RIGHT -  # of taps for right hand in the 2-plate tapping test (#taps in one 10 second trial)

TAPS/LEFT-    # of taps for left hand in the 2-plate tapping test (#taps in one 10 second trial)

REACTION/RIGHT-  visual reaction time right hand (milliseconds)

REACTION/LEFT-  visual reaction time left hand (milliseconds)

AUDITORY/RIGHT-  auditory reaction time right hand (milliseconds)

AUDITORY/LEFT-  auditory reaction time left hand (milliseconds)

FINGER/RIGHT-  finger-wrist tapping test right hand (taps in one 10 second trial)

FINGER/LEFT-    finger-wrist tapping test left hand (taps in one 10 second trial)

WWPS - Werry-Weiss-Peters Scale for hyperactivity

0=no activity . . . . 4=severely hyperactive (parent reports)