Statistics 103
  Probability and Statistical Inference
 

Instructions for lab 5


Lab Objective

The purpose of the lab is to help you pull together what you have learned about univariate and bivariate graphical and numerical summaries in the context of a case study.  The lab will also demonstrate that in these four weeks you have acquired most of the statistical tools that form the basis of a published scholarly work.

Lab Procedures


Before coming to lab, read the paper by Landrigan et al. (1975), "Neuropsychological dysfunction in children with chronic low-level lead absorption," The Lancet, March 29, pp. 708--712.  The Lancet is one of the leading journals in medical science.  I recommend that you start this lab before the lab period so that you have time to complete all of it.

In this lab, you will have access to the data presented in this paper.  While this paper was published in 1975, it still has an impact today.  The topic of lead exposure in children remains under investigation and new research results appear in the news almost every month. The topic is studied by interdisciplinary teams of medical personnel, epidemiologists, social scientists, environmentalists, and policy makers.

Open the data file lead.jmp by clicking on this link.  A description of all the variable names in the data set (often called a Code Book) can be found at the end of this lab.

The variable in the data set for the blood lead group has three categories.  The researchers suggest that two categories (below 40 micrograms/100ML and 40 micrograms/100ML or above) are adequate.  So, let's make a new variable that recodes the group variable to low ("L") and high ("H").  To do this, go to Columns-New Column.  Give the new column the name "group.recode".  Select Data Type-Character to tell JMP that we're inputting names.  Now, click on New Property-Formula, then Edit Formula.  When the formula box pops open, highlight "group" from the Table Columns list.  Next, holding down the Shift key, select from the Functions box the option Conditional and then Match.  A list of the current response choices (groups 1, 2, 3) appears.  Replace each [then clause] as follows:

Match(group){1              "L"
                       2              "H"
                       3              "H"
                       else              }

Make sure to include quotes around the letters.  Just enter a space in the else  condition.  This replaces the 1s with Ls and the 2s and 3s with Hs.  After you click OK, you should have a variable for high and low groups.
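If you want to double-check the recode outside JMP, here is a minimal Python sketch of the same mapping.  The function name and the short list of group codes are made up for illustration; they are not the values in lead.jmp.

```python
# Mapping taken from the Match formula above: 1 -> "L", 2 and 3 -> "H".
RECODE = {1: "L", 2: "H", 3: "H"}

def recode_group(group):
    """Return "L" or "H" for a group code; a single space for anything else,
    mirroring the space entered in the else condition."""
    return RECODE.get(group, " ")

# Hypothetical group codes, for illustration only (not the real data).
groups = [1, 2, 3, 1, 3]
group_recode = [recode_group(g) for g in groups]
print(group_recode)  # ['L', 'H', 'H', 'L', 'H']
```

A dictionary lookup plays the role of Match here: every listed code gets its own "then" value, and the default covers the else condition.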

Questions:


1. This question is based on your reading of the article. You don't need to use JMP for Question 1.
  a)  What are the experimental units (the subjects)?
  b)   What is the treatment variable?  What is the name of one of the response variables?
  c)   Is this an observational study or a randomized experiment?

(Note: most of the background characteristics in Table 1 are pretty similar in the lead and control groups, except for age.  A one-year difference could have a large impact on the mental and physical abilities of young children.)
 
2a)  NOT HANDED IN:  The main analysis compares mean performance IQ scores (W.I.S.C. + W.P.P.S.I.) for the high lead and low lead groups.  Let's make sure we get the same results when we analyze the data.  Do the means and standard deviations of performance IQ (see the code book at the end of the lab for the variable name) for the high and low groups match the means and SDs reported in the paper?  JMP Tip:  Put "group.recode" in the By box to separate the data based on "group.recode".
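For reference, the By-box summary in 2a amounts to splitting the scores by group and computing a mean and SD within each piece.  The sketch below shows that computation in Python under two assumptions: the SD is the sample SD (divisor n - 1), and the scores are toy numbers invented for illustration, not values from lead.jmp.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarize_by_group(labels, values):
    """Return {label: (n, mean, sample SD)} for each group label."""
    by_group = defaultdict(list)
    for label, value in zip(labels, values):
        by_group[label].append(value)
    return {label: (len(v), mean(v), stdev(v)) for label, v in by_group.items()}

# Toy data for illustration only (not the real performance IQ scores).
labels = ["L", "L", "L", "H", "H", "H"]
scores = [100, 110, 90, 95, 85, 105]

summary = summarize_by_group(labels, scores)
for label, (n, m, s) in summary.items():
    print(label, n, round(m, 2), round(s, 2))
```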

2b)  HANDED IN:  In any analysis, it is important to check whether the means and SDs are strongly influenced by individual data points.  The low lead group has three outliers on performance IQ, and the high lead group has one outlier on performance IQ.    Exclude these four observations and compare the means and SDs to those in part 2a.  Are any of the changes big enough that the authors should have mentioned the effect of the outliers in their article?  Report the two new means and SDs as part of your answer, as well as a three-sentence-maximum explanation.
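To see why a few outliers can move a mean and SD, here is a small Python sketch that flags points outside the usual box plot fences and recomputes the summaries without them.  The 1.5 x IQR fence rule and the "inclusive" quartile method are assumptions made for this sketch; JMP's quartile convention may differ slightly, and the scores are invented for illustration.

```python
from statistics import mean, quantiles, stdev

def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using fences at k * IQR
    beyond the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return kept, outliers

# Toy scores for illustration only (not the real data).
scores = [80, 85, 90, 95, 100, 140]
kept, outliers = split_outliers(scores)

print("all points:   ", round(mean(scores), 2), round(stdev(scores), 2))
print("outliers out: ", round(mean(kept), 2), round(stdev(kept), 2))
print("flagged:      ", outliers)
```

In the toy data a single extreme score raises the mean and inflates the SD; the question in 2b is whether the same thing happens to a noticeable degree in the real groups.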

3.    Perform computations for Question 3 and all later questions with all data points; don't exclude anything.   A comparison of means and standard deviations might be inadequate.  For example, suppose one group has a right-skewed distribution, and the other group has a left-skewed distribution.  Just reporting means and standard deviations does not inform the reader about such structure. Compare the distributions of performance IQ of the high and low lead groups.  Describe any differences between the two groups' distributions of performance IQ, e.g., compare locations of most of the data, the spreads of the distributions, and whether there are outliers.  Write at most three sentences.  Reminder: Box plots are useful for side-by-side comparisons.

4.  The authors chose to categorize blood lead level rather than use it as a continuous variable.  Is there a strong linear relationship between performance IQ and the blood lead level in 1972 measured on a continuous scale?

Data Analysis Tip:  Researchers sometimes categorize continuous variables to simplify analyses.  However, when there are strong linear relationships, categorization sacrifices information and can lead to inaccurate results.  Implicitly, dissecting blood lead levels at 40 micrograms/100ML assumes that the average performance IQ of all kids in the population with blood lead levels below 40 equals some constant, i.e., their average performance IQ does not depend on the actual blood lead levels.  When categorizing, be sure to have a valid scientific rationale for choosing the end points of the categories.
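The strength of a linear relationship, as asked about in Question 4, is usually summarized by the correlation coefficient r.  The Python sketch below computes r by hand and first drops rows where the 1972 lead value carries the code book's missing code (99).  The function name and the toy numbers are illustrative assumptions; the toy data are deliberately perfectly linear and say nothing about the real answer.

```python
from math import sqrt

MISSING_LD72 = 99  # code book: LD72 uses 99 to mark a missing value

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Toy data for illustration only (not the real data).
ld72 = [20, 30, 99, 40, 50]    # the 99 is a missing-value code
iqp = [110, 100, 95, 90, 80]

# Keep only rows with a real lead measurement before correlating.
pairs = [(x, y) for x, y in zip(ld72, iqp) if x != MISSING_LD72]
r = pearson_r([x for x, _ in pairs], [y for _, y in pairs])
print(round(r, 3))
```

Leaving a 99 in the data would treat a missing value as a very high lead level, which can distort both the scatter plot and r.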

5.  Older kids typically have faster reflexes than younger kids.  Hence, when comparing finger-wrist tapping speeds for the high and low lead groups, we want to make sure the two groups have similar distributions of ages.   Age is in funky units (e.g., 1011 means 10 years and 11 months), so I created a new variable with age in months.  This is Age mo, which is located in the last column of the data set.  Compare the distributions of age in months for the high and low lead groups.   Based on your comparisons, could the groups' average finger-wrist tapping speeds (or some other outcome variable) reflect effects of age differences?  Explain in at most three sentences.
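The conversion behind Age mo is simple arithmetic: the last two digits of AGE are months and the leading digits are years.  A minimal Python sketch, with a function name chosen here just for illustration:

```python
def age_to_months(age_code):
    """Convert an AGE code such as 1011 (10 years, 11 months) to months."""
    years, months = divmod(age_code, 100)
    return years * 12 + months

print(age_to_months(1011))  # 10 * 12 + 11 = 131
```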


6. One of the key analyses is a regression of finger-wrist tapping speed on age in months (bottom right corner of page 710).  Let's replicate their regression and check its validity.  Use Analyze--Fit Y by X, putting finger-wrist tapping right in Y (be sure to use the correct variable) and Age mo in X.  In addition, put group.recode in the By box.  This results in a separate regression for each group.

a)  NOT HANDED IN.  Your results should be very similar to those in the paper.  If not, you did something wrong!

b)  HANDED IN.  Examine the scatter plot.  Are there any patterns (e.g., curvilinear relationships) that cause you to worry about the validity of the regression lines as a way to summarize the trends in the data?  Or, do the regression lines do a reasonable job of fitting the data?   Justify your answer in at most three sentences.
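For those curious about what Fit Y by X computes with a By variable, the Python sketch below fits a separate least-squares line within each group.  It is only an illustration under stated assumptions: the function names are invented, the rows are toy numbers rather than the real data, and the missing-value codes (-1 or 99, from the code book) are dropped by hand in case they appear in the file as ordinary numbers.

```python
MISSING_CODES = {-1, 99}  # code book: missing neurological test data

def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_by_group(rows):
    """rows holds (group label, age in months, taps); returns {label: (intercept, slope)}."""
    by_group = {}
    for label, age, taps in rows:
        if taps in MISSING_CODES:
            continue  # skip rows with a missing tapping score
        by_group.setdefault(label, []).append((age, taps))
    return {
        label: fit_line([a for a, _ in pts], [t for _, t in pts])
        for label, pts in by_group.items()
    }

# Toy rows for illustration only (not the real data).
rows = [
    ("L", 60, 30), ("L", 90, 45), ("L", 100, -1), ("L", 120, 60),
    ("H", 60, 26), ("H", 90, 41), ("H", 120, 56),
]
fits = fit_by_group(rows)
for label, (intercept, slope) in fits.items():
    print(label, round(intercept, 2), round(slope, 3))
```

A fitted line summarizes the trend well only if the scatter plot shows no curvature or other pattern, which is what part b asks you to judge.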

7.  For this question, assume any patterns you noticed in Question 6 result from random chance, so that the regression model reasonably fits the data.  On page 711, Landrigan et al. state, "To adjust these data for age, a regression of dominant-hand finger-wrist tap data against age was plotted for each group (see figure); the slopes of the resulting lines are nearly parallel."

a)  NOT HANDED IN.  Verify that the regression lines are nearly parallel.  Parallel regression lines have the same slopes with possibly different intercepts.

b)  HANDED IN:  Since the lines are parallel, what is the difference in predicted average tapping speed between a kid in the low lead group and a kid of the same age in the high lead group?  Assume the kids in question have ages within the range of ages in the data.

c)  HANDED IN:  If the lines were not parallel, describe (in two sentences) how the answer you got in part b might not be correct for all ages.
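It may help to write the two fitted lines from Question 7 in symbols.  The notation below is introduced here only for illustration and does not come from the paper: x is age in months, a_L and b_L are the intercept and slope of the low lead line, and a_H and b_H are those of the high lead line.

```latex
\begin{align*}
\hat{y}_L(x) &= a_L + b_L x, \\
\hat{y}_H(x) &= a_H + b_H x, \\
\hat{y}_L(x) - \hat{y}_H(x) &= (a_L - a_H) + (b_L - b_H)\,x .
\end{align*}
```

When the slopes are equal, the last term drops out and the gap between the lines is the same at every age.  When the slopes differ, the gap changes with x.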



Code Book for lead.jmp .

ID :   person ID number

AREA:  Residence on Aug. 1972

1= 0-1 miles from smelter

2= 1-2.5 miles

3= 2.5-4.1 miles

AGE :       1011=10 years, 11 months

SEX      1=male  2=female

IQ TEST RESULTS

INFO     - information subtest in  WISC and WPPSI

COMP   - comprehension subtest in WISC and WPPSI

AR         - arithmetic subtest in WISC and WPPSI

DS          - digit span subtest(WISC) and sentence completion(WPPSI)

V/RAW - raw score/verbal IQ

PC          - picture completion subtest in WISC and WPPSI

BD         - block design subtest in WISC and WPPSI

OA         - object assembly subtest(WISC), animal house subtest(WPPSI)

COD      - coding subtest(WISC), geometric design subtest(WPPSI)

P/RAW - raw score/performance subtest

HH/INDEX - Hollingshead index of social status

IQV        - verbal IQ

IQP        - performance IQ

IQF        - full scale IQ (not sum or average of IQV and IQP)

TYPE OF IQ TEST  1=WISC  (usually given to children GT 5 years) 2=WPPSI (usually given to children LE 5 years of age)


GROUP – Blood lead level group

1= blood lead levels below 40 micrograms/100ML in both 1972/1973

2= blood lead levels GE 40 micrograms/100ML in both 1972/1973 or GE 40 micrograms/100ML in 1973 alone (3 cases only)

3= blood lead levels GE 40 micrograms/100ML in 1972 and LT 40 in 1973

LD72 - blood lead values in 1972 (micrograms/100ML) MISSING=99

LD73 - blood lead values in 1973 (micrograms/100ML)

FST2YRS - did child live for 1st 2 years within 1 mile of smelter

TOTYRS - total number of years spent within 4.1 miles of smelter


SYMPTOM DATA (AS REPORTED BY PARENTS) 1=Yes, 2=No

PICA  

COLIC 

CLUMSINESS 

IRRITABILITY

CONVULSIONS 

NEUROLOGICAL TEST DATA     Note:  MISSING DATA ( -1 or 99).

TAPS/RIGHT -  # of taps for right hand in the 2-plate tapping test (#taps in one 10 second trial)

TAPS/LEFT-    # of taps for left hand in the 2-plate tapping test (#taps in one 10 second trial)

REACTION/RIGHT-  visual reaction time right hand (milliseconds)

REACTION/LEFT-  visual reaction time left hand (milliseconds)

AUDITORY/RIGHT-  auditory reaction time right hand (milliseconds)

AUDITORY/LEFT-  auditory reaction time left hand (milliseconds)

FINGER/RIGHT-  finger-wrist tapping test right hand (taps in one 10 second trial)

FINGER/LEFT-    finger-wrist tapping test left hand (taps in one 10 second trial)

WWPS - Werry-Weiss-Peters Scale for hyperactivity

0=no activity . . . . 4=severely hyperactive (parent reports)