Statistics 103
  Probability and Statistical Inference
 

Instructions for lab 3


Lab Objective

To explore data with histograms and scatter plots.

Lab Procedures

Unit 1:  Let's go to the movies!

What are the characteristics of U.S. movies that make the most money?  Let's address this question with the data set movies.jmp.  It comprises data on the 216 top-grossing movies of all time as of February 2000.  The file is in the JMP data files folder.  Click on File - Open, and select the folder JMP In Data.  Select movies.jmp, then click on Open.

Or, even easier, download the data by clicking on this link:  movies.jmp

There are missing data in this file.  We'll ignore them for simplicity.  In general, when confronted with missing data, it is best to get the advice of a professional statistician before doing analyses.

Questions: 

Data Analysis Tip: The unit of measurement for the three monetary variables is not stated.  That's bad practice.  Always include a description of the units somewhere in the file.  Based on knowledge of movie revenues, it is clear that the unit of measurement is $1,000,000.

1)  Describe the distribution of foreign grosses.  That is, say where most values are, note any outliers, and say whether the distribution is tightly packed around its mean or is spread out.  Also, report the mean and standard deviation.  

JMP Tip:  Make a histogram using Analyze-Distribution, then entering the variable name in the Y box.  You can make the histogram horizontal by clicking on the red arrow next to the variable name, selecting Display Options - Horizontal Layout.
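
Outside of JMP, the same summaries can be sketched in a few lines of Python.  This is only an illustration: the numbers below are made up (they are NOT from movies.jmp), the names `foreign` and `bin_counts` are hypothetical, and the standard deviation uses the n - 1 denominator, which is assumed to match what JMP reports.

```python
import statistics

# Hypothetical grosses in millions of dollars (NOT the values in movies.jmp).
foreign = [10, 20, 30, 40, 50, 60, 70, 80, 90, 550]

mean = statistics.mean(foreign)
median = statistics.median(foreign)
sd = statistics.stdev(foreign)  # sample standard deviation (n - 1 denominator)


def bin_counts(values, width):
    """Count how many values fall in [0, width), [width, 2*width), and so on."""
    counts = {}
    for v in values:
        left = int(v // width) * width  # left edge of the bin containing v
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


counts = bin_counts(foreign, 100)

# A crude text histogram: one star per observation in each non-empty bin.
for left, n in counts.items():
    print(f"{left:>4} to {left + 100:<4} {'*' * n}")
print(f"mean = {mean:.1f}, median = {median:.1f}, sd = {sd:.1f}")
```

In this toy example a single very large value pulls the mean well above the median and inflates the standard deviation, which is the kind of feature to look for when you describe a distribution.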

2)  Which sentence best describes the distributions of domestic and foreign grosses?  You can just write the letter of your choice on the lab report.
Choice a)  Domestic and foreign grosses are very similar.
Choice b)  Domestic and foreign grosses have similar distributional shapes, but foreign grosses tend to be larger than domestic grosses.
Choice c)  Domestic and foreign grosses have similar distributional shapes, but domestic grosses tend to be larger than foreign grosses.
Choice d)  The two distributions look nothing like each other, because one has a long left tail and the other has a long right tail.

3)  What is the name of the movie that is the clear outlier on all three monetary variables?  

4)  We can examine the relationship between world-wide gross and movie type using a box plot.  To get a box plot, go to Analyze - Fit Y by X.  Put the continuous variable in the Y box and the character variable in the X box.  On the subsequent screen, select the red arrow next to "Oneway Analysis of ...."  Then, select Quantiles.  To clean up the graph, click the red arrow again and select Display Options to bring up various options for managing the display.  Answer the three questions below.

    a)  Out of Comedy and Family movies, which one has a distribution of world-wide grosses that is most similar to the distribution of world-wide grosses for Action movies?  Justify your choice in at most two sentences.
   
    b)  Compare the distributions for Drama movies and Mystery movies.  Do they have reasonably similar medians?  Is one more spread out than the other (if so, say which one)?

    c)  If you directed a movie and wanted to make lots of money worldwide, which type appears to give you the best chance of doing so?  Base your answer on the results of the box plot.
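
A box plot is a picture of a few quantiles per group.  The sketch below computes those quantiles in Python for two made-up groups; the group names and values are hypothetical (NOT from movies.jmp), and the "inclusive" quantile rule used here is an assumption that may differ slightly from the rule JMP uses.

```python
import statistics

# Hypothetical grosses by movie type, in millions of dollars (NOT from movies.jmp).
grosses_by_type = {
    "Type X": [1, 2, 3, 4, 5],
    "Type Y": [10, 20, 30, 40, 50],
}


def five_number_summary(values):
    """Return the quantities a box plot displays, plus the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,  # length of the box: a measure of spread
    }


summary = {name: five_number_summary(v) for name, v in grosses_by_type.items()}

for name, s in summary.items():
    print(name, s)
```

Comparing medians tells you about typical values; comparing the IQRs (or the full ranges) tells you which group is more spread out.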

5)  Describe the relationship between domestic gross and foreign gross.  To make a scatter plot, go to Analyze - Fit Y by X.  Enter the continuous variable for the vertical axis in the Y box and the continuous variable for the horizontal axis in the X box.  Items to include in your description are the general trend of the relationship (e.g., positive and linear, negative and linear, some other pattern, no clear pattern) and whether there are any outliers or points that do not fit the pattern.

6)  Report the three pairwise correlations between Foreign, Domestic, and World-wide gross.  You can find all three simultaneously by selecting Analyze - Multivariate Methods - Multivariate, then entering all three variables into the "Variables" box.  Do the correlations suggest strongly positive linear relationships, weakly positive linear relationships, no linear relationships, weakly negative linear relationships, or strongly negative linear relationships?
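
For reference, here is a minimal Python sketch of what "all pairwise correlations" means for three variables.  The columns A, B, and C and their values are hypothetical stand-ins (NOT the movie variables), and `pearson` is a hand-written helper rather than anything from JMP.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical columns (NOT the variables in movies.jmp).
columns = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 1, 4, 3, 5],
    "C": [5, 4, 3, 2, 1],
}

# One correlation for each unordered pair of columns: (A,B), (A,C), (B,C).
pairwise = {
    (a, b): pearson(columns[a], columns[b])
    for a, b in combinations(columns, 2)
}

for (a, b), r in pairwise.items():
    print(f"corr({a}, {b}) = {r:+.2f}")
```

With three variables there are exactly three pairs.  Values near +1 or -1 indicate strong linear relationships, and values near 0 indicate little or no linear relationship.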

7)  Why are the correlations between Domestic and World-wide, and between Foreign and World-wide, stronger than the correlation between Domestic and Foreign?  The answer has to do with the definitions of the variables.

8)  Outliers can have a strong effect on correlations.  Let's check to see if excluding Titanic changes the correlations substantially.  To exclude Titanic, highlight row number 1 (for Titanic), then select Rows - Exclude/Include.  Now, re-calculate the correlations in (6).  Did the correlations get stronger or weaker?  Does the substance of your conclusions in (6) change very much when excluding Titanic?

Data Analysis Tip:  It is not acceptable to exclude outliers from analyses unless you have a scientific reason to do so (e.g., a data entry error, or maybe the outlying unit is not part of your target population).  Hiding outliers is fudging data to get results you want.  That is dishonest and unethical.  When you see outliers, do analyses with and without them.  When the results do not change much, report the results based on the full data set, and tell your audience that the results were not sensitive to the outliers.  When the results do change substantially, report both sets of analyses: one with and one without the outliers.  This honestly informs people that your conclusions are not on very solid ground, because particular data points affect the results greatly.
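
The "with and without" check in the tip above can be sketched in Python as follows.  The data are made up for illustration (NOT from movies.jmp), with the last point playing the role of an outlier, and `pearson` is a hand-written helper.  The direction and size of the change depend entirely on the data, so do not read anything about the movie results into this toy example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements; the last pair is far from the rest.
x = [1, 2, 3, 4, 5, 20]
y = [2, 1, 4, 3, 5, 20]

r_all = pearson(x, y)                # analysis on the full data set
r_without = pearson(x[:-1], y[:-1])  # same analysis, outlier set aside

print(f"with outlier:    r = {r_all:.3f}")
print(f"without outlier: r = {r_without:.3f}")
print(f"change:          {r_all - r_without:+.3f}")
```

Both numbers get reported when the change is substantial; neither analysis is thrown away.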

Unit 2:  The Correlation Challenge.

On the "Statistics on the web" page of our course web site (click this link), go to the web game "Guess the correlations."  Play it at least three times against a classmate.  Don't forget to talk trash if you win.  You don't have to write anything down for this part of the lab.  If you're feeling really cocky, challenge the TAs.  And, if you feel like you need to be humbled, come challenge me (Prof. Reiter).  If you beat me at the correlation game, I will buy you lunch.  If I win, I will do some serious gloating.