STA113 Probability and Statistics in Engineering
Section 001, #4847
Spring 2004
- Instructor:
- I. H. Dinwoodie, 219 Old Chemistry Building
- Office Hours:
- Th: 3:30 - 4:30
- Fr: 1:30 - 3:00
- Lecture:
- Tu-Th 2:15 - 3:30, Social Sciences 136
- Assistants:
- Natesh Pillai, Margaret Polinkovsky, Abel Rodriguez
- Discussion Hours:
- Section 01, #4858: F 11:50-12:40, Teer 106
- Section 02, #5178: F 1:10-2:00, Teer 106
- Section 03, #6464: F 10:30-11:20, Teer 106
- TA schedule at the new SECC
- Text:
- Jay Devore, Probability and Statistics for Engineering and the Sciences, 6th Ed., Duxbury.
- Final Exam:
- Monday April 26, 2-5 P.M. in Physics 114
(annotated answers)
- Midterm Exams:
- Tuesday, February 17, through Chapter 3 (including selections from Chapter 12) (answers)
- Tuesday, March 30, Chapters 4-6 (annotated answers, answers) (mean: 19.7/28, 6 perfect scores)
- Prerequisites:
- Math 103
- Grading:
- The final grade will be based on the final exam (30%), two midterm exams
(20% each),
quizzes (15%) and projects (15%).
- A calculator will be necessary for exams and quizzes. Some limits will be
placed on the type of calculator.
- There will be two projects (1,
2) involving simulation and
modelling. We will use Matlab with its Statistics Toolbox on godzilla.acpub.duke.edu or
craven2.acpub.duke.edu (usually faster), or any other installation.
Its free counterpart Octave is not bad but lacks some
essential statistical functions in the Toolbox.
The Matlab functions can be examined with "type", but many of them will
not run in Octave. Matlab's strong points are its programming language and computational power.
The free statistics software R has some advantages, but
I will only spend time in class on Matlab.
- Documents:
- Homework Problems:
- Data sets for homework are on the web; I'll tell you where.
- Ch1: 19, 25, 33, 41, 43, 50, 59, 82a (histp.m for probability histograms)
- Ch12 (12.1, 12.2 only): 3 (get the estimates and correlation), 16abc
- Selected Matlab commands:
- help, quit, diary, !, who, whos, more on,
- load, tdfread, textread, save, reshape
- Getting tabular data into Matlab is a bit harder than with some other statistics
software. "tdfread" is pretty good, but for a space-separated file you need to
specify the delimiter: tdfread('data.txt', ' '). It also expects a line of column labels
at the top.
- median, mean, std, prctile (what formula do they use?), hist, histfit, boxplot, corrcoef, cdfplot
- plot, polyfit, regress, lsline, hold
- tabulate, diff, cumsum, ./
- Ch2: 1, 3, 6, 12, 18, 21, 26, 33, 40, 42, 43, 45, 46, 52, 60, 64, 69, 78, 82, 95
- Ch3: 6, 8, 11, 12, 16, 28, 33, 36, 44acd, 45b, 46, 54, 61, 67, 68, 70, 76, 77, 81, 82, 86, 88, 99, 109
- Unfortunately, the book defines the geometric r.v. as the number of failures before the first
success, rather than the more standard number of the trial on which the first
success occurs. This leads to a pmf shifted one unit to the left,
and the mean in the book is the standard mean 1/p minus 1 (giving 1/p - 1 = (1-p)/p).
I will not use the book's system!
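The two conventions can be compared numerically. This is only a sketch: it assumes the Statistics Toolbox function geopdf, which (like the book) counts failures before the first success, and the value p = 0.3 and the cutoff at 200 trials are arbitrary choices for illustration.

```matlab
% Two conventions for the geometric distribution (illustrative values only).
p = 0.3;                       % success probability, chosen arbitrarily
k = 1:200;                     % trial numbers; the tail beyond 200 is negligible

% geopdf counts failures before the first success (values 0, 1, 2, ...),
% so the pmf of the trial number X of the first success is a shift by one.
pmf_trial = geopdf(k - 1, p);  % P(X = k) = (1-p)^(k-1) * p

mean_trial    = sum(k .* pmf_trial);        % should be close to 1/p
mean_failures = sum((k - 1) .* pmf_trial);  % should be close to (1-p)/p

fprintf('mean trial number  %.4f   (1/p     = %.4f)\n', mean_trial, 1/p);
fprintf('mean failure count %.4f   ((1-p)/p = %.4f)\n', mean_failures, (1-p)/p);
```

The two means differ by exactly 1, which is the one-unit shift described above.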
- Ch4: 2, 11, 22, 23, 26, 32, 37, 49, 57, 59, 63, 64, 66, 82
- The exponential distribution has a tradition in communication technology for modeling waiting
times of various kinds and it is tied in with the Poisson process.
It is still useful on modern TCP/IP networks. For example, the
utility "tcpdump" collects information about traffic on an IP network to which your computer is
connected. The utility "ethereal" can be used to look at it packet-by-packet. An example is
the data ethereal.out collected Feb 6. The times (seconds) of the packets
are in the second column and are separated into the file times.txt (and
digitized by 1/1000 of a second to count the arrival interval in tE-4.txt).
- A Poisson Process is the random curve N(t) that counts the number of arrivals up to
time t, when the time between arrivals has the exponential distribution. If the exponential
distribution has rate lambda, then the number of arrivals in any time interval [s, t] has the
Poisson distribution with mean=lambda(t-s), the obvious thing. That is, the "rise" of the graph
over a time interval has the Poisson distribution, and the times between unit jumps upward
are exponential. The graph from the packet data times.txt is here
and the histogram of the interarrival times is here.
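The connection between exponential gaps and Poisson counts can be checked by simulation. The following is a minimal sketch with simulated data, not the packet data: the rate lambda, the interval length w, and all variable names are assumptions made for illustration. Note that exprnd takes the mean 1/lambda, not the rate.

```matlab
% Simulate a Poisson process from exponential interarrival times.
lambda = 5;                         % arrival rate per second (arbitrary)
n      = 20000;                     % number of simulated arrivals
gaps   = exprnd(1/lambda, n, 1);    % exprnd takes the MEAN, i.e. 1/lambda
times  = cumsum(gaps);              % arrival times, analogous to times.txt

% Count the arrivals that fall in consecutive intervals of length w.
w      = 2;                         % interval length in seconds (arbitrary)
idx    = floor(times / w) + 1;      % index of the interval holding each arrival
counts = accumarray(idx, 1);        % arrivals per interval (0 where none)
counts = counts(1:end-1);           % drop the last, incomplete interval

% For Poisson counts the mean and the variance are both lambda*w.
fprintf('lambda*w = %g, sample mean = %.3f, sample variance = %.3f\n', ...
        lambda*w, mean(counts), var(counts));
```

To draw the counting curve N(t) itself, stairs(times, (1:n)') plots the step function, and hist(diff(times)) gives a histogram of the interarrival times.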
- To do calculations with
the Weibull distribution in Matlab, such as for 66, note that the Matlab parametrization differs from the book's:
weibpdf(x,a,b) is the function $f(x) = a b x^{b-1} e^{-a x^{b}}$ for x > 0, with mean $a^{-1/b}\,\Gamma(1+1/b)$.
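As a sanity check on this parametrization, the density can be written out directly and integrated numerically. This is a sketch under assumptions: the values a = 2 and b = 1.5 are arbitrary, "integral" requires a reasonably recent Matlab, and the last lines assume the Statistics Toolbox function wblpdf, which uses a scale/shape parametrization instead.

```matlab
% Weibull density f(x) = a*b*x^(b-1)*exp(-a*x^b), with arbitrary a and b.
a = 2;  b = 1.5;
f = @(x) a .* b .* x.^(b-1) .* exp(-a .* x.^b);   % valid for x > 0

total     = integral(f, 0, Inf);                  % should be 1
mean_num  = integral(@(x) x .* f(x), 0, Inf);     % mean by numerical integration
mean_form = a^(-1/b) * gamma(1 + 1/b);            % closed-form mean

fprintf('integral of f = %.6f\n', total);
fprintf('mean: numerical %.6f, formula %.6f\n', mean_num, mean_form);

% The same density in the scale/shape form used by wblpdf:
% scale = a^(-1/b), shape = b.
x0 = 0.5;
fprintf('f(x0) = %.6f, wblpdf = %.6f\n', f(x0), wblpdf(x0, a^(-1/b), b));
```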
- 82 -- try "qqplot" in Matlab, which for a sample of 10 observations essentially plots
the normal percentiles at values (1-.5)/10, (2-.5)/10, ..., (10-.5)/10 against the ordered data:
- >> y=sort(data)
- >> x=norminv( ((1:10)-.5)/10)
- >> plot(x,y,'bo')
- Normal(mu, sigma) data will give a qqplot that is nearly a straight line with slope sigma, intercept mu.
To see this, try y=normrnd(2,3,100,1) to simulate normal data with mean 2 and standard deviation 3, and apply qqplot to y.
- We will look at the ibm stock price data, and the t-distribution with 20 degrees of freedom
using both histograms and qqplots for comparison. Note especially the qqplot of the stock price compared
to the qqplot of the stock returns!
- Ch5: 3, 10, 13, 17, 27, 28, 30, 41ab, 46, 52, 62, 63, 64ab, 65
- The Matlab command "histfit" is useful for seeing the central limit theorem:
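>> u = unifrnd(0,1,10,10000);   % 10 x 10000 matrix of uniform(0,1) draws
>> s = sum(u);                  % column sums: 10000 sums of 10 uniforms
>> histfit(u(1,:))              % a single uniform: flat, poor normal fit
>> hold on                      % keep the first histogram on the axes
>> histfit(s)                   % the sums: close to the fitted normal curve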
- Ch6: 1abcd, 9, 14, 19, 20, 23, 25, 33
- Capture/Recapture application of Maximum Likelihood Estimation.
- Ch7: 1ac, 3, 7, 12, 14, 23, 30, 32, 33, 43, 45, 52, 54
- Matlab does not have the newer method for the binomial c.i., but this .m file does it.
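For orientation, here is a minimal sketch of one such interval. It assumes that "the newer method" means the score (Wilson) interval; it is not the linked .m file, and the data x and n are made up. The usual interval, phat plus or minus z times its estimated standard error, is computed alongside for comparison.

```matlab
% Score (Wilson) confidence interval for a binomial proportion.
% ASSUMPTION: this is the "newer method"; x and n are made-up data.
x = 12;  n = 40;  alpha = 0.05;
phat = x / n;
z    = norminv(1 - alpha/2);        % standard normal critical value

denom    = 1 + z^2 / n;
center   = (phat + z^2 / (2*n)) / denom;
halfw    = z * sqrt(phat*(1 - phat)/n + z^2/(4*n^2)) / denom;
ci_score = [center - halfw, center + halfw];

% Traditional interval, for comparison.
ci_wald  = phat + [-1 1] * z * sqrt(phat*(1 - phat)/n);

fprintf('score: [%.4f, %.4f]\n', ci_score);
fprintf('wald:  [%.4f, %.4f]\n', ci_wald);
```

The score interval always stays inside [0, 1], which the traditional interval need not do when phat is near 0 or 1.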
- Ch8: 1, 11abc, 21, 30, 32, 35, 36, 44, 46, 47
- Ch9: 5, 28, 33, 41, 44 (sections 1-3 only)
- Matlab does not have the newer method, sometimes called Welch's approximation, for confidence intervals in a two sample situation where the variances are not the same. The .m file twosampleci.m does it right.
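The calculation behind Welch's approximation is short enough to sketch here. This is not twosampleci.m; the two samples are made up and the variable names are assumptions. It uses tinv from the Statistics Toolbox, which accepts non-integer degrees of freedom.

```matlab
% Two-sample confidence interval for a difference of means,
% unequal variances, with Welch's approximate degrees of freedom.
x = [20.1 22.3 19.8 21.5 23.0 20.7];              % made-up sample 1
y = [25.2 19.4 28.1 22.9 30.3 17.6 26.4 24.0];    % made-up sample 2
alpha = 0.05;

m  = numel(x);   n  = numel(y);
vx = var(x) / m; vy = var(y) / n;                 % estimated variances of the two means
se = sqrt(vx + vy);                               % standard error of the difference

% Welch's approximate degrees of freedom (generally not an integer).
df = (vx + vy)^2 / (vx^2/(m - 1) + vy^2/(n - 1));

tcrit = tinv(1 - alpha/2, df);
ci    = (mean(x) - mean(y)) + [-1 1] * tcrit * se;

fprintf('df = %.2f, CI = [%.3f, %.3f]\n', df, ci);
```

To match a hand calculation with a t table, replace df by floor(df) before calling tinv.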
Other exercises may be assigned. Quizzes will be based largely
on homework problems.
January 2004
Su Mo Tu We Th Fr Sa
             1  2  3
 4  5  6  7  8  9 10   Th first day of class, Ch1
11 12 13 14 15 16 17   Ch1, Ch12
18 19 20 21 22 23 24   2.1, 2.2, 2.3
25 26 27 28 29 30 31   2.4
February 2004
Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7   2.5, 3.1, 3.2, 3.3, 3.4
 8  9 10 11 12 13 14
15 16 17 18 19 20 21   First Exam Tu, 4.1, 4.2
22 23 24 25 26 27 28   4.3, 4.4, 4.6
29
March 2004
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6   4.3, 4.6, 5.1
 7  8  9 10 11 12 13   no classes
14 15 16 17 18 19 20   Tu: 5.2, 5.3, 5.5. Th: 2nd project due, 5.4 Central Limit Theorem
21 22 23 24 25 26 27   6.1, 6.2, 7.2
28 29 30 31            Second Exam Tu
April 2004
Su Mo Tu We Th Fr Sa
             1  2  3
 4  5  6  7  8  9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24   Tu last day of class
25 26 27 28 29 30      Mo final exam 2-5 P.M.
May 2004
Su Mo Tu We Th Fr Sa
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31
last updated: April 6 2004