STA 210B/ENV 251
Statistics and Data Analysis for the Biological Sciences
On this page, I will maintain a list of terms and concepts that we are learning in this portion of the course. It is a dynamic reference to reinforce the lecture material and will be updated frequently, with terms added as they are introduced in the course. This page may also be used as an exam review sheet.
Can't find a term? If we are using a term or concept in lecture or lab that is not in this list, email me immediately. I will include it here and review it in the following lecture. Additionally, you may refer to either of your textbooks.
Hint: Try using Find or Ctrl-F to search for terms.
Probability Distributions
P(A) is the probability that event A occurs. P(A and B) = P(A)P(B) if and only if events A and B are independent. P(A or B) = P(A) + P(B) - P(A and B) is the probability that A or B occurs (or possibly both). When P(A and B) = 0, events A and B are said to be disjoint. (They never occur together.)
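As a small illustration of these rules, here is a sketch using a hypothetical example that is not from lecture: one roll of a fair six-sided die, with made-up events A, B and C. The helper `prob` is an assumption for illustration and only works when all outcomes are equally likely.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
outcomes = set(range(1, 7))

def prob(event):
    """P(event) when every outcome is equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}  # the roll is even
B = {1, 2}     # the roll is at most 2
C = {1, 3, 5}  # the roll is odd

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = prob(A) + prob(B) - prob(A & B)

# A and B are independent exactly when P(A and B) = P(A) P(B).
independent = prob(A & B) == prob(A) * prob(B)

# A and C are disjoint: they never occur together.
disjoint = prob(A & C) == 0

print(p_a_or_b, prob(A | B), independent, disjoint)
```

Here the addition rule and a direct count of the outcomes in "A or B" give the same probability.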
"X ~ B(n, p)" means that X has a binomial distribution with parameters n and p. That is, X is the number of successes among n independent events each with probability p. The expected number of successes is E(X)=np. The variance is Var(X)=np(1-p), and the standard deviation is the square root of the variance.
"Y ~ Poisson(m)" means that Y has a Poisson distribution. That is, Y is the number of events which have occured out of an unlimited number of possibilities but with only m expected. E(Y) = Var(Y) = m.
"Z ~ N(0, 1)" means that Z has a normal (bell-shaped) distribution with mean 0 and variance 1. Values are tabled in text.
A Z-score is the observed value minus the expected value, all divided by the standard deviation. For X ~ B(n, p), Z = (X-np)/sqrt(np(1-p)). For Y ~ Poisson(m), Z = (Y-m)/sqrt(m). Z-scores are approximately distributed as a standard normal, and the approximation is more accurate for larger values of np or m.
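The two Z-score formulas can be written out as a short sketch. The function names and the example counts are assumptions for illustration.

```python
import math

def z_binomial(x, n, p):
    """Z-score for a binomial count: (x - np) / sqrt(np(1-p))."""
    return (x - n * p) / math.sqrt(n * p * (1 - p))

def z_poisson(y, m):
    """Z-score for a Poisson count: (y - m) / sqrt(m)."""
    return (y - m) / math.sqrt(m)

# 60 successes in 100 trials when p = 0.5: expected 50, standard deviation 5.
z1 = z_binomial(60, 100, 0.5)

# 30 events observed when 25 are expected: standard deviation 5.
z2 = z_poisson(30, 25)

print(z1, z2)
```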
The square of a standard normal random variable has a chi-squared distribution with 1 degree of freedom. The sum of the squares of k independent standard normals has a chi-squared distribution with k degrees of freedom. Since Z-scores are not precisely distributed as standard normals, the sum of k squared Z-scores may be better approximated by a chi-squared distribution with fewer than k degrees of freedom.
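The link between the standard normal and the chi-squared distribution can be seen in a rough sketch. It assumes the commonly tabled 5% cutoff of 3.841 for 1 degree of freedom, and it relies on the fact (not stated above) that a chi-squared variable with k degrees of freedom has mean k. The simulation size and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean

# One degree of freedom: P(Z^2 > c) is the same as P(|Z| > sqrt(c)).
c = 3.841  # commonly tabled 5% cutoff for chi-squared with 1 df
tail_1df = 2 * (1 - NormalDist().cdf(math.sqrt(c)))

# k degrees of freedom: simulate sums of k squared standard normals.
rng = random.Random(0)  # fixed seed so the run is repeatable
k = 3
sums = [sum(rng.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(20_000)]
mean_of_sums = fmean(sums)

print(tail_1df)      # close to 0.05
print(mean_of_sums)  # close to k
```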
Estimation
Where X ~ B(n,p), we would estimate p, the true probability of success, with X/n, or p-hat. P-hat is called an estimator of the estimand p. A specific value of p-hat is called an estimate. The standard deviation of an estimator is called the standard error (SE). For p-hat, the SE is sqrt(p-hat(1 - p-hat)/n).
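A minimal sketch of the estimate and its standard error, assuming a made-up sample of 30 successes in 100 trials. The function name `estimate_proportion` is hypothetical.

```python
import math

def estimate_proportion(x, n):
    """Return (p_hat, se) for x successes out of n trials."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se

p_hat, se = estimate_proportion(30, 100)
print(p_hat, se)
```

Note that the SE shrinks as n grows: quadrupling n halves the standard error for the same p-hat.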
Hypothesis Testing
A hypothesis is a statement about a parameter or parameters which can be invalidated by observations, such as p1 = p2. The null hypothesis is usually the simpler of two hypotheses, and it must be rejected to lend support to the alternative hypothesis, e.g. H0: p1 = p2 versus Ha: p1 ≠ p2.
In classical hypothesis testing the null hypothesis is either rejected or not rejected. If a test rejects when the null is in actuality true, then a Type I Error is said to have occurred. Conversely, if a test does not reject the null when the alternative hypothesis is true, then a Type II Error is said to have occurred. The alpha level of the test is the probability that a test makes a Type I error when the null is true. Confidence is the complement of the alpha level.
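The two error types can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: a two-sided Z test of H0: p = 0.5 with n = 100 and alpha = 0.05, a hypothetical true value p = 0.6 for the alternative, and an arbitrary seed and number of repetitions. Because the Z-score is only approximately standard normal, the simulated Type I error rate is close to alpha rather than exactly equal to it.

```python
import math
import random
from statistics import NormalDist

n, p0, alpha, reps = 100, 0.5, 0.05, 5000  # illustrative settings
cutoff = NormalDist().inv_cdf(1 - alpha / 2)
rng = random.Random(1)  # fixed seed so the run is repeatable

def rejection_rate(p_true):
    """Fraction of simulated samples in which H0: p = p0 is rejected."""
    rejections = 0
    for _ in range(reps):
        x = sum(rng.random() < p_true for _ in range(n))
        z = (x - n * p0) / math.sqrt(n * p0 * (1 - p0))
        if abs(z) > cutoff:
            rejections += 1
    return rejections / reps

# Null is true: every rejection is a Type I error.
type_i_rate = rejection_rate(p0)

# Null is false (true p = 0.6): every non-rejection is a Type II error.
type_ii_rate = 1 - rejection_rate(0.6)

print(type_i_rate, type_ii_rate)
```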
A test statistic, e.g. a Z-score or Pearson's chi-squared statistic, is compared with a hypothetical test distribution, e.g. standard normal or chi-squared, respectively. (The test distribution is related to the null hypothesis and is sometimes called the null distribution.) If the test statistic is too far into the tails of the test distribution, then the test rejects the null hypothesis. The cut-off between rejecting and not rejecting is determined by the desired level of the test.
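As a sketch of how the cut-off works, the example below assumes a two-sided Z test of a single proportion, H0: p = p0, with made-up data (60 successes in 100 trials). The function name `z_test` is hypothetical, and the cut-off is taken from the standard normal via Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def z_test(x, n, p0, alpha):
    """Two-sided Z test of H0: p = p0; returns (z, cutoff, reject)."""
    z = (x - n * p0) / math.sqrt(n * p0 * (1 - p0))
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)  # tail area alpha/2 on each side
    return z, cutoff, abs(z) > cutoff

z, cutoff_05, reject_05 = z_test(60, 100, 0.5, alpha=0.05)
_, cutoff_01, reject_01 = z_test(60, 100, 0.5, alpha=0.01)

print(z, cutoff_05, reject_05)
print(z, cutoff_01, reject_01)
```

The same test statistic rejects at the 0.05 level but not at the 0.01 level, because the smaller level pushes the cut-off further into the tails.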
When the test level is not specified or is to be left to the reader, a p-value is reported. The p-value is the area left in the tails beyond the observed test statistic. It is also the smallest level at which the test would reject the null hypothesis (equivalently, the largest level at which it would not reject).
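A minimal sketch of a two-sided p-value for a Z-score, assuming the standard normal as the test distribution and an illustrative observed value of Z = 2.0. The function name is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Area in both tails of the standard normal beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_value = two_sided_p_value(2.0)

# A level-alpha test rejects when the p-value is at most alpha.
reject_at_05 = p_value <= 0.05
reject_at_01 = p_value <= 0.01

print(p_value, reject_at_05, reject_at_01)
```

A reader who prefers a 0.05 level would reject here, while a reader who prefers 0.01 would not.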
Confidence Interval
A confidence interval is an interval based on the estimator which is constructed to contain the estimand in some specified fraction of samples, the confidence level. Alternatively, the confidence interval for a proportion p is the interval containing all values of p0 for which a test of H0: p = p0 would not reject at level alpha. Confidence and alpha sum to 100%.
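Both descriptions can be sketched for a proportion. The example assumes made-up data (30 successes in 100 trials) and a 95% confidence level. The first interval is p-hat plus or minus a normal cut-off times the SE. The second scans a grid of candidate p0 values and keeps those that a two-sided Z test would not reject. The two intervals differ slightly because the test computes the standard deviation at p0 rather than at p-hat. The grid spacing of 0.001 is an arbitrary choice.

```python
import math
from statistics import NormalDist

x, n, alpha = 30, 100, 0.05  # illustrative data and level
z_star = NormalDist().inv_cdf(1 - alpha / 2)

# Interval built from the estimator: p-hat +/- z* x SE.
p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
estimator_interval = (p_hat - z_star * se, p_hat + z_star * se)

def not_rejected(p0):
    """True when a two-sided Z test of H0: p = p0 does not reject."""
    z = (x - n * p0) / math.sqrt(n * p0 * (1 - p0))
    return abs(z) <= z_star

# Interval built by inverting the test over a grid of p0 values.
grid = [i / 1000 for i in range(1, 1000)]
kept = [p0 for p0 in grid if not_rejected(p0)]
inverted_interval = (min(kept), max(kept))

print(estimator_interval)
print(inverted_interval)
```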