
STAT 242Q sec 02
ANALYSIS OF EXPERIMENTS, Fall 2007
UConn Storrs Campus, BOUS 160
MON WED 10:00-11:30
Eric Lundquist

[milk into tea] [Guinness Brewery]

Office: BOUS 136
Office Hours: Tue Thu 5:00-6:00, and by appointment
Phone: (860) 486-4084
E-mail: Eric.Lundquist@uconn.edu

Teaching Assistant: Carissa Gross
Office: BOUS 366
Office Hours: Wed 11:30-1:30
E-mail: Carissa.Gross@uconn.edu


READING:
  1. Keppel, Geoffrey & Wickens, Thomas D. (2004). Design and Analysis: A Researcher's Handbook, 4/E. Prentice Hall. ISBN-10: 0135159415 (ISBN-13: 9780135159415)
  2. On-Line Readings and Reserve Readings (see below)

GRADING:
   
  • Homework: 30%, assigned weekly
  • Midterm: 35%, WEDNESDAY OCTOBER 10, 10:00 AM
  • Final: 35%, WEDNESDAY DECEMBER 12, 10:00 AM


    TOPICS AND READING ASSIGNMENTS: to be updated throughout the semester
    KW = Keppel and Wickens

    CLASS SYLLABUS in Microsoft Word format, should you lose your original. This has been considerably modified in the schedule below.

    TOPIC READING
    Experimental Design KW Ch. 1 [basic issues and terminology]
    Data Description KW Ch. 2 pp. 15-18, 24-25; Ch. 3 pp. 32-34; Ch. 7 pp. 144-145 [histogram, scatterplot; central tendency, dispersion, standardization; normality, skewness and kurtosis]
    The t-test and confidence intervals Howell Ch. 7 [excellent treatment of the logic of the t-test, applied to the cases of a single sample mean, two related sample means, and two independent sample means; relation of t to z; confidence intervals described accurately on pp. 181-183]
    KW Ch. 3 pp. 34-36, Ch. 8 pp. 159-161
    and see my Notes on Confidence Intervals [references to Keith (2006) can be ignored, and the interpretation of confidence intervals for the regression coefficient "b" is the same as for the more familiar population mean "μ"]
    Null Hypothesis Significance Testing Howell Ch. 4 [excellent and up-to-date treatment of the logic and controversies of hypothesis testing, possibly more accessible than Cohen's (1994) paper]
    KW Ch. 2 pp. 18-22; Ch. 3 pp. 46-48; Ch. 8 pp. 167-169
    Cohen (1994) [criticism of Null Hypothesis Significance Testing]
    Wilkinson and APA Task Force (1999) [recommendations for treatment of data in light of NHST controversy]

    For your curiosity and your future as a researcher, but not for your exam:
    Howell Ch. 5 Excerpt on Bayes's Theorem [provides a brief accurate description of Bayes's Theorem]
    Cohen (1990) [general advice about treatment of data]
    Cowles & Davis (1982) [historical roots of the "p<.05" significance level]
    Between Subjects (Completely Randomized) Designs: One Factor KW Ch. 2 & 3, Ch. 8 pp. 161-162
    Logic Of ANOVA summary
    Effect Size and Power KW Ch. 8 pp. 163-167 (but not "Effect Sizes for Contrasts")
    Assumptions of ANOVA (and t-tests): The Linear Model KW Ch. 7
    MIDTERM REVIEW interim summary
    Previous years' STAT 242 midterms [note that some of these pages are out of order]
    Correlation KW Ch. 15 pp. 312-314
    r = cov_xy / (s_x * s_y), where cov_xy = SP_xy / (N-1), and SP_xy = Σ(X - M_x)(Y - M_y)
    Analytical Comparisons Among Means (Single-df Contrasts) KW Ch. 4 sec. 4.1 - 4.5
    Analytic Contrasts summary
    Controlling Type I Errors in Multiple Comparisons (Planned and Post-hoc) KW Ch. 6
    Trend Analysis KW Ch. 4 sec. 4.6 - 4.7; Ch. 5
    Between-Subjects (Completely Randomized) Designs: Two Factors KW Ch. 10 & 11
    Analyzing Interactions KW Ch. 12 & 13
    KW Ch. 14 pp. 303-307, 309-310: Nonorthogonality of the Effects, 14.3 Averaging of Groups and Individuals, and 14.5 Sensitivity to Assumptions (14.4 "Contrasts and Other Analytical Analyses" is optional, being a little heavy on notation for things you wouldn't really do by hand).
    Analysis of Covariance (ANCOVA) KW Ch. 15 pp. 311-312 [Aside from the analogy to post-hoc blocking (pp. 231-232), this chapter will be largely skipped in favor of a regression-based treatment of ANCOVA in the spring semester (STAT 379).]
    Three Factors and Higher Order Factorial Designs: Between-Subjects Designs KW Ch. 21 & 22
    Recognizing Higher Order Interactions From Graphs And Means Tables
    Repeated Measures (Within-Subjects) Designs: One Factor KW Ch. 16 & 17
    Expected Mean Squares (PDF): this topic isn't specific to Repeated Measures Designs, but this is the most obvious place to introduce it; here's a Microsoft Word version in case it's convenient for any reason.
    Repeated Measures (Within-Subjects) Designs: Two Factors KW Ch. 18
    Mixed Designs: One Between, One Repeated Factor KW Ch. 19 & 20
    Finding Sources of Variance (PDF): once you're dealing with combinations of different numbers of between and within factors, it's good to have a general scheme for identifying what the sources of variance are in a given design; here's a Microsoft Word version in case it's convenient for any reason.
    Three Factors and Higher Order Factorial Designs: Repeated Measures and Mixed Designs KW Ch. 23
    Random and Nested Factors KW Ch. 24 & 25
    but read only pp. 530-534!
    Previous years' STAT 242 finals
    Some questions from previous years' STAT 242 midterms are relevant to the final exam material listed above (e.g. contrasts, post-hoc testing, etc.); see in particular: 2004#3, 2003#1(a-d), 2002#3, 2001#2&3&4(b, if you consider factorial designs), 2000#2(b&c)&3
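
    The correlation formula listed under the Correlation topic above can be traced one step at a time. What follows is only a minimal sketch in Python: the function name and the five pairs of scores are made up for illustration and do not come from the text or the homework.

```python
import math

def pearson_r(x, y):
    """Pearson r built from the sum of products: r = cov_xy / (s_x * s_y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # SP_xy: sum of the cross-products of deviations from each mean
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    cov_xy = sp_xy / (n - 1)
    # The standard deviations use the same N - 1 denominator
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return cov_xy / (s_x * s_y)

# Hypothetical scores, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

    Since the N-1 appears in both the covariance and the two standard deviations, it cancels, which is why r can also be written as SP_xy divided by the square root of SS_x times SS_y.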

    NOTE ON TERMINOLOGY AND READING
  • For clarification, a completely between-subjects design is sometimes referred to as a "Completely Randomized" design when observations in each cell are all from different participants, randomly sampled from the population and randomly assigned to conditions. Of course, some designs are between-subjects, but do not use random assignment, e.g., in the case of quasi-experiments where gender is a factor. So "Between-Subjects" design is probably the preferable general term. At any rate, the opposite of "Between Subjects" (or of "Completely Randomized") is "Within Subjects" or "Repeated Measures" design. In BOTH Between and Within designs, we are usually dealing with FIXED effects -- not RANDOM effects. So don't misinterpret the phrase "Completely Randomized" as having any implications about whether you're using fixed or random EFFECTS.
  • Beginning around the halfway point in the text, Keppel and Wickens devote much space to detailed analyses of particular cases that are just as easily considered as parts of a general approach, and while the detail may serve you well when consulting the text as a handbook in the future, it's not that useful at the introductory level (note the last paragraph on p. 464). Case in point: there are two full chapters on three-way designs, but aside from the concept of the three-way interaction and how to read the three-way graphs, it's essentially a generalization of the two-way analyses already covered (see p. 507).
  • I recommend that in those later portions of the text you skim over the parts that describe computations: e.g., SS's using bracket terms, contrasts using Ψ's with complicated subscripts, standard errors of t's used to evaluate them. It's certainly preferable that you understand the computations and formalisms; it's just that we'll emphasize how you can combine various SPSS results to achieve the same result. But DO note the many conceptual points and useful recommendations that are offered throughout all the chapters. If you make this distinction successfully, you'll find there are many fewer pages you really need to attend to.
  • Note the error in the last full paragraph of p. 309 (on heterogeneity of variance with unequal sample sizes), where Keppel and Wickens write that "When the smaller groups are the ones with the larger variances, the tests are biased to give too many Type I errors, while when the larger groups have the smaller variances, the tests are biased to give too few Type I errors." First of all, this is a heads-I-win-tails-you-lose situation since clearly the two conditions described are the same: When the smaller groups are the ones with the larger variances, the larger groups MUST be the ones with the smaller variances. Ugh. And then you have to wonder if the silly phrase "too few errors" implies that we strive to make a certain number of errors. I'm pretty sure what they meant to say is that when larger groups have SMALLER variances, the weighted-averaged error variance MS_S/AB is biased toward being smaller than it should be, and F will be significant more often than it would be with an accurate larger error term, and thus Type I errors occur more than 5% of the time. When the larger groups have LARGER variances, the bias in computing the error term is toward a larger error MS_S/AB, which makes F less likely to be significant than it really should be -- which is not a case of making "too few Type I errors" (the rate is now less than 5% but really, the fewer the better), but of the complementary problem of making too many Type II errors (finding a non-significant F when the difference is really there). When they say "too few Type I errors" they really just mean α has effectively been lowered.
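
    The direction of the bias described in the last note can be seen with a little arithmetic. This is a rough sketch only: the group sizes and variances are invented, and using the unweighted average of the variances as the reference point is my simplifying assumption for illustration, not something taken from Keppel and Wickens. It shows which way the pooled error term moves; it says nothing about the actual Type I error rates.

```python
def pooled_error_variance(ns, variances):
    """Weight each group's variance by its df (n - 1), as the pooled error term does."""
    dfs = [n - 1 for n in ns]
    return sum(df * v for df, v in zip(dfs, variances)) / sum(dfs)

def unweighted_mean_variance(variances):
    """Plain average of the group variances, ignoring sample size."""
    return sum(variances) / len(variances)

# Hypothetical design: one small group (n = 5) and one large group (n = 20)
ns = [5, 20]

# Larger group has the SMALLER variance: the error term is pulled down
small_error = pooled_error_variance(ns, [16.0, 4.0])
# Larger group has the LARGER variance: the error term is pulled up
large_error = pooled_error_variance(ns, [4.0, 16.0])
reference = unweighted_mean_variance([16.0, 4.0])

print(round(small_error, 3), reference, round(large_error, 3))
```

    With these made-up numbers the pooled error term is about 6.09 when the large group has the small variance and about 13.91 when it has the large one, against an unweighted average of 10. A smaller denominator means a bigger F, and a bigger denominator means a smaller F.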


    HOMEWORK ASSIGNMENTS: to be updated throughout the semester
    1. HW1 due 9/5; SPSS formatted data available here
      • Comments:
    2. HW2 due THURSDAY 9/13; SPSS formatted data available here
      • Comments:
    3. HW3 due 9/19
      • Comments: Three of these questions have been covered already, and three will be addressed in Monday's lecture. Reading Howell Ch. 4 (linked above) should give you all you need to complete the assignment. Note that Homework 2 asked you about confidence intervals, and you're asked about them again here. That's because my guess is that whatever you've absorbed about confidence intervals in the past is probably wrong (nothing personal, it's endemic to the field!) and Homework 2 let you take a stab at figuring out what they really mean. But between Howell's Ch. 7 (which you've read) and our discussion on Monday, I think you'll have it right for this homework.
    4. HW4 due 9/26; SPSS formatted data available here
      • Comments:
        Here is some SPSS translation that you either understand already or don't need for this homework, but which may be helpful to know about in the long run:
      • Note that SPSS gives different names to your Sources of Variance in the output: A = "group", S/A = "error". As we'll soon see, the sum of those two gives a Total for both SS and df, and the Total is listed in the output not as just plain "Total", but as "Corrected Total"!
        • The way those labels work is something like this. The "corrected model" row refers to the total of all the factors present in your experiment. For now we have only one factor (A) so that IS the whole model, thus the rows for "corrected model" and "group" have the same information. Soon enough we will also have a second factor (B) and its interaction with the first (A*B), and then the "corrected model" will refer to the three of those effects summed together, and each will be listed separately in its own row in place of the sole factor we now have called "group".
        • In SPSS, the so-called "total" SS (which is NOT the Total we're interested in!) computes the SS around an origin of zero, rather than around the grand mean of all the scores, and its degrees of freedom is the total number of observations. The "corrected total" (the one we ARE interested in!) finds the SS around the grand mean, which is after all an estimate of the population mean, and you may remember that in estimating a parameter from the data we lose a degree of freedom. And indeed, the df for the "corrected total" is the number of observations minus 1. You may think only the "corrected total" makes any sense - who bothers finding the "sum of squared deviations from zero" instead of "... from the mean"? And I agree with you completely, but read on...
        • The "intercept" represents the grand mean of all the observations, i.e., your estimate of the population grand mean, and it will almost always be highly significant, and will always have df = 1: that's the 1 df that you lost above by estimating the population grand mean from your data. What significance test are you doing on it? You're testing whether it's different from 0! Who knows why. Read p. 37 of Keppel and Wickens, you'll see that it's not especially useful or interesting, it's just there for some reason. This seems nonsensical to me, but... It has SS = 135.809 because if your grand mean were, say, 2.1277 (which it is on this homework), its squared deviation from a hypothesized mean of 0 would be 2.1277 squared or 4.5270, and if you summed that number over all 30 of your observations, well it'd be the same for each of them - there's only one grand mean so the 2.1277 and the 0 are the same for everyone. And 4.5270 x 30 = 135.81. Voila! - and no one cares. But there it is (which is pretty much what "voila" means in the first place; those of you who think the word is "viola" are beyond help). Notice that if you add the "intercept" SS and df to the "corrected total" SS and df, you get what SPSS labels the "total" SS and df.
        • Bottom line: it's the "corrected total" you'll care about all semester, so ignore the "intercept" and the (uncorrected) "total".
      • Why do we call the within-groups variance (S/A) effect the "error"? That's because it's the denominator of the F ratio, representing the experimental error (individual differences, measurement error, etc.), which is the variability present among subjects who have all received the same treatment but still differ from each other. In more complicated designs the "error" term will not always be S/A; in fact, we will use different error terms to test different effects within the same experiment. Fun stuff.
      • The vertical axis of your means plot is labeled "Estimated Marginal Means", which you should just read as saying "the means of the groups"!
      • The output column labeled "Type III Sums Of Squares" is indeed your SS for each effect; why it's called "Type III" is best saved for STAT 379, though I'll be happy to share before then if you like. Let it be said that Type I SS may be of interest, but you will rarely if ever encounter Type II and Type IV. Don't worry, they're still calculated the same, it's just which data they're calculated from that might differ. For now, don't even think about it at all, just recognize that Type III is what we do here.
      • The R-squared value is printed underneath the "tests of between subjects effects" output box, and there's also something called "adjusted R-squared". The latter is an estimate of the population value that we will ignore until we look at R-squared in multiple regression next semester, at which point all will become clear.
    5. HW5 due MONDAY 10/8; SPSS formatted data available here AND here (you need BOTH HW5Af07.sav and HW5Bf07.sav!); the power analysis program GPower 3 is available here.
      • Comments:
    6. HW6 due 10/24; SPSS formatted data available here
    7. HW7 due FRIDAY 11/2; SPSS formatted data available here
      • Comments:
    8. HW8 due 11/14; SPSS formatted data available here
      • Comments:
    9. HW9 due FRIDAY 11/30; see Recognizing Higher Order Interactions From Graphs And Means Tables
      • Comments:
    10. HW10 due FRIDAY 12/7; SPSS formatted data available here
      • Comments:
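
    The SPSS labels discussed in the HW4 comments above can be checked by hand on a tiny data set. This is a sketch under stated assumptions: the three groups of three scores are invented (they are NOT the homework data), and the function and dictionary keys are just my names chosen to mirror the rows of the SPSS output. Only the last two lines use numbers quoted in the HW4 comments.

```python
def one_way_anova(groups):
    """Sums of squares for a one-factor between-subjects design,
    keyed by the labels SPSS prints in its output."""
    scores = [y for g in groups for y in g]
    n_total = len(scores)
    grand_mean = sum(scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # A, labeled "group" (same as "Corrected Model" with a single factor)
    ss_group = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))
    # S/A, labeled "Error": scores around their own group mean
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)
    df_group = len(groups) - 1
    df_error = n_total - len(groups)

    return {
        "Intercept": n_total * grand_mean ** 2,      # grand mean vs. zero
        "group": ss_group,
        "Error": ss_error,
        "Total": sum(y ** 2 for y in scores),        # SS around zero
        "Corrected Total": sum((y - grand_mean) ** 2 for y in scores),
        "F": (ss_group / df_group) / (ss_error / df_error),
        "R Squared": ss_group / (ss_group + ss_error),
    }

# Hypothetical scores: 3 groups, n = 3 each
groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
table = one_way_anova(groups)
for label, value in table.items():
    print(f"{label:16s}{value:8.3f}")

# The arithmetic quoted in the HW4 comments: N = 30, grand mean = 2.1277
hw4_intercept_ss = 30 * 2.1277 ** 2
print(round(hw4_intercept_ss, 2))
```

    In this made-up example "group" (14) plus "Error" (6) gives the "Corrected Total" (20), and "Intercept" (100) plus "Corrected Total" gives the uncorrected "Total" (120), which is the same bookkeeping described in the comments.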


    NOTES AND RESOURCES

    Plato (and Greek Philosophy from origins to Aristotle): from Thomas Leahey's textbook on the history of psychology. Note Plato's emphasis on the abstract and universal as being part of an ideal realm that can only be comprehended by the mind (soul), not the senses. Then consider the quite abstract notion of "population" in statistics. It's also interesting to ponder our characterization of all observations as deviations from an ideal (represented by the "mean") that may not even ever actually be observed -- hence the assumption that individual differences represent "error." Statistics and psychology have some pronounced Platonist strains.

    Deriving the estimate of the standard error of the mean: something you don't need to be able to do at all but may be curious about, and if you are, it's explained clearly in section 10.17 of this text by Glass and Hopkins.

    Why the sample variance has a denominator of N-1 instead of N: a proof that dividing the sample sum of squares by N-1 instead of N gives an unbiased estimate (i.e. accurate in the long-run average) of the population variance. This is purely for the mathematically inclined -- others should steer clear. (Believe it or not, I've seen other proofs that are more complicated and thus probably more thorough.) The "expectation" operator notated as E(X) means roughly the long-run average of X or the mean of all X's in the population, but note that doesn't necessarily indicate a mean of some score -- X could be a variance for instance, and then E(X) would be the population value of that variance, as it is in this proof. If that helps clear anything up.
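
    For those who would rather see the N-1 result than prove it, here is a small brute-force check. It is a sketch with an invented four-score "population", and it assumes sampling with replacement so that every possible sample of size 3 can be listed and averaged exactly; the function names are mine.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    """Exact mean using fractions, so no rounding error creeps in."""
    return sum(values, Fraction(0)) / len(values)

def average_variance_estimate(population, n, denominator_offset):
    """Average SS / (n - offset) over every possible ordered sample of size n
    drawn with replacement, i.e. the exact long-run average E(estimate)."""
    estimates = []
    for sample in product(population, repeat=n):
        m = mean(sample)
        ss = sum((Fraction(x) - m) ** 2 for x in sample)
        estimates.append(ss / (n - denominator_offset))
    return mean(estimates)

population = [1, 2, 3, 4]  # hypothetical tiny population
mu = mean(population)
sigma_sq = mean([(Fraction(x) - mu) ** 2 for x in population])

with_n_minus_1 = average_variance_estimate(population, n=3, denominator_offset=1)
with_n = average_variance_estimate(population, n=3, denominator_offset=0)

print(sigma_sq, with_n_minus_1, with_n)
```

    Dividing by N-1 averages out to exactly the population variance (5/4 here), while dividing by N averages out to something smaller (5/6), which is the bias the proof is about.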

    Confidence Intervals in Howell ch. 7 pp. 181-183
    Notes on the meaning and interpretation of Confidence Intervals: Howell's discussion is very good, so the somewhat lengthy little essay that I've included here is more than I intended to write; still, it may be helpful to hear it expressed in more than one way.

    Bayes's Theorem article in Wikipedia: I'm pretty sure it's legitimate to phrase the theorem this way: The probability of A being true given that B is true is equal to the probability that B actually does occur due to A, divided by the probability that B actually does occur due to any possible reason it might occur -- that is, that B occurs at all under any circumstances. This denominator is sometimes expressed as the sum of two other probabilities: that B occurs due to A, and that B occurs due to every reason other than A, which do in fact account for all occurrences of B since "A and not-A" pretty much covers every possible reason for B. You can substitute the observations of interest into this formula: A = a hypothesis being true, and B = data bearing on that hypothesis. Examples listed on this link are pretty illuminating, if you follow them closely. The trick with Bayesian statistics is coming up with those probabilities that are the ingredients in the formula, e.g., of B occurring due to any possible reason -- it's educated guesswork at best (which can be pretty good after all).
    Bayes's Theorem excerpt from Howell ch. 5: a very good basic treatment.
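
    The verbal version of Bayes's Theorem given above translates directly into a few lines of Python. This is just a sketch: the function name is mine, and the three input probabilities are invented for illustration (coming up with real ones is, as noted, the hard part).

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A | B): probability the hypothesis A is true given the data B."""
    # B occurs "due to A": A is true AND B is observed
    b_and_a = p_b_given_a * prior_a
    # B occurs for every reason other than A
    b_and_not_a = p_b_given_not_a * (1 - prior_a)
    # Denominator: B occurs at all, under any circumstances
    return b_and_a / (b_and_a + b_and_not_a)

# Hypothetical ingredients: A is rare, B is likely under A, unlikely otherwise
p = posterior(prior_a=0.01, p_b_given_a=0.90, p_b_given_not_a=0.05)
print(round(p, 3))
```

    With these made-up numbers the data are 18 times more likely under A than under not-A, yet the probability of A given B is only about .15, because A was so improbable to begin with.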

    Logic Of ANOVA summary

    Understanding ANOVA Visually: a fun bit of Flash animation; related teaching tools are listed at http://www.psych.utah.edu/learn/statsampler.html

    Statistical Power Applet: a visual demonstration of the relations among the various quantities related to power.

    G*Power Home Page: free software for power calculations.

    Correlation article in Wikipedia: whether or not the math explained here is of interest (correlations as cosines, etc.), the two images depicting sets of scatterplots are very important to understand.

    Analytic Contrasts summary

    Keppel's ANOVA notation system (PDF)
    Keppel's ANOVA notation system (Microsoft Word)
    This is a handy summary of how to compute Degrees of Freedom for any Source of Variance. Keppel and Wickens (2004) use an ANOVA notation system that provides a simple way to compute Sums Of Squares: by converting Sources of Variance into Degrees of Freedom, and then into a combination of "bracketed" quantities, where the brackets indicate some further adding and dividing. But since no one in their right mind computes Sums Of Squares by hand, the only remaining useful part of this page is the part describing how to get Degrees of Freedom. That is quite useful though.
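
    The df part can be turned into a small function. Be warned that the rule coded here is my own statement of the usual convention, not a quotation from the linked summary: each factor to the left of the slash contributes (levels - 1), each factor to the right of the slash contributes its number of levels, and everything is multiplied together. The string format ("AxB", "S/AB"), the function name, and the example design are all assumptions made for illustration.

```python
from math import prod

def degrees_of_freedom(source, levels):
    """df for a source of variance written like "A", "AxB", or "S/AB".

    Assumes single-letter factor names (other than "x"), with "x" marking a
    crossing and "/" meaning "nested within". levels maps each letter to its
    number of levels; for S that is the number of subjects per cell.
    """
    crossed, _, nested_in = source.partition("/")
    # Factors left of the slash each contribute (levels - 1)
    df_crossed = prod(levels[f] - 1 for f in crossed.split("x"))
    # Factors right of the slash each contribute their full number of levels
    df_nesting = prod(levels[f] for f in nested_in)
    return df_crossed * df_nesting

# Hypothetical 3 x 2 between-subjects design with n = 5 per cell (N = 30)
levels = {"A": 3, "B": 2, "S": 5}
sources = ["A", "B", "AxB", "S/AB"]
dfs = {s: degrees_of_freedom(s, levels) for s in sources}
print(dfs, sum(dfs.values()))
```

    A useful sanity check is that the df for all the sources should add up to the total number of observations minus 1, here 2 + 1 + 2 + 24 = 29.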

    Recognizing Higher Order Interactions From Graphs And Means Tables

    Finding Sources of Variance (PDF)
    Finding Sources of Variance (Microsoft Word)

    Expected Mean Squares (PDF)
    Expected Mean Squares (Microsoft Word)

    Excel spreadsheet for calculating values of the z, t, F, and chi-square distributions and their probabilities

    Table of Selected Values of the t Distribution:

  • In the absence of SPSS, Excel (TDIST and TINV functions), or other relevant software, use this table to find the value of t that cuts off a certain percentage of the area under the curve, which corresponds to the probability of obtaining a t of that size or larger. Since t is symmetric it doesn't matter whether it's positive or negative (i.e., whether it's in the upper or lower tail); all that counts is the absolute value which represents the obtained score's distance from the null hypothesis value in units of estimated standard errors -- analogous to a z-score which uses KNOWN standard errors or standard deviations as its units. The many curves representing the t distribution differ depending on the degrees of freedom or df, with few df giving a curve that is flatter with longer tails than the standard normal distribution (or z distribution); with more and more df, the t distribution looks more and more like the z distribution. (Note that with infinite df, which means an infinite sample size, the values for t are identical to those you'd find in the z distribution.)
  • Read the row corresponding to the correct df: for analyzing means the df are n-1 for a single sample, and for a 2 sample means comparison the df are the sum of each sample's df (or N-2, where N is the total number of observations from both groups). In correlation and regression the df are the number of observations minus the number of predictors, minus 1 (or N-k-1). The commonly used proportions listed in this version of the table are conveniently identified by two different column headings, based on whether you want the proportion of interest to be located entirely in one tail, or split between the upper and lower tails. See the diagram accompanying the table to clarify this. ALWAYS use the two-tailed version, and thus the headings under "proportion in two tails combined" -- so the 1 df value for p=.05 is 12.706, not 6.314. (One-tailed tests of so-called "directional hypotheses" map p-values onto smaller required values of t, making it easier to declare results significant, but this procedure has always been controversial and I rarely see a situation that legitimately calls for it. How often is it really the case that one group's mean MUST be higher than the other's, and it's inconceivable that their sizes could be reversed?) As an example, the t value for the p<.01 cutoff for the difference between the means of two samples of size n=10 would be 2.878. The df would be (10-1) + (10-1) = 18, and the appropriate column would be the one under 0.01 as you read the "proportion in two tails combined" headings. If your obtained t is larger than 2.878 then it clearly cuts off an even smaller proportion of the area than .01, and thus you can say the t you obtained has p<.01. (Any statistical software will tell you precisely what the p-value for your t actually is.)
  • Note that if the particular df you're looking for don't appear in the table, you should use the next LOWER df -- do NOT round df UP even if that higher df value is closer to yours. Another table with more values included appears here, and many more are available on the web. Many of these, for instance this one, will give the complementary proportion of the area for values SMALLER than t, and will do so only for one tail -- thus to find the example value of 2.878 you'd have to look for 18 df and then the 99.5% cutoff value, because p=.01 corresponds to a total of 1% of the area being more extreme and you have to split that 1% into 0.5% in the upper tail and 0.5% in the lower.
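
    The table values quoted above can be checked without any statistics package by measuring the area under the t curve directly. This is a sketch only: it uses the standard formula for the t density and plain numerical integration (Simpson's rule), and the function names and the number of integration steps are my choices. Real software does this more precisely and more quickly.

```python
import math

def t_density(t, df):
    """Height of the t distribution curve at t for the given df."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=20000):
    """Proportion of area in both tails combined beyond |t|.

    Integrates the density from 0 to |t| with Simpson's rule (steps must be
    even), doubles it by symmetry, and subtracts from the total area of 1.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_density(0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

# Values quoted in the notes above
print(round(two_tailed_p(2.878, 18), 3))   # 18 df, two-tailed .01 cutoff
print(round(two_tailed_p(12.706, 1), 3))   # 1 df, two-tailed .05 cutoff
print(round(two_tailed_p(6.314, 1), 3))    # 1 df, the one-tailed .05 value
```

    The last line shows why the one-tailed value is the easier hurdle: 6.314 leaves .05 in EACH tail with 1 df, so it corresponds to a two-tailed proportion of .10.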

    Table of Selected Values of the F Distribution:

  • In the absence of SPSS, Excel, or other relevant software, use this table to find the value of F that cuts off a certain percentage of the area under the curve, which corresponds to the probability of obtaining an F of that size or larger. The F distribution has only one tail to consider, in the sense that the extreme values of interest are UPPER values only. The distribution's shape differs according to both the number of groups (or predictors) being analyzed, and the number of observations being made, and so picking out the relevant member of the family of F distributions requires two numbers specifying its df (one for the numerator df and one for the denominator df). Reproducing all the percentage cutoff points for the area under the curve (corresponding to the probabilities) for all possible combinations of these df would be very unwieldy. Thus only the most common cutoff values -- 5%, 10%, and 1% -- are included in this version of the table. They are organized such that the columns represent different numerator df up to 20 (appropriate for 21 group means in ANOVA, or 20 predictor variables in regression, which should be plenty), and the rows represent all values of the denominator df from 1 to 100.
  • Consulting the section of the table appropriate for the p-value you wish to examine, you find the row and column corresponding to your numerator and denominator df and the value at that entry is the upper "critical value": the value of F beyond which the given percentage of the area under the curve is cut off. For instance, the value for the p<.01 cutoff for the difference between the means of two samples of size n=10 would be 8.285. Familiarity with ANOVA df would make it apparent that the numerator df would be [number of groups] - 1 = 2-1 = 1, and the denominator df would be the sum of the df within each group, or (10-1) + (10-1) = 18. The entry in the p=.01 portion of the table under numerator df (called "ν1") and denominator df (called "ν2") is 8.285, meaning that for those df the area under the curve beyond the value of 8.285 on the horizontal axis is 1% of the total, and the probability of randomly sampling scores that lead to that high an F value when there is no difference between the population means is 1%. If your obtained F is larger than 8.285 then it clearly cuts off an even smaller proportion of the area than .01, and thus you can say the F you obtained has p<.01. (Any statistical software will tell you precisely what the p-value for your F actually is.)
  • For 2 groups, either F or t can be used to yield exactly the same probability; in comparing just two groups the numerator df will always be 1 and the denominator df will be the same as the df for t. F then is the square of t -- that is, within rounding error, 8.285 is the square of 2.878.
  • Note that if the particular numerator and/or denominator df you're looking for don't appear in the table, you should use the next LOWER df -- do NOT round df UP even if that higher df value is closer to yours. A printable pdf version of the F distribution table for p=.05 and p=.01 values with numerator df up to 10 and all denominator df up to 100 is here. More versions of tables for F and other distributions appear here and at various other easily located web sites. Many web pages such as this one will calculate a p-value for any given F and df, and others will calculate F given df and a p-value, etc. But if you have access to the internet, chances are you also have access to Excel which will do the same with its FDIST, FINV, TDIST, and TINV functions, etc., or SPSS which displays all p-values for its analyses automatically.
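
    The claim above that F is the square of t when there are only two groups is easy to verify on actual scores. This is a minimal sketch with two invented groups of five scores; the function names are mine, and the t is the ordinary pooled-variance t for independent samples.

```python
import math

def pooled_t(g1, g2):
    """Independent-samples t using the pooled variance estimate."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss1 = sum((y - m1) ** 2 for y in g1)
    ss2 = sum((y - m2) ** 2 for y in g2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / std_error

def one_way_f(groups):
    """F ratio for a one-factor between-subjects ANOVA: MS_A / MS_S/A."""
    scores = [y for g in groups for y in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_a = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_a = len(groups) - 1
    df_error = len(scores) - len(groups)
    return (ss_a / df_a) / (ss_error / df_error)

# Hypothetical scores, for illustration only
group1 = [3, 5, 4, 6, 2]
group2 = [6, 8, 7, 9, 5]
t = pooled_t(group1, group2)
f = one_way_f([group1, group2])
print(t, f, t ** 2)

# The two table entries quoted above, compared the same way
print(round(2.878 ** 2, 3), 8.285)
```

    For these made-up scores t is -3 and F is 9. The two table entries agree in the same way, within rounding: 2.878 squared is about 8.283.
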
    Supplemental readings in statistics and psychology:

  • Some useful papers:
  • Gravetter, F. J., & Wallnau, L. B. (2006). Statistics for the Behavioral Sciences (7th ed.). Belmont, CA: Wadsworth/Thomson: a very clear introductory level statistics text.
  • Howell, David C. (2007). Statistical Methods for Psychology (6th Ed.). Thomson-Wadsworth. (ISBN-10: 0495012874; ISBN-13: 9780495012870): an introductory text of exceptional clarity and accuracy, for the grad or advanced undergrad level.
  • Keith, Timothy Z. (2006). Multiple Regression and Beyond. Allyn & Bacon. ISBN: 0205326447 (ISBN-13: 9780205326440): used for STAT 379 Spring 2007/2008.
  • Grimm, Lawrence G. and Yarnold, Paul R., eds. (1994). Reading and Understanding Multivariate Statistics. APA. (ISBN: 1-55798-273-2; ISBN-13: 978-1-55798-273-5): used for STAT 379 Spring 2007/2008.
  • Grimm, Lawrence G. and Yarnold, Paul R., eds. (2000). Reading and Understanding MORE Multivariate Statistics. APA. (ISBN: 1-55798-698-3; ISBN 13: 978-1-55798-698-6): companion volume to the 1994 book.
  • Pedhazur, Elazar J. (1997). Multiple Regression in Behavioral Research (3rd Ed.) Thomson-Wadsworth. (ISBN-10: 0030728312; ISBN-13: 9780030728310): an advanced text and one of the best references on multiple regression and related procedures.
  • Keppel, Geoffrey & Wickens, Thomas D. (2004). Design and Analysis: A Researcher's Handbook, 4/E. Prentice Hall. ISBN-10: 0135159415 (ISBN-13: 9780135159415): used for STAT 242 Fall 2007.
  • Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Erlbaum. (ISBN/ISSN: 0-8058-3718-3; ISBN-13: 978-0-8058-3718-6): an advanced text on experimental design and ANOVA.


    Some important figures in the history of statistics:

  • Abraham De Moivre around 1730 derived the normal distribution as the limit of the binomial distribution as the number of binary events (e.g., coin tosses) grows without bound.
  • Johann Carl Friedrich Gauss often gets credit for discovering the normal distribution because in 1809 he proved that it described errors of measurement (in astronomy, etc.), which is why the normal distribution is sometimes called the Gaussian distribution.
  • Adolphe Quetelet in 1835 first applied the normal distribution to biological and behavioral traits rather than merely to measurement error, describing the concept of "the average man"; he also invented the Quetelet Index which today we usually refer to as the Body Mass Index (BMI).
  • Francis Galton invented the concepts of correlation and regression around 1886. He also read and wrote at age 2-1/2, went ballooning and did experiments with electricity for fun, mapped previously unexplored African territories, taught soldiers camping procedures and how to deal with wild animals and "savages," tried to objectively determine which part of Britain had the most attractive women, studied the efficacy of prayer empirically, observed the amount of fidgeting at scientific lectures to measure the degree of boredom, invented fingerprinting and weather maps along with the meteorological terms "highs," "lows," and "fronts," coined the phrase "nature and nurture," and pioneered mental testing, twin studies of heritability, the composite photograph, the study of mental imagery, the free-association technique for probing unconscious thought processes, the psychological survey questionnaire, and... umm... eugenics. Oops.
  • Karl Pearson founded modern statistics beginning in the 1890's, inventing the chi-square distribution and test and coining the term "standard deviation" among others; he formalized the calculation of the correlation coefficient (where Galton had arrived at it graphically) and so that calculation bears his name today.
  • George Udny Yule worked on the concepts and mathematics of partial correlation and regression in the 1890's, making multiple regression as we know it possible.
  • William Sealy Gosset in 1908 worked out the distribution of sample means ("standard error" in modern terminology) for cases where the population standard deviation is unknown -- hence he is the inventor of the t-test.
  • Ronald Fisher was a key figure in bridging the gap between the Darwinian theory of natural selection and its underlying mechanism of Mendelian genetics; from about 1915 onwards he also invented experimental design as we know it today, and developed Analysis Of Variance (ANOVA) as a generalization of Gosset's work to more than two groups (Snedecor in his influential early textbook named the 'F' statistic for Fisher).
  • Jerzy Neyman and Egon Pearson (son of Karl) invented and refined many of the concepts of null hypothesis significance testing in the 1930's (e.g. the alternative hypothesis, power, Type II error, confidence intervals), though Fisher had a constant ongoing argument with everything they did -- mainly because it wasn't the way HE did it.


    If you're wondering about classes being canceled due to weather, see http://news.uconn.edu/emergency_closings.php or call 486-3768.