About the Book
About the Author: Richard D. De Veaux is an internationally known educator and consultant. He has taught at the Wharton School and the Princeton University School of Engineering, where he won a "Lifetime Award for Dedication and Excellence in Teaching." He is the C. Carlisle and M. Tippit Professor of Statistics at Williams College, where he has taught since 1994. Dick has won both the Wilcoxon and Shewell awards from the American Society for Quality. He is a fellow of the American Statistical Association (ASA) and an elected member of the International Statistical Institute (ISI). In 2008, he was named Statistician of the Year by the Boston Chapter of the ASA. Dick is also well known in industry, where for more than 30 years he has consulted for such Fortune 500 companies as American Express, Hewlett-Packard, Alcoa, DuPont, Pillsbury, General Electric, and Chemical Bank. Because he consulted with Mickey Hart on his book Planet Drum, he has also sometimes been called the "Official Statistician for the Grateful Dead." His real-world experiences and anecdotes illustrate many of this book's chapters.
Dick holds degrees from Princeton University in Civil Engineering (B.S.E.) and Mathematics (A.B.) and from Stanford University in Dance Education (M.A.) and Statistics (Ph.D.), where he studied dance with Inga Weiss and Statistics with Persi Diaconis. His research focuses on the analysis of large data sets and data mining in science and industry.
In his spare time, he is an avid cyclist and swimmer. He also is the founder of the "Diminished Faculty," an a cappella Doo-Wop quartet at Williams College, and sings bass in the college concert choir and with the Choeur Vittoria of Paris. Dick is the father of four children.
Paul F. Velleman has an international reputation for innovative Statistics education. He is the author and designer of the multimedia Statistics program ActivStats, for which he was awarded the EDUCOM Medal for innovative uses of computers in teaching statistics, and the ICTCM Award for Innovation in Using Technology in College Mathematics. He also developed the award-winning statistics program Data Desk, and the Internet site Data and Story Library (DASL) (ASL.datadesk.com), which provides data sets for teaching Statistics. Paul's understanding of using and teaching with technology informs much of this book's approach.
Paul has taught Statistics at Cornell University since 1975, where he was awarded the MacIntyre Award for Exemplary Teaching. He holds an A.B. from Dartmouth College in Mathematics and Social Science, and M.S. and Ph.D. degrees in Statistics from Princeton University, where he studied with John Tukey. His research often deals with statistical graphics and data analysis methods. Paul co-authored (with David Hoaglin) ABCs of Exploratory Data Analysis. Paul is a Fellow of the American Statistical Association and of the American Association for the Advancement of Science. Paul is the father of two boys.
David E. Bock taught mathematics at Ithaca High School for 35 years. He has taught Statistics at Ithaca High School, Tompkins-Cortland Community College, Ithaca College, and Cornell University. Dave has won numerous teaching awards, including the MAA's Edyth May Sliffe Award for Distinguished High School Mathematics Teaching (twice) and Cornell University's Outstanding Educator Award (three times), and he has been a finalist for New York State Teacher of the Year.
Dave holds degrees from the University at Albany in Mathematics (B.A.) and Statistics/Education (M.S.). Dave has been a reader and table leader for the AP Statistics exam, serves as a Statistics consultant to the College Board, and leads workshops and institutes for AP Statistics teachers. He has served as K-12 Education and Outreach Coordinator and a senior lecturer for the Mathematics Department at Cornell University. His understanding of how students learn informs much of this book's approach.
Dave and his wife relax by biking or hiking, spending much of their free time in Canada, the Rockies, or the Blue Ridge Mountains. They have a son, a daughter, and four grandchildren.
Table of Contents
I. EXPLORING AND UNDERSTANDING DATA
1. Stats Starts Here
1.1 What Is Statistics?
1.2 Data
1.3 Variables
1.4 Models
2. Displaying and Describing Data
2.1 Summarizing and Displaying a Categorical Variable
2.2 Displaying a Quantitative Variable
2.3 Shape
2.4 Center
2.5 Spread
3. Relationships Between Categorical Variables–Contingency Tables
3.1 Contingency Tables
3.2 Conditional Distributions
3.3 Displaying Contingency Tables
3.4 Three Categorical Variables
4. Understanding and Comparing Distributions
4.1 Displays for Comparing Groups
4.2 Outliers
4.3 Re-Expressing Data: A First Look
5. The Standard Deviation as a Ruler and the Normal Model
5.1 Using the Standard Deviation to Standardize Values
5.2 Shifting and Scaling
5.3 Normal Models
5.4 Working with Normal Percentiles
5.5 Normal Probability Plots
Review of Part I: Exploring and Understanding Data
II. EXPLORING RELATIONSHIPS BETWEEN VARIABLES
6. Scatterplots, Association, and Correlation
6.1 Scatterplots
6.2 Correlation
6.3 Warning: Correlation ≠ Causation
6.4 Straightening Scatterplots
7. Linear Regression
7.1 Least Squares: The Line of “Best Fit”
7.2 The Linear Model
7.3 Finding the Least Squares Line
7.4 Regression to the Mean
7.5 Examining the Residuals
7.6 R²: The Variation Accounted for by the Model
7.7 Regression Assumptions and Conditions
8. Regression Wisdom
8.1 Examining Residuals
8.2 Extrapolation: Reaching Beyond the Data
8.3 Outliers, Leverage, and Influence
8.4 Lurking Variables and Causation
8.5 Working with Summary Values
8.6 Straightening Scatterplots: The Three Goals
8.7 Finding a Good Re-Expression
9. Multiple Regression
9.1 What Is Multiple Regression?
9.2 Interpreting Multiple Regression Coefficients
9.3 The Multiple Regression Model: Assumptions and Conditions
9.4 Partial Regression Plots
9.5 Indicator Variables
Review of Part II: Exploring Relationships Between Variables
III. GATHERING DATA
10. Sample Surveys
10.1 The Three Big Ideas of Sampling
10.2 Populations and Parameters
10.3 Simple Random Samples
10.4 Other Sampling Designs
10.5 From the Population to the Sample: You Can't Always Get What You Want
10.6 The Valid Survey
10.7 Common Sampling Mistakes, or How to Sample Badly
11. Experiments and Observational Studies
11.1 Observational Studies
11.2 Randomized, Comparative Experiments
11.3 The Four Principles of Experimental Design
11.4 Control Groups
11.5 Blocking
11.6 Confounding
Review of Part III: Gathering Data
IV. RANDOMNESS AND PROBABILITY
12. From Randomness to Probability
12.1 Random Phenomena
12.2 Modeling Probability
12.3 Formal Probability
13. Probability Rules!
13.1 The General Addition Rule
13.2 Conditional Probability and the General Multiplication Rule
13.3 Independence
13.4 Picturing Probability: Tables, Venn Diagrams, and Trees
13.5 Reversing the Conditioning and Bayes' Rule
14. Random Variables
14.1 Center: The Expected Value
14.2 Spread: The Standard Deviation
14.3 Shifting and Combining Random Variables
14.4 Continuous Random Variables
15. Probability Models
15.1 Bernoulli Trials
15.2 The Geometric Model
15.3 The Binomial Model
15.4 Approximating the Binomial with a Normal Model
15.5 The Continuity Correction
15.6 The Poisson Model
15.7 Other Continuous Random Variables: The Uniform and the Exponential
Review of Part IV: Randomness and Probability
V. INFERENCE FOR ONE PARAMETER
16. Sampling Distribution Models and Confidence Intervals for Proportions
16.1 The Sampling Distribution Model for a Proportion
16.2 When Does the Normal Model Work? Assumptions and Conditions
16.3 A Confidence Interval for a Proportion
16.4 Interpreting Confidence Intervals: What Does 95% Confidence Really Mean?
16.5 Margin of Error: Certainty vs. Precision
16.6 Choosing the Sample Size
17. Confidence Intervals for Means
17.1 The Central Limit Theorem
17.2 A Confidence Interval for the Mean
17.3 Interpreting Confidence Intervals
17.4 Picking Our Interval up by Our Bootstraps
17.5 Thoughts About Confidence Intervals
18. Testing Hypotheses
18.1 Hypotheses
18.2 P-Values
18.3 The Reasoning of Hypothesis Testing
18.4 A Hypothesis Test for the Mean
18.5 Intervals and Tests
18.6 P-Values and Decisions: What to Tell About a Hypothesis Test
19. More About Tests and Intervals
19.1 Interpreting P-Values
19.2 Alpha Levels and Critical Values
19.3 Practical vs. Statistical Significance
19.4 Errors
Review of Part V: Inference for One Parameter
VI. INFERENCE FOR RELATIONSHIPS
20. Comparing Groups
20.1 A Confidence Interval for the Difference Between Two Proportions
20.2 Assumptions and Conditions for Comparing Proportions
20.3 The Two-Sample z-Test: Testing for the Difference Between Proportions
20.4 A Confidence Interval for the Difference Between Two Means
20.5 The Two-Sample t-Test: Testing for the Difference Between Two Means
20.6 Randomization Tests and Confidence Intervals for Two Means
20.7 Pooling
20.8 The Standard Deviation of a Difference
21. Paired Samples and Blocks
21.1 Paired Data
21.2 The Paired t-Test
21.3 Confidence Intervals for Matched Pairs
21.4 Blocking
22. Comparing Counts
22.1 Goodness-of-Fit Tests
22.2 Chi-Square Test of Homogeneity
22.3 Examining the Residuals
22.4 Chi-Square Test of Independence
23. Inferences for Regression
23.1 The Regression Model
23.2 Assumptions and Conditions
23.3 Regression Inference and Intuition
23.4 The Regression Table
23.5 Multiple Regression Inference
23.6 Confidence and Prediction Intervals
23.7 Logistic Regression
23.8 More About Regression
Review of Part VI: Inference for Relationships
VII. INFERENCE WHEN VARIABLES ARE RELATED
24. Multiple Regression Wisdom
24.1 Multiple Regression Inference
24.2 Comparing Multiple Regression Models
24.3 Indicators
24.4 Diagnosing Regression Models: Looking at the Cases
24.5 Building Multiple Regression Models
25. Analysis of Variance
25.1 Testing Whether the Means of Several Groups Are Equal
25.2 The ANOVA Table
25.3 Assumptions and Conditions
25.4 Comparing Means
25.5 ANOVA on Observational Data
26. Multifactor Analysis of Variance
26.1 A Two Factor ANOVA Model
26.2 Assumptions and Conditions
26.3 Interactions
27. Statistics and Data Science
27.1 Introduction to Data Mining
Review of Part VII: Inference When Variables Are Related
Parts I - V Cumulative Review Exercises
Appendices
Answers
Credits
Indexes
Tables and Selected Formulas