Statistical Regression and Classification: From Linear Models to Machine Learning takes an innovative look at the traditional statistical regression course, presenting a contemporary treatment in line with today’s applications and users. It is intended for courses with students from statistics, computer science, engineering and economics. The book takes a modern look at regression in the following ways:
- The book supplements classical linear and generalized linear models with introductory material on machine learning methods.
- Recognizing that classification is the focus of many contemporary applications, the book covers this topic in detail, especially the multiclass case.
- In view of the voluminous nature of many modern datasets, there is a chapter on Big Data.
- There is much more hands-on computer usage than in a traditional regression course.
Yet this is indeed a book on regression methods, as opposed to one on statistical learning. The primary methodology used is linear and generalized linear parametric models, covering both the Description and Prediction goals of regression methods. The author is just as interested in Description applications of regression, such as measuring the gender wage gap in Silicon Valley, as in forecasting tomorrow’s demand for bike rentals. An entire chapter is devoted to measuring such effects, including discussion of Simpson’s Paradox, multiple inference, and causation issues. Similarly, there is an entire chapter on parametric model fit, making use of both residual analysis and assessment via nonparametric analysis.
Norman Matloff is a professor of computer science at the University of California, Davis, and was a founder of the Statistics Department at that institution. His current research focus is on recommender systems, and applications of regression methods to small area estimation and bias reduction in observational studies. He is on the editorial boards of the Journal of Statistical Computation and the R Journal. An award-winning teacher, he is the author of The Art of R Programming and Parallel Computation in Data Science: With Examples in R, C++ and CUDA.