Applied Predictive Modeling
Abstract
This is a book on data analysis with a specific focus on the practice of
predictive modeling. The term predictive modeling may stir associations such
as machine learning, pattern recognition, and data mining. Indeed, these associations
are appropriate and the methods implied by these terms are an
integral piece of the predictive modeling process. But predictive modeling
encompasses much more than the tools and techniques for uncovering patterns
within data. The practice of predictive modeling defines the process of
developing a model in a way that lets us understand and quantify the model’s
prediction accuracy on future, yet-to-be-seen data. The entire process is the
focus of this book.
We intend this work to be a practitioner’s guide to the predictive modeling
process and a place where one can come to learn about the approach
and to gain intuition about the many commonly used and modern, powerful
models. A host of statistical and mathematical techniques are discussed, but
our motivation in almost every case is to describe the techniques in a way
that helps develop intuition for their strengths and weaknesses instead of their
mathematical genesis and underpinnings. For the most part we avoid complex
equations, although there are a few necessary exceptions. For more theoretical
treatments of predictive modeling, we suggest Hastie et al. (2008) and
Bishop (2006). For this text, the reader should have some knowledge of basic
statistics, including variance, correlation, simple linear regression, and basic
hypothesis testing (e.g., p-values and test statistics).
The predictive modeling process is inherently hands-on. But during our research
for this work we found that many articles and texts prevent the reader
from reproducing the results either because the data were not freely available
or because the software was inaccessible or only available for purchase.
Buckheit and Donoho (1995) provide a relevant critique of the traditional
scholarly veil: