Regression Modeling Strategies
Abstract
There are many books that are excellent sources of knowledge about
individual statistical tools (survival models, general linear models, etc.), but
the art of data analysis is about choosing and using multiple tools. In the
words of Chatfield [100, p. 420] “... students typically know the technical details
of regression for example, but not necessarily when and how to apply it.
This argues the need for a better balance in the literature and in statistical
teaching between techniques and problem solving strategies.” Whether analyzing
risk factors, adjusting for biases in observational studies, or developing
predictive models, there are common problems that few regression texts address.
For example, there are missing data in the majority of datasets one is
likely to encounter (other than those used in textbooks!) but most regression
texts do not include methods for dealing with such data effectively, and most
texts on missing data do not cover regression modeling.
This book links standard regression modeling approaches with
• methods for relaxing linearity assumptions that still allow one to easily
obtain predictions and confidence limits for future observations, and to do
formal hypothesis tests,
• non-additive modeling approaches not requiring the assumption that
interactions are always linear × linear,
• methods for imputing missing data and for penalizing variances for incomplete
data,
• methods for handling large numbers of predictors without resorting to
problematic stepwise variable selection techniques,
• data reduction methods (unsupervised learning methods, some of which
are based on multivariate psychometric techniques too seldom used in
statistics) that help with the problem of “too many variables to analyze and
not enough observations” as well as making the model more interpretable
when there are predictor variables containing overlapping information,
• methods for quantifying predictive accuracy of a fitted model,