The Data Science Design Manual
Abstract
Making sense of the world around us requires obtaining and analyzing data from
our environment. Several technology trends have recently collided, providing
new opportunities to apply our data analysis savvy to greater challenges than
ever before.
Computer storage capacity has increased exponentially; indeed remembering
has become so cheap that it is almost impossible to get computer systems to forget.
Sensing devices increasingly monitor everything that can be observed: video
streams, social media interactions, and the position of anything that moves.
Cloud computing enables us to harness the power of massive numbers of machines
to manipulate this data. Indeed, hundreds of computers are summoned
each time you do a Google search, scrutinizing all of your previous activity just
to decide which is the best ad to show you next.
The result of all this has been the birth of data science, a new eld devoted
to maximizing value from vast collections of information. As a discipline, data
science sits somewhere at the intersection of statistics, computer science, and
machine learning, but it is building a distinct heft and character of its own.
This book serves as an introduction to data science, focusing on the skills and
principles needed to build systems for collecting, analyzing, and interpreting
data.
My professional experience as a researcher and instructor convinces me that
one major challenge of data science is that it is considerably more subtle than it
looks. Any student who has ever computed their grade point average (GPA) can
be said to have done rudimentary statistics, just as drawing a simple scatter plot
lets you add experience in data visualization to your resume. But meaningfully
analyzing and interpreting data requires both technical expertise and wisdom.
That so many people do these basics so badly provides my inspiration for writing
this book.