Bioinformatics for Evolutionary Biologists
Abstract
Evolutionary biologists have two types of ancestors: naturalists such as Charles
Darwin (1809–1882) and theoreticians such as Ronald A. Fisher (1890–1962). The
intellectual descendants of these two scientists have traditionally formed quite
separate tribes. However, the distinction between naturalists and theoreticians is
rapidly fading these days: Many naturalists spend most of their time in front of
computers analyzing their data, and quite a few theoreticians are starting to collect
their own data. The reason for this coalescence between theory and experiment is
that two hitherto expensive technologies have become so cheap that they are now
essentially free: computing and sequencing. Computing became affordable in the
early 1980s with the advent of the PC. More recently, next-generation sequencing
has allowed everyone to sequence the genomes of their favorite organisms.
However, analyzing this data remains difficult.
The difficulties are twofold: conceptual (which method should I use?) and practical
(how do I carry out a certain computation?). The aim of this book is to help the
reader overcome both difficulties. We do this by posing a series of problems. These
come in two forms: paper-and-pencil problems and computer problems. Our choice
of concepts is centered on the analysis of sequences in an evolutionary context. The
aim here is to give the reader a look under the hood of the programs applied in the
computer problems. The computer problems are solved in the same environment
used for decades by scientists, the UNIX command line, also known as the shell.
This is available on all three major desktop operating systems, Windows, Linux,
and OS X. Like any skill worth learning, using the shell takes practice. The
computer problems are designed to give the reader plenty of opportunity for that.