CSCI1950L: Algorithmic Foundations of Computational Biology

Semester:

Spring 2011

Schedule:

Tuesdays and Thursdays 2:30-3:50pm (K)

Location:

CIT Swig Boardroom

People:

Office Hours:

Tentative Syllabus

CIT Lubrano • Tuesday and Thursday, 2:30-3:50pm
Prof. Sorin Istrail
401-863-6196 • sorin@cs.brown.edu
Office Hours: TBA or by appointment

  1. Introduction. Comparative genomics: genomes (DNA, RNA and protein sequence), protein structures (geometry), gene regulation (logic, systems). The nature and complexity of bio-molecular data. The intertwining of algorithms and statistics in the design of genomics tools. The “Gold-Bug” – a metaphor for Bioinformatics.
  2. Sequence Alignment. Local and global alignment. Dynamic Programming algorithms. Edit graph theory. The fundamental Dynamic Programming recurrence. The Smith-Waterman algorithm. Probability and statistical significance of alignment. Substitution matrices, evolutionary models, and information theory. The PAM matrices of Margaret Dayhoff, the “mother and father” of Bioinformatics. Statistical assumptions for bio-molecular data. Statistics hypothesis testing. How Sir R.A. Fisher caught Mendel “cheating.”
  3. Genome Sequencing and Assembly. Shotgun sequencing and genome assembly algorithms. DeBrujin graphs and Eulerian paths based assembly algorithms. Poisson statistical theory of Eulerian paths assembly. Lander-Waterman statistical theory of genome sequencing and assembly. Next generation sequencing and assembly algorithms for Illumina, 454, SOLID.
  4. BLAST, String Matching Algorithms and Genomic Database Search. The BLAST algorithm and data structures. An outline of the Karlin-Altshuler BLAST statistical theory. Algorithmic speed up: a linear time approximation of the quadratic Smith-Waterman algorithm. Finite-automata speed ups. Burrows-Wheeler transforms and string compression. Short-reads mapping to assemblies.
  5. Hidden Markov Chains Algorithms and Gene Prediction. The three fundamental problems and their algorithms: Model Evaluation, Decoding, and Learning. Gene prediction algorithms.
  6. Genomic Regulation. Regulatory motifs. Transcription factors. Position weight matrices algorithms. Sea urchin - the First Genome of genomic regulation. A visit to the Sea Urchin Assembly. Suffix-trees data structure and algorithms. Gene regulatory networks. Logic functions of genomic cis-regulatory code. Davidson vs. von Neumann: an information processing parallel between the genomic regulatory system and the brain.
  7. SNPs and Haplotypes and the Human Genome. Single Nucleotide Polymorphism. Haplotypes. HAPMAP. Linkage Disequilibrium. Tagging SNPs and the Minimum Informative Subset of SNPs Problem. Guilt by association: the genome-wide association study (GWAS) research program.