Thesis Proposal
"Algorithms for Analyzing Human Genome Rearrangements"
Crystal Kahn
Friday, December 11, 2009 at 10:00 A.M.
Room 241, Swig Boardroom (CIT 2nd floor)
The human genome exhibits a rich structure resulting from a long history of genomic changes, including single base-pair mutations and larger-scale rearrangements such as inversions, deletions, translocations, and segmental duplications. The number and order of the genomic changes that resulted in the present-day human genome are not known, but can sometimes be inferred by comparison to the genomes of other species. In particular, genome rearrangements are modeled as operations on signed strings of characters representing blocks of conserved sequence. Genome rearrangement distance measures quantify the similarity between two or more genome sequences by counting the minimum number of rearrangement operations needed to transform one sequence into another. The development of efficient algorithms for computing genome rearrangement distances has been instrumental both in computing phylogenies for sets of known genomes (such as present-day species or gene families) and in constructing ancestral genome sequences.
In this thesis, I propose to develop algorithms to study recent genome rearrangements in human and cancer genomes. I introduce a novel measure, called duplication distance, to quantify the similarity between two genomic regions containing segmental duplications. I give an efficient algorithm to compute the duplication distance between a pair of signed strings and provide several generalizations of duplication distance that also measure inversion and deletion operations. I demonstrate the utility of the duplication distance measure in constructing the evolutionary history of segmental duplications in the human genome. Further, I propose to develop algorithms to infer an unknown mixture of genomes from an incomplete set of rearrangement measurements. This problem is motivated by recent cancer genome sequencing experiments that measure breakpoints caused by somatic mutations.
Host: Ben Raphael