Abstract
Eukaryotic genomes display segmental patterns of variation in various properties, including
GC content and degree of evolutionary conservation. DNA segmentation algorithms
aim to identify statistically significant boundaries between such segments, and
may thereby provide a means of discovering new classes of functional elements in eukaryotic
genomes. This paper presents a model and an algorithm for Bayesian DNA segmentation
and considers the feasibility of using it to segment whole eukaryotic genomes. The algorithm
is tested on a range of simulated and real DNA sequences, and the following conclusions
are drawn. Firstly, the algorithm correctly identifies non-segmented sequence, and
can thus be used to reject the null hypothesis of uniformity in the property of interest.
Secondly, estimates of the number and locations of change-points produced by the algorithm
are robust to variations in algorithm parameters and starting conditions
and correspond to real features in the data. Thirdly, the algorithm is successfully used
to segment human chromosome 1 according to GC content, thus demonstrating the feasibility
of Bayesian segmentation of eukaryotic genomes. The software described in this
paper is available from the author's website (
