Abstract
Transcription factors (TFs) play a crucial role in gene regulation by binding to specific regulatory sequences. The sequence motifs recognized by a TF can be described by position frequency matrices. Motif matches for a given position frequency matrix are identified by applying a predefined score cutoff and then counting the matches whose score lies above this cutoff. In this article, we approximate the distribution of the number of motif matches using a novel dynamic programming approach that accounts for higher-order sequence background (e.g., as is characteristic of CpG islands) and for overlapping motif matches on both DNA strands. A comparison with our previously published compound Poisson approximation and with a binomial approximation demonstrates that the dynamic programming approach yields more accurate results, in particular for relaxed score thresholds.
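To make the counting step concrete, the following is a minimal sketch of how matches above a score cutoff might be counted on both strands. It is not the article's implementation and does not compute the match count distribution. The function names, the log-odds scoring with a pseudocount, the uniform order-0 background, and the use of `>=` for the cutoff are all assumptions made for illustration.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(pfm, background=None, pseudocount=1.0):
    """Convert a position frequency matrix into log-odds scores.

    pfm is a list with one dict per motif position, mapping each base
    to its count. The uniform background is an assumption; the article
    considers higher-order backgrounds, which this sketch does not model.
    """
    if background is None:
        background = {base: 0.25 for base in BASES}
    matrix = []
    for column in pfm:
        total = sum(column[base] for base in BASES) + 4 * pseudocount
        matrix.append({
            base: math.log(((column[base] + pseudocount) / total) / background[base])
            for base in BASES
        })
    return matrix


def window_score(matrix, window):
    """Sum the per-position log-odds scores for one sequence window."""
    return sum(column[base] for column, base in zip(matrix, window))


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def count_matches(sequence, matrix, cutoff):
    """Count windows on both strands whose score reaches the cutoff.

    Overlapping windows are counted separately, and a palindromic match
    is counted once per strand.
    """
    width = len(matrix)
    count = 0
    for strand in (sequence, reverse_complement(sequence)):
        for start in range(len(strand) - width + 1):
            if window_score(matrix, strand[start:start + width]) >= cutoff:
                count += 1
    return count
```

Because neighbouring windows share bases, the individual match events are not independent, and lowering the cutoff admits more overlapping matches. This is the dependence that the approximations compared in the article treat differently.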
