The Goulden-Jackson cluster method is a powerful method to calculate the probability of occurrences of a pattern or set of patterns in a sequence. If the patterns contain wildcard characters, however, the size of the connector matrix grows exponentially with the number of wildcards. Here we show that average correlation
c
(z) is a good predicator of hitting probability q
n
, and the generalized correlation function ĉ(z) can be used to approximate
c
(z) efficiently.We apply the method to the problem of optimal multiple spaced seed selection
for homology search. We reexamine the concept of optimal sensitivity of spaced seeds and show that it is better to select optimal seeds based on some average properties, such as
c
(1), which is the expectation of the first hitting length. Higher order approximations can also be constructed easily. Tests on arbitrary large genomic data with multiple seeds show that the optimal multiple seeds selected by the methods are indeed more sensitive. The methods provide a theoretical background on which various empirical observations can be unified and further heuristic search methods can be developed.