Abstract
BLAST is the most popular bioinformatics tool and is used to run millions of queries
each day. However, evaluating such queries is slow, typically taking minutes on modern
workstations. Therefore, the continued evolution of BLAST, through improvements to its algorithms and
optimizations, is essential to reduce search times in the face of exponentially increasing
collection sizes. We present an optimization to the first stage of the BLAST algorithm
specifically designed for protein search. It produces the same results as NCBI-BLAST but in
around 59% of the time on Intel-based platforms; we also present results for other popular
architectures. Overall, this saves around 15% of the total time of a typical BLAST
search. Our approach uses a deterministic finite automaton (DFA), inspired by the original
scheme used in the 1990 BLAST algorithm. The techniques are optimized for modern hardware,
making careful use of cache-conscious approaches to improve speed. Our optimized
DFA approach has been integrated into a new version of BLAST that is freely available for
download at
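
The core idea named above, a DFA whose state remembers the last W-1 residues so that each new subject residue costs a single table lookup, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' cache-conscious implementation. It assumes a word length of W = 3, only the 20 standard amino-acid letters, and exact query words rather than the scored neighbourhood words that protein BLAST actually uses. The names `build_dfa` and `scan` are hypothetical.

```python
from collections import defaultdict

# Assumption: the 20 standard amino acids only (no B, Z, X or other ambiguity codes).
ALPHABET = "ARNDCQEGHILKMFPSTWYV"
CODE = {aa: i for i, aa in enumerate(ALPHABET)}
W = 3  # word length assumed for this sketch


def build_dfa(query):
    """Build a transition table for length-3 word hits (hypothetical helper).

    A state encodes the last two residues read, as pair index a * 20 + b.
    table[state][c] is (next_state, offsets): the state after reading c and
    the query offsets where the word (a, b, c) starts.
    Simplification: only exact query words are stored; BLAST also adds
    neighbourhood words whose score reaches a threshold T.
    """
    n = len(ALPHABET)
    words = defaultdict(list)
    for i in range(len(query) - W + 1):
        a, b, c = (CODE[x] for x in query[i:i + W])
        words[(a * n + b, c)].append(i)
    table = []
    for state in range(n * n):
        b = state % n  # most recent residue of this state
        row = [(b * n + c, tuple(words.get((state, c), ()))) for c in range(n)]
        table.append(row)
    return table


def scan(table, subject):
    """Return (query_offset, subject_offset) pairs for every word hit."""
    n = len(ALPHABET)
    hits = []
    if len(subject) < W:
        return hits
    # Prime the automaton with the first W - 1 subject residues.
    state = CODE[subject[0]] * n + CODE[subject[1]]
    for j in range(W - 1, len(subject)):
        # One lookup yields both the next state and the hit list.
        state, offsets = table[state][CODE[subject[j]]]
        start = j - W + 1
        hits.extend((q, start) for q in offsets)
    return hits
```

For example, `scan(build_dfa("MKVLA"), "GGMKVLAKV")` returns `[(0, 2), (1, 3), (2, 4)]`. Pairing the next state with its hit list in each table entry is what lets the scan touch every subject residue exactly once; how that table is laid out in memory, which is where the abstract's cache-conscious optimizations apply, is outside the scope of this sketch.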
