Abstract
Keywords
Introduction
In typical population genetic and comparative studies of genes and genomes, the amplification of homologous genes, especially from non-model organisms, relies on PCR primers deduced from multi-species alignments of available sequences or from a corresponding consensus sequence. This approach is used in studies that aim to reveal the selective regimes acting on certain genes within a defined phylogeny.1,2
Moreover, the amplification of paralogous genes within one species, such as the olfactory receptor gene family in humans or the low-molecular-weight glutenin subunit gene family in bread wheat,3,4 is likewise based on multiple sequence alignments. In summary, conserved regions in alignments of available sequences serve as templates for the prediction of primers that are used to amplify unknown sequences (Fig. 1).

Methodology of primer design.
However, with a growing number of sequences or an increasing degree of sequence divergence, even conserved regions within an alignment may contain sequence variants or exhibit unwanted sequence traits, so that PCR amplification depends on degenerate primers to ensure coverage of all sequences. An increasing level of degeneracy in primer sequences in turn raises the probability of mispriming, especially given the potential presence of paralogous genes, pseudogenes or multicopy transposable elements (TEs). On top of that, testing degenerate primers is laborious since it requires separate testing of each possible primer sequence. This makes it difficult to rule out mispriming, to control primer characteristics and to assemble suitable degenerate primer pairs.
Today, sophisticated tools for primer prediction (eg, Primer3Plus and OligoFaktory) are available that provide a comprehensive range of functions, including the possibility to control primer specificity via database searches of primer sequences.5,6 However, they only support the prediction of nondegenerate primers based on single nondegenerate input sequences. To overcome this shortcoming, several tools have been developed to predict degenerate primers from multiple sequence alignments, the most popular being GeneFisher2, CODEHOP, HYDEN, PriFi, Primaclade and Greene SCPrimer.7–14 However, these approaches cannot rival the functionality of the first-mentioned tools. Besides more or less extensive restrictions in functional range, they do not support a comparison with reference databases to ensure specificity. Table 1 lists a selection of available parameter settings for popular primer prediction tools. Note that some of these tools were developed to address special biological issues and may have many features not listed in the table. I developed a freely available primer pair prediction software, easyPAC (easy Primer prediction from Alignments and Consensus sequences), that for the first time combines all features and routines required to design specific degenerate primers. These primers undergo all commonly applied primer and primer pair test procedures and are optionally mapped against an arbitrary number of user-defined reference files, which may contain paralogous genes, TE sequences or even whole genomes, to ensure primer specificity.
Comparison of available parameter settings for different popular primer prediction tools.
Algorithm and Implementation
The accepted input format for alignment and reference file(s) is FASTA. For each sequence the full IUPAC code is accepted. Instead of a multiple sequence alignment, the program alternatively accepts one degenerate sequence (consensus sequence). Users have full control over all customary primer and primer pair parameters such as primer length (default: 18 nt–28 nt), Tm (default: 52 °C–65 °C), GC-content (default: 20%–80%), maximum 3′-end complementarity (default: 3), maximum 3′-end stability (default: 6), maximum number of poly-X/poly-XY (default: 4/3) and the maximum degeneracy (default: 64), which is measured by the number of possible sequence combinations of a degenerate primer and is referred to in the following as the ambiguity factor (AF). Users can also declare an arbitrary number of regions to be excluded from primer search and optionally use a predefined forward or reverse primer to find an adequate reverse or forward primer, respectively.
In outline, the easyPAC algorithm first assembles all possible primer candidates and then performs a series of tests, starting with the computationally cheapest, so that many candidates are already rejected before easyPAC performs the tests that require more computational power.
The workflow starts with reading the alignment, from which a consensus sequence is created (this step is omitted if the user provides a single consensus sequence). Since easyPAC subsequently processes only the consensus sequence, the number of sequences in the input alignment is not limited and has virtually no influence on computation time. The consensus sequence is then split into all possible subsequences ranging from the user-defined minimum to maximum primer size. Sequences that are located within the declared target or excluded regions, or that are too far from the target to yield a PCR product of proper size, are rejected. Then easyPAC computes the AF for each of the remaining sequences (starting from 1, the AF is multiplied by 2, 3 or 4 for every R/Y/S/W/K/M, B/D/H/V or N within the primer sequence, respectively) and primers with an improper AF are rejected.
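The candidate enumeration and the AF calculation described above can be illustrated with a minimal Python sketch. It is not taken from the easyPAC source (which is written in Perl); the function names are hypothetical, and the filtering by target and excluded regions is omitted for brevity. The defaults mirror the default primer length (18 nt–28 nt) and maximum degeneracy (64) stated in the text.

```python
# Illustrative sketch only; names are hypothetical and not part of easyPAC.

# Number of bases each IUPAC code can stand for.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def ambiguity_factor(primer):
    """Return the number of distinct sequences a degenerate primer encodes."""
    af = 1
    for base in primer.upper():
        af *= IUPAC_FOLD[base]
    return af


def candidate_windows(consensus, min_len=18, max_len=28):
    """Yield (start, subsequence) for every window of an allowed length."""
    for length in range(min_len, max_len + 1):
        for start in range(len(consensus) - length + 1):
            yield start, consensus[start:start + length]


def filter_by_af(consensus, max_af=64, min_len=18, max_len=28):
    """Keep only those windows whose ambiguity factor does not exceed max_af."""
    return [
        (start, seq)
        for start, seq in candidate_windows(consensus, min_len, max_len)
        if ambiguity_factor(seq) <= max_af
    ]
```

Applied to the two primers reported in the Tests of the Program section, `ambiguity_factor` reproduces the stated 2-fold and 8-fold degeneracy.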
In the next steps, all primer sequences that do not have the user-defined 3′-end sequence or that exhibit too high 3′-end stability are discarded. For the remaining candidates, Tm (by default calculated by the base-stacking method),15 GC-content, poly-X/XY content, and 3′-end self-complementarity (which also avoids hairpin formation) are computed, and primers that do not meet the user-defined requirements are discarded.
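Two of these checks, GC-content and poly-X/XY content, can be sketched as follows. This is a simplified illustration for nondegenerate sequences, not the easyPAC implementation. It assumes that poly-X is measured as the longest run of a single base and poly-XY as the largest number of consecutive repeats of a two-base unit, which the text does not specify. Tm, 3′-end stability and self-complementarity are left out because their exact scoring is not given here.

```python
import re

# Illustrative sketch only; function names and the exact definitions of
# poly-X and poly-XY are assumptions, not taken from easyPAC.


def gc_content(seq):
    """Percentage of G and C in a nondegenerate sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def longest_poly_x(seq):
    """Length of the longest single-base run, e.g. 4 for 'GAAAAT'."""
    return max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper()))


def longest_poly_xy(seq):
    """Largest number of consecutive repeats of a two-base unit XY (X != Y)."""
    best = 0
    # The lookahead lets overlapping start positions be examined.
    for m in re.finditer(r"(?=((..)\2*))", seq.upper()):
        unit = m.group(2)
        if unit[0] != unit[1]:
            best = max(best, len(m.group(1)) // 2)
    return best


def passes_basic_filters(seq, gc_range=(20.0, 80.0), max_poly_x=4, max_poly_xy=3):
    """Apply the GC and repeat checks using the defaults quoted in the text."""
    return (
        gc_range[0] <= gc_content(seq) <= gc_range[1]
        and longest_poly_x(seq) <= max_poly_x
        and longest_poly_xy(seq) <= max_poly_xy
    )
```

Running such inexpensive string checks before any mapping step follows the cheap-tests-first ordering described above.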
If desired, easyPAC will now map primer candidate sequences to one or more reference files (which can contain, for instance, sequences of paralogous genes and/or TE sequences) and discard those that match any sequence of the reference file in sense or antisense orientation. Alternatively, the user can use the option ‘Allow primer to match once in reference’ and supply a whole genome of the species in question. In this case, primers matching more than once to the reference in sense or antisense orientation are rejected. This ensures primer specificity and obviates the need for a subsequent primer BLAST. A reference genome should not be provided if primers are intended for the amplification of paralogous genes within one species. By default, primer mapping is performed with an implementation of the SeqMap algorithm, which is very fast (eg, much faster than BLAST) since SeqMap is specifically designed to map short sequences to large references.16 However, SeqMap will only map to nondegenerate characters (ATGC) within the reference, so this option is recommended if the reference is large and contains no or very few degenerate positions. Alternatively, mapping can be performed using an internal algorithm, which is notably slower but will map eg, an A to any of the following characters: ARWMD-HVN. This option is recommended if the reference is small and degenerate (eg, a consensus sequence of a large number of paralogous genes). Depending on the applied mapping algorithm, a specific number of mismatches including insertions or deletions can be tolerated.
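The idea behind the degeneracy-aware mapping can be sketched as a simple sliding-window comparison. This is not the SeqMap algorithm or easyPAC's internal algorithm. It is a naive illustration with hypothetical function names that tolerates substitutions only (no insertions or deletions) and expects a nondegenerate primer sequence. Palindromic primers would be counted once per strand.

```python
# Illustrative sketch only; a naive scan, not SeqMap or easyPAC's own mapper.

# Bases represented by each IUPAC code in the reference.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a sequence, IUPAC codes included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_hits(primer, reference, max_mismatches=0):
    """Count sense-strand positions where the primer matches the reference."""
    primer, reference = primer.upper(), reference.upper()
    hits = 0
    for start in range(len(reference) - len(primer) + 1):
        mismatches = 0
        for p, r in zip(primer, reference[start:start + len(primer)]):
            # A primer base matches if the reference code includes it.
            if p not in IUPAC.get(r, ""):
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            hits += 1
    return hits


def count_hits_both_strands(primer, reference, max_mismatches=0):
    """Hits in sense plus antisense orientation."""
    return (count_hits(primer, reference, max_mismatches)
            + count_hits(reverse_complement(primer), reference, max_mismatches))


def is_specific(primer, references, allow_single_match=False, max_mismatches=0):
    """Reject primers with any hit, or with more than one if a single hit is allowed."""
    total = sum(count_hits_both_strands(primer, ref, max_mismatches)
                for ref in references)
    return total <= (1 if allow_single_match else 0)
```

The `allow_single_match` flag mimics the ‘Allow primer to match once in reference’ option: with a whole genome as reference, the intended binding site itself accounts for one permitted hit.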
Finally, easyPAC will sort forward and reverse primer candidates by their quality and assemble proper primer pairs on the basis of maximum allowed ΔTm, maximum product size and maximum 3′-end pair complementarity. Proper primer pairs are output in order of increasing AF sum and PCR product size.
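The pair assembly and output ordering can be sketched as below. The `Primer` record and the function name are hypothetical. A single Tm value per primer is assumed for simplicity, although a degenerate primer actually has a Tm range. The 3′-end pair complementarity check is omitted, and no default thresholds are assumed since none are stated here.

```python
from dataclasses import dataclass
from itertools import product

# Illustrative sketch only; the Primer record and its fields are assumptions.


@dataclass(frozen=True)
class Primer:
    seq: str
    start: int   # 0-based leftmost position of the primer site on the consensus
    tm: float    # single representative melting temperature in degrees C
    af: int      # ambiguity factor


def assemble_pairs(forward, reverse, max_delta_tm, max_product_size):
    """Return (AF sum, product size, forward, reverse) tuples, best first."""
    pairs = []
    for fwd, rev in product(forward, reverse):
        # Product spans from the forward primer start to the reverse primer end.
        size = rev.start + len(rev.seq) - fwd.start
        if size <= 0 or size > max_product_size:
            continue
        if abs(fwd.tm - rev.tm) > max_delta_tm:
            continue
        pairs.append((fwd.af + rev.af, size, fwd, rev))
    # Order by increasing AF sum, then by increasing product size.
    pairs.sort(key=lambda pair: (pair[0], pair[1]))
    return pairs
```

Sorting on the tuple (AF sum, product size) places the least degenerate pairs first and breaks ties in favor of shorter amplicons, matching the output order described above.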
One major improvement introduced by easyPAC is the possibility of testing degenerate primers. To this end, every degenerate primer candidate is used to rebuild a group of all possible sequence combinations prior to every applied test procedure. Every sequence within this group is then tested separately and the degenerate primer candidate will only be accepted if all of its possible sequence combinations passed the entire test procedure including optional mapping against a reference (Fig. 2).
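The expand-and-test principle can be illustrated with a short sketch. The function names are hypothetical, and the GC check stands in for the full series of tests. The point is only that a degenerate candidate is accepted when every sequence it encodes passes every test.

```python
from itertools import product

# Illustrative sketch only; names are hypothetical and not part of easyPAC.

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(primer):
    """List every nondegenerate sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


def accept_degenerate(primer, tests):
    """Accept the primer only if all of its variants pass all tests."""
    return all(test(variant) for variant in expand(primer) for test in tests)


def gc_ok(seq, low=20.0, high=80.0):
    """Example test: GC-content within the default 20%-80% window."""
    gc = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return low <= gc <= high
```

Because the number of variants equals the AF, the default maximum degeneracy of 64 also bounds the number of sequences that have to be tested per candidate.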

easyPAC work flow.
In addition to a textual output, easyPAC creates a graphical output that makes the selection of primer pairs easier to follow. The graphical output contains the alignment with the corresponding consensus sequence, a color-coded indication of sequence conservation, an alignment annotation marking the target sequence, regions excluded from primer search, simple repeats, matches to the reference and internal duplications, and finally the best primer pairs and their locations within the alignment (Fig. 3).

easyPAC output.
By default, easyPAC uses relatively lenient parameter settings to maximize the number of suggested primers, which are in any case sorted by quality in the final output. Using more stringent settings will not necessarily produce better results but may reduce computation time, since many primer sequences will be discarded during the initial test procedures.
Tests of the Program
An alignment containing genomic DNA sequences of TDRD1 exon 11 (87 bp ± 1000 bp of adjacent intronic sequence) from Human (
easyPAC was executed using the default settings. The coordinates of exon 11 were used to declare the target start and target end, and a file containing primate TE sequences obtained from Repbase served as reference file.17 In total, easyPAC considered 12384 mostly degenerate primer sequences. The number of primer candidates that passed each test is shown in Table 2.
Number of remaining primer candidates after each applied test.
Primer prediction was finished after 50 seconds (computation carried out on a standard desktop PC with 2.67 GHz and 4 GB RAM) and 19 primer pairs were suggested, of which the first primer pair was used for subsequent PCR amplification (Forward primer: 5′-TCAAAGGATGCTTGAGRGGATGGT-3′, Tm: 55.8 °C–57.8 °C, degeneracy: 2-fold. Reverse primer: 5′-CAGTGMTAAAGYTGYGCCTTTGTTTA-3′, Tm: 52.9 °C–58.8 °C, degeneracy: 8-fold). Figure 4 shows the degree of sequence conservation along the alignment as well as the position of exon 11 and the position of the primers suggested by easyPAC.

Conservation of the alignment.
Given an alignment that covered the primate phylogeny from New World monkeys to Old World monkeys and hominids, the aim was to PCR amplify TDRD1 exon 11 of other representatives of these groups including Goeldi's marmoset (

Phylogenetic relationship of the species involved in this study.
All initial PCRs yielded a specific product of approximately 500 bp in length which corresponds to the expected size of the desired DNA fragment (Fig. 6). The PCR products were cloned and sequenced in both directions. The obtained sequences were aligned and TDRD1 exon 11 sequence could be found in all amplicons. TDRD1 exon 11 sequences of

PCR amplification of TDRD1 exon 11 using genomic DNA from 10 primate species and primers suggested by easyPAC.
Conclusion
The introduced software easyPAC can be used for fast prediction of specific and tested PCR primers from alignments exhibiting a high degree of sequence divergence. easyPAC is the first freely available primer design software that incorporates testing and mapping of degenerate primers, thus simplifying primer design enormously by reducing the required time for primer design from several hours to a few minutes with simultaneous maximization of result quality.
Although exhibiting up to 8-fold degeneracy, the predicted primers were found to perform well in wet lab experiments under the conditions suggested by the software. TDRD1 exon 11 could be PCR amplified and sequenced from all reference species and de novo sequenced from four other representative primate species. This shows that easyPAC is well suited for upcoming comparative studies of homologous and paralogous genes.
Material and Methods
The sequence of human TDRD1 exon 11 (87 bp) including 1000 bp of 3′- and 5′-flanking intronic sequence respectively was used as basis sequence for an alignment including
DNA of the respective primate species was isolated from different tissues with the QIAamp DNA Mini Kit (QIAGEN). PCR of TDRD1 exon 11 was performed using the Taq PCR Core Kit (QIAGEN, [60″ denaturing at 95 °C, 40″ annealing at 52 °C, 60″ elongation at 72 °C] × 35). The PCR products were ligated into pGEM-T vector (Promega) and transformed to
Availability and System Requirements
easyPAC is freely available at http://www.uni-mainz.de/FB/Biologie/Anthropologie/472_ENG_HTML.php and also at http://sourceforge.net/projects/easypac/.
easyPAC is written in Perl. The software is provided as a Perl script that contains the source code and runs on any platform (Windows, Macintosh, Linux) but requires the installation of a Perl distribution. Perl is normally preinstalled on Macintosh and Linux systems. However, the installation of additional modules may be required (available at the Comprehensive Perl Archive Network: www.cpan.org) if they are not already part of the installed Perl distribution. Details can be found in the easyPAC documentation. There is also an executable file of easyPAC (easyPAC.exe) that runs on Windows systems without the need for a Perl installation.
Funding
This work was supported by the “Schwerpunkt rechnergestuetzte Forschungsmethoden in den Naturwissenschaften” (SRFN), Johannes Gutenberg-University, Mainz.
Supplementary Data
easyPAC.zip easyPAC.tar.gz
Both files contain the compressed easyPAC folder including the original Perl script (easyPAC.pl), the easyPAC documentation (easyPAC_documentation.pdf), an executable file for Windows systems (easyPAC.exe, tested for Windows 7 and Windows XP) and all files required for the execution of easyPAC.
easyPAC_results_image.png
easyPAC_results_text.txt
These files contain the original easyPAC output (picture and text) for primer search as described in the Tests of the program section.
A video abstract by the authors of this paper is available. video-abstract8870.mov
