Methods for constructing a phylogenetic tree from nucleotide or amino acid sequences can be broadly classified into distance-matrix methods and character-state methods. In a distance-matrix method, a matrix of evolutionary distances (numbers of nucleotide or amino acid substitutions) between all possible pairs of the sequences analyzed is generated, and the phylogenetic tree that best fits the matrix is chosen. In a character-state method, by contrast, the sequences are compared directly, and the phylogenetic tree that best fits the assumed pattern of nucleotide or amino acid substitution is chosen.
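The first step of a distance-matrix method, building the matrix of pairwise distances, can be sketched as follows. This is only a minimal illustration: it uses the uncorrected p distance (the proportion of differing sites) as the evolutionary distance, and the function names `p_distance` and `distance_matrix` and the three toy sequences are hypothetical, not part of CLUSTALW.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns in which either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Symmetric matrix of p distances for every pair of sequences."""
    names = list(alignment)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = p_distance(alignment[n], alignment[m])
        matrix[n][m] = matrix[m][n] = d
    return matrix


# Toy alignment (made-up sequences, for illustration only).
alignment = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTAT",
    "seq3": "ACGAACGTTT",
}
matrix = distance_matrix(alignment)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in alignment])
```

A tree-building algorithm such as neighbor-joining would then take this matrix as its input; that step is not shown here.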
In CLUSTALW, the phylogenetic tree is constructed by the neighbor-joining (NJ) method, which is a distance-matrix method. When nucleotide sequences are analyzed, the p distance method, Kimura method, Tamura method, Tajima-Nei method, Gojobori-Ishii-Nei method, Tamura-Nei method, and so on, are available for estimating the number of nucleotide substitutions between sequences. These methods differ in the pattern (model) of nucleotide substitution assumed for estimating the evolutionary distance.
The bases T (U) and C are pyrimidines and A and G are purines, and the physicochemical properties are similar within each group. In fact, the rates of nucleotide substitution between T and C and between A and G (transitions) are empirically known to be greater than those of the other types of substitution (transversions). In addition, since the equilibrium frequencies of T, C, A, and G usually differ within a genome, the rate of nucleotide substitution appears to depend on the frequency of the base that replaces the original base. Other mechanisms are also thought to make the rate of each nucleotide substitution (T -> C, A -> G, etc.) different.
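The distinction between transitions and transversions can be made concrete with a short sketch. It counts the two kinds of difference between a pair of aligned sequences and then applies a distance correction that treats them separately. The correction shown assumes that the Kimura method for nucleotides is Kimura's two-parameter model, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions. The function names and the example sequences are hypothetical, and this is not CLUSTALW's own code.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def count_substitutions(seq_a, seq_b):
    """Return (transitions, transversions, compared_sites) for an aligned pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Treat U (RNA) as T so that both alphabets are handled alike.
    seq_a = seq_a.upper().replace("U", "T")
    seq_b = seq_b.upper().replace("U", "T")
    transitions = transversions = sites = 0
    for a, b in zip(seq_a, seq_b):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguous symbols
        sites += 1
        if a == b:
            continue
        # Same chemical class (purine/purine or pyrimidine/pyrimidine): transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions, sites


def kimura_2p_distance(seq_a, seq_b):
    """Two-parameter distance (assumed model, see the text above)."""
    ts, tv, n = count_substitutions(seq_a, seq_b)
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = ts / n, tv / n
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Made-up pair: one transition (A/G at site 1), one transversion (C/A at site 10).
counts = count_substitutions("ACGTACGTAC", "GCGTACGTAA")
distance = kimura_2p_distance("ACGTACGTAC", "GCGTACGTAA")
print(counts, round(distance, 4))
```

For this pair the uncorrected p distance is 0.2, and the corrected distance is somewhat larger because the model allows for multiple substitutions at the same site.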
These arguments suggest that assuming a complex pattern of nucleotide substitution allows the number of nucleotide substitutions to be estimated accurately. However, more complex models contain more parameters to be estimated, and the variances (standard errors) of the estimates become larger as the number of parameters increases. Since the parameter values are estimated from the sequence data analyzed, the accuracy of the estimates depends on the number of sequences, the sequence length, the sequence divergence, and so on. Therefore, the pattern of nucleotide substitution suitable for an analysis depends on the sequence data, and several methods are available for finding the best-fitting model for given sequence data.
In CLUSTALW, the default method for estimating the number of nucleotide substitutions is the Kimura method, because it is one of the most widely used. However, if the model that best fits the sequence data differs from the Kimura model, incorrect results may be obtained. In such cases, it may be useful to try other models in the analysis.
Similarly, the p distance method and the Kimura method are available in CLUSTALW for estimating the number of amino acid substitutions between sequences. (The Kimura method for amino acid substitutions is entirely different from the Kimura method for nucleotide substitutions.) The default method is the Kimura method, but the p distance method may also be useful for some data.
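To illustrate how the two amino acid distances relate, the sketch below converts a p distance into a corrected distance. It assumes that the Kimura method for amino acids is Kimura's empirical formula d = -ln(1 - p - 0.2 p^2), where p is the proportion of differing amino acid sites. The function name `kimura_protein_distance` is hypothetical, and the sketch does not attempt to reproduce how CLUSTALW treats highly divergent pairs, for which this formula is undefined.

```python
import math


def kimura_protein_distance(p):
    """Corrected amino acid distance from the p distance (assumed formula)."""
    if not 0 <= p < 1:
        raise ValueError("p must lie in [0, 1)")
    inner = 1 - p - 0.2 * p * p
    if inner <= 0:
        # The logarithm is undefined when the sequences are too divergent.
        raise ValueError("p distance too large for this correction")
    return -math.log(inner)


# The correction is small for similar sequences and grows with divergence.
for p in (0.05, 0.2, 0.5):
    print(p, round(kimura_protein_distance(p), 4))
```

When the sequences are closely related the two distances are nearly equal, so the choice between them matters mainly for more divergent data.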