jebl.evolution.distances
Class HKYDistanceMatrix
java.lang.Object
jebl.evolution.distances.BasicDistanceMatrix
jebl.evolution.distances.HKYDistanceMatrix
- All Implemented Interfaces:
- DistanceMatrix
public class HKYDistanceMatrix
- extends BasicDistanceMatrix
Compute HKY corrected distance matrix
- Version:
- $Id: HKYDistanceMatrix.java 1036 2009-11-17 03:45:48Z matt_kearse $
Adapted from BEAST source code.
The code in this file appeared originally in the file F84DistanceMatrix, and the comment said (as above) HKY.
Initially I thought it was one model under two different names, but that was my ignorance. While similar,
there is an small difference which I was able to understance only by examining the transition
probability and rate matrices for both models.
Simply put, HKY assumes *the same* ratio of transitions/transversions for all bases, while F84 has a
different ratio for Purines and Pyrimidines. The confusion stems from the fact that those two different ratios
depend on just one parameter. If Kappa is the ratio for HKY then for F84
Kappa(Purine aka A,G) = 1 + Kappa/(pi(A)+pi(G)) and Kappa(Pyrimidine aka C,T) = 1 + Kappr/(pi(C)+pi(T)).
This difference simplifies the estimation of evolutionary distance and Kappa from F84, since some
entries in the HKY transition matrix contain expressions involving exp(-t alpha), where alpha depend on both Kappa
and wheather the element is a Purine or a Pyrimidine.
pi(A,C,G,T) = stationary frequencies
F84 HKY Tamura-Nei
K(AG) = 1 + Kappa/(pi(A)+pi(G)) Kappa a1
K(CT) = 1 + Kappa/(pi(C)+pi(T)) Kappa a2
b = 1 1 beta
Rate Matrix
| - b K(AG) b | | pi(A) 0 0 0 |
Q = | b - b K(CT) | | 0 pi(C) 0 0 |
| K(AG) 1 - b | | 0 0 pi(G) 0 |
| b K(CT) b - | | 0 0 0 pi(T) |
Transition Probability Matrix
PI(A) = PI(G) = pi(A) + pi(G)
PI(C) = PI(T) = pi(C) + pi(T)
Alpha(j) = 1 + Kappa F84
1 + PI(j)*(k-1) HKY
P(transversion) = pi,(i+1)%4 = pi,(i+3)%4 = pi(i) * (1 - exp(-t))
P(transition) = pi,(i+2)%4 = pi(i) + pi(j)(1/PI(j) - 1) * exp(-t) - pi(i)/PI(i) * exp(-t * Alpha(i))
Pi,i = pi(i) + pi(i)(1/PI(i) - 1) * exp(-t) - (pi(i)/PI(i) - 1) * exp(-t * Alpha(i))
From the above it is easy to see that for F84 Sum(all transversions) = 2 Sum(p(i)) (1 - exp(-t)) = 2(1-exp(-t))
which gives the best estimate of t as -log(1 - Sum(all transversions)/2).
When there are no transversions (say when sequences are very short or distance is very small) this estimate is 0
even when sequences are not identical. I am not sure if there is an easy way to fix this since in this case
Kappa is confused with t and only t*(Kappa+1) can be estimated.
The code in this file estimates the "evolutionary distance", which is
(2*Kappa*(pi(A)*pi(G) + pi(T)*pi(C)) + 2*(pi(A) + PI(C))*beta) * t. (*)
This raises a question since the stationary frequencies are estimated from all the sequences, but
transition/transversion rates are done for each pair individually. The result is that distances are not neccesarily
scaled correctly for the matrix as a whole. I am still trying to figure this out.
(*) The original code had a bug which counted only A<-->G substitutions while C<-->T where ignored.
- Author:
- Andrew Rambaut, Joseph Heled
Methods inherited from class java.lang.Object |
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
HKYDistanceMatrix
public HKYDistanceMatrix(Alignment alignment,
ProgressListener progress,
boolean useTwiceMaximumDistanceWhenPairwiseDistanceNotCalculatable)
throws CannotBuildDistanceMatrixException
- Throws:
CannotBuildDistanceMatrixException
HKYDistanceMatrix
public HKYDistanceMatrix(Alignment alignment,
ProgressListener progress)
throws CannotBuildDistanceMatrixException
- Throws:
CannotBuildDistanceMatrixException
http://code.google.com/p/jebl2/