Pairwise Sequence Alignment
Outline
Introduction
Global Alignment
The Basic Algorithm
Local Alignment
Semiglobal Alignment
Introduction
Sequence similarity is an indicator of homology
There are other uses for sequence similarity
Database queries
Comparative genomics
…
Introduction
Example:
GACGGATTAG
GATCGGAATAG
Scoring
Match +1, mismatch -1, gap penalty -2

GA-CGGATTAG
GATCGGAATAG
Score: 9*1 - 2 - 1 = 6

GA-CGGA-TTAG
GATCGGAA-TAG
Score: 9*1 + 3*(-2) = 3
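The two scores in this example can be checked mechanically. The sketch below is illustrative only: the function name `score_alignment` and the use of "-" as the gap character are assumptions, and the scoring constants are the ones from the slide.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # scoring values from the slide


def score_alignment(a, b):
    """Sum the column scores of two gapped strings of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have the same length")
    total = 0
    for ca, cb in zip(a, b):
        if ca == "-" or cb == "-":
            total += GAP        # one sequence has a gap in this column
        elif ca == cb:
            total += MATCH
        else:
            total += MISMATCH
    return total


print(score_alignment("GA-CGGATTAG", "GATCGGAATAG"))    # 6
print(score_alignment("GA-CGGA-TTAG", "GATCGGAA-TAG"))  # 3
```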
Introduction
The sequences may have different sizes.
We define an alignment as the insertion of spaces in arbitrary locations along the sequences so that they end up with the same size.
In general, there may be many alignments with maximum score.
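The definition above can be turned into a small check. This is a sketch under assumptions: the helper name `is_alignment` is hypothetical, "-" stands for an inserted space, and columns holding two spaces are rejected, which the slide does not state explicitly.

```python
def is_alignment(x, y, ax, ay, gap="-"):
    """Return True if (ax, ay) is x and y with gaps inserted and equal sizes."""
    same_size = len(ax) == len(ay)
    keeps_x = ax.replace(gap, "") == x   # removing gaps gives back x
    keeps_y = ay.replace(gap, "") == y   # removing gaps gives back y
    # Assumption: a column made of two gaps is not allowed.
    no_double_gap = all(not (a == gap and b == gap) for a, b in zip(ax, ay))
    return same_size and keeps_x and keeps_y and no_double_gap


print(is_alignment("GACGGATTAG", "GATCGGAATAG", "GA-CGGATTAG", "GATCGGAATAG"))
print(is_alignment("AGC", "AAAC", "AGC", "AAAC"))  # sizes differ
```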
Different alignment types
Global Alignment
Local Alignment
Semiglobal Alignment
Global Alignment
Using Dynamic Programming
Reuses the results of previous computations
Example (two sequences x, y):
x: AGC
y: AAAC
Scoring function:
Match +1
Mismatch -1
Gap penalty -2
Global Alignment
Step 1: forming a matrix F(i,j)
Rows are indexed by x_i, columns by y_j.

         -    A    A    A    C
    -    0   -2   -4   -6   -8
    A   -2
    G   -4
    C   -6
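Step 1 can be sketched as follows. The function name `init_matrix` is hypothetical, and the interior cells are filled with 0 purely as placeholders; only the first row and column carry meaning at this stage.

```python
GAP = -2  # gap penalty from the scoring function


def init_matrix(x, y, gap=GAP):
    """Build F with len(x)+1 rows and len(y)+1 columns; set the borders."""
    rows, cols = len(x) + 1, len(y) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap   # x_1..x_i aligned entirely to gaps
    for j in range(1, cols):
        F[0][j] = j * gap   # y_1..y_j aligned entirely to gaps
    return F


for row in init_matrix("AGC", "AAAC"):
    print(row)
```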
Global Alignment
Each cell F(i,j) is reached from one of three neighbours:
From F(i-1,j-1): move ahead in both sequences, adding match(x_i, y_j)
From F(i-1,j): x_i aligned to a gap, adding the gap penalty
From F(i,j-1): y_j aligned to a gap, adding the gap penalty
While building the table, keep track of where the optimal score came from (reverse arrows).
Global Alignment
\[
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + \mathrm{match}(x_i, y_j) \\
F(i-1,j) + \text{gap penalty} \\
F(i,j-1) + \text{gap penalty}
\end{cases}
\]
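A minimal sketch of filling the table with this recurrence, assuming the scoring function of the running example (match +1, mismatch -1, gap penalty -2). The names `match` and `global_matrix` are illustrative, not taken from the slides.

```python
MATCH, MISMATCH, GAP = 1, -1, -2


def match(a, b):
    """Score for aligning character a with character b."""
    return MATCH if a == b else MISMATCH


def global_matrix(x, y):
    """Fill F(i,j) row by row; F[i][j] scores x[:i] against y[:j]."""
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        F[i][0] = i * GAP
    for j in range(1, len(y) + 1):
        F[0][j] = j * GAP
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + match(x[i - 1], y[j - 1]),  # diagonal
                F[i - 1][j] + GAP,                            # x_i to a gap
                F[i][j - 1] + GAP,                            # y_j to a gap
            )
    return F


for row in global_matrix("AGC", "AAAC"):
    print(row)
```

Each cell only reads cells that are already filled, which is how the results of previous computations get reused.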
Global Alignment
Seq1: AAAC
Seq2: AGC-

         -    A    A    A    C
    -    0   -2   -4   -6   -8
    A   -2
    G   -4
    C   -6

[Figure: candidate partial alignments of the first characters]
Global Alignment
F(1,1):
↘: 0 + 1 = 1
↓: -2 + (-2) = -4
→: -2 + (-2) = -4
F(1,1) = 1

         -    A    A    A    C
    -    0   -2   -4   -6   -8
    A   -2    1
    G   -4
    C   -6
Global Alignment
F(1,2):
↘: -2 + 1 = -1
↓: -4 + (-2) = -6
→: 1 + (-2) = -1
F(1,2) = -1

         -    A    A    A    C
    -    0   -2   -4   -6   -8
    A   -2    1   -1
    G   -4
    C   -6
Global Alignment

         -    A    A    A    C
    -    0   -2   -4   -6   -8
    A   -2    1   -1   -3   -5
    G   -4   -1    0   -2   -4
    C   -6   -3   -2   -1   -1
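As a worked check of the last cell, assuming the scoring function of this example (match +1, mismatch -1, gap penalty -2), with x_3 = C and y_4 = C:

```latex
\begin{aligned}
F(3,4) &= \max\{\, F(2,3) + \mathrm{match}(C, C),\; F(2,4) + (-2),\; F(3,3) + (-2) \,\} \\
       &= \max\{\, -2 + 1,\; -4 - 2,\; -1 - 2 \,\} \\
       &= \max\{\, -1,\; -6,\; -3 \,\} = -1
\end{aligned}
```

The maximum comes from the diagonal term, so the arrow stored for this cell points back to F(2,3).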
Global Alignment
Step 2: trace back

         -    A    A    A    C
    -    0   -2   -4   -6   -8
    A   -2    1   -1   -3   -5
    G   -4   -1    0   -2   -4
    C   -6   -3   -2   -1   -1

Aligned sequences, in traceback order (last column first):
X: C G A -
Y: C A A A
Read left to right: X: -AGC, Y: AAAC
Global Alignment
Step 2: trace back
Another path through the same matrix gives, in traceback order (last column first):
X: C - G A
Y: C A A A
Read left to right: X: AG-C, Y: AAAC
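Both tracebacks above reach the same score, and a short sketch can list every optimal path. This is an illustration rather than the slides' procedure: instead of storing arrows, it re-tests the recurrence at each cell, and the names `match`, `global_matrix` and `all_tracebacks` are assumptions.

```python
MATCH, MISMATCH, GAP = 1, -1, -2


def match(a, b):
    return MATCH if a == b else MISMATCH


def global_matrix(x, y):
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        F[i][0] = i * GAP
    for j in range(1, len(y) + 1):
        F[0][j] = j * GAP
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            F[i][j] = max(F[i - 1][j - 1] + match(x[i - 1], y[j - 1]),
                          F[i - 1][j] + GAP,
                          F[i][j - 1] + GAP)
    return F


def all_tracebacks(x, y):
    """Return the optimal score and every alignment that achieves it."""
    F = global_matrix(x, y)
    results = []

    def walk(i, j, ax, ay):
        if i == 0 and j == 0:
            results.append((ax, ay))
            return
        # Follow every arrow that could have produced F[i][j].
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + match(x[i - 1], y[j - 1]):
            walk(i - 1, j - 1, x[i - 1] + ax, y[j - 1] + ay)
        if i > 0 and F[i][j] == F[i - 1][j] + GAP:
            walk(i - 1, j, x[i - 1] + ax, "-" + ay)
        if j > 0 and F[i][j] == F[i][j - 1] + GAP:
            walk(i, j - 1, "-" + ax, y[j - 1] + ay)

    walk(len(x), len(y), "", "")
    return F[len(x)][len(y)], results


score, alignments = all_tracebacks("AGC", "AAAC")
print(score)
for ax, ay in alignments:
    print(ax, ay)
```

For x = AGC and y = AAAC this finds three alignments with score -1, including the two shown on the slides. The number of optimal alignments can grow quickly with sequence length, so this is only practical for small examples.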
Global Alignment
Arrow preference
When several arrows tie, the preference among them decides which optimal alignment is reported.
For instance, when aligning x = ATAT with y = TATA, we get
-ATAT
TATA-
rather than
ATAT-
-TATA
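The effect of arrow preference can be reproduced with a traceback that takes the tie-breaking order as a parameter. This is a sketch under assumptions: the `prefer` parameter and the move names "diag", "up" and "left" are hypothetical, and the scoring is the match +1, mismatch -1, gap -2 scheme used earlier.

```python
MATCH, MISMATCH, GAP = 1, -1, -2


def match(a, b):
    return MATCH if a == b else MISMATCH


def global_matrix(x, y):
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        F[i][0] = i * GAP
    for j in range(1, len(y) + 1):
        F[0][j] = j * GAP
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            F[i][j] = max(F[i - 1][j - 1] + match(x[i - 1], y[j - 1]),
                          F[i - 1][j] + GAP,
                          F[i][j - 1] + GAP)
    return F


def traceback(x, y, prefer=("diag", "up", "left")):
    """Follow one optimal path; `prefer` orders the arrows when they tie."""
    F = global_matrix(x, y)
    i, j = len(x), len(y)
    ax, ay = [], []
    while i > 0 or j > 0:
        for move in prefer:
            if (move == "diag" and i > 0 and j > 0
                    and F[i][j] == F[i - 1][j - 1] + match(x[i - 1], y[j - 1])):
                ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
                break
            if move == "up" and i > 0 and F[i][j] == F[i - 1][j] + GAP:
                ax.append(x[i - 1]); ay.append("-"); i -= 1
                break
            if move == "left" and j > 0 and F[i][j] == F[i][j - 1] + GAP:
                ax.append("-"); ay.append(y[j - 1]); j -= 1
                break
        else:
            raise ValueError("prefer must list diag, up and left")
    return "".join(reversed(ax)), "".join(reversed(ay))


print(traceback("ATAT", "TATA", prefer=("diag", "up", "left")))
print(traceback("ATAT", "TATA", prefer=("diag", "left", "up")))
```

With "up" ahead of "left" the sketch returns -ATAT over TATA-, and with the two swapped it returns ATAT- over -TATA. Both alignments score -1.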
Summary
Uses recursion to fill in an intermediate results table
Uses O(nm) space and time
O(n^2) algorithm when both sequences have length about n
Feasible for moderate-sized sequences, but not for aligning whole genomes.
Local Alignment
A local alignment between x and y is an alignment between a substring of x and a substring of y.
\[
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,j-1) + \mathrm{match}(x_i, y_j) \\
F(i-1,j) + \text{gap penalty} \\
F(i,j-1) + \text{gap penalty}
\end{cases}
\]
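A minimal sketch of filling the local alignment table, assuming the same scoring as the global example (match +1, mismatch -1, gap penalty -2) and using the sequences x = AGACGTC and y = CGAAGTTG from this section's matrices. The function names are illustrative.

```python
MATCH, MISMATCH, GAP = 1, -1, -2


def match(a, b):
    return MATCH if a == b else MISMATCH


def local_matrix(x, y):
    """Fill F with the local recurrence; the first row and column stay 0."""
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            F[i][j] = max(
                0,                                            # start afresh here
                F[i - 1][j - 1] + match(x[i - 1], y[j - 1]),  # diagonal
                F[i - 1][j] + GAP,                            # x_i to a gap
                F[i][j - 1] + GAP,                            # y_j to a gap
            )
    return F


F = local_matrix("AGACGTC", "CGAAGTTG")
for row in F:
    print(row)
print(max(max(row) for row in F))  # best local score
```

The extra 0 keeps any cell from going negative, so a poor prefix never drags down a good substring match that starts later.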
Local Alignment

       -  C  G  A  A  G  T  T  G
    -  0  0  0  0  0  0  0  0  0
    A  0
    G  0
    A  0
    C  0
    G  0
    T  0
    C  0

Local Alignment

       -  C  G  A  A  G  T  T  G
    -  0  0  0  0  0  0  0  0  0
    A  0  0  0  1  1  0  0  0  0
    G  0  0  1  0  0  2  0  0  1
    A  0  0  0  2  1  0  1  0  0
    C  0  1  0  0  1  0  0  0  0
    G  0  0  2  0  0  2  0  0  1
    T  0  0  0  1  0  0  3  1  0
    C  0  1  0  0  0  0  1  2  0

Local alignment:
G A A G T
G A C G T
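The local alignment can be recovered by starting at the largest entry of the table and walking back until a 0 is reached. The sketch below assumes match +1, mismatch -1, gap penalty -2, takes the first maximum it finds, and prefers the diagonal arrow on ties; these choices and the name `local_align` are assumptions, not details given on the slides.

```python
MATCH, MISMATCH, GAP = 1, -1, -2


def match(a, b):
    return MATCH if a == b else MISMATCH


def local_matrix(x, y):
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + match(x[i - 1], y[j - 1]),
                          F[i - 1][j] + GAP,
                          F[i][j - 1] + GAP)
    return F


def local_align(x, y):
    """Return (score, aligned part of x, aligned part of y)."""
    F = local_matrix(x, y)
    best, bi, bj = 0, 0, 0
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if F[i][j] > best:          # keep the first maximum found
                best, bi, bj = F[i][j], i, j
    i, j = bi, bj
    ax, ay = [], []
    while i > 0 and j > 0 and F[i][j] > 0:   # stop at the first 0
        if F[i][j] == F[i - 1][j - 1] + match(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + GAP:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        elif F[i][j] == F[i][j - 1] + GAP:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
        else:
            raise RuntimeError("no predecessor found for a positive cell")
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(local_align("AGACGTC", "CGAAGTTG"))
```

For the sequences in the matrix above, the walk starts at the entry 3 and follows five diagonal steps, pairing GACGT from x with GAAGT from y.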