• 검색 결과가 없습니다.

S tr in g M a tc hi ng

N/A
N/A
Protected

Academic year: 2021

Share "S tr in g M a tc hi ng"

Copied!
21
0
0

로드 중.... (전체 텍스트 보기)

전체 글

(1)

S tr ing M at chi ng A lgor it hm s D ae -K i K ang

(2)

S tr in g M a tc hi ng



In p ut

A[1…n]:Text string P[1…m]: Pattern stringP[1…m]: Pattern string m<< n 

G oa l

Determine if A[1…n] includes P[1…m]

(3)

N a ïv e M a tc hi ng o r S tu p id M a tc hi ng na iv eM a tc hi ng (A [ ], P [ ]) {

▷n: length of A[ ],m: length of P[ ]

fo r i ← 1 to n -m + 1 { fo r i ← 1 to n -m + 1 { if (P[ 1 … m ] = A [i… i+ m -1 ]) th e n T h e re i s a m a tc h a t A [i] ; } }  R u n n in g t im e : O (m n )

(4)

tporaostacyobobA[ ] raos raos

P[ ]1. 2.

H o w N a ïv e M a tc h in g W o rk s ? H o w N a ïv e M a tc h in g W o rk s ? H o w N a ïv e M a tc h in g W o rk s ? H o w N a ïv e M a tc h in g W o rk s ?

raos raos

3. 12.

(5)

...abcdabcd... abcdabcwz

A[ ] P[ ]1.

mismatch

W h e n N a ïv e M a tc h in g i s i n e ff ic ie n t? W h e n N a ïv e M a tc h in g i s i n e ff ic ie n t? W h e n N a ïv e M a tc h in g i s i n e ff ic ie n t? W h e n N a ïv e M a tc h in g i s i n e ff ic ie n t?

abcdabcwz abcdabcwz

P[ ] abcdabcwz

abcdabcwz abcdabcwz

1. 2. 3. 4. 5.

(6)

Ra b in -K a rp A lg o ri th m



C ha ng e a p a tt e rn i nt o a n um b e r a nd c ha ng e a c o m p a re d p o rt io n o f a t ex t st ri ng i nt o a n um b e r



N ow , c o m p a re t w o n um b e rs ( in st e a d o f co m p a ri ng t w o s tr in g s)



S tr in g  N um b e r



S tr in g  N um b e r

Depends on the cardinality of a character set Example: = {a, b, c, d, e} |∑| = 5 Convert a, b, c, d, e into0, 1, 2, 3, 4 String“cadwill be2*52 +0*51 +3*50 = 28

(7)

O v e rh e a d o f th e C o n v e rs io n O v e rh e a d o f th e C o n v e rs io n O v e rh e a d o f th e C o n v e rs io n O v e rh e a d o f th e C o n v e rs io n • C o n v e rs io n t im e o f A [i …i + m -1 ]

–a i = A[i+m-1] +d(A[i+m-2] +d(A[i+m-3] +d(… +d (A[i]))…) –Θ(m)time complexity –Total matching of A[1…n] takesΘ(mn)–Total matching of A[1…n] takesΘ(mn) –No better than naïve matching

• H o rn e r’ s r u le R e g a rd le s s o f m , c a lc u la te t h e n u m b e r a s f o ll o w s :

–a i=d(a i-1–dm-1 A[i-1]) + A[i+m-1] –We calculate dm-1 once –Two multiplications and two additions

(8)

acebbceeaabceedb

eeaabp= 4*54 + 4*53 + 0*52 + 0*51 + 1 = 3001 a 1=0*54 +2*53 +4*52 +1*51 +1 = 356 acebbceeaabceedb

P[ ] A[ ]

O n e m a tc h in g e x a m p le O n e m a tc h in g e x a m p le O n e m a tc h in g e x a m p le O n e m a tc h in g e x a m p le

a2= 5(a1-0*54 )+2 = 1782

acebbceeaabceedb a 3=5(a 2-2*54 )+4 = 2664

acebbceeaabceedb

a7=5(a6-2*54 )+1 = 3001

acebbceeaabceedb

(9)

B a s ic B a s ic B a s ic B a s ic R a b in K a rp A lg o ri th m R a b in K a rp A lg o ri th m R a b in K a rp A lg o ri th m R a b in K a rp A lg o ri th m

basicRabinKarp(A, P, d, q) { n: length of A[ ], m: length of P[ ] p←0; a 1←0; forforforfori← 1totototom{▷calculatea 1 p← dp+ P[i];p← dp+ P[i]; a 1← da 1+ A[i]; } forforforfori← 1totototon-m+1{ ifififif(i≠ 1) thenthenthenthena i← d(a i-1–dm-1 A[i-1]) + A[i+m-1]; ifififif(p=a i) thenthenthenthenmatched at A[i]; } } Total running time: Total running time: ΘΘ((nn))

(10)

P ro b le m o f b a s ic R a b in K a rp A lg o ri th m P ro b le m o f b a s ic R a b in K a rp A lg o ri th m P ro b le m o f b a s ic R a b in K a rp A lg o ri th m P ro b le m o f b a s ic R a b in K a rp A lg o ri th m • D e p e n d in g o n | Σ | a n d m , a

i

c a n b e b ig

–bigger than CPU register size –and causes overflow

• So lu ti o n

–Limit the size of a iusing modulo operation –a i=d(a i-1–dm-1 A[i-1]) + A[i+m-1]  b i= (d(b i-1–(dm-1 modq)A[i-1]) + A[i+m-1]) modq –q is a big prime number, but dqshould be fit in a register

(11)

acebbceeaabceedb

eeaabp= (4*54 + 4*53 + 0*52 + 0*51 + 1) mod 113 = 63 a 1= (0*54 +2*53 +4*52 +1*51 +1) mod 113 = 17 acebbceeaabceedb

P[ ] A[ ]

M a tc h in g u s in g m o d u lo M a tc h in g u s in g m o d u lo M a tc h in g u s in g m o d u lo M a tc h in g u s in g m o d u lo

a 2= (5(a 1-0*(54 mod 113))+2) mod 113 = 87

acebbceeaabceedb a 3= (5(a 2-2*(54 mod 113))+4) mod 113 = 65

acebbceeaabceedb

a 7= (5(a 6-2*(54 mod 113))+1) mod 113 = 63

acebbceeaabceedb

(12)

R a b in R a b in R a b in R a b in ---- K a rp A lg o ri th m K a rp A lg o ri th m K a rp A lg o ri th m K a rp A lg o ri th m

RabinKarp(A, P, d, q) { n: size ofA[ ], m: size of P[ ] p←0; b 1←0; forforforfori← 1totototom{▷b 1 p← (dp+ P[i]) modq; b← (db+ A[i]) modq;b 1← (db 1+ A[i]) modq; } h← dm-1 modq; forforforfori← 1totototon-m+1{ ifififif(i≠ 1) thenthenthenthenb i← (d(b i-1–hA[i-1]) + A[i+m-1]) mod q; ifififif(p=b i) thenthenthenthen ifififif(P[1…m] = A[i…i+m-1])thenthenthenthen there is a match at A[i]; } } Average running time: Average running time: ΘΘ((nn))

(13)

A ut o m a ta B a se d M a tc hi ng



Fi ni te A ut o m a ta



Fi ni te sym b ol s, s ta tes a nd t ra ns iti on



Fi ve tu p le : (Q , q 0 , A , ∑ , δ)

Q : states Q0 : start stateQ0 : start state A : accepted states ∑ : input alphabet δ: state transition diagram 

S ta te s re p re se nt a s na p sh o t o f m a tc hi ng p ro ce ss

(14)

A ut o m a ta t ha t ch e ck s a b a b a ca

aaaabbc

a a aa

0 1 2 3 4 5 6 7

aaaabb b

c b

S : d v g a n b b a c ta b a b a a b a b a c a b a b a b a c a a g b k

(15)

0 1

abc 021

001

StateInput symbol 00 00 00

dez

Im p le m e n ta ti o n o f A u to m a ta Im p le m e n ta ti o n o f A u to m a ta Im p le m e n ta ti o n o f A u to m a ta Im p le m e n ta ti o n o f A u to m a ta

0 1

abc 1000 1200

ElseStateInput symbol 21 3 4 5 6 7021

007

641

005

041

003

021 000000

0 000000

0 0000000

2

1 3 4 5 6 7

1200 3000 1400 5000 1460 7000 1200

(16)

A lg o ri th m t h a t c h e c k s m a tc h in g u s in g a u to m a ta A lg o ri th m t h a t c h e c k s m a tc h in g u s in g a u to m a ta A lg o ri th m t h a t c h e c k s m a tc h in g u s in g a u to m a ta A lg o ri th m t h a t c h e c k s m a tc h in g u s in g a u to m a ta

FA-Matcher (A, δ, f ) f: accepted state { n: length of A[ ] q← 0;q← 0; forforforfori← 1totototon{ q← δ(q, A[i]); ifififif(q=f) thenthenthenthenwe found a match at A[i-m+1]; } } Total running time: Θ(n+ ||m)

(17)

C o m p u te C o m p u te C o m p u te C o m p u te T ra n s it io n F u n c ti o n T ra n s it io n F u n c ti o n T ra n s it io n F u n c ti o n T ra n s it io n F u n c ti o n

Compute_Transition_Function(P, ∑) { m|P|; length of pattern for for for for q← 0totototomdodododo for for for for each character a∑dodododo k← min(m+1,q+2); repeat repeat repeat repeat k← k -1repeat repeat repeat repeat k← k -1 until until until until P[1..k] is a suffix of P[1..q]a; δ(q, a) ← k; } Running time: Running time: ΘΘ((||m))

(18)

K M P (K nu th -M o rr is -P ra tt ) A lg o ri th m



Si m ila r to a ut om a ta b a sed m a tc hi ng



C om m on a lit ies

Prepare states for mismatch Simpler than automata based matchSimpler than automata based match ...abcdabcd... abcdabcwz abcdabcwz

A[ ] P[ ]

(19)

abcdabcwz

12345678 π[8] = 4

9 P[ ]

P re p a re r e tu rn p o in ts f o r e a c h m is m a tc h P re p a re r e tu rn p o in ts f o r e a c h m is m a tc h P re p a re r e tu rn p o in ts f o r e a c h m is m a tc h P re p a re r e tu rn p o in ts f o r e a c h m is m a tc h

Matched upto “abcdabc”, failed at “w” “abc” at 1,2,3 and “abc” at 5,6,7 are the same. 0111123411

12345678910 π[ ] abcdabcwz

So, we compare mismatched text char with P[4] For each index of pattern, we prepare the return point.

(20)

K M P A lg o ri th m K M P A lg o ri th m K M P A lg o ri th m K M P A lg o ri th m

KMP(A[ ], P[ ]) { preprocessing(P); i← 1;index pointer of text j← 1;index pointer of pattern n: size of A[ ], m: size of P[ ] whilewhilewhilewhile(i≤n) {whilewhilewhilewhile(i≤n) { ifififif(j= 0ororororA[i] = P[j]) thenthenthenthen{ i++; j++; } elseelseelseelsej← π[j]; ifififif(j=m+1) thenthenthenthen{ There is a match at A[i-m]; j← π[j]; } } } Running time: Running time: ΘΘ((nn))

(21)

P re p ro c e s s in g P re p ro c e s s in g P re p ro c e s s in g P re p ro c e s s in g

preprocessing(P) { m← |P|; ▷length of pattern π[1] ← 0; k← 0; for for for for q← 2totototomdodododo whilewhilewhilewhile(k > 0) andandandand(P[k+1] ≠P[q]) dowhilewhilewhilewhile(k > 0) andandandand(P[k+1] ≠P[q]) do k← π[k]; ifififif(P[k+1] = P[q])thenthenthenthenk ←k+1; π[q]← k; returnreturnreturnreturnπ; } Running time: Running time: ΘΘ((mm))

참조

관련 문서

This solution includes Smart Monitoring, which checks the status of miners and the mining in real-time, and provides the information necessary for optimal decision

In addition, Wallace and Froum 20) reported that the bone formation rate was superior in the cases using a barrier membrane in comparison with the

And that result interpreted that water hydrolysis of adhesive resin cause porphyrin value loss, so that adhesive resin was degradated.. Thus, by using the

The study has a meaning in that it critically examined existing schema presented using the schema theory and presented specific reading education methods

Experimental alveolar ridge augmentation by distraction osteogenesis using a simple device that permits secondary implant placement. Anteroinferior distraction of

Based on the experimental results, the execution time in the matching process for 36 fingerprint minutiae, 200 chaff minutiae and 34 authentication fingerprint

In future studies focused on antioxidant effect and active oxygen reduction in resistance exercise using thera-band will be conducted in-depth research that

For class flow, autotelic experience of the first and second graders was higher than that of the third graders, and matching of behavior with consciousness