Directed evolution: simple idea, complex to put in practice

(1)


10 PRODUCE A MUTANT SPECTRUM OF SELF-REPRODUCING TEMPLATES

20 SEPARATE AND CLONE INDIVIDUAL MUTANTS

30 AMPLIFY CLONES

40 EXPRESS CLONES

50 TEST FOR OPTIMAL PHENOTYPES

60 IDENTIFY OPTIMAL GENOTYPES

70 RETURN TO 10 WITH A SAMPLE OF OPTIMAL GENOTYPES

M. Eigen, W. Gardiner (1984), Pure Appl. Chem. 56, 967-978.

(2)

• All possible mutants >> number of atoms in the universe

• Bacterial transformation rarely yields more than 10⁶ colonies.

Impossible to make all mutants

– 5,700 ways to change one amino acid (19*300)

– 16,200,000 ways to change two amino acids: (19*300)(19*299)/2

In general, for a 300 aa protein, where M = number of amino acids that differ:

$$\text{variants}_M = 19^M \, \frac{300!}{(300 - M)!\, M!}$$

Bosley & Ostermeier (2005) Mathematical expressions useful in the construction, description and evaluation of protein libraries, Biomol. Eng., 22, 57.
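The counting formula on this slide can be checked numerically. The sketch below is illustrative only: the function name `n_variants` is ours, and the figure of roughly 10^80 atoms in the universe is a commonly quoted order-of-magnitude estimate, used here as an assumption.

```python
from math import comb

def n_variants(length: int, m: int, alternatives: int = 19) -> int:
    """Number of variants with exactly m substituted positions.

    Choose which m of the `length` positions change (binomial coefficient),
    then pick one of `alternatives` other amino acids at each of them.
    """
    return alternatives ** m * comb(length, m)

# Single, double and triple mutants of a 300 aa protein
for m in (1, 2, 3):
    print(m, n_variants(300, m))

# All possible 300 aa sequences, compared with an assumed ~10^80 atoms
all_sequences = 20 ** 300
print(all_sequences > 10 ** 80)
```

The double-mutant count is exact here; the slide rounds it to 16,200,000.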

(3)

Find one substitution that improves the enzyme

• test all single substitutions (systematic saturation mutagenesis)

(find best mutant, rare improvement, unexpected location)

• test some random single substitutions

(improvements are common; error prone PCR yields only ~10-20% of possibilities)

• focus changes at locations most likely for success

(find acceptable mutant with a minimum of screening)

(4)

Systematic saturation mutagenesis

• Systematic, complete library of all possible single substitution mutants: saturation mutagenesis using NNK primers at every position. [a lot of work!]

• Example: increase enantioselectivity of a nitrilase from E ~15 to >100 with Ala190His.

• Error-prone PCR would have missed it: Ala (GCN) to His (CAU/CAC) requires 2 or 3 nucleotide substitutions.

DeSantis et al. (2003), Creation of a productive, highly enantioselective nitrilase through gene site saturation mutagenesis (GSSM), J. Am. Chem. Soc., 125, 11476-11477.
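The claim that Ala to His needs two or three nucleotide changes can be verified by comparing codons position by position. This is a minimal sketch; the helper names `hamming` and `min_changes` are ours, and codons are written in RNA letters as on the slide.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))

ALA = ["GCU", "GCC", "GCA", "GCG"]  # GCN
HIS = ["CAU", "CAC"]                # CAU/CAC

def min_changes(src_codons, dst_codons):
    """For each source codon, the fewest substitutions reaching any target codon."""
    return {s: min(hamming(s, d) for d in dst_codons) for s in src_codons}

changes = min_changes(ALA, HIS)
print(changes)
```

No Ala codon is a single substitution away from a His codon, which is why a method dominated by single-nucleotide changes is unlikely to find this mutant.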

[Figure: nitrilase hydrolysis of a ¹⁵N-labeled pseudo-prochiral dinitrile (Lipitor® precursor). Labels: ¹⁵N nitrile hydrolysis; ¹⁴N nitrile hydrolysis; (S), m/z = 130; (R), m/z = 129.]

(5)

Random Mutagenesis:

Error-prone PCR

NOT completely random, NOT complete (16-29% of ideal)

Incomplete library - use when you believe multiple solutions exist.

1. Codon bias 1. Completely random codons (64) do not code for the 20 amino acids equally (e.g., 4 for Gly, 1 for Trp) because of the codon bias of the genetic code.

2. Error-prone PCR does not yield completely random codons.

a) Polymerase bias. Polymerases favor some nucleotide substitutions over others.

b) Codon bias 2. One substitution in a codon is more likely than two substitutions, which in turn are more likely than three substitutions.

(6)

• Temperature cycle

• 1. denature (break double strands) 95°C, 30 sec

• 2. annealing (bind primer) 55°C, 30 sec

• 3. elongation (synthesize new DNA strand) 72°C, 60 sec

• Each cycle increases [DNA] 2-fold: 2³⁰ ≈ 10⁹

• Uses thermostable DNA polymerase (bacteria from hot spring in Yellowstone)

What happens in the tube

[Figure: strand diagrams for steps 1 (denaturation), 2 (annealing) and 3 (elongation); repeating steps 1, 2 & 3 gives exponential growth of the short product.]

• PCR animation

www.dnalc.org/ddnalc/resources/pcr.html

(7)

Error-prone PCR ingredients

• Diversify PCR Random Mutagenesis (Clontech): Taq polymerase; mutation rate controlled by Mn²⁺ and dGTP concentration

• GeneMorph (Stratagene): mutation rate controlled by the amount of template (less template, more copying, which leads to more errors)

Caldwell & Joyce (1994) Mutagenic PCR, PCR Methods Appl., 3, S136.

DNA Shuffling

[Figure: DNA shuffling. Labels: parent sequence; DNase treatment; random fragments; denaturation; PCR w/o primers; extension via polymerase; shuffled product.]

Error Prone PCR

• Analysis of large protein regions

• PCR conditions enhance mismatches

• Saturating mutagenesis virtually impossible

• Combination of mutations

Error Prone PCR: components and conditions

• Taq: lacks 3'→5' exonuclease activity; possesses intrinsic error rate

• dNTPs (dCTP, dTTP, dGTP, dATP) at unbalanced concentrations: promote misincorporation

• Mg²⁺: stabilizes non-complementary base pairs

• Mn²⁺: reduces the specificity of the polymerase

• low annealing temperature

• template amount, cycle number

(8)

Codon bias 1: 64 codons translate to 20 amino acids unequally

• Four codons yield Gly, but only two yield Phe and only one yields Trp.

• Completely random DNA codons favor some amino acids over others.
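The unequal codon counts can be tabulated directly from the standard genetic code. The sketch below builds the table from the conventional TCAG ordering; the variable names are ours and "*" marks a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many of the 64 codons encode each amino acid (or stop)
counts = Counter(CODON_TABLE.values())
for aa, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(aa, n)
```

With fully random codons, an amino acid's share of the library is proportional to its codon count, so six-codon amino acids appear six times as often as Trp or Met.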

(9)

Types of nucleotide substitutions

• Transitions (purine to purine: A, G; or pyrimidine to pyrimidine: C, T)

• Transversions (purine to pyrimidine or pyrimidine to purine)

Exercise: Write out all the possibilities to show that random mutagenesis should yield twice as many transversions as transitions.
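One way to approach the exercise is to enumerate every ordered substitution and classify it. This is a small sketch under the assumption that all twelve substitutions are equally likely; the names are ours.

```python
from itertools import permutations

PURINES = {"A", "G"}

def is_transition(a: str, b: str) -> bool:
    """A substitution is a transition when both bases are in the same class."""
    return (a in PURINES) == (b in PURINES)

# All 12 ordered substitutions between the four bases
substitutions = list(permutations("ACGT", 2))
transitions = [s for s in substitutions if is_transition(*s)]
transversions = [s for s in substitutions if not is_transition(*s)]

print("transitions:  ", transitions)
print("transversions:", transversions)
print("Ts/Tv =", len(transitions) / len(transversions))
```

An unbiased mutagenesis method would therefore give Ts/Tv = 0.5, which is the reference point for judging polymerase bias.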

[Figure: structures of the four bases: purines A and G; pyrimidines C and T.]

(10)

Polymerases biased for some substitutions over others

• Both Taq and Mutazyme I favor transitions over transversions (Ts/Tv > 0.5)

• Taq replaces more AT with GC than vice versa (GC content of DNA increases). Mutazyme I does the reverse.

• Taq makes ~4x more mutations at A and T than at G and C. Mutazyme I does the reverse.

• Mutazyme II is a mixture of Taq and Mutazyme I which minimizes mutational bias.

GeneMorph II EZClone Domain Mutagenesis Kit

TABLE II
Mutational Spectra of Mutazyme and Taq DNA Polymerases

Type(s) of mutations        Mutazyme II (a)   Mutazyme I (a)   Taq (Reference 5) (b)
Bias indicators
  Ts/Tv                     0.9               1.2              0.8
  AT→GC/GC→AT               0.6               0.2              1.9
  A→N, T→N                  50.7%             25.6%            75.9%
  G→N, C→N                  43.8%             72.5%            19.6%
Transitions
  A→G, T→C                  17.5%             10.3%            27.6%
  G→A, C→T                  25.5%             43.7%            13.6%
Transversions
  A→T, T→A                  28.5%             11.1%            40.9%
  A→C, T→G                  4.7%              4.2%             7.3%
  G→C, C→G                  4.1%              8.8%             1.4%
  G→T, C→A                  14.1%             20.0%            4.5%
Insertions and deletions
  Insertions                0.7%              0.8%             0.3%
  Deletions                 4.8%              1.1%             4.2%
Mutation frequency
  Mutations/kb (per PCR) (c)  3–16            <1 to 7          4.9

(a) The Mutazyme DNA polymerases were used with the corresponding GeneMorph random mutagenesis kits.

(b) The Taq DNA polymerase was used with Mn²⁺-containing buffer and unbalanced dNTP concentrations, which are mutagenic conditions for Taq DNA polymerase.

(c) Initial target amounts of 16 pg to 1 μg (Mutazyme II DNA polymerase), 1 pg to 100 ng (Mutazyme I DNA polymerase), and 0.01 nM template (Taq DNA polymerase) were used to generate data.

As shown in Table II, error-prone enzymes generally favor transitions over transversions, as shown by Ts/Tv ratios greater than 0.5, with Mutazyme II and Taq exhibiting a somewhat higher tendency to create transversions over transitions and Mutazyme I exhibiting a greater tendency for introducing transitions over transversions. Examining transition mutation frequencies shows that Mutazyme II produces AT→GC and GC→AT mutations with similar rates (AT→GC/GC→AT ratio = 0.6), while Mutazyme I is 4 times more likely to generate GC→AT transitions over AT→GC transitions, and Taq is 2 times more likely to introduce AT→GC transitions over GC→AT transitions. In addition, Mutazyme II DNA polymerase introduces mutations at A's and T's only slightly more frequently than G's and C's. In contrast, Mutazyme I is nearly 3 times more likely to mutate G's and C's, while Taq under error-prone conditions is 4 times more likely to mutate A's and T's than G's and C's.

Unbiased values for comparison: Ts/Tv = 0.5; AT→GC/GC→AT = 1.0; A→N, T→N = 50%; G→N, C→N = 50%.

(11)

Codon bias 2: Some aa substitutions require 1 nucleotide change; others 2.

GGA (Gly) and its nine single-nucleotide mutants:

mutation at 1st position: CGA Arg, AGA Arg, TGA stop

mutation at 2nd position: GCA Ala, GAA Glu, GTA Val

mutation at 3rd position: GGG Gly, GGC Gly, GGT Gly

• Single nucleotide change at GGA codon (Gly) yields not 9, but only 4 amino acid substitutions.

• Average: 5.7 amino acids accessible by a single nucleotide change. Two nucleotide changes are much less likely.
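The GGA example generalizes to every codon. The sketch below lists the amino acids reachable from a codon by one nucleotide change, using the standard genetic code; the function name `accessible` is ours, and the averaging convention (a simple mean over the 61 sense codons, excluding stops and synonymous changes) is an assumption that may differ from the one behind the 5.7 quoted on the slide.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def accessible(codon: str) -> set:
    """Amino acids (other than the original) reachable by one nucleotide change."""
    original = CODON_TABLE[codon]
    found = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa != "*" and aa != original:  # skip stops and silent changes
                found.add(aa)
    return found

print("GGA ->", sorted(accessible("GGA")))

sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
average = sum(len(accessible(c)) for c in sense_codons) / len(sense_codons)
print("mean accessible amino acids per sense codon:", round(average, 2))
```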

(12)

Expected result of epPCR

• Ideal: 19*300 = 5700 protein variants

Assume 19 codon substitutions at each codon (not three nucleotides changed at random): 19*300 = 5700 at the DNA level

(screen 4.6*5700 = 26,200 colonies)

• Codon bias. Only 5.7 amino acids are accessible by a single nucleotide substitution. 5.7*300 = 1710 (29%)

(This value also accounts for synonymous codons.)

• Unequal distribution requires screening ~8x more colonies to find rare ones.

Polymerase bias. Taq polymerase favors transitions ~2x over transversions and mutations at AT ~4x over mutations at GC. Estimate ~8x bias.

(screen 4.6*8*1710 ≈ 63,000 colonies)

• Screening 26,000 colonies will find only

(26/63)*(1710/5700) ≈ 12% of the ideal number!
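The estimate on this slide is a chain of multiplications, reproduced below so each factor is visible. All inputs are the slide's own rough figures (4.6-fold oversampling, 5.7 accessible amino acids, ~8x bias); the variable names are ours.

```python
OVERSAMPLE = 4.6        # fold oversampling for 99% probability
RESIDUES = 300          # protein length in amino acids
ACCESSIBLE_PER_CODON = 5.7
BIAS = 8                # estimated penalty from unequal distribution

ideal_variants = 19 * RESIDUES                        # all single substitutions
eppcr_variants = ACCESSIBLE_PER_CODON * RESIDUES      # reachable by one nt change

colonies_ideal = OVERSAMPLE * ideal_variants          # screen size for ideal library
colonies_eppcr = OVERSAMPLE * BIAS * eppcr_variants   # screen size for biased library

# Fraction of the ideal library found when screening only colonies_ideal colonies
fraction_found = (colonies_ideal / colonies_eppcr) * (eppcr_variants / ideal_variants)

print(f"ideal: {ideal_variants} variants, screen {colonies_ideal:,.0f} colonies")
print(f"epPCR: {eppcr_variants:,.0f} variants, screen {colonies_eppcr:,.0f} colonies")
print(f"fraction of ideal found: {fraction_found:.1%}")
```

The oversampling factor and the variant counts cancel in the final ratio, which reduces to 1/BIAS times the square of the accessible fraction's ratio to itself, so the result depends mainly on the assumed bias factor.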

(13)

epPCR successes & failures

• Success when many solutions exist.

Increase the stability of a peroxidase for laundry applications.

- Both rational design and epPCR identified Glu239 to eliminate an electrostatic repulsion and Met242 which can be easily oxidized.

- Error prone PCR found three other substitutions, which contribute to stability, but it is not clear why.

• Failure when only a few solutions exist.

Error-prone PCR failed to expand the substrate range of esterases and lipases to tertiary alcohols, likely because the solution requires introducing two adjacent glycine residues in the oxyanion loop.

Cherry et al. (1999) Directed evolution of a fungal peroxidase, Nature Biotechnol., 17, 379-384.

Henke et al. (2002) Activity of lipases and esterases towards tertiary alcohols: insights into structure-function relationships, Angew. Chem. Intl. Ed., 41, 3211-3213.

(14)

Saturation Mutagenesis

- strategies to encode all amino acids using synthetic oligonucleotides

- predicting the number of colonies that must be screened

(15)

Randomizing synthetic oligonucleotides

[...] first the means by which the DNA itself is randomized during synthesis, and secondly the methodology for incorporating the synthetic oligonucleotide. These two issues will be discussed separately, although some issues raised by one can be dealt with by the other and vice versa.

The synthesis of randomized oligonucleotides

The value of oligonucleotide-based mutagenesis is that control over the chemistry of DNA synthesis allows complete control over the level, identity and position of randomization. Thus, if an oligonucleotide can be synthesized as a mixture, or if a number of synthetic oligonucleotides can be mixed, then this can be incorporated directly into a complete gene sequence. There are a wide range of techniques from the field of combinatorial chemistry that are available to a combinatorial biologist. Indeed, the biologist has an advantage over the chemist as a mixture of genes can be readily separated for analysis by transformation into bacterial cells and isolation of single transformed colonies.

The synthesis of degenerate oligonucleotides is well established; synthetic primers incorporating mixtures of any combination of the four natural bases at any position can be ordered directly from most suppliers. Such pieces of synthetic DNA can be used to completely randomize a specific position within a gene. The synthesis of 'doped' oligonucleotides, where a small proportion have a mutation at a specific position or positions, is a slightly more specialist process, but oligonucleotides of this type can be ordered from most suppliers. These are used to generate libraries where the randomization is spread out but still targets those positions that are doped in the primers. Any synthetic process where a number of reagents are used as mixtures is susceptible to bias arising from greater incorporation of one reagent than another. Quantitative studies indicate that where synthesis is carefully controlled and/or uses optimized reagents (e.g. Transgenomic's 'Precision Nucleotide Mix'), this bias is small in synthetic DNA libraries (30,31). It should be noted that this relative lack of bias is not maintained when these libraries are cloned, although the reason for this is not clear (31).

Another bias problem arises due to the mismatch between the base-by-base synthesis of the oligonucleotide and the triplet nature of the genetic code. To randomize a codon so that it can encode all 20 amino acids, a mixture of all four bases is required at the first two positions and at least three bases in the third position. This in turn leads to a form of codon bias as there are six times as many codons for some amino acids, such as serine, than others such as tryptophan and methionine. In addition, there is the potential for the introduction of stop codons. This can be avoided by limiting the mixture of bases at the third position of the codon to T and C, but this means that codons for a range of amino acids will not be present (Fig. 3). A compromise is to randomize the codon with T, C or G in the final position, giving only one stop codon in every 48 primers, and encoding all 20 amino acids or NNG/T or NNG/C which provide all amino acids with slightly more common stop codons. Another result of this form of codon bias is that it is difficult to insert codons for a subset of amino acids if this is desirable.

A number of solutions have been developed to this problem. The simplest solution is to synthesize the DNA for each desired mutation separately. For relatively small libraries the falling cost of oligonucleotide synthesis makes this possible with the size of the library limited by the size of the budget and not by technical considerations. The oligonucleotides can then either be mixed or used separately to construct the gene [...]

Figure 3. Approaches to randomizing synthetic DNA. Examples show randomization of one codon with mixed nucleotides (NNN, NNT/C, NNG/T or NNT/G/C) and with trinucleotide phosphoramidites. Synthesis in all three cases commences conventionally 3′ of the randomized codon. At the 3′-end of the randomized codon (A) all four nucleotides, (B) a mixture of T and C, (C) a mixture of G and T or (D) a mixture of T, G and C can be added. In each case a mixture of all four nucleotides is added at each of the remaining two positions. Having a mixture of G and C at the 3′-end of the codon will provide 32 codons, all 20 amino acids and one stop codon. (E) Conversely, the codon can be synthesized by the direct addition of a mixture of 20 trinucleotide phosphoramidites in one step. [...] represent 20 presynthesized 3-nt codons, one to code for each amino acid.

Nucleic Acids Research, 2004, Vol. 32, No. 4

NNK best, need 20 primers
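The trade-offs between degenerate codon schemes can be compared by expanding each scheme and translating the resulting codons. This is a minimal sketch using standard IUPAC ambiguity codes and the standard genetic code; the function names `expand` and `summarize` are ours.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes used by the schemes below
IUPAC = {"N": "ACGT", "K": "GT", "S": "CG", "B": "CGT", "Y": "CT"}

def expand(scheme: str) -> list:
    """All concrete codons encoded by a degenerate codon such as 'NNK'."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in scheme))]

def summarize(scheme: str) -> tuple:
    """Return (number of codons, distinct amino acids, number of stop codons)."""
    translated = [CODON_TABLE[c] for c in expand(scheme)]
    amino_acids = set(translated) - {"*"}
    return len(translated), len(amino_acids), translated.count("*")

for scheme in ("NNN", "NNB", "NNK", "NNS", "NNY"):
    codons, aas, stops = summarize(scheme)
    print(f"{scheme}: {codons} codons, {aas} amino acids, {stops} stop(s)")
```

NNK and NNS keep all 20 amino acids with the smallest codon set, which is the reason a smaller screen suffices than for NNN.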

(16)

How many colonies to screen to test each different mutant?

$$P_i = 1 - (1 - F_i)^T$$

P_i = probability that sequence i is among the transformants (colonies) tested

F_i = frequency at which sequence i is present in the library

T = number of transformants (colonies) tested

Exercise 1: Show that screening 146 colonies ensures with a 99% probability that you have tested every mutant in an NNK library. (K = G or T)
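The screening equation can be solved for T directly. The sketch below assumes the rarest sequence in an NNK library is a single codon out of 32 (F = 1/32); the function names are ours.

```python
import math

def prob_sampled(freq: float, tested: int) -> float:
    """P_i = 1 - (1 - F_i)^T: chance that sequence i appears among T colonies."""
    return 1 - (1 - freq) ** tested

def colonies_needed(freq: float, prob: float) -> int:
    """Smallest T with P_i >= prob, from T = ln(1 - P_i) / ln(1 - F_i)."""
    return math.ceil(math.log(1 - prob) / math.log(1 - freq))

t_nnk = colonies_needed(1 / 32, 0.99)
print("NNK library, 99%:", t_nnk, "colonies")
print("check:", round(prob_sampled(1 / 32, t_nnk), 4))
```

Changing `freq` or `prob` gives the screen size for other library designs or confidence levels.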

(17)

Rule: Oversample 4.6-fold for 99% probability

Exercise: Rearrange the equation $P_i = 1 - (1 - F_i)^T$ to $T \cdot F_i = -\ln(1 - P_i)$ using the approximation $\ln(1 - F_i) \approx -F_i$ when $F_i \ll 1$.

• $T \cdot F_i$ = number of transformants to screen × frequency of sequence i in library

• For $P_i$ = 99%, $T \cdot F_i = -\ln(0.01) = 4.6$, so $T = 4.6\,(1/F_i)$: must screen 4.6 times more than library size

library size = number of unique DNA sequences = 1/frequency of rarest sequence = $1/F_i$
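The rearrangement behind the 4.6-fold rule can be written out step by step. This is one possible route, using only the symbols defined on the previous slide:

```latex
\begin{align*}
P_i &= 1 - (1 - F_i)^T \\
1 - P_i &= (1 - F_i)^T \\
\ln(1 - P_i) &= T \ln(1 - F_i) \\
             &\approx -T F_i \qquad \text{since } \ln(1 - F_i) \approx -F_i \text{ for } F_i \ll 1 \\
T \cdot F_i &\approx -\ln(1 - P_i) \\
\text{For } P_i = 0.99:\quad T \cdot F_i &\approx -\ln(0.01) \approx 4.6
\end{align*}
```

The same steps give the oversampling factor for any other target probability by replacing 0.99.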

(18)

Group problems

1. How many colonies must you screen for an NNK saturation mutagenesis at one position for a 90% probability of testing each mutant?

2. How many colonies did the GSSM experiment require for the 330 amino acid nitrilase for 90% probability?

(19)

3. If you make an NNN library at one position instead of an NNK library, how many more colonies will you need to screen for 90% probability of testing each mutant?

4. If you make twenty primers that each code for one amino acid, how many colonies must you screen to have a 90% probability of testing each mutant?

(20)

Directed evolution: simple idea, complex to put in practice