Directed evolution: simple idea, complex to put in practice
10 PRODUCE A MUTANT SPECTRUM OF SELF-REPRODUCING TEMPLATES
20 SEPARATE AND CLONE INDIVIDUAL MUTANTS
30 AMPLIFY CLONES
40 EXPRESS CLONES
50 TEST FOR OPTIMAL PHENOTYPES
60 IDENTIFY OPTIMAL GENOTYPES
70 RETURN TO 10 WITH A SAMPLE OF OPTIMAL GENOTYPES
M. Eigen, W. Gardiner (1984), Pure Appl. Chem. 56, 967-978.
• All possible mutants >> number of atoms in the universe
• Bacterial transformation rarely yields more than $10^6$ colonies.
Impossible to make all mutants
– 5,700 ways to change one amino acid (19*300)
– 16,200,000 ways to change two amino acids (19*300)(19*299)/2 variants
Number of variants: $N = 19^{M}\,\dfrac{300!}{(300-M)!\,M!}$ for a 300 aa protein, where M = number of amino acids that differ
Bosley & Ostermeier (2005) Mathematical expressions useful in the
construction, description and evaluation of protein libraries, Biomol. Eng., 22, 57.
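The counting formula can be checked numerically. The sketch below is only an illustration; the function name `count_variants` and its default arguments are my own choices, not something taken from Bosley & Ostermeier.

```python
from math import comb


def count_variants(m: int, length: int = 300, alternatives: int = 19) -> int:
    """Variants that differ from the parent protein at exactly m positions.

    comb(length, m) picks which positions change; each changed position
    can take any of the `alternatives` other amino acids.
    """
    return alternatives ** m * comb(length, m)


for m in (1, 2, 3):
    print(f"M = {m}: {count_variants(m):,} variants")
```

The values for M = 1 and M = 2 match the 5,700 and roughly 16,200,000 quoted on the slide; the count grows by about four orders of magnitude with each further substitution.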
Find one substitution that improves the enzyme
• test all single substitutions (systematic saturation mutagenesis)
(find best mutant, rare improvement, unexpected location)
• test some random single substitutions
(improvements are common; error-prone PCR yields only ~10-20% of possibilities)
• focus changes at locations most likely for success
(find acceptable mutant with a minimum of screening)
Systematic saturation mutagenesis
• Systematic complete library of all possible single
substitution mutants: saturation mutagenesis using NNK primers at every position. [a lot of work!]
• Example: increase the enantioselectivity of a nitrilase from E ~15 to E >100 with Ala190His.
• Error-prone PCR would have missed it. Ala (GCN) to His (CAU/C) requires 2 or 3 nucleotide substitutions.
DeSantis et al. (2003), Creation of a productive, highly enantioselective nitrilase through gene site saturation mutagenesis (GSSM), J. Am. Chem. Soc., 125, 11476-11477.
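The claim that Ala to His needs 2 or 3 nucleotide substitutions can be verified by comparing every Ala codon with every His codon. This is a small sketch; the helper name `nucleotide_changes` is hypothetical, and the codons are written in DNA letters (T instead of U).

```python
from itertools import product

ALA_CODONS = ["GCA", "GCC", "GCG", "GCT"]  # Ala = GCN
HIS_CODONS = ["CAC", "CAT"]                # His = CAU/C, written as DNA


def nucleotide_changes(codon_a: str, codon_b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(a != b for a, b in zip(codon_a, codon_b))


distances = {
    (ala, his): nucleotide_changes(ala, his)
    for ala, his in product(ALA_CODONS, HIS_CODONS)
}
min_changes = min(distances.values())
max_changes = max(distances.values())

for (ala, his), n in sorted(distances.items()):
    print(f"{ala} -> {his}: {n} substitutions")
```

No Ala codon is a single substitution away from a His codon, which is why a method that mostly introduces one change per codon is unlikely to find this variant.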
[Figure: hydrolysis of a pseudo-prochiral, 15N-labelled dinitrile (Lipitor® precursor). Hydrolysis of the 15N nitrile or of the 14N nitrile gives products that differ in mass: (S), m/z = 130; (R), m/z = 129.]
Random Mutagenesis:
Error-prone PCR
NOT completely random, NOT complete (16-29% of ideal)
Incomplete library - use when you believe multiple solutions exist.
1. Codon bias 1. Completely random codons (64) do not code for the 20 amino acids equally (e.g., 4 for Gly, 1 for Trp) due to codon bias of genetic code.
2. Error-prone PCR does not yield completely random codons.
a) Polymerase bias. Polymerases favor some nucleotide substitutions over others.
b) Codon bias 2. One substitution in a codon is more likely than two substitutions, which in turn are more likely than three substitutions.
• Temperature cycle
• 1. denature (break double strands) 95°C, 30 sec
• 2. annealing (bind primer) 55°C, 30 sec
• 3. elongation (synthesize new DNA strand) 72°C, 60 sec
• Each cycle increases [DNA] 2-fold: $2^{30} \approx 10^9$
• Uses a thermostable DNA polymerase (from bacteria found in a hot spring in Yellowstone)
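The doubling arithmetic is easy to reproduce. In the sketch below the `efficiency` parameter is my own addition (1.0 means perfect doubling, as the slide assumes); the step times are the ones listed above and ignore the time the cycler needs to change temperature.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Fold increase in DNA after `cycles` rounds of PCR.

    efficiency = 1.0 is perfect doubling; lower values are a hypothetical
    way to model cycles in which not every template is copied.
    """
    return (1.0 + efficiency) ** cycles


ideal = fold_amplification(30)
cycle_seconds = 30 + 30 + 60  # denature + anneal + elongate
total_minutes = 30 * cycle_seconds / 60

print(f"30 cycles, perfect doubling: {ideal:.2e}-fold")
print(f"30 cycles at {cycle_seconds} s each: {total_minutes:.0f} min")
```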
What happens in the tube
[Figure: strand diagrams with 5' and 3' ends labeled for steps 1 (denaturation), 2 (annealing) and 3 (elongation); steps 1, 2 & 3 are repeated in each following cycle.]
Exponential growth of short product
• PCR animation
www.dnalc.org/ddnalc/resources/pcr.html
Error-prone PCR ingredients
• Diversify PCR Random Mutagenesis (Clontech): Taq polymerase; mutation rate controlled by Mn2+ and dGTP concentration
• GeneMorph (Stratagene): mutation rate controlled by the amount of template (less template, more copying, which leads to more errors)
Cadwell & Joyce (1994) Mutagenic PCR, PCR Methods Appl., 3, S136.
DNA Shuffling
Parent sequence
DNase treatment
Denaturation
PCR w/o primers
Shuffled product
Random fragments
Extension via polymerase
Error-Prone PCR
• Analysis of large protein regions
• PCR conditions enhance mismatches
• Saturation mutagenesis virtually impossible
• Combination of mutations
Error-Prone PCR
• Taq: lacks 3'→5' exonuclease activity; possesses an intrinsic error rate
• Mg2+: stabilizes non-complementary base pairs
• dCTP, dTTP, dGTP, dATP (unbalanced concentrations): promote misincorporation
• Mn2+: reduces the specificity of the polymerase
• Other variables: low annealing temperature, template amount, cycle number
Codon bias 1: 64 codons translate to 20 amino acids unequally
• Four codons yield Gly, but only two yield Phe and only one yields Trp.
• Completely random DNA codons favor some amino acids over others.
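The unequal mapping can be tabulated from the standard genetic code. This sketch builds the codon table from the usual compact 64-letter string (bases ordered T, C, A, G; `*` marks a stop codon); the variable names are my own.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_aa = Counter(CODON_TABLE.values())

for aa, n in sorted(codons_per_aa.items(), key=lambda item: (-item[1], item[0])):
    share = n / 64
    print(f"{aa}: {n} codons ({share:.1%} of a fully random codon pool)")
```

With fully random codons, an amino acid with six codons is expected six times as often as one with a single codon, and 3 of the 64 codons are stops.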
Types of nucleotide substitutions
• Transitions (purine to purine: A, G; or pyrimidine to pyrimidine: C, T)
• Transversions (purine to pyrimidine or pyrimidine to purine)
Exercise: Write out all the possibilities to show that random mutagenesis should yield twice as many transversions as transitions.
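One way to check an answer to the exercise is to enumerate all ordered base substitutions and classify each one. This is a minimal sketch; the function name `is_transition` is my own.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(old: str, new: str) -> bool:
    """True if both bases are purines or both are pyrimidines."""
    return {old, new} <= PURINES or {old, new} <= PYRIMIDINES


substitutions = list(permutations("ACGT", 2))  # all ordered pairs old -> new
transitions = [s for s in substitutions if is_transition(*s)]
transversions = [s for s in substitutions if not is_transition(*s)]

print("transitions:  ", ", ".join(f"{o}>{n}" for o, n in transitions))
print("transversions:", ", ".join(f"{o}>{n}" for o, n in transversions))
print("Ts/Tv if every substitution is equally likely:",
      len(transitions) / len(transversions))
```

Each base has one transition partner and two transversion partners, so unbiased substitution gives Ts/Tv = 0.5.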
[Figure: chemical structures of the four bases A, G, C and T]
Polymerases biased for some substitutions over others
• Both Taq and Mutazyme I favor transitions over transversions (Ts/Tv > 0.5; unbiased substitution would give Ts/Tv = 0.5).
• Taq replaces more AT with GC than vice versa (the GC content of the DNA increases). Mutazyme I does the reverse.
• Taq makes ~4x more mutations at A and T than at G and C. Mutazyme I does the reverse.
• Mutazyme II is a mixture of Taq and Mutazyme I which minimizes mutational bias.
GeneMorph II EZClone Domain Mutagenesis Kit
TABLE II
Mutational Spectra of Mutazyme and Taq DNA Polymerases

Type(s) of mutations           Mutazyme II (a)   Mutazyme I (a)   Taq (b) (Reference 5)
Bias indicators
  Ts/Tv                        0.9               1.2              0.8
  AT→GC/GC→AT                  0.6               0.2              1.9
  A→N, T→N                     50.7%             25.6%            75.9%
  G→N, C→N                     43.8%             72.5%            19.6%
Transitions
  A→G, T→C                     17.5%             10.3%            27.6%
  G→A, C→T                     25.5%             43.7%            13.6%
Transversions
  A→T, T→A                     28.5%             11.1%            40.9%
  A→C, T→G                     4.7%              4.2%             7.3%
  G→C, C→G                     4.1%              8.8%             1.4%
  G→T, C→A                     14.1%             20.0%            4.5%
Insertions and deletions
  Insertions                   0.7%              0.8%             0.3%
  Deletions                    4.8%              1.1%             4.2%
Mutation frequency
  Mutations/kb (per PCR) (c)   3–16              <1 to 7          4.9

(a) The Mutazyme DNA polymerases were used with the corresponding GeneMorph random mutagenesis kits.
(b) The Taq DNA polymerase was used with Mn2+-containing buffer and unbalanced dNTP concentrations, which are mutagenic conditions for Taq DNA polymerase.
(c) Initial target amounts of 16 pg to 1 µg (Mutazyme II DNA polymerase), 1 pg to 100 ng (Mutazyme I DNA polymerase), and 0.01 nM template (Taq DNA polymerase) were used to generate data.
As shown in Table II, error-prone enzymes generally favor transitions over transversions, as shown by Ts/Tv ratios greater than 0.5, with Mutazyme II and Taq exhibiting a somewhat higher tendency to create transversions over transitions and Mutazyme I exhibiting a greater tendency for introducing transitions over transversions. Examining transition mutation frequencies shows that Mutazyme II produces AT→GC and GC→AT mutations with similar rates (AT→GC/GC→AT ratio = 0.6), while Mutazyme I is 4 times more likely to generate GC→AT transitions over AT→GC transitions, and Taq is 2 times more likely to introduce AT→GC transitions over GC→AT transitions. In addition, Mutazyme II DNA polymerase introduces mutations at A's and T's only slightly more frequently than G's and C's. In contrast, Mutazyme I is nearly 3 times more likely to mutate G's and C's, while Taq under error-prone conditions is 4 times more likely to mutate A's and T's than G's and C's.
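The bias indicators can be recomputed from the individual substitution percentages in Table II. The sketch below makes one assumption that the source does not spell out: AT→GC is taken as the sum of the A→G, T→C and A→C, T→G rows, and GC→AT as the sum of the G→A, C→T and G→T, C→A rows. With that grouping the ratios land close to the printed values, although some recomputed numbers differ from the table in the last digit.

```python
POLYMERASES = ("Mutazyme II", "Mutazyme I", "Taq")

# Percentages copied from Table II, in the column order of POLYMERASES
TRANSITIONS = {
    "A>G,T>C": (17.5, 10.3, 27.6),
    "G>A,C>T": (25.5, 43.7, 13.6),
}
TRANSVERSIONS = {
    "A>T,T>A": (28.5, 11.1, 40.9),
    "A>C,T>G": (4.7, 4.2, 7.3),
    "G>C,C>G": (4.1, 8.8, 1.4),
    "G>T,C>A": (14.1, 20.0, 4.5),
}


def bias_indicators(col: int) -> dict:
    """Recompute the four bias indicators for one table column."""
    ts = sum(row[col] for row in TRANSITIONS.values())
    tv = sum(row[col] for row in TRANSVERSIONS.values())
    # Assumed grouping: every substitution that turns an A/T into a G/C, and the reverse
    at_to_gc = TRANSITIONS["A>G,T>C"][col] + TRANSVERSIONS["A>C,T>G"][col]
    gc_to_at = TRANSITIONS["G>A,C>T"][col] + TRANSVERSIONS["G>T,C>A"][col]
    at_mutated = at_to_gc + TRANSVERSIONS["A>T,T>A"][col]
    gc_mutated = gc_to_at + TRANSVERSIONS["G>C,C>G"][col]
    return {
        "ts_tv": ts / tv,
        "at_gc_ratio": at_to_gc / gc_to_at,
        "at_mutated": at_mutated,
        "gc_mutated": gc_mutated,
    }


indicators = {name: bias_indicators(i) for i, name in enumerate(POLYMERASES)}

for name, values in indicators.items():
    print(f"{name:12s} Ts/Tv = {values['ts_tv']:.2f}  "
          f"AT>GC/GC>AT = {values['at_gc_ratio']:.2f}  "
          f"A,T mutated = {values['at_mutated']:.1f}%  "
          f"G,C mutated = {values['gc_mutated']:.1f}%")
```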
Slide annotation (values expected without bias): Ts/Tv = 0.5; AT→GC/GC→AT = 1.0; A→N, T→N = 50%; G→N, C→N = 50%
Codon bias 2: Some aa substitutions require 1 nucleotide change; others 2.
Single-nucleotide changes of the GGA codon (Gly):
  mutation at 1st position: CGA Arg, AGA Arg, TGA stop
  mutation at 2nd position: GCA Ala, GAA Glu, GTA Val
  mutation at 3rd position: GGG Gly, GGC Gly, GGT Gly
• A single nucleotide change at the GGA codon (Gly) yields not 9, but only 4 amino acid substitutions (Arg, Ala, Glu, Val).
• Average: 5.7 amino acids accessible by a single nucleotide change. Two nucleotide changes are much less likely.
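The GGA example can be generalized to every codon. The sketch below lists the amino acids reachable by one nucleotide change, leaving out the original amino acid and stop codons, and then averages over the 61 sense codons. The unweighted average over sense codons is my own choice; the figure quoted on the slide may rest on a different weighting.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def accessible_amino_acids(codon: str) -> set:
    """Amino acid substitutions reachable from `codon` by one nucleotide change."""
    original = CODON_TABLE[codon]
    reachable = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            neighbour = codon[:position] + base + codon[position + 1:]
            aa = CODON_TABLE[neighbour]
            if aa != "*" and aa != original:  # skip stops and silent changes
                reachable.add(aa)
    return reachable


sense_codons = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
average = sum(len(accessible_amino_acids(c)) for c in sense_codons) / len(sense_codons)

print("GGA (Gly) ->", sorted(accessible_amino_acids("GGA")))
print(f"average over {len(sense_codons)} sense codons: {average:.2f}")
```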
Expected result of epPCR
• Ideal: 19*300 = 5700 protein variants
Assume 19 codon substitutions at each codon (not three nucleotides changed at random): 19*300 = 5700 at the DNA level
(screen 4.6*5700 = 26,200 colonies)
• Codon bias. Only 5.7 amino acids are accessible by a single nucleotide substitution. 5.7*300 = 1710 (29%)
(This value also accounts for synonymous codons.)
• Polymerase bias. Unequal distribution requires screening ~8x more colonies to find the rare variants. Taq polymerase favors transitions ~2x over transversions and mutations at AT ~4x over mutations at GC. Estimate ~8x bias.
(screen 4.6*8*1710 = 63,000 colonies)
• Screening 26,000 colonies will find only (26/63)*(1710/5700) = ~12% of the ideal number!
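The screening estimate is a chain of multiplications, so it is worth evaluating directly. The sketch below uses only the numbers on the slide; the constant names are my own. The factor 4.6 is not explained on the slide; it is close to -ln(0.01), the oversampling that gives a 99% chance of seeing a given variant when all variants are equally likely, and I assume that is its origin.

```python
import math

POSITIONS = 300               # residues in the example protein
OVERSAMPLING = 4.6            # colonies screened per variant, as on the slide
ACCESSIBLE_PER_CODON = 5.7    # amino acids reachable by one nucleotide change
POLYMERASE_BIAS = 8           # slide's rough estimate (~2x Ts/Tv times ~4x AT/GC)

ideal_variants = 19 * POSITIONS
accessible_variants = ACCESSIBLE_PER_CODON * POSITIONS
colonies_ideal = OVERSAMPLING * ideal_variants
colonies_biased = OVERSAMPLING * POLYMERASE_BIAS * accessible_variants
fraction_found = (colonies_ideal / colonies_biased) * (accessible_variants / ideal_variants)

print(f"ideal variants:        {ideal_variants}")
print(f"accessible variants:   {accessible_variants:.0f} "
      f"({accessible_variants / ideal_variants:.0%} of ideal)")
print(f"colonies, ideal case:  {colonies_ideal:,.0f}")
print(f"colonies, biased case: {colonies_biased:,.0f}")
print(f"fraction of ideal found when screening the ideal-case number: {fraction_found:.1%}")
print(f"-ln(0.01) = {-math.log(0.01):.2f}")
```

Evaluated without intermediate rounding, the biased case needs about 63,000 colonies and the fraction found reduces to 1 / POLYMERASE_BIAS, about 12%.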
epPCR successes & failures
• Success when many solutions exist.
Increase the stability of a peroxidase for laundry applications.
- Both rational design and epPCR identified Glu239 (changed to eliminate an electrostatic repulsion) and Met242 (easily oxidized).
- Error-prone PCR found three other substitutions that contribute to stability, but it is not clear why.
• Failure when only a few solutions exist.
Error-prone PCR failed to expand the substrate range of esterases/lipases to tertiary alcohols, likely because the solution requires introducing two adjacent glycine residues in the oxyanion loop.
Cherry et al. (1999) Directed evolution of a fungal peroxidase, Nature Biotechnol., 17, 379-384.
Henke et al. (2002) Activity of lipases and esterases towards tertiary alcohols: insights into
structure-function relationships, Angew. Chem. Intl. Ed., 41, 3211-3213.
Saturation Mutagenesis
- strategies to encode all amino acids using synthetic oligonucleotides
- predicting the number of colonies that must be screened
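A common way to predict the screening effort is to ask how many clones must be picked so that any one given variant is sampled with a chosen probability. The sketch below assumes every variant is equally likely, which real libraries only approximate; the function name `colonies_needed` is hypothetical.

```python
import math


def colonies_needed(variants: int, coverage: float = 0.95) -> int:
    """Clones to screen so that a given variant is sampled with probability `coverage`.

    Assumes all `variants` are equally likely. The chance of missing one
    variant in n picks is (1 - 1/variants)**n; solve for n.
    """
    n = math.log(1.0 - coverage) / math.log(1.0 - 1.0 / variants)
    return math.ceil(n)


# One randomized codon with 32 possible codons (for example G/T in the third position)
print("1 position, 95%:", colonies_needed(32, 0.95))
# Two positions randomized at once: 32 * 32 codon combinations
print("2 positions, 95%:", colonies_needed(32 ** 2, 0.95))
# For large libraries the oversampling factor approaches -ln(1 - coverage)
print("oversampling at 99%:", colonies_needed(10 ** 6, 0.99) / 10 ** 6)
```

The required number grows in proportion to library size, so each additional randomized position multiplies the screening effort by the number of codons per position.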
Randomizing synthetic oligonucleotides
[...] first the means by which the DNA itself is randomized during synthesis, and secondly the methodology for incorporating the synthetic oligonucleotide. These two issues will be discussed separately, although some issues raised by one can be dealt with by the other and vice versa.

The synthesis of randomized oligonucleotides

The value of oligonucleotide-based mutagenesis is that control over the chemistry of DNA synthesis allows complete control over the level, identity and position of randomization. Thus, if an oligonucleotide can be synthesized as a mixture, or if a number of synthetic oligonucleotides can be mixed, then this can be incorporated directly into a complete gene sequence. There are a wide range of techniques from the field of combinatorial chemistry that are available to a combinatorial biologist. Indeed, the biologist has an advantage over the chemist as a mixture of genes can be readily separated for analysis by transformation into bacterial cells and isolation of single transformed colonies.

The synthesis of degenerate oligonucleotides is well established; synthetic primers incorporating mixtures of any combination of the four natural bases at any position can be ordered directly from most suppliers. Such pieces of synthetic DNA can be used to completely randomize a specific position within a gene. The synthesis of 'doped' oligonucleotides, where a small proportion have a mutation at a specific position or positions, is a slightly more specialist process, but oligonucleotides of this type can be ordered from most suppliers. These are used to generate libraries where the randomization is spread out but still targets those positions that are doped in the primers. Any synthetic process where a number of reagents are used as mixtures is susceptible to bias arising from greater incorporation of one reagent than another. Quantitative studies indicate that where synthesis is carefully controlled and/or uses optimized reagents (e.g. Transgenomic's 'Precision Nucleotide Mix'), this bias is small in synthetic DNA libraries (30,31). It should be noted that this relative lack of bias is not maintained when these libraries are cloned, although the reason for this is not clear (31).

Another bias problem arises due to the mismatch between the base-by-base synthesis of the oligonucleotide and the triplet nature of the genetic code. To randomize a codon so that it can encode all 20 amino acids, a mixture of all four bases is required at the first two positions and at least three bases in the third position. This in turn leads to a form of codon bias as there are six times as many codons for some amino acids, such as serine, than others such as tryptophan and methionine. In addition, there is the potential for the introduction of stop codons. This can be avoided by limiting the mixture of bases at the third position of the codon to T and C, but this means that codons for a range of amino acids will not be present (Fig. 3). A compromise is to randomize the codon with T, C or G in the final position, giving only one stop codon in every 48 primers, and encoding all 20 amino acids or NNG/T or NNG/C which provide all amino acids with slightly more common stop codons. Another result of this form of codon bias is that it is difficult to insert codons for a subset of amino acids if this is desirable.

A number of solutions have been developed to this problem. The simplest solution is to synthesize the DNA for each desired mutation separately. For relatively small libraries the falling cost of oligonucleotide synthesis makes this possible with the size of the library limited by the size of the budget and not by technical considerations. The oligonucleotides can then either be mixed or used separately to construct the gene

Figure 3. Approaches to randomizing synthetic DNA. Examples show randomization of one codon with mixed nucleotides (NNN, NNT/C, NNG/T or NNT/G/C) and with trinucleotide phosphoramidites. Synthesis in all three cases commences conventionally 3′ of the randomized codon. At the 3′-end of the randomized codon (A) all four nucleotides, (B) a mixture of T and C, (C) a mixture of G and T or (D) a mixture of T, G and C can be added. In each case a mixture of all four nucleotides is added at each of the remaining two positions. Having a mixture of G and C at the 3′-end of the codon will provide 32 codons, all 20 amino acids and one stop codon. (E) Conversely, the codon can be synthesized by the direct addition of a mixture of 20 trinucleotide phosphoramidites in one step. The trinucleotide symbols in the figure represent 20 presynthesized 3-nt codons, one to code for each amino acid.

Nucleic Acids Research, 2004, Vol. 32, No. 4
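The trade-offs between the third-position mixtures can be tabulated by enumerating the codons each scheme allows and translating them with the standard genetic code. This is a sketch for illustration; the scheme labels follow the NNG/T-style notation of the excerpt, and the names `SCHEMES` and `summarize` are my own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Bases allowed at the third codon position; the first two positions are always N
SCHEMES = {
    "NNN": "ACGT",
    "NNT/C": "TC",
    "NNG/T": "GT",
    "NNG/C": "GC",
    "NNT/G/C": "TGC",
}


def summarize(third_bases: str) -> tuple:
    """Return (codons, amino acids encoded, stop codons) for one scheme."""
    codons = ["".join(c) for c in product("ACGT", "ACGT", third_bases)]
    encoded = [CODON_TABLE[c] for c in codons]
    amino_acids = set(encoded) - {"*"}
    return len(codons), len(amino_acids), encoded.count("*")


summary = {name: summarize(third) for name, third in SCHEMES.items()}

for name, (n_codons, n_aa, n_stop) in summary.items():
    print(f"{name:8s} {n_codons:2d} codons, {n_aa:2d} amino acids, {n_stop} stop codon(s)")
```

Restricting the third position to T and C removes all stop codons but loses five amino acids, while the G/T, G/C and T/G/C mixtures keep all 20 amino acids with a single stop codon (TAG).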