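Attribution-NonCommercial-NoDerivs 2.0 Korea. Users are free to copy, distribute, transmit, display, perform and broadcast this work, provided that they follow the conditions below. Attribution: you must credit the original author. NonCommercial: you may not use this work for commercial purposes. NoDerivs: you may not alter, transform or build upon this work. For any reuse or distribution, you must make clear the license terms applied to this work. These conditions do not apply if you obtain separate permission from the copyright holder. Your rights as a user under copyright law are not affected by the above. This is a plain-language summary of the license (Legal Code).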
Comparative analysis of the complete
chloroplast genomes and 45S nrDNAs of
six Lonicera species
SHIN-JAE KANG
DEPARTMENT OF PLANT SCIENCE
THE GRADUATE SCHOOL OF SEOUL NATIONAL UNIVERSITY
ABSTRACT
The genus Lonicera belongs to the family Caprifoliaceae and comprises approximately 210 species distributed in East Asia. Many Lonicera species, such as L. japonica and L. maackii, are ornamental shrubs and are also used as herbal medicines. Despite their usefulness, their genetics, genomics and molecular phylogenetics have rarely been reported. Here, we collected six Lonicera species from the Medicinal Plant Garden, College of Pharmacy, Seoul National University, and produced 2.7 to 4.1 Gbp of whole genome sequencing (WGS) data. We obtained complete chloroplast genome and 45S nuclear ribosomal DNA (45S nrDNA) sequences using de novo assembly of Low-Coverage Whole genome sequence (dnaLCW). The chloroplast genomes of the six Lonicera species ranged from 154,892 to 155,318 bp and showed high similarity (97.4%) with each other. There were 114 genes in L. insularis, L. sachalinensis, L. praeflorens and L. maackii, 113 in L. vesicaria, and 112 in L. japonica. Comparative analysis of chloroplast genome and 45S nrDNA sequences revealed 17 to 2,261 single nucleotide polymorphisms (SNPs) and 5 to 278 insertions and deletions (InDels) between species in the chloroplast genome, and a total of 45 SNPs and 4 InDels in 45S nrDNA. Furthermore, 266 large repetitive sequences and 288 simple sequence repeats (SSRs) were detected among the six chloroplast genomes. In addition, we found several chloroplast protein-coding genes that either showed high Ka/Ks values or were highly conserved among the six Lonicera species. Four genes with high Ka/Ks values, psaJ, rbcL, rps18 and ycf2, might be under positive selection in the genus Lonicera. On the other hand, four genes in the large single copy (LSC) region, psbI, psbZ, psbL and petZ, were highly conserved in the six Lonicera species. Estimation of divergence times and phylogenetic relationships revealed that L. japonica diverged first from the common ancestor of the six Lonicera species (9.19-10.74 MYA), and that L. insularis and L. sachalinensis diverged recently (0-0.03 MYA). Phylogenetic trees based on chloroplast genomes and 45S nrDNA sequences showed a similar topology, in which L. insularis and L. sachalinensis were the closest and formed a clade with L. maackii.
Moreover, phylogenetic analysis with related species in Dipsacales revealed that the Lonicera species clustered with the genera Patrinia and Kolkwitzia of the same family, Caprifoliaceae, as expected. A total of seven molecular markers were developed from polymorphic sites such as SNPs, InDels and copy number variation (CNV) of tandem repeats (TRs) in the chloroplast genome. We could successfully discriminate the six Lonicera species using these markers. The chloroplast genome and 45S nrDNA sequences of the six Lonicera species, along with the DNA markers produced in this study, will provide valuable information for further genetic diversity studies and for authentication of the six Lonicera species.
Keywords: Lonicera, Lonicera insularis, Lonicera sachalinensis, Lonicera praeflorens, Lonicera maackii, Lonicera vesicaria, Lonicera japonica, chloroplast genome, 45S nrDNA, dnaLCW
CONTENTS
ABSTRACT
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
INTRODUCTION
MATERIALS AND METHODS
1. Plant materials
2. DNA extraction and Whole-Genome shotgun sequencing
3. Chloroplast genome and 45S nrDNA assembly
4. Gene and structure annotation
5. Comparative analysis and characterization of simple sequence repeats and large sequence repeats
6. Development and validation of molecular markers
7. Estimation of divergence time and phylogenetic analysis
RESULTS
1. Complete chloroplast genome and 45S nrDNA sequences of six Lonicera species
2. Comparative analysis of chloroplast genomes of six Lonicera species
3. Characterization of SSRs and repetitive sequences among six Lonicera species
4. Sequence variations of 45S nrDNA sequences of six Lonicera species
5. Divergence time estimation and phylogenetic relationship among six Lonicera species
6. Phylogenetic relationship within Dipsacales
DISCUSSION
1. Complete chloroplast genome and 45S nrDNA sequences of six Lonicera species derived from low-coverage whole-genome NGS data
2. Comparative analysis of chloroplast genome and 45S nrDNA sequences among six Lonicera species
3. Repetitive sequences in Lonicera chloroplast genomes
4. Divergence and phylogenetic analysis based on chloroplast genome and 45S nrDNA sequences of the Lonicera species
5. Development of molecular markers for authentication of six Lonicera species
REFERENCES
ABSTRACT IN KOREAN
LIST OF TABLES
Table 1. Statistics of WGS and assembly summary for six Lonicera species
Table 2. Summary of SNPs and InDels found in chloroplast genomes among the six Lonicera species
Table 3. SSRs comparison in chloroplast genomes of six Lonicera species
Table 4. CNVs of TR units in chloroplast genomes among six Lonicera species
Table 5. Summary of SNPs and InDels found in 45S nrDNA sequences among the six Lonicera species
Table 6. Summary of SNPs and InDels found between three hetero types of L. insularis 45S nrDNA sequences
Table 7. Summary of SNPs and InDels found between three hetero types of L. sachalinensis 45S nrDNA sequences
Table 8. Summary of SNPs and InDels found between three hetero types of L. maackii 45S nrDNA sequences
Table 9. Mean Ks values and estimated divergence time of six Lonicera species
Table 10. Median Ks values and estimated divergence time of six Lonicera species
Table 11. Information of developed molecular markers in this study for six Lonicera discrimination
Table 12. Marker combinations for each Lonicera species
LIST OF FIGURES
Figure 1. Complete chloroplast genome map of six Lonicera species
Figure 2. Complete 45S nrDNA sequence assembly of six Lonicera species and polymorphic sites for each species
Figure 3. Comparison of chloroplast genome sequences of six Lonicera species using mVISTA program with L. insularis as a reference
Figure 4. Comparison of the border positions of LSC, SSC and IR regions across six Lonicera plastid genomes
Figure 5. Repeat structure analysis in six Lonicera chloroplast genomes
Figure 6. Summary of Ka and Ks values among the 77 conserved protein-coding genes in six Lonicera species
Figure 7. Ka and Ks values of candidate genes involved in positive selection between six Lonicera species
Figure 8. Phylogenetic tree and divergence time of six Lonicera species
Figure 9. Phylogenetic analysis of Lonicera species with related species in Dipsacales
Figure 10. Validation of seven developed molecular markers from InDel and
LIST OF ABBREVIATIONS
45S nrDNA   45S nuclear ribosomal DNA
CNV         Copy number variation
CTAB        Cetyltrimethylammonium bromide
IGS         Intergenic spacer
InDel       Insertions or Deletions
IR          Inverted repeat
ITS         Internal transcribed spacer
LSC         Large single copy
NGS         Next generation sequencing
PE          Paired-end
rRNA        ribosomal RNA
SSC         Small single copy
SNP         Single nucleotide polymorphism
TR          Tandem repeat
WGS         Whole genome sequence
dnaLCW      de novo assembly of Low-Coverage Whole genome sequence
SSR         Simple sequence repeat
TRF         Tandem Repeats Finder
INTRODUCTION
Lonicera is the largest genus in Caprifoliaceae and is divided into two subgenera, Caprifolium and Lonicera. It comprises approximately 210 species that are mainly distributed in East Asia. Among them, about 100 species occur in China, 25 in Japan and 30 in Korea. Many Lonicera species have been widely used as herbal medicines and ornamental shrubs, and they contain loniceroside, a triterpenoid saponin known for its anti-inflammatory effects (Son et al. 1994; Liu et al. 2012).
For example, L. japonica, called golden-and-silver honeysuckle or Japanese honeysuckle, has been widely used in traditional herbal medicine (Peng et al. 2000), and its flower buds have also been prescribed to treat some infectious diseases because of their anti-inflammatory and antiviral effects (Chang and Hsu 1992). These effects come from many active compounds identified in the stems and leaves of L. japonica (Shang et al. 2011).
L. insularis and L. vesicaria are ornamental shrubs endemic to Korea, and their flower color changes from white to yellow. Previous studies have reported that a new compound, argininosecologanin, was identified in the roots of L. insularis (Kang et al. 2018), and that L. vesicaria contains many antioxidant compounds such as anthocyanins and flavonoids (Lee et al. 2016).
L. maackii is a woody perennial shrub that grows up to 5 m in height and sprouts early in spring. It is native to Northeastern Asia and has been widely used for ornamental purposes. In the late 1800s, L. maackii was imported to the Eastern United States (Forman 2011). However, L. maackii has since been treated as an invasive plant because it produces an allelochemical that can suppress seed germination of other plants (Bauer et al. 2012).
Cytogenetic studies have reported diverse ploidy levels in Lonicera species. Most Lonicera species have a conserved base chromosome number of x = 9, with 2n = 18 to 54 (Ammal and Saunders 1952; Chen et al. 2017).
The chloroplast is a plant-specific organelle that conducts photosynthesis and carbon fixation. In most higher plants, the chloroplast genome is a double-stranded circular DNA molecule ranging from 120 to 217 kb, and it exists in high copy number. Chloroplast genomes are generally highly conserved and have a quadripartite structure with one large single copy (LSC) region, one small single copy (SSC) region and two inverted repeat blocks (IRs) (Palmer 1985).
45S nuclear ribosomal DNA (45S nrDNA) units are located in the nucleolus organizer region (Goffinet et al. 2005) and occur as many copies in tandem repeats (Huang et al. 2017). A 45S nrDNA unit is composed of the 18S, 5.8S and 26S transcribed subunits, which are separated by two internal transcribed spacer (ITS) regions, ITS1 and ITS2.
Chloroplast genome and 45S nrDNA sequences are useful targets for genetic diversity studies and phylogenetic analysis because they carry many polymorphisms at the inter-species level but few at the intra-species level (Kim et al. 2015). In addition, molecular markers derived from the chloroplast genome can be an efficient tool for identifying plant species because of its highly conserved genes and maternal inheritance (Kim et al. 2015). Many molecular marker and phylogenetic studies have been based on intergenic spacer (IGS) regions in the chloroplast genome and ITS regions in the 45S nrDNA sequence (Kim Y-D and Kim 1999; Theis et al. 2008; Jeong et al. 2014). Furthermore, chloroplast genomes have been used to elucidate the history of plant evolution owing to their low rate of nucleotide substitution (Wolfe et al. 1989; Wilson et al. 1990).
Although some divergence and phylogenetic studies of Lonicera species using molecular markers derived from a few chloroplast and nuclear DNA sequences have been reported (Theis et al. 2008; Smith 2009; Nakaji et al. 2015), studies of the genetic diversity and taxonomic classification of Lonicera species are still limited.
Since the emergence and rapid development of next-generation sequencing technologies, more than 1,500 complete chloroplast genomes have become available in GenBank (https://www.ncbi.nlm.nih.gov/genbank/), but so far only one chloroplast genome sequence in the genus Lonicera has been reported (He et al. 2017). For this reason, this study was conducted to generate the complete chloroplast genomes and 45S nrDNA sequences of six Lonicera species: L. insularis, L. sachalinensis, L. praeflorens, L. maackii, L. vesicaria and L. japonica. Through comparative analyses, I present the genetic diversity of the six Lonicera species based on chloroplast genome and 45S nrDNA sequences, and report molecular markers developed from InDels and SNPs in the chloroplast genomes for the identification of each of the six Lonicera species.
MATERIALS AND METHODS
1. Plant materials
The six Lonicera species were provided by the Medicinal Plant Garden, College of Pharmacy, Seoul National University, Koyang, Korea.
2. DNA extraction and Whole-Genome shotgun sequencing
Individual leaves and roots of each species were stored at -70℃ until use. Leaves or roots were ground with a mortar and pestle in liquid nitrogen, and total genomic DNA was then extracted using a modified cetyltrimethylammonium bromide (CTAB) protocol (Allen et al. 2006). The quality and concentration of the extracted DNA were measured by agarose gel electrophoresis and UV spectrophotometry (Thermo Scientific Nanodrop ND-1000), respectively. Paired-end (PE) libraries were sequenced on an Illumina MiSeq genome analyzer by Lab Genomics Inc., Seongnam, Korea, according to the standard protocol provided by the manufacturer. Whole genome shotgun (WGS) sequencing data of 2.7 to 4.1 Gbp were generated for the six Lonicera species.
3. Chloroplast genome and 45S nrDNA assembly
Chloroplast genomes and 45S nrDNA units were assembled by the de novo assembly of Low-Coverage Whole genome sequence (dnaLCW) method using the CLC genome assembler program (ver. 4.06 beta, CLC Inc., Aarhus, Denmark) (Kim K et al. 2015), followed by manual curation. In summary, raw PE reads were trimmed with an offset value of 33, and the trimmed reads were assembled with the overlap distance set to range from 150 to 500 bp and a window size of 32 for the chloroplast genome and 64 for 45S nrDNA. The initial contigs were extracted from the assembled reads using MUMmer by mapping to the reference chloroplast sequence, KJ170923. The extracted contigs were then ordered and merged into a single draft sequence by comparison with the same reference sequence.
4. Gene and structure annotation
The genes in the chloroplast genomes were annotated using the DOGMA program (http://dogma.ccbb.utexas.edu/) (Wyman et al. 2004) and GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html) (Tillich et al. 2017), and then manually curated based on BLAST searches. The circular chloroplast maps were generated using the OGDRAW program (http://ogdraw.mpimp-golm.mpg.de/) (Lohse et al. 2007). The structure of the 45S nrDNA unit was predicted by RNAmmer (http://www.cbs.dtu.dk/services/RNAmmer/) (Lagesen et al. 2007).
5. Comparative analysis and characterization of simple sequence repeats and large sequence repeats
Comparative analysis of the complete chloroplast genomes and 45S nrDNA sequences of the six Lonicera species was conducted using an in-house script, MAFFT (Katoh and Standley 2013) and the mVISTA program to identify inter-species polymorphisms.
Simple sequence repeats (SSRs) were identified using the microsatellite search module MISA (http://pgrc.ipk-gatersleben.de/misa/) (Thiel et al. 2003), with thresholds of ten repeat units for mononucleotide SSRs, five repeat units for dinucleotide SSRs, four repeat units for trinucleotide SSRs and three repeat units for tetra-, penta- and hexanucleotide SSRs.
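The SSR thresholds above can be illustrated with a minimal Python sketch. This is not MISA and not the script used in this study; the function names `is_primitive` and `find_ssrs` are hypothetical, and the sketch only reports perfect repeats on the given strand, without MISA's handling of compound SSRs.

```python
# Minimum number of repeat units per motif length, as stated in the text:
# mono 10, di 5, tri 4, tetra/penta/hexa 3.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """Return False if the motif is itself a repeat of a shorter unit (e.g. ATAT = AT x 2)."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs (0-based start)."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip ambiguous bases and motifs that belong to a shorter motif class.
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= min_rep:
                hits.append((i, motif, reps))
                i += reps * k  # jump past the repeat that was just reported
            else:
                i += 1
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence (not from any Lonicera genome) with one SSR of each of four classes.
    toy = "GG" + "A" * 10 + "C" + "AT" * 5 + "G" + "TTC" * 4 + "G" + "AGAT" * 3 + "CC"
    for start, motif, reps in find_ssrs(toy):
        print(start, motif, reps)
```

Checking that a motif is primitive keeps a run such as ATATATATATAT from being counted both as a dinucleotide and as a tetranucleotide SSR.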
Repeats, including tandem, dispersed, complementary and palindromic repeats, were investigated using REPuter (Kurtz et al. 2001) and the Tandem Repeats Finder (TRF) program (Benson 1999). For REPuter, the minimum repeat size was set to 10 bp and the repeat identity to ≥80%; for TRF, the match, mismatch and InDel parameters were set to 2, 7 and 7, respectively. All identified repeats were manually curated.
6. Development and validation of molecular markers
To validate the inter-species polymorphisms derived from the chloroplast genomes and to authenticate the six Lonicera species, co-dominant and dominant markers were designed from polymorphic regions such as InDels and SNPs using the Primer-BLAST program (Ye et al. 2012). PCR amplification was performed as follows: 7 min at 94℃; 35 cycles of 94℃ for 20 to 30 sec, 58 to 64℃ for 20 to 30 sec and 72℃ for 20 to 30 sec; and a final extension at 72℃ for 7 min. The PCR products were then separated by agarose gel electrophoresis to identify polymorphisms.
7. Estimation of divergence time and phylogenetic analysis
To elucidate the phylogenetic relationships among the six Lonicera species, we analyzed not only the complete chloroplast genomes but also the complete 45S nrDNA sequences of the six species.
In addition, the phylogenetic relationships with related species in Dipsacales were investigated using complete chloroplast genomes. Complete chloroplast genome sequences of eight additional species (L. japonica Chinese, Patrinia saniculifolia, Kolkwitzia amabilis, Viburnum utile, Sambucus williamsii, Sinadoxa corydalifolia, Adoxa moschatellina, Tetradoxa omeiensis) were obtained from GenBank and used for the analysis. All phylogenetic trees were generated by the neighbor-joining method with 1,000 bootstrap replicates in the MEGA6.0 program (Tamura et al. 2013).
Divergence time was calculated from Ks values. Ka and Ks are the rates of non-synonymous and synonymous substitution per site, respectively. The 77 protein-coding genes conserved in the chloroplast genomes of the six Lonicera species were extracted and concatenated. Mean and median values of Ka and Ks were calculated with the PAML program by pairwise comparison of the shared protein-coding genes of the six species. Divergence time was estimated as $T = K_s / (2\lambda)$, where $\lambda = 1.0 \times 10^{-9}$ is the substitution rate per site per year.
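As a worked illustration of the divergence-time formula above, the following Python sketch converts a Ks value into years and into million years ago (MYA). The function names and the example Ks value of 0.02 are hypothetical and are not taken from the results of this study; only the substitution rate is the value stated in the text.

```python
# Substitution rate per site per year (lambda), as stated in the text.
SUBSTITUTION_RATE = 1.0e-9


def divergence_time_years(ks, rate=SUBSTITUTION_RATE):
    """Divergence time T = Ks / (2 * lambda), in years."""
    if rate <= 0:
        raise ValueError("substitution rate must be positive")
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    return ks / (2.0 * rate)


def divergence_time_mya(ks, rate=SUBSTITUTION_RATE):
    """Same estimate expressed in million years ago (MYA)."""
    return divergence_time_years(ks, rate) / 1.0e6


if __name__ == "__main__":
    example_ks = 0.02  # hypothetical Ks value, for illustration only
    print(f"Ks = {example_ks} -> about {divergence_time_mya(example_ks):.2f} MYA")
```

With this rate, each 0.002 of Ks corresponds to roughly one million years, so a Ks of 0.02 gives about 10 MYA. The factor of 2 reflects that substitutions accumulate along both lineages after they split.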
RESULTS
1. Complete chloroplast genome and 45S nrDNA sequences of six Lonicera species
Whole genome sequencing data of the six Lonicera species were generated on the Illumina MiSeq platform and ranged from 2.7 to 4.1 Gbp for each species. Complete chloroplast genome and 45S nrDNA sequences of the six Lonicera species were obtained by the dnaLCW method (Table 1).
The complete chloroplast genomes were assembled by combining the primary chloroplast contigs obtained from the WGS data of each species. For each of the six Lonicera species, a complete chloroplast genome was successfully obtained as a single molecule by combining three to four initial contigs, followed by manual curation. The complete chloroplast genome sequences of the six Lonicera species ranged from 154,892 to 155,318 bp in length and showed the typical quadripartite structure, with a large single copy (LSC) region, a small single copy (SSC) region and a pair of inverted repeat (IRa and IRb) regions (Figure 1). The LSC regions ranged from 88,229 to 89,202 bp, the SSC regions from 18,612 to 18,929 bp, and each of the two inverted repeat regions from 23,718 to 24,060 bp. The average coverage of raw reads for chloroplast genome assembly ranged from 134.0× to 784.0×.
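Because the quadripartite structure consists of one LSC, one SSC and two IR copies, the region lengths should add up to the genome size. The short Python sketch below checks this for the values reported in Table 1; the dictionary layout and the function names are illustrative only.

```python
# (LSC, SSC, single IR copy, genome size) in bp, copied from Table 1.
REGIONS = {
    "L. insularis": (88230, 18774, 24060, 155124),
    "L. sachalinensis": (88229, 18774, 24060, 155123),
    "L. praeflorens": (88353, 18929, 23805, 154892),
    "L. maackii": (89202, 18680, 23718, 155318),
    "L. vesicaria": (89096, 18612, 23737, 155182),
    "L. japonica": (88853, 18653, 23777, 155060),
}


def quadripartite_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def find_mismatches(regions):
    """Return the species whose region lengths do not add up to the genome size."""
    return [
        species
        for species, (lsc, ssc, ir, total) in regions.items()
        if quadripartite_length(lsc, ssc, ir) != total
    ]


if __name__ == "__main__":
    for species, (lsc, ssc, ir, total) in REGIONS.items():
        print(species, quadripartite_length(lsc, ssc, ir), total)
    print("mismatches:", find_mismatches(REGIONS))
```

For all six species the sum of the region lengths equals the reported genome size, so the region boundaries in Table 1 are internally consistent.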
The 45S nrDNA sequence comprised the 18S, 5.8S and 26S gene cluster with internal transcribed spacer sequences (ITS1 and ITS2) between the genes, and an intergenic spacer (IGS) region. The IGS region could not be assembled in this study because of large gaps in G-C rich regions, as previously reported (Kim et al. 2015). The complete 45S nrDNA of each species consisted of one or two contigs. The length of the 45S nrDNA sequences of the six Lonicera species ranged from 5,832 to 5,836 bp. L. vesicaria and L. japonica had only one type of 45S nrDNA sequence, whereas heterotypes of 45S nrDNA sequences were confirmed in L. insularis, L. sachalinensis, L. praeflorens and L. maackii, with 3, 7, 2 and 2 types, respectively (Figure 2). We categorized major and minor types by considering the mapping depth per position and the polymorphic sites within reads. The average coverage of raw reads for 45S nrDNA sequence assembly ranged from 155.2× to 536.5×.
Table 1. Statistics of WGS and assembly summary for six Lonicera species

Feature                   L. insularis      L. sachalinensis  L. praeflorens    L. maackii        L. vesicaria      L. japonica
Sequencing information
No. of raw reads          4,941,334         4,764,738         4,342,742         4,920,926         5,596,064         6,308,194
No. of trimmed reads      4,662,540         4,339,126         4,024,338         4,640,648         4,712,150         5,029,201
No. of trimmed bases      1,211,552,506     1,098,408,065     1,040,146,882     1,188,483,775     1,164,886,321     1,178,414,508
Chloroplast genome
Average read depth        634.83            214.83            165.39            784.00            134.00            668.84
Genome size (bp)          155,124           155,123           154,892           155,318           155,182           155,060
Large single copy (bp)    88,230            88,229            88,353            89,202            89,096            88,853
Small single copy (bp)    18,774            18,774            18,929            18,680            18,612            18,653
Inverted repeat (bp)      24,060            24,060            23,805            23,718            23,737            23,777
Number of genes           114               114               114               114               113               109
Protein-coding genes      80                80                80                80                79                77
Structural RNAs           34                34                34                34                34                32
GC content (%)            38.35             38.34             38.31             38.47             38.39             38.59
45S nrDNA
Average read depth        371.52            204.90            446.42            536.46            155.21            113.84
Coding region length (bp) 5,834             5,832             5,836             5,833             5,832             5,835
18S (bp)                  1,809             1,809             1,809             1,809             1,809             1,809
ITS1 (bp)                 230               228               229               229               228               228
5.8S (bp)                 164               164               164               164               164               164
ITS2 (bp)                 232               232               235               232               232               232
Figure 1. Complete chloroplast genome maps of six Lonicera species. Chloroplast genome maps were generated by OGDRAW. Genes shown on the outside of the map are transcribed clockwise, whereas genes on the inside are transcribed counter-clockwise. The four parts of the chloroplast genome and the GC content are indicated on the inner circle. Blue and red bars in the inner circle represent SNP and InDel variations among the six Lonicera species, respectively. The gene marked with a black star is not present in L. vesicaria and L. japonica, and the gene marked with a blue star is not present in L. japonica.
Figure 2. Complete 45S nrDNA sequence assembly of six Lonicera species and polymorphic sites for each species. (A) L. insularis, (B) L. sachalinensis, (C) L. praeflorens, (D) L. maackii, (E) L. vesicaria, (F) L. japonica. (a) Mapping depth of raw PE reads on the assembled 45S nrDNA and GC content, with a window size of 100 bp. (b) Polymorphic regions between heterotype sequences.
2. Comparative analysis of chloroplast genomes of six Lonicera species.
Gene annotation of the six Lonicera species revealed that L. insularis, L. sachalinensis, L. praeflorens and L. maackii each contain a total of 114 genes: 80 protein-coding, 30 transfer RNA (tRNA) and 4 ribosomal RNA (rRNA) genes. L. vesicaria and L. japonica contain a total of 113 and 112 genes, respectively: 79 and 78 protein-coding, 30 tRNA and 4 rRNA genes. Some genes were pseudogenized in the Lonicera species: the ycf15 gene in all six species, the accD gene in L. vesicaria and L. japonica, and the rpoA gene in L. japonica.
To investigate the genetic diversity of the chloroplast genomes of the six Lonicera species, a multiple alignment was performed. We identified 17 to 2,261 SNPs and 5 to 278 InDels between species (Table 2). The lowest numbers of SNPs and InDels (17 and 5) were found between L. insularis and L. sachalinensis, whereas the highest number of SNPs (2,261) was found between L. vesicaria and L. japonica and the highest number of InDels (278) between L. insularis and L. japonica (Table 2). The sequence identity plot of the six chloroplast genomes was constructed with the mVISTA program, using the L. insularis annotation as a reference (Figure 3). The six Lonicera species showed high similarity with each other (97.4%), and genic regions were more conserved than intergenic regions, as expected.
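Pairwise counts of this kind can be derived from two aligned sequences, for example two rows of a MAFFT alignment. The Python sketch below is not the in-house script used in this study; the function name `count_snps_indels` is hypothetical, and it assumes that each contiguous run of gaps is counted as one InDel event, which may differ from the counting rule behind Table 2.

```python
def count_snps_indels(aln_a, aln_b, gap="-"):
    """Count SNPs and InDel events between two aligned sequences of equal length.

    Assumption: a contiguous run of gap columns in one sequence is one InDel event.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    snps = 0
    indels = 0
    open_gap_in = None  # which sequence ("a" or "b") carries the currently open gap
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap and b == gap:
            # Column gapped in both rows (possible in a multiple alignment): ignore it.
            continue
        if a == gap or b == gap:
            side = "a" if a == gap else "b"
            if open_gap_in != side:
                indels += 1  # a new gap run starts here
                open_gap_in = side
        else:
            open_gap_in = None
            if a != b:
                snps += 1
    return snps, indels


if __name__ == "__main__":
    # Toy alignment, not taken from the Lonicera data.
    seq_a = "ACGT--ACGTA"
    seq_b = "ACCTGGAC-TA"
    print(count_snps_indels(seq_a, seq_b))
```

Applying such a function to every pair of species in the alignment would fill a pairwise matrix like Table 2, with SNP counts in one triangle and InDel counts in the other.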
We also compared the borders of the LSC, SSC and two IR regions among the six chloroplast genomes (Figure 4). The rpl23 gene spanned the LSC and IRb regions, with approximately 120 bp in the IR region, in all six Lonicera species. The IRb/SSC junction was located between the trnN and ndhF genes, except in L. japonica. The distance between the trnN gene and the IRb/SSC junction ranged from 833 to 1,220 bp. The SSC/IRa junction was located between the ycf1 and trnN genes, at distances of 208 to 523 bp and 832 to 1,219 bp, respectively. The trnH gene was located exactly at the IRa/LSC border in five Lonicera species, whereas L. praeflorens had a 20 bp gap between trnH and the IRa/LSC junction.
Table 2. Summary of SNPs and InDels found in chloroplast genomes among the six Lonicera species

Numbers above the diagonal are InDels; numbers below the diagonal are SNPs.

Species   Li     Ls     Lp     Lm     Lv     Lj
Li        /      5      246    153    247    278
Ls        17     /      246    156    245    277
Lp        1450   1439   /      227    235    271
Lm        754    743    1426   /      223    266
Lv        1550   1539   1446   1490   /      268
Lj        1964   1953   2072   1958   2261   /
Figure 3. Comparison of chloroplast genome sequences of six Lonicera species using the mVISTA program with L. insularis as a reference. Genic regions were annotated by GeSeq and tRNAscan. Arrowheads indicate polymorphic regions used for molecular marker development: red arrowheads mark the regions used for InDel markers and the black arrowhead the region used for the SNP marker. Li, L. insularis; Ls, L. sachalinensis; Lp, L. praeflorens; Lm, L. maackii; Lv, L. vesicaria; Lj, L. japonica.
Figure 4. Comparison of the border positions of LSC, SSC and IR regions across six Lonicera plastid genomes. The arrow boxes
3. Characterization of SSRs and repetitive sequences among six Lonicera species
Repeat analyses were carried out with only one IR region to avoid redundancy. Copy number variations of SSRs were identified among the chloroplast genomes of the six Lonicera species (Table 3). The longest SSRs were hexamers of 24 bp in length. The most abundant SSRs were mononucleotide repeats of A and T. The lowest number of SSRs (47) was identified in L. sachalinensis and L. japonica, whereas the highest number (50) was identified in L. praeflorens. L. japonica contained the highest number of homopolymers (32), but no pentapolymers. L. insularis, L. sachalinensis and L. praeflorens had 5 dipolymers, fewer than L. maackii (7) and more than L. vesicaria and L. japonica (4 each). L. maackii had one tripolymer, fewer than all other Lonicera species (2 each). L. insularis, L. sachalinensis and L. praeflorens had 8 tetrapolymers, fewer than L. vesicaria (9) and L. maackii (10) and more than L. japonica (7). L. praeflorens and L. vesicaria had 4 and 1 pentapolymers, respectively, whereas the other Lonicera species had none. L. maackii and L. vesicaria had one hexapolymer each, fewer than L. insularis, L. sachalinensis and L. japonica (2 each).
Repeat sequences were grouped into four types: tandem, dispersed, palindromic and complement. A total of 32 to 55 repetitive sequences were identified in the chloroplast genome of each Lonicera species (Figure 5A), comprising tandem (68.8%), dispersed (18.0%), palindromic (11.7%) and complement (1.5%) repeats (Figure 5C). Repeat length ranged from 10 to 77 bp across the chloroplast genomes of the six Lonicera species (Figure 5B). The longest repeat was a tandem repeat found in L. vesicaria. Dispersed and palindromic repeats ranged from 18 to 74 bp and from 21 to 30 bp, respectively. Most repeats were located in intergenic spacer (IGS) regions (55.64%) and coding sequence (CDS) regions (35.34%) (Figure 5D). Some repeats were found in intron regions (9.02%).
We further characterized the copy number variation (CNV) of the tandem repeat (TR) units, which are an important genomic resource for genetic diversity analysis (Table 4). A total of 52 TRs were identified, ranging from 10 to 77 bp in length. Of the 52 TR sequences, 23 were located in genic regions and 29 in intergenic regions. Three of those in genic regions were found in introns. Among the 52 TRs, 14 did not show any copy number variation among the six Lonicera species, whereas 38 were polymorphic: 6 were unique to L. insularis and L. sachalinensis (TR4, TR7, TR25, TR39, TR43, TR35); 3 were unique to L. praeflorens (TR9, TR47, TR48); 3 were unique to L. maackii (TR23, TR24, TR49); 8 were unique to L. vesicaria (TR1, TR8, TR10, TR12, TR19, TR21, TR27, TR38); 6 were unique to L. japonica (TR11, TR28, TR42, TR44, TR50, TR52); and 12 varied among several of the six Lonicera species (TR3, TR13, TR14, TR15, TR16, TR17, TR20, TR32, TR33, TR36, TR46, TR51).
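The grouping of TRs into invariant and polymorphic classes can be sketched in Python as follows. This is only an illustration of the idea, not the procedure used in this study: the function name `classify_tr` and the rule of comparing each species against the most frequent copy number are assumptions, and the example rows are copy numbers taken from Table 4.

```python
from collections import Counter

# Species order used in Table 4.
SPECIES = ("Li", "Ls", "Lp", "Lm", "Lv", "Lj")


def classify_tr(copies):
    """Classify one TR by its copy numbers across the six species.

    Returns (category, species) where species lists those whose copy number
    differs from the single most frequent copy number.
    """
    if len(copies) != len(SPECIES):
        raise ValueError("expected one copy number per species")
    counts = Counter(copies)
    if len(counts) == 1:
        return "invariant", []
    top = max(counts.values())
    modes = [value for value, n in counts.items() if n == top]
    if len(modes) > 1:
        # No single majority copy number, so no species can be singled out.
        return "tie", []
    deviating = [sp for sp, c in zip(SPECIES, copies) if c != modes[0]]
    return "polymorphic", deviating


if __name__ == "__main__":
    examples = {
        "TR1": (1, 1, 1, 1, 2, 1),
        "TR2": (2, 2, 2, 2, 2, 2),
        "TR3": (1, 1, 0, 2, 1, 3),
        "TR4": (2, 2, 1, 1, 1, 1),
        "TR17": (2, 2, 1, 2, 1, 1),
    }
    for name, copies in examples.items():
        print(name, classify_tr(copies))
```

Under this rule TR1 is singled out for L. vesicaria, TR4 for L. insularis and L. sachalinensis, and TR2 is invariant, which matches the grouping given in the text for those TRs.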
Table 3. SSRs comparison in chloroplast genomes of six Lonicera species

Motif             SSR units        Number of SSRs
                                   Li   Ls   Lp   Lm   Lv   Lj
Mononucleotide    A/T              31   30   30   29   30   29
                  C/G              -    -    1    -    1    3
Dinucleotide      AT/AT            2    2    3    2    2    2
                  TA/TA            2    2    2    3    2    2
                  GA/TC            1    1    -    1    -    -
                  TC/GA            -    -    -    1    -    -
Trinucleotide     TTC/GAA          1    1    1    1    1    1
                  AAT/CTT          1    1    -    -    -    -
                  TTG/CAA          -    -    1    -    1    -
                  ATA/TAT          -    -    -    -    -    1
Tetranucleotide   AGAT/ATCT        1    1    1    2    1    1
                  ATAA/TTAT        2    2    2    2    2    2
                  CAAT/GTTC        1    1    1    1    1    1
                  TATC/GATA        1    1    1    1    1    1
                  TCTT/AAGAC       1    1    1    1    1    1
                  TTAA/TTAA        1    1    1    1    1    1
                  ATTT/AAAT        1    1    -    -    -    -
                  TTTA/TAAA        -    -    1    1    -    -
                  TCTA/TAGA        -    -    -    1    -    -
                  AAAT/ATTT        -    -    -    -    2    -
Pentanucleotide   TATTA/TAATA      -    -    3    -    -    -
                  TATAT/ATATA      -    -    1    -    -    -
                  TATTC/GAATA      -    -    -    -    1    -
Hexanucleotide    CTTACC/GGTAAG    1    1    -    -    1    -
                  TGTTTA/TAAACA    1    1    -    -    -    -
                  TATGGA/TCCATA    -    -    -    1    -    -
                  ATTCCA/TGGAAT    -    -    -    -    -    1
                  GGATAG/CTATGG    -    -    -    -    -    1
Total SSRs                         48   47   50   48   48   47
Figure 5. Repeat structure analysis in six Lonicera chloroplast genomes. (A) Number of the four repeat types in each Lonicera species chloroplast genome. (B) Frequency of repeat sequences by length. (C) Frequency of all repeat types. (D) Location distribution of all the repeats.
Table 4. CNVs of TR units in chloroplast genomes among six Lonicera species

Marker   No.   Length  Copy number             Position              TR unit sequence
name           (bp)    Li  Ls  Lp  Lm  Lv  Lj
         TR1   18      1   1   1   1   2   1   rps16-trnQ(UUG)       AAAGTTTCCTATTTCTAC
         TR2   15      2   2   2   2   2   2   trnC(GCA)-petN        CTTTCTACTACTAAT
         TR3   14      1   1   0   2   1   3   trnE(UUC)-trnT(GGU)   AATAAAAAATATAG
         TR4   40      2   2   1   1   1   1   trnT(GGU)-psbD        AATACTACATTATCATCTCCATTGTATTTAAATCGACAAA
         TR5   20      2   2   2   2   2   2   rps4-trnT(UGU)        ATGTAATAACTAGATAAATC
         TR6   14      3   3   3   3   3   3   trnT(UGU)-trnL(UAA)   TTAGCTACTCATAA
         TR7   19      2   2   1   1   1   1   trnL(UAA)-trnF(GAA)   CTCCCTAATTATTTATCCT
         TR8   19      1   1   1   1   2   1   rbcL-accD             TAATTGAATTTCAATTAAA
         TR9   55      1   1   3   1   1   1   rbcL-accD             TCCCCCTCTAATTCAAATGAGTGGTTTTGTGGGAAAAGGGGATTCAAAGAAAGAA
         TR10  77      1   1   1   1   2   1   rbcL-accD             TATTCTATTTTCTTCTTTAATATTCGATCAAATTACATATAAAAAGAATATCTTTGTAATTTGATTAAAAAAAAAAG
         TR11  45      1   1   0   1   1   2   accD                  GACTCTGAAAGCGATCCTGAGGAGGGTAACGATAACCCGTTCCAT
         TR12  72      1   1   1   1   2   1   accD                  CGCCTTGAAGCAGATAGACGTTATCGGGAGGGTTACTCTGGTGCTCCTGACGATGAAGTTACTGAGGAGATT
         TR13  18      0   0   0   9   1   2   accD                  GGGATAGGGATAAGGATA
         TR14  27      5   5   7   1   1   2   accD                  CTGACTATGGAAGTGATACGGATGGCC
         TR15  33      1   1   3   2   2   2   accD                  GGATTACCTTCAAAAAAAAAGAAATCCTGGGGG
Lo_i_03  TR16  33      2   2   5   4   3   3   accD                  CTATGGAAGTAATCCTCAGAAGGATGACCCTAG
         TR17  18      2   2   1   2   1   1   accD-psaI             CTTAATTAAGAATATTAA
Lo_i_04  TR19  12      1   1   1   1   2   1   trnP(UGG)-psaJ        ATTTAATTAAAA
Lo_i_04  TR20  23      2   2   3   2   2   1   trnP(UGG)-psaJ        ATAAAAGTAATATATAAAAAAAG
         TR21  21      5   5   5   5   4   5   rps18                 AAATCCAAGCGACCCTTTCTG
         TR22  21      2   2   2   2   2   2   rps18-rpl20           AACGGCCTTTCCAGCCGAGAT
         TR23  10      1   1   0   2   1   1   rps18-rpl20           CCGAACTCAA
         TR24  17      1   1   1   2   1   1   rps12-clpP            TATTCTATTAAACTTGG
         TR25  20      2   2   1   1   1   1   petB-petB             AATAAAGAAAACAAATAAGA
         TR26  18      2   2   2   2   2   2   petD-petD             AAAAGAAAATCCAGTCAA
         TR27  12      2   2   2   2   1   2   rps8-rpl14            TCTGAATCTATT
         TR28  17      1   1   1   1   1   2   rpl14-rpl16           TTTCCTTTCAGTCTATT
         TR29  18      2   2   2   2   2   2   rpl16-rpl16           TTAAGATATATCTTGAAT
         TR30  24      2   2   2   2   2   2   rps3                  AATTCTTCTTGTAAATTCCTCTTT
         TR31  14      2   2   2   2   2   2   rpl22-rps19           TAATCTATTTTTAT
         TR32  13      2   2   17  2   2   2   trnI(CAU)-ycf2        ATATCATAAAGAA
         TR33  24      6   6   5   6   6   4   ycf2                  TTTGTCTAAGCCACTTCGTTTCTT
         TR34  24      2   2   2   2   2   2   ycf2                  TGATCCTCCATTTGAACCAGATGA
         TR35  15      2   2   2   2   2   2   ycf2                  GATAAGAAAAGTGAA
         TR36  18      5   5   4   5   4   4   ycf2                  GGGGATGGGGTTGTGGAC
         TR37  30      2   2   2   2   2   2   ycf2                  TGATATTGATGATAGTAAGATTGATGAGAG
Lo_i_05  TR38  12      1   1   1   1   2   1   ycf2                  AGGATGATGGAG
         TR39  15      2   2   1   1   1   1   ycf2                  AAGAGTATGAGCTTC
         TR40  24      2   2   2   2   2   2   ycf2                  AAGAGGATGATGAAGAGAATGAGG
         TR41  21      2   2   2   2   2   2   ndhB-rps7             TCCATGGAGCTATGTGTCTAT
Lo_i_06  TR42  22      1   1   1   1   1   2   trnV(GAC)-rrn16       ATGGATAAGAGGCTCGTGGGAT
         TR43  42      6   6   0   0   0   1   trnN(GUU)-ndhF        ATAGAATAACATAATATCATATATAGAATAACATAATATTCT
         TR45  53      2   2   1   1   0   0   trnN(GUU)-ndhF        ATTATTATAAAGTAATATTATAATTAGATATTACTTTATAATAATCTATTTTT
         TR46  20      2   2   3   0   2   1   ccsA-ndhD             TTTTTTTTACTTACCTTATT
Lo_i_07  TR47  13      1   1   2   1   1   1   psaC-ndhE             TGATTAATACTAC
         TR48  19      1   1   2   1   1   1   ndhA-ndhA             GTTCGCTATTATTTATCTT
Lo_i_08  TR49  24      2   2   2   1   2   2   ycf1                  TCAGTTAGATTCTTCATTTCTTGT
         TR50  12      1   1   1   1   1   2   ycf1                  TTCCACTTCCTT
         TR51  27      1   1   2   1   2   1   ycf1                  TTTTTACAGATTTCTTTGATTCCAACC
         TR52                                                        TTAGAAAATGGATCCACTTTCTGGTCAA
4. Sequence variations of 45S nrDNA sequences of six Lonicera species
The 45S nrDNA sequences of the six Lonicera species were compared by multiple alignment. Comparison of the 45S nrDNA unit sequences, considering all major and minor types, showed a total of 45 SNPs and 4 InDels among the six species (Table 5). Polymorphisms were abundant in the ITS1, ITS2 and 26S rDNA regions, and a few were found in the 18S rDNA region. In L. insularis, L. sachalinensis, L. praeflorens and L. maackii, the 45S nrDNA existed as 3, 7, 2 and 2 heterotypes, respectively (Table 6, Table 7, Table 8). Polymorphisms between major and minor types were likewise concentrated in the ITS1 and ITS2 regions.
Table 5. Summary of SNPs and InDels found in 45S nrDNA sequences among the six Lonicera species
Feature Total 18S ITS1 5.8S ITS2 26S
Length (bp) 5832-5836 1809 228-230 164 232-235 3399-3402
SNP 58 4 20 0 18 16
Table 6. Summary of SNPs and InDels found between three hetero types of L. insularis 45S nrDNA sequences
Columns: Type; Read depth (b); base at each nucleotide position (a) in the ITS1, ITS2 and 26S regions
Position - 1886 1890 1891 1895 1896 1922 1934 1941 2266 2281 2346 2436 2454 3076 4564
Type 1 371.52 G A C C C C A G C G T T G C T
Type 2 367.05 G G C -(c) C C G G A G C C T G T
Type 3 341.22 -(c) G T -(c) -(c) G G C C T C C G C C
a Nucleotide position based on 45S nrDNA sequence of L. insularis type1 as a reference sequence. b Read depth by type.
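The heterotype comparison in Table 6 can be tallied column by column. The following is a minimal sketch, not the procedure used in this study: it assumes that the "-" entries denote gaps, that the stacked digits in the table header read as four-digit positions, and it counts per alignment column (adjacent gap columns are not merged into one event). The function name `classify_columns` is illustrative only.

```python
# Positions and bases transcribed from Table 6 (L. insularis heterotypes).
# Assumption: "-" marks a gap relative to the other types.
positions = [1886, 1890, 1891, 1895, 1896, 1922, 1934, 1941,
             2266, 2281, 2346, 2436, 2454, 3076, 4564]
types = {
    "Type 1": "G A C C C C A G C G T T G C T".split(),
    "Type 2": "G G C - C C G G A G C C T G T".split(),
    "Type 3": "- G T - - G G C C T C C G C C".split(),
}


def classify_columns(rows):
    """Label each alignment column as 'InDel', 'SNP' or 'same'."""
    calls = []
    for column in zip(*rows):
        if "-" in column:
            calls.append("InDel")      # at least one type has a gap here
        elif len(set(column)) > 1:
            calls.append("SNP")        # bases differ, no gap
        else:
            calls.append("same")
    return calls


calls = classify_columns(list(types.values()))
n_snp = calls.count("SNP")
n_indel = calls.count("InDel")
for pos, call in zip(positions, calls):
    print(pos, call)
print("SNP columns:", n_snp, "InDel columns:", n_indel)
```

The same function can be applied to the rows of Table 7 or Table 8 by replacing `positions` and `types`.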
Table 7. Summary of SNPs and InDels found between seven hetero types of L. sachalinensis 45S nrDNA sequences
Columns: Type; Read depth (b); base at each nucleotide position (a) in the ITS1, ITS2 and 26S regions
Position - 1886 1890 1891 1894 1920 1939 2005 2014 2032 2255 2264 2281 2287 2344 2354 2431 2432 3434 2445 2452 2454 2455 2781 3074
Type 1 200.70 G G T -(c) C G C C T -(c) C C G C G G A T C T C C G G
Type 2 187.24 G G T -(c) T G T T C -(c) C C G C G G A T C T C C G G
Type 3 187.84 -(c) G T -(c) G C C C T C C T G C A G A T C T T T G G
Type 4 179.53 G A C C C G C C T -(c) C T G C G T T T C T C C G G
Type 5 187.30 G A C C C G C C T -(c) A C G C G T T C C T C C A C
Type 6 190.87 G A C C C G C C T -(c) C C G T G T T T C G C C G G
Type 7 197.69 G G C C C G C C T -(c) C C T C G G A T A T C C G G
a Nucleotide position based on 45S nrDNA sequence of L. sachalinensis type1 as a reference sequence. b Read depth by type.
Table 8. Summary of SNPs and InDels found between two hetero types of L. maackii 45S nrDNA sequences
Columns: Type; Read depth (b); base at each nucleotide position (a) in the ITS1, ITS2 and 26S regions
Position - 1861 1940 2395 2460 2473 5729
Type 1 536.46 T C G G C G
Type 2 483.24 C T A T T A
a Nucleotide position based on 45S nrDNA sequence of L. maackii type1 as a reference sequence. b Read depth by type.
5. Divergence time estimation and phylogenetic relationship among six Lonicera species
Based on the conserved protein-coding sequences among the six chloroplast genomes, the mean and median Ks values were 0.0001 ~ 0.0223 and 0.00 ~ 0.0192, respectively (Table 9, Table 10). The lowest Ks value was found between L. insularis and L. sachalinensis, whereas the highest Ks value was found between L. vesicaria and L. japonica. At the gene level, the lowest and highest average Ks values were detected in the ndhB and petL genes, at 0.0007 and 0.0596, respectively. A Ka/Ks ratio greater than 1 was detected in the psaJ gene, and the rbcL, rps18 and ycf2 genes showed Ka/Ks ratios over 0.8 for both mean and median values (Figure 6, Figure 7). The psbI, psbZ, psbL and petG genes had Ka and Ks values of 0, indicating that they are highly conserved in the six Lonicera chloroplast genomes.
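The gene categories described above (Ka/Ks over 1, Ka/Ks over 0.8, and Ka = Ks = 0) can be expressed as a simple rule. The sketch below is illustrative only: the function name `classify_gene` and the (Ka, Ks) pairs are hypothetical and are not values from this study, and the handling of Ks = 0 with Ka > 0 is an added assumption because the ratio is undefined in that case.

```python
def classify_gene(ka: float, ks: float) -> str:
    """Assign a gene to one of the Ka/Ks categories used in the text."""
    if ka == 0 and ks == 0:
        return "conserved (no substitution)"
    if ks == 0:
        # Ratio is undefined when only nonsynonymous changes are present.
        return "undefined (Ks = 0)"
    ratio = ka / ks
    if ratio > 1:
        return "Ka/Ks > 1"
    if ratio > 0.8:
        return "Ka/Ks > 0.8"
    return "Ka/Ks <= 0.8"


# Hypothetical (Ka, Ks) pairs for illustration; NOT data from this study.
examples = {
    "geneA": (0.012, 0.010),
    "geneB": (0.009, 0.010),
    "geneC": (0.0, 0.0),
    "geneD": (0.001, 0.010),
}
labels = {gene: classify_gene(ka, ks) for gene, (ka, ks) in examples.items()}
for gene, label in labels.items():
    print(gene, label)
```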
The phylogenetic relationship of the six Lonicera species was examined by comparative analysis of the complete chloroplast genomes and the 45S nrDNA sequences (Figure 8). Complete chloroplast genome and all 45S nrDNA sequences including major and minor types were used for phylogenetic analysis.
The phylogenetic tree based on chloroplast genomes revealed that L. japonica was the most divergent and formed an independent group. L. insularis and L. sachalinensis were the closest, and they belonged to the same subgroup as L. maackii. L. praeflorens and L. vesicaria were classified into another subgroup.
Based on protein-coding sequences, the divergence time between L. insularis and L. sachalinensis was estimated at approximately 0 ~ 0.03 MYA (Figure 8A), that between L. praeflorens and L. vesicaria at 6.50 ~ 7.61 MYA, and that between L. japonica and the other species at 9.19 ~ 10.74 MYA; after these splits, each lineage is considered to have speciated independently.
The phylogenetic tree based on 45S nrDNA sequences showed a pattern similar to that obtained from the chloroplast genomes, but was more complicated (Figure 8B). L. insularis and L. sachalinensis were again the closest and belonged to the same subgroup as L. maackii, in agreement with the phylogenetic analysis based on chloroplast genomes. L. vesicaria and L. japonica were classified into another subgroup.
Table 9. Mean Ks values and estimated divergence time of six Lonicera species
Above the diagonal: divergence time (MYA) (b); below the diagonal: mean Ks values (a)
Species Li Ls Lp Lm Lv Lj
Li - 0.03 7.95 4.14 7.88 10.49
Ls 0.0001 - 7.89 4.08 7.81 10.41
Lp 0.0159 0.0158 - 7.33 7.61 11.44
Lm 0.0083 0.0082 0.0147 - 7.45 10.20
Lv 0.0158 0.0156 0.0152 0.0149 - 11.14
Lj 0.0210 0.0208 0.0229 0.0204 0.0223 -
Abbreviations : Li, L. insularis; Ls, L. sachalinensis; Lp, L. praeflorens; Lm, L. maackii; Lv, L. vesicaria; Lj, L. japonica
a Mean Ks values between common protein-coding genes of each species, calculated using the PAML program. b Divergence time was estimated as Ks/2λ (λ = 1.0 × 10^-9).
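The conversion in footnote b can be checked with a few lines of code. This is a minimal sketch assuming only the relation T = Ks/(2λ) with λ = 1.0 × 10^-9 as stated in the footnote; the function name `divergence_time_mya` is illustrative. Because the Ks values printed in Table 9 are rounded to four decimals, the recomputed times agree with the tabulated ones only to within about 0.01 to 0.02 MYA.

```python
LAMBDA = 1.0e-9  # rate used in the table footnote


def divergence_time_mya(ks: float, rate: float = LAMBDA) -> float:
    """Divergence time in million years: T = Ks / (2 * rate)."""
    return ks / (2.0 * rate) / 1.0e6


# A few mean Ks values transcribed from Table 9.
mean_ks = {
    ("Li", "Lp"): 0.0159,
    ("Li", "Lm"): 0.0083,
    ("Lp", "Lv"): 0.0152,
    ("Lv", "Lj"): 0.0223,
}
times = {pair: divergence_time_mya(ks) for pair, ks in mean_ks.items()}
for pair, t in times.items():
    print(pair, f"{t:.2f} MYA")
```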
Table 10. Median Ks values and estimated divergence time of six Lonicera species
Above the diagonal: divergence time (MYA) (b); below the diagonal: median Ks values (a)
Species Li Ls Lp Lm Lv Lj
Li - 0.00 7.55 3.50 6.00 9.05
Ls 0.0000 - 7.50 3.50 6.00 9.05
Lp 0.0151 0.0150 - 6.80 6.50 9.20
Lm 0.0070 0.0070 0.0136 - 5.30 9.05
Lv 0.0120 0.0120 0.0130 0.0106 - 9.60
Lj 0.0181 0.0181 0.0184 0.0181 0.0192 -
Abbreviations : Li, L. insularis; Ls, L. sachalinensis; Lp, L. praeflorens; Lm, L. maackii; Lv, L. vesicaria; Lj, L. japonica
a Median Ks values between common protein-coding genes of each species, calculated using the PAML program. b Divergence time was estimated as Ks/2λ (λ = 1.0 × 10^-9).
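The group-level divergence ranges quoted in the text appear to correspond to averages of the pairwise estimates in Tables 9 and 10. The sketch below tests that reading; it is an assumption about how the ranges were summarized, not a documented step of the analysis, and the helper name `average_time` is illustrative.

```python
from statistics import mean

# Pairwise divergence times (MYA) transcribed from Table 9 (mean Ks based)
# and Table 10 (median Ks based).
mean_based = {
    ("Li", "Lj"): 10.49, ("Ls", "Lj"): 10.41, ("Lp", "Lj"): 11.44,
    ("Lm", "Lj"): 10.20, ("Lv", "Lj"): 11.14,
    ("Li", "Lp"): 7.95, ("Li", "Lv"): 7.88, ("Ls", "Lp"): 7.89,
    ("Ls", "Lv"): 7.81, ("Lp", "Lm"): 7.33, ("Lm", "Lv"): 7.45,
}
median_based = {
    ("Li", "Lj"): 9.05, ("Ls", "Lj"): 9.05, ("Lp", "Lj"): 9.20,
    ("Lm", "Lj"): 9.05, ("Lv", "Lj"): 9.60,
    ("Li", "Lp"): 7.55, ("Li", "Lv"): 6.00, ("Ls", "Lp"): 7.50,
    ("Ls", "Lv"): 6.00, ("Lp", "Lm"): 6.80, ("Lm", "Lv"): 5.30,
}


def average_time(table, group_a, group_b):
    """Average the pairwise times over all pairs spanning two groups."""
    values = [t for (x, y), t in table.items()
              if (x in group_a and y in group_b) or (x in group_b and y in group_a)]
    return mean(values)


others = {"Li", "Ls", "Lp", "Lm", "Lv"}
lj_mean = average_time(mean_based, {"Lj"}, others)
lj_median = average_time(median_based, {"Lj"}, others)

sub1, sub2 = {"Li", "Ls", "Lm"}, {"Lp", "Lv"}
split_mean = average_time(mean_based, sub1, sub2)
split_median = average_time(median_based, sub1, sub2)

print(f"L. japonica vs others: {lj_median:.2f} to {lj_mean:.2f} MYA")
print(f"Between the two subgroups: {split_median:.3f} to {split_mean:.3f} MYA")
```

Under this reading, the averages fall close to the 9.19 to 10.74 MYA and 6.53 to 7.72 MYA ranges given for the L. japonica split and for the split between the two subgroups.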
Figure 6. Summary of Ka and Ks values among the 77 conserved protein-coding genes in six Lonicera species. The mean Ka and Ks values are indicated by grey and light grey bars, respectively. Blue and light blue stars mark genes whose Ka/Ks values are over 1 and over 0.8, respectively, and which may be under positive selection. Red stars indicate conserved genes without any substitution.
Figure 8. Phylogenetic tree and divergence time of six Lonicera species. Phylogenetic trees were generated based on complete chloroplast genomes (A) and 45S nrDNA sequences (B) using MEGA 6.0. The numbers on the nodes indicate bootstrap support values. The numbers under the nodes in (A) represent median and mean divergence times (*, MYA) based on Ks values calculated using PAML 4.9. Bold numbers represent major types of 45S nrDNA sequence, and non-bold numbers are minor types. Li, L. insularis; Ls, L. sachalinensis; Lp, L. praeflorens; Lm, L. maackii; Lv, L. vesicaria; Lj, L. japonica.
6. Phylogenetic relationship within Dipsacales
Phylogenetic relationships inferred from the complete chloroplast genome sequences of 14 species in Dipsacales indicated that the eight genera were divided into two monophyletic groups corresponding to the families Caprifoliaceae and Adoxaceae (Figure 9). Within Caprifoliaceae, Patrinia saniculifolia and Kolkwitzia amabilis were classified into a separate subgroup, and the six Lonicera species completely assembled in this study grouped with the previously reported L. japonica (China collection).
Figure 9. Phylogenetic analysis of Lonicera species with related species in Dipsacales. The tree was generated based on complete chloroplast genome sequences of 14 species and analyzed by the neighbour-joining method with 1,000 bootstrap replicates in MEGA 6.0. The numbers on the nodes represent bootstrap support values.
7. Development and validation of chloroplast genome-based markers
Based on the chloroplast genome sequence alignment, seven markers were developed from polymorphic sites to discriminate the six Lonicera species and to authenticate each species (Table 11, Figure 10). Among these, six markers were derived from CNV-based InDel regions and one marker from a SNP region. Each of the seven markers was successfully amplified by PCR, and each amplicon showed the expected product size.
The marker Lo_i_03 was derived from a 33 bp tandem repeat in the accD gene and produced amplicons of clearly different sizes among the Lonicera species (Table 11, Figure 10A). The marker Lo_i_04 was derived from 12 bp and 23 bp tandem repeats in the trnP - psaJ region and was also distinctly amplified among the Lonicera species (Table 11, Figure 10B). The marker Lo_i_05 was derived from a 12 bp tandem repeat in the ycf2 gene and was specific to L. vesicaria (Table 11, Figure 10C). The marker Lo_i_06 was derived from a 22 bp tandem repeat in the trnV - rrn16 region and was specific to L. japonica (Table 11, Figure 10D). The marker Lo_i_07 was derived from a 13 bp tandem repeat in the psaC - ndhE region and was specific to L. praeflorens (Table 11, Figure 10E). The marker Lo_i_08 was derived from a 24 bp tandem repeat in the ycf1 gene and was specific to L. maackii (Table 11, Figure 10F). Finally, the dominant marker Lo_do_04 was derived from a SNP in the rps18 - rpl20 region and was specific to L. insularis (Table 11, Figure 10G). Validation results indicated that more than three species could be distinguished using each of the markers Lo_i_03 and Lo_i_04, and that all six Lonicera species were successfully discriminated with the seven markers (Table 12).
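For the markers built on a single tandem repeat, the expected amplicon size follows from the repeat copy number. The sketch below is a consistency check under stated assumptions rather than the design procedure of this study: it pairs each marker with the repeat labelled next to it in Table 4 (TR16, TR38, TR42, TR47 and TR49), takes L. insularis as the reference, and assumes that copy number is the only source of size variation. Lo_i_04 is left out because it spans two different repeats. The function name `predicted_size` is illustrative.

```python
SPECIES = ["Li", "Ls", "Lp", "Lm", "Lv", "Lj"]

# marker: (repeat unit length in bp, copies per species from Table 4,
#          product sizes in bp per species from Table 11)
markers = {
    "Lo_i_03": (33, [2, 2, 5, 4, 3, 3], [480, 480, 579, 546, 513, 513]),
    "Lo_i_05": (12, [1, 1, 1, 1, 2, 1], [200, 200, 200, 200, 212, 200]),
    "Lo_i_06": (22, [1, 1, 1, 1, 1, 2], [186, 186, 186, 186, 186, 208]),
    "Lo_i_07": (13, [1, 1, 2, 1, 1, 1], [236, 236, 249, 236, 236, 236]),
    "Lo_i_08": (24, [2, 2, 2, 1, 2, 2], [220, 220, 220, 196, 220, 220]),
}


def predicted_size(ref_size, unit_len, ref_copies, copies):
    """Amplicon size if only the repeat copy number differs from the reference."""
    return ref_size + unit_len * (copies - ref_copies)


consistent = {}
for name, (unit_len, copies, sizes) in markers.items():
    # Index 0 is L. insularis, used here as the reference species.
    predicted = [predicted_size(sizes[0], unit_len, copies[0], c) for c in copies]
    consistent[name] = predicted == sizes
    print(name, predicted, "matches Table 11:", consistent[name])
```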
Table 11. Information of developed molecular markers in this study for six Lonicera discrimination
Columns: Type; Marker name; Primer (F/R); Primer sequence (5’-3’); Product (bp) in Li, Ls, Lp, Lm, Lv, Lj; Location
InDel Lo_i_03 F AGAGCCTTACCTTGACTATGGA 480 480 579 546 513 513 accD
InDel Lo_i_03 R ACGGATCCCATACTACCCCC
InDel Lo_i_04 F AAACAAACGCGCTACCAAGC 314 314 338 314 326 295 trnP(UGG)-psaJ
InDel Lo_i_04 R CCCGAGCATTCCCGAAAAAG
InDel Lo_i_05 F TTTGAAGACGGGGAAGGAGC 200 200 200 200 212 200 ycf2
InDel Lo_i_05 R TCCTCTTCATCCGCGAAAGG
InDel Lo_i_06 F GAGTGTCACCTTGACGTGGT 186 186 186 186 186 208 trnV(GAC)-rrn16
InDel Lo_i_06 R TCATATTCGCCCGGAGTTCG
InDel Lo_i_07 F TCAATCGACTTCTGGATTGGGT 236 236 249 236 236 236 psaC-ndhE
InDel Lo_i_07 R GCCGCTGAAGCAGCTATTGG
InDel Lo_i_08 F AATCGAGCGTTTCTTCGTTTT 220 220 220 196 220 220 ycf1
InDel Lo_i_08 R GGGCAAATTCTTTACAGACAGAAC
SNP Lo_do_04 F AAACGGAATCGCGTTAGTGTGG 266 Na(a) Na Na Na Na rps18-rpl20
SNP Lo_do_04 R TCGGTTGAGTTCGGATTGGA
Figure 10. Validation of seven developed molecular markers from InDel and SNP regions of six Lonicera chloroplast genomes. Schematic diagrams indicate CNVs of TR units and the SNP polymorphism. Tandem repeats are designated by triangles. Gene directions are represented by pentagons.
Table 12. Marker combinations for each Lonicera species
No. Species Lo_i_03 Lo_i_04 Lo_i_05 Lo_i_06 Lo_i_07 Lo_i_08 Lo_do_04 Combination
1 L. insularis A A A A A A A AAAAAAA
2 L. sachalinensis A A A A A A B AAAAAAB
3 L. praeflorens B B A A B A B BBAABAB
4 L. maackii C A A A A B B CAAAABB
5 L. vesicaria D C B A A A B DCBAAAB
6 L. japonica D D A B A A B DDABAAB
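The letter codes in Table 12 can be derived from the product sizes in Table 11. The following sketch assumes that letters are assigned in order of first appearance across the species columns and that "Na" means no amplification (represented as `None`); both are readings of the tables rather than rules stated in the text, and the function name `allele_codes` is illustrative.

```python
SPECIES = ["Li", "Ls", "Lp", "Lm", "Lv", "Lj"]

# Product sizes (bp) transcribed from Table 11; None stands for "Na".
products = {
    "Lo_i_03": [480, 480, 579, 546, 513, 513],
    "Lo_i_04": [314, 314, 338, 314, 326, 295],
    "Lo_i_05": [200, 200, 200, 200, 212, 200],
    "Lo_i_06": [186, 186, 186, 186, 186, 208],
    "Lo_i_07": [236, 236, 249, 236, 236, 236],
    "Lo_i_08": [220, 220, 220, 196, 220, 220],
    "Lo_do_04": [266, None, None, None, None, None],
}


def allele_codes(sizes):
    """Map each distinct product size to A, B, C, ... in order of appearance."""
    seen = {}
    codes = []
    for size in sizes:
        if size not in seen:
            seen[size] = chr(ord("A") + len(seen))
        codes.append(seen[size])
    return codes


codes_by_marker = {name: allele_codes(sizes) for name, sizes in products.items()}
profiles = {
    species: "".join(codes_by_marker[name][i] for name in products)
    for i, species in enumerate(SPECIES)
}
for species, profile in profiles.items():
    print(species, profile)

# Every species is distinguishable if all combined profiles are different.
all_distinct = len(set(profiles.values())) == len(SPECIES)
print("All six species distinguishable:", all_distinct)
```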
DISCUSSION
1. Complete chloroplast genome and 45S nrDNA sequences of six Lonicera species derived from low-coverage whole-genome NGS data
Multiple copies of the chloroplast genome and 45S nrDNA exist in the plant cytoplasm and nucleus, respectively, which explains why high read coverage was obtained from a small amount of WGS data (Table 1). Moreover, the chloroplast genome has been used extensively to study genetic diversity, authentication and evolution in plants (Kim et al. 2015; Joh et al. 2017; Kim C-K et al. 2018). However, such studies are lacking for the genus Lonicera. Here, complete chloroplast genome and 45S nrDNA sequences were successfully obtained from the six Lonicera species (Figure 1, Figure 2). The three or four initial contigs that showed high homology to the reference sequence were extracted, and complete chloroplast genomes were generated by overlapping the contigs followed by manual curation. For the 45S nrDNA sequences, the longest contig similar to the reference sequence was extracted. These results demonstrate that the assembly method used in this study is reliable and efficient for obtaining complete chloroplast genome and 45S nrDNA sequences, as previously described (Kim et al. 2015). Through this study, the chloroplast genomes of six Lonicera species were completed, five of which (all except L. japonica) were completed for the first time.
2. Comparative analysis of chloroplast genome and 45S nrDNA sequences among six Lonicera species
Most genes in the chloroplast genomes were shared among the six Lonicera species, except for the accD and rpoA genes. The accD gene, which encodes one of the acetyl-CoA carboxylase subunits, was pseudogenized in both L. vesicaria and L. japonica. Moreover, the rpoA gene, which encodes a subunit of RNA polymerase, was pseudogenized in L. japonica. Similar loss of the accD and rpoA genes has been found in some other plants (Sugiura et al. 2003; Goffinet et al. 2005; Harris et al. 2013; Li J et al. 2016).
Several sequence variations were revealed by comparing the chloroplast genomes and 45S nrDNA sequences of the six Lonicera species. In the chloroplast genome, nucleotide substitutions have been used to study plant evolution and genome differentiation between species (Wolfe et al. 1987). In addition, InDels are known to play a major role in genome size increase (Britten et al. 2003). Although the six chloroplast genomes showed high similarity (97.4%), abundant polymorphisms such as SNPs and InDels were confirmed.
In addition, we found chloroplast protein-coding genes that either showed high Ka/Ks values (over 1 or over 0.8) or were highly conserved among the six Lonicera species. Four genes with high Ka/Ks values, psaJ, rbcL, rps18 and ycf2, might have evolved under positive selection in the genus Lonicera. On the other hand, four genes in the LSC, psbI, psbZ, psbL and petG, were highly conserved in the genus Lonicera.
Different 45S nrDNA heterotypes were identified in this study, similar to those reported in Brassica genomes (Kim C-K et al. 2018). Such heterotypes often arise from hybridization or allopolyploidization and can provide information about genome history or relationships (Reeder 1985).
The abundant variations in the chloroplast genomes and 45S nrDNA sequences identified in this study will be valuable for barcoding the six Lonicera species as well as for studying genetic diversity in the family Caprifoliaceae.
3. Repetitive sequences in Lonicera chloroplast genomes
Repeat structures, which originate from DNA strand repair mechanisms, are known to be related to genome recombination and divergence (Haberle et al. 2008). We found abundant tandem repeats in some genes such as accD, ycf2 and ycf1 (Table 4). These repetitive sequences could cause divergence of the chloroplast genome between species.
SSRs consist of motifs of one to six or more nucleotides repeated in tandem in a head-to-tail structure and have been used to analyze genetic diversity (Kelkar et al. 2010). Furthermore, molecular markers derived from SSR polymorphisms are useful for phylogenetic studies, genome mapping and gene tagging because of their highly polymorphic nature (Reddy et al. 2002). In this study, we found a total of 288 SSRs, which varied in number and type among the six Lonicera species (Table 3).
The abundant and variable repeats identified in six chloroplast genomes could be used to develop molecular markers for identifying the six Lonicera species as well as characterizing genetic diversity of Lonicera.
4. Divergence and phylogenetic analysis based on chloroplast genome and 45S nrDNA sequences of the Lonicera species
Some previous studies carried out phylogenetic analyses using intergenic or coding regions of the chloroplast genome and nrDNA, such as the trnL-trnF, rpoB-trnC, petN-psbM, matK and ITS regions (Theis et al. 2008; Jeong et al. 2014). However, a phylogenetic study using the complete chloroplast genome and 45S nrDNA sequences has not been reported. Here, we therefore used whole sequences of the chloroplast genome and 45S nrDNA to study the phylogenetic relationship among the six Lonicera species, and carried out a phylogenetic study with related species belonging to Dipsacales using complete chloroplast genomes.
The topologies based on the chloroplast genomes and on the 45S nrDNA sequences showed a similar pattern among the six Lonicera species. Phylogenetic analysis indicated that L. japonica diverged first from the common ancestor of the six Lonicera species (9.19-10.74 MYA), and the ycf2 and rbcL genes might be related to this divergence event considering their Ka/Ks values (Figure 7A, Figure 7B and Figure 8). Moreover, the two major subgroups, which contain L. insularis and L. praeflorens, respectively, diverged 6.53-7.72 MYA, and the psaJ and rps18 genes might be related to the divergence between the two subgroups (Figure 7C, Figure 7D and Figure 8). In addition, the divergence times that we calculated were consistent with a previous study based on fossil calibrations (Smith 2009).
Based on the phylogenetic analysis with species related to Lonicera, the Lonicera species grouped together and were close to the genera Patrinia and Kolkwitzia, which belong to the family Caprifoliaceae, as expected. Moreover, based on branch lengths, L. japonica (China collection) has a longer branch and is thought to have diverged further from the common ancestor than L. japonica (Korea collection). Furthermore, chloroplast genome comparison between the Korea and China L. japonica collections showed rich intra-species diversity.
5. Development of molecular markers for authentication of six Lonicera species
The chloroplast genome is conserved in sequence and is known to be a reliable and precise target for plant molecular markers because of its multiple copies and high inter-species variation compared with the nuclear genome (Li X et al. 2015). In this study, we developed chloroplast-derived markers based on polymorphic sites and successfully distinguished each species (Figure 10, Table 12). Among the seven molecular markers, two, Lo_i_03 and Lo_i_04, were derived from CNVs of TRs among the six Lonicera species, so that four species, all except L. insularis and L. sachalinensis, could be identified with these two markers (Figure 10A, Figure 10B and Table 12). Four markers, Lo_i_05, Lo_i_06, Lo_i_07 and Lo_i_08, were also based on CNVs of TRs and were specific to L. vesicaria, L. japonica, L. praeflorens and L. maackii, respectively (Figure 10C, Figure 10D, Figure 10E, Figure 10F and Table 12). Furthermore, the dominant marker Lo_do_04 was derived from a SNP region and amplified solely in L. insularis (Figure 10G, Table 12). These markers will be valuable for authenticating the six Lonicera species and will provide useful information for genetic diversity studies.