
MinION : New, Long Read, Portable Nucleic Acid Sequencing Device


Journal of Bacteriology and Virology 2015. Vol. 45, No. 4 p.285 – 303 http://dx.doi.org/10.4167/jbv.2015.45.4.285


Alan C Ward1,2 and Wonyong Kim1*

1Department of Microbiology, Chung Ang University, College of Medicine, Seoul 06974; 2School of Biology, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK

The MinION™ is a miniature nanopore-based analysis device in which the characteristics of an analyte, as it passes through the nanopore, cause changes in the flow of ions through the pore. These changes are measured, as current flow, by a low-noise amplifier and analogue-to-digital converter. Potentially, any molecular analyte capable of passing through the nanopore may modify the flow of ions and generate a signal which might be diagnostic. In practice the current device is focussed on DNA sequencing; direct sequencing of RNA is a likely development. With the MinION Access Program making the MinION™ widely available, a flood of applications exploiting its real-time, long-read capabilities has been published. We review the background to the technology and compare it to current next generation sequencing.

Key Words: MinION, Nanopore, DNA sequencing

I. INTRODUCTION

Schadt and coworkers (1) have categorized three generations of sequencing platforms: first-generation, Sanger sequencing; second-generation, amplification-based massively parallel sequencing; third-generation, single-molecule sequencing.

Third generation nanopore sequencing is on the cusp of further revolutionizing how and why we sequence DNA.

The MinION™ is a potentially revolutionary sequencing platform with significant advantages compared to both current second-generation sequencing platforms and the current third-generation PacBio RS II single molecule, long read sequencer.

II. NEXT GENERATION SEQUENCING

The technology for sequencing DNA is evolving at an unprecedented rate, generating unprecedented amounts of data and driving an enormous amount of research in biology. In this review we look at comparing nanopore sequencing with the current technology, second generation, amplification-based, massively parallel sequencing and the emerging third generation single molecule, long read sequencing, as exemplified by the PacBio RS II sequencing platform.

2.1. Second generation sequencing

Sanger sequencing (2) has been at the heart of molecular biology since the 1970s and evolved over 30 years into a powerful technology (3, 4). Massively parallel amplification-based sequencing has revolutionized biological research (5~7) through its scale, cost reduction and speed of development (3, 4).

Review Article

Received: September 30, 2015/ Revised: October 5, 2015/ Accepted: October 8, 2015

*Corresponding author: Wonyong Kim. Department of Microbiology, Chung Ang University, College of Medicine, 84, Heukseok-ro, Dongjak-gu, Seoul 06974, Korea.

Phone: +82-2-820-5685, Fax: +82-2-822-5685, e-mail: kimwy@cau.ac.kr

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/license/by-nc/3.0/).

However, most second generation sequencers produce short read lengths of hundreds of bases, as synchronization between amplified DNA strands is gradually lost (8). Current second generation sequencers (3, 4) range from benchtop, personal sequencers, such as the Ion Torrent (9) PGM or Illumina (10) MiSeq, to large-scale, floor-standing versions. All are designed for installation in the laboratory and for applications in which the sample is transported to the sequencer.

Sample preparation, fragmentation and amplification of single DNA fragments by surface-bound amplification, bridge PCR (11) or bead-based emulsion PCR (12), requires multiple steps and some expertise. The data collected and the run time are pre-determined at setup, with run times varying from a few hours, for lower data yields and/or read lengths, to days for larger sequencers producing massive data (3, 4). Sample collection and transport, sample preparation, sequence data collection, base calling and data analysis are sequential processes.

Short read lengths, even with increasingly low error rates, have required massive bioinformatics software development and need significant bioinformatics effort in the data analysis step (13), and analytical results only emerge at the end of the pipeline. So, although second generation sequencing has revolutionized many aspects of biology, some applications are still limited.

2.1.1. Genomics

The limitations of short read lengths and errors are overcome by high coverage, but repeat regions which are longer than the read length are still a barrier. Even for prokaryotes, almost all whole genome sequences deposited in GenBank are draft whole genomes with multiple contigs and potential mis-assemblies (14). Prokaryotic whole genomes with multiple ribosomal RNA operons seldom have more than one 16S gene in the draft, and sometimes none.

2.1.2. Transcriptomics

Whole genomes are relatively static; strains can be maintained for relatively long periods without accumulating significant changes. The units of biological significance, like genes, are usually present as 1 or 2 copies per genome.

However, the transcriptome is much more dynamic. It changes with time and the temporal dynamics of differently regulated genes vary. The number of copies of different gene transcripts can vary over a huge range (15~17), but in both Escherichia coli and Saccharomyces cerevisiae many mRNAs are present at less than 1 molecule per cell, often much less. The coverage required to capture the transcriptome at a single point in time, across this range, is much larger than for the genome. But the transcriptome is also dynamic over time, with fast and slow dynamics for different processes, with variation in rates of transcription, post-transcriptional modification and degradation. Patterns of expression vary with physicochemical conditions and growth history, which cells may experience in different experiments or in their natural environment. Determining differential gene expression (with time; in different environments; by different strains) is one goal, but detecting modified transcripts such as alternative splicing (18) or mutant isoforms, or managing the 3' bias found in transcripts as a result of mRNA degradation, needs even higher coverage. Quantitative data needs higher coverage still and careful quality control (19).

2.1.3. Modified bases

Short read RNA-seq for transcriptomics involves fragmentation and cDNA preparation for sequencing. Both these steps lose information. Assembling full length transcripts from short reads is problematic so, for example, in cancer studies, where multiple mutations may be present, it may not be possible to distinguish different isoforms with only a single mutation each from a single isoform with multiple mutations (20). RNA is also subject to heavy post-transcriptional chemical modification of bases, such as methylation, that may modify function and stability (21), information that is lost in conversion to cDNA.

2.1.4. Metagenomics

In metagenomics, the DNA sample comes from a heterogeneous assemblage of microbes, perhaps with 1,000s of species (for example, estimates in soil are of the order of tens of thousands (22)), and sequence data is noisy and partial.

Metagenome size can be estimated as the sum of genome sizes (G) for the number of species present (23):

…… i

Soil is estimated to contain ~1,000 Gbp metagenome DNA per gram (24), compared to 3 Gbp in the Human Genome Project. The number of reads required (23) to sequence the fraction P0 of genomic DNA size G is:

…… ii
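As a rough illustration of the two estimates above, the sketch below sums genome sizes and then applies the Lander-Waterman coverage model, P0 = exp(-N·L/G), where N is the number of reads and L the read length. Both the choice of that model and the reading of P0 as the fraction of bases left unsampled are assumptions made here, not taken from equations i and ii; the function names and the example genome are hypothetical.

```python
import math


def metagenome_size(genome_sizes_bp):
    """Sum of genome sizes (bp) over the species assumed to be present."""
    return sum(genome_sizes_bp)


def coverage_needed(fraction_missed):
    """Mean coverage c such that exp(-c) equals the unsampled fraction.

    Assumption: Lander-Waterman (Poisson) sampling of reads.
    """
    if not 0.0 < fraction_missed < 1.0:
        raise ValueError("fraction_missed must lie strictly between 0 and 1")
    return -math.log(fraction_missed)


def reads_required(genome_size_bp, read_length_bp, fraction_missed):
    """Number of reads N with N * L / G equal to the coverage needed."""
    coverage = coverage_needed(fraction_missed)
    return math.ceil(coverage * genome_size_bp / read_length_bp)


# Hypothetical example: a 5 Mbp genome, 8 kb reads, 1% of bases left unsampled.
example_reads = reads_required(5_000_000, 8_000, 0.01)
```

Under these assumptions the example needs about 4.6-fold coverage. Because abundances vary over orders of magnitude, the same calculation applied to a rare community member has to be scaled by the inverse of its share of the total DNA.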

As described for mRNA abundance in transcriptomics, the abundance of different species in microbial communities also varies over many orders of magnitude, so dominant members may be highly represented while the minority community is only sparsely sampled.

The linkage between the DNA fragments and the cells they come from is lost on cell lysis and DNA fragmentation.

So assembly of short fragments even into short elements, such as genes, operons or pathogenicity islands, is dependent upon DNA sequence alignment, with multiple homologous genes, in different genomes, which are related at taxonomic levels from members of the same species population to species in different phyla.

Even planning metagenomic sequencing projects for complex environments requires an estimate of the species richness to estimate the sequencing effort required. The gold standard for this phylogenetic analysis is 16S sequencing (25), but 16S libraries are seldom large enough for accurate estimation and extrapolating to the asymptote is problematic (22). In addition, short 16S sequence fragments, the variable regions targeted in second generation sequencing, are not as discriminatory as full length 16S (26).

Even for phylogenetic analysis using other house-keeping genes, and phylogenetic analysis by binning reads (27), short reads do not match long reads in finding homologs in the databases, a discrepancy that is only partially reversed by high coverage (28). For example, for a Chesapeake Bay virioplankton metagenome library, among COG hits from 750 bp reads, randomly sampled short read data (~100-200 bp), sampled at a depth of two short reads per long read, missed up to 72% of the hits found using long reads (28). Detecting rare members of a microbiome is challenging, but quantifying the frequency distribution needs higher coverage and is even more challenging. Rare members of a microbiome may not be actively contributing at the point in time the microbiome is sampled but may be a significant part of its dynamic response to changing conditions.

2.1.5. Meta-transcriptomics

Meta-transcriptomics (29) adds all the extra complexities of transcriptomics to those of meta-genomic approaches (30).

III. THIRD GENERATION SEQUENCING

3.1. PacBio RS II

Zero mode waveguides allow single molecule fluorophore detection (31) at biologically relevant concentrations. This has been used with DNA polymerase (32), as well as the ribosome (33), to follow DNA and protein synthesis. This technology has been developed into a single molecule, real time sequencing platform, PacBio (1, 34). The PacBio RS II is an 860 kg sequencing platform (Fig. 1) generating 500 Mb to a Gb of data in about 4 h, with half the reads longer than 20 kb, and 5% longer than 30 kb (35). Individual single-pass reads have high error rates of 11~15%, but the errors are random, mostly indels, so very high consensus accuracy can be achieved with increased coverage.

The SMRT-bell template has hairpin adapters ligated onto both ends of the digested DNA or cDNA molecules. The template can be generated directly from sample DNA, without amplification, and sequenced repeatedly using a strand displacing polymerase (circular consensus sequencing, CCS), increasing accuracy (36). With CCS, a consensus can be obtained from multiple reads of a single molecule, in addition to the coverage gained by sequencing multiple DNA templates.
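The effect of repeated passes or increased coverage on consensus accuracy can be illustrated with a deliberately simple model. It assumes that each read is wrong at a given position independently, with the same probability, and that the consensus is a majority vote. This treats errors as substitution-like and ignores the alignment problems that indels cause, so it is a sketch of the principle rather than a description of the PacBio consensus algorithm; the function name is hypothetical.

```python
from math import comb


def majority_error(per_read_error, passes):
    """Probability that more than half of `passes` reads are wrong at a position.

    Pessimistic toy model: every wrong read is assumed to agree on the same
    wrong base, so a wrong majority always yields a wrong consensus.
    """
    if passes % 2 == 0:
        raise ValueError("use an odd number of passes to avoid ties")
    needed = passes // 2 + 1
    return sum(
        comb(passes, k)
        * per_read_error ** k
        * (1 - per_read_error) ** (passes - k)
        for k in range(needed, passes + 1)
    )


# Single-pass error of 15% (the upper figure quoted above), 1 to 15 passes.
for n in (1, 5, 15):
    print(n, majority_error(0.15, n))
```

In this model the per-position error falls steeply as the number of passes grows, which is the behaviour that random, rather than systematic, errors allow.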


3.2. Genomics

PacBio has been extensively used for finishing draft genomes (37). Ribeiro et al. (2012) (38) used long PacBio reads with 2 Illumina libraries, a short paired-end library and a long "jump" paired-end library, using the Illumina libraries to generate a draft genome and the long PacBio reads to assemble the contigs into a single genome. However, Koren et al. (2012) (39) were able to use a single Illumina short-read library to directly correct errors in PacBio long reads and assemble the corrected long reads to full whole genomes. Later, as PacBio chemistry improved, they were able to show that reads from a single SMRT cell run could be error corrected from the coverage by PacBio reads and assembled to a single contig, for bacterial genomes (40). PacBio long reads can also unravel the complexity in eukaryotic genomes (41), resolving long repeats with high GC regions in the human genome. However, the throughput is lower, and cost higher, for PacBio sequencing than for second generation sequencing.

3.3. Modified bases

Because DNA is sequenced directly, with no amplification, any modified bases are retained in the template. However, PacBio sequencing is sequencing by synthesis, using fluorophore-labeled standard DNA bases, so modified bases are not read directly. Nevertheless, the sequencing is captured in real time on video and the kinetics of incorporation differ for modified bases (42). These kinetic signatures have been used to map methylated adenines in a pathogenic Escherichia coli genome (43) and offer the potential to detect other modified bases.

3.4. Transcriptomics

The long reads potentially allow the sequencing of full length transcripts, at least for polyA-tailed eukaryotic mRNA. As PacBio transcriptome sequencing is by cDNA, modified bases are not identified. Application to pooled human tissue polyA RNA (44) found full length transcripts for lengths up to 1.5 kb; longer transcripts had more missing bases at the 5' end. Many known intron structures were recovered, but >10% were previously unannotated structures.

Most transcriptome studies using PacBio still use hybrid approaches, both for error correction and quantitation, perhaps also using circular consensus sequencing (45). Many of the novel isoforms detected were novel combinations of known splice sites. In the study by Tilgner et al. (2014) (45) ~99,000 annotated exon-exon junctions were detected by reads from a single PacBio SMRT cell and an Illumina run generating 100 million 101-bp paired end reads. Each junction detected by PacBio was detected 40x (median value) by Illumina, and Illumina reads covered ~92,000 annotated junctions not detected by PacBio, while PacBio only recovered 992 not seen in the Illumina data. So the lower throughput and higher cost have an impact on these studies.

The high capital cost, large size and resources needed for PacBio mean that researchers largely buy-in sequencing from service providers.

Figure 1. (A) PacBio RS II (B) MinION™ (Reproduced with permission, Oxford Nanopore).


IV. NANOPORE MINION

The MinION™ (MinION) is a third generation, single molecule, real time DNA sequencer. First announced in 2012, it was released in early 2014 in an early access programme (MinION Access Program, MAP), like a beta software release, for testing and development under real conditions of use. Although not commercially released and under constant development, the technology is widely available through the MinION Access Program, and is being applied across a wide spectrum of applications which can benefit from long read, real time sequencing in a portable platform (46).

The MinION is novel in a number of respects as it is based upon nanopore sequencing (47~49), in which the characteristics of a single stranded DNA molecule, as it is translocated through a molecular pore, are sensed by measuring the perturbation in the flow of ions through the pore, almost exactly as envisioned, conceptually, by Deamer and Church (50) in 1989. Many aspects of the MinION are proprietary but fundamental aspects of the technology are well covered in the literature.

4.1. Nanopore sequencing

4.1.1. Nanopores

The background to the development of the α-hemolysin pore in nanopore sequencing is reflected on by Bayley (2015) (51). The MinION pore is proprietary but is probably a genetically engineered α-hemolysin pore, although Oxford Nanopore are active in looking at other nanopores. Wanunu (2012) (52) reviews the range of nanopores, from biological (α-hemolysin; the porin MspA from Mycobacterium smegmatis (53); and the phi29 viral packaging motor (54)) to solid state pores in silicon nitride and graphene.

A key requirement for the sequencing strategy in the MinION is for a non-gated pore, i.e. the pore is normally open and allows the flow of ions through the pore, like the toxin α-hemolysin, and unlike most protein pores. The full 3D structure of α-hemolysin was determined relatively early (55). Water soluble α-hemolysin monomers self-assemble on a membrane into a heptameric oligomer which penetrates the membrane (56). The membrane-spanning, solvent-accessible channel consists of a 26 Å diameter entrance to a wider vestibule connected to a transmembrane β-barrel which is 26 Å wide and 50 Å long with a 22 Å exit. At the junction of the vestibule and β-barrel is a constriction 14 Å wide. The 14-strand antiparallel β-barrel is formed from monomers joined by salt-links, hydrogen bonds and hydrophobic interactions making an extremely stable molecule (57). The α-hemolysin channel typically carries a current (Fig. 2) of about 30 pA* which is readily measured (51), or 120 pA* at 25℃ and 120 mV transmembrane voltage in 1 M KCl (58). It allows the translocation of single stranded DNA or RNA, but not double stranded DNA (49). Translocation of nucleic acid polymers, with a transmembrane voltage, is rapid, 1~10 μs/base (59).

Figure 2. Current flow through α-hemolysin (Reproduced with permission from Cherf et al., 2012 (66)).

*The actual current depends upon conditions but is easily measurable across a wide range.

Potentially, fast speed is desirable, but at the minimum voltage required to drive DNA through a nanopore ~0.3 bases/μs are translocated and as few as ~100 (mostly hydrated K+) ions are translocated per base (47), which means stochastic noise overwhelms the signal. Available pico-current amplifiers are typically limited to sampling rates less than 250 kHz. Deamer and Akeson (2000) (47) estimate a speed of ~1 base/ms is needed to give a 100~1,000 fold increase in the number of ions between a purine and pyrimidine base. In the MinION the rate is tunable, to some extent, but 30 bases/sec is recommended (= 0.00003 bases/μs). A dedicated ASIC (application-specific integrated circuit) provides a low-noise amplifier and an analogue-to-digital converter on a parallel chip that sits underneath the pores.
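The rates quoted above can be put on a common scale with a little arithmetic. The sketch below converts bases per second to bases per microsecond and asks how many amplifier samples fall on each base at a given sampling rate. The 250 kHz figure is used only because it is the ceiling mentioned for available amplifiers; the actual sampling rate of the MinION ASIC is not stated here, and the function names are hypothetical.

```python
def bases_per_microsecond(bases_per_second):
    """Convert a translocation rate from bases/s to bases/us."""
    return bases_per_second / 1_000_000


def samples_per_base(bases_per_second, sampling_rate_hz):
    """Average number of current samples taken while one base is in the pore."""
    return sampling_rate_hz / bases_per_second


FREE_TRANSLOCATION = 300_000   # ~0.3 bases/us, unassisted, expressed in bases/s
MOTOR_CONTROLLED = 30          # recommended MinION rate in bases/s
AMPLIFIER_LIMIT_HZ = 250_000   # illustrative ceiling, not a MinION specification

free_samples = samples_per_base(FREE_TRANSLOCATION, AMPLIFIER_LIMIT_HZ)
motor_samples = samples_per_base(MOTOR_CONTROLLED, AMPLIFIER_LIMIT_HZ)
```

At the unassisted rate fewer than one sample is available per base, whereas at 30 bases/sec thousands of samples can be averaged for each base, which is why slowing translocation matters for reducing noise.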

4.1.2. Speed control

Slowing down the rate of translocation through protein nanopores by changing physical parameters such as temperature (60) and viscosity (61), using molecular brakes by modifying the DNA with chemical tags (62), binding counter-ions Li+ or Mg2+ (63), or engineering the protein pore (64, 65), does not change the velocity enough. Using protein motors, such as DNA polymerases, to feed the DNA strand through nanopores has been more successful (66), with phi29 DNA polymerase having a number of advantages including high affinity for DNA, high processivity, and stability in the electric field applied across sequencing nanopores (67).

Fueled with dNTPs, phi29 DNA polymerase ratchets the displaced DNA strand through α-hemolysin (66) one base at a time. The applied voltage across the pore stretches the DNA strand in the pore so that the interbase length is increased from 3.4 Å in free solution to 4.2 Å (68), limiting Brownian motion and secondary structure.

However, there needs to be some mechanism to prevent polymerase activity in solution, so that the motor is only switched on after the motor and the DNA are located on the nanopore. This switch has been implemented with a blocking oligomer (69) that is bound to the DNA template and prevents extension and excision by polymerases. Cherf et al. (2012) (66) designed an improved blocking oligomer (Fig. 3) specifically for phi29 DNA polymerase. The blocking oligomer is annealed to the DNA primer/template junction with 25 complementary nucleotides, 2 acridine residues (zz) at the 5' end and with a 3 carbon spacer (s) and 7 abasic residues (xxxxxxx) added at the 3' end. The blocking oligomer was found to facilitate binding of the phi29 DNA polymerase and the complex remained stable in solution. The single-stranded end of the phi29-DNA complex is drawn into the pore by the applied voltage. The vestibule of the α-hemolysin is wide enough for single stranded DNA but too narrow for double stranded DNA, so the force on the template strand, from the transmembrane potential, unzips the blocking oligomer, pulling it into the pore and allowing the polymerase to start synthesis (Fig. 7 steps ii - iii).

4.1.3. Reading the bases

The α-hemolysin vestibule and transmembrane channel are both 50 Å long and each accommodates about 12 bases of single stranded DNA. In an ideal pore a single position, or constriction, would "read" single bases as they moved across the reading head (70). However, α-hemolysin has 3 sites, R1, R2 and R3 (71), interacting with bases at relative base positions 1, 6 and 9. Interactions at these sites complicate the sequence related signal. Site directed mutagenesis can be used to engineer the sites, both to remove sensing sites and to enhance the signal from specific sites (72).

The MspA porin naturally has a single short constriction 10~12 Å in diameter, but MspA also requires engineering (53) to enable non-gated translocation of ions. Phi29 DNA polymerase driven DNA translocation in this mutant MspA porin can achieve single base resolution (73), although the signal is influenced by at least 3~4 nucleotides (74).

However, Stoddart et al. (2010) (75, 76) have argued that two reading sites would provide additional redundant information. A perfect reading head would provide 4 current levels to correspond with ATGC, two reading heads would potentially provide 16 levels, not too many to discriminate.

Native DNA, and even more so RNA, contains modified bases, from epigenetic modifications and DNA damage (39), including at least four common modified cytosines in eukaryotic DNA and three common modified bases in prokaryotes, with only 5-methylcytosine in common.

At least some can be discriminated in the α-hemolysin nanopore (77), but each has the potential to confound the sequence related signal even if it cannot be unequivocally identified. Two reading heads in the nanopore might be more discriminatory, but the number of potential levels to discriminate then escalates significantly. Combined with the local influence of adjacent bases (74) it is clear that improved signal resolution would be advantageous, but the fundamental biophysics (52) sets limits on the signal resolution that can be achieved.

Figure 3. Blocking oligomer for phi29 DNA polymerase motor (Reproduced with permission from Cherf et al., 2012 (66)).

Reading the DNA sequence from this combination of phi29 DNA polymerase as the motor for a protein nanopore is further complicated by the ratchet action of phi29 DNA polymerase (66) which can ratchet forward and in reverse.

This ratchet action can result in two kinds of error: the strand can move back and forward so that a nucleotide is read more than once, inserting one or more nucleotides in the sequence; or the strand can slip past too fast, giving a missed base.

The high throughput of second generation sequencers depends upon both the speed of the individual sequencing reactions, which is relatively slow, depending as it does upon the sequential flow of individual dNTPs, and the massively parallel implementation of those reactions in wells or on plates. Nanopore sequencing is inherently faster than second generation sequencing, but scaling data throughput still depends upon implementation of multiple single nanopores reading in parallel. Assembling multiple nanopores is significantly facilitated by the self-assembly of the α-hemolysin monomers, soluble in solution, into a robust pore on a membrane (Fig. 4B). As each pore may sequence multiple strands sequentially, the overall rate, and the yield, is determined by both the speed of translocation of DNA through the pores and the rate of capture of DNA strands, or the pore occupancy. That rate of capture is limited by rates of diffusion and template DNA concentration.

4.1.4. Bioinformatics

Bioinformatics development has been essential for second generation sequencing, but base calling and the downstream bioinformatics are fundamentally different for third generation nanopore sequencing, and are a work in progress. Essentially the signal generated at each event, as a nucleotide ratchets through the constriction(s) in the nanopore, reads a kmer, a combination of bases influencing the signal, in a sliding window, rather than a base by base call out of the sequence.

The noisy signal has characteristics of duration, mean current level and noise (standard deviation) which can be modeled as hidden Markov models (78). The continuous signal needs to be segmented (79) to identify the events corresponding to a single base transition.
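The segmentation step can be sketched with a toy change-point rule: a new event starts whenever a sample departs from the running mean of the current event by more than a threshold. This is a minimal illustration of the idea only, not the method used by MinKNOW, Metrichor or reference (79); the function name, the threshold and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def segment_events(samples, threshold):
    """Split a current trace into events with start, duration, mean and stdev."""
    events = []
    current = []
    start = 0

    def close_event():
        events.append(
            {
                "start": start,
                "duration": len(current),
                "mean": mean(current),
                "stdev": pstdev(current),
            }
        )

    for index, value in enumerate(samples):
        # A jump away from the running mean marks the boundary of an event.
        if current and abs(value - mean(current)) > threshold:
            close_event()
            current = []
            start = index
        current.append(value)
    if current:
        close_event()
    return events


# Hypothetical trace in pA: three plateaus with a little noise.
trace = [50.0, 51.0, 49.0, 70.0, 71.0, 69.0, 40.0, 41.0]
events = segment_events(trace, threshold=5.0)
```

Each event carries the three characteristics named above (duration, mean level and noise), which are the observations a hidden Markov model would then be asked to explain.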

With the MspA pore, which can read at single base resolution (73, 74), the signal is influenced by approximately 4 nucleotides (74), giving, theoretically, 256 levels to resolve all combinations. Laszlo et al. (2014) (80) mapped the sequence levels detected using phi29 DNA polymerase and the MspA nanopore with synthetic sequences containing all possible combinations of the four bases to a 'quadromer' map. The map was evaluated by sequencing bacteriophage phiX174 and matching predicted current levels versus actual levels with a high correlation (r = 0.9905 at 95% confidence). Independent measurements of the current levels for the same sequence were very reproducible, much more reproducible than the signal for 4-base kmers in different positions in the sequence, indicating that the 4-base kmer is not the only sequence-related influence on the readings.
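The level counts quoted in this section all follow from the same relation: a window of k bases over an alphabet of four gives 4^k possible kmers. A short sketch makes the numbers, and the sliding window itself, explicit. The function names are hypothetical, and the five-letter alphabet in the last line is only an illustration of how one modified base would inflate the count.

```python
def level_count(k, alphabet_size=4):
    """Number of distinct kmers, and so of current levels to resolve in theory."""
    return alphabet_size ** k


def sliding_kmers(sequence, k):
    """Successive overlapping kmers seen as the strand moves one base at a time."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


single_base = level_count(1)        # one ideal reading head
two_heads = level_count(2)          # two reading heads
quadromers = level_count(4)         # MspA, ~4 nucleotides influence the signal
pentamers = level_count(5)          # alpha-hemolysin 5 base kmers
with_one_modified_base = level_count(5, alphabet_size=5)

windows = sliding_kmers("ACGTAC", 4)
```

Consecutive windows share k-1 bases, which is the constraint that model-based base callers exploit.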

The α-hemolysin nanopore does not resolve at single base resolution. A 5 base kmer influences the current levels, with, theoretically, 1024 current levels to resolve, for genetically engineered α-hemolysin. However, Timp et al. (2012) (81), using simulated data, showed with hidden Markov models for 3-mers how knowledge of the previous 3-mer places limits on the next: the first two bases of a 3-mer are defined by the last two bases of the previous 3-mer. They used a hidden Markov model to define the chain of states, which represent the, possibly ambiguous, hidden triplets, and the Viterbi algorithm (82) to identify the best possible path through the states, a strategy which can be implemented for α-hemolysin 5 base kmers. All these results suggest the potential for improvements from better nanopores, protocols, base calling and data analysis.
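The overlap constraint and the Viterbi search can be shown on a toy problem. In the sketch below the hidden states are the 64 possible 3-mers, a transition is allowed only when the last two bases of one 3-mer equal the first two of the next, and each event mean is scored against a Gaussian centred on a made-up level for each 3-mer. The level table, the noise model and the function names are all assumptions for illustration; they are not the model or data of Timp et al. (81).

```python
from itertools import product

BASES = "ACGT"
STATES = ["".join(p) for p in product(BASES, repeat=3)]


def toy_level(kmer):
    """Hypothetical current level: a unique number per 3-mer, not measured data."""
    value = 0
    for base in kmer:
        value = 4 * value + BASES.index(base)
    return float(value)


def log_emission(observed, kmer, sigma=1.0):
    """Gaussian log-likelihood of an event mean, up to an additive constant."""
    deviation = (observed - toy_level(kmer)) / sigma
    return -0.5 * deviation * deviation


def nearest_kmer(observed):
    """Naive call that ignores the neighbouring events."""
    return min(STATES, key=lambda s: abs(observed - toy_level(s)))


def viterbi_3mers(observations, sigma=1.0):
    """Most likely base sequence given one event mean per 3-mer step.

    Allowed transitions are equally likely, so their constant log-probability
    is omitted from the score.
    """
    score = {s: log_emission(observations[0], s, sigma) for s in STATES}
    backpointers = []
    for observed in observations[1:]:
        new_score, pointers = {}, {}
        for state in STATES:
            # Predecessors must end with the first two bases of this state.
            candidates = [base + state[:2] for base in BASES]
            best = max(candidates, key=lambda c: score[c])
            new_score[state] = score[best] + log_emission(observed, state, sigma)
            pointers[state] = best
        score = new_score
        backpointers.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path[0] + "".join(s[-1] for s in path[1:])


# Event means for ACG, CGT, GTT, TTG, TGC, GCA; the second one is noisy.
events = [6.0, 27.6, 47.0, 62.0, 57.0, 36.0]
decoded = viterbi_3mers(events)
naive_second = nearest_kmer(events[1])
```

Here the noisy second event is closest to the level for CTA, which cannot follow ACG, so the naive call is wrong while the path through allowed transitions recovers the intended sequence.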

4.2. MinION™ sequencing

4.2.1. MinION overview

Oxford Nanopore's MinION™ Access Program (MAP) invited participants to apply in late 2013. The MinION itself is an ~100 g USB 3.0 device (Fig. 4) which plugs into a user-supplied, Core i7 powered, Windows 7 laptop with a solid state drive and at least 8 GB RAM. A test program detects whether the hardware is compliant. The software from Oxford Nanopore consists of a MinION control program, MinKNOW, installed on the control computer and a Metrichor agent which communicates with the cloud-based base-calling program. A configuration test cell simulates data transfer and tests communication between the device, the computer and the software.

The nanopore sequencing array, 2048 nanopores in groups of 4, is contained in a disposable flow cell which plugs into the MinION (Fig. 4). Potentially 512 pores may be sequencing, in parallel, at ~30 bases/sec, although in practice the number of active pores can vary. Pores may remain active and sequencing for perhaps 48 h and often, though not always, up to ~0.5 Gb of sequence data may be generated; the record is about 2 Gb. Sequence read lengths depend critically upon the length of fragments presented, but in the standard protocol DNA is fragmented to ~8 kb, with fragments which are 10s of kb in length often sequenced. With fragmentation to higher molecular weights, and special care to avoid fragmentation while handling the samples, sequence lengths of 100 kb and above can be achieved (83).
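The figures in this paragraph imply an upper bound on yield that is easy to work out: pores multiplied by rate multiplied by run time. The sketch below does that arithmetic, assuming every one of the 512 pores is occupied and reading for the whole run, which is an idealization; the function names are hypothetical.

```python
def theoretical_yield_bases(active_pores, bases_per_second, run_hours):
    """Upper bound on bases read if every pore sequences continuously."""
    return active_pores * bases_per_second * run_hours * 3600


def implied_occupancy(observed_bases, active_pores, bases_per_second, run_hours):
    """Fraction of the theoretical pore-time that the observed yield represents."""
    ceiling = theoretical_yield_bases(active_pores, bases_per_second, run_hours)
    return observed_bases / ceiling


ceiling = theoretical_yield_bases(512, 30, 48)
typical = implied_occupancy(0.5e9, 512, 30, 48)
```

On these numbers the ceiling is about 2.65 Gb, so a typical 0.5 Gb run corresponds to pores reading for under a fifth of the available pore-time, and the quoted record of about 2 Gb sits close to the bound.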

The Achilles heel for MinION sequencing is the error rate of base calling. At the start of the MinION Access Program individual sequence reads with only 65% identity to the known sequence were being generated (84). However, accuracy improved rapidly over six months in 2014 (Fig. 5) to ~85% identity (46)*. Errors seem to be primarily random and can be corrected by coverage, either from short read sequence data (85) or directly from overlapping reads from the MinION (86). The addition of a hairpin adapter to double-stranded DNA allows sequencing of both strands, forward and reverse, generating a 2D read and a consensus sequence with improved error rates.

4.2.2. Applications

Initial efforts were to achieve alignment of sequence reads with known sequence (87). In fact a burn-in experiment with a standard lambda DNA sample supplied by Oxford Nanopore is an obligatory step in the initial use of the MinION in the MAP. Progress was rapid (Fig. 5): from failure to even match sequence reads using BLAST against the correct, known sequence (84), to easy identification of virtually all reads (1D and 2D, see later) as lambda phage, as the top hit using BLAST, by us (R7.3 chemistry) and probably by most users.

Figure 4. (A) MinION MkI (B) flow cell (C) nanopore array (Individual nanopore cells reproduced, modified, with permission from Oxford Nanopore).

*A user reported having many 2D reads with 90% identity in September 2015.

4.2.3. Sample preparation

One potential advantage of nanopore sequencing is minimal sample preparation. For the MinION sequencing kit the initial DNA sample needs to be a long, double stranded DNA with A-tails. A master mix of adapters is ligated to the DNA to produce the complex of dsDNA, blocking oligomer, motor protein, primer, hairpin adapter and tether (Fig. 6), which addresses the issues with implementing nanopore sequencing outlined earlier.

However, adapters can ligate to either end, or both ends, of dA tailed dsDNA. A dsDNA with only the blocking oligomer and motor protein adapter (leader adapter), attached to either 1 end or both ends, can bind to a pore and a single strand of the DNA will be read (1D read). A dsDNA with the leader adapter attached at one end and the hairpin adapter at the other can bind to a pore and both strands of the dsDNA can be read, in a single read (2D read), giving a forward and reverse sequence read for a single molecule.

Figure 5. Read accuracy over the start of the MinION Access Program 2014 (Reproduced with permission from Loman and Watson, 2015 (46)).

Figure 6. Example protocol for SQK-MAP006 (September 2015). FFPE - Formalin Fixed Paraffin Embedded DNA repair kit. One step end repair/dA tail NEB Ultra II kit. Magnetic beads - Agencourt AMPure XP beads. MyOne Streptavidin magnetic beads.

With a tether adapter attached, the dsDNA molecule binds to the synthetic membrane that the MinION nanopore is embedded in; this increases the concentration close to the pore and the rate of capture. Other combinations, such as dsDNA with only hairpins attached, are not sequenced.

The reaction conditions to maximize the formation of the fully adapted molecule for sequencing (2D reads) are a compromise, to minimize 1D reads, but they can be improved if the products of adapter ligation are captured with His-Tag (or streptavidin) magnetic beads at the product purification step. The His-Tag beads only capture dsDNA with the hairpin adapter, either fully adapted (leader at one end and hairpin at the other) or hairpin only. The latter are not sequenced by the pores, and so the yield of 2D reads can be increased by improved reaction conditions, modified tether design and capture by the His-Tag beads. Capture by streptavidin magnetic beads, which also captures dsDNA-adapter combinations to increase the yield of 2D reads, has been introduced recently.
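The adapter combinations described above can be summarized as a small decision rule. The sketch below labels each end of a dsDNA fragment as carrying a leader adapter, a hairpin adapter or nothing, and reports the expected read type and whether beads that bind the hairpin would retain the fragment. Treating an unadapted fragment as not sequenced is an assumption made here, since the text does not mention that case, and the function names and labels are hypothetical.

```python
def read_type(end_a, end_b):
    """Expected outcome for a fragment whose ends carry 'leader', 'hairpin' or None."""
    ends = {end_a, end_b}
    if "leader" not in ends:
        # Hairpin-only fragments are not sequenced; unadapted ones are assumed
        # not to be either, as they carry no motor protein.
        return "not sequenced"
    if "hairpin" in ends:
        return "2D"
    return "1D"


def captured_by_hairpin_beads(end_a, end_b):
    """True if beads that bind the hairpin adapter would retain the fragment."""
    return "hairpin" in (end_a, end_b)


# Every combination of end states, with its read type and bead capture.
summary = {
    (a, b): (read_type(a, b), captured_by_hairpin_beads(a, b))
    for a in ("leader", "hairpin", None)
    for b in ("leader", "hairpin", None)
}
```

Filtering on bead capture removes every 1D-only combination, and the hairpin-only fragments that remain do not compete as reads, which is why the capture step raises the proportion of 2D reads.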

Product purification, by magnetic bead, is important, to remove nucleotides, enzymes etc. which might interfere, and to transfer to a compatible buffer for sequencing. The eluate from the His-Tag beads is the presequencing mix. Following priming of the sequencing flow cell with running buffer and fuel mix the presequencing mix is added to running buffer and fuel mix to generate the sequencing library and flowed across the sequencing wells by pipette.

However, careful preparation of the dA-tailed fragments is necessary. A starting sample of about 1 μg of pure, high molecular weight DNA is needed. This DNA will probably be of variable fragment size, have ragged ends and some damage (88). So fragmentation to a known fragment size, optionally a preCR repair step (88), end repair and dA tailing (Fig. 6) are necessary preliminary steps. Currently, to ensure each step proceeds optimally, the DNA products from each step are purified by magnetic bead purification. The full protocol therefore requires, in addition to DNA purification from the sample, about 9 steps and 5 kits, as well as the Nanopore sequencing kit (Fig. 6).

Experienced sequencers are using Agilent Bioanalysers and Qubit fluorimeters for fragment analysis and quantitation of DNA at each stage, to ensure >100 ng of DNA in the final library. The whole library preparation, from high molecular weight DNA to library, can be completed in an afternoon and prepares enough presequencing mix for 2 × 150 μl of library, to load the flow cell and then, for longer runs, top up the sample. Good runs can continue sequencing for about 48 h but probably benefit from multiple top ups, perhaps as often as every 4 h, although the standard protocol has a top up at 24 h. On the other hand, runs can be shorter if less data is required, e.g. a 6 h run is used in the lambda burn-in experiment, for a 48 kb template. The flow cell can be washed, with wash buffer, and used to analyze another sample, although currently the total sequencing run time and yield seem to be reduced when wash steps are introduced.

Standard MinION protocols, including fragmentation steps, readily lead to sequencing of fragments of 8 kb, and libraries prepared to this fragment size will usually contain some larger, full 2D sequences of 10, 20, 30 kb or more. Urban et al. (2015) (83) showed that read lengths in excess of 100 kb could be achieved and that, with careful library preparation, both ultralong reads (>50 kb) and reasonable sequence yields could be obtained. The full 2D sequencing process, with the role of each of the adapters and the data generated, is well represented (Fig. 7) in Jain et al. (2015) (89).

4.2.4. Software

The MinKNOW software runs Python scripts which control the sequencing parameters during a run. As well as using the standard scripts, users can develop their own. The data on pore occupancy and sequencing can be displayed in real time in the MinKNOW window, though not the sequence itself: MinKNOW collects data on ion currents and saves it to file.

The files are in FAST5 format, which is based on HDF5 (90), a hierarchical file format widely used for large data sets in the physical sciences (for example, NASA's Earth Observing System), but not generally familiar to biologists. In normal use these data files are saved to a folder designated as the upload folder for Metrichor; Metrichor scans for completed files, uploads them to the cloud, base calls them and writes out the data as a new FAST5 format file, which is downloaded to the designated download folder.

Recently replaced by using the DNA repair kit for formalin-fixed paraffin embedded samples.

Figure 7. Molecular events and ionic-current trace for a 2D read of an M13 phage dsDNA molecule. (A) Steps in DNA translocation through the nanopore: (i) open channel; (ii) dsDNA with lead adaptor (blue), bound molecular motor (orange) and hairpin adaptor (red) is captured by the nanopore; capture is followed by translocation of the (iii) lead adaptor, (iv) template strand (gold), (v) hairpin adaptor, (vi) complement strand (dark blue) and (vii) trailing adaptor (brown); and (viii) status returns to open channel. (B) Raw current trace for the passage of the M13 dsDNA construct through the nanopore. Regions of the trace corresponding to steps i-viii are labeled. (C) Expanded time and current scale for raw current traces corresponding to steps i-viii. Each adaptor generates a unique current signal used to aid base calling. (Reproduced with Permission, from Jain et al., 2015 (89))

HDFView, downloaded from http://www.hdfgroup.org/products/java/, displays the structure of the FAST5 files, and a right click on an item gives the option to export to a text file. However, to extract the fastq or fasta data or to visualize other data such as the events, as a squiggle plot (Fig. 8), there are two software packages developed by users: Poretools (91); and the poRe library (92) for R (93).
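Once reads have been exported as fastq, a common first check is the read length distribution. The following is a minimal sketch in plain Python, assuming simple four-line FASTQ records; the function names and the toy records are illustrative only and are not part of Poretools or poRe.

```python
from io import StringIO

def read_lengths(handle):
    """Yield the sequence length of each record in a four-line FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the bases
            yield len(line.strip())

def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0

# Toy records standing in for exported 2D reads (hypothetical data).
toy_fastq = StringIO(
    "@read1\nACGTACGTAC\n+\n!!!!!!!!!!\n"
    "@read2\nACGT\n+\n!!!!\n"
    "@read3\nACGTAC\n+\n!!!!!!\n"
)
lengths = list(read_lengths(toy_fastq))
print(len(lengths), sum(lengths) / len(lengths), n50(lengths))
```

For real data the StringIO object would be replaced by an open file handle on the exported fastq file.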

The error profile of MinION reads means that many of the tools for next generation sequence data do not work. Initially, aligning MinION reads against the reference sequence using BLAST was not successful, and the LAST aligner (94), which uses adaptive sized seeds for alignment rather than the fixed sized 11-mers used by BLAST, was much more successful in aligning MinION reads to reference sequences. However, with improving data, in our hands BLAST alignment is also usually successful. Aligning against a reference allows correction of the error prone long reads with coverage (Fig. 9).

Quick et al. (2014) (87) published data from a single run for complete coverage of the Escherichia coli genome and aligned the reads with LAST (94).
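One way to see why fixed-length seeds struggle with error-prone reads is to ask how often an 11-base window survives without an error. The sketch below assumes, as a simplification, that errors are independent and evenly spread along the read, which is not strictly true of nanopore data; the function names and the 8 kb read length are illustrative.

```python
def exact_seed_probability(identity, seed_length=11):
    """Probability that a window of seed_length bases contains no errors,
    assuming independent per-base errors (a simplifying assumption)."""
    return identity ** seed_length

def expected_exact_seeds(identity, read_length, seed_length=11):
    """Expected number of error-free windows of seed_length in a single read."""
    windows = max(read_length - seed_length + 1, 0)
    return windows * exact_seed_probability(identity, seed_length)

for identity in (0.65, 0.75, 0.85):
    p = exact_seed_probability(identity)
    seeds = expected_exact_seeds(identity, 8000)
    print(f"identity {identity:.2f}: P(exact 11-mer) = {p:.4f}, "
          f"expected exact seeds in an 8 kb read = {seeds:.0f}")
```

Under these assumptions fewer than 1% of 11-mers are error-free at 65% identity, against about 17% at 85% identity, which is consistent with fixed-seed alignment becoming workable as read accuracy improved.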

Sovic et al. (2015) (95) have developed GraphMap for read mapping of long, error-prone data. GraphMap outperforms mappers such as LAST in aligning MinION reads, both to find overlaps and to map against a reference. Warren et al. (2015) (96) provide alignment-free software (LINKS) for scaffolding with long reads. Many published applications were successfully applied to data from R7 and early R7.3 chemistry, with sequence identities of less than 65% for 1D reads and 70~77% for 2D reads, while current data achieves ~85% for 2D reads and >70% for many 1D reads. Koren and Phillippy (2015) (97) review software available for long read sequence assembly.

Figure 8. Squiggle plot from 15 seconds of a 2D read of lambda phage DNA.

Figure 9. Overlap coverage of a set of 2D MinION reads from lambda. Mapped in Geneious v. 7.3 (Biomatters Ltd).

4.2.5. Scaffolding multi-contig whole genomes

Karlsson et al. (2015) (98) successfully used long MinION reads to scaffold the multi-contig draft genomes, assembled from Illumina data, for two strains of Francisella, and compared the result with that obtained using long reads from PacBio. They used BLAST and the SSPACE-LongRead perl script (99), which uses BLASR (100), developed for error-prone PacBio reads, to perform alignments, to scaffold contigs with the PacBio and MinION data.

Similarly Risse et al. (2015) (101) used a hybrid assembly of Illumina MiSeq (898,420 MiSeq 2 × 250 bp reads) and long MinION reads (7300 2D reads average length 6.6 kb) to assemble a complete, high quality, contiguous, ~5.2 Mb genome for a previously unsequenced Bacteroides fragilis strain, strain BE1, for an estimated sequencing cost of £276.
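The depth of coverage implied by these figures follows from coverage = (number of reads × mean read length) / genome size. A small sketch of that arithmetic, using the read counts quoted above; treating the 898,420 MiSeq reads as individual 250 bp reads rather than read pairs is an assumption, and the function name is illustrative.

```python
def mean_coverage(n_reads, mean_read_length_bp, genome_size_bp):
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * mean_read_length_bp / genome_size_bp

GENOME_BP = 5_200_000  # ~5.2 Mb Bacteroides fragilis BE1 genome

minion = mean_coverage(7300, 6600, GENOME_BP)   # 2D reads, average length 6.6 kb
miseq = mean_coverage(898_420, 250, GENOME_BP)  # assumes single 250 bp reads, not pairs

print(f"MinION 2D coverage: ~{minion:.1f}x")
print(f"MiSeq coverage:     ~{miseq:.1f}x")
```

On these assumptions the hybrid assembly combined roughly 9× of long reads with roughly 43× of short reads; the short read figure would double if the quoted count refers to read pairs.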

Goodwin et al. (2015) (102) also used a hybrid approach to sequence the Saccharomyces cerevisiae genome. In this case the Illumina MiSeq reads were used to correct the MinION reads and then the genome was assembled from the error corrected MinION reads.

4.2.6. Resolving elements with repetition

Alignment of MinION long reads to multi-contig genome assemblies produced from short read sequence data has been used to identify the position of elements such as antibiotic resistance islands (103), which carry repetitive elements (transposons) that confound short reads. Ashton et al. (2015) (103) were able to identify the insertion site of a multi-resistance island in Salmonella Typhi H58 "which, despite many whole genome sequencing projects, has not been previously characterized".

Similarly, the diploid nature of eukaryotes introduces an element of repetition which means that short read sequencing cannot resolve haplotypes, which can be important in pharmacogenetics, in determining patient response to drugs (104). The long read capability of the MinION was used to resolve variants and haplotypes of HLA-A, HLA-B and CYP2D6, genes important in determining patient response to many drugs (104). Using GraphMap, Sovic et al. (2015) (95) obtained improved resolution on re-analyzing the data set.

4.2.7. De novo assembly

The errors in even error-prone long reads can be overcome with coverage, provided the errors are not systematic, potentially enabling de novo assembly from nanopore sequence data alone (105). Loman et al. (2015) (86) assembled the Escherichia coli K-12 genome, from MinION data alone, by identifying overlaps between reads and performing multiple alignment for error correction. The error corrected reads were then assembled, and the assembly was 'polished' using the signal-level data to reach 99.5% nucleotide identity.
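The effect of coverage on random (non-systematic) errors can be illustrated with a simple majority-vote model. The sketch below assumes that each read is wrong at a given position independently with probability e and, pessimistically, that all wrong reads agree with each other. It is an illustration only and not the overlap, multiple alignment and signal-level polishing pipeline used by Loman et al. (86); the function name and the 15% error rate are illustrative.

```python
from math import comb

def consensus_error(per_read_error, coverage):
    """Probability that a strict majority of reads is wrong at one position,
    assuming independent errors that all agree (a pessimistic simplification)."""
    threshold = coverage // 2 + 1  # smallest number of reads forming a strict majority
    return sum(
        comb(coverage, k)
        * per_read_error ** k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )

for depth in (1, 5, 15, 29):
    print(f"coverage {depth:>2}x: consensus error ~ {consensus_error(0.15, depth):.2e}")
```

In this model the consensus error falls steeply as coverage rises. A systematic error, where most reads make the same mistake at the same position, breaks the independence assumption and would not be removed by adding coverage.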

4.2.8. RNA

Currently the MinION sequences RNA from its cDNA copy, although direct sequencing of RNA has always been a likely application (59, 106, 107). RNA is heavily modified by methylation and many other epigenetic tags, so the current Oxford Nanopore in-house RNA-sequencing protocol reads an mRNA with an attached cDNA. The read from the cDNA gives the accurate sequence while signal-level data from the RNA indicates potentially modified bases.

However, the long read capability is still useful for RNA sequenced from cDNA. In eukaryotes many mRNAs consist of multiple isoforms formed from the joining of alternative exons. Short-read sequencing is limited in its ability to directly measure exon connectivity in mRNAs containing multiple variations. Bolisetty et al. (2015) (108) used the MinION to sequence 7,899 'full-length' isoforms expressed from four Drosophila genes, Dscam1, MRP, Mhc, and Rdl, to directly determine the isoforms present.

Similarly, in prokaryotic metagenomics short read sequencing of 16S has proved a powerful tool, but is limited to sequencing selected variable regions of the 16S rRNA gene. Many of the assignments in these studies are to family or genus level (22), and Elshahed et al. (2008) (26) have shown how much more powerful full length 16S rRNA is in these studies. Benitez-Paez et al. (2015) (109) showed species level resolution of almost full length 16S rRNA gene amplicons sequenced through the MinION from genomic DNA from a mock community (HM-782D, BEI Resources, http://www.beiresources.org) of 20 bacterial strains with equimolar ribosomal RNA operon counts. The numbers of reads for each species were equivalent, in line with their equimolar concentrations, giving confidence that quantitative data could also be obtained.

Some of the issues with metagenomics, transcriptomics and metatranscriptomics revolve around the sheer sample size. Single molecule sequencing and long read lengths do not resolve this issue, even if, potentially, with higher accuracy, they might reduce the coverage needed. But these technologies, whether PacBio or MinION, do not currently produce the data volumes to match the sample sizes, e.g. the estimated ~1,000 Gb soil metagenome (24), nor to match the data produced from second generation sequencers (4, 5).

4.2.9. Point of care

The portability, speed and relatively low cost for individual samples of the MinION make its potential for point of care and individualized medicine clear. For example, in pharmacogenetics applications Ammar et al. (2015) (104) were able to resolve variants and haplotypes of HLA-A, HLA-B and CYP2D6, genes important in determining patient drug response.

A number of studies have examined real-time applications, including a demonstration of rapid, real-time strain typing and antibiotic resistance determination (110). Similarly, MinION sequencing in a Salmonella outbreak was able to assign the sample to species level within 20 min, to a serotype in 40 min and to the outbreak in less than 2 h (111). For viruses, Wang et al. (2015) (112) demonstrated MinION sequencing of an influenza RNA genome and Greninger et al. (2015) (113) performed rapid metagenomic identification of viruses present in blood samples from patients with acute hemorrhagic fever (Ebola virus), an asymptomatic blood donor (chikungunya virus) and a low titre sample with hepatitis C. The high titre Ebola and chikungunya viruses were detected within 4~10 min of acquiring data, while the lower titre hepatitis C was detected within 40 min, and identification to the correct viral strain was achieved in all cases. Earlier studies, with lower accuracy flow cells and protocols, demonstrated the ability to identify and differentiate clinically relevant bacteria and viruses (114).

V. CONCLUSION

Announcement in 2012 of the intention by Oxford Nanopore to release a small, long read nanopore sequencer (MinION) (https://nanoporetech.com/news-events/press-releases/view/39, accessed Aug 2015) generated huge anticipation (3) and a long wait. The practical application of this simple idea (50) has had to overcome tremendous technological challenges. Despite some disappointment at error rates (84), this long read technology has been rapidly used across a wide range of applications and has demonstrated much of its potential. Many of these published results are based upon earlier incarnations of the manufacturing technology and protocols. During the MinION Access Program there have been significant advances from which many of these applications will benefit. It is also clear that there is significant room for improvement in both base calling and bioinformatics data analysis so that, even without the expected improvements in both the hardware and experimental protocols, the technology is likely to improve.

REFERENCES

1) Schadt EE, Turner S, Kasarskis A. A window into third-generation sequencing. Human Mol Genet 2010;19:R227-40.

2) Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 1977;74:5463-7.

3) Glenn TC. Field Guide to Next Generation DNA Sequencers. Mol Ecol Resour 2011;11:759-69.

4) Glenn TC. http://www.molecularecologist.com/next-gen-fieldguide-2014/ (accessed July 2015), 2014.

5) Koboldt DC, Steinberg KM, Larson DE, Wilson RK, Mardis ER. The Next-Generation Sequencing Revolution and Its Impact on Genomics. Cell 2013;155:27-38.

6) Morey M, Fernández-Marmiesse A, Castiñeiras D, Fraga JM, Couce ML, Cocho JA. A glimpse into past, present, and future DNA sequencing. Mol Genet Metab 2013;110:3-24.

7) Smith MI, Turpin W, Tyler AD, Silverberg MS, Croitoru K. Microbiome analysis - from technical advances to biological relevance. F1000Prime Rep 2014;6:51.

8) Erlich Y, Mitra PP, de la Bastide M, McCombie WR, Hannon GJ. Alta-Cyclic: a self-optimizing base caller for next-generation sequencing. Nat Methods 2008;5:679-82.

9) Rusk N. Torrents of sequence. Nature Methods 2011;8:44.

10) Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008;456:53-9.

11) Kawashima EH, Laurent F, Pascal M. Patent (2005-05-12): Method of nucleic acid amplification. http://www.patentlens.net/patentlens/patent/WO_1998_044151_A1/en/ (accessed 22 July 2015), 2005.

12) Williams R, Peisajovich SG, Miller OJ, Magdassi S, Tawfik DS, Griffiths AD. Amplification of complex gene libraries by emulsion PCR. Nat Methods 2006;3:545-50.

13) Mardis ER. The $1,000 genome, the $100,000 analysis? Genome Med 2010;2:84.

14) Salzberg SL, Yorke JA. Beware of mis-assembled genomes. Bioinformatics 2005;21:4320-1.

15) Taniguchi Y, Choi PJ, Li GW, Chen H, Babu M, Hearn J, et al. Quantifying E. coli Proteome and Transcriptome with Single-Molecule Sensitivity in Single Cells. Science 2010;329:533-8.

16) Marguerat S, Schmidt A, Codlin S, Chen W, Aebersold R, Bähler J. Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells. Cell 2012;151:671-83.

17) Miura F, Kawaguchi N, Yoshida M, Uematsu C, Kito K, Sakaki Y, et al. Absolute quantification of the budding yeast transcriptome by means of competitive PCR between genomic and complementary DNAs. BMC Genomics 2008;9:574.

18) Liu Y, Ferguson JF, Xue C, Silverman IM, Gregory B, Reilly MP. Evaluating the Impact of Sequencing Depth on Transcriptome Profiling in Human Adipose. PLoS One 2013;8:e66883.

19) SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol 2014;32:903-14.

20) Pretto DI, Eid JS, Yrigollen CM, Tang HT, Loomis EW, Raske C, et al. Differential increases of specific FMR1 mRNA isoforms in premutation carriers. J Med Genet 2015;52:42-52.

21) Jaffrey SR. An expanding universe of mRNA modifications. Nat Struct Mol Biol 2014;21:945-6.

22) Youssef N, Sheik CS, Krumholz LR, Najar FZ, Roe BA, Elshahed MS. Comparison of species richness estimates obtained using nearly complete fragments and simulated pyrosequencing-generated fragments in 16S rRNA gene-based environmental surveys. Appl Environ Microbiol 2009;75:5227-36.

23) Wooley JC, Godzik A, Friedberg I. A Primer on Metagenomics. PLoS Comput Biol 2010;6:e1000667.

24) Vogel TM, Simonet P, Jansson JK, Hirsch PR, Tiedje JM, van Elsas JD, et al. TerraGenome: a consortium for the sequencing of a soil metagenome. Nat Rev Microbiol 2009;7:252.

25) DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 2006;72:5069-72.

26) Elshahed MS, Youssef NH, Spain AM, Sheik C, Najar FZ, Sukharnikov LO, et al. Novelty and uniqueness patterns of rare members of the soil biosphere. Appl Environ Microbiol 2008;74:5422-8.

27) Huson DH, Auch AF, Qi J, Schuster SC. MEGAN analysis of metagenomic data. Genome Res 2007;17:377-86.

28) Wommack KE, Bhavsar J, Ravel J. Metagenomics: Read Length Matters. Appl Environ Microbiol 2008;74:453-63.

29) Moran AM. Metatranscriptomics: eavesdropping on complex microbial communities. Microbe 2009;4:329-35.

30) Carvalhais LC, Dennis PG, Tyson GW, Schenk PM. Application of metatranscriptomics to soil environments. J Microbiol Methods 2012;91:246-51.

31) Levene MJ, Korlach J, Turner SW, Foquet M, Craighead HG, Webb WW. Zero-mode waveguides for single-molecule analysis at high concentrations. Science 2003;299:682-6.

32) Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, et al. Real-Time DNA sequencing from single polymerase molecules. Science 2009;323:133-8.

33) Uemura S, Aitken CE, Korlach J, Flusberg BA, Turner SW, Puglisi JD. Real-time tRNA transit on single translating ribosomes at codon resolution. Nature 2010;464:1012-7.

34) McCarthy A. Third Generation DNA Sequencing: Pacific Biosciences' single molecule real time technology. Chem Biol 2010;17:675-6.

35) Reuter JA, Spacek DV, Snyder MP. High-Throughput Sequencing Technologies. Mol Cell 2015;58:586-97.

36) Travers KJ, Chin CS, Rank DR, Eid JS, Turner SW. A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res 2010;38:e159.

37) English AC, Richards S, Han Y, Wang M, Vee V, Qu J, et al. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One 2012;7:e47768.

38) Ribeiro FJ, Przybylski D, Yin S, Sharpe T, Gnerre S, Abouelleil A, et al. Finished bacterial genomes from shotgun sequence data. Genome Res 2012;22:2270-7.

39) Koren S, Schatz MC, Walenz BP, Martin J, Howard JT, Ganapathy G, et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol 2012;30:693-700.

40) Koren S, Harhay GP, Smith TP, Bono JL, Harhay DM, Mcvey SD, et al. Reducing assembly complexity of microbial genomes with single-molecule sequencing. Genome Biol 2013;14:R101.

41) Chaisson MJ, Huddleston J, Dennis MY, Sudmant PH, Malig M, Hormozdiari F, et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature 2015;517:608-11.

42) Flusberg BA, Webster DR, Lee JH, Travers KJ, Olivares EC, Clark TA, et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods 2010;7:461-5.

43) Fang G, Munera D, Friedman DI, Mandlik A, Chao MC, Banerjee O, et al. Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing. Nat Biotechnol 2012;30:1232-9.

44) Sharon D, Tilgner H, Grubert F, Snyder M. A single-molecule long-read survey of the human transcriptome. Nat Biotechnol 2013;31:1009-14.

45) Tilgner H, Grubert F, Sharon D, Snyder MP. Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc Natl Acad Sci U S A 2014;111:9869-74.

46) Loman NJ, Watson M. Successful test launch for nanopore sequencing. Nat Methods 2015;12:303-4.

47) Deamer DW, Akeson M. Nanopores and nucleic acids: prospects for ultrarapid sequencing. Trends Biotechnol 2000;18:147-51.

48) Deamer DW, Branton D. Characterization of nucleic acids by nanopore analysis. Acc Chem Res 2002;35:817-25.

49) Kasianowicz JJ, Brandin E, Branton D, Deamer DW. Characterization of individual polynucleotide molecules using a membrane channel. Proc Natl Acad Sci U S A 1996;93:13770-3.

50) Pennisi E. Genome sequencing. Search for pore-fection. Science 2012;336:534-7.

51) Bayley H. Nanopore Sequencing: From imagination to reality. Clin Chem 2015;61:25-31.

52) Wanunu M. Nanopores: A journey towards DNA sequencing. Phys Life Rev 2012;9:125-58.

53) Butler TZ, Pavlenok M, Derrington IM, Niederweis M, Gundlach JH. Single-molecule DNA detection with an engineered MspA protein nanopore. Proc Natl Acad Sci U S A 2008;105:20647-52.

54) Wendell D, Jing P, Geng J, Subramaniam V, Lee TJ, Montemagno C, et al. Translocation of double-stranded DNA through membrane-adapted phi29 motor protein nanopores. Nat Nanotechnol 2009;4:765-72.

55) Song L, Hobaugh MR, Shustak C, Cheley S, Bayley H, Gouaux JE. Structure of staphylococcal α-hemolysin, a heptameric transmembrane pore. Science 1996;274:1859-66.

56) Füssle R, Bhakdi S, Sziegoleit A, Tranum-Jensen J, Kranz T, Wellensiek HJ. On the mechanism of membrane damage by Staphylococcus aureus α-toxin. J Cell Biol 1981;91:83-94.

57) Gouaux E. α-Hemolysin from Staphylococcus aureus: an archetype of β-barrel, channel-forming toxins. J Struct Biol 1998;21:110-22.

58) Meller A, Branton D. Single molecule measurements of DNA transport through a nanopore. Electrophoresis 2002;23:2583-91.

59) Akeson M, Branton D, Kasianowicz JJ, Brandin E, Deamer DW. Microsecond time-scale discrimination among polycytidylic acid, polyadenylic acid and polyuridylic acid as homopolymers or as segments within single RNA molecules. Biophys J 1999;77:3227-33.

60) Meller A, Nivon L, Brandin E, Golovchenko J, Branton D. Rapid nanopore discrimination between single polynucleotide molecules. Proc Natl Acad Sci U S A 2000;97:1079-84.

61) Kawano R, Schibel AE, Cauley C, White HS. Controlling the translocation of single-stranded DNA through alpha-hemolysin ion channels using viscosity. Langmuir 2009;25:1233-7.

62) Mitchell N, Howorka S. Chemical tags facilitate the sensing of individual DNA strands with nanopores. Angew Chem Int Ed Engl 2008;47:5565-8.

63) Zhang Y, Liu L, Sha J, Ni Z, Yi H, Chen Y. Nanopore detection of DNA molecules in magnesium chloride solutions. Nanoscale Res Lett 2013;8:245.

64) Maglia G, Restrepo MR, Mikhailova E, Bayley H. Enhanced translocation of single DNA molecules through alpha-hemolysin nanopores by manipulation of internal charge. Proc Natl Acad Sci U S A 2008;105:19720-5.

65) Rincon-Restrepo M, Mikhailova E, Bayley H, Maglia G. Controlled translocation of individual DNA molecules through protein nanopores with engineered molecular brakes. Nano Lett 2011;11:746-50.

66) Cherf GM, Lieberman KR, Rashid H, Lam CE, Karplus K, Akeson M. Automated Forward and Reverse Ratcheting of DNA in a Nanopore at Five Angstrom Precision. Nat Biotechnol 2012;30:344-8.

67) Lieberman KR, Cherf GM, Doody MJ, Olasagasti F, Kolodji Y, Akeson M. Processive replication of single DNA molecules in a nanopore catalyzed by phi29 DNA polymerase. J Am Chem Soc 2010;132:17961-72.

68) Stoddart D, Franceschini L, Heron A, Bayley H, Maglia G. DNA stretching and optimization of nucleobase recognition in enzymatic nanopore sequencing. Nanotechnology 2015;26:084002.

69) Olasagasti F, Lieberman KR, Benner S, Cherf GM, Dahl JM, Deamer DW, et al. Replication of individual DNA molecules under electronic control using a protein nanopore. Nat Nanotechnol 2010;5:798-806.

70) Branton D, Deamer DW, Marziali A, Bayley H, Benner SA, Butler T, et al. The potential and challenges of nanopore sequencing. Nat Biotechnol 2008;26:1146-53.

71) Stoddart D, Heron AJ, Mikhailova E, Maglia G, Bayley H. Single-nucleotide discrimination in immobilized DNA oligonucleotides with a biological nanopore. Proc Natl Acad Sci U S A 2009;106:7702-7.

72) Ervin EN, Barrall GA, Pal P, Bean MK, Schibel AE, Hibbs AD. Creating a single sensing zone within an alpha-hemolysin pore via site-directed mutagenesis. Bionanoscience 2014;4:78-84.

73) Manrao EA, Derrington IM, Laszlo AH, Langford KW, Hopper MK, Gillgren N, et al. Reading DNA at single-nucleotide resolution with a mutant MspA nanopore and phi29 DNA polymerase. Nat Biotechnol 2012;30:349-53.

74) Manrao EA, Derrington IM, Pavlenok M, Niederweis M, Gundlach JH. Nucleotide discrimination with DNA immobilized in the MspA nanopore. PLoS One 2011;6:e25723.

75) Stoddart D, Heron AJ, Klingelhoefer J, Mikhailova E, Maglia G, Bayley H. Nucleobase recognition in ssDNA at the central constriction of the α-hemolysin pore. Nano Lett 2010;10:3633-7.

76) Stoddart D, Maglia G, Mikhailova E, Heron AJ, Bayley H. Multiple base-recognition sites in a biological nanopore: two heads are better than one. Angew Chem Int Ed Engl 2010;49:556-9.

77) Wallace EV, Stoddart D, Heron AJ, Mikhailova E, Maglia G, Donohoe TJ, et al. Identification of epigenetic DNA modifications with a protein nanopore. Chem Commun 2010;46:8195-7.

78) Schreiber J, Karplus K. Segmentation of noisy signals generated by a nanopore. bioRxiv 2015. doi: http://dx.doi.org/10.1101/014258.

79) Schreiber J, Karplus K. Analysis of nanopore data using Hidden Markov Models. Bioinformatics 2015;31:1897-903.

80) Laszlo AH, Derrington IM, Ross BC, Brinkerhoff H, Adey A, Nova IC, et al. Decoding long nanopore sequencing reads of natural DNA. Nat Biotechnol 2014;32:829-33.

81) Timp W, Comer J, Aksimentiev A. DNA base-calling from a nanopore using a Viterbi algorithm. Biophys J 2012;102:L37-9.

82) Viterbi AJ. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. Inf Theory IEEE Trans 1967;3:260-9.

83) Urban JM, Bliss J, Lawrence CE, Gerbi SA. Sequencing ultra-long DNA molecules with the Oxford Nanopore MinION. bioRxiv 2015. doi: http://dx.doi.org/10.1101/019281.

84) Mikheyev AS, Tin MM. A first look at the Oxford Nanopore MinION sequencer. Mol Ecol Resour 2014;14:1097-102.

85) Madoui MA, Engelen S, Cruaud C, Belser C, Bertrand L, Alberti A, et al. Genome assembly using nanopore-guided long and error-free DNA reads. BMC Genomics 2015;16:327.

86) Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods 2015;12:733-5.

87) Quick J, Quinlan AR, Loman NJ. A reference bacterial genome dataset generated on the MinION(TM) portable single-molecule nanopore sequencer. Gigascience 2014;3:22.

88) Evans TC. DNA Damage - the major cause of missing pieces from the DNA puzzle. NEB Expressions 2007;2:1.

89) Jain M, Fiddes IT, Miga KH, Olsen HE, Paten B, Akeson M. Improved data analysis for the MinION nanopore sequencer. Nat Methods 2015;12:351-6.

90) The HDF Group. Hierarchical Data Format, version 5, 1997-2015. http://www.hdfgroup.org/HDF5/ (accessed Aug 2015), 2015.

91) Loman NJ, Quinlan AR. Poretools: a toolkit for analyzing nanopore sequence data. Bioinformatics 2014;30:3399-401.

92) Watson M, Thomson M, Risse J, Talbot R, Santoyo-Lopez J, Gharbi K, et al. poRe: an R package for the visualization and analysis of nanopore sequencing data. Bioinformatics 2015;31:114-5.

93) R Core Team. R: A Language and Environment for Statistical Computing. https://www.R-project.org (accessed Aug 2015), 2015.

94) Kielbasa SM, Wan R, Sato K, Horton P, Frith MC. Adaptive seeds tame genomic sequence comparison. Genome Res 2011;21:487-93.

95) Sovic I, Sikic M, Wilm A, Fenlon SN, Chen S, Nagarajan N. Fast and sensitive mapping of error-prone nanopore sequencing reads with GraphMap. bioRxiv 2015. doi: http://dx.doi.org/10.1101/020719.

96) Warren RL, Yang C, Vandervalk BP, Behsaz B, Lagman A, Jones SJ, et al. LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads. GigaScience 2015;4:35.

97) Koren S, Phillippy AM. One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Curr Opin Microbiol 2015;23:110-20.

98) Karlsson E, Lärkeryd A, Sjödin A, Forsman M, Stenberg P. Scaffolding of a bacterial genome using MinION nanopore sequencing. Sci Rep 2015;5:11996.

99) Boetzer M, Pirovano W. SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information. BMC Bioinformatics 2014;15:211.

100) Chaisson MJ, Tesler G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics 2012;13:238.

101) Risse J, Thomson M, Blakely G, Koutsovoulos G, Blaxter M, Watson M. A single chromosome assembly of Bacteroides fragilis strain BE1 from Illumina and MinION nanopore sequencing data. bioRxiv 2015. doi: http://dx.doi.org/10.1101/024323.

102) Goodwin S, Gurtowski J, Ethe-Sayers S, Deshpande P, Schatz M, McCombie WR. Oxford Nanopore Sequencing and de novo Assembly of a Eukaryotic Genome. bioRxiv 2015. doi: http://dx.doi.org/10.1101/013490.

103) Ashton PM, Nair S, Dallman T, Rubino S, Rabsch W, Mwaigwisya S, et al. MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island. Nat Biotechnol 2015;33:296-300.

104) Ammar R, Paton TA, Torti D, Shlien A, Bader GD. Long read nanopore sequencing for detection of HLA and CYP2D6 variants and haplotypes. F1000Res 2015;4:17.

105) Szalay T, Golovchenko JA. A de novo DNA Sequencing and Variant Calling Algorithm for Nanopores. bioRxiv 2015. doi: http://dx.doi.org/10.1101/019448.

106) Ayub M, Hardwick SW, Luisi BF, Bayley H. Nanopore-Based Identification of Individual Nucleotides for Direct RNA Sequencing. Nano Lett 2013;13:6144-50.

107) Schreiber J, Wescoe ZL, Abu-Shumays R, Vivian JT, Baatar B, Karplus K, et al. Error rates for nanopore discrimination among cytosine, methylcytosine, and hydroxymethylcytosine along individual DNA strands. Proc Natl Acad Sci U S A 2013;110:18910-5.

108) Bolisetty M, Rajadinakaran G, Graveley B. Determining Exon Connectivity in Complex mRNAs by Nanopore Sequencing. bioRxiv 2015. doi: http://dx.doi.org/10.1101/019752.

109) Benitez-Paez A, Portune K, Sanz Y. Species level resolution of 16S rRNA gene amplicons sequenced through MinION™ portable nanopore sequencer. bioRxiv 2015. doi: http://dx.doi.org/10.1101/021758.

110) Cao MD, Ganesamoorthy D, Elliott A, Zhang H, Cooper M, Coin L. Real-time strain typing and analysis of antibiotic resistance potential using Nanopore MinION sequencing. bioRxiv 2015. doi: http://dx.doi.org/10.1101/019356.

111) Quick J, Ashton P, Calus S, Chatt C, Gossain S, Hawker J, et al. Rapid draft sequencing and real-time nanopore sequencing in a hospital outbreak of Salmonella. Genome Biol 2015;16:114.

112) Wang J, Moore NE, Deng YM, Eccles DA, Hall RJ. MinION nanopore sequencing of an influenza genome. Front Microbiol 2015;6:766.

113) Greninger AL, Naccache SN, Federman S, Yu G, Mbala P, Bres V, et al. Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis. bioRxiv 2015. doi: http://dx.doi.org/10.1101/020420.

114) Kilianski A, Haas JL, Corriveau EJ, Liem AT, Willis KL, Kadavy DR, et al. Bacterial and viral identification and differentiation by amplicon sequencing on the MinION nanopore sequencer. Gigascience 2015;4:12.
