Fragment libraries are not sufficient to fully assemble genomes due to repetitive elements that make the correct order and orientation of contigs impossible to determine. Long span read data (obtained through mate pair libraries, jumping libraries, or informative mate pairs) can be combined with fragment libraries to properly assemble next generation sequencing data into large scaffolds, enabling easier genome closure and finishing. The addition of mate pair libraries can make cost-effective genome closure a reality, with limited manual sequencing required for small genomes.
Figure 1: De novo assembly and closing Thermus aquaticus
| ||JGI Permanent||Fragment library||NxSeq® Mate Pairs + fragment library||Manually finished Thermus aquaticus genome|
|# Contigs > 500 bp||22||54||22||NA|
|Contig N50||106 kb||79 kb||144 kb||NA|
|Max contig achieved||343,213||167,687||260,181||NA|
|Genome scaffolds > 5kb||0||0||1||1|
|Max scaffold achieved|| || || || |
|Plasmid sizes||?||NA||14.5 kb, 70.3 kb||14,047 bp, 16,597 bp, 78,727 bp, and 69,906 bp|
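The Contig N50 row above uses the standard assembly statistic: the contig length at which contigs of that length or longer cover at least half of the assembled bases. A minimal sketch of that calculation follows. The function name `n50` and the example lengths are illustrative assumptions, not values or tools from this page.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only (not from the table).
example_lengths = [100, 80, 60, 40, 20]
print(n50(example_lengths))  # prints 80
```

A higher N50 for the same genome generally means the assembly is less fragmented, which is how the three assembly columns in the table can be compared.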
Figure panel notes: 5,001,861 reads, 4,656,638 mappable; 2.5 M reads assembled in SPAdes with K45.
De novo assembly of E. coli K12 genome. 2.5M fragment reads were assembled de novo into 163 contigs over 1 kb by SPAdes 3.1. Scaffolding was performed with commercial software using 3.2M 8 kb mate pairs. The single scaffold was compared to a reference genome with Mauve 2.3.1.
Figure 3: Assembly of Repeat-Rich Mouse BACs
Assembly Forms One 171 kb Scaffold
Sequence assembly for two repeat-rich mouse BACs. The sequences were assembled with DNAStar software using Ion Torrent 400 bp fragments and 5 kb NxSeq sequence data. Despite having over 50% repeat sequence, two BACs were each assembled into single scaffolds of 171 kb (shown) and 143 kb (not shown).
Lucigen has created a new paradigm in long span read technology via highly efficient mate pair library prep technology. Genomic DNA is sheared to the desired size (2-8 kb for bead-based methods and 10-20 kb for gel-based sizing methods), end repaired, A-tailed, and ligated to barcode adaptors prior to size selection. The insert is ligated to a unique multiplex coupler with encrypted Chimera Code™ sequences. Samples are then treated with exonuclease to remove unwanted DNA, and finally digested with a selection of endonucleases to produce correctly sized di-tags. Biotin capture allows for the removal of unwanted DNA fragments prior to the addition of a Junction Code adaptor and re-circularization. Libraries are then PCR amplified and sequenced on an Illumina sequencer.
Figure 4. NxSeq Long Mate Pair Library Workflow
Lucigen's patent-pending Chimera Code sequences are the key to achieving ultra-high frequencies of true mate pairs, ensuring the most accurate assembly possible. Software analysis of final sequences filters out false mate pairs formed by chimeras during the library prep process. As a result, most libraries achieve >90% true mate pair efficiency.
Figure 5. Chimeric Read Detection
| ||E. coli DH10B||E. coli DH10B||E. coli DH10B|
|True Mate Pairs||2,071,267||2,094,413||2,938,426|
|Chimeric Reads||96,019||148,517||152,933|
|Avg. Read Length (after split)||170 b||161 b||159 b|
|Total Mate Pair Bases||352,115,390||337,200,493||467,209,734|
|Mapped Mate Pair Distance||2,543||5,145||6,191|
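The counts in the table can be turned into a true mate pair fraction for each library. The sketch below assumes that efficiency means true mate pairs divided by the sum of true mate pairs and chimeric reads. That definition is an assumption, since the page does not state the formula, and it ignores any reads discarded for other reasons. The variable and function names are illustrative.

```python
# Counts copied from the table above (three E. coli DH10B libraries).
libraries = [
    {"true_pairs": 2_071_267, "chimeric": 96_019,
     "avg_read_len": 170, "total_bases": 352_115_390},
    {"true_pairs": 2_094_413, "chimeric": 148_517,
     "avg_read_len": 161, "total_bases": 337_200_493},
    {"true_pairs": 2_938_426, "chimeric": 152_933,
     "avg_read_len": 159, "total_bases": 467_209_734},
]


def true_pair_fraction(true_pairs, chimeric):
    """Assumed definition: true / (true + chimeric)."""
    return true_pairs / (true_pairs + chimeric)


fractions = [
    true_pair_fraction(lib["true_pairs"], lib["chimeric"]) for lib in libraries
]
for fraction in fractions:
    print(f"{fraction:.1%}")  # prints 95.6%, 93.4%, 95.1% in turn
```

Under this assumed definition, all three libraries come out above 90%. The table is also internally consistent in one more way: in each column, Total Mate Pair Bases equals True Mate Pairs multiplied by the average read length after splitting.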
The NxSeq Long Mate Pair Library Kit can accommodate a wide range of insert sizes to fit your needs. Bead-based, gel-free fragment sizing protocols enable libraries up to 8 kb insert size, while gel-based sizing protocols will accommodate 10-20 kb insert size.
The result is tight sizing of your mate pairs, enabling accurate and complete bioinformatic assembly.
Figure 7. Long Mate Pair Libraries
An 8 kb NxSeq Long Mate Pair library was constructed using bead-based, gel-free methods, and a 10-20 kb mate pair library was constructed using gel isolation. The true mate pairs from each library were mapped against the respective reference genome to determine the mate pair distances.
Want to multiplex up to 12 libraries at one time? Lucigen offers the NxSeq Long Mate Pair Library Index kit with 12 different indexed amplification primer sets (Illumina compatible). See the ordering information tab for more details.
To analyze your Illumina runs, scripts must be run to confirm the Chimera Code and Junction Code sequences and to filter these sequences out prior to final assembly. These scripts, along with a sample data set for trial analysis, can be found here.
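To give a rough idea of what this kind of filtering involves, here is a minimal Python sketch. It is not Lucigen's script. The junction sequence, the code length, the minimum tag length, the assumed read layout (left tag, code, junction, code, right tag), and the rule that the two flanking codes must be identical are all hypothetical placeholders chosen for illustration.

```python
# All constants below are made-up placeholders, NOT the real kit sequences.
JUNCTION_CODE = "ACGTTGCA"   # hypothetical junction sequence
CHIMERA_CODE_LEN = 4         # hypothetical length of each flanking code
MIN_TAG_LEN = 5              # hypothetical minimum usable tag length


def split_mate_pair(read):
    """Return (left_tag, right_tag) for a read that passes the checks, else None.

    Assumed layout: left_tag + code + JUNCTION_CODE + code + right_tag.
    Assumed rule: the two flanking codes must be identical; a mismatch is
    treated as a chimera and the read is discarded.
    """
    idx = read.find(JUNCTION_CODE)
    if idx == -1:
        return None  # no junction found, so the read cannot be split
    left = read[:idx]
    right = read[idx + len(JUNCTION_CODE):]
    if len(left) < CHIMERA_CODE_LEN or len(right) < CHIMERA_CODE_LEN:
        return None  # not enough sequence to hold both flanking codes
    left_code = left[-CHIMERA_CODE_LEN:]
    right_code = right[:CHIMERA_CODE_LEN]
    if left_code != right_code:
        return None  # mismatched codes: treated as a chimeric read
    # Strip the codes so only genomic sequence goes on to assembly.
    left_tag = left[:-CHIMERA_CODE_LEN]
    right_tag = right[CHIMERA_CODE_LEN:]
    if len(left_tag) < MIN_TAG_LEN or len(right_tag) < MIN_TAG_LEN:
        return None  # tags too short to be useful
    return left_tag, right_tag


good_read = "AAAAAA" + "GGCC" + JUNCTION_CODE + "GGCC" + "TTTTTT"
chimeric_read = "AAAAAA" + "GGCC" + JUNCTION_CODE + "CCGG" + "TTTTTT"
print(split_mate_pair(good_read))      # ('AAAAAA', 'TTTTTT')
print(split_mate_pair(chimeric_read))  # None
```

A real pipeline would work on FASTQ records, keep quality scores alongside each tag, and tolerate sequencing errors in the code sequences. For actual analysis, use the scripts supplied with the kit.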
Would you like to have an NxSeq Long Mate Pair library, but don't want to make it yourself? Contact our Custom Genomic Services group and we'll provide a no-obligation quote for a range of services offered by Lucigen.
The NxSeq Long Mate Pair Library Kit and Index Kit are compatible with Illumina MiSeq (300, 500, and 600 cycle reagent kits), HiSeq 2500 (250 cycle reagent kit), and NextSeq 500 (300 cycle reagent kit) instruments. The product is not compatible with 50, 75, or 150 cycle reagent kits for any instrument.