Barcode misassignment with multiplexing on the HiSeq 4000
You may have seen the preprint posted on bioRxiv by Sinha et al. describing index switching on Illumina instruments that use exclusion amplification (ExAmp) to build clusters on patterned flow cells (HiSeq 3000/4000/X Ten). The study identified a potential problem that can occur when libraries are multiplexed on these instruments, and the authors reported an index switching frequency of 5-10%. If true, this would have serious implications for all multiplexed experiments with NGS data generated on one of these instruments. However, Lutz Froenicke at the UC Davis Center has posted an excellent analysis of the data presented in this manuscript, which suggests that Sinha et al. may have substantially overestimated the extent of the problem. The main contributor to index switching during ExAmp clustering is the presence of free, unused index PCR primers left over from library preparation, and the libraries used in the study contained significant levels of free primer.
This is not to say that index switching does not occur. It does, but there are specific steps that can be taken to mitigate the issue. Illumina has also confirmed the problem, which they call “index hopping”, and has described it on this web page and in a white paper. The RTSF Genomics Core uses the Illumina TruSeq Nano DNA and TruSeq Stranded mRNA library preparation kits for the majority of samples submitted to our facility. A notable difference between TruSeq and Nextera XT (used in the Sinha manuscript) is that TruSeq adds indexed adapters by ligation to dsDNA fragments, whereas Nextera XT attaches indices via PCR with barcoded primers. Residual barcoded PCR primers are more prone to being swapped for the legitimate primer during ExAmp cluster generation than are residual, partially double-stranded adapters from the TruSeq kits. Data presented by Illumina show that with good-quality TruSeq Nano DNA or TruSeq Stranded mRNA libraries the frequency of index hopping is < 0.5%. The RTSF Genomics Core has tested index hopping with TruSeq Nano DNA and TruSeq Stranded mRNA libraries prepared by our facility, and our results are consistent with those reported by Illumina: < 0.5% of reads are misassigned due to index swaps.
Illumina provides some Best Practices recommendations to minimize the degree of index hopping. The RTSF Genomics Core always performs an additional SPRI bead cleanup on finished libraries to remove as much residual adapter as possible. Our libraries are stored at -20°C and, to the extent possible, pooling happens shortly before sequencing. The most effective practice to mitigate this issue is the use of dual unique indexes when pooling libraries for sequencing. When a multiplexed library pool contains only dual unique indexes, a read can be misassigned only if both of its indexes are swapped during cluster generation; the probability of this is vanishingly small. A read with only one swapped index carries an index pair that matches no library in the pool, so it is left unassigned rather than credited to the wrong sample. With the dual index sets currently available it is possible to multiplex up to 8 libraries in one lane of the Illumina HiSeq 4000 using dual unique barcodes. Illumina has plans to expand their index sets to permit up to 96 dual unique combinations.
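The protection offered by dual unique indexes can be illustrated with a small Python sketch. It is illustrative only: the sample names and index sequences are made up, `assign_read` and `double_swap_rate` are hypothetical helpers rather than part of any Illumina software, and the rate calculation assumes the two index swaps are independent events, which may not hold exactly in practice.

```python
# Hypothetical sample sheet: every library has its own i7 index AND its own
# i5 index, so no index sequence is shared between libraries.
# (Sequences and sample names are invented for illustration.)
SAMPLE_SHEET = {
    ("ACGTACGT", "TTGCAAGC"): "sample_A",
    ("GATCGATC", "CCATGGTA"): "sample_B",
    ("TGCATGCA", "AAGGTTCC"): "sample_C",
}


def assign_read(i7, i5, sample_sheet=SAMPLE_SHEET):
    """Return the sample for an (i7, i5) pair, or None if the pair is unexpected."""
    return sample_sheet.get((i7, i5))


def double_swap_rate(single_swap_rate):
    """Rough estimate of the chance that both indexes of a read are swapped.

    Assumes the two swaps are independent (an assumption of this sketch).
    """
    return single_swap_rate ** 2


if __name__ == "__main__":
    # Correct pair: assigned to its own library.
    print(assign_read("ACGTACGT", "TTGCAAGC"))
    # One index swapped (i7 of sample_A with i5 of sample_B): no match,
    # so the read is left unassigned instead of being misassigned.
    print(assign_read("ACGTACGT", "CCATGGTA"))
    # With a single-index swap rate of 0.5%, the double-swap estimate is
    # on the order of a few reads per hundred thousand.
    print(double_swap_rate(0.005))
```

In this sketch a read is misassigned only if it ends up carrying both indexes of another library; a single swap produces a pair that appears nowhere in the sample sheet.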
If your experiment will not permit creating pools with only dual unique indexes (i.e. you must pool more than 8 libraries per lane), how concerned should you be? As stated above, the observed frequency of index swapping is very low (< 0.5% with good-quality TruSeq libraries). For misassigned reads to have a detectable effect on downstream analysis, the sample that contributed them would need to differ substantially from the sample to which they were assigned. For most projects this will not be a problem; the analyses most likely to be affected are those looking at low-frequency events.
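A back-of-the-envelope calculation can help judge whether a given analysis is at risk. The Python sketch below is a simplification under stated assumptions: `expected_spurious_reads` is a hypothetical helper, the read depth and feature frequency are invented numbers, and misassigned reads are assumed to make up a fixed fraction of a sample's reads and to reflect the composition of the other libraries in the pool.

```python
def expected_spurious_reads(reads_in_sample, hop_rate, donor_feature_fraction):
    """Expected number of misassigned reads that carry a given feature.

    reads_in_sample        -- total reads assigned to the sample of interest
    hop_rate               -- fraction of those reads that are misassigned
                              (e.g. 0.005 for 0.5%)
    donor_feature_fraction -- fraction of reads in the other pooled libraries
                              that come from the feature (gene, variant, taxon)

    All three inputs are assumptions supplied by the user of this sketch.
    """
    return reads_in_sample * hop_rate * donor_feature_fraction


if __name__ == "__main__":
    # Invented example: 40 million reads, 0.5% misassignment, and a feature
    # that accounts for 1 in 10,000 reads in the other libraries.
    print(expected_spurious_reads(40_000_000, 0.005, 1e-4))
```

With these invented numbers, roughly 20 reads in the sample would come from the feature through misassignment alone. That is negligible next to a feature with thousands of genuine reads, but it could matter when the true signal is expected to be absent or only a handful of reads.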