Ion Torrent Hits 400bp Read Length Mark...Why we're excited.

It’s been a while since I have blogged on Ion Torrent data.  There are several reasons for this.

First and foremost is that my head has been buried in the Archon Genomics XPrize Validation Protocol, which we were contracted to execute starting at the end of last year.  The AGXP is an ambitious, incentivized prize competition that will award $10 million to the first team to rapidly, accurately, and economically sequence 100 whole human genomes to a level of accuracy never before achieved.  The VP provides an opportunity to test and score current WGS technologies and to assist the scientific community in establishing both the reference genome(s) for this Competition and the means for analyzing its results.

Second, it was with great anticipation and excitement that we received our Certificate of Compliance recently from the US Dept. of Health and Human Services Centers for Medicare and Medicaid Services. This means we have been deemed a CLIA lab and can begin to offer our Clinical Exome test under CLIA guidelines.  No small feat.

But here I am again, knocking the dust off the blogging boots and wading into another data set Ion Torrent has publicly released.

So why am I excited about this new long-read data set?  I think you can call it a cumulative excitement.  400bp is a milestone, much like 200bp, mate pairs, fast turnarounds, and high quality runs.  It is a culmination of effort over the past year to continually improve and stretch the technology.  I discussed in January our road at EdgeBio with the Ion Torrent and how we have seen marked improvements in throughput, quality, cost, and extensibility/usability of the system and bioinformatics tools.  Much of that post discussed ACTUAL experience running the machine and analyzing the data.  I think this is very important.  Data sets released by machine vendors should be looked at with a careful eye and a disclaimer that it may be a while until you see those results on your own machine.  That being said, as a services provider, we must remain on the edge of the technology and examine data sets before we can choose which of the many new protocols (Mate Pair, Paired End, Longer Reads, RNA-Seq, AmpliSeq, TargetSeq, NimbleGen Exomes, etc.) to start through our internal QA/QC efforts and eventually provide to our clients.

This 400bp data set released by Ion represents a new protocol that we are eager to start running internally, but for now we are happy to look at and analyze it pre-release, before we can move it into the wet lab.  This dataset, available on the Ion Community, exhibits a modal read length of 400 bases, with the majority of bases at Q30 quality or better through the 400 bases.

Fig 1. FastQC plot of the average raw read QV at each position in the read.  Note that reads maintain an average QV of Q30 out to between 250bp and 300bp, and an average of Q20 out to 400bp.
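
For readers who want to reproduce a per-position average like this without FastQC, here is a minimal sketch.  It is not what FastQC runs internally; it simply averages the Phred scores at each read position.  The function name is made up for illustration, and it assumes the quality strings are Phred+33 (Sanger) encoded.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger) encoded quality strings


def mean_quality_by_position(quality_strings):
    """Return a list whose entry i is the mean Phred score at read position i + 1.

    Reads may have different lengths; each position is averaged only over
    the reads that are long enough to reach it.
    """
    totals = []  # running sum of Phred scores per position
    counts = []  # number of reads covering each position
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - PHRED_OFFSET
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Toy example: 'I' is Q40 and '5' is Q20 under Phred+33.
example = ["II5", "I5"]
print(mean_quality_by_position(example))
```

Reading off where this curve drops below 30 and below 20 gives the kind of positions quoted in the caption above.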

Fig 2. The fraction of raw Q30 bases is cumulative up to the positions noted. Considering all the reads, all the bases, up to position 400, 55% are Q30 or greater.
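
To make the "cumulative" part concrete, here is a rough sketch of how such a number can be computed from a FASTQ file: count every base at or before a given position, across all reads, and ask what fraction is Q30 or better.  The helper names are hypothetical, and the sketch assumes a simple four-line-per-record FASTQ with Phred+33 quality encoding.

```python
def quality_lines(fastq_lines):
    """Yield the quality string of each record.

    Assumption: plain four-line FASTQ records (no wrapped sequence lines),
    so the quality string is every fourth line.
    """
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def cumulative_q30_fraction(quality_strings, max_position, threshold=30, offset=33):
    """Fraction of all bases at positions 1..max_position with QV >= threshold.

    Assumption: Phred+33 encoding (offset=33).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual[:max_position]:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


# Toy records: 'I' is Q40 and '5' is Q20 under Phred+33.
records = [
    "@read1\n", "ACG\n", "+\n", "II5\n",
    "@read2\n", "AC\n", "+\n", "I5\n",
]
quals = list(quality_lines(records))
print(cumulative_q30_fraction(quals, max_position=2))
```

On a real run you would pass the lines of the FASTQ file instead of the toy list and set max_position to 400 to get a figure comparable to the 55% quoted above.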

One would hope that longer reads would yield improved assemblies.  From what we have seen, this is still not the case for the 400bp reads compared to the 200bp high quality reads from 316 and 318 chips.  That said, we would argue that this is still a very good and cheap draft assembly, and we have shown that combining it with some open source, cloud-based annotation can get you quite a bit of useful data to work with in your research.  Also, one could, for more money of course, potentially generate single scaffolds and larger contig assemblies using mate pairs.  But we haven’t done this here at Edge yet.

We are delighted Ion gave us this preview of the long read-length products in development and, based on previous experience, we expect continued data quality improvements by the time of release in the second half of 2012.