Tuesday, October 7, 2014

Visualizing RNA-seq alignments with fusions: is there anything better than IGV?

I was recently working on improving the module that generates alignments with annotated fusion-supporting reads, as part of the InFusion pipeline for fusion gene and chimeric transcript discovery from RNA-seq data. Once you have such an alignment in BAM format, it is quite useful to visually inspect the reads supporting the fusion breakpoint (the junction between the fused genes or transcripts). First of all, it is a good sanity check to see whether the prediction makes sense. Additionally, inspecting the coverage around the breakpoint might give some hints as to whether the prediction is a false positive.

So far I have been using IGV for this purpose, and it has worked nicely for me. One cool thing about IGV is that it lets you color reads by any SAM record tag. This makes it possible to color-code reads belonging to the same fusion based on a custom tag assigned by InFusion. IGV also allows splitting the screen to visualize both parts of the fusion (sadly without zooming). You get to see something like this:


When checking several fusion predictions, I noticed that IGV was not showing some fusion-supporting alignments. Later I figured out that this was due to a configuration setting that limits the maximum number of reads displayed. But at first I thought: why not try something else? There are tons of genome browsers out there.

I tested several popular ones, but none worked as well as IGV. See my comments below.

Important note: all software was tested on Ubuntu 12.04 x64.

In my tests I tried to visualize a region of a BAM file that, according to the prediction, contains a fusion breakpoint. This is how the region looks in IGV v2.2.2:


As you can see, reads belonging to fusions are shown in different colors based on a custom SAM record tag. This makes it easy to distinguish between fusion-supporting and "normal" alignments. Another very nice feature is the coverage track. It lets you see the overall coverage level in the region and visually detect any suspicious peaks.
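To make the idea of grouping reads by a custom tag more concrete, here is a minimal sketch in plain Python that does on SAM text lines roughly what the coloring does visually. The tag name `ZF`, the function names and the example records are all made up for illustration; this post does not say which tag InFusion actually writes. Reads without the tag end up under `None`, which corresponds to the "normal" alignments.

```python
from collections import defaultdict


def tag_value(sam_line, tag):
    """Return the value of an optional SAM tag as a string, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional fields follow as TAG:TYPE:VALUE.
    for field in fields[11:]:
        name, _type, value = field.split(":", 2)
        if name == tag:
            return value
    return None


def group_reads_by_tag(sam_lines, tag="ZF"):
    """Map each tag value to the names of the reads that carry it.

    "ZF" is a hypothetical tag name used only for this example.
    """
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        read_name = line.split("\t", 1)[0]
        groups[tag_value(line, tag)].append(read_name)
    return dict(groups)


# Invented records: read1 and read3 support fusion "1", read2 is a normal read.
EXAMPLE = [
    "@HD\tVN:1.4\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0\tZF:i:1",
    "read2\t0\tchr1\t105\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0",
    "read3\t16\tchr1\t110\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tZF:i:1",
]

if __name__ == "__main__":
    for value, names in group_reads_by_tag(EXAMPLE).items():
        print(value, names)
```

For a real BAM file you would first convert the region of interest to SAM text (for example with `samtools view`) or use a BAM-reading library instead of splitting lines by hand.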

Now let's see how other browsers display the same region.

IGB v8.1.11. I loaded the BAM file and successfully zoomed to the region. At first I was not able to see any reads; however, after clicking on "Optimize track height" the reads appeared. Unfortunately, it was not possible to color reads by an arbitrary tag, and I could not find an option to search for a read by name. So it was not easy to find and inspect fusion-supporting alignments:


GenomeView 2450. As before, I loaded the BAM file and zoomed to the region of interest. Only a limited number of reads was shown. Also, as far as I understand, this browser has a fixed color scheme for reads. Therefore it was not possible to highlight fusion-supporting reads:


Tablet v1.14.04. I loaded the BAM file and tried to jump to the desired base in the chromosome; however, the software crashed, complaining that the BAM file was not sorted or that the index was missing (neither of which was true). I also tried to search by name for a certain read supporting the fusion breakpoint in order to quickly zoom to its location. The read was found successfully, but once again, when I tried to go to the location of the read, the software crashed.

Update 08.10.2014: The developers replied very quickly to my bug report. The issue was caused by one alignment record in the BAM file that had a wrongly formatted CIGAR string. After fixing this record, the region was shown successfully. Searching for a read by name is a very handy feature of Tablet. However, coloring by tag is not supported. One possible way to mark fusion-supporting reads is to assign read groups to them, but this would require some additional work.
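A record like this can be caught before loading the file into any browser. Below is a minimal sketch of such a check in plain Python, assuming only the general rules of the SAM format: a CIGAR string is a sequence of length/operation pairs, and the operations M, I, S, = and X together must cover the whole read sequence. What exactly was wrong with the record in my file is not covered here, and the function name is made up for this example.

```python
import re

# One CIGAR element: a length followed by an operation letter from the SAM format.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume bases of the read sequence.
QUERY_CONSUMING = set("MIS=X")


def cigar_problem(cigar, seq):
    """Return a short description of what is wrong with a CIGAR, or None if it looks fine."""
    if cigar == "*":  # CIGAR not available, nothing to check
        return None
    ops = CIGAR_RE.findall(cigar)
    # Re-joining the parsed elements must reproduce the input exactly;
    # otherwise the string contains characters outside the CIGAR grammar.
    if not ops or "".join(n + op for n, op in ops) != cigar:
        return "malformed CIGAR string"
    if seq != "*":
        query_len = sum(int(n) for n, op in ops if op in QUERY_CONSUMING)
        if query_len != len(seq):
            return f"CIGAR covers {query_len} bases, SEQ has {len(seq)}"
    return None


if __name__ == "__main__":
    # Invented examples: a spliced read, a truncated CIGAR, a length mismatch.
    for cigar in ("5M100N5M", "10M5", "8M"):
        print(cigar, "->", cigar_problem(cigar, "ACGTACGTAC"))
```

Running every record of the region through a check like this (columns 6 and 10 of the SAM text) would point directly at the offending alignment.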

Savant v2.0.5. I loaded the BAM file, but unfortunately, when I zoomed to the region of interest, the data track was flickering and it was not possible to see anything. I reported the issue to the developers. Here is how it looks in a screenshot:


Conclusion. Although the bugs in the software might be fixed in future versions, for now IGV remains the best browser for visualizing RNA-seq alignments with fusion genes. I suppose the same is true (and some people agree) for visualizing structural variations in WGS data.