Monday, March 30, 2015

Computing coverage out of exon regions

After performing alignment of RNA-seq or exome sequencing data it is quite important to compute coverage. There are several ways how to perform this task. However, there is one additional interesting issue: check if there is any coverage outside of exon regions. For example, in case of RNA-seq this allows to show if there are some previously unknown transcripts expressed. In this post I will describe how to compute out-of-region coverage from BAM file.

To perform out-of-region coverage check two files should be available:
- alignment data in SAM/BAM format. Sample: kidney.bam.
- gene annotations in BED format. Sample: transcripts.human.64.bed.

I will describe two methods to perform this task. Of course, I will recommend second method, using my favorite tool ;)

Using BEDtools


First of all, a file listing the coordinates of regions outside of genes from annotation file is required. This can be done using command complmentBed:

$~/tools/BEDTools/bin/complementBed -i transcripts.human.64.bed -g hg19.fa.fai > transcripts.human.64.outside.bed

After, the creating out-of-region annotation file, coverage can be computed:

$~/tools/BEDTools/bin/coverageBed -abam kidney.bam -b transcripts.human.64.outside.bed -d | awk '{c+=$5;len+=1}END{print "mappedBases=" c ,"RegionsSize=" len, "meanCoverage=" c/len}'

More details about BEDtools can be found here.

Using Qualimap


Qualimap BAM QC mode supports performing analysis of a BAM file within regions from annotation file in BED/GFF/GTF formats. Additionally, there is an option to perform additional analysis in "out of regions" block. The options is called "Analyze outside regions" , more details in documentation. So, here's an example:

$ qualimap bamqc -bam kidney.bam -gff annotations/transcripts.human.64.bed -os --java-mem-size=4G

Qualimap can be downloaded from here.

Well, that's enough for now. Have fun ;)

P.S. Thanks a lot to Tristan Carland for reporting a bug in QualiMap when computing coverage out of regions. In version 2.1 the bug is fixed.


No comments: