Tuesday, January 10, 2012

Extract single chromosome reads from BAM/SAM

Sometimes you need to extract only the reads that map to one specific chromosome.
It's rather easy to accomplish this task with SAMtools.

First we create an index for the BAM file. The index is required for random access to a region, and the BAM file must be sorted by coordinate before it can be indexed.

samtools index accepted_hits.bam

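Then we extract the data for a specific region, for example chromosome 20. The name must match the reference name used in the BAM header (here chr20), and the -h option keeps the header in the output.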

samtools view -h accepted_hits.bam chr20 > accepted_hits.20.sam
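If the input is already a plain-text SAM file, the same selection can be sketched without an index, because the reference name is stored in the third column of each alignment line. The snippet below is only an illustration of that idea, not part of SAMtools; the helper name filter_sam_by_chrom and the file names are made up for this example.

```shell
# Hypothetical helper (not a SAMtools command): keep the header lines
# (they start with "@") and every alignment whose RNAME field, column 3,
# equals the requested chromosome. Works on uncompressed SAM text only.
filter_sam_by_chrom() {
    chrom="$1"   # reference name exactly as written in the header, e.g. chr20
    sam="$2"     # path to a plain-text SAM file
    awk -F '\t' -v chrom="$chrom" '/^@/ || $3 == chrom' "$sam"
}

# Usage (placeholder file names):
# filter_sam_by_chrom chr20 accepted_hits.sam > accepted_hits.20.sam
```

The comparison is an exact string match, so chr2 does not pick up chr20. For BAM input the indexed samtools view call above remains the simpler route.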

6 comments:

Anonymous said...

This was very helpful, thank you! I have a question though. I downloaded a BAM file for a cell line from the UCSC FTP server and viewed the generated GTF file in IGV. The problem is that these BAM files are huge and take a long time to download. I'm not aware of any easy way to download all the BAM files for long RNA-seq. Any ideas?

Unknown said...

Hi, this shows how to extract a chromosome, but how do I extract a region or locus, for example gene A? Thanks.

Konstantin Okonechnikov said...

Sorry for the late replies. I didn't realize somebody was actually reading this blog; it was meant as a personal worklog :)

To Vaish:
You're welcome. I'm not sure there is any way to speed up downloads from UCSC. Check whether the data is available in the NCBI SRA, where one can use Aspera.

To Wan Fahmi: It's very easy to specify an exact locus, for example:
samtools view file.bam chr20:1000-2000
To get the reads overlapping a gene, first create a BED annotation file and then pass it with the -L option:
samtools view -L genes.bed file.bam
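One possible way to produce such a BED file is to pull the gene coordinates out of a GTF annotation. The sketch below assumes an Ensembl/GENCODE-style GTF that has rows with the feature type "gene" and a gene_name attribute; other GTF flavours may need a different feature or attribute. The helper name gene_to_bed and the file names are hypothetical.

```shell
# Hypothetical helper: print one BED line for the "gene" row of a GTF whose
# gene_name attribute equals the requested name.
# Assumption: the GTF has feature type "gene" and a gene_name "..."; attribute.
# GTF coordinates are 1-based inclusive, BED is 0-based half-open,
# so the start is shifted by one and the end is kept as is.
gene_to_bed() {
    gene="$1"   # gene name to look up, e.g. ABC
    gtf="$2"    # path to the GTF annotation
    awk -F '\t' -v OFS='\t' -v gene="$gene" '
        $3 == "gene" && index($9, "gene_name \"" gene "\";") {
            print $1, $4 - 1, $5, gene, ".", $7
        }' "$gtf"
}

# Usage (placeholder file names):
# gene_to_bed ABC annotation.gtf > genes.bed
# samtools view -L genes.bed file.bam
```

The closing quote and semicolon are part of the match, so a request for ABC does not also select a gene named ABCD.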

AQ said...

Hi! That's what I was looking for :)

However, I have one more question. Is there a way to do this (extract only the reads of a specific chromosome) with Rsamtools?

Thank you!

diya said...

Hi,

How can I get the region from a BAM file for only my gene of interest? For example, I want to get the part of the BAM file for the gene "ABC".

Thanks,

Anonymous said...

Use -b to output BAM format:
samtools view -b in.bam chr1 > out.bam