This task is easy to accomplish with SAMtools.
First we create an index for the BAM file. The index is required for random access to a region, and the BAM file must be sorted by coordinate before it can be indexed.
samtools index accepted_hits.bam
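Then we extract the data for a specific region, for example chromosome 20. The -h option keeps the header in the output, and the region name must match a reference name in the BAM header.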
samtools view -h accepted_hits.bam chr20 > accepted_hits.20.sam
6 comments:
This was very helpful, thank you! I have a question though. I downloaded a BAM file for a cell line from the UCSC FTP server and viewed the generated GTF file in IGV. The thing is, these BAM files are just huge and take such a long time to download. I'm not aware of any easy way to download all the BAM files for long RNA-seq. Any idea?
Hi, this is how to extract a chromosome, but how do I extract a region or locus, for example gene A? Thank you.
Sorry for the late replies. I didn't realize somebody was actually reading this blog; it was meant to be a personal worklog :)
2Vaish:
You're welcome. I'm not sure there is any way to increase the download speed from UCSC. Check whether the data is available on NCBI SRA, where one can use Aspera.
2Wan Fahmi: It's very easy to set an exact locus, for example:
samtools view file.bam chr20:1000-2000
To get the reads that overlap a gene, you first need to create a BED annotation file with the gene's coordinates and then pass it with the -L option:
samtools view -L genes.bed file.bam
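One possible way to build that BED file is to pull the gene's coordinates out of a GTF annotation with awk. This is only a sketch: demo.gtf, the gene name ABC, and the coordinates are made up for illustration, and it assumes the annotation has "gene" records carrying a gene_name attribute (not every GTF does).

```shell
# Hypothetical two-record GTF; replace it with your real annotation file.
printf 'chr20\tdemo\tgene\t1001\t2000\t.\t+\t.\tgene_id "ABC"; gene_name "ABC";\n' > demo.gtf
printf 'chr20\tdemo\tgene\t5001\t6000\t.\t-\t.\tgene_id "XYZ"; gene_name "XYZ";\n' >> demo.gtf

# Keep only the "gene" record whose gene_name is ABC and write it as BED.
# GTF coordinates are 1-based inclusive, BED is 0-based half-open,
# so the start is shifted by one and the end is kept as-is.
awk -F'\t' -v gene='ABC' '
  $3 == "gene" && $9 ~ ("gene_name \"" gene "\"") {
    print $1 "\t" ($4 - 1) "\t" $5 "\t" gene
  }' demo.gtf > genes.bed

# With a BAM file at hand, the BED file is then used as above (not run here):
# samtools view -L genes.bed file.bam
```

The chromosome names in the annotation have to match the ones in the BAM header (chr20 versus 20), otherwise no reads are returned.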
Hi! That's what I was looking for :)
However, I have one more question. Is there a way to do this operation (extract only the reads of a specific chromosome) with Rsamtools?
Thank you!
Hi,
How can I get the region from a BAM file for only my gene of interest? For example, I want to get the part of the BAM file for the gene "ABC".
Thanks,
Use -b to output BAM format:
samtools view -b in.bam chr1 > out.bam