Monday, March 11, 2013

Random subsample from a BAM file

To extract a random subsample of reads from a BAM file, you can use the samtools view command with the -s option.

The tricky part is setting the random seed: -s takes a single number whose integer part is the seed and whose fractional part is the fraction of reads to keep. So, let's say you would like 1% of the reads in your sample and a seed of 42. Then your command should look like this:

samtools view -s 42.01 -b accepted_hits.bam > sample.bam
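
If the seed and the fraction live in separate variables in a script, the -s value has to be glued together from them. Here is a minimal sketch of that; make_s_arg is a hypothetical helper name (not part of samtools), and it assumes the fraction is written as a decimal below 1, such as 0.01. It only prints the command instead of running it:

```shell
# Hypothetical helper (not part of samtools): combine an integer seed and a
# decimal fraction below 1 into the SEED.FRACTION value used with -s.
make_s_arg() {
    seed="$1"
    frac="$2"
    # Drop the leading "0" of the fraction (0.01 -> .01), then append it to the seed.
    printf '%s%s\n' "$seed" "${frac#0}"
}

s_arg=$(make_s_arg 42 0.01)

# Print the command rather than run it, so the sketch works without samtools installed.
echo "samtools view -s $s_arg -b accepted_hits.bam > sample.bam"
```

Keeping the two values apart until the last moment makes it easier to change the seed without accidentally changing the fraction.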

This syntax is a little bit obscure, but there is also an alternative: the DownsampleSam program from the Picard package. Here you set the random seed explicitly with the R option (R=42) and the fraction of reads to keep with P:

java -jar ~/tools/picard-tools-1.70/DownsampleSam.jar I=accepted_hits.bam P=0.01 R=42 O=sample.bam


Have fun! :)