Monday, October 22, 2012

Counting the size of a transcript from a GTF file

Here I use grep to extract the records belonging to the query transcript and awk to sum their lengths (end - start + 1, because GTF coordinates are 1-based and inclusive):

grep 'transcript_id "ENSMUST00000105216"' test.gtf | egrep -v "CDS|start_codon|stop_codon" | awk -F'\t' '{l += $5 - $4 + 1} END {print l}'

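Note that egrep -v is used to drop all records containing CDS, start_codon or stop_codon, so that, for a file whose only other features are exons, just the exon lengths are summed. CDS and codon records lie within the exons, so keeping them would count those bases twice. egrep (equivalent to grep -E) is used because extended regular expressions support the | alternation ("logical or") operator, which lets a single pattern exclude all three feature types.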
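A variant that does not depend on which other feature types the file contains is to let awk select exon records by the feature column (column 3) and match the transcript ID itself. This matters for annotations that also carry gene, transcript or UTR records, as many newer GTF files do, because the egrep filter above would let those lines through and inflate the sum. The sketch below assumes a tab-separated GTF in which a transcript's length is the sum of its exon lengths; transcript_length and all_transcript_lengths are hypothetical helper names, not part of any tool.

```shell
# Sketch, assuming a tab-separated GTF: column 3 = feature, 4 = start,
# 5 = end, 9 = attributes, and transcript length = sum of exon lengths.

# transcript_length FILE TRANSCRIPT_ID
# Prints the summed exon length of one transcript (0 if it has no exons).
transcript_length() {
  awk -F'\t' -v id="$2" '
    # Keep exon rows whose attributes contain exactly transcript_id "ID".
    $3 == "exon" && index($9, "transcript_id \"" id "\"") { len += $5 - $4 + 1 }
    END { print len + 0 }
  ' "$1"
}

# all_transcript_lengths FILE
# Prints "transcript_id<TAB>length" for every transcript with exon records.
all_transcript_lengths() {
  awk -F'\t' '
    $3 == "exon" && match($9, /transcript_id "[^"]+"/) {
      # Drop the 15-character prefix (key, space, opening quote)
      # and the closing quote from the matched text.
      id = substr($9, RSTART + 15, RLENGTH - 16)
      len[id] += $5 - $4 + 1
    }
    END { for (id in len) print id "\t" len[id] }
  ' "$1"
}
```

For example, `transcript_length test.gtf ENSMUST00000105216` reproduces the one-liner's result for a single transcript, while `all_transcript_lengths test.gtf | sort -k2,2nr | head` lists the longest transcripts in the file. The sort is needed because awk's `for (id in len)` loop visits keys in no particular order.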