Wednesday, May 13, 2015

Linux: how to print the header line of a tab-delimited file as a numbered column


Let's say a tab-delimited file has a header line with a lot of parameters. There are many examples: annotation files, results of certain algorithms, etc.

Sometimes it's useful to show only certain selected parameters. This is easy to do with the cut command. For example, this is how to report only the chromosome name and the start and end positions of each feature (such as a gene exon) from a GTF file:

cut -f 1,4,5 Homo_sapiens.GRCh38.78.gtf | less
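Here is a small sketch of the same cut call on a made-up file. The name toy.gtf and its contents are invented for illustration only. It assumes the file starts with a comment line beginning with #, as GTF files often do; cut prints lines that contain no tab unchanged unless the -s option is given, so -s is added here.

```shell
# toy.gtf is a hypothetical stand-in for a real GTF file (assumption for this sketch).
printf '#!genome-build GRCh38\n' > toy.gtf
printf '1\thavana\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG00000223972";\n' >> toy.gtf
printf '1\thavana\texon\t12613\t12721\t.\t+\t.\tgene_id "ENSG00000223972";\n' >> toy.gtf

# Print columns 1, 4 and 5 (chromosome, start, end).
# -s skips lines without a tab, which drops the comment line.
cut -s -f 1,4,5 toy.gtf
```

Without -s, the comment line would be passed through as is, which is harmless when paging with less but can matter when the output is fed to another program.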

To do this with the cut command, you have to give the exact indexes of the required columns. But what if there are too many columns (more than 30, for example)? How do you find the exact indexes?

Here's an example of a command that prints each field of a file's first line together with its index number:

head -n 1 fusions.detailed.txt | tr "\t" "\n" | cat -n

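This example takes the first line of the file, puts each tab-separated field on its own line, and numbers the lines, producing a list of all the parameters of a detailed report from the InFusion tool. The result of this example can be found here (numbers and words in bold font).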
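To try the idea without a real report, here is a minimal sketch on a made-up file. The name sample.tsv, its column names, and its values are all assumptions for illustration; they are not taken from InFusion output.

```shell
# sample.tsv is a hypothetical tab-delimited file with a header line.
printf 'gene\tchrom\tstart\tend\tscore\n' > sample.tsv
printf 'GENE_A\t17\t1000\t2000\t0.9\n' >> sample.tsv

# Print each header field on its own line, prefixed with its index.
head -n 1 sample.tsv | tr '\t' '\n' | cat -n

# Use the indexes found above (2, 3 and 4) to extract chrom, start and end.
cut -f 2,3,4 sample.tsv
```

The numbers printed by cat -n match the field numbers that cut -f expects, so they can be copied straight into the cut command.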