For example, let's find all alignment matches (the M operator) in a CIGAR string:
In [55]: import re
In [56]: match = re.findall(r'(\d+)M', '40M25N5M')
In [57]: print match
-------> print(match)
['40', '5']
The expression (\d+)M matches every substring of the form "nM", where n is a number made up of one or more digits. The round brackets create a group around the number so it can be accessed later; with a single group, findall returns just the captured digits, as in the output above.
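Because the captured groups come back as strings, they need converting before any arithmetic. A minimal sketch, reusing the same example CIGAR string (the variable names are only illustrative):

```python
import re

cigar = '40M25N5M'  # same example CIGAR string as above

# findall returns the captured digit strings; convert each one to int.
match_lengths = [int(n) for n in re.findall(r'(\d+)M', cigar)]
total_matched = sum(match_lengths)

print(match_lengths)  # [40, 5]
print(total_matched)  # 45
```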
Similarly, one can split a CIGAR string into (length, operation) pairs to iterate over:
In [60]: match = re.findall(r'(\d+)(\w)', '40M25N5M')
In [61]: match
Out[61]: [('40', 'M'), ('25', 'N'), ('5', 'M')]
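Here the \w meta-symbol matches a single word character (a letter, digit or underscore), which picks up the operation letter, and the round brackets group the length and the operation separately.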
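Once the string is split into pairs, a plain for loop is enough to process it. As a rough sketch, the hypothetical helper below (lengths_by_operation is not part of any library) adds up the total length per operation using the same pattern:

```python
import re
from collections import defaultdict

def lengths_by_operation(cigar):
    """Sum the lengths for each operation letter in a CIGAR string."""
    totals = defaultdict(int)
    # Each item is a (length, operation) pair of strings, e.g. ('40', 'M').
    for length, op in re.findall(r'(\d+)(\w)', cigar):
        totals[op] += int(length)
    return dict(totals)

print(lengths_by_operation('40M25N5M'))
```

For '40M25N5M' this maps M to 45 and N to 25.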
Have fun!
1 comment:
Nice trick! Only thing is, considering = is a valid operation on a cigar string according to the specification, I use:
r'([0-9]+)([MIDNSHPX=])'
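A small sketch of how that stricter pattern might be used. Since findall silently skips anything it cannot match, this version also checks that the parsed pairs rebuild the original string; parse_cigar and the error message are illustrative choices, not an established API:

```python
import re

# Pattern from the comment above: a length followed by one CIGAR operation.
CIGAR_OP = re.compile(r'([0-9]+)([MIDNSHPX=])')

def parse_cigar(cigar):
    """Return a list of (length, operation) pairs, or raise ValueError."""
    ops = CIGAR_OP.findall(cigar)
    # If joining the pairs does not reproduce the input, part of the
    # string was skipped because it did not match the pattern.
    if ''.join(n + op for n, op in ops) != cigar:
        raise ValueError('unparseable CIGAR string: %r' % cigar)
    return [(int(n), op) for n, op in ops]

print(parse_cigar('10=1X5='))  # [(10, '='), (1, 'X'), (5, '=')]
```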