Sunday, December 18, 2016

This finally happened: a novel tool for fusion discovery from RNA-seq data

It was a long and slightly crazy story, but the toolkit I was working on, InFusion, now has a publication in PLOS ONE. In short, the tool has some unique features, including support for strand-specificity to detect antisense chimeras and discovery of intergenic fusions. It is open-source and available in a Bitbucket repo.

It took us more than 3 years to finish the publication, and it was an amazing experience that taught me what can really happen in science. The main lesson: there can be serious competition in a research area, and it leads to multiple unexpected issues. Journals with a high impact factor are complex systems. Imagine submitting a manuscript, waiting 4 months(!) for a review and receiving a statement that the "manuscript should be rejected while tool ability X is not important", with some strange citations. And then, exactly one month after your manuscript is rejected, the same journal publishes a paper whose main statement is "the importance of ability X" in their tool. In our case, "ability X" was fusion isoform discovery confirmed by experimental validation.

This was probably the main project of my PhD and a perfect lesson. Of course, I am super grateful to my supervisor Dr. Fernando Garcia-Alcalde, to the whole team from MPIIB and to the Lexogen company for their amazing support. Moreover, I received additional useful comments from Prof. Steven Salzberg, who needs no introduction for those working in sequencing data analysis (btw, there is also his awesome blog).

Despite the complexity and the multiple factors involved, science remains for me the most exciting and interesting area, where critical revision and communication have a high impact. I will be glad to receive any comments, fixes or suggestions about InFusion. And of course, I wish everyone to avoid any strange "non-scientific" problems and to get honest and correct reviews. Stay in science ;)

Sunday, September 18, 2016

Single-cell RNA-sequencing: are there going to be some established analysis standards?

A novel technology that makes it possible to precisely detect cell types and understand how they are organised in complex multi-cellular systems such as the brain is really awesome. However, due to the high complexity of the experimental procedure, it is really not easy to find the right solutions for the multiple problems that arise in a research project with scRNA-seq. An interesting aspect is the selection of appropriate data analysis methods and tools. There was a nice recent publication in Genome Biology about this. I was glad to note that Qualimap2 was mentioned there, with the statement that even though it works on multiple datasets, it was not designed for scRNA-seq and additional precise quality control is required for such data. And I totally agree with this statement.

Actually, a good overview of scRNA-seq quality control can be found in a recent tutorial publication in F1000. Additionally, there is a nice detailed course available from the University of Cambridge bioinformatics training unit.

In general, the scRNA-seq data analysis procedure has many issues that might lead to errors, and various tools can produce completely different final results on the same data. A known example: the comments about the publication describing a tool for cell-cycle heterogeneity normalization. Two recent publications give, in my opinion, a rather useful picture of the state of scRNA-seq analysis. The first one is "Disentangling neural cell diversity using single-cell transcriptomics" by JF Poulin et al. This manuscript gives an overview of the current status of scRNA-seq data analysis and advice on selecting an experimental strategy. The second one is "Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics" by K Shekhar et al. It contains a full description of the analysis procedure for ~25000(!) bipolar cells (retina, vision), including source code to reproduce it. In my opinion, such a description should become a standard.

Also, there are already various resources about scRNA-seq tools. For example, here's a list in a dedicated GitHub repo from the Linnarsson lab. Of course, there are many other places with such information. So, if you know any additional reliable resources that are really useful for this topic, I will be glad to see them in the comments ;)