Development of bioinformatics tools for sequence data mining and virus discovery. [P40]

Filloux D., Petel A., Sempere G., Claverie S., Lefeuvre P., Mahe F., Roumagnac P. 2017. In : Livre des résumés des 16ème Rencontres de virologie végétale. Aussois : CIRAD; CNRS, p. 101-101. Rencontres de virologie végétale, 2017-01-15/2017-01-19, Aussois (France).

Most studies of virus diversity focus on samples derived from humans, domesticated plants and domesticated animals. Yet these species comprise only a tiny fraction of all species currently living on Earth, so our knowledge of virus diversity is probably heavily biased. Recent metagenomics-based studies have identified hundreds of previously unknown viruses in both uncultivated and cultivated hosts. Although genomic, transcriptomic and metagenomic next-generation sequencing (NGS) datasets are growing exponentially, a large proportion of virus-related sequences is probably still being missed because (i) bioinformatics tools remain underdeveloped and (ii) our scientific community does not always share its datasets. It is therefore crucial to better share, clean, store and analyze these datasets in order to describe and characterize virus diversity more fully. To this end, we have developed several computational tools over the last few years: ThisIsNotAPipe, a pipeline for the analysis of multiplexed viral metagenomes; MetaXplor, a web-accessible NoSQL database dedicated to the storage and management of viral metagenomic datasets; EVEFinder, a tool for searching for endogenous viral elements (EVEs) in the genomes of non-viral organisms; and DarkMattor, a tool for mining "dark matter" sequences based on the search for conserved protein domains.

