Publications by Cirad staff

Resources to manage Rice Big Genomics Data

Agret C., Gottin C., Dereeper A., Tranchant-Dubreuil C., Chateau A., Diévart A., Sarah G., Mancheron A., Ruiz M., Droc G. 2019. Taipei: ISFRG, 1 p. International Symposium of Rice Functional Genomics. 17, 2019-11-04/2019-11-06, Taipei (Taiwan).

Constant progress in sequencing technologies creates a huge data overload. These big data combine Volume, Velocity and Variability constraints. It is therefore crucial to be able to integrate large amounts of heterogeneous data, with different formats and semantics, and to manipulate them through complex workflows. This requires new, automated methods and tools for data integration and workflow management, so that users with different backgrounds and interests can easily integrate and manipulate various datasets. We have developed the Rice Genome Hub, an integrative genome information system that provides centralized access to genomics and genetics data, together with analytical tools to facilitate translational and applied research in rice. The hub is built on the Drupal content management system with the Tripal module, which interacts with the Chado database. The Hub interface provides several functionalities (Blast, DotPlots, Gene Search, JBrowse, Primer Blaster, Primer Designer) that make it easy to query, visualize and download research data. We also plugged in in-house tools developed by the South Green bioinformatics platform, such as SNiPlay (detection and analysis of SNPs), Gigwa (filtering of genomic variations), daTALbase (exploration of data related to Xanthomonas TAL effectors), and DiffExDB (differential expression analysis).

We also developed RedOak, a reference-free and alignment-free software package that indexes a large collection of similar genomes. RedOak can be applied to reads from unassembled genomes, and it provides a nucleotide sequence query function. The software is based on a k-mer approach and has been designed to be heavily parallelized and distributed over several nodes of a cluster. Analysis of presence-absence variation (PAV) of genes among different genomes is a classical output of pan-genomic approaches. RedOak has a nucleotide sequence query function, including reverse complements, that can be used to quickly analyze the
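To make the k-mer idea concrete, here is a minimal Python sketch of an alignment-free presence/absence index over several genomes, with a query that treats a sequence and its reverse complement as equivalent. It is only an illustration under stated assumptions and not RedOak's actual implementation or interface: the function names (`canonical_kmers`, `build_index`, `query`), the in-memory dictionary, the toy sequences and the choice of k = 4 are all hypothetical, and nothing here is parallelized or distributed.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k):
    """Yield the canonical form of each k-mer of seq.

    The canonical form is the lexicographically smaller of a k-mer and
    its reverse complement, so both strands map to the same key.
    K-mers containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield min(kmer, reverse_complement(kmer))


def build_index(genomes, k):
    """Map each canonical k-mer to the set of genome names containing it.

    genomes: dict of genome name -> list of sequences (reads or contigs).
    """
    index = defaultdict(set)
    for name, sequences in genomes.items():
        for seq in sequences:
            for kmer in canonical_kmers(seq, k):
                index[kmer].add(name)
    return index


def query(index, genome_names, sequence, k):
    """Return, per genome, the fraction of the query's k-mers it contains."""
    kmers = list(canonical_kmers(sequence, k))
    scores = {}
    for name in genome_names:
        hits = sum(1 for kmer in kmers if name in index.get(kmer, ()))
        scores[name] = hits / len(kmers) if kmers else 0.0
    return scores


# Toy data (hypothetical): two tiny "genomes", each given as a list of reads.
genomes = {"A": ["ACGTACGTGA"], "B": ["TTTTTTTTTT"]}
index = build_index(genomes, k=4)

# The query is the reverse complement of the last six bases of genome A.
scores = query(index, genomes.keys(), "TCACGT", k=4)
print(scores)
```

In this toy setting, a per-genome score close to 1 suggests the queried sequence is present in that genome and a score close to 0 suggests it is absent, which is the kind of signal a presence-absence analysis builds on. Any threshold for calling presence would be a further assumption.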

Related documents

Conference paper

Cirad staff who authored this publication: