Projects
INRA UMR ASP - M. LEROY Philippe
Development of a plant species genomic sequence annotation pipeline on the AUVERGRID regional grid
1- Scientific context and project objectives
Since December 2005, mixed research unit (UMR) 1095 in Clermont-Ferrand has been organised around two major projects on bread wheat (T. aestivum). Theme 1 covers structural and evolutionary biology; Theme 2 covers functional and integrative biology. Both address the priority thematic fields CT1 “Understanding, safeguarding and enhancing plant genetic diversity” and CT2 “Understanding and controlling the genetic determinism of traits of interest and of those linked to product use” set out in the 2005-2009 strategy outline of the Department of Plant Improvement and Genetics of the French National Institute for Agricultural Research (INRA). The projects are being developed within national and international partnerships (particularly ETGI, the European Triticeae Genomics Initiative, and IWGSC, the International Wheat Genome Sequencing Consortium) and cover genetics, structural and functional genomics, comparative and evolutionary genomics, biochemistry, physiology, bioinformatics, modelling and genetic resources. The project presented here is carried out in this context by the “Structure, Function & Evolution of Wheat Genomes” team, coordinated by C. Feuillet (DR), for the French National Research Agency project “EXEGESE-BLE” and for the international consortium IWGSC.
The project's objectives are:
- 1. to allow TriAnnotPipeline, developed in close cooperation with the Research Unit in Genomics and Bioinformatics (INRA-Evry), to exploit the parallel computing capacity of the AUVERGRID computing grid.
- 2. to store and archive the data produced by the pipeline.
- 3. to implement new functions (modules) that make TriAnnotPipeline more efficient and easier to use.
2- Project description
The work forms part of the bread wheat genomics programmes at regional (Structure, Function & Evolution of Wheat Genomes team), national (EXEGESE-BLE project), European (ETGI) and international (IWGSC) level, which anticipate the mass production of BAC (Bacterial Artificial Chromosome) sequences over the next ten years. Since 2000, the INRA-UBP mixed research unit Plant Health & Improvement in Clermont-Ferrand, in close cooperation with the Research Unit in Genomics and Bioinformatics (INRA-Evry), has been developing an automatic annotation pipeline for this type of sequence, mainly from species of the Poaceae family. TriAnnotPipeline derives from an earlier pipeline, BacAnalysis.pl, which was developed for the Génoplante projects and submitted to the French Programme Protection Agency (APP) in 2004 (IDDN.FR.001.100015.000.S.P.2004.000.10000).
TriAnnotPipeline now supports the batch execution of sequence comparison tools, repeat searches and gene predictors (FGeneSH, GeneMarkHMM, GeneID; Eugene to come). The programme is currently available online at the Research Unit in Genomics and Bioinformatics (http://urgi.infobiogen.fr/projects/TriAnnot/). It is written in object-oriented Perl, and its reliance on standards and modular design (XML, BioPerl, GFF3, GAME) improves its configurability, extensibility, ease of use and resource management, and allows results to be visualised with common tools (GBrowse and Apollo). TriAnnotPipeline has itself been submitted to the APP (IDDN.FR.001.050008.000.R.C.2006.000.31235).
Given the swift and significant increase expected over the next few years in the production and release of BAC sequences from bread wheat and related species, we would like, within the LifeGrid project, to use the computing power of AUVERGRID to improve the performance of TriAnnotPipeline. The Auvergne grid could be made accessible from the Evry bioinformatics platform, where the TriAnnot site is hosted, thereby increasing the available analysis capacity and storage space.
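To illustrate why a standard exchange format such as GFF3 helps with visualisation and reuse, the following is a minimal Python sketch (the pipeline itself is written in Perl) that parses one GFF3 feature line into its nine tab-separated columns. The class and function names, the BAC identifier and the example record are all hypothetical and are not taken from TriAnnotPipeline.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass
class Gff3Feature:
    """One GFF3 feature line: the nine standard columns."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_gff3_line(line: str) -> Gff3Feature:
    """Parse a single tab-separated GFF3 feature line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    attributes = {}
    # Column 9 holds semicolon-separated tag=value pairs (URL-escaped).
    for pair in cols[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = unquote(value)
    return Gff3Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                       cols[5], cols[6], cols[7], attributes)


# Hypothetical record: a predicted gene on an invented BAC identifier.
example = "BAC_0001\tFGeneSH\tgene\t1200\t4350\t.\t+\t.\tID=gene0001;Name=hypothetical_gene"
feature = parse_gff3_line(example)
print(feature.type, feature.start, feature.end, feature.attributes["ID"])
```

Because every predictor's output can be reduced to the same nine columns, a viewer only needs to understand one format regardless of which tool produced the feature.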
We envisage three stages:
- 1. Installation on the AUVERGRID grid of the databanks, the bioinformatics programmes and the pipeline modules that process them, and adaptation of the pipeline core (which handles data management and the harmonisation and visualisation of results) so that tasks run in parallel on AUVERGRID and their results are retrieved.
- 2. Development of a database on AUVERGRID for archiving the data produced by the pipeline's automatic annotation.
- 3. Maintenance of TriAnnotPipeline and implementation of new functions (modules) in line with the capacities offered by the grid.
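The first stage, running independent tasks in parallel and retrieving their results, can be sketched as follows. This is only an illustration in Python under strong assumptions: local threads stand in for grid job submission, and two trivial placeholder functions stand in for the real analysis programmes. The module registry, function names and sequence identifiers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(sequence: str) -> float:
    """Placeholder analysis: fraction of G/C bases in the sequence."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def sequence_length(sequence: str) -> int:
    """Placeholder analysis: sequence length in bases."""
    return len(sequence)


# Hypothetical module registry; real modules would wrap external programmes.
MODULES = {"gc_content": gc_content, "length": sequence_length}


def annotate_batch(sequences: dict, max_workers: int = 4) -> dict:
    """Run every module on every sequence in parallel and collect the results."""
    results = {seq_id: {} for seq_id in sequences}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One independent task per (sequence, module) pair.
        futures = {
            pool.submit(func, seq): (seq_id, name)
            for seq_id, seq in sequences.items()
            for name, func in MODULES.items()
        }
        # Retrieve each result and file it under its sequence and module.
        for future, (seq_id, name) in futures.items():
            results[seq_id][name] = future.result()
    return results


batch = {"BAC_0001": "ATGCGC", "BAC_0002": "ATAT"}
batch_results = annotate_batch(batch)
print(batch_results)
```

The point of the design is that each (sequence, module) pair is independent, so the same fan-out and gather pattern would apply whether the workers are threads on one machine or jobs on a grid.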
3- Public or controlled use
This tool will be for public use. A login and password are nevertheless required for security reasons; they are very easy to obtain.
More information on the TriAnnot project is available at the following address.
4- Expected results
This project seeks to provide the international scientific community with a genuine web service (fast, effective and user-friendly) for the high-throughput, assessed annotation of BAC sequences produced by sequencing programmes on plant species of the Poaceae family within the framework of the IWGSC. The tool must be able to annotate a large number of BAC sequences simultaneously and automatically and, eventually, a whole chromosome. It will also allow the automatic annotation results to be assessed online through a suitable, powerful and user-friendly graphical interface. Lastly, it will offer storage for archiving and tracing annotation data from both the automatic analysis and the assessment by bioanalysis.
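As a rough illustration of what archiving and tracing could mean in practice, the sketch below stores annotation records in an SQLite database and keeps both the automatic and the assessed version of a feature. It is written in Python; the table layout, column names, origin labels and example values are assumptions made for this example and do not describe the project's actual database.

```python
import sqlite3


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the archive; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS annotation (
               id INTEGER PRIMARY KEY,
               seqid TEXT NOT NULL,
               feature_type TEXT NOT NULL,
               start_pos INTEGER NOT NULL,
               end_pos INTEGER NOT NULL,
               origin TEXT NOT NULL CHECK (origin IN ('automatic', 'assessed')),
               recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def archive_feature(conn, seqid, feature_type, start, end, origin):
    """Append one record; earlier records are never overwritten."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO annotation (seqid, feature_type, start_pos, end_pos, origin) "
            "VALUES (?, ?, ?, ?, ?)",
            (seqid, feature_type, start, end, origin),
        )


def history(conn, seqid):
    """Return every archived record for a sequence, oldest first."""
    rows = conn.execute(
        "SELECT feature_type, start_pos, end_pos, origin "
        "FROM annotation WHERE seqid = ? ORDER BY id",
        (seqid,),
    )
    return rows.fetchall()


conn = open_archive()
archive_feature(conn, "BAC_0001", "gene", 1200, 4350, "automatic")
archive_feature(conn, "BAC_0001", "gene", 1180, 4350, "assessed")
print(history(conn, "BAC_0001"))
```

Appending a new row for each assessment, rather than updating the automatic one, is what makes the history traceable: the original prediction and its later correction can both be queried.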