Stephane Rombauts

Stephane Rombauts — Staff Scientist
Joined the group in 1996

As a bioinformatician with a molecular biologist's background, my first project was to set up from scratch what would become the PlantCARE database, which led me to focus on gene expression. I then worked on the annotation of the genomic sequences contributed to the ESSA projects for the sequencing of Arabidopsis, where the first genome duplications in Arabidopsis were shown. Besides some data mining, this involved a lot of manual annotation of raw genomic sequences and genes, correcting the faulty annotations produced by the automated systems of the time, which gave poor results. Since then I have collaborated on, and still contribute to, the development and enhancement of the Eugene gene prediction platform, which now performs among the best. Over the years I have kept both interests. I still maintain the PlantCARE database, which became part of the PlaNet project aimed at interconnecting databases on different aspects of plant genomics. At the same time, I am involved in the annotation of newly sequenced plant genomes, which brought us to adapt the Eugene platform to plant genomes other than Arabidopsis thaliana. Now I try to combine both, as more plant genomes become available and comparative methods enable more reliable in silico promoter analyses. This means that, starting from raw genomic sequences, I follow the whole pipeline from genome annotation to the extraction of the data needed to study promoter sequences and find clues to decipher networks of potentially co-expressed genes.

Birthdate: 05 February 1969, Gent, Belgium.

Since October 1997: Staff member in the Bioinformatics and Evolutionary Genomics group at Ghent University, VIB, Belgium.
October 1991 - June 1996: M.Sc. Biotechnology, Ghent University, Belgium.

Service as reviewer:
Journal of Experimental Botany, Bioinformatics

Publications

  1. Yau, S., Krasovec, M., Benites, L. F., Rombauts, S., Groussin, M., Vancaester, E., … Piganeau, G. (2020). Virus-host coexistence in phytoplankton through the genomic lens. SCIENCE ADVANCES, 6(14).
    Virus-microbe interactions in the ocean are commonly described by "boom and bust" dynamics, whereby a numerically dominant microorganism is lysed and replaced by a virus-resistant one. Here, we isolated a microalga strain and its infective dsDNA virus whose dynamics are characterized instead by parallel growth of both the microalga and the virus. Experimental evolution of clonal lines revealed that this viral production originates from the lysis of a minority of virus-susceptible cells, which are regenerated from resistant cells. Whole-genome sequencing demonstrated that this resistant-susceptible switch involved a large deletion on one chromosome. Mathematical modeling explained how the switch maintains stable microalga-virus population dynamics consistent with their observed growth pattern. Comparative genomics confirmed an ancient origin of this "accordion" chromosome despite a lack of sequence conservation. Together, our results show how dynamic genomic rearrangements may account for a previously overlooked coexistence mechanism in microalgae-virus interactions.
  2. De Vos, Stephanie, Van Stappen, G., Sorgeloos, P., Vuylsteke, M., Rombauts, S., & Bossier, P. (2019). Identification of salt stress response genes using the Artemia transcriptome. AQUACULTURE, 500, 305–314.
    Habitat salinity is a major abiotic factor governing the activity, physiology, biology and distribution of aquatic animals. Salinity changes cause salt stress, affecting crustaceans reared in aquaculture both on an ecological and economic level. Current salt stress research in aquatic animals is mainly focused on salt stress in the gills at relatively low salinity ranges. Knowledge about whole-body salinity response in crustaceans and other organisms is lacking, especially in hypersaline conditions. Artemia franciscana is a small halophilic model crustacean able to withstand high salinities up to 300 g/l and strong osmotic shocks thanks to its mitigating strategies for fluctuating salinity levels, such as its unique larval salt gland and osmoregulatory capacity. This study aims to identify the genes responsible for Artemia's unique hypersalinity tolerance by differential expression analysis. First, the full transcriptome of A. franciscana in different metabolic and life cycle stages was assembled de novo (assembly statistics: N50 = 1,430; GC content = 35.63%; transcript number = 64,972) and functionally annotated (annotated transcripts = 36%). Then, naupliar RNA-Seq reads generated under respectively hypersaline and marine conditions were pseudo-aligned to the A. franciscana transcriptome. Expression levels in both conditions were finally compared and 177 differentially expressed, functionally annotated transcripts were identified, of which 113 transcripts with GO annotations. Signalling genes, such as EIF and several genes from the glutathione and the chitin metabolic pathways were induced in Artemia under hypersaline conditions. Hypersalinity also activated gene regulation mechanisms (expression, transcription and post transcription) in the nucleus for DNA repair, ubiquitination, and also for cell cycle arrest through La-related protein. Several lipid metabolic genes and lipid transporters were upregulated, potentially to provide energy for ion balance and to maintain membrane structure integrity. Transport of metal ions and other ions was upregulated as well to maintain ion homeostasis. Lastly, known crustacean stress response genes such as Heat shock 70 kDa protein cognate were upregulated. This work shows that salt stress in Artemia nauplii, through signal transduction, gene regulation, lipid metabolism, transport and stress response genes, has an important influence on known and novel homeostasis-repairing mechanisms in Artemia.
  3. Navia, D., Novelli, V. M., Rombauts, S., Freitas-Astua, J., de Mendonca, R. S., Nunes, M. A., … Van de Peer, Y. (2019). Draft genome assembly of the false spider mite Brevipalpus yothersi. MICROBIOLOGY RESOURCE ANNOUNCEMENTS, 8(6).
    The false spider mite Brevipalpus yothersi infests a broad host plant range and has become one of the most economically important species within the genus Brevipalpus. This phytophagous mite inflicts damage by both feeding on plants and transmitting plant viruses. Here, we report the first draft genome sequence of the false spider mite, which is also the first plant virus mite vector to be sequenced. The ∼72 Mb genome (sequenced at 42x coverage) encodes ∼16,000 predicted protein-coding genes.
  4. Burgess, S. T., Marr, E. J., Bartley, K., Nunn, F. G., Down, R. E., Weaver, R. J., … Nisbet, A. J. (2019). A genomic analysis and transcriptomic atlas of gene expression in Psoroptes ovis reveals feeding- and stage-specific patterns of allergen expression. BMC GENOMICS, 20.
    Background: Psoroptic mange, caused by infestation with the ectoparasitic mite, Psoroptes ovis, is highly contagious, resulting in intense pruritus and represents a major welfare and economic concern for the livestock industry worldwide. Control relies on injectable endectocides and organophosphate dips, but concerns over residues, environmental contamination, and the development of resistance threaten the sustainability of this approach, highlighting interest in alternative control methods. However, development of vaccines and identification of chemotherapeutic targets is hampered by the lack of P. ovis transcriptomic and genomic resources. Results: Building on the recent publication of the P. ovis draft genome, here we present a genomic analysis and transcriptomic atlas of gene expression in P. ovis revealing feeding- and stage-specific patterns of gene expression, including novel multigene families and allergens. Network-based clustering revealed 14 gene clusters demonstrating either single- or multi-stage specific gene expression patterns, with 3075 female-specific, 890 male-specific and 112, 217 and 526 transcripts showing larval, protonymph and tritonymph specific expression, respectively. Detailed analysis of P. ovis allergens revealed stage-specific patterns of allergen gene expression, many of which were also enriched in "fed" mites and tritonymphs, highlighting an important feeding-related allergenicity in this developmental stage. Pair-wise analysis of differential expression between life-cycle stages identified patterns of sex-biased gene expression and also identified novel P. ovis multigene families including known allergens and novel genes with high levels of stage-specific expression. Conclusions: The genomic and transcriptomic atlas described here represents a unique resource for the acarid-research community, whilst the OrcAE platform makes this freely available, facilitating further community-led curation of the draft P. ovis genome.
  5. Burgess, S. T., Marr, E. J., Bartley, K., Nunn, F. G., Down, R. E., Weaver, R. J., … Nisbet, A. J. (2019). A genomic analysis and transcriptomic atlas of gene expression in Psoroptes ovis reveals feeding- and stage-specific patterns of allergen expression. bioRxiv. Cold Spring Harbor Laboratory.
  6. Linsmith, G., Rombauts, S., Montanari, S., Deng, C. H., Celton, J.-M., Guérif, P., … Bianco, L. (2019). Pseudo-chromosome-length genome assembly of a double haploid “Bartlett” pear (Pyrus communis L.). GIGASCIENCE, 8(12).
    BACKGROUND: We report an improved assembly and scaffolding of the European pear (Pyrus communis L.) genome (referred to as BartlettDHv2.0), obtained using a combination of Pacific Biosciences RSII long-read sequencing, Bionano optical mapping, chromatin interaction capture (Hi-C), and genetic mapping. The sample selected for sequencing is a double haploid derived from the same "Bartlett" reference pear that was previously sequenced. Sequencing of di-haploid plants makes assembly more tractable in highly heterozygous species such as P. communis. FINDINGS: A total of 496.9 Mb corresponding to 97% of the estimated genome size were assembled into 494 scaffolds. Hi-C data and a high-density genetic map allowed us to anchor and orient 87% of the sequence on the 17 pear chromosomes. Approximately 50% (247 Mb) of the genome consists of repetitive sequences. Gene annotation confirmed the presence of 37,445 protein-coding genes, which is 13% fewer than previously predicted. CONCLUSIONS: We showed that the use of a doubled-haploid plant is an effective solution to the problems presented by high levels of heterozygosity and duplication for the generation of high-quality genome assemblies. We present a high-quality chromosome-scale assembly of the European pear Pyrus communis and demonstrate its high degree of synteny with the genomes of Malus x domestica and Pyrus x bretschneideri.
  7. Burgess, S. T., Bartley, K., Nunn, F., Wright, H. W., Hughes, M., Gemmell, M., … Nisbet, A. J. (2018). Draft genome assembly of the poultry red mite, Dermanyssus gallinae. MICROBIOLOGY RESOURCE ANNOUNCEMENTS, 7(18). https://doi.org/10.1128/mra.01221-18
    The poultry red mite, Dermanyssus gallinae, is a major worldwide concern in the egg-laying industry. Here, we report the first draft genome assembly and gene prediction of Dermanyssus gallinae, based on combined PacBio and MinION long-read de novo sequencing. The ∼959-Mb genome is predicted to encode 14,608 protein-coding genes.
  8. Nishiyama, T., Sakayama, H., de Vries, J., Buschmann, H., Saint-Marcoux, D., Ullrich, K. K., Haas, F. B., et al. (2018). The Chara genome : secondary complexity and implications for plant terrestrialization. CELL, 174(2), 448–464.
    Land plants evolved from charophytic algae, among which Charophyceae possess the most complex body plans. We present the genome of Chara braunii; comparison of the genome to those of land plants identified evolutionary novelties for plant terrestrialization and land plant heritage genes. C. braunii employs unique xylan synthases for cell wall biosynthesis, a phragmoplast (cell separation) mechanism similar to that of land plants, and many phytohormones. C. braunii plastids are controlled via land-plant-like retrograde signaling, and transcriptional regulation is more elaborate than in other algae. The morphological complexity of this organism may result from expanded gene families, with three cases of particular note: genes effecting tolerance to reactive oxygen species (ROS), LysM receptor-like kinases, and transcription factors (TFs). Transcriptomic analysis of sexual reproductive structures reveals intricate control by TFs, activity of the ROS gene network, and the ancestral use of plant-like storage and stress protection proteins in the zygote.
  9. Burgess, S. T., Bartley, K., Marr, E. J., Wright, H. W., Weaver, R. J., Prickett, J. C., … Nisbet, A. J. (2018). Draft genome assembly of the sheep scab mite, Psoroptes ovis. MICROBIOLOGY RESOURCE ANNOUNCEMENTS, 6(16). https://doi.org/10.1128/genomea.00265-18
    Sheep scab, caused by infestation with Psoroptes ovis, is highly contagious, results in intense pruritus, and represents a major welfare and economic concern. Here, we report the first draft genome assembly and gene prediction of P. ovis based on PacBio de novo sequencing. The ∼63.2-Mb genome encodes 12,041 protein-coding genes.
  10. Tzfadia, O., Bocobza, S., Defoort, J., Almekias-Siegl, E., Panda, S., Levy, M., Storme, V., et al. (2018). The “TranSeq” 3’-end sequencing method for high-throughput transcriptomics and gene space refinement in plant genomes. PLANT JOURNAL, 96(1), 223–232.
    High-throughput RNA sequencing has proven invaluable not only to explore gene expression but also for both gene prediction and genome annotation. However, RNA sequencing, carried out on tens or even hundreds of samples, requires easy and cost-effective sample preparation methods using minute RNA amounts. Here, we present TranSeq, a high-throughput 3'-end sequencing procedure that requires 10- to 20-fold fewer sequence reads than the current transcriptomics procedures. TranSeq significantly reduces costs and allows a greater increase in size of sample sets analyzed in a single experiment. Moreover, in comparison with other 3'-end sequencing methods reported to date, we demonstrate here the reliability and immediate applicability of TranSeq and show that it not only provides accurate transcriptome profiles but also produces precise expression measurements of specific gene family members possessing high sequence similarity. This is difficult to achieve in standard RNA-seq methods, in which sequence reads cover the entire transcript. Furthermore, mapping TranSeq reads to the reference tomato genome facilitated the annotation of new transcripts improving >45% of the existing gene models. Hence, we anticipate that using TranSeq will boost large-scale transcriptome assays and increase the spatial and temporal resolution of gene expression data, in both model and non-model plant species. Moreover, as already performed for tomato (ITAG3.0; www.solgenomics.net), we strongly advocate its integration into current and future genome annotations.
  11. Krasovec, M., Vancaester, E., Rombauts, S., Bucchini, F., Yau, S., Hemon, C., Lebredonchel, H., et al. (2018). Genome analyses of the microalga Picochlorum provide insights into the evolution of thermotolerance in the green lineage. GENOME BIOLOGY AND EVOLUTION, 10(9), 2347–2365.
    While the molecular events involved in cell responses to heat stress have been extensively studied, our understanding of the genetic basis of basal thermotolerance, and particularly its evolution within the green lineage, remains limited. Here, we present the 13.3-Mb haploid genome and transcriptomes of a halotolerant and thermotolerant unicellular green alga, Picochlorum costavermella (Trebouxiophyceae) to investigate the evolution of the genomic basis of thermotolerance. Differential gene expression at high and standard temperatures revealed that more of the gene families containing up-regulated genes at high temperature were recently evolved, and less originated at the ancestor of green plants. Inversely, there was an excess of ancient gene families containing transcriptionally repressed genes. Interestingly, there is a striking overlap between the thermotolerance and halotolerance transcriptional rewiring, as more than one-third of the gene families up-regulated at 35 °C were also up-regulated under variable salt concentrations in Picochlorum SE3. Moreover, phylogenetic analysis of the 9,304 protein coding genes revealed 26 genes of horizontally transferred origin in P. costavermella, of which five were differentially expressed at higher temperature. Altogether, these results provide new insights about how the genomic basis of adaptation to halo- and thermotolerance evolved in the green lineage.
  12. Canaguier, A., Grimplet, J., Di Gaspero, G., Scalabrin, S., Duchêne, E., Choisne, N., … Adam-Blondon, A.-F. (2017). A new version of the grapevine reference genome assembly (12X.v2) and of its annotation (VCost.v3). GENOMICS DATA, 14, 56–62. https://doi.org/10.1016/j.gdata.2017.09.002
  13. Vidal-Quist, J., Ortego, F., Rombauts, S., Castanera, P., & Hernandez-Crespo, P. (2017). Dietary shifts have consequences for the repertoire of allergens produced by the European house dust mite. MEDICAL AND VETERINARY ENTOMOLOGY, 31(3), 272–280.
    Products manufactured from mass-cultured house dust mites, currently commercialized for the diagnosis and immunotherapy of allergy, are heterogeneous in terms of allergen composition and thus present concerns to regulatory authorities. The most abundant species, Dermatophagoides pteronyssinus (Trouessart) (Astigmata: Pyroglyphidae), produces 19 allergenic proteins. Many of these are putatively involved in mite digestive physiology and metabolism. This study aimed to evaluate the effects of mite-rearing media on allergen production. Mites were adapted to feed on culture media supplemented with proteins, lipids, carbohydrates or beard shavings, and collected to quantify major allergens (Der p 1 and 2) by immunodetection, transcription of allergen genes by real-time quantitative polymerase chain reaction, and allergen-related enzymatic activities. All culture media significantly affected the content of major allergens. Modification of macronutrients in the diet produced minor effects on the transcription of allergen genes, but significantly altered mite allergen-related activities. The most remarkable impacts were detected in mites feeding on beard shavings and were reflected in reductions in the content of major allergens, alterations in the transcription of nine allergen genes, and changes in eight allergen-related activities. These results demonstrate the importance of culture media to the quality and consistency of mite extracts used for pharmaceuticals, and highlight the need to further elucidate allergen production by mites in the laboratory and in domestic environments.
  14. Orr, Russell JS, Rombauts, S., Van de Peer, Y., & Shalchian-Tabrizi, K. (2017). Draft genome sequences of two unclassified Chitinophagaceae bacteria, IBVUCB1 and IBVUCB2, isolated from environmental samples. GENOME ANNOUNCEMENTS, 5(33).
    We report here the draft genome sequences of two Chitinophagaceae bacteria, IBVUCB1 and IBVUCB2, assembled from metagenomes of surface samples from freshwater lakes. The genomes are >99% complete and may represent new genera within the Chitinophagaceae family, indicating a larger diversity than currently identified.
  15. Orr, Russell JS, Rombauts, S., Van de Peer, Y., & Shalchian-Tabrizi, K. (2017). Draft genome sequences of two unclassified bacteria, Hydrogenophaga sp. strains IBVHS1 and IBVHS2, isolated from environmental samples. GENOME ANNOUNCEMENTS, 5(34).
  16. Miclotte, G., Plaisance, S., Rombauts, S., Van de Peer, Y., Audenaert, P., & Fostier, J. (2017). OMSim : a simulator for optical map data. BIOINFORMATICS, 33(17), 2740–2742.
    Motivation: The Bionano Genomics platform allows for the optical detection of short sequence patterns in very long DNA molecules (up to 2.5 Mbp). Molecules with overlapping patterns can be assembled to generate a consensus optical map of the entire genome. In turn, these optical maps can be used to validate or improve de novo genome assembly projects or to detect large-scale structural variation in genomes. Simulated optical map data can assist in the development and benchmarking of tools that operate on those data, such as alignment and assembly software. Additionally, it can help to optimize the experimental setup for a genome of interest. Such a simulator is currently not available. Results: We have developed a simulator, OMSim, that produces synthetic optical map data that mimics real Bionano Genomics data. These simulated data have been tested for compatibility with the Bionano Genomics Irys software system and the Irys-scaffolding scripts. OMSim is capable of handling very large genomes (over 30 Gbp) with high throughput and low memory requirements.
  17. Miclotte, G., Heydari, M., Demeester, P., Rombauts, S., Van de Peer, Y., Audenaert, P., & Fostier, J. (2016). Jabba: hybrid error correction for long sequencing reads. ALGORITHMS FOR MOLECULAR BIOLOGY, 11, 10.
    Background: Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. Results: In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented. Conclusion: Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.
  18. Cao, T. N. P., Greenhalgh, R., Dermauw, W., Rombauts, S., Bajda-Wybouw, S., Zhurov, V., … Clark, R. M. (2016). Complex evolutionary dynamics of massively expanded chemosensory receptor families in an extreme generalist chelicerate herbivore. GENOME BIOLOGY AND EVOLUTION, 8(11), 3323–3339.
    While mechanisms to detoxify plant produced, anti-herbivore compounds have been associated with plant host use by herbivores, less is known about the role of chemosensory perception in their life histories. This is especially true for generalists, including chelicerate herbivores that evolved herbivory independently from the more studied insect lineages. To shed light on chemosensory perception in a generalist herbivore, we characterized the chemosensory receptors (CRs) of the chelicerate two-spotted spider mite, Tetranychus urticae, an extreme generalist. Strikingly, T. urticae has more CRs than reported in any other arthropod to date. Including pseudogenes, 689 gustatory receptors were identified, as were 136 degenerin/Epithelial Na+ Channels (ENaCs) that have also been implicated as CRs in insects. The genomic distribution of T. urticae gustatory receptors indicates recurring bursts of lineage-specific proliferations, with the extent of receptor clusters reminiscent of those observed in the CR-rich genomes of vertebrates or C. elegans. Although pseudogenization of many gustatory receptors within clusters suggests relaxed selection, a subset of receptors is expressed. Consistent with functions as CRs, the genomic distribution and expression of ENaCs in lineage-specific T. urticae expansions mirrors that observed for gustatory receptors. The expansion of ENaCs in T. urticae to > 3-fold that reported in other animals was unexpected, raising the possibility that ENaCs in T. urticae have been co-opted to fulfill a major role performed by unrelated CRs in other animals. More broadly, our findings suggest an elaborate role for chemosensory perception in generalist herbivores that are of key ecological and agricultural importance.
  19. Saltykova, A., Pulido-Tamayo, S., Pazoutova, M., Rensing, S. A., Nishiyama, T., Van de Peer, Y., Marchal, K., et al. (2015). Identifying prokaryotic consortia that live in close interactions with algae. EUROPEAN JOURNAL OF PHYCOLOGY (Vol. 50, pp. 145–146). Presented at the 6th European Phycological Congress.
  20. Blanc-Mathieu, R., Verhelst, B., Derelle, E., Rombauts, S., Bouget, F.-Y., Carre, I., Chateau, A., et al. (2014). An improved genome of the model marine alga Ostreococcus tauri unfolds by assessing Illumina de novo assemblies. BMC GENOMICS, 15.
    Background: Cost effective next generation sequencing technologies now enable the production of genomic datasets for many novel planktonic eukaryotes, representing an understudied reservoir of genetic diversity. O. tauri is the smallest free-living photosynthetic eukaryote known to date, a coccoid green alga that was first isolated in 1995 in a lagoon by the Mediterranean sea. Its simple features, ease of culture and the sequencing of its 13 Mb haploid nuclear genome have promoted this microalga as a new model organism for cell biology. Here, we investigated the quality of genome assemblies of Illumina GAIIx 75 bp paired end reads from Ostreococcus tauri, thereby also improving the existing assembly and showing the genome to be stably maintained in culture. Results: The 3 assemblers used, ABySS, CLCBio and Velvet, produced 95% complete genomes in 1402 to 2080 scaffolds with a very low rate of misassembly. Reciprocally, these assemblies improved the original genome assembly by filling in 930 gaps. Combined with additional analysis of raw reads and PCR sequencing effort, 1194 gaps have been solved in total adding up to 460 kb of sequence. Mapping of RNAseq Illumina data on this updated genome led to a twofold reduction in the proportion of multi-exon protein coding genes, representing 19% of the total 7699 protein coding genes. The comparison of the DNA extracted in 2001 and 2009 revealed the fixation of 8 single nucleotide substitutions and 2 deletions during the approximately 6000 generations in the lab. The deletions either knocked out or truncated two predicted transmembrane proteins, including a glutamate receptor like gene. Conclusion: High coverage (>80 fold) paired end Illumina sequencing enables a high quality 95% complete genome assembly of a compact 13 Mb haploid eukaryote. This genome sequence has remained stable for 6000 generations of lab culture.
  21. Grimplet, J., Adam-Blondon, A.-F., Bert, P.-F., Bitz, O., Cantu, D., Davies, C., Delrot, S., et al. (2014). The grapevine gene nomenclature system. BMC GENOMICS, 15.
    Background: Grapevine (Vitis vinifera L.) is one of the most important fruit crops in the world and serves as a valuable model for fruit development in woody species. A major breakthrough in grapevine genomics was achieved in 2007 with the sequencing of the Vitis vinifera cv. PN40024 genome. Subsequently, data on structural and functional characterization of grape genes accumulated exponentially. To better exploit the results obtained by the international community, we think that a coordinated nomenclature for gene naming in species with sequenced genomes is essential. It will pave the way for the accumulation of functional data that will enable effective scientific discussion and discovery. The exploitation of data that were generated independently of the genome release is hampered by their heterogeneous nature and by often incompatible and decentralized storage. Classically, large amounts of data describing gene functions are only available in printed articles and therefore remain hardly accessible for automatic text mining. On the other hand, high throughput "Omics" data are typically stored in public repositories, but should be arranged in compendia to better contribute to the annotation and functional characterization of the genes. Results: With the objective of providing a high quality and highly accessible annotation of grapevine genes, the International Grapevine Genome Project (IGGP) commissioned an international Super-Nomenclature Committee for Grape Gene Annotation (sNCGGa) to coordinate the effort of experts to annotate the grapevine genes. The goal of the committee is to provide a standard nomenclature for locus identifiers and to define conventions for a gene naming system in this paper. Conclusions: Learning from similar initiatives in other plant species such as Arabidopsis, rice and tomato, a versatile nomenclature system has been developed in anticipation of future genomic developments and annotation issues. The sNCGGa's first outreach to the grape community has been focused on implementing recommended guidelines for the expert annotators by: (i) providing a common annotation platform that enables community-based gene curation, (ii) developing a gene nomenclature scheme reflecting the biological features of gene products that is consistent with that used in other organisms in order to facilitate comparative analyses.
  22. Andolfo, G., Sanseverino, W., Rombauts, S., Van de Peer, Y., Bradeen, J., Carputo, D., Frusciante, L., et al. (2013). Overview of tomato (Solanum lycopersicum) candidate pathogen recognition genes reveals important Solanum R locus dynamics. NEW PHYTOLOGIST, 197(1), 223–237.
    To investigate the genome-wide spatial arrangement of R loci, a complete catalogue of tomato (Solanum lycopersicum) and potato (Solanum tuberosum) nucleotide-binding site (NBS), receptor-like protein (RLP) and receptor-like kinase (RLK) gene repertoires was generated. Candidate pathogen recognition genes were characterized with respect to structural diversity, phylogenetic relationships and chromosomal distribution. NBS genes frequently occur in clusters of related gene copies that also include RLP or RLK genes. This scenario is compatible with the existence of selective pressures optimizing coordinated transcription. A number of duplication events associated with lineage-specific evolution were discovered. These findings suggest that different evolutionary mechanisms shaped pathogen recognition gene cluster architecture to expand and to modulate the defence repertoire. Analysis of pathogen recognition gene clusters associated with documented resistance function allowed the identification of adaptive divergence events and the reconstruction of the evolution history of these loci. Differences in candidate pathogen recognition gene number and organization were found between tomato and potato. Most candidate pathogen recognition gene orthologues were distributed at less than perfectly matching positions, suggesting an ongoing lineage-specific rearrangement. Indeed, a local expansion of Toll/Interleukin-1 receptor (TIR)-NBS-leucine-rich repeat (LRR) (TNL) genes in the potato genome was evident. Taken together, these findings have implications for improved understanding of the mechanisms of molecular adaptive selection at Solanum R loci.
  23. Dermauw, W., Wybouw, N., Rombauts, S., Menten, B., Vontas, J., Grbić, M., Clark, R. M., et al. (2013). A link between host plant adaptation and pesticide resistance in the polyphagous spider mite Tetranychus urticae. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 110(2), E113–E122.
    Plants produce a wide range of allelochemicals to defend against herbivore attack, and generalist herbivores have evolved mechanisms to avoid, sequester, or detoxify a broad spectrum of natural defense compounds. Successful arthropod pests have also developed resistance to diverse classes of pesticides and this adaptation is of critical importance to agriculture. To test whether mechanisms to overcome plant defenses predispose the development of pesticide resistance, we examined adaptation of the generalist two-spotted spider mite, Tetranychus urticae, to host plant transfer and pesticides. T. urticae is an extreme polyphagous pest with more than 1,100 documented hosts and has an extraordinary ability to develop pesticide resistance. When mites from a pesticide-susceptible strain propagated on bean were adapted to a challenging host (tomato), transcriptional responses increased over time with ∼7.5% of genes differentially expressed after five generations. Whereas many genes with altered expression belonged to known detoxification families (like P450 monooxygenases), new gene families not previously associated with detoxification in other herbivores showed a striking response, including ring-splitting dioxygenase genes acquired by horizontal gene transfer. Strikingly, transcriptional profiles of tomato-adapted mites resembled those of multipesticide-resistant strains, and adaptation to tomato decreased the susceptibility to unrelated pesticide classes. Our findings suggest key roles for both an expanded environmental response gene repertoire and transcriptional regulation in the life history of generalist herbivores. They also support a model whereby selection for the ability to mount a broad response to the diverse defense chemistry of plants predisposes the evolution of pesticide resistance in generalists.
  24. Zimmer, A. D., Lang, D., Buchta, K., Rombauts, S., Nishiyama, T., Hasebe, M., Van de Peer, Y., et al. (2013). Reannotation and extended community resources for the genome of the non-seed plant Physcomitrella patens provide insights into the evolution of plant gene structures and functions. BMC GENOMICS, 14.
    Background: The moss Physcomitrella patens as a model species provides an important reference for early-diverging lineages of plants and the release of the genome in 2008 opened the doors to genome-wide studies. The usability of a reference genome greatly depends on the quality of the annotation and the availability of centralized community resources. Therefore, in the light of accumulating evidence for missing genes, fragmentary gene structures, false annotations and a low rate of functional annotations on the original release, we decided to improve the moss genome annotation. Results: Here, we report the complete moss genome re-annotation (designated V1.6) incorporating the increased transcript availability from a multitude of developmental stages and tissue types. We demonstrate the utility of the improved P. patens genome annotation for comparative genomics and new extensions to the cosmoss.org resource as a central repository for this plant "flagship" genome. The structural annotation of 32,275 protein-coding genes results in 8387 additional loci including 1456 loci with known protein domains or homologs in Plantae. This is the first release to include information on transcript isoforms, suggesting alternative splicing events for at least 10.8% of the loci. Furthermore, this release now also provides information on non-protein-coding loci. Functional annotations were improved regarding quality and coverage, resulting in 58% annotated loci (previously: 41%) that comprise also 7200 additional loci with GO annotations. Access and manual curation of the functional and structural genome annotation is provided via the www.cosmoss.org model organism database. Conclusions: Comparative analysis of gene structure evolution along the green plant lineage provides novel insights, such as a comparatively high number of loci with 5'-UTR introns in the moss. 
    Comparative analysis of functional annotations reveals expansions of moss house-keeping and metabolic genes and further possibly adaptive, lineage-specific expansions and gains including at least 13% orphan genes.
  25. Van Moerkercke, A., Fabris, M., Pollier, J., Baart, G., Rombauts, S., Hasnain, G., Rischer, H., et al. (2013). CathaCyc, a metabolic pathway database built from Catharanthus roseus RNA-Seq data. PLANT AND CELL PHYSIOLOGY, 54(5), 673–685.
    drugs vinblastine and vincristine. The TIA pathway operates in a complex metabolic network that steers plant growth and survival. Pathway databases and metabolic networks reconstructed from 'omics' sequence data can help to discover missing enzymes, study metabolic pathway evolution and, ultimately, engineer metabolic pathways. To date, such databases have mainly been built for model plant species with sequenced genomes. Although genome sequence data are not available for most medicinal plant species, next-generation sequencing is now extensively employed to create comprehensive medicinal plant transcriptome sequence resources. Here we report on the construction of CathaCyc, a detailed metabolic pathway database, from C. roseus RNA-Seq data sets. CathaCyc (version 1.0) contains 390 pathways with 1,347 assigned enzymes and spans primary and secondary metabolism. Curation of the pathways linked with the synthesis of TIAs and triterpenoids, their primary metabolic precursors, and their elicitors, the jasmonate hormones, demonstrated that RNA-Seq resources are suitable for the construction of pathway databases. CathaCyc is accessible online (http://www.cathacyc.org) and offers a range of tools for the visualization and analysis of metabolic networks and 'omics' data. Overlay with expression data from publicly available RNA-Seq resources demonstrated that two well-characterized C. roseus terpenoid pathways, those of TIAs and triterpenoids, are subject to distinct regulation by both developmental and environmental cues. We anticipate that databases such as CathaCyc will become key to the study and exploitation of the metabolism of medicinal plants.
  26. Pavlidi, N., Dermauw, W., Rombauts, S., Chrisargiris, A., Van Leeuwen, T., & Vontas, J. (2013). Analysis of the olive fruit fly Bactrocera oleae transcriptome and phylogenetic classification of the major detoxification gene families. PLOS ONE, 8(6).
    The olive fruit fly Bactrocera oleae has a unique ability to cope with olive flesh, and is the most destructive pest of olives worldwide. Its control has been largely based on the use of chemical insecticides, however, the selection of insecticide resistance against several insecticides has evolved. The study of detoxification mechanisms, which allow the olive fruit fly to defend against insecticides, and/or phytotoxins possibly present in the mesocarp, has been hampered by the lack of genomic information in this species. In the NCBI database less than 1,000 nucleotide sequences have been deposited, with less than 10 detoxification gene homologues in total. We used 454 pyrosequencing to produce, for the first time, a large transcriptome dataset for B. oleae. A total of 482,790 reads were assembled into 14,204 contigs. More than 60% of those contigs (8,630) were larger than 500 base pairs, and almost half of them matched with genes of the order of the Diptera. Analysis of the Gene Ontology (GO) distribution of unique contigs, suggests that, compared to other insects, the assembly is broadly representative for the B. oleae transcriptome. Furthermore, the transcriptome was found to contain 55 P450, 43 GST-, 15 CCE- and 18 ABC transporter-genes. Several of those detoxification genes, may putatively be involved in the ability of the olive fruit fly to deal with xenobiotics, such as plant phytotoxins and insecticides. In summary, our study has generated new data and genomic resources, which will substantially facilitate molecular studies in B. oleae, including elucidation of detoxification mechanisms of xenobiotic, as well as other important aspects of olive fruit fly biology.
  27. Pollier, J., Rombauts, S., & Goossens, A. (2013). Analysis of RNA-Seq data with TopHat and Cufflinks for genome-wide expression analysis of jasmonate-treated plants and plant cell cultures. In A. Goossens & L. Pauwels (Eds.), Jasmonate signaling : methods and protocols (Vol. 1011, pp. 305–315). New York, NY, USA: Humana Press.
    The recent development of various deep sequencing techniques has led to the most powerful transcript profiling method available to date, RNA sequencing or RNA-Seq. Besides the identification of new genes and new splice variants of known genes, RNA-Seq allows to compare the whole transcriptome of any organism under two or more experimental conditions, such as before and after jasmonate treatment. However, the vast amounts of data generated during RNA-Seq experiments require complex computational methods for read mapping and expression quantification. Here, we describe a detailed protocol for the analysis of deep sequencing data, starting from the raw RNA-Seq reads. First, a quality check is performed on the raw reads to assess the quality of the sequencing. Subsequently, adapters and low-quality sequences are trimmed off the raw reads. The resulting processed reads are mapped to the reference genome, and the mapped reads are counted to generate expression data for the annotated genes for each sample. This method can be used for the analysis of RNA-Seq data of any organism for which a reference genome is available.
  28. Fabris, M., Matthijs, M., Rombauts, S., Vyverman, W., Goossens, A., & Baart, G. (2012). The metabolic blueprint of Phaeodactylum tricornutum reveals a eukaryotic Entner-Doudoroff glycolytic pathway. PLANT JOURNAL, 70(6), 1004–1014.
    Diatoms are one of the most successful groups of unicellular eukaryotic algae. Successive endosymbiotic events contributed to their flexible metabolism, making them competitive in variable aquatic habitats. Although the recently sequenced genomes of the model diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana have provided the first insights into their metabolic organization, the current knowledge on diatom biochemistry remains fragmentary. By means of a genome-wide approach, we developed DiatomCyc, a detailed pathway/genome database of P. tricornutum. DiatomCyc contains 286 pathways with 1719 metabolic reactions and 1613 assigned enzymes, spanning both the central and parts of the secondary metabolism of P. tricornutum. Central metabolic pathways, such as those of carbohydrates, amino acids and fatty acids, were covered. Furthermore, our understanding of the carbohydrate model in P. tricornutum was extended. In particular we highlight the discovery of a functional Entner-Doudoroff pathway, an ancient alternative for the glycolytic Embden-Meyerhof-Parnas pathway, and a putative phosphoketolase pathway, both uncommon in eukaryotes. DiatomCyc is accessible online and offers a range of software tools for the visualization and analysis of metabolic networks and omics data. We anticipate that DiatomCyc will be key to gaining further understanding of diatom metabolism and, ultimately, will feed metabolic engineering strategies for the industrial valorization of diatoms.
  29. Veenstra, J. A., Rombauts, S., & Grbić, M. (2012). In silico cloning of genes encoding neuropeptides, neurohormones and their putative G-protein coupled receptors in a spider mite. INSECT BIOCHEMISTRY AND MOLECULAR BIOLOGY, 42(4), 277–295.
  30. Sato, S., Tabata, S., Hirakawa, H., Asamizu, E., Shirasawa, K., Isobe, S., Kaneko, T., et al. (2012). The tomato genome sequence provides insights into fleshy fruit evolution. NATURE, 485(7400), 635–641.
    Tomato (Solanum lycopersicum) is a major crop plant and a model system for fruit development. Solanum is one of the largest angiosperm genera(1) and includes annual and perennial plants from diverse habitats. Here we present a high-quality genome sequence of domesticated tomato, a draft sequence of its closest wild relative, Solanum pimpinellifolium(2), and compare them to each other and to the potato genome (Solanum tuberosum). The two tomato genomes show only 0.6% nucleotide divergence and signs of recent admixture, but show more than 8% divergence from potato, with nine large and several smaller inversions. In contrast to Arabidopsis, but similar to soybean, tomato and potato small RNAs map predominantly to gene-rich chromosomal regions, including gene promoters. The Solanum lineage has experienced two consecutive genome triplications: one that is ancient and shared with rosids, and a more recent one. These triplications set the stage for the neofunctionalization of genes controlling fruit characteristics, such as colour and fleshiness.
  31. Moreau, H., Verhelst, B., Couloux, A., Derelle, E., Rombauts, S., Grimsley, N., Van Bel, M., et al. (2012). Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. GENOME BIOLOGY, 13(8).
    Background: Bathycoccus prasinos is an extremely small cosmopolitan marine green alga whose cells are covered with intricate spider's web patterned scales that develop within the Golgi cisternae before their transport to the cell surface. The objective of this work is to sequence and analyze its genome, and to present a comparative analysis with other known genomes of the green lineage. Research: Its small genome of 15 Mb consists of 19 chromosomes and lacks transposons. Although 70% of all B. prasinos genes share similarities with other Viridiplantae genes, up to 428 genes were probably acquired by horizontal gene transfer, mainly from other eukaryotes. Two chromosomes, one big and one small, are atypical, an unusual synapomorphic feature within the Mamiellales. Genes on these atypical outlier chromosomes show lower GC content and a significant fraction of putative horizontal gene transfer genes. Whereas the small outlier chromosome lacks colinearity with other Mamiellales and contains many unknown genes without homologs in other species, the big outlier shows a higher intron content, increased expression levels and a unique clustering pattern of housekeeping functionalities. Four gene families are highly expanded in B. prasinos, including sialyltransferases, sialidases, ankyrin repeats and zinc ion-binding genes, and we hypothesize that these genes are associated with the process of scale biogenesis. Conclusion: The minimal genomes of the Mamiellophyceae provide a baseline for evolutionary and functional analyses of metabolic processes in green plants.
  32. Young, N. D., Debellé, F., Oldroyd, G. E., Geurts, R., Cannon, S. B., Udvardi, M. K., Benedito, V. A., et al. (2011). The Medicago genome provides insight into the evolution of rhizobial symbioses. NATURE, 480(7378), 520–524.
    Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation(1). Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Myr ago). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species(2). Medicago truncatula is a long-established model for the study of legume biology. Here we describe the draft sequence of the M. truncatula euchromatin based on a recently completed BAC assembly supplemented with Illumina shotgun sequence, together capturing ∼94% of all M. truncatula genes. A whole-genome duplication (WGD) approximately 58 Myr ago had a major role in shaping the M. truncatula genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the M. truncatula genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max and Lotus japonicus. M. truncatula is a close relative of alfalfa (Medicago sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the M. truncatula genome sequence provides significant opportunities to expand alfalfa's genomic toolbox.
  33. Grbić, M., Van Leeuwen, T., Clark, R. M., Rombauts, S., Rouzé, P., Grbić, V., Osborne, E. J., et al. (2011). The genome of Tetranychus urticae reveals herbivorous pest adaptations. NATURE, 479(7374), 487–492.
    The spider mite Tetranychus urticae is a cosmopolitan agricultural pest with an extensive host plant range and an extreme record of pesticide resistance. Here we present the completely sequenced and annotated spider mite genome, representing the first complete chelicerate genome. At 90 megabases T. urticae has the smallest sequenced arthropod genome. Compared with other arthropods, the spider mite genome shows unique changes in the hormonal environment and organization of the Hox complex, and also reveals evolutionary innovation of silk production. We find strong signatures of polyphagy and detoxification in gene families associated with feeding on different hosts and in new gene families acquired by lateral gene transfer. Deep transcriptome analysis of mites feeding on different plants shows how this pest responds to a changing host environment. The T. urticae genome thus offers new insights into arthropod evolution and plant-herbivore interactions, and provides unique opportunities for developing novel plant protection strategies.
  34. Mortier, V., Fenta, B. A., Martens, C., Rombauts, S., Holsters, M., Kunert, K., & Goormachtig, S. (2011). Search for nodulation-related CLE genes in the genome of Glycine max. JOURNAL OF EXPERIMENTAL BOTANY, 62(8), 2571–2583.
    CLE peptides are potentially involved in nodule organ development and in the autoregulation of nodulation (AON), a systemic process that restricts nodule number. A genome-wide survey of CLE peptide genes in the soybean (Glycine max) genome resulted in the identification of 39 GmCLE genes, the majority of which have not yet been annotated. qRT-PCR analysis indicated two different nodulation-related CLE expression patterns, one linked with nodule primordium development and a new one linked with nodule maturation. Moreover, two GmCLE gene pairs, encoding group-III CLE peptides that were previously shown to be involved in AON, had a transient expression pattern during nodule development, were induced by the essential nodulation hormone cytokinin, and one pair was also slightly induced by the addition of nitrate. Hence, our data support the hypothesis that group-III CLE peptides produced in the nodules are involved in primordium homeostasis and intertwined in activating AON, but not in sustaining it.
  35. Boruc, J., Van Den Daele, H., Hollunder, J., Rombauts, S., Mylle, E., Hilson, P., … Russinova, E. (2010). Functional modules in the Arabidopsis core cell cycle binary protein-protein interaction network. PLANT CELL, 22(4), 1264–1280.
  36. Boruc, J., Mylle, E., Duda, M., De Clercq, R., Rombauts, S., Geelen, D., Hilson, P., et al. (2010). Systematic localization of the Arabidopsis core cell cycle proteins reveals novel cell division complexes. PLANT PHYSIOLOGY, 152(2), 553–565.
    Cell division depends on the correct localization of the cyclin-dependent kinases that are regulated by phosphorylation, cyclin proteolysis, and protein-protein interactions. Although immunological assays can define cell cycle protein abundance and localization, they are not suitable for detecting the dynamic rearrangements of molecular components during cell division. Here, we applied an in vivo approach to trace the subcellular localization of 60 Arabidopsis (Arabidopsis thaliana) core cell cycle proteins fused to green fluorescent proteins during cell division in tobacco (Nicotiana tabacum) and Arabidopsis. Several cell cycle proteins showed a dynamic association with mitotic structures, such as condensed chromosomes and the preprophase band in both species, suggesting a strong conservation of targeting mechanisms. Furthermore, colocalized proteins were shown to bind in vivo, strengthening their localization-function connection. Thus, we identified unknown spatiotemporal territories where functional cell cycle protein interactions are most likely to occur.
  37. Rehrauer, H., Aquino, C., Gruissem, W., Henz, S. R., Hilson, P., Laubinger, S., Naouar, N., et al. (2010). AGRONOMICS1: A New Resource for Arabidopsis Transcriptome Profiling. PLANT PHYSIOLOGY, 152(2), 487–499.
  38. Mortier, V., Den Herder, G., Whitford, R., Van De Velde, W., Rombauts, S., D’haeseleer, K., Holsters, M., et al. (2010). CLE peptides control Medicago truncatula nodulation locally and systemically. PLANT PHYSIOLOGY, 153(1), 222–237.
  39. Mueller, L., Klein Lankhorst, R., Tanksley, S. D., Giovannoni, J. J., White, R., Vrebalov, J., Fei, Z., et al. (2009). A snapshot of the emerging tomato genome sequence. PLANT GENOME, 2(1), 78–92.
    The genome of tomato (Solanum lycopersicum L.) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States) as part of the larger “International Solanaceae Genome Project (SOL): Systems Approach to Diversity and Adaptation” initiative. The tomato genome sequencing project uses an ordered bacterial artificial chromosome (BAC) approach to generate a high-quality tomato euchromatic genome sequence for use as a reference genome for the Solanaceae and euasterids. Sequence is deposited at GenBank and at the SOL Genomics Network (SGN). Currently, there are around 1000 BACs finished or in progress, representing more than a third of the projected euchromatic portion of the genome. An annotation effort is also underway by the International Tomato Annotation Group. The expected number of genes in the euchromatin is ∼40,000, based on an estimate from a preliminary annotation of 11% of finished sequence. Here, we present this first snapshot of the emerging tomato genome and its annotation, a short comparison with potato (Solanum tuberosum L.) sequence data, and the tools available for the researchers to exploit this new resource are also presented. In the future, whole-genome shotgun techniques will be combined with the BAC-by-BAC approach to cover the entire tomato genome. The high-quality reference euchromatic tomato sequence is expected to be near completion by 2010.
  40. Worden, A. Z., Lee, J.-H., Mock, T., Rouzé, P., Simmons, M. P., Aerts, A. L., Allen, A. E., et al. (2009). Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. SCIENCE, 324(5924), 268–272.
    Picoeukaryotes are a taxonomically diverse group of organisms less than 2 micrometers in diameter. Photosynthetic marine picoeukaryotes in the genus Micromonas thrive in ecosystems ranging from tropical to polar and could serve as sentinel organisms for biogeochemical fluxes of modern oceans during climate change. These broadly distributed primary producers belong to an anciently diverged sister clade to land plants. Although Micromonas isolates have high 18S ribosomal RNA gene identity, we found that genomes from two isolates shared only 90% of their predicted genes. Their independent evolutionary paths were emphasized by distinct riboswitch arrangements as well as the discovery of intronic repeat elements in one isolate, and in metagenomic data, but not in other genomes. Divergence appears to have been facilitated by selection and acquisition processes that actively shape the repertoire of genes that are mutually exclusive between the two isolates differently than the core genes. Analyses of the Micromonas genomes offer valuable insights into ecological differentiation and the dynamic nature of early plant evolution.
  41. de Almeida Engler, J., De Veylder, L., De Groodt, R., Rombauts, S., Boudolf, V., De Meyer, B., Hemerly, A. S., et al. (2009). Systematic analysis of cell-cycle gene expression during Arabidopsis development. PLANT JOURNAL, 59(4), 645–660.
    The steady-state distribution of cell-cycle transcripts in Arabidopsis thaliana seedlings was studied in a broad in situ survey to provide a better understanding of the expression of cell-cycle genes during plant development. The 61 core cell-cycle genes analyzed were expressed at variable levels throughout the different plant tissues: 23 genes generally in dividing and young differentiating tissues, 34 genes mostly in both dividing and differentiated tissues and four gene transcripts primarily in differentiated tissues. Only 21 genes had a typical patchy expression pattern, indicating tight cell-cycle regulation. The increased expression of 27 cell-cycle genes in the root elongation zone hinted at their involvement in the switch from cell division to differentiation. The induction of 20 cell-cycle genes in differentiated cortical cells of etiolated hypocotyls pointed to their possible role in the process of endoreduplication. Of seven cyclin-dependent kinase inhibitor genes, five were upregulated in etiolated hypocotyls, suggesting a role in cell-cycle arrest. Nineteen genes were preferentially expressed in pericycle cells activated by auxin that give rise to lateral root primordia. Approximately 1800 images have been collected and can be queried via an online database. Our in situ analysis revealed that 70% of the cell-cycle genes, although expressed at different levels, show a large overlap in their localization. The lack of regulatory motifs in the upstream regions of the analyzed genes suggests the absence of a universal transcriptional control mechanism for all cell-cycle genes.
  42. Den Herder, G., De Keyser, A., De Rycke, R., Rombauts, S., Van De Velde, W., Clemente, M. R., Verplancke, C., et al. (2008). Seven in absentia proteins affect plant growth and nodulation in Medicago truncatula. PLANT PHYSIOLOGY, 148(1), 369–382.
    Protein ubiquitination is a posttranslational regulatory process essential for plant growth and interaction with the environment. E3 ligases, to which the seven in absentia (SINA) proteins belong, determine the specificity by selecting the target proteins for ubiquitination. SINA proteins are found in animals as well as in plants, and a small gene family with highly related members has been identified in the genome of rice (Oryza sativa), Arabidopsis (Arabidopsis thaliana), Medicago truncatula, and poplar (Populus trichocarpa). To acquire insight into the function of SINA proteins in nodulation, a dominant negative form of the Arabidopsis SINAT5 was ectopically expressed in the model legume M. truncatula. After rhizobial inoculation of the 35S:SINAT5DN transgenic plants, fewer nodules were formed than in control plants, and most nodules remained small and white, a sign of impaired symbiosis. Defects in rhizobial infection and symbiosome formation were observed by extensive microscopic analysis. Besides the nodulation phenotype, transgenic plants were affected in shoot growth, leaf size, and lateral root number. This work illustrates a function for SINA E3 ligases in a broad spectrum of plant developmental processes, including nodulation.
  43. Benhamed, M., Martin-Magniette, M.-L., Taconnat, L., Bitton, F., Servet, C., De Clercq, R., De Meyer, B., et al. (2008). Genome-scale Arabidopsis promoter array identifies targets of the histone acetyltransferase GCN5. PLANT JOURNAL, 56(3), 493–504.
    We have assembled approximately 20 000 Arabidopsis thaliana promoter regions, compatible with functional studies that require cloning and with microarray applications. The promoter fragments can be captured as modular entry clones (MultiSite Gateway format) via site-specific recombinational cloning, and transferred into vectors of choice to investigate transcriptional networks. The fragments can also be amplified by PCR and printed on glass arrays. In combination with immunoprecipitation of protein-DNA complexes (ChIP-chip), these arrays enable characterization of binding sites for chromatin-associated proteins or the extent of chromatin modifications at genome scale. The Arabidopsis histone acetyltransferase GCN5 associated with 40% of the tested promoters. At most sites, binding did not depend on the integrity of the GCN5 bromodomain. However, the presence of the bromodomain was necessary for binding to 11% of the promoter regions, and correlated with acetylation of lysine 14 of histone H3 in these promoters. Combined analysis of ChIP-chip and transcriptomic data indicated that binding of GCN5 does not strictly correlate with gene activation. GCN5 has previously been shown to be required for light-regulated gene expression and growth, and we found that GCN5 targets were enriched in early light-responsive genes. Thus, in addition to its transcriptional activation function, GCN5 may play an important role in priming activation of inducible genes under non-induced conditions.
  44. Rensing, S. A., Lang, D., Zimmer, A. D., Terry, A., Salamov, A., Shapiro, H., Nishiyama, T., et al. (2008). The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. SCIENCE, 319(5859), 64–69.
    We report the draft genome sequence of the model moss Physcomitrella patens and compare its features with those of flowering plants, from which it is separated by more than 400 million years, and unicellular aquatic algae. This comparison reveals genomic changes concomitant with the evolutionary movement to land, including a general increase in gene family complexity; loss of genes associated with aquatic environments (e.g., flagellar arms); acquisition of genes for tolerating terrestrial stresses (e.g., variation in temperature and water availability); and the development of the auxin and abscisic acid signaling pathways for coordinating multicellular growth and dehydration response. The Physcomitrella genome provides a resource for phylogenetic inferences about gene function and for experimental analysis of plant processes through this plant's unique facility for reverse genetics.
  45. Foissac, S., Gouzy, J., Rombauts, S., Mathé, C., Amselem, J., Sterck, L., Van de Peer, Y., et al. (2008). Genome annotation in plants and fungi: EuGène as a model platform. CURRENT BIOINFORMATICS, 3(2), 87–97.
    In this era of whole genome sequencing, reliable genome annotations (identification of functional regions) are the cornerstones for many subsequent analyses. Not only is careful annotation important for studying the gene and gene family content of a genome and its host, but also for wide-scale transcriptome and proteome analyses attempting to describe a certain biological process or to get a global picture of a cell's behavior. Although the number of sequenced genomes is increasing thanks to the application of new technologies, genome-wide analyses will critically depend on the quality of the genome annotations. However, the annotation process is more complicated in the plant field than in the animal field because of the limited funding that leads to much fewer experimental data and less annotation expertise. This situation calls for highly automated annotation platforms that can make the best use of all available data, experimental or not. We discuss how the gene prediction (the process of predicting protein gene structures in genomic sequences) research field increasingly shifts from methods that typically exploited one or two types of data to more integrative approaches that simultaneously deal with various experimental, statistical, or other in silico evidence. We illustrate the importance of integrative approaches for producing high-quality automatic annotations of genomes of plants and algae as well as of fungi that live in close association with plants using the platform EuGene as an example.
  46. Sterck, L., Rombauts, S., Vandepoele, K., Rouzé, P., & Van de Peer, Y. (2007). How many genes are there in plants (... and why are they there)? CURRENT OPINION IN PLANT BIOLOGY, 10(2), 199–203.
    Annotation of the first few complete plant genomes has revealed that plants have many genes. For Arabidopsis, over 26 500 gene loci have been predicted, whereas for rice, the number adds up to 41 000. Recent analysis of the poplar genome suggests more than 45 000 genes, and partial sequence data from Medicago and Lotus also suggest that these plants contain more than 40 000 genes. Nevertheless, estimations suggest that ancestral angiosperms had no more than 12 000-14 000 genes. One explanation for the large increase in gene number during angiosperm evolution is gene duplication. It has been shown previously that the retention of duplicates following small- and large-scale duplication events in plants is substantial. Taking into account the function of genes that have been duplicated, we are now beginning to understand why many plant genes might have been retained, and how their retention might be linked to the typical lifestyle of plants.
  47. Palenik, Brian, Grimwood, J., Aerts, A., Rouzé, P., Salamov, A., Putnam, N., Dupont, C., et al. (2007). The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 104(18), 7705–7710.
    The smallest known eukaryotes, at approximately 1-μm diameter, are Ostreococcus tauri and related species of marine phytoplankton. The genome of Ostreococcus lucimarinus has been completed and compared with that of O. tauri. This comparison reveals surprising differences across orthologous chromosomes in the two species from highly syntenic chromosomes in most cases to chromosomes with almost no similarity. Species divergence in these phytoplankton is occurring through multiple mechanisms acting differently on different chromosomes and likely including acquisition of new genes through horizontal gene transfer. We speculate that this latter process may be involved in altering the cell-surface characteristics of each species. In addition, the genome of O. lucimarinus provides insights into the unique metal metabolism of these organisms, which are predicted to have a large number of selenocysteine-containing proteins. Selenoenzymes are more catalytically active than similar enzymes lacking selenium, and thus the cell may require less of that protein. As reported here, selenoenzymes, novel fusion proteins, and loss of some major protein families including ones associated with chromatin are likely important adaptations for achieving a small cell size.
  48. Capoen, W., Den Herder, J., Rombauts, S., De Gussem, J., De Keyser, A., Holsters, M., & Goormachtig, S. (2007). Comparative transcriptome analysis reveals common and specific tags for root hair and crack-entry invasion in Sesbania rostrata. PLANT PHYSIOLOGY, 144(4), 1878–1889.
    The tropical legume Sesbania rostrata provides its microsymbiont Azorhizobium caulinodans with versatile invasion strategies to allow nodule formation in temporarily flooded habitats. In aerated soils, the bacteria enter via the root hair curling mechanism. Submergence prevents this epidermal invasion by accumulation of inhibiting concentrations of ethylene and, under these conditions, the bacterial colonization occurs via intercellular cortical infection at lateral root bases. The transcriptome of both invasion ways was compared by cDNA-amplified fragment length polymorphism analysis. Clusters of gene tags were identified that were specific for either epidermal or cortical invasion or were shared by both. The data provide insight into mechanisms that control infection and illustrate that entry via the epidermis adds a layer of complexity to rhizobial invasion.
  49. Den Herder, J., Lievens, S., Rombauts, S., Holsters, M., & Goormachtig, S. (2007). A symbiotic plant peroxidase involved in bacterial invasion of the tropical legume Sesbania rostrata. PLANT PHYSIOLOGY, 144(2), 717–727.
    Aquatic nodulation on the tropical legume Sesbania rostrata occurs at lateral root bases via intercellular crack-entry invasion. A gene was identified (Srprx1) that is transiently up-regulated during the nodulation process and codes for a functional class III plant peroxidase. The expression strictly depended on bacterial nodulation factors (NFs) and could be modulated by hydrogen peroxide, a downstream signal for crack-entry invasion. Expression was not induced after wounding or pathogen attack, indicating that the peroxidase is a symbiosis-specific isoform. In situ hybridization showed Srprx1 transcripts around bacterial infection pockets and infection threads until they reached the central tissue of the nodule. A root nodule extensin (SrRNE1) colocalized with Srprx1 both in time and space and had the same NF requirement, suggesting a function in a similar process. Finally, in mixed inoculation nodules that were invaded by NF-deficient bacteria and differed in infection thread progression, infection-associated peroxidase transcripts were not observed. Lack of Srprx1 gene expression could be one of the causes for the aberrant structure of the infection threads.
  50. Ruttink, T., Arend, M., Morreel, K., Storme, V., Rombauts, S., Fromm, J., Bhalerao, R. P., et al. (2007). A molecular timetable for apical bud formation and dormancy induction in poplar. PLANT CELL, 19(8), 2370–2390.
    The growth of perennial plants in the temperate zone alternates with periods of dormancy that are typically initiated during bud development in autumn. In a systems biology approach to unravel the underlying molecular program of apical bud development in poplar (Populus tremula × Populus alba), combined transcript and metabolite profiling were applied to a high-resolution time course from short-day induction to complete dormancy. Metabolite and gene expression dynamics were used to reconstruct the temporal sequence of events during bud development. Importantly, bud development could be dissected into bud formation, acclimation to dehydration and cold, and dormancy. To each of these processes, specific sets of regulatory and marker genes and metabolites are associated and provide a reference frame for future functional studies. Light, ethylene, and abscisic acid signal transduction pathways consecutively control bud development by setting, modifying, or terminating these processes. Ethylene signal transduction is positioned temporally between light and abscisic acid signals and is putatively activated by transiently low hexose pools. The timing and place of cell proliferation arrest (related to dormancy) and of the accumulation of storage compounds (related to acclimation processes) were established within the bud by electron microscopy. Finally, the identification of a large set of genes commonly expressed during the growth-to-dormancy transitions in poplar apical buds, cambium, or Arabidopsis thaliana seeds suggests parallels in the underlying molecular mechanisms in different plant organs.
  51. Fawcett, J., Rombauts, S., Pattyn, P., Sterck, L., & Van de Peer, Y. (2007). The annotation and analysis of the genome of Arabidopsis lyrata. GENES & GENETIC SYSTEMS (Vol. 82, pp. 520–520). Presented at the 79th Annual meeting of the Genetics Society of Japan.
  52. Derelle, E., Ferraz, C., Rombauts, S., Rouzé, P., Worden, A. Z., Robbens, S., Partensky, F., et al. (2006). Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 103(31), 11647–11652.
    The green lineage is reportedly 1,500 million years old, evolving shortly after the endosymbiosis event that gave rise to early photosynthetic eukaryotes. In this study, we unveil the complete genome sequence of an ancient member of this lineage, the unicellular green alga Ostreococcus tauri (Prasinophyceae). This cosmopolitan marine primary producer is the world's smallest free-living eukaryote known to date. Features likely reflecting optimization of environmentally relevant pathways, including resource acquisition, unusual photosynthesis apparatus, and genes potentially involved in C4 photosynthesis, were observed, as was downsizing of many gene families. Overall, the 12.56-Mb nuclear genome has an extremely high gene density, in part because of extensive reduction of intergenic regions and other forms of compaction such as gene fusion. However, the genome is structurally complex. It exhibits previously unobserved levels of heterogeneity for a eukaryote. Two chromosomes differ structurally from the other eighteen. Both have a significantly biased G+C content, and, remarkably, they contain the majority of transposable elements. Many chromosome 2 genes also have unique codon usage and splicing, but phylogenetic analysis and composition do not support alien gene origin. In contrast, most chromosome 19 genes show no similarity to green lineage genes and a large number of them are specialized in cell surface processes. Taken together, the complete genome sequence, unusual features, and downsized gene families, make O. tauri an ideal model system for research on eukaryotic genome evolution, including chromosome specialization and green lineage ancestry.
  53. Cannon, S. B., Sterck, L., Rombauts, S., Sato, S., Cheung, F., Gouzy, J., Wang, X., et al. (2006). Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 103(40), 14959–14964.
    Genome sequencing of the model legumes, Medicago truncatula and Lotus japonicus, provides an opportunity for large-scale sequence-based comparison of two genomes in the same plant family. Here we report synteny comparisons between these species, including details about chromosome relationships, large-scale synteny blocks, microsynteny within blocks, and genome regions lacking clear correspondence. The Lotus and Medicago genomes share a minimum of 10 large-scale synteny blocks, each with substantial collinearity and frequently extending the length of whole chromosome arms. The proportion of genes syntenic and collinear within each synteny block is relatively homogeneous. Medicago-Lotus comparisons also indicate similar and largely homogeneous gene densities, although gene-containing regions in Mt occupy 20-30% more space than Lj counterparts, primarily because of larger numbers of Mt retrotransposons. Because the interpretation of genome comparisons is complicated by large-scale genome duplications, we describe synteny, synonymous substitutions and phylogenetic analyses to identify and date a probable whole-genome duplication event. There is no direct evidence for any recent large-scale genome duplication in either Medicago or Lotus but instead a duplication predating speciation. Phylogenetic comparisons place this duplication within the Rosid I clade, clearly after the split between legumes and Salicaceae (poplar).
  54. Tuskan, G., DiFazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., et al. (2006). The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). SCIENCE, 313(5793), 1596–1604.
    We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.
  55. Van De Velde, Willem, Pérez Guerra, J. C., De Keyser, A., De Rycke, R., Rombauts, S., Maunoury, N., Mergaert, P., et al. (2006). Aging in legume symbiosis : a molecular view on nodule senescence in Medicago truncatula. PLANT PHYSIOLOGY, 141(2), 711–720.
    Rhizobia reside as symbiosomes in the infected cells of legume nodules to fix atmospheric nitrogen. The symbiotic relation is strictly controlled, lasts for some time, but eventually leads to nodule senescence. We present a comprehensive transcriptomics study to understand the onset of nodule senescence in the legume Medicago truncatula. Distinct developmental stages with characteristic gene expression were delineated during which the two symbiotic partners were degraded consecutively, marking the switch in nodule tissue status from carbon sink to general nutrient source. Cluster analysis discriminated an early expression group that harbored regulatory genes that might be primary tools to interfere with pod filling-related or stress-induced nodule senescence, ultimately causing prolonged nitrogen fixation. Interestingly, the transcriptomes of nodule and leaf senescence had a high degree of overlap, arguing for the recruitment of similar pathways.
  56. Sterck, L., Rombauts, S., Jansson, S., Sterky, F., Rouzé, P., & Van de Peer, Y. (2005). EST data suggest that poplar is an ancient polyploid. NEW PHYTOLOGIST, 167(1), 165–170.
    We analysed the publicly available expressed sequence tag (EST) collections for the genus Populus to examine whether evidence can be found for large-scale gene-duplication events in the evolutionary past of this genus. The ESTs were clustered into unigenes for each poplar species examined. Gene families were constructed for all proteins deduced from these unigenes, and Ks dating was performed on all paralogs within a gene family. The fraction of paralogs was then plotted against the Ks values, which resulted in a distribution reflecting the age of duplicated genes in poplar. Sufficient EST data were available for seven different poplar species spanning four of the six sections of the genus Populus. For all these species, there was evidence that a large-scale gene-duplication event had occurred. From our analysis it is clear that all poplar species have shared the same large-scale gene-duplication event, suggesting that this event must have occurred in the ancestor of poplar, or at least very early in the evolution of the Populus genus.
  57. Aubourg, S., Brunaud, V., Bruyère, C., Cock, M., Cooke, R., Cottet, A., Couloux, A., et al. (2005). GeneFarm, structural and functional annotation of Arabidopsis gene and protein families by a network of experts. NUCLEIC ACIDS RESEARCH, 33(suppl. 1), D641–D646.
    Genomic projects heavily depend on genome annotations and are limited by the current deficiencies in the published predictions of gene structure and function. It follows that improved annotation will allow better data mining of genomes, and more secure planning and design of experiments. The purpose of the GeneFarm project is to obtain homogeneous, reliable, documented and traceable annotations for Arabidopsis nuclear genes and gene products, and to enter them into an added-value database. This re-annotation project is being performed exhaustively on every member of each gene family. Performing a family-wide annotation makes the task easier and more efficient than a gene-by-gene approach since many features obtained for one gene can be extrapolated to some or all the other genes of a family. A complete annotation procedure based on the most efficient prediction tools available is being used by 16 partner laboratories, each contributing annotated families from its field of expertise. A database, named GeneFarm, and an associated user-friendly interface to query the annotations have been developed. More than 3000 genes distributed over 300 families have been annotated and are available at http://genoplante-info.infobiogen.fr/Genefarm/. Furthermore, collaboration with the Swiss Institute of Bioinformatics is underway to integrate the GeneFarm data into the protein knowledgebase Swiss-Prot.
  58. Vanderauwera, S., Zimmermann, P., Rombauts, S., Vandenabeele, S., Langebartels, C., Gruissem, W., Inzé, D., et al. (2005). Genome-wide analysis of hydrogen peroxide-regulated gene expression in Arabidopsis reveals a high light-induced transcriptional cluster involved in anthocyanin biosynthesis. PLANT PHYSIOLOGY, 139(2), 806–821.
    In plants, reactive oxygen species and, more particularly, hydrogen peroxide (H2O2) play a dual role as toxic by-products of normal cell metabolism and as regulatory molecules in stress perception and signal transduction. Peroxisomal catalases are an important sink for photorespiratory H2O2. Using ATH1 Affymetrix microarrays, expression profiles were compared between control and catalase-deficient Arabidopsis (Arabidopsis thaliana) plants. Reduced catalase levels already provoked differences in nuclear gene expression under ambient growth conditions, and these effects were amplified by high light exposure in a sun simulator for 3 and 8 h. This genome-wide expression analysis allowed us to reveal the expression characteristics of complete pathways and functional categories during H2O2 stress. In total, 349 transcripts were significantly up-regulated by high light in catalase-deficient plants and 88 were down-regulated. From this data set, H2O2 was inferred to play a key role in the transcriptional up-regulation of small heat shock proteins during high light stress. In addition, several transcription factors and candidate regulatory genes involved in H2O2 transcriptional gene networks were identified. Comparisons with other publicly available transcriptome data sets of abiotically stressed Arabidopsis revealed an important intersection with H2O2-deregulated genes, positioning elevated H2O2 levels as an important signal within abiotic stress-induced gene expression. Finally, analysis of transcriptional changes in a combination of a genetic (catalase deficiency) and an environmental (high light) perturbation identified a transcriptional cluster that was strongly and rapidly induced by high light in control plants, but impaired in catalase-deficient plants. This cluster comprises the complete known anthocyanin regulatory and biosynthetic pathway, together with genes encoding unknown proteins.
  59. Robbens, S., Rombauts, S., Rouzé, P., Wuyts, J., Saeys, Y., Moreau, H., & Van de Peer, Y. (2005). Genome analysis of the world’s smallest free-living eukaryote Ostreococcus tauri unveils unique genome heterogeneity. Proceedings of the Molecular Biology and Evolution Conference (MBE) 2005.
  60. Lescot, M., Rombauts, S., Zhang, J., Aubourg, S., Mathé, C., Jansson, S., Rouzé, P., et al. (2004). Annotation of a 95-kb Populus deltoides genomic sequence reveals a disease resistance gene cluster and novel class I and class II transposable elements. THEORETICAL AND APPLIED GENETICS, 109(1), 10–22.
    Poplar has become a model system for functional genomics in woody plants. Here, we report the sequencing and annotation of the first large contiguous stretch of genomic sequence (95 kb) of poplar, corresponding to a bacterial artificial chromosome clone mapped 0.6 centiMorgan from the Melampsora larici-populina resistance locus. The annotation revealed 15 putative genetic objects, of which five were classified as hypothetical genes that were similar only with expressed sequence tags from poplar. Ten putative objects showed similarity with known genes, of which one was similar to a kinase. Three other objects corresponded to the toll/interleukin-1 receptor/nucleotide-binding site/leucine-rich repeat class of plant disease resistance genes, of which two were predicted to encode an amino terminal nuclear localization signal. Four objects were homologous to the Ty1/copia family of class I transposable elements, one of which was designated Retropop and interrupted one of the disease resistance genes. Two other objects constituted a novel Spm-like class II transposable element, which we designated Magali.
  61. Ral, J.-P., Derelle, E., Ferraz, C., Wattebled, F., Farinas, B., Corellou, F., Buléon, A., et al. (2004). Starch division and partitioning: a mechanism for granule propagation and maintenance in the picophytoplanktonic green alga Ostreococcus tauri. PLANT PHYSIOLOGY, 136(2), 3333–3340.
    Whereas Glc is stored in small-sized hydrosoluble glycogen particles in archaea, eubacteria, fungi, and animal cells, photosynthetic eukaryotes have resorted to building starch, which is composed of several distinct polysaccharide fractions packed into a highly organized semicrystalline granule. In plants, both the initiation of polysaccharide synthesis and the nucleation mechanism leading to formation of new starch granules are currently not understood. Ostreococcus tauri, a unicellular green alga of the Prasinophyceae family, defines the tiniest eukaryote with one of the smallest genomes. We show that it accumulates a single starch granule at the chloroplast center by using the same pathway as higher plants. At the time of plastid division, we observe elongation of the starch and division into two daughter structures that are partitioned in each newly formed chloroplast. These observations suggest that in this system the information required to initiate crystalline polysaccharide growth of a new granule is contained within the preexisting polysaccharide structure and the design of the plastid division machinery.
  62. Vandenabeele, Steven, Vanderauwera, S., Vuylsteke, M., Rombauts, S., Langebartels, C., Seidlitz, H. K., Zabeau, M., et al. (2004). Catalase deficiency drastically affects gene expression induced by high light in Arabidopsis thaliana. PLANT JOURNAL, 39(1), 45–58.
    In plants, hydrogen peroxide (H2O2) plays a major signaling role in triggering both a defense response and cell death. Increased cellular H2O2 levels and subsequent redox imbalances are managed at the production and scavenging levels. Because catalases are the major H2O2 scavengers that remove the bulk of cellular H2O2, altering their levels allows in planta modulation of H2O2 concentrations. Reduced peroxisomal catalase activity increased sensitivity toward both ozone and photorespiratory H2O2-induced cell death in transgenic catalase-deficient Arabidopsis thaliana. These plants were used as a model system to build a comprehensive inventory of transcriptomic variations, which were triggered by photorespiratory H2O2 induced by high-light (HL) irradiance. In addition to an H2O2-dependent and -independent type of transcriptional response during light stress, microarray analysis on both control and transgenic catalase-deficient plants, exposed to 0, 3, 8, and 23 h of HL, revealed several specific regulatory patterns of gene expression. Thus, photorespiratory H2O2 has a direct impact on transcriptional programs in plants.
  63. Vlieghe, Kobe, Vuylsteke, M., Florquin, K., Rombauts, S., Maes, S., Ormenese, S., Van Hummelen, P., et al. (2003). Microarray analysis of E2Fa-DPa-overexpressing plants uncovers a cross-talking genetic network between DNA replication and nitrogen assimilation. JOURNAL OF CELL SCIENCE, 116(20), 4249–4259.
    Previously we have shown that overexpression of the heterodimeric E2Fa-DPa transcription factor in Arabidopsis thaliana results in ectopic cell division, increased endoreduplication, and an early arrest in development. To gain a better insight into the phenotypic behavior of E2Fa-DPa transgenic plants and to identify E2Fa-DPa target genes, a transcriptomic microarray analysis was performed. Out of 4,390 unique genes, a total of 188 had a twofold or more up- (84) or down-regulated (104) expression level in E2Fa-DPa transgenic plants compared to wild-type lines. Detailed promoter analysis allowed the identification of novel E2Fa-DPa target genes, mainly involved in DNA replication. Secondarily induced genes encoded proteins involved in cell wall biosynthesis, transcription and signal transduction or had an unknown function. A large number of metabolic genes were modified as well, among which, surprisingly, many genes were involved in nitrate assimilation. Our data suggest that the growth arrest observed upon E2Fa-DPa overexpression results at least partly from a nitrogen drain to the nucleotide synthesis pathway, causing decreased synthesis of other nitrogen compounds, such as amino acids and storage proteins.
  64. Breyne, Peter, Dreesen, R., Cannoot, B., Rombaut, D., Vandepoele, K., Rombauts, S., Vanderhaeghen, R., et al. (2003). Quantitative cDNA-AFLP analysis for genome-wide expression studies. MOLECULAR GENETICS AND GENOMICS, 269(2), 173–179.
    An improved cDNA-AFLP method for genome-wide expression analysis has been developed. We demonstrate that this method is an efficient tool for quantitative transcript profiling and a valid alternative to microarrays. Unique transcript tags, generated from reverse-transcribed messenger RNA by restriction enzymes, were screened through a series of selective PCR amplifications. Based on in silico analysis, an enzyme combination was chosen that ensures that at least 60% of all the mRNAs were represented by an informative sequence tag. The sensitivity and specificity of the method allow one to detect poorly expressed genes and distinguish between homologous sequences. Accurate gene expression profiles were determined by quantitative analysis of band intensities, and subtle differences in transcriptional activity were revealed. A detailed screen for cell cycle-modulated genes in tobacco demonstrates the usefulness of the technology for genome-wide expression analysis.
  65. Rombauts, S., Florquin, K., Lescot, M., Marchal, K., Rouzé, P., & Van de Peer, Y. (2003). Computational approaches to identify promoters and cis-regulatory elements in plant genomes. PLANT PHYSIOLOGY, 132(3), 1162–1176.
    The identification of promoters and their regulatory elements is one of the major challenges in bioinformatics and integrates comparative, structural, and functional genomics. Many different approaches have been developed to detect conserved motifs in a set of genes that are either coregulated or orthologous. However, although recent approaches seem promising, in general, unambiguous identification of regulatory elements is not straightforward. The delineation of promoters is even harder, due to its complex nature, and in silico promoter prediction is still in its infancy. Here, we review the different approaches that have been developed for identifying promoters and their regulatory elements. We discuss the detection of cis-acting regulatory elements using word-counting or probabilistic methods (so-called "search by signal" methods) and the delineation of promoters by considering both sequence content and structural features ("search by content" methods). As an example of search by content, we explored in greater detail the association of promoters with CpG islands. However, due to differences in sequence content, the parameters used to detect CpG islands in humans and other vertebrates cannot be used for plants. Therefore, a preliminary attempt was made to define parameters that could possibly define CpG and CpNpG islands in Arabidopsis, by exploring the compositional landscape around the transcriptional start site. To this end, a data set of more than 5,000 gene sequences was built, including the promoter region, the 5'-untranslated region, and the first introns and coding exons. Preliminary analysis shows that promoter location based on the detection of potential CpG/CpNpG islands in the Arabidopsis genome is not straightforward. Nevertheless, because the landscape of CpG/CpNpG islands differs considerably between promoters and introns on the one side and exons (whether coding or not) on the other, more sophisticated approaches can probably be developed for the successful detection of "putative" CpG and CpNpG islands in plants.
  66. Rombauts, S., Van de Peer, Y., & Rouzé, P. (2003). AFLPinSilico, simulating AFLP fingerprints. BIOINFORMATICS, 19(6), 776–777.
    A drawback of the Amplified Fragment Length Polymorphism (AFLP) fingerprinting method is the difficulty to correlate the different fragments with their DNA sequence. The AFLPinSilico application presented here simulates AFLP experiments run on either cDNA or genomic sequences, producing virtual fingerprints that allow high throughput identification of AFLP fragments. The program also enables biologists to manage experiments through simulations done beforehand, thereby reducing the number of experiments that have to be run. AFLPinSilico is available through the www or as a stand-alone version, through a command line executable (available upon request, for any platform running PERL).
  67. De Bodt, Stefanie, Raes, J., Florquin, K., Rombauts, S., Rouzé, P., Theißen, G., & Van de Peer, Y. (2003). Genomewide structural annotation and evolutionary analysis of the type I MADS-box genes in plants. JOURNAL OF MOLECULAR EVOLUTION, 56(5), 573–586.
    The type I MADS-box genes constitute a largely unexplored subfamily of the extensively studied MADS-box gene family, well known for its role in flower development. Genes of the type I MADS-box subfamily possess the characteristic MADS box but are distinguished from type II MADS-box genes by the absence of the keratin-like box. In this in silico study, we have structurally annotated all 47 members of the type I MADS-box gene family in Arabidopsis thaliana and exerted a thorough analysis of the C-terminal regions of the translated proteins. On the basis of conserved motifs in the C-terminal region, we could classify the gene family into three main groups, two of which could be further subdivided. Phylogenetic trees were inferred to study the evolutionary relationships within this large MADS-box gene subfamily. These suggest for plant type I genes a dynamic of evolution that is significantly different from the mode of both animal type I (SRF) and plant type II (MIKC-type) gene phylogeny. The presence of conserved motifs in the majority of these genes, the identification of Oryza sativa MADS-box type I homologues, and the detection of expressed sequence tags for Arabidopsis thaliana and other plant type I genes suggest that these genes are indeed of functional importance to plants. It is therefore even more intriguing that, from an experimental point of view, almost nothing is known about the function of these MADS-box type I genes.
  68. Lescot, M., Déhais, P., Thijs, G., Marchal, K., Moreau, Y., Van de Peer, Y., Rouzé, P., et al. (2002). PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. NUCLEIC ACIDS RESEARCH, 30(1), 325–327.
    PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific transcription factor sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are given as well. New features have been implemented to search for plant cis-acting regulatory elements in a query sequence. Furthermore, links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes. New regulatory elements can be sent automatically and will be added to the database after curation.
  69. Rensing, S. A., Rombauts, S., Van de Peer, Y., & Reski, R. (2002). Moss transcriptome and beyond. TRENDS IN PLANT SCIENCE.
    The ancient land plant Physcomitrella patens is a model system that is becoming increasingly important for plant functional genomics because gene knockouts can be produced with relative ease. Recently, several EST-sequencing projects have been launched as a first step towards a thorough functional characterization of the moss. However, for careful comparison with other plant model systems, the complete genomic sequence is needed as well as the transcriptome.
  70. Thijs, G., Marchal, K., Lescot, M., Rombauts, S., De Moor, B., Rouzé, P., & Moreau, Y. (2002). A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. JOURNAL OF COMPUTATIONAL BIOLOGY, 9(2), 447–464.
    Microarray experiments can reveal important information about transcriptional regulation. In our case, we look for potential promoter regulatory elements in the upstream region of coexpressed genes. Here we present two modifications of the original Gibbs sampling algorithm for motif finding (Lawrence et al., 1993). First, we introduce the use of a probability distribution to estimate the number of copies of the motif in a sequence. Second, we describe the technical aspects of the incorporation of a higher-order background model whose application we discussed in Thijs et al. (2001). Our implementation is referred to as the Motif Sampler. We successfully validate our algorithm on several data sets. First, we show results for three sets of upstream sequences containing known motifs: 1) the G-box light-response element in plants, 2) elements involved in methionine response in Saccharomyces cerevisiae, and 3) the FNR O2-responsive element in bacteria. We use these data sets to explain the influence of the parameters on the performance of our algorithm. Second, we show results for upstream sequences from four clusters of coexpressed genes identified in a microarray experiment on wounding in Arabidopsis thaliana. Several motifs could be matched to regulatory elements from plant defence pathways in our database of plant cis-acting regulatory elements (PlantCARE). Some other strong motifs do not have corresponding motifs in PlantCARE but are promising candidates for further analysis.
  71. Thijs, G., Moreau, Y., De Smet, F., Mathys, J., Lescot, M., Rombauts, S., Rouzé, P., et al. (2002). INCLUSive: INtegrated Clustering, Upstream sequence retrieval and motif Sampling. BIOINFORMATICS, 18(2), 331–332.
    INCLUSive allows automatic multistep analysis of microarray data (clustering and motif finding). The clustering algorithm (adaptive quality-based clustering) groups together genes with highly similar expression profiles. The upstream sequences of the genes belonging to a cluster are automatically retrieved from GenBank and can be fed directly into Motif Sampler, a Gibbs sampling algorithm that retrieves statistically over-represented motifs in sets of sequences, in this case upstream regions of co-expressed genes.
  72. Breyne, Peter, Dreesen, R., Vandepoele, K., De Veylder, L., Van Breusegem, F., Callewaert, L., Rombauts, S., et al. (2002). Transcriptome analysis during cell division in plants. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 99(23), 14825–14830.
    Using synchronized tobacco Bright Yellow-2 cells and cDNA-amplified fragment length polymorphism-based genomewide expression analysis, we built a comprehensive collection of plant cell cycle-modulated genes. Approximately 1,340 periodically expressed genes were identified, including known cell cycle control genes as well as numerous unique candidate regulatory genes. A number of plant-specific genes were found to be cell cycle modulated. Other transcript tags were derived from unknown plant genes showing homology to cell cycle-regulatory genes of other organisms. Many of the genes encode novel or uncharacterized proteins, indicating that several processes underlying cell division are still largely unknown.
  73. Vandepoele, Klaas, Raes, J., De Veylder, L., Rouzé, P., Rombauts, S., & Inzé, D. (2002). Genome-wide analysis of core cell cycle genes in Arabidopsis. PLANT CELL, 14(4), 903–916.
    Cyclin-dependent kinases and cyclins, with the help of different interacting proteins, regulate the progression through the eukaryotic cell cycle. A high-quality, homology-based annotation protocol was applied to determine the core cell cycle genes in the recently completed Arabidopsis genome sequence. In total, 61 genes were identified belonging to seven selected families of cell cycle regulators, of which 30 are new or corrections of the existing annotation. A new class of putative cell cycle regulators was found that are probably competitors of E2F/DP transcription factors, which mediate the G1-to-S progression. In addition, the existing nomenclature for cell cycle genes of Arabidopsis was updated, and the physical positions of all genes were compared with segmentally duplicated blocks in the genome, showing that 22 core cell cycle genes emerged through block duplications. This genome-wide analysis illustrates the complexity of the plant cell cycle machinery and provides a tool for elucidating the function of new family members in the future.
  74. Moreau, Y., Thijs, G., Marchal, K., De Smet, F., Mathys, J., Lescot, M., Rombauts, S., et al. (2002). Integrating quality-based clustering of microarray data with Gibbs sampling for the discovery of regulatory motifs. JOBIM 2002 : journées ouvertes biologie, informatique, mathématique. Presented at the Journées Ouvertes Biologie, Informatique, Mathématique 2002 (JOBIM 2002).
  75. Lescot, M., Thijs, G., Rombauts, S., Déhais, P., Martin, D., Thieffry, D., Jacq, B., et al. (2002). Deciphering cis-acting regulatory elements in plant and drosophila promoter sequences. In J. Nicolas & C. Thermes (Eds.), JOBIM 2002 : journées ouvertes biologie, informatique, mathématique (pp. 349–350). Presented at the Journées Ouvertes Biologie, Informatique, Mathématique 2002 (JOBIM 2002), Rocquencourt, France: INRIA.
  76. Rombauts, S., Lescot, M., Thijs, G., Marchal, K., Moreau, Y., Déhais, P., Van de Peer, Y., et al. (2002). The PlantCARE database and tools for in silico search of plant cis-acting regulatory elements. JOBIM 2002 : journées ouvertes biologie, informatique, mathématique (pp. 183–184). Presented at the Journées Ouvertes Biologie, Informatique, Mathématique 2002 (JOBIM 2002).
  77. Boudolf, V., Rombauts, S., Naudts, M., Inzé, D., & De Veylder, L. (2001). Identification of novel cyclin-dependent kinases interacting with the CKS1 protein of Arabidopsis. JOURNAL OF EXPERIMENTAL BOTANY, 52(359), 1381–1382.
    The SUC1/CKS1 proteins interact with cyclin-dependent kinases (CDKs) and play an essential, but not yet entirely resolved, role in the regulation of the cell cycle. With the Arabidopsis thaliana CKS1At protein as bait in a two-hybrid screen, two novel Arabidopsis CDKs, Arath;CDKB1;2 and Arath;CDKB2;1, were isolated. A closely related homologue of Arath;CDKB2;1 was discovered in the databases and was nominated Arath;CDKB2;2. Transcript analysis of the five known Arath;CDKA and Arath;CDKB genes revealed that they all had the highest expression in flowers and cell suspensions. Differences in the expression patterns in roots, leaves and stems suggest unique roles for each CDK.
  78. Lescot, M., Rombauts, S., Thijs, G., Marchal, K., De Moor, B., Moreau, Y., & Rouzé, P. (2001). In silico search of plant cis-acting regulatory elements. In L. Duret, C. Gaspin, & T. Schiex (Eds.), JOBIM 2001 : journées ouvertes biologie, informatique, mathématique (pp. 227–228). Toulouse, France: Institut National de la Recherche Agronomique (INRA).
  79. Thijs, G., Lescot, M., Marchal, K., Rombauts, S., De Moor, B., Rouzé, P., & Moreau, Y. (2001). A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. BIOINFORMATICS, 17(12), 1113–1122.
    Motivation: Transcriptome analysis allows detection and clustering of genes that are coexpressed under various biological circumstances. Under the assumption that coregulated genes share cis-acting regulatory elements, it is important to investigate the upstream sequences controlling the transcription of these genes. To improve the robustness of the Gibbs sampling algorithm to noisy data sets we propose an extension of this algorithm for motif finding with a higher-order background model. Results: Simulated data and real biological data sets with well-described regulatory elements are used to test the influence of the different background models on the performance of the motif detection algorithm. We show that the use of a higher-order model considerably enhances the performance of our motif finding algorithm in the presence of noisy data. For Arabidopsis thaliana, a reliable background model based on a set of carefully selected intergenic sequences was constructed.
  80. Thijs, G., Rombauts, S., Lescot, M., Marchal, K., De Moor, B., Moreau, Y., & Rouzé, P. (2000). Detection of cis-acting regulatory elements in plants : a GIBBS sampling approach. Proceedings of the second international conference on bioinformatics of genome regulation and structure (Vol. 1, pp. 118–121). Presented at the 2nd International conference on Bioinformatics of Genome Regulation and Structure (BGRS 2000), Novosibirsk, Russia: Institute of Cytology and Genetics (ICG).
  81. Magyar, Z., Atanassova, A., De Veylder, L., Rombauts, S., & Inzé, D. (2000). Characterization of two distinct DP-related genes from Arabidopsis thaliana. FEBS LETTERS, 486(1), 79–87.
  82. Mathé, C., Déhais, P., Pavy, N., Rombauts, S., Van Montagu, M., & Rouzé, P. (2000). Gene prediction and gene classes in Arabidopsis thaliana. JOURNAL OF BIOTECHNOLOGY, 78(3), 293–299. Presented at the 1st European symposium on Applied Genome Research.
  83. Pavy, N., Rombauts, S., Déhais, P., Mathé, C., Ramana, D. V., Leroy, P., & Rouzé, P. (1999). Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequences. BIOINFORMATICS, 15(11), 887–899. Presented at the 2nd Georgia Tech international conference on Bioinformatics, in Silicon Biology, on Sequence, Structure and Function.
    Motivation: The annotation of the Arabidopsis thaliana genome remains a problem in terms of time and quality. To improve the annotation process, we want to choose the most appropriate tools to use inside a computer-assisted annotation platform. We therefore need evaluation of prediction programs with Arabidopsis sequences containing multiple genes. Results: We have developed AraSet, a data set of contigs of validated genes, enabling the evaluation of multi-gene models for the Arabidopsis genome. Besides conventional metrics to evaluate gene prediction at the site and the exon levels, new measures were introduced for the prediction at the protein sequence level as well as for the evaluation of gene models. This evaluation method is of general interest and could apply to any new gene prediction software and to any eukaryotic genome. The GeneMark.hmm program appears to be the most accurate software at all three levels for the Arabidopsis genomic sequences. Gene modeling could be further improved by combination of prediction software.
  84. Rouzé, P., Pavy, N., & Rombauts, S. (1999). Genome annotation: which tools do we have for it? CURRENT OPINION IN PLANT BIOLOGY, 2(2), 90–95.
    Genome data have to be converted into knowledge to be useful to biologists. Many valuable computational tools have already been developed to help annotation of plant genome sequences, and these may be improved further, for example by identification of more gene regulatory elements. The lack of a standard computer-assisted annotation platform for eukaryotic genomes remains a major bottleneck.
  85. Pavy, N., Mathé, C., Rombauts, S., & Rouzé, P. (1999). Génomique et bio-informatique. OCL-OLEAGINEUX CORPS GRAS LIPIDES, 6(2), 148–154.
    Genomics projects produce huge amounts of data of different kinds whose interpretation stimulated the development of bioinformatics, a recent discipline based on theoretical aspects of informatics and mathematics, as well as on biology. Bioinformatics enables the storing and organizing of genome-wide molecular data, provides tools to analyze them and to convert raw data into biological knowledge. We illustrate how the combination of data management and of sequence analysis tools has already brought fruitful perspectives for gene discovery.
  86. Rombauts, S., Déhais, P., Van Montagu, M., & Rouzé, P. (1999). PlantCARE, a plant cis-acting regulatory element database. NUCLEIC ACIDS RESEARCH, 27(1), 295–296.
    PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Besides the transcription motifs found on a sequence, it also offers a link to the EMBL entry that contains the full gene sequence as well as a description of the conditions in which a motif becomes functional. The information on these sites is given by matrices, consensus and individual site sequences on particular genes, depending on the available information.
  87. Terryn, N., Heijnen, L., De Keyser, A., Van Asseldonck, M., De Clercq, R., Verbakel, H., Gielen, J., et al. (1999). Evidence for an ancient chromosomal duplication in Arabidopsis thaliana by sequencing and analyzing a 400-kb contig at the APETALA2 locus on chromosome 4. FEBS LETTERS, 445(2-3), 237–245.
  88. Thijs, G., Moreau, Y., Rombauts, S., De Moor, B., & Rouzé, P. (1999). Recognition of gene regulatory sequences by bagging of neural networks. In IEE Conference Publications (Vol. 470, pp. 988–993). Edison, NJ, USA: Institute of Electrical Engineers INSPEC.
    We use an ensemble of multilayer perceptrons to build a model for a type of gene regulatory sequence called a G-box. A variant of the bagging method (bootstrap-and-aggregate) improves the performance of the ensemble over that of a single network. Through a decomposition of the generalization error of the ensemble into bias and variance components, we estimate this error from the hold-out samples of the individual networks. We test the model on putative G-boxes, on sequences upstream of light-regulated genes, and on a control group and demonstrate that the model separates these groups efficiently.
  89. Rouzé, P., Rombauts, S., Van Laere, G., Van Wiemeersch, L., & Van Montagu, M. (1996). Gene prediction in Arabidopsis thaliana: genomic sequences. ARCHIVES OF PHYSIOLOGY AND BIOCHEMISTRY, 104(3), B50–B50. Presented at the 162nd Annual meeting of the Société Belge de Biochimie et de Biologie Moléculaire.