Stephane Rombauts

Stephane Rombauts — Staff Scientist
Joined the group in 1996

As a bioinformatician with a molecular biologist's background, my first project was to set up from scratch what would become the PlantCARE database, which led me to focus on gene expression. I then worked on the annotation of the genomic sequences contributed to the ESSA projects for the sequencing of Arabidopsis, in which the first genome duplications in Arabidopsis were shown. Besides some data mining, this involved a lot of manual annotation of raw genomic sequences and genes, correcting faulty annotations produced by the automated systems of the time, which gave poor results. Since then I have collaborated on, and still contribute to, the development and enhancement of the Eugene gene prediction platform, which now performs among the best. Over the years I have kept both interests. I still maintain the PlantCARE database, which became part of the PlaNet project aiming at interconnecting databases on different aspects of plant genomics. At the same time, I am involved in the annotation of newly sequenced plant genomes, which brought us to adapt the Eugene platform to plant genomes other than Arabidopsis thaliana. Now I try to combine both, as more plant genomes become available and comparative methods enable more reliable in silico promoter analyses. This means that, starting from raw genomic sequences, I follow the whole pipeline from genome annotation to the extraction of the data necessary to study promoter sequences and find clues to identify potentially co-expressed genes in networks.

Birthdate: 05 February 1969, Gent, Belgium.

Since October 1997: Staff member in the Bioinformatics and Evolutionary Genomics group at the University of Ghent, VIB, Belgium.
October 1991 - June 1996: M.Sc. Biotechnology, Ghent University, Belgium.

Service as reviewer:
Journal of Experimental Botany, Bioinformatics


  1. Ma, X., Vanneste, S., Chang, J., Ambrosino, L., Barry, K., Bayer, T., … Van de Peer, Y. (2024). Seagrass genomes reveal ancient polyploidy and adaptations to the marine environment. NATURE PLANTS, 10, 240–255.
    We present chromosome-level genome assemblies from representative species of three independently evolved seagrass lineages: Posidonia oceanica, Cymodocea nodosa, Thalassia testudinum and Zostera marina. We also include a draft genome of Potamogeton acutifolius, belonging to a freshwater sister lineage to Zosteraceae. All seagrass species share an ancient whole-genome triplication, while additional whole-genome duplications were uncovered for C. nodosa, Z. marina and P. acutifolius. Comparative analysis of selected gene families suggests that the transition from submerged-freshwater to submerged-marine environments mainly involved fine-tuning of multiple processes (such as osmoregulation, salinity, light capture, carbon acquisition and temperature) that all had to happen in parallel, probably explaining why adaptation to a marine lifestyle has been exceedingly rare. Major gene losses related to stomata, volatiles, defence and lignification are probably a consequence of the return to the sea rather than the cause of it. These new genomes will accelerate functional studies and solutions, as continuing losses of the 'savannahs of the sea' are of major concern in times of climate change and loss of biodiversity.
  2. Vidal‐Quist, J. C., Declercq, J., Vanhee, S., Lambrecht, B., Gómez‐Rial, J., Vidal, C., … Hernández‐Crespo, P. (2023). RNA viruses alter house dust mite physiology and allergen production with no detected consequences for allergenicity. INSECT MOLECULAR BIOLOGY, 32(2), 173–186.
    RNA viruses have recently been detected in association with house dust mites, including laboratory cultures, dust samples, and mite-derived pharmaceuticals used for allergy diagnosis. This study aimed to assess the incidence of viral infection on Dermatophagoides pteronyssinus physiology and on the allergenic performance of extracts derived from its culture. Transcriptional changes between genetically identical control and virus-infected mite colonies were analysed by RNAseq with the support of a new D. pteronyssinus high-quality annotated genome (56.8 Mb, 108 scaffolds, N50 = 2.73 Mb, 96.7% BUSCO-completeness). Extracts of cultures and bodies from both colonies were compared by inspecting major allergen accumulation by enzyme-linked immunosorbent assay (ELISA), allergen-related enzymatic activities by specific assays, airway inflammation in a mouse model of allergic asthma, and binding to allergic patient's sera IgE by ImmunoCAP. Viral infection induced a significant transcriptional response, including several immunity and stress-response genes, and affected the expression of seven allergens, putative isoallergens and allergen orthologs. Major allergens were unaffected except for Der p 23 that was upregulated, increasing ELISA titers up to 29% in infected-mite extracts. By contrast, serine protease allergens Der p 3, 6 and 9 were downregulated, being trypsin and chymotrypsin enzymatic activities reduced up to 21% in extracts. None of the parameters analysed in our mouse model, nor binding to human IgE were significantly different when comparing control and infected-mite extracts. Despite the described physiological impact of viral infection on the mites, no significant consequences for the allergenicity of derived extracts or their practical use in allergy diagnosis have been detected.
  3. Waegneer, E., Rombauts, S., Baert, J., Dauchot, N., De Keyser, A., Eeckhaut, T., … Ruttink, T. (2023). Industrial chicory genome gives insights into the molecular timetable of anther development and male sterility. FRONTIERS IN PLANT SCIENCE, 14.
    Industrial chicory (Cichorium intybus var. sativum) is a biannual crop mostly cultivated for extraction of inulin, a fructose polymer used as a dietary fiber. F1 hybrid breeding is a promising breeding strategy in chicory but relies on stable male sterile lines to prevent self-pollination. Here, we report the assembly and annotation of a new industrial chicory reference genome. Additionally, we performed RNA-Seq on subsequent stages of flower bud development of a fertile line and two cytoplasmic male sterile (CMS) clones. Comparison of fertile and CMS flower bud transcriptomes combined with morphological microscopic analysis of anthers, provided a molecular understanding of anther development and identified key genes in a range of underlying processes, including tapetum development, sink establishment, pollen wall development and anther dehiscence. We also described the role of phytohormones in the regulation of these processes under normal fertile flower bud development. In parallel, we evaluated which processes are disturbed in CMS clones and could contribute to the male sterile phenotype. Taken together, this study provides a state-of-the-art industrial chicory reference genome, an annotated and curated candidate gene set related to anther development and male sterility as well as a detailed molecular timetable of flower bud development in fertile and CMS lines.
  4. De Bruyn, C., Ruttink, T., Lacchini, E., Rombauts, S., Haegeman, A., De Keyser, E., … Van Laere, K. (2023). Identification and characterization of CYP71 subclade cytochrome P450 enzymes involved in the biosynthesis of bitterness compounds in Cichorium intybus. FRONTIERS IN PLANT SCIENCE, 14.
    Industrial chicory (Cichorium intybus var. sativum) and witloof (C. intybus var. foliosum) are crops with an important economic value, mainly cultivated for inulin production and as a leafy vegetable, respectively. Both crops are rich in nutritionally relevant specialized metabolites with beneficial effects for human health. However, their bitter taste, caused by the sesquiterpene lactones (SLs) produced in leaves and taproot, limits wider applications in the food industry. Changing the bitterness would thus create new opportunities with a great economic impact. Known genes encoding enzymes involved in the SL biosynthetic pathway are GERMACRENE A SYNTHASE (GAS), GERMACRENE A OXIDASE (GAO), COSTUNOLIDE SYNTHASE (COS) and KAUNIOLIDE SYNTHASE (KLS). In this study, we integrated genome and transcriptome mining to further unravel SL biosynthesis. We found that C. intybus SL biosynthesis is controlled by the phytohormone methyl jasmonate (MeJA). Gene family annotation and MeJA inducibility enabled the pinpointing of candidate genes related with the SL biosynthetic pathway. We specifically focused on members of subclade CYP71 of the cytochrome P450 family. We verified the biochemical activity of 14 C. intybus CYP71 enzymes transiently produced in Nicotiana benthamiana and identified several functional paralogs for each of the GAO, COS and KLS genes, pointing to redundancy in and robustness of the SL biosynthetic pathway. Gene functionality was further analyzed using CRISPR/Cas9 genome editing in C. intybus. Metabolite profiling of mutant C. intybus lines demonstrated a successful reduction in SL metabolite production. Together, this study increases our insights into the C. intybus SL biosynthetic pathway and paves the way for the engineering of C. intybus bitterness.
  5. Wouters, M., Bastiaanse, H., Rombauts, S., de Vries, L., De Pooter, T., Strazisar, M., … Boerjan, W. (2023). Suppression of the Arabidopsis cinnamoyl-CoA reductase 1-6 intronic T-DNA mutation by epigenetic modification. PLANT PHYSIOLOGY, 192(4), 3001–3016.
    Arabidopsis (Arabidopsis thaliana) T-DNA insertion collections are popular resources for fundamental plant research. Cinnamoyl-CoA reductase 1 (CCR1) catalyzes an essential step in the biosynthesis of the cell wall polymer lignin. Accordingly, the intronic transfer (T)-DNA insertion mutant ccr1-6 has reduced lignin levels and shows a stunted growth phenotype. Here, we report restoration of the ccr1-6 mutant phenotype and CCR1 expression levels after a genetic cross with a UDP-glucosyltransferase 72e1 (ugt72e1), -e2, -e3 T-DNA mutant. We discovered that the phenotypic recovery was not dependent on the UGT72E family loss of function but due to an epigenetic phenomenon called trans T-DNA suppression. Via trans T-DNA suppression, the gene function of an intronic T-DNA mutant was restored after the introduction of an additional T-DNA sharing identical sequences, leading to heterochromatinization and splicing out of the T-DNA-containing intron. Consequently, the suppressed ccr1-6 allele was named epiccr1-6. Long-read sequencing revealed that epiccr1-6, not ccr1-6, carries dense cytosine methylation over the full length of the T-DNA. We showed that the SAIL T-DNA in the UGT72E3 locus could trigger the trans T-DNA suppression of the GABI-Kat T-DNA in the CCR1 locus. Furthermore, we scanned the literature for other potential cases of trans T-DNA suppression in Arabidopsis and found that 22% of the publications matching our query report on double or higher-order T-DNA mutants that meet the minimal requirements for trans T-DNA suppression. These combined observations indicate that intronic T-DNA mutants need to be used with caution since methylation of intronic T-DNA might derepress gene expression and can thereby confound results.
  6. Salojärvi, J., Rambani, A., Yu, Z., Guyot, R., Strickler, S., Lepelley, M., … Descombes, P. (2023). The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars.
    Coffea arabica, an allotetraploid hybrid of C. eugenioides and C. canephora, is the source of approximately 60% of coffee products worldwide, and its cultivated accessions have undergone several population bottlenecks. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives of its diploid progenitors, C. eugenioides and C. canephora. The three species exhibit largely conserved genome structures between diploid parents and descendant subgenomes, with no obvious global subgenome dominance. We find evidence for a founding polyploidy event 350,000-610,000 years ago, followed by several pre-domestication bottlenecks, resulting in narrow genetic variation. A split between wild accessions and cultivar progenitors occurred ∼30.5 kya, followed by a period of migration between the two populations. Analysis of modern varieties, including lines historically introgressed with C. canephora, highlights their breeding histories and loci that may contribute to pathogen resistance, laying the groundwork for future genomics-based breeding of C. arabica.
  7. Xu, Z., Li, Z., Ren, F., Gao, R., Wang, Z., Zhang, J., … Song, J. (2022). The genome of Corydalis reveals the evolution of benzylisoquinoline alkaloid biosynthesis in Ranunculales. PLANT JOURNAL, 111(1), 217–230.
    Species belonging to the order Ranunculales have attracted much attention because of their phylogenetic position as a sister group to all other eudicot lineages and their ability to produce unique yet diverse benzylisoquinoline alkaloids (BIAs). The Papaveraceae family in Ranunculales is often used as a model system for studying BIA biosynthesis. Here, we report the chromosome-level genome assembly of Corydalis tomentella, a species of Fumarioideae, one of the two subfamilies of Papaveraceae. Based on the comparisons of sequenced Ranunculalean species, we present clear evidence of a shared whole-genome duplication (WGD) event that has occurred before the divergence of Ranunculales but after its divergence from other eudicot lineages. The C. tomentella genome enabled us to integrate isotopic labelling and comparative genomics to reconstruct the BIA biosynthetic pathway for both sanguinarine biosynthesis shared by papaveraceous species and the cavidine biosynthesis specific to Corydalis. Also, our comparative analysis revealed that gene duplications, especially tandem gene duplications, underlie the diversification of BIA biosynthetic pathways in Ranunculales. In particular, tandemly duplicated berberine bridge enzyme-like genes appear to be involved in cavidine biosynthesis. In conclusion, our study of the C. tomentella genome provides important insights into the occurrence of WGDs during the early evolution of eudicots as well as into the evolution of BIA biosynthesis in Ranunculales.
  8. Maestri, S., Gambino, G., Lopatriello, G., Minio, A., Perrone, I., Cosentino, E., … Calderón, L. (2022). “Nebbiolo” genome assembly allows surveying the occurrence and functional implications of genomic structural variations in grapevines (Vitis vinifera L.). BMC GENOMICS, 23(1).
    Background: 'Nebbiolo' is a grapevine cultivar typical of north-western Italy, appreciated for producing high-quality red wines. Grapevine cultivars are characterized by possessing highly heterozygous genomes, including a great incidence of genomic rearrangements larger than 50 bp, so called structural variations (SVs). Even though abundant, SVs are an under-explored source of genetic variation mainly due to methodological limitations at their detection. Results: We employed a multiple platform approach to produce long-range genomic data for two different 'Nebbiolo' clones, namely: optical mapping, long-reads and linked-reads. We performed a haplotype-resolved de novo assembly for cultivar 'Nebbiolo' (clone CVT 71) and used an ab-initio strategy to annotate it. The annotated assembly enhanced our ability to detect SVs, enabling the study of genomic regions not present in the grapevines' reference genome and accounting for their functional implications. We performed variant calling analyses at three different organizational levels: i) between haplotypes of clone CVT 71 (primary assembly vs haplotigs), ii) between 'Nebbiolo' and 'Cabernet Sauvignon' assemblies and iii) between clones CVT 71 and CVT 185, representing different 'Nebbiolo' biotypes. The cumulative size of non-redundant merged SVs indicated a total of 79.6 Mbp for the first comparison and 136.1 Mbp for the second one, while no SVs were detected for the third comparison. Interestingly, SVs differentiating cultivars and haplotypes affected similar numbers of coding genes. Conclusions: Our results suggest that SVs accumulation rate and their functional implications in 'Nebbiolo' genome are highly-dependent on the organizational level under study. SVs are abundant when comparing 'Nebbiolo' to a different cultivar or the two haplotypes of the same individual, while they turned absent between the two analysed clones.
  9. Parrett, J. M., Chmielewski, S., Aydogdu Lohaus, E., Łukasiewicz, A., Rombauts, S., Szubert-Kruszyńska, A., … Radwan, J. (2022). Genomic evidence that a sexually selected trait captures genome-wide variation and facilitates the purging of genetic load. NATURE ECOLOGY & EVOLUTION, 6(9), 1330–1342.
    The evolution of costly traits such as deer antlers and peacock trains, which drove the formation of Darwinian sexual selection theory, has been suggested to both reflect and affect patterns of genetic variance across the genome, but direct tests are missing. Here, we used an evolve and resequence approach to reveal patterns of genome-wide diversity associated with the expression of a sexually selected weapon that is dimorphic among males of the bulb mite, Rhizoglyphus robini. Populations selected for the weapon showed reduced genome-wide diversity compared to populations selected against the weapon, particularly in terms of the number of segregating non-synonymous positions, indicating enhanced purifying selection. This increased purifying selection reduced inbreeding depression, but outbred female fitness did not improve, possibly because any benefits were offset by increased sexual antagonism. Most single nucleotide polymorphisms (SNPs) that consistently diverged in response to selection were initially rare and overrepresented in exons, and enriched in regions under balancing or relaxed selection, suggesting they are probably moderately deleterious variants. These diverged SNPs were scattered across the genome, further demonstrating that selection for or against the weapon and the associated changes to the mating system can both capture and influence genome-wide variation.
  10. Van Dingenen, J., García Méndez, S., Beirinckx, S., Vlaminck, L., De Keyser, A., Stuer, N., … Goormachtig, S. (2022). Flemish soils contain rhizobia partners for Northwestern Europe‐adapted soybean cultivars. ENVIRONMENTAL MICROBIOLOGY, 24(8), 3334–3354.
    In Europe, soybean (Glycine max) used for food and feed has to be imported, causing negative socioeconomic and environmental impacts. To increase the local production, breeding generated varieties that grow in colder climates, but the yield using the commercial inoculants is not satisfactory in Belgium because of variable nodulation efficiencies. To look for indigenous nodulating strains possibly adapted to the local environment, we initiated a nodulation trap by growing early-maturing cultivars under natural and greenhouse conditions in 107 garden soils in Flanders. Nodules occurred in 18 and 21 soils in the garden and greenhouse experiments respectively. By combining 16S rRNA PCR on single isolates with HiSeq 16S metabarcoding on nodules, we found a large bacterial richness and diversity from different soils. Furthermore, using Oxford Nanopore Technologies sequencing of DNA from one nodule, we retrieved the entire genome of a Bradyrhizobium species, not previously isolated, but profusely present in that nodule. These data highlight the need of combining diverse identification techniques to capture the true nodule rhizobial community. Eight selected rhizobial isolates were subdivided by whole-genome analysis in three genera containing six genetically distinct species that, except for two, aligned with known type strains and were all able to nodulate soybean in the laboratory.
  11. Farhat, S., LE, P., Kayal, E., Noel, B., Bigeard, E., Corre, E., … Guillou, L. (2021). Rapid protein evolution, organellar reductions, and invasive intronic elements in the marine aerobic parasite dinoflagellate Amoebophrya spp. BMC BIOLOGY, 19(1).
    Background: Dinoflagellates are aquatic protists particularly widespread in the oceans worldwide. Some are responsible for toxic blooms while others live in symbiotic relationships, either as mutualistic symbionts in corals or as parasites infecting other protists and animals. Dinoflagellates harbor atypically large genomes (~3 to 250 Gb), with gene organization and gene expression patterns very different from closely related apicomplexan parasites. Here we sequenced and analyzed the genomes of two early-diverging and co-occurring parasitic dinoflagellate Amoebophrya strains, to shed light on the emergence of such atypical genomic features, dinoflagellate evolution, and host specialization. Results: We sequenced, assembled, and annotated high-quality genomes for two Amoebophrya strains (A25 and A120), using a combination of Illumina paired-end short-read and Oxford Nanopore Technology (ONT) MinION long-read sequencing approaches. We found a small number of transposable elements, along with short introns and intergenic regions, and a limited number of gene families, together contribute to the compactness of the Amoebophrya genomes, a feature potentially linked with parasitism. While the majority of Amoebophrya proteins (63.7% of A25 and 59.3% of A120) had no functional assignment, we found many orthologs shared with Dinophyceae. Our analyses revealed a strong tendency for genes encoded by unidirectional clusters and high levels of synteny conservation between the two genomes despite low interspecific protein sequence similarity, suggesting rapid protein evolution. Most strikingly, we identified a large portion of non-canonical introns, including repeated introns, displaying a broad variability of associated splicing motifs never observed among eukaryotes. Those introner elements appear to have the capacity to spread over their respective genomes in a manner similar to transposable elements. Finally, we confirmed the reduction of organelles observed in Amoebophrya spp., i.e., loss of the plastid, potential loss of a mitochondrial genome and functions. Conclusion: These results expand the range of atypical genome features found in basal dinoflagellates and raise questions regarding speciation and the evolutionary mechanisms at play while parasitism was selected for in this particular unicellular lineage.
  12. Raharimalala, N., Rombauts, S., McCarthy, A., Garavito, A., Orozco-Arias, S., Bellanger, L., … Crouzillat, D. (2021). The absence of the caffeine synthase gene is involved in the naturally decaffeinated status of Coffea humblotiana, a wild species from Comoro archipelago. SCIENTIFIC REPORTS, 11(1).
    Caffeine is the most consumed alkaloid stimulant in the world. It is synthesized through the activity of three known N-methyltransferase proteins. Here we are reporting on the 422-Mb chromosome-level assembly of the Coffea humblotiana genome, a wild and endangered, naturally caffeine-free, species from the Comoro archipelago. We predicted 32,874 genes and anchored 88.7% of the sequence onto the 11 chromosomes. Comparative analyses with the African Robusta coffee genome (C. canephora) revealed an extensive genome conservation, despite an estimated 11 million years of divergence and a broad diversity of genome sizes within the Coffea genus. In this genome, the absence of caffeine is likely due to the absence of the caffeine synthase gene which converts theobromine into caffeine through an illegitimate recombination mechanism. These findings pave the way for further characterization of caffeine-free species in the Coffea genus and will guide research towards naturally-decaffeinated coffee drinks for consumers.
  13. Tien, N. Q. D., Ma, X., Man, L. Q., Chi, D. T. K., Huy, N. X., Nhut, D.-T., … Loc, N. H. (2021). De novo whole-genome assembly and discovery of genes involved in triterpenoid saponin biosynthesis of Vietnamese ginseng (Panax vietnamensis Ha et Grushv.). PHYSIOLOGY AND MOLECULAR BIOLOGY OF PLANTS, 27(10), 2215–2229.
    Vietnamese ginseng (Panax vietnamensis Ha et Grushv.), also known as Ngoc Linh ginseng, is a high-value herb in Vietnam. Vietnamese ginseng has been proven to enhance the immune system and human memory and to have anti-stress, anti-inflammatory, anti-cancer, and anti-aging effects. The present study reports the first draft whole-genome of Vietnamese ginseng and the identification of potential genes involved in the triterpenoid metabolic pathway. De novo whole-genome assembly was performed successfully from approximately 139 Gbps of data comprising 394,802,120 high-quality reads from the leaf of Vietnamese ginseng, generating 9815 scaffolds with an N50 value of 572,722 bp. The assembled genome of Vietnamese ginseng is 3,001,967,204 bp long and contains 79,374 gene models. Among them, 55,012 genes (69.30%) were annotated by various public molecular biology databases. The potential genes involved in triterpenoid saponin biosynthesis in Vietnamese ginseng and their metabolic pathway were also predicted. Three genes encoding squalene monooxygenase isozymes in Vietnamese ginseng were cloned, sequenced and characterized. Moreover, expression levels of several key genes involved in terpenoid biosynthesis in different parts of Vietnamese ginseng were also analyzed. SSR markers were detected by various programs in both the full assembly dataset of the Vietnamese ginseng genome and the predicted genes. The present work provides important draft whole-genome data for Vietnamese ginseng for further studies to understand the role of genes involved in ginsenoside biosynthesis and their metabolic pathway at the molecular level in this rare medicinal species.
  14. Vidal-Quist, J. C., Vidal, C., Escolar, F., Lambrecht, B., Rombauts, S., & Hernandez-Crespo, P. (2021). RNA viruses in the house dust mite Dermatophagoides pteronyssinus, detection in environmental samples and in commercial allergen extracts used for in vivo diagnosis. ALLERGY, 76, 3743–3754.
    Background: Allergy to house dust mites (HDM), the most important source of indoor allergens worldwide, is diagnosed and treated using natural extracts from cultures that can contain immunoactive components from the HDM microbiome, including mite-infecting viruses. This study aimed to contribute to the discovery and characterization of RNA viruses from Dermatophagoides pteronyssinus, followed by their detection in different mite-derived sources. Methods: Viruses were assembled after in silico metatranscriptomic analysis of D. pteronyssinus RNA samples, visualized by electron microscopy, and RNA detected by direct RT-PCR or data mining. Mite culture performance was evaluated in vivo. Results: Seven RNA viruses were identified in our laboratory stock colony. Picornavirus-like viral particles were detected in epithelial cells of the digestive system and in fecal pellets. Most of these viruses could be persistently transmitted to an inbred virus-free colony by inoculating fecal material from the stock colony. Upon viral infection, no significant effect could be seen on mite population growth. Transcriptomic screening confirmed the presence of homolog sequences to these viruses in independent laboratory stocks of D. pteronyssinus and in other Astigmata mites. Noteworthy, RNA from most of the viruses could be detected by RT-PCR on house dust samples, reference standards, and/or commercial diagnostic D. pteronyssinus extracts. Conclusions: Our results show that viral infections are common and widespread in D. pteronyssinus, both in natural and culture-based growth conditions. Potential effects on the mites themselves and consequences toward allergenicity in humans whether exposed naturally or after immunotherapy are discussed.
  15. Bartley, K., Chen, W., Lloyd Mills, R. I., Nunn, F., Price, D. R. G., Rombauts, S., … Burgess, S. T. G. (2021). Transcriptomic analysis of the poultry red mite, Dermanyssus gallinae, across all stages of the lifecycle. BMC GENOMICS, 22(1).
    Background: The blood feeding poultry red mite (PRM), Dermanyssus gallinae, causes substantial economic damage to the egg laying industry worldwide, and is a serious welfare concern for laying hens and poultry house workers. In this study we have investigated the temporal gene expression across the 6 stages/sexes (egg, larvae, protonymph and deutonymph, adult male and adult female) of this neglected parasite in order to understand the temporal expression associated with development, parasitic lifestyle, reproduction and allergen expression. Results: RNA-seq transcript data for the 6 stages were mapped to the PRM genome creating a publicly available gene expression atlas (on the OrcAE platform in conjunction with the PRM genome). Network analysis and clustering of stage-enriched gene expression in PRM resulted in 17 superclusters with stage-specific or multi-stage expression profiles. The 6 stage specific superclusters were clearly demarked from each other and the adult female supercluster contained the most stage specific transcripts (2725), whilst the protonymph supercluster the fewest (165). Fifteen pairwise comparisons performed between the different stages resulted in a total of 6025 Differentially Expressed Genes (DEGs) (P > 0.99). These data were evaluated alongside a Venn/Euler analysis of the top 100 most abundant genes in each stage. An expanded set of cuticle proteins and enzymes (chitinase and metallocarboxypeptidases) were identified in larvae and underpin cuticle formation and ecdysis to the protonymph stage. Two mucin/peritrophic-A salivary proteins (DEGAL6771g00070, DEGAL6824g00220) were highly expressed in the blood-feeding stages, indicating peritrophic membrane formation during feeding. Reproduction-associated vitellogenins were the most abundant transcripts in adult females whilst, in adult males, an expanded set of serine and cysteine proteinases and an epididymal protein (DEGAL6668g00010) were highly abundant. Assessment of the expression patterns of putative homologues of 32 allergen groups from house dust mites indicated a bias in their expression towards the non-feeding larval stage of PRM. Conclusions: This study is the first evaluation of temporal gene expression across all stages of PRM and has provided insight into developmental, feeding, reproduction and survival strategies employed by this mite. The publicly available PRM resource on OrcAE offers a valuable tool for researchers investigating the biology and novel interventions of this parasite.
  16. De Vos, S., Rombauts, S., Coussement, L., Dermauw, W., Vuylsteke, M., Sorgeloos, P., … Bossier, P. (2021). The genome of the extremophile Artemia provides insight into strategies to cope with extreme environments. BMC GENOMICS, 22(1).
    Background: Brine shrimp Artemia have an unequalled ability to endure extreme salinity and complete anoxia. This study aims to elucidate its strategies to cope with these stressors. Results and discussion: Here, we present the genome of an inbred A. franciscana Kellogg, 1906. We identified 21,828 genes of which, under high salinity, 674 genes and under anoxia, 900 genes were differentially expressed (42%, respectively 30% were annotated). Under high salinity, relevant stress genes and pathways included several Heat Shock Protein and Late Embryogenesis Abundant genes, as well as the trehalose metabolism. In addition, based on differential gene expression analysis, it can be hypothesized that a high oxidative stress response and endocytosis/exocytosis are potential salt management strategies, in addition to the expression of major facilitator superfamily genes responsible for transmembrane ion transport. Under anoxia, genes involved in mitochondrial function, mTOR signalling and autophagy were differentially expressed. Both high salt and anoxia enhanced degradation of erroneous proteins and protein chaperoning. Compared with other branchiopod genomes, Artemia had 0.03% contracted and 6% expanded orthogroups, in which 14% of the genes were differentially expressed under high salinity or anoxia. One phospholipase D gene family, shown to be important in plant stress response, was uniquely present in both extremophiles Artemia and the tardigrade Hypsibius dujardini, yet not differentially expressed under the described experimental conditions. Conclusions: A relatively complete genome of Artemia was assembled, annotated and analysed, facilitating research on its extremophile features, and providing a reference sequence for crustacean research.
  17. Greenhalgh, R., Dermauw, W., Glas, J. J., Rombauts, S., Wybouw, N., Thomas, J., … Kant, M. R. (2020). Genome streamlining in a minute herbivore that manipulates its host plant. ELIFE, 9.
    The tomato russet mite, Aculops lycopersici, is among the smallest animals on earth. It is a worldwide pest on tomato and can potently suppress the host's natural resistance. We sequenced its genome, the first of an eriophyoid, and explored whether there are genomic features associated with the mite's minute size and lifestyle. At only 32.5 Mb, the genome is the smallest yet reported for any arthropod and, reminiscent of microbial eukaryotes, exceptionally streamlined. It has few transposable elements, tiny intergenic regions, and is remarkably intron-poor, as more than 80% of coding genes are intronless. Furthermore, in accordance with ecological specialization theory, this defense-suppressing herbivore has extremely reduced environmental response gene families such as those involved in chemoreception and detoxification. Other losses associate with this species' highly derived body plan. Our findings accelerate the understanding of evolutionary forces underpinning metazoan life at the limits of small physical and genome size.
  18. Yau, S., Krasovec, M., Benites, L. F., Rombauts, S., Groussin, M., Vancaester, E., … Piganeau, G. (2020). Virus-host coexistence in phytoplankton through the genomic lens. SCIENCE ADVANCES, 6(14).
    Virus-microbe interactions in the ocean are commonly described by "boom and bust" dynamics, whereby a numerically dominant microorganism is lysed and replaced by a virus-resistant one. Here, we isolated a microalga strain and its infective dsDNA virus whose dynamics are characterized instead by parallel growth of both the microalga and the virus. Experimental evolution of clonal lines revealed that this viral production originates from the lysis of a minority of virus-susceptible cells, which are regenerated from resistant cells. Whole-genome sequencing demonstrated that this resistant-susceptible switch involved a large deletion on one chromosome. Mathematical modeling explained how the switch maintains stable microalga-virus population dynamics consistent with their observed growth pattern. Comparative genomics confirmed an ancient origin of this "accordion" chromosome despite a lack of sequence conservation. Together, our results show how dynamic genomic rearrangements may account for a previously overlooked coexistence mechanism in microalgae-virus interactions.
  19. Linsmith, G., Rombauts, S., Montanari, S., Deng, C. H., Celton, J.-M., Guérif, P., … Bianco, L. (2019). Pseudo-chromosome-length genome assembly of a double haploid “Bartlett” pear (Pyrus communis L.). GIGASCIENCE, 8(12).
    BACKGROUND: We report an improved assembly and scaffolding of the European pear (Pyrus communis L.) genome (referred to as BartlettDHv2.0), obtained using a combination of Pacific Biosciences RSII long-read sequencing, Bionano optical mapping, chromatin interaction capture (Hi-C), and genetic mapping. The sample selected for sequencing is a double haploid derived from the same "Bartlett" reference pear that was previously sequenced. Sequencing of di-haploid plants makes assembly more tractable in highly heterozygous species such as P. communis. FINDINGS: A total of 496.9 Mb corresponding to 97% of the estimated genome size were assembled into 494 scaffolds. Hi-C data and a high-density genetic map allowed us to anchor and orient 87% of the sequence on the 17 pear chromosomes. Approximately 50% (247 Mb) of the genome consists of repetitive sequences. Gene annotation confirmed the presence of 37,445 protein-coding genes, which is 13% fewer than previously predicted. CONCLUSIONS: We showed that the use of a doubled-haploid plant is an effective solution to the problems presented by high levels of heterozygosity and duplication for the generation of high-quality genome assemblies. We present a high-quality chromosome-scale assembly of the European pear Pyrus communis and demonstrate its high degree of synteny with the genomes of Malus x Domestica and Pyrus x bretschneideri.
  20. Burgess, S. T., Marr, E. J., Bartley, K., Nunn, F. G., Down, R. E., Weaver, R. J., … Nisbet, A. J. (2019). A genomic analysis and transcriptomic atlas of gene expression in Psoroptes ovis reveals feeding- and stage-specific patterns of allergen expression.
  21. Burgess, S. T., Marr, E. J., Bartley, K., Nunn, F. G., Down, R. E., Weaver, R. J., … Nisbet, A. J. (2019). A genomic analysis and transcriptomic atlas of gene expression in Psoroptes ovis reveals feeding- and stage-specific patterns of allergen expression. BMC GENOMICS, 20.
    Background: Psoroptic mange, caused by infestation with the ectoparasitic mite, Psoroptes ovis, is highly contagious, resulting in intense pruritus and represents a major welfare and economic concern for the livestock industry worldwide. Control relies on injectable endectocides and organophosphate dips, but concerns over residues, environmental contamination, and the development of resistance threaten the sustainability of this approach, highlighting interest in alternative control methods. However, development of vaccines and identification of chemotherapeutic targets is hampered by the lack of P. ovis transcriptomic and genomic resources. Results: Building on the recent publication of the P. ovis draft genome, here we present a genomic analysis and transcriptomic atlas of gene expression in P. ovis revealing feeding- and stage-specific patterns of gene expression, including novel multigene families and allergens. Network-based clustering revealed 14 gene clusters demonstrating either single- or multi-stage specific gene expression patterns, with 3075 female-specific, 890 male-specific and 112, 217 and 526 transcripts showing larval, protonymph and tritonymph specific expression, respectively. Detailed analysis of P. ovis allergens revealed stage-specific patterns of allergen gene expression, many of which were also enriched in "fed" mites and tritonymphs, highlighting an important feeding-related allergenicity in this developmental stage. Pair-wise analysis of differential expression between life-cycle stages identified patterns of sex-biased gene expression and also identified novel P. ovis multigene families including known allergens and novel genes with high levels of stage-specific expression. Conclusions: The genomic and transcriptomic atlas described here represents a unique resource for the acarid-research community, whilst the OrcAE platform makes this freely available, facilitating further community-led curation of the draft P. ovis genome.
  22. Navia, D., Novelli, V. M., Rombauts, S., Freitas-Astua, J., de Mendonca, R. S., Nunes, M. A., … Van de Peer, Y. (2019). Draft genome assembly of the false spider mite Brevipalpus yothersi. MICROBIOLOGY RESOURCE ANNOUNCEMENTS, 8(6).
    The false spider mite Brevipalpus yothersi infests a broad host plant range and has become one of the most economically important species within the genus Brevipalpus. This phytophagous mite inflicts damage by both feeding on plants and transmitting plant viruses. Here, we report the first draft genome sequence of the false spider mite, which is also the first plant virus mite vector to be sequenced. The ∼72 Mb genome (sequenced at 42x coverage) encodes ∼16,000 predicted protein-coding genes.
  23. De Vos, S., Van Stappen, G., Sorgeloos, P., Vuylsteke, M., Rombauts, S., & Bossier, P. (2019). Identification of salt stress response genes using the Artemia transcriptome. AQUACULTURE, 500, 305–314.
    Habitat salinity is a major abiotic factor governing the activity, physiology, biology and distribution of aquatic animals. Salinity changes cause salt stress, affecting crustaceans reared in aquaculture both on an ecological and economic level. Current salt stress research in aquatic animals is mainly focused on salt stress in the gills at relatively low salinity ranges. Knowledge about whole-body salinity response in crustaceans and other organisms is lacking, especially in hypersaline conditions. Artemia franciscana is a small halophilic model crustacean able to withstand high salinities up to 300 g/l and strong osmotic shocks thanks to its mitigating strategies for fluctuating salinity levels, such as its unique larval salt gland and osmoregulatory capacity. This study aims to identify the genes responsible for Artemia's unique hypersalinity tolerance by differential expression analysis. First, the full transcriptome of A. franciscana in different metabolic and life cycle stages was assembled de novo (assembly statistics: N50 = 1,430; GC content = 35.63%; transcript number = 64,972) and functionally annotated (annotated transcripts = 36%). Then, naupliar RNA-Seq reads generated under hypersaline and marine conditions, respectively, were pseudo-aligned to the A. franciscana transcriptome. Expression levels in both conditions were finally compared and 177 differentially expressed, functionally annotated transcripts were identified, of which 113 had GO annotations. Signalling genes, such as EIF and several genes from the glutathione and the chitin metabolic pathways were induced in Artemia under hypersaline conditions. Hypersalinity also activated gene regulation mechanisms (expression, transcription and post transcription) in the nucleus for DNA repair, ubiquitination, and also for cell cycle arrest through La-related protein. Several lipid metabolic genes and lipid transporters were upregulated, potentially to provide energy for ion balance and to maintain membrane structure integrity. Transport of metal ions and other ions was upregulated as well to maintain ion homeostasis. Lastly, known crustacean stress response genes such as Heat shock 70 kDa protein cognate were upregulated. This work shows that salt stress in Artemia nauplii, through signal transduction, gene regulation, lipid metabolism, transport and stress response genes, has an important influence on known and novel homeostasis-repairing mechanisms in Artemia.
  24. Tzfadia, O., Bocobza, S., Defoort, J., Almekias-Siegl, E., Panda, S., Levy, M., … Aharoni, A. (2018). The “TranSeq” 3’-end sequencing method for high-throughput transcriptomics and gene space refinement in plant genomes. PLANT JOURNAL, 96(1), 223–232.
    High-throughput RNA sequencing has proven invaluable not only to explore gene expression but also for both gene prediction and genome annotation. However, RNA sequencing, carried out on tens or even hundreds of samples, requires easy and cost-effective sample preparation methods using minute RNA amounts. Here, we present TranSeq, a high-throughput 3'-end sequencing procedure that requires 10- to 20-fold fewer sequence reads than the current transcriptomics procedures. TranSeq significantly reduces costs and allows a greater increase in size of sample sets analyzed in a single experiment. Moreover, in comparison with other 3'-end sequencing methods reported to date, we demonstrate here the reliability and immediate applicability of TranSeq and show that it not only provides accurate transcriptome profiles but also produces precise expression measurements of specific gene family members possessing high sequence similarity. This is difficult to achieve in standard RNA-seq methods, in which sequence reads cover the entire transcript. Furthermore, mapping TranSeq reads to the reference tomato genome facilitated the annotation of new transcripts improving >45% of the existing gene models. Hence, we anticipate that using TranSeq will boost large-scale transcriptome assays and increase the spatial and temporal resolution of gene expression data, in both model and non-model plant species. Moreover, as already performed for tomato (ITAG3.0), we strongly advocate its integration into current and future genome annotations.
  25. Krasovec, M., Vancaester, E., Rombauts, S., Bucchini, F., Yau, S., Hemon, C., … Piganeau, G. (2018). Genome analyses of the microalga Picochlorum provide insights into the evolution of thermotolerance in the green lineage. GENOME BIOLOGY AND EVOLUTION, 10(9), 2347–2365.
    While the molecular events involved in cell responses to heat stress have been extensively studied, our understanding of the genetic basis of basal thermotolerance, and particularly its evolution within the green lineage, remains limited. Here, we present the 13.3-Mb haploid genome and transcriptomes of a halotolerant and thermotolerant unicellular green alga, Picochlorum costavermella (Trebouxiophyceae) to investigate the evolution of the genomic basis of thermotolerance. Differential gene expression at high and standard temperatures revealed that more of the gene families containing up-regulated genes at high temperature were recently evolved, and fewer originated at the ancestor of green plants. Inversely, there was an excess of ancient gene families containing transcriptionally repressed genes. Interestingly, there is a striking overlap between the thermotolerance and halotolerance transcriptional rewiring, as more than one-third of the gene families up-regulated at 35 °C were also up-regulated under variable salt concentrations in Picochlorum SE3. Moreover, phylogenetic analysis of the 9,304 protein coding genes revealed 26 genes of horizontally transferred origin in P. costavermella, of which five were differentially expressed at higher temperature. Altogether, these results provide new insights about how the genomic basis of adaptation to halo- and thermotolerance evolved in the green lineage.
  26. Nishiyama, T., Sakayama, H., de Vries, J., Buschmann, H., Saint-Marcoux, D., Ullrich, K. K., … Rensing, S. A. (2018). The Chara genome : secondary complexity and implications for plant terrestrialization. CELL, 174(2), 448–464.
    Land plants evolved from charophytic algae, among which Charophyceae possess the most complex body plans. We present the genome of Chara braunii; comparison of the genome to those of land plants identified evolutionary novelties for plant terrestrialization and land plant heritage genes. C. braunii employs unique xylan synthases for cell wall biosynthesis, a phragmoplast (cell separation) mechanism similar to that of land plants, and many phytohormones. C. braunii plastids are controlled via land plant-like retrograde signaling, and transcriptional regulation is more elaborate than in other algae. The morphological complexity of this organism may result from expanded gene families, with three cases of particular note: genes effecting tolerance to reactive oxygen species (ROS), LysM receptor-like kinases, and transcription factors (TFs). Transcriptomic analysis of sexual reproductive structures reveals intricate control by TFs, activity of the ROS gene network, and the ancestral use of plant-like storage and stress protection proteins in the zygote.
  27. Burgess, S. T., Bartley, K., Marr, E. J., Wright, H. W., Weaver, R. J., Prickett, J. C., … Nisbet, A. J. (2018). Draft genome assembly of the sheep scab mite, Psoroptes ovis. MICROBIOLOGY RESOURCE ANNOUNCEMENTS, 6(16).
    Sheep scab, caused by infestation with Psoroptes ovis, is highly contagious, results in intense pruritus, and represents a major welfare and economic concern. Here, we report the first draft genome assembly and gene prediction of P. ovis based on PacBio de novo sequencing. The ∼63.2-Mb genome encodes 12,041 protein-coding genes.
  28. Burgess, S. T., Bartley, K., Nunn, F., Wright, H. W., Hughes, M., Gemmell, M., … Nisbet, A. J. (2018). Draft genome assembly of the poultry red mite, Dermanyssus gallinae. MICROBIOLOGY RESOURCE ANNOUNCEMENTS, 7(18).
    The poultry red mite, Dermanyssus gallinae, is a major worldwide concern in the egg-laying industry. Here, we report the first draft genome assembly and gene prediction of Dermanyssus gallinae, based on combined PacBio and MinION long-read de novo sequencing. The ∼959-Mb genome is predicted to encode 14,608 protein-coding genes.
  29. Orr, R. J. S., Rombauts, S., Van de Peer, Y., & Shalchian-Tabrizi, K. (2017). Draft genome sequences of two unclassified bacteria, Hydrogenophaga sp. strains IBVHS1 and IBVHS2, isolated from environmental samples. GENOME ANNOUNCEMENTS, 5(34). genomeA.00884-17
    We report here the draft genome sequences of Hydrogenophaga sp. strains IBVHS1 and IBVHS2, two bacteria assembled from the metagenomes of surface samples from freshwater lakes. The genomes are >95% complete and may represent new species within the Hydrogenophaga genus, indicating a larger diversity than currently identified.
  30. Orr, R. J. S., Rombauts, S., Van de Peer, Y., & Shalchian-Tabrizi, K. (2017). Draft genome sequences of two unclassified bacteria, Sphingomonas sp. strains IBVSS1 and IBVSS2, isolated from environmental samples. GENOME ANNOUNCEMENTS, 5(34).
    We report here the draft genome sequences of Sphingomonas sp. IBVSS1 and IBVSS2, two bacteria assembled from the metagenomes of surface samples from freshwater lakes. The genomes are >99% complete and may represent new species within the Sphingomonas genus, indicating a larger diversity than currently identified.
  31. Orr, R. J., Rombauts, S., Van de Peer, Y., & Shalchian-Tabrizi, K. (2017). Draft genome sequences of two unclassified Chitinophagaceae bacteria, IBVUCB1 and IBVUCB2, isolated from environmental samples. GENOME ANNOUNCEMENTS, 5(33).
    We report here the draft genome sequences of two Chitinophagaceae bacteria, IBVUCB1 and IBVUCB2, assembled from metagenomes of surface samples from freshwater lakes. The genomes are >99% complete and may represent new genera within the Chitinophagaceae family, indicating a larger diversity than currently identified.
  32. Canaguier, A., Grimplet, J., Di Gaspero, G., Scalabrin, S., Duchêne, E., Choisne, N., … Adam-Blondon, A.-F. (2017). A new version of the grapevine reference genome assembly (12X.v2) and of its annotation (VCost.v3). GENOMICS DATA, 14, 56–62.
  33. Miclotte, G., Plaisance, S., Rombauts, S., Van de Peer, Y., Audenaert, P., & Fostier, J. (2017). OMSim : a simulator for optical map data. BIOINFORMATICS, 33(17), 2740–2742.
    Motivation: The Bionano Genomics platform allows for the optical detection of short sequence patterns in very long DNA molecules (up to 2.5 Mbp). Molecules with overlapping patterns can be assembled to generate a consensus optical map of the entire genome. In turn, these optical maps can be used to validate or improve de novo genome assembly projects or to detect large-scale structural variation in genomes. Simulated optical map data can assist in the development and benchmarking of tools that operate on those data, such as alignment and assembly software. Additionally, it can help to optimize the experimental setup for a genome of interest. Such a simulator is currently not available. Results: We have developed a simulator, OMSim, that produces synthetic optical map data that mimics real Bionano Genomics data. These simulated data have been tested for compatibility with the Bionano Genomics Irys software system and the Irys-scaffolding scripts. OMSim is capable of handling very large genomes (over 30 Gbp) with high throughput and low memory requirements.
  34. Vidal-Quist, J., Ortego, F., Rombauts, S., Castanera, P., & Hernandez-Crespo, P. (2017). Dietary shifts have consequences for the repertoire of allergens produced by the European house dust mite. MEDICAL AND VETERINARY ENTOMOLOGY, 31(3), 272–280.
    Products manufactured from mass-cultured house dust mites, currently commercialized for the diagnosis and immunotherapy of allergy, are heterogeneous in terms of allergen composition and thus present concerns to regulatory authorities. The most abundant species, Dermatophagoides pteronyssinus (Trouessart) (Astigmata: Pyroglyphidae), produces 19 allergenic proteins. Many of these are putatively involved in mite digestive physiology and metabolism. This study aimed to evaluate the effects of mite-rearing media on allergen production. Mites were adapted to feed on culture media supplemented with proteins, lipids, carbohydrates or beard shavings, and collected to quantify major allergens (Der p 1 and 2) by immunodetection, transcription of allergen genes by real-time quantitative polymerase chain reaction, and allergen-related enzymatic activities. All culture media significantly affected the content of major allergens. Modification of macronutrients in the diet produced minor effects on the transcription of allergen genes, but significantly altered mite allergen-related activities. The most remarkable impacts were detected in mites feeding on beard shavings and were reflected in reductions in the content of major allergens, alterations in the transcription of nine allergen genes, and changes in eight allergen-related activities. These results demonstrate the importance of culture media to the quality and consistency of mite extracts used for pharmaceuticals, and highlight the need to further elucidate allergen production by mites in the laboratory and in domestic environments.
  35. Miclotte, G., Heydari, M., Demeester, P., Rombauts, S., Van de Peer, Y., Audenaert, P., & Fostier, J. (2016). Jabba: hybrid error correction for long sequencing reads. ALGORITHMS FOR MOLECULAR BIOLOGY, 11, 10.
    Background: Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear to be attractive to generate high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and suffer from large runtimes. Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. Results: In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, certain theoretical results concerning the possibilities and limitations of the use of MEMs in the context of third generation reads are presented. Conclusion: Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have a very high identity. Many of the aligned reads are error-free. Additionally, Jabba corrects reads using a very low amount of CPU time. From this we conclude that pseudo alignment with MEMs is a fast and reliable method to map long highly erroneous sequences on a de Bruijn graph.
  36. Cao, T. N. P., Greenhalgh, R., Dermauw, W., Rombauts, S., Bajda-Wybouw, S., Zhurov, V., … Clark, R. M. (2016). Complex evolutionary dynamics of massively expanded chemosensory receptor families in an extreme generalist chelicerate herbivore. GENOME BIOLOGY AND EVOLUTION, 8(11), 3323–3339.
    While mechanisms to detoxify plant produced, anti-herbivore compounds have been associated with plant host use by herbivores, less is known about the role of chemosensory perception in their life histories. This is especially true for generalists, including chelicerate herbivores that evolved herbivory independently from the more studied insect lineages. To shed light on chemosensory perception in a generalist herbivore, we characterized the chemosensory receptors (CRs) of the chelicerate two-spotted spider mite, Tetranychus urticae, an extreme generalist. Strikingly, T. urticae has more CRs than reported in any other arthropod to date. Including pseudogenes, 689 gustatory receptors were identified, as were 136 degenerin/Epithelial Na+ Channels (ENaCs) that have also been implicated as CRs in insects. The genomic distribution of T. urticae gustatory receptors indicates recurring bursts of lineage-specific proliferations, with the extent of receptor clusters reminiscent of those observed in the CR-rich genomes of vertebrates or C. elegans. Although pseudogenization of many gustatory receptors within clusters suggests relaxed selection, a subset of receptors is expressed. Consistent with functions as CRs, the genomic distribution and expression of ENaCs in lineage-specific T. urticae expansions mirrors that observed for gustatory receptors. The expansion of ENaCs in T. urticae to > 3-fold that reported in other animals was unexpected, raising the possibility that ENaCs in T. urticae have been co-opted to fulfill a major role performed by unrelated CRs in other animals. More broadly, our findings suggest an elaborate role for chemosensory perception in generalist herbivores that are of key ecological and agricultural importance.
  37. Saltykova, A., Pulido-Tamayo, S., Pazoutova, M., Rensing, S. A., Nishiyama, T., Van de Peer, Y., … Rombauts, S. (2015). Identifying prokaryotic consortia that live in close interactions with algae. EUROPEAN JOURNAL OF PHYCOLOGY, 50(suppl. 1), 145–146.
  38. Blanc-Mathieu, R., Verhelst, B., Derelle, E., Rombauts, S., Bouget, F.-Y., Carre, I., … Piganeau, G. (2014). An improved genome of the model marine alga Ostreococcus tauri unfolds by assessing Illumina de novo assemblies. BMC GENOMICS, 15.
    Background: Cost effective next generation sequencing technologies now enable the production of genomic datasets for many novel planktonic eukaryotes, representing an understudied reservoir of genetic diversity. O. tauri is the smallest free-living photosynthetic eukaryote known to date, a coccoid green alga that was first isolated in 1995 in a lagoon by the Mediterranean sea. Its simple features, ease of culture and the sequencing of its 13 Mb haploid nuclear genome have promoted this microalga as a new model organism for cell biology. Here, we investigated the quality of genome assemblies of Illumina GAIIx 75 bp paired end reads from Ostreococcus tauri, thereby also improving the existing assembly and showing the genome to be stably maintained in culture. Results: The 3 assemblers used, ABySS, CLCBio and Velvet, produced 95% complete genomes in 1402 to 2080 scaffolds with a very low rate of misassembly. Reciprocally, these assemblies improved the original genome assembly by filling in 930 gaps. Combined with additional analysis of raw reads and PCR sequencing effort, 1194 gaps have been solved in total adding up to 460 kb of sequence. Mapping of RNAseq Illumina data on this updated genome led to a twofold reduction in the proportion of multi-exon protein coding genes, representing 19% of the total 7699 protein coding genes. The comparison of the DNA extracted in 2001 and 2009 revealed the fixation of 8 single nucleotide substitutions and 2 deletions during the approximately 6000 generations in the lab. The deletions either knocked out or truncated two predicted transmembrane proteins, including a glutamate receptor like gene. Conclusion: High coverage (>80 fold) paired end Illumina sequencing enables a high quality 95% complete genome assembly of a compact 13 Mb haploid eukaryote. This genome sequence has remained stable for 6000 generations of lab culture.
  39. Grimplet, J., Adam-Blondon, A.-F., Bert, P.-F., Bitz, O., Cantu, D., Davies, C., … Cramer, G. R. (2014). The grapevine gene nomenclature system. BMC GENOMICS, 15.
    Background: Grapevine (Vitis vinifera L.) is one of the most important fruit crops in the world and serves as a valuable model for fruit development in woody species. A major breakthrough in grapevine genomics was achieved in 2007 with the sequencing of the Vitis vinifera cv. PN40024 genome. Subsequently, data on structural and functional characterization of grape genes accumulated exponentially. To better exploit the results obtained by the international community, we think that a coordinated nomenclature for gene naming in species with sequenced genomes is essential. It will pave the way for the accumulation of functional data that will enable effective scientific discussion and discovery. The exploitation of data that were generated independently of the genome release is hampered by their heterogeneous nature and by often incompatible and decentralized storage. Classically, large amounts of data describing gene functions are only available in printed articles and therefore remain hardly accessible for automatic text mining. On the other hand, high throughput "Omics" data are typically stored in public repositories, but should be arranged in compendia to better contribute to the annotation and functional characterization of the genes. Results: With the objective of providing a high quality and highly accessible annotation of grapevine genes, the International Grapevine Genome Project (IGGP) commissioned an international Super-Nomenclature Committee for Grape Gene Annotation (sNCGGa) to coordinate the effort of experts to annotate the grapevine genes. The goal of the committee is to provide a standard nomenclature for locus identifiers and to define conventions for a gene naming system in this paper. Conclusions: Learning from similar initiatives in other plant species such as Arabidopsis, rice and tomato, a versatile nomenclature system has been developed in anticipation of future genomic developments and annotation issues. The sNCGGa's first outreach to the grape community has been focused on implementing recommended guidelines for the expert annotators by: (i) providing a common annotation platform that enables community-based gene curation, (ii) developing a gene nomenclature scheme reflecting the biological features of gene products that is consistent with that used in other organisms in order to facilitate comparative analyses.
  40. Andolfo, G., Sanseverino, W., Rombauts, S., Van de Peer, Y., Bradeen, J., Carputo, D., … Ercolano, M. (2013). Overview of tomato (Solanum lycopersicum) candidate pathogen recognition genes reveals important Solanum R locus dynamics. NEW PHYTOLOGIST, 197(1), 223–237.
    To investigate the genome-wide spatial arrangement of R loci, a complete catalogue of tomato (Solanum lycopersicum) and potato (Solanum tuberosum) nucleotide-binding site (NBS), receptor-like protein (RLP) and receptor-like kinase (RLK) gene repertories was generated. Candidate pathogen recognition genes were characterized with respect to structural diversity, phylogenetic relationships and chromosomal distribution. NBS genes frequently occur in clusters of related gene copies that also include RLP or RLK genes. This scenario is compatible with the existence of selective pressures optimizing coordinated transcription. A number of duplication events associated with lineage-specific evolution were discovered. These findings suggest that different evolutionary mechanisms shaped pathogen recognition gene cluster architecture to expand and to modulate the defence repertoire. Analysis of pathogen recognition gene clusters associated with documented resistance function allowed the identification of adaptive divergence events and the reconstruction of the evolution history of these loci. Differences in candidate pathogen recognition gene number and organization were found between tomato and potato. Most candidate pathogen recognition gene orthologues were distributed at less than perfectly matching positions, suggesting an ongoing lineage-specific rearrangement. Indeed, a local expansion of Toll/Interleukin-1 receptor (TIR)-NBS-leucine-rich repeat (LRR) (TNL) genes in the potato genome was evident. Taken together, these findings have implications for improved understanding of the mechanisms of molecular adaptive selection at Solanum R loci.
  41. Dermauw, W., Wybouw, N., Rombauts, S., Menten, B., Vontas, J., Grbić, M., … Van Leeuwen, T. (2013). A link between host plant adaptation and pesticide resistance in the polyphagous spider mite Tetranychus urticae. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 110(2), E113–E122.
    Plants produce a wide range of allelochemicals to defend against herbivore attack, and generalist herbivores have evolved mechanisms to avoid, sequester, or detoxify a broad spectrum of natural defense compounds. Successful arthropod pests have also developed resistance to diverse classes of pesticides and this adaptation is of critical importance to agriculture. To test whether mechanisms to overcome plant defenses predispose the development of pesticide resistance, we examined adaptation of the generalist two-spotted spider mite, Tetranychus urticae, to host plant transfer and pesticides. T. urticae is an extreme polyphagous pest with more than 1,100 documented hosts and has an extraordinary ability to develop pesticide resistance. When mites from a pesticide-susceptible strain propagated on bean were adapted to a challenging host (tomato), transcriptional responses increased over time with ∼7.5% of genes differentially expressed after five generations. Whereas many genes with altered expression belonged to known detoxification families (like P450 monooxygenases), new gene families not previously associated with detoxification in other herbivores showed a striking response, including ring-splitting dioxygenase genes acquired by horizontal gene transfer. Strikingly, transcriptional profiles of tomato-adapted mites resembled those of multipesticide-resistant strains, and adaptation to tomato decreased the susceptibility to unrelated pesticide classes. Our findings suggest key roles for both an expanded environmental response gene repertoire and transcriptional regulation in the life history of generalist herbivores. They also support a model whereby selection for the ability to mount a broad response to the diverse defense chemistry of plants predisposes the evolution of pesticide resistance in generalists.
  42. Pollier, J., Rombauts, S., & Goossens, A. (2013). Analysis of RNA-Seq data with TopHat and Cufflinks for genome-wide expression analysis of jasmonate-treated plants and plant cell cultures. In A. Goossens & L. Pauwels (Eds.), Jasmonate signaling : methods and protocols (Vol. 1011, pp. 305–315).
    The recent development of various deep sequencing techniques has led to the most powerful transcript profiling method available to date, RNA sequencing or RNA-Seq. Besides the identification of new genes and new splice variants of known genes, RNA-Seq allows to compare the whole transcriptome of any organism under two or more experimental conditions, such as before and after jasmonate treatment. However, the vast amounts of data generated during RNA-Seq experiments require complex computational methods for read mapping and expression quantification. Here, we describe a detailed protocol for the analysis of deep sequencing data, starting from the raw RNA-Seq reads. First, a quality check is performed on the raw reads to assess the quality of the sequencing. Subsequently, adapters and low-quality sequences are trimmed off the raw reads. The resulting processed reads are mapped to the reference genome, and the mapped reads are counted to generate expression data for the annotated genes for each sample. This method can be used for the analysis of RNA-Seq data of any organism for which a reference genome is available.
  43. Zimmer, A. D., Lang, D., Buchta, K., Rombauts, S., Nishiyama, T., Hasebe, M., … Reski, R. (2013). Reannotation and extended community resources for the genome of the non-seed plant Physcomitrella patens provide insights into the evolution of plant gene structures and functions. BMC GENOMICS, 14.
    Background: The moss Physcomitrella patens as a model species provides an important reference for early-diverging lineages of plants and the release of the genome in 2008 opened the doors to genome-wide studies. The usability of a reference genome greatly depends on the quality of the annotation and the availability of centralized community resources. Therefore, in the light of accumulating evidence for missing genes, fragmentary gene structures, false annotations and a low rate of functional annotations on the original release, we decided to improve the moss genome annotation. Results: Here, we report the complete moss genome re-annotation (designated V1.6) incorporating the increased transcript availability from a multitude of developmental stages and tissue types. We demonstrate the utility of the improved P. patens genome annotation for comparative genomics and new extensions to the resource as a central repository for this plant "flagship" genome. The structural annotation of 32,275 protein-coding genes results in 8387 additional loci including 1456 loci with known protein domains or homologs in Plantae. This is the first release to include information on transcript isoforms, suggesting alternative splicing events for at least 10.8% of the loci. Furthermore, this release now also provides information on non-protein-coding loci. Functional annotations were improved regarding quality and coverage, resulting in 58% annotated loci (previously: 41%) that comprise also 7200 additional loci with GO annotations. Access and manual curation of the functional and structural genome annotation is provided via the model organism database. Conclusions: Comparative analysis of gene structure evolution along the green plant lineage provides novel insights, such as a comparatively high number of loci with 5'-UTR introns in the moss. 
    Comparative analysis of functional annotations reveals expansions of moss house-keeping and metabolic genes and further possibly adaptive, lineage-specific expansions and gains including at least 13% orphan genes.
  44. Van Moerkercke, A., Fabris, M., Pollier, J., Baart, G., Rombauts, S., Hasnain, G., … Goossens, A. (2013). CathaCyc, a metabolic pathway database built from Catharanthus roseus RNA-Seq data. PLANT AND CELL PHYSIOLOGY, 54(5), 673–685.
    drugs vinblastine and vincristine. The TIA pathway operates in a complex metabolic network that steers plant growth and survival. Pathway databases and metabolic networks reconstructed from 'omics' sequence data can help to discover missing enzymes, study metabolic pathway evolution and, ultimately, engineer metabolic pathways. To date, such databases have mainly been built for model plant species with sequenced genomes. Although genome sequence data are not available for most medicinal plant species, next-generation sequencing is now extensively employed to create comprehensive medicinal plant transcriptome sequence resources. Here we report on the construction of CathaCyc, a detailed metabolic pathway database, from C. roseus RNA-Seq data sets. CathaCyc (version 1.0) contains 390 pathways with 1,347 assigned enzymes and spans primary and secondary metabolism. Curation of the pathways linked with the synthesis of TIAs and triterpenoids, their primary metabolic precursors, and their elicitors, the jasmonate hormones, demonstrated that RNA-Seq resources are suitable for the construction of pathway databases. CathaCyc is accessible online and offers a range of tools for the visualization and analysis of metabolic networks and 'omics' data. Overlay with expression data from publicly available RNA-Seq resources demonstrated that two well-characterized C. roseus terpenoid pathways, those of TIAs and triterpenoids, are subject to distinct regulation by both developmental and environmental cues. We anticipate that databases such as CathaCyc will become key to the study and exploitation of the metabolism of medicinal plants.
  45. Pavlidi, N., Dermauw, W., Rombauts, S., Chrisargiris, A., Van Leeuwen, T., & Vontas, J. (2013). Analysis of the olive fruit fly Bactrocera oleae transcriptome and phylogenetic classification of the major detoxification gene families. PLOS ONE, 8(6).
    The olive fruit fly Bactrocera oleae has a unique ability to cope with olive flesh, and is the most destructive pest of olives worldwide. Its control has been largely based on the use of chemical insecticides, however, the selection of insecticide resistance against several insecticides has evolved. The study of detoxification mechanisms, which allow the olive fruit fly to defend against insecticides, and/or phytotoxins possibly present in the mesocarp, has been hampered by the lack of genomic information in this species. In the NCBI database less than 1,000 nucleotide sequences have been deposited, with less than 10 detoxification gene homologues in total. We used 454 pyrosequencing to produce, for the first time, a large transcriptome dataset for B. oleae. A total of 482,790 reads were assembled into 14,204 contigs. More than 60% of those contigs (8,630) were larger than 500 base pairs, and almost half of them matched with genes of the order of the Diptera. Analysis of the Gene Ontology (GO) distribution of unique contigs suggests that, compared to other insects, the assembly is broadly representative for the B. oleae transcriptome. Furthermore, the transcriptome was found to contain 55 P450, 43 GST-, 15 CCE- and 18 ABC transporter-genes. Several of those detoxification genes may putatively be involved in the ability of the olive fruit fly to deal with xenobiotics, such as plant phytotoxins and insecticides. In summary, our study has generated new data and genomic resources, which will substantially facilitate molecular studies in B. oleae, including elucidation of detoxification mechanisms of xenobiotics, as well as other important aspects of olive fruit fly biology.
  46. Sato, S., Tabata, S., Hirakawa, H., Asamizu, E., Shirasawa, K., Isobe, S., … Gianese, G. (2012). The tomato genome sequence provides insights into fleshy fruit evolution. NATURE, 485(7400), 635–641.
    Tomato (Solanum lycopersicum) is a major crop plant and a model system for fruit development. Solanum is one of the largest angiosperm genera(1) and includes annual and perennial plants from diverse habitats. Here we present a high-quality genome sequence of domesticated tomato, a draft sequence of its closest wild relative, Solanum pimpinellifolium(2), and compare them to each other and to the potato genome (Solanum tuberosum). The two tomato genomes show only 0.6% nucleotide divergence and signs of recent admixture, but show more than 8% divergence from potato, with nine large and several smaller inversions. In contrast to Arabidopsis, but similar to soybean, tomato and potato small RNAs map predominantly to gene-rich chromosomal regions, including gene promoters. The Solanum lineage has experienced two consecutive genome triplications: one that is ancient and shared with rosids, and a more recent one. These triplications set the stage for the neofunctionalization of genes controlling fruit characteristics, such as colour and fleshiness.
  47. Moreau, H., Verhelst, B., Couloux, A., Derelle, E., Rombauts, S., Grimsley, N., … Vandepoele, K. (2012). Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. GENOME BIOLOGY, 13(8).
    Background: Bathycoccus prasinos is an extremely small cosmopolitan marine green alga whose cells are covered with intricate spider's web patterned scales that develop within the Golgi cisternae before their transport to the cell surface. The objective of this work is to sequence and analyze its genome, and to present a comparative analysis with other known genomes of the green lineage. Research: Its small genome of 15 Mb consists of 19 chromosomes and lacks transposons. Although 70% of all B. prasinos genes share similarities with other Viridiplantae genes, up to 428 genes were probably acquired by horizontal gene transfer, mainly from other eukaryotes. Two chromosomes, one big and one small, are atypical, an unusual synapomorphic feature within the Mamiellales. Genes on these atypical outlier chromosomes show lower GC content and a significant fraction of putative horizontal gene transfer genes. Whereas the small outlier chromosome lacks colinearity with other Mamiellales and contains many unknown genes without homologs in other species, the big outlier shows a higher intron content, increased expression levels and a unique clustering pattern of housekeeping functionalities. Four gene families are highly expanded in B. prasinos, including sialyltransferases, sialidases, ankyrin repeats and zinc ion-binding genes, and we hypothesize that these genes are associated with the process of scale biogenesis. Conclusion: The minimal genomes of the Mamiellophyceae provide a baseline for evolutionary and functional analyses of metabolic processes in green plants.
  48. Veenstra, J. A., Rombauts, S., & Grbić, M. (2012). In silico cloning of genes encoding neuropeptides, neurohormones and their putative G-protein coupled receptors in a spider mite. INSECT BIOCHEMISTRY AND MOLECULAR BIOLOGY, 42(4), 277–295.
  49. Fabris, M., Matthijs, M., Rombauts, S., Vyverman, W., Goossens, A., & Baart, G. (2012). The metabolic blueprint of Phaeodactylum tricornutum reveals a eukaryotic Entner-Doudoroff glycolytic pathway. PLANT JOURNAL, 70(6), 1004–1014.
    Diatoms are one of the most successful groups of unicellular eukaryotic algae. Successive endosymbiotic events contributed to their flexible metabolism, making them competitive in variable aquatic habitats. Although the recently sequenced genomes of the model diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana have provided the first insights into their metabolic organization, the current knowledge on diatom biochemistry remains fragmentary. By means of a genome-wide approach, we developed DiatomCyc, a detailed pathway/genome database of P. tricornutum. DiatomCyc contains 286 pathways with 1719 metabolic reactions and 1613 assigned enzymes, spanning both the central and parts of the secondary metabolism of P. tricornutum. Central metabolic pathways, such as those of carbohydrates, amino acids and fatty acids, were covered. Furthermore, our understanding of the carbohydrate model in P. tricornutum was extended. In particular we highlight the discovery of a functional Entner-Doudoroff pathway, an ancient alternative for the glycolytic Embden-Meyerhof-Parnas pathway, and a putative phosphoketolase pathway, both uncommon in eukaryotes. DiatomCyc is accessible online and offers a range of software tools for the visualization and analysis of metabolic networks and omics data. We anticipate that DiatomCyc will be key to gaining further understanding of diatom metabolism and, ultimately, will feed metabolic engineering strategies for the industrial valorization of diatoms.
  50. Young, N. D., Debellé, F., Oldroyd, G. E., Geurts, R., Cannon, S. B., Udvardi, M. K., … Roe, B. A. (2011). The Medicago genome provides insight into the evolution of rhizobial symbioses. NATURE, 480(7378), 520–524.
    Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation(1). Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Myr ago). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species(2). Medicago truncatula is a long-established model for the study of legume biology. Here we describe the draft sequence of the M. truncatula euchromatin based on a recently completed BAC assembly supplemented with Illumina shotgun sequence, together capturing ∼94% of all M. truncatula genes. A whole-genome duplication (WGD) approximately 58 Myr ago had a major role in shaping the M. truncatula genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the M. truncatula genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max and Lotus japonicus. M. truncatula is a close relative of alfalfa (Medicago sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the M. truncatula genome sequence provides significant opportunities to expand alfalfa's genomic toolbox.
  51. Mortier, V., Fenta, B. A., Martens, C., Rombauts, S., Holsters, M., Kunert, K., & Goormachtig, S. (2011). Search for nodulation-related CLE genes in the genome of Glycine max. JOURNAL OF EXPERIMENTAL BOTANY, 62(8), 2571–2583.
    CLE peptides are potentially involved in nodule organ development and in the autoregulation of nodulation (AON), a systemic process that restricts nodule number. A genome-wide survey of CLE peptide genes in the soybean (Glycine max) genome resulted in the identification of 39 GmCLE genes, the majority of which have not yet been annotated. qRT-PCR analysis indicated two different nodulation-related CLE expression patterns, one linked with nodule primordium development and a new one linked with nodule maturation. Moreover, two GmCLE gene pairs, encoding group-III CLE peptides that were previously shown to be involved in AON, had a transient expression pattern during nodule development, were induced by the essential nodulation hormone cytokinin, and one pair was also slightly induced by the addition of nitrate. Hence, our data support the hypothesis that group-III CLE peptides produced in the nodules are involved in primordium homeostasis and intertwined in activating AON, but not in sustaining it.
  52. Grbić, M., Van Leeuwen, T., Clark, R. M., Rombauts, S., Rouzé, P., Grbić, V., … Van de Peer, Y. (2011). The genome of Tetranychus urticae reveals herbivorous pest adaptations. NATURE, 479(7374), 487–492.
    The spider mite Tetranychus urticae is a cosmopolitan agricultural pest with an extensive host plant range and an extreme record of pesticide resistance. Here we present the completely sequenced and annotated spider mite genome, representing the first complete chelicerate genome. At 90 megabases T. urticae has the smallest sequenced arthropod genome. Compared with other arthropods, the spider mite genome shows unique changes in the hormonal environment and organization of the Hox complex, and also reveals evolutionary innovation of silk production. We find strong signatures of polyphagy and detoxification in gene families associated with feeding on different hosts and in new gene families acquired by lateral gene transfer. Deep transcriptome analysis of mites feeding on different plants shows how this pest responds to a changing host environment. The T. urticae genome thus offers new insights into arthropod evolution and plant-herbivore interactions, and provides unique opportunities for developing novel plant protection strategies.
  53. Mortier, V., Den Herder, G., Whitford, R., Van De Velde, W., Rombauts, S., D’haeseleer, K., … Goormachtig, S. (2010). CLE peptides control Medicago truncatula nodulation locally and systemically. PLANT PHYSIOLOGY, 153(1), 222–237.
  54. Rehrauer, H., Aquino, C., Gruissem, W., Henz, S. R., Hilson, P., Laubinger, S., … Hennig, L. (2010). AGRONOMICS1: A New Resource for Arabidopsis Transcriptome Profiling. PLANT PHYSIOLOGY, 152(2), 487–499.
  55. Boruc, J., Mylle, E., Duda, M., De Clercq, R., Rombauts, S., Geelen, D., … Russinova, E. (2010). Systematic localization of the Arabidopsis core cell cycle proteins reveals novel cell division complexes. PLANT PHYSIOLOGY, 152(2), 553–565.
    Cell division depends on the correct localization of the cyclin-dependent kinases that are regulated by phosphorylation, cyclin proteolysis, and protein-protein interactions. Although immunological assays can define cell cycle protein abundance and localization, they are not suitable for detecting the dynamic rearrangements of molecular components during cell division. Here, we applied an in vivo approach to trace the subcellular localization of 60 Arabidopsis (Arabidopsis thaliana) core cell cycle proteins fused to green fluorescent proteins during cell division in tobacco (Nicotiana tabacum) and Arabidopsis. Several cell cycle proteins showed a dynamic association with mitotic structures, such as condensed chromosomes and the preprophase band in both species, suggesting a strong conservation of targeting mechanisms. Furthermore, colocalized proteins were shown to bind in vivo, strengthening their localization-function connection. Thus, we identified unknown spatiotemporal territories where functional cell cycle protein interactions are most likely to occur.
  56. Boruc, J., Van Den Daele, H., Hollunder, J., Rombauts, S., Mylle, E., Hilson, P., … Russinova, E. (2010). Functional modules in the Arabidopsis core cell cycle binary protein-protein interaction network. PLANT CELL, 22(4), 1264–1280.
    As in other eukaryotes, cell division in plants is highly conserved and regulated by cyclin-dependent kinases (CDKs) that are themselves predominantly regulated at the posttranscriptional level by their association with proteins such as cyclins. Although over the last years the knowledge of the plant cell cycle has considerably increased, little is known on the assembly and regulation of the different CDK complexes. To map protein-protein interactions between core cell cycle proteins of Arabidopsis thaliana, a binary protein-protein interactome network was generated using two complementary high-throughput interaction assays, yeast two-hybrid and bimolecular fluorescence complementation. Pairwise interactions among 58 core cell cycle proteins were tested, resulting in 357 interactions, of which 293 have not been reported before. Integration of the binary interaction results with cell cycle phase-dependent expression information and localization data allowed the construction of a dynamic interaction network. The obtained interaction map constitutes a framework for further in-depth analysis of the cell cycle machinery.
  57. Mueller, L., Klein Lankhorst, R., Tanksley, S. D., Giovannoni, J. J., White, R., Vrebalov, J., … Stiekema, W. (2009). A snapshot of the emerging tomato genome sequence. PLANT GENOME, 2(1), 78–92.
    The genome of tomato (Solanum lycopersicum L.) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States) as part of the larger “International Solanaceae Genome Project (SOL): Systems Approach to Diversity and Adaptation” initiative. The tomato genome sequencing project uses an ordered bacterial artificial chromosome (BAC) approach to generate a high-quality tomato euchromatic genome sequence for use as a reference genome for the Solanaceae and euasterids. Sequence is deposited at GenBank and at the SOL Genomics Network (SGN). Currently, there are around 1000 BACs finished or in progress, representing more than a third of the projected euchromatic portion of the genome. An annotation effort is also underway by the International Tomato Annotation Group. The expected number of genes in the euchromatin is ∼40,000, based on an estimate from a preliminary annotation of 11% of finished sequence. Here, we present this first snapshot of the emerging tomato genome and its annotation, a short comparison with potato (Solanum tuberosum L.) sequence data, and the tools available for the researchers to exploit this new resource are also presented. In the future, whole-genome shotgun techniques will be combined with the BAC-by-BAC approach to cover the entire tomato genome. The high-quality reference euchromatic tomato sequence is expected to be near completion by 2010.
  58. de Almeida Engler, J., De Veylder, L., De Groodt, R., Rombauts, S., Boudolf, V., De Meyer, B., … Engler, G. (2009). Systematic analysis of cell-cycle gene expression during Arabidopsis development. PLANT JOURNAL, 59(4), 645–660.
    The steady-state distribution of cell-cycle transcripts in Arabidopsis thaliana seedlings was studied in a broad in situ survey to provide a better understanding of the expression of cell-cycle genes during plant development. The 61 core cell-cycle genes analyzed were expressed at variable levels throughout the different plant tissues: 23 genes generally in dividing and young differentiating tissues, 34 genes mostly in both dividing and differentiated tissues and four gene transcripts primarily in differentiated tissues. Only 21 genes had a typical patchy expression pattern, indicating tight cell-cycle regulation. The increased expression of 27 cell-cycle genes in the root elongation zone hinted at their involvement in the switch from cell division to differentiation. The induction of 20 cell-cycle genes in differentiated cortical cells of etiolated hypocotyls pointed to their possible role in the process of endoreduplication. Of seven cyclin-dependent kinase inhibitor genes, five were upregulated in etiolated hypocotyls, suggesting a role in cell-cycle arrest. Nineteen genes were preferentially expressed in pericycle cells activated by auxin that give rise to lateral root primordia. Approximately 1800 images have been collected and can be queried via an online database. Our in situ analysis revealed that 70% of the cell-cycle genes, although expressed at different levels, show a large overlap in their localization. The lack of regulatory motifs in the upstream regions of the analyzed genes suggests the absence of a universal transcriptional control mechanism for all cell-cycle genes.
  59. Worden, A. Z., Lee, J.-H., Mock, T., Rouzé, P., Simmons, M. P., Aerts, A. L., … Grigoriev, I. V. (2009). Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. SCIENCE, 324(5924), 268–272.
    Picoeukaryotes are a taxonomically diverse group of organisms less than 2 micrometers in diameter. Photosynthetic marine picoeukaryotes in the genus Micromonas thrive in ecosystems ranging from tropical to polar and could serve as sentinel organisms for biogeochemical fluxes of modern oceans during climate change. These broadly distributed primary producers belong to an anciently diverged sister clade to land plants. Although Micromonas isolates have high 18S ribosomal RNA gene identity, we found that genomes from two isolates shared only 90% of their predicted genes. Their independent evolutionary paths were emphasized by distinct riboswitch arrangements as well as the discovery of intronic repeat elements in one isolate, and in metagenomic data, but not in other genomes. Divergence appears to have been facilitated by selection and acquisition processes that actively shape the repertoire of genes that are mutually exclusive between the two isolates differently than the core genes. Analyses of the Micromonas genomes offer valuable insights into ecological differentiation and the dynamic nature of early plant evolution.
  60. Den Herder, G., De Keyser, A., De Rycke, R., Rombauts, S., Van De Velde, W., Clemente, M. R., … Goormachtig, S. (2008). Seven in absentia proteins affect plant growth and nodulation in Medicago truncatula. PLANT PHYSIOLOGY, 148(1), 369–382.
    Protein ubiquitination is a posttranslational regulatory process essential for plant growth and interaction with the environment. E3 ligases, to which the seven in absentia (SINA) proteins belong, determine the specificity by selecting the target proteins for ubiquitination. SINA proteins are found in animals as well as in plants, and a small gene family with highly related members has been identified in the genome of rice (Oryza sativa), Arabidopsis (Arabidopsis thaliana), Medicago truncatula, and poplar (Populus trichocarpa). To acquire insight into the function of SINA proteins in nodulation, a dominant negative form of the Arabidopsis SINAT5 was ectopically expressed in the model legume M. truncatula. After rhizobial inoculation of the 35S:SINAT5DN transgenic plants, fewer nodules were formed than in control plants, and most nodules remained small and white, a sign of impaired symbiosis. Defects in rhizobial infection and symbiosome formation were observed by extensive microscopic analysis. Besides the nodulation phenotype, transgenic plants were affected in shoot growth, leaf size, and lateral root number. This work illustrates a function for SINA E3 ligases in a broad spectrum of plant developmental processes, including nodulation.
  61. Rensing, S. A., Lang, D., Zimmer, A. D., Terry, A., Salamov, A., Shapiro, H., … Boore, J. L. (2008). The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. SCIENCE, 319(5859), 64–69.
    We report the draft genome sequence of the model moss Physcomitrella patens and compare its features with those of flowering plants, from which it is separated by more than 400 million years, and unicellular aquatic algae. This comparison reveals genomic changes concomitant with the evolutionary movement to land, including a general increase in gene family complexity; loss of genes associated with aquatic environments (e.g., flagellar arms); acquisition of genes for tolerating terrestrial stresses (e.g., variation in temperature and water availability); and the development of the auxin and abscisic acid signaling pathways for coordinating multicellular growth and dehydration response. The Physcomitrella genome provides a resource for phylogenetic inferences about gene function and for experimental analysis of plant processes through this plant's unique facility for reverse genetics.
  62. Foissac, S., Gouzy, J., Rombauts, S., Mathé, C., Amselem, J., Sterck, L., … Schiex, T. (2008). Genome annotation in plants and fungi: EuGène as a model platform. CURRENT BIOINFORMATICS, 3(2), 87–97.
    In this era of whole genome sequencing, reliable genome annotations (identification of functional regions) are the cornerstones for many subsequent analyses. Not only is careful annotation important for studying the gene and gene family content of a genome and its host, but also for wide-scale transcriptome and proteome analyses attempting to describe a certain biological process or to get a global picture of a cell's behavior. Although the number of sequenced genomes is increasing thanks to the application of new technologies, genome-wide analyses will critically depend on the quality of the genome annotations. However, the annotation process is more complicated in the plant field than in the animal field because of the limited funding that leads to much fewer experimental data and less annotation expertise. This situation calls for highly automated annotation platforms that can make the best use of all available data, experimental or not. We discuss how the gene prediction (the process of predicting protein gene structures in genomic sequences) research field increasingly shifts from methods that typically exploited one or two types of data to more integrative approaches that simultaneously deal with various experimental, statistical, or other in silico evidence. We illustrate the importance of integrative approaches for producing high-quality automatic annotations of genomes of plants and algae as well as of fungi that live in close association with plants using the platform EuGène as an example.
  63. Benhamed, M., Martin-Magniette, M.-L., Taconnat, L., Bitton, F., Servet, C., De Clercq, R., … Hilson, P. (2008). Genome-scale Arabidopsis promoter array identifies targets of the histone acetyltransferase GCN5. PLANT JOURNAL, 56(3), 493–504.
    We have assembled approximately 20 000 Arabidopsis thaliana promoter regions, compatible with functional studies that require cloning and with microarray applications. The promoter fragments can be captured as modular entry clones (MultiSite Gateway format) via site-specific recombinational cloning, and transferred into vectors of choice to investigate transcriptional networks. The fragments can also be amplified by PCR and printed on glass arrays. In combination with immunoprecipitation of protein-DNA complexes (ChIP-chip), these arrays enable characterization of binding sites for chromatin-associated proteins or the extent of chromatin modifications at genome scale. The Arabidopsis histone acetyltransferase GCN5 associated with 40% of the tested promoters. At most sites, binding did not depend on the integrity of the GCN5 bromodomain. However, the presence of the bromodomain was necessary for binding to 11% of the promoter regions, and correlated with acetylation of lysine 14 of histone H3 in these promoters. Combined analysis of ChIP-chip and transcriptomic data indicated that binding of GCN5 does not strictly correlate with gene activation. GCN5 has previously been shown to be required for light-regulated gene expression and growth, and we found that GCN5 targets were enriched in early light-responsive genes. Thus, in addition to its transcriptional activation function, GCN5 may play an important role in priming activation of inducible genes under non-induced conditions.
  64. Capoen, W., Den Herder, J., Rombauts, S., De Gussem, J., De Keyser, A., Holsters, M., & Goormachtig, S. (2007). Comparative transcriptome analysis reveals common and specific tags for root hair and crack-entry invasion in Sesbania rostrata. PLANT PHYSIOLOGY, 144(4), 1878–1889.
    The tropical legume Sesbania rostrata provides its microsymbiont Azorhizobium caulinodans with versatile invasion strategies to allow nodule formation in temporarily flooded habitats. In aerated soils, the bacteria enter via the root hair curling mechanism. Submergence prevents this epidermal invasion by accumulation of inhibiting concentrations of ethylene and, under these conditions, the bacterial colonization occurs via intercellular cortical infection at lateral root bases. The transcriptomes of the two invasion routes were compared by cDNA-amplified fragment length polymorphism analysis. Clusters of gene tags were identified that were specific for either epidermal or cortical invasion or were shared by both. The data provide insight into mechanisms that control infection and illustrate that entry via the epidermis adds a layer of complexity to rhizobial invasion.
  65. Den Herder, J., Lievens, S., Rombauts, S., Holsters, M., & Goormachtig, S. (2007). A symbiotic plant peroxidase involved in bacterial invasion of the tropical legume Sesbania rostrata. PLANT PHYSIOLOGY, 144(2), 717–727.
    Aquatic nodulation on the tropical legume Sesbania rostrata occurs at lateral root bases via intercellular crack-entry invasion. A gene was identified (Srprx1) that is transiently up-regulated during the nodulation process and codes for a functional class III plant peroxidase. The expression strictly depended on bacterial nodulation factors (NFs) and could be modulated by hydrogen peroxide, a downstream signal for crack-entry invasion. Expression was not induced after wounding or pathogen attack, indicating that the peroxidase is a symbiosis-specific isoform. In situ hybridization showed Srprx1 transcripts around bacterial infection pockets and infection threads until they reached the central tissue of the nodule. A root nodule extensin (SrRNE1) colocalized with Srprx1 both in time and space and had the same NF requirement, suggesting a function in a similar process. Finally, in mixed inoculation nodules that were invaded by NF-deficient bacteria and differed in infection thread progression, infection-associated peroxidase transcripts were not observed. Lack of Srprx1 gene expression could be one of the causes for the aberrant structure of the infection threads.
  66. Sterck, L., Rombauts, S., Vandepoele, K., Rouzé, P., & Van de Peer, Y. (2007). How many genes are there in plants (... and why are they there)? CURRENT OPINION IN PLANT BIOLOGY, 10(2), 199–203.
    Annotation of the first few complete plant genomes has revealed that plants have many genes. For Arabidopsis, over 26 500 gene loci have been predicted, whereas for rice, the number adds up to 41 000. Recent analysis of the poplar genome suggests more than 45 000 genes, and partial sequence data from Medicago and Lotus also suggest that these plants contain more than 40 000 genes. Nevertheless, estimations suggest that ancestral angiosperms had no more than 12 000-14 000 genes. One explanation for the large increase in gene number during angiosperm evolution is gene duplication. It has been shown previously that the retention of duplicates following small- and large-scale duplication events in plants is substantial. Taking into account the function of genes that have been duplicated, we are now beginning to understand why many plant genes might have been retained, and how their retention might be linked to the typical lifestyle of plants.
  67. Palenik, B., Grimwood, J., Aerts, A., Rouzé, P., Salamov, A., Putnam, N., … Grigoriev, I. V. (2007). The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 104(18), 7705–7710.
    The smallest known eukaryotes, at ≈1-μm diameter, are Ostreococcus tauri and related species of marine phytoplankton. The genome of Ostreococcus lucimarinus has been completed and compared with that of O. tauri. This comparison reveals surprising differences across orthologous chromosomes in the two species from highly syntenic chromosomes in most cases to chromosomes with almost no similarity. Species divergence in these phytoplankton is occurring through multiple mechanisms acting differently on different chromosomes and likely including acquisition of new genes through horizontal gene transfer. We speculate that this latter process may be involved in altering the cell-surface characteristics of each species. In addition, the genome of O. lucimarinus provides insights into the unique metal metabolism of these organisms, which are predicted to have a large number of selenocysteine-containing proteins. Selenoenzymes are more catalytically active than similar enzymes lacking selenium, and thus the cell may require less of that protein. As reported here, selenoenzymes, novel fusion proteins, and loss of some major protein families including ones associated with chromatin are likely important adaptations for achieving a small cell size.
  68. Fawcett, J., Rombauts, S., Pattyn, P., Sterck, L., & Van de Peer, Y. (2007). The annotation and analysis of the genome of Arabidopsis lyrata. GENES & GENETIC SYSTEMS, 82(6), 520–520.
  69. Ruttink, T., Arend, M., Morreel, K., Storme, V., Rombauts, S., Fromm, J., … Rohde, A. (2007). A molecular timetable for apical bud formation and dormancy induction in poplar. PLANT CELL, 19(8), 2370–2390.
    The growth of perennial plants in the temperate zone alternates with periods of dormancy that are typically initiated during bud development in autumn. In a systems biology approach to unravel the underlying molecular program of apical bud development in poplar (Populus tremula × Populus alba), combined transcript and metabolite profiling were applied to a high-resolution time course from short-day induction to complete dormancy. Metabolite and gene expression dynamics were used to reconstruct the temporal sequence of events during bud development. Importantly, bud development could be dissected into bud formation, acclimation to dehydration and cold, and dormancy. To each of these processes, specific sets of regulatory and marker genes and metabolites are associated and provide a reference frame for future functional studies. Light, ethylene, and abscisic acid signal transduction pathways consecutively control bud development by setting, modifying, or terminating these processes. Ethylene signal transduction is positioned temporally between light and abscisic acid signals and is putatively activated by transiently low hexose pools. The timing and place of cell proliferation arrest (related to dormancy) and of the accumulation of storage compounds (related to acclimation processes) were established within the bud by electron microscopy. Finally, the identification of a large set of genes commonly expressed during the growth-to-dormancy transitions in poplar apical buds, cambium, or Arabidopsis thaliana seeds suggests parallels in the underlying molecular mechanisms in different plant organs.
  70. Derelle, E., Ferraz, C., Rombauts, S., Rouzé, P., Worden, A. Z., Robbens, S., … Moreau, H. (2006). Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 103(31), 11647–11652.
    The green lineage is reportedly 1,500 million years old, evolving shortly after the endosymbiosis event that gave rise to early photosynthetic eukaryotes. In this study, we unveil the complete genome sequence of an ancient member of this lineage, the unicellular green alga Ostreococcus tauri (Prasinophyceae). This cosmopolitan marine primary producer is the world's smallest free-living eukaryote known to date. Features likely reflecting optimization of environmentally relevant pathways, including resource acquisition, unusual photosynthesis apparatus, and genes potentially involved in C4 photosynthesis, were observed, as was downsizing of many gene families. Overall, the 12.56-Mb nuclear genome has an extremely high gene density, in part because of extensive reduction of intergenic regions and other forms of compaction such as gene fusion. However, the genome is structurally complex. It exhibits previously unobserved levels of heterogeneity for a eukaryote. Two chromosomes differ structurally from the other eighteen. Both have a significantly biased G+C content, and, remarkably, they contain the majority of transposable elements. Many chromosome 2 genes also have unique codon usage and splicing, but phylogenetic analysis and composition do not support alien gene origin. In contrast, most chromosome 19 genes show no similarity to green lineage genes and a large number of them are specialized in cell surface processes. Taken together, the complete genome sequence, unusual features, and downsized gene families, make O. tauri an ideal model system for research on eukaryotic genome evolution, including chromosome specialization and green lineage ancestry.
  71. Cannon, S. B., Sterck, L., Rombauts, S., Sato, S., Cheung, F., Gouzy, J., … Young, N. D. (2006). Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 103(40), 14959–14964.
    Genome sequencing of the model legumes, Medicago truncatula and Lotus japonicus, provides an opportunity for large-scale sequence-based comparison of two genomes in the same plant family. Here we report synteny comparisons between these species, including details about chromosome relationships, large-scale synteny blocks, microsynteny within blocks, and genome regions lacking clear correspondence. The Lotus and Medicago genomes share a minimum of 10 large-scale synteny blocks, each with substantial collinearity and frequently extending the length of whole chromosome arms. The proportion of genes syntenic and collinear within each synteny block is relatively homogeneous. Medicago-Lotus comparisons also indicate similar and largely homogeneous gene densities, although gene-containing regions in Mt occupy 20-30% more space than Lj counterparts, primarily because of larger numbers of Mt retrotransposons. Because the interpretation of genome comparisons is complicated by large-scale genome duplications, we describe synteny, synonymous substitutions and phylogenetic analyses to identify and date a probable whole-genome duplication event. There is no direct evidence for any recent large-scale genome duplication in either Medicago or Lotus but instead a duplication predating speciation. Phylogenetic comparisons place this duplication within the Rosid I clade, clearly after the split between legumes and Salicaceae (poplar).
  72. Van De Velde, W., Pérez Guerra, J. C., De Keyser, A., De Rycke, R., Rombauts, S., Maunoury, N., … Goormachtig, S. (2006). Aging in legume symbiosis : a molecular view on nodule senescence in Medicago truncatula. PLANT PHYSIOLOGY, 141(2), 711–720.
    Rhizobia reside as symbiosomes in the infected cells of legume nodules to fix atmospheric nitrogen. The symbiotic relation is strictly controlled, lasts for some time, but eventually leads to nodule senescence. We present a comprehensive transcriptomics study to understand the onset of nodule senescence in the legume Medicago truncatula. Distinct developmental stages with characteristic gene expression were delineated during which the two symbiotic partners were degraded consecutively, marking the switch in nodule tissue status from carbon sink to general nutrient source. Cluster analysis discriminated an early expression group that harbored regulatory genes that might be primary tools to interfere with pod filling-related or stress-induced nodule senescence, ultimately causing prolonged nitrogen fixation. Interestingly, the transcriptomes of nodule and leaf senescence had a high degree of overlap, arguing for the recruitment of similar pathways.
  73. Tuskan, G., DiFazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., … Rokhsar, D. (2006). The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). SCIENCE, 313(5793), 1596–1604.
    We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.
  74. Vanderauwera, S., Zimmermann, P., Rombauts, S., Vandenabeele, S., Langebartels, C., Gruissem, W., … Van Breusegem, F. (2005). Genome-wide analysis of hydrogen peroxide-regulated gene expression in Arabidopsis reveals a high light-induced transcriptional cluster involved in anthocyanin biosynthesis. PLANT PHYSIOLOGY, 139(2), 806–821.
    In plants, reactive oxygen species and, more particularly, hydrogen peroxide (H2O2) play a dual role as toxic by-products of normal cell metabolism and as regulatory molecules in stress perception and signal transduction. Peroxisomal catalases are an important sink for photorespiratory H2O2. Using ATH1 Affymetrix microarrays, expression profiles were compared between control and catalase-deficient Arabidopsis (Arabidopsis thaliana) plants. Reduced catalase levels already provoked differences in nuclear gene expression under ambient growth conditions, and these effects were amplified by high light exposure in a sun simulator for 3 and 8 h. This genome-wide expression analysis allowed us to reveal the expression characteristics of complete pathways and functional categories during H2O2 stress. In total, 349 transcripts were significantly up-regulated by high light in catalase-deficient plants and 88 were down-regulated. From this data set, H2O2 was inferred to play a key role in the transcriptional up-regulation of small heat shock proteins during high light stress. In addition, several transcription factors and candidate regulatory genes involved in H2O2 transcriptional gene networks were identified. Comparisons with other publicly available transcriptome data sets of abiotically stressed Arabidopsis revealed an important intersection with H2O2-deregulated genes, positioning elevated H2O2 levels as an important signal within abiotic stress-induced gene expression. Finally, analysis of transcriptional changes in a combination of a genetic (catalase deficiency) and an environmental (high light) perturbation identified a transcriptional cluster that was strongly and rapidly induced by high light in control plants, but impaired in catalase-deficient plants. This cluster comprises the complete known anthocyanin regulatory and biosynthetic pathway, together with genes encoding unknown proteins.
  75. Robbens, S., Rombauts, S., Rouzé, P., Wuyts, J., Saeys, Y., Moreau, H., & Van de Peer, Y. (2005). Genome analysis of the world’s smallest free-living eukaryote Ostreococcus tauri unveils unique genome heterogeneity. Proceedings of the Molecular Biology and Evolution Conference (MBE) 2005.
  76. Aubourg, S., Brunaud, V., Bruyère, C., Cock, M., Cooke, R., Cottet, A., … Lecharny, A. (2005). GeneFarm, structural and functional annotation of Arabidopsis gene and protein families by a network of experts. NUCLEIC ACIDS RESEARCH, 33(suppl. 1), D641–D646.
    Genomic projects heavily depend on genome annotations and are limited by the current deficiencies in the published predictions of gene structure and function. It follows that improved annotation will allow better data mining of genomes, and more secure planning and design of experiments. The purpose of the GeneFarm project is to obtain homogeneous, reliable, documented and traceable annotations for Arabidopsis nuclear genes and gene products, and to enter them into an added-value database. This re-annotation project is being performed exhaustively on every member of each gene family. Performing a family-wide annotation makes the task easier and more efficient than a gene-by-gene approach since many features obtained for one gene can be extrapolated to some or all the other genes of a family. A complete annotation procedure based on the most efficient prediction tools available is being used by 16 partner laboratories, each contributing annotated families from its field of expertise. A database, named GeneFarm, and an associated user-friendly interface to query the annotations have been developed. More than 3000 genes distributed over 300 families have been annotated and are publicly available. Furthermore, collaboration with the Swiss Institute of Bioinformatics is underway to integrate the GeneFarm data into the protein knowledgebase Swiss-Prot.
  77. Sterck, L., Rombauts, S., Jansson, S., Sterky, F., Rouzé, P., & Van de Peer, Y. (2005). EST data suggest that poplar is an ancient polyploid. NEW PHYTOLOGIST, 167(1), 165–170.
    We analysed the publicly available expressed sequence tag (EST) collections for the genus Populus to examine whether evidence can be found for large-scale gene-duplication events in the evolutionary past of this genus. The ESTs were clustered into unigenes for each poplar species examined. Gene families were constructed for all proteins deduced from these unigenes, and Ks dating was performed on all paralogs within a gene family. The fraction of paralogs was then plotted against the Ks values, which resulted in a distribution reflecting the age of duplicated genes in poplar. Sufficient EST data were available for seven different poplar species spanning four of the six sections of the genus Populus. For all these species, there was evidence that a large-scale gene-duplication event had occurred. From our analysis it is clear that all poplar species have shared the same large-scale gene-duplication event, suggesting that this event must have occurred in the ancestor of poplar, or at least very early in the evolution of the Populus genus.
  78. Lescot, M., Rombauts, S., Zhang, J., Aubourg, S., Mathé, C., Jansson, S., … Boerjan, W. (2004). Annotation of a 95-kb Populus deltoides genomic sequence reveals a disease resistance gene cluster and novel class I and class II transposable elements. THEORETICAL AND APPLIED GENETICS, 109(1), 10–22.
    Poplar has become a model system for functional genomics in woody plants. Here, we report the sequencing and annotation of the first large contiguous stretch of genomic sequence (95 kb) of poplar, corresponding to a bacterial artificial chromosome clone mapped 0.6 centiMorgan from the Melampsora larici-populina resistance locus. The annotation revealed 15 putative genetic objects, of which five were classified as hypothetical genes that were similar only with expressed sequence tags from poplar. Ten putative objects showed similarity with known genes, of which one was similar to a kinase. Three other objects corresponded to the toll/interleukin-1 receptor/nucleotide-binding site/leucine-rich repeat class of plant disease resistance genes, of which two were predicted to encode an amino terminal nuclear localization signal. Four objects were homologous to the Ty1/copia family of class I transposable elements, one of which was designated Retropop and interrupted one of the disease resistance genes. Two other objects constituted a novel Spm-like class II transposable element, which we designated Magali.
  79. Ral, J.-P., Derelle, E., Ferraz, C., Wattebled, F., Farinas, B., Corellou, F., … Ball, S. (2004). Starch division and partitioning: a mechanism for granule propagation and maintenance in the picophytoplanktonic green alga Ostreococcus tauri. PLANT PHYSIOLOGY, 136(2), 3333–3340.
    Whereas Glc is stored in small-sized hydrosoluble glycogen particles in archaea, eubacteria, fungi, and animal cells, photosynthetic eukaryotes have resorted to building starch, which is composed of several distinct polysaccharide fractions packed into a highly organized semicrystalline granule. In plants, both the initiation of polysaccharide synthesis and the nucleation mechanism leading to formation of new starch granules are currently not understood. Ostreococcus tauri, a unicellular green alga of the Prasinophyceae family, defines the tiniest eukaryote with one of the smallest genomes. We show that it accumulates a single starch granule at the chloroplast center by using the same pathway as higher plants. At the time of plastid division, we observe elongation of the starch and division into two daughter structures that are partitioned in each newly formed chloroplast. These observations suggest that in this system the information required to initiate crystalline polysaccharide growth of a new granule is contained within the preexisting polysaccharide structure and the design of the plastid division machinery.
  80. Vandenabeele, S., Vanderauwera, S., Vuylsteke, M., Rombauts, S., Langebartels, C., Seidlitz, H. K., … Van Breusegem, F. (2004). Catalase deficiency drastically affects gene expression induced by high light in Arabidopsis thaliana. PLANT JOURNAL, 39(1), 45–58.
    In plants, hydrogen peroxide (H2O2) plays a major signaling role in triggering both a defense response and cell death. Increased cellular H2O2 levels and subsequent redox imbalances are managed at the production and scavenging levels. Because catalases are the major H2O2 scavengers that remove the bulk of cellular H2O2, altering their levels allows in planta modulation of H2O2 concentrations. Reduced peroxisomal catalase activity increased sensitivity toward both ozone and photorespiratory H2O2-induced cell death in transgenic catalase-deficient Arabidopsis thaliana. These plants were used as a model system to build a comprehensive inventory of transcriptomic variations, which were triggered by photorespiratory H2O2 induced by high-light (HL) irradiance. In addition to an H2O2-dependent and -independent type of transcriptional response during light stress, microarray analysis on both control and transgenic catalase-deficient plants, exposed to 0, 3, 8, and 23 h of HL, revealed several specific regulatory patterns of gene expression. Thus, photorespiratory H2O2 has a direct impact on transcriptional programs in plants.
  81. Rombauts, S., Florquin, K., Lescot, M., Marchal, K., Rouzé, P., & Van de Peer, Y. (2003). Computational approaches to identify promoters and cis-regulatory elements in plant genomes. PLANT PHYSIOLOGY, 132(3), 1162–1176.
    The identification of promoters and their regulatory elements is one of the major challenges in bioinformatics and integrates comparative, structural, and functional genomics. Many different approaches have been developed to detect conserved motifs in a set of genes that are either coregulated or orthologous. However, although recent approaches seem promising, in general, unambiguous identification of regulatory elements is not straightforward. The delineation of promoters is even harder, due to its complex nature, and in silico promoter prediction is still in its infancy. Here, we review the different approaches that have been developed for identifying promoters and their regulatory elements. We discuss the detection of cis-acting regulatory elements using word-counting or probabilistic methods (so-called "search by signal" methods) and the delineation of promoters by considering both sequence content and structural features ("search by content" methods). As an example of search by content, we explored in greater detail the association of promoters with CpG islands. However, due to differences in sequence content, the parameters used to detect CpG islands in humans and other vertebrates cannot be used for plants. Therefore, a preliminary attempt was made to define parameters that could possibly define CpG and CpNpG islands in Arabidopsis, by exploring the compositional landscape around the transcriptional start site. To this end, a data set of more than 5,000 gene sequences was built, including the promoter region, the 5'-untranslated region, and the first introns and coding exons. Preliminary analysis shows that promoter location based on the detection of potential CpG/CpNpG islands in the Arabidopsis genome is not straightforward. Nevertheless, because the landscape of CpG/CpNpG islands differs considerably between promoters and introns on the one side and exons (whether coding or not) on the other, more sophisticated approaches can probably be developed for the successful detection of "putative" CpG and CpNpG islands in plants.
  82. Rombauts, S., Van de Peer, Y., & Rouzé, P. (2003). AFLPinSilico, simulating AFLP fingerprints. BIOINFORMATICS, 19(6), 776–777.
    A drawback of the Amplified Fragment Length Polymorphism (AFLP) fingerprinting method is the difficulty of correlating the different fragments with their DNA sequence. The AFLPinSilico application presented here simulates AFLP experiments run on either cDNA or genomic sequences, producing virtual fingerprints that allow high-throughput identification of AFLP fragments. The program also enables biologists to manage experiments through simulations done beforehand, thereby reducing the number of experiments that have to be run. AFLPinSilico is available through the web or as a stand-alone version, through a command line executable (available upon request, for any platform running Perl).
  83. De Bodt, S., Raes, J., Florquin, K., Rombauts, S., Rouzé, P., Theißen, G., & Van de Peer, Y. (2003). Genomewide structural annotation and evolutionary analysis of the type I MADS-box genes in plants. JOURNAL OF MOLECULAR EVOLUTION, 56(5), 573–586.
    The type I MADS-box genes constitute a largely unexplored subfamily of the extensively studied MADS-box gene family, well known for its role in flower development. Genes of the type I MADS-box subfamily possess the characteristic MADS box but are distinguished from type II MADS-box genes by the absence of the keratin-like box. In this in silico study, we have structurally annotated all 47 members of the type I MADS-box gene family in Arabidopsis thaliana and performed a thorough analysis of the C-terminal regions of the translated proteins. On the basis of conserved motifs in the C-terminal region, we could classify the gene family into three main groups, two of which could be further subdivided. Phylogenetic trees were inferred to study the evolutionary relationships within this large MADS-box gene subfamily. These suggest for plant type I genes a dynamic of evolution that is significantly different from the mode of both animal type I (SRF) and plant type II (MIKC-type) gene phylogeny. The presence of conserved motifs in the majority of these genes, the identification of Oryza sativa MADS-box type I homologues, and the detection of expressed sequence tags for Arabidopsis thaliana and other plant type I genes suggest that these genes are indeed of functional importance to plants. It is therefore even more intriguing that, from an experimental point of view, almost nothing is known about the function of these MADS-box type I genes.
  84. Breyne, P., Dreesen, R., Cannoot, B., Rombaut, D., Vandepoele, K., Rombauts, S., … Zabeau, M. (2003). Quantitative cDNA-AFLP analysis for genome-wide expression studies. MOLECULAR GENETICS AND GENOMICS, 269(2), 173–179.
    An improved cDNA-AFLP method for genome-wide expression analysis has been developed. We demonstrate that this method is an efficient tool for quantitative transcript profiling and a valid alternative to microarrays. Unique transcript tags, generated from reverse-transcribed messenger RNA by restriction enzymes, were screened through a series of selective PCR amplifications. Based on in silico analysis, an enzyme combination was chosen that ensures that at least 60% of all the mRNAs were represented by an informative sequence tag. The sensitivity and specificity of the method allow one to detect poorly expressed genes and distinguish between homologous sequences. Accurate gene expression profiles were determined by quantitative analysis of band intensities, and subtle differences in transcriptional activity were revealed. A detailed screen for cell cycle-modulated genes in tobacco demonstrates the usefulness of the technology for genome-wide expression analysis.
  85. Vlieghe, K., Vuylsteke, M., Florquin, K., Rombauts, S., Maes, S., Ormenese, S., … De Veylder, L. (2003). Microarray analysis of E2Fa-DPa-overexpressing plants uncovers a cross-talking genetic network between DNA replication and nitrogen assimilation. JOURNAL OF CELL SCIENCE, 116(20), 4249–4259.
    Previously we have shown that overexpression of the heterodimeric E2Fa-DPa transcription factor in Arabidopsis thaliana results in ectopic cell division, increased endoreduplication, and an early arrest in development. To gain a better insight into the phenotypic behavior of E2Fa-DPa transgenic plants and to identify E2Fa-DPa target genes, a transcriptomic microarray analysis was performed. Out of 4,390 unique genes, a total of 188 had a twofold or more up- (84) or down-regulated (104) expression level in E2Fa-DPa transgenic plants compared to wild-type lines. Detailed promoter analysis allowed the identification of novel E2Fa-DPa target genes, mainly involved in DNA replication. Secondarily induced genes encoded proteins involved in cell wall biosynthesis, transcription and signal transduction or had an unknown function. A large number of metabolic genes were modified as well, among which, surprisingly, many genes were involved in nitrate assimilation. Our data suggest that the growth arrest observed upon E2Fa-DPa overexpression results at least partly from a nitrogen drain to the nucleotide synthesis pathway, causing decreased synthesis of other nitrogen compounds, such as amino acids and storage proteins.
  86. Vandepoele, K., Raes, J., De Veylder, L., Rouzé, P., Rombauts, S., & Inzé, D. (2002). Genome-wide analysis of core cell cycle genes in Arabidopsis. PLANT CELL, 14(4), 903–916.
    With the help of different interacting proteins, cyclin-dependent kinases and cyclins regulate the progression through the eukaryotic cell cycle. A high-quality, homology-based annotation protocol was applied to determine the core cell cycle genes in the recently completed Arabidopsis genome sequence. In total, 61 genes were identified belonging to seven selected families of cell cycle regulators, of which 30 are new or corrections of the existing annotation. A new class of putative cell cycle regulators was found that probably are competitors of E2F/DP transcription factors, which mediate the G1-to-S progression. In addition, the existing nomenclature for cell cycle genes of Arabidopsis was updated, and the physical positions of all genes were compared with segmentally duplicated blocks in the genome, showing that 22 core cell cycle genes emerged through block duplications. This genome-wide analysis illustrates the complexity of the plant cell cycle machinery and provides a tool for elucidating the function of new family members in the future.
  87. Moreau, Y., Thijs, G., Marchal, K., De Smet, F., Mathys, J., Lescot, M., … De Moor, B. (2002). Integrating quality-based clustering of microarray data with Gibbs sampling for the discovery of regulatory motifs. JOBIM 2002 : Journées Ouvertes Biologie, Informatique, Mathématique. Presented at the Journées Ouvertes Biologie, Informatique, Mathématique 2002 (JOBIM 2002), St Malo, France.
  88. Lescot, M., Thijs, G., Rombauts, S., Déhais, P., Martin, D., Thieffry, D., … van Helden, J. (2002). Deciphering cis-acting regulatory elements in plant and drosophila promoter sequences. In J. Nicolas & C. Thermes (Eds.), JOBIM 2002 : journées ouvertes biologie, informatique, mathématique (pp. 349–350). Rocquencourt, France: INRIA.
  89. Rombauts, S., Lescot, M., Thijs, G., Marchal, K., Moreau, Y., Déhais, P., … Rouzé, P. (2002). The PlantCARE database and tools for in silico search of plant cis-acting regulatory elements. JOBIM 2002 : Journées Ouvertes Biologie, Informatique, Mathématique, 183–184.
  90. Breyne, P., Dreesen, R., Vandepoele, K., De Veylder, L., Van Breusegem, F., Callewaert, L., … Zabeau, M. (2002). Transcriptome analysis during cell division in plants. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 99(23), 14825–14830.
    Using synchronized tobacco Bright Yellow-2 cells and cDNA-amplified fragment length polymorphism-based genomewide expression analysis, we built a comprehensive collection of plant cell cycle-modulated genes. Approximately 1,340 periodically expressed genes were identified, including known cell cycle control genes as well as numerous unique candidate regulatory genes. A number of plant-specific genes were found to be cell cycle modulated. Other transcript tags were derived from unknown plant genes showing homology to cell cycle-regulatory genes of other organisms. Many of the genes encode novel or uncharacterized proteins, indicating that several processes underlying cell division are still largely unknown.
  91. Rensing, S. A., Rombauts, S., Van de Peer, Y., & Reski, R. (2002). Moss transcriptome and beyond.
    The ancient land plant Physcomitrella patens is a model system that is becoming increasingly important for plant functional genomics because gene knockouts can be produced with relative ease. Recently, several EST-sequencing projects have been launched as a first step towards a thorough functional characterization of the moss. However, for careful comparison with other plant model systems, the complete genomic sequence is needed as well as the transcriptome.
  92. Thijs, G., Marchal, K., Lescot, M., Rombauts, S., De Moor, B., Rouzé, P., & Moreau, Y. (2002). A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. JOURNAL OF COMPUTATIONAL BIOLOGY, 9(2), 447–464.
    Microarray experiments can reveal important information about transcriptional regulation. In our case, we look for potential promoter regulatory elements in the upstream region of coexpressed genes. Here we present two modifications of the original Gibbs sampling algorithm for motif finding (Lawrence et al., 1993). First, we introduce the use of a probability distribution to estimate the number of copies of the motif in a sequence. Second, we describe the technical aspects of the incorporation of a higher-order background model whose application we discussed in Thijs et al. (2001). Our implementation is referred to as the Motif Sampler. We successfully validate our algorithm on several data sets. First, we show results for three sets of upstream sequences containing known motifs: 1) the G-box light-response element in plants, 2) elements involved in methionine response in Saccharomyces cerevisiae, and 3) the FNR O2-responsive element in bacteria. We use these data sets to explain the influence of the parameters on the performance of our algorithm. Second, we show results for upstream sequences from four clusters of coexpressed genes identified in a microarray experiment on wounding in Arabidopsis thaliana. Several motifs could be matched to regulatory elements from plant defence pathways in our database of plant cis-acting regulatory elements (PlantCARE). Some other strong motifs do not have corresponding motifs in PlantCARE but are promising candidates for further analysis.
  93. Thijs, G., Moreau, Y., De Smet, F., Mathys, J., Lescot, M., Rombauts, S., … Marchal, K. (2002). INCLUSive: INtegrated Clustering, Upstream of sequence retrieval and motif Sampling. BIOINFORMATICS, 18(2), 331–332.
    INCLUSive allows automatic multistep analysis of microarray data (clustering and motif finding). The clustering algorithm (adaptive quality-based clustering) groups together genes with highly similar expression profiles. The upstream sequences of the genes belonging to a cluster are automatically retrieved from GenBank and can be fed directly into Motif Sampler, a Gibbs sampling algorithm that retrieves statistically over-represented motifs in sets of sequences, in this case upstream regions of co-expressed genes.
  94. Lescot, M., Déhais, P., Thijs, G., Marchal, K., Moreau, Y., Van de Peer, Y., … Rombauts, S. (2002). PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. NUCLEIC ACIDS RESEARCH, 30(1), 325–327.
    PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific transcription factor sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are given as well. New features have been implemented to search for plant cis-acting regulatory elements in a query sequence. Furthermore, links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes. New regulatory elements can be sent automatically and will be added to the database after curation.
  95. Thijs, G., Lescot, M., Marchal, K., Rombauts, S., De Moor, B., Rouzé, P., & Moreau, Y. (2001). A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. BIOINFORMATICS, 17(12), 1113–1122.
    Motivation: Transcriptome analysis allows detection and clustering of genes that are coexpressed under various biological circumstances. Under the assumption that coregulated genes share cis-acting regulatory elements, it is important to investigate the upstream sequences controlling the transcription of these genes. To improve the robustness of the Gibbs sampling algorithm to noisy data sets we propose an extension of this algorithm for motif finding with a higher-order background model. Results: Simulated data and real biological data sets with well-described regulatory elements are used to test the influence of the different background models on the performance of the motif detection algorithm. We show that the use of a higher-order model considerably enhances the performance of our motif finding algorithm in the presence of noisy data. For Arabidopsis thaliana, a reliable background model based on a set of carefully selected intergenic sequences was constructed.
  96. Boudolf, V., Rombauts, S., Naudts, M., Inzé, D., & De Veylder, L. (2001). Identification of novel cyclin-dependent kinases interacting with the CKS1 protein of Arabidopsis. JOURNAL OF EXPERIMENTAL BOTANY, 52(359), 1381–1382.
    The SUC1/CKS1 proteins interact with cyclin-dependent kinases (CDKs) and play an essential, but not yet entirely resolved, role in the regulation of the cell cycle. With the Arabidopsis thaliana CKS1At protein as bait in a two-hybrid screen, two novel Arabidopsis CDKs, Arath;CDKB1;2 and Arath;CDKB2;1, were isolated. A closely related homologue of Arath;CDKB2;1 was discovered in the databases and was nominated Arath;CDKB2;2. Transcript analysis of the five known Arath;CDKA and Arath;CDKB genes revealed that they all had the highest expression in flowers and cell suspensions. Differences in the expression patterns in roots, leaves and stems suggest unique roles for each CDK.
  97. Lescot, M., Rombauts, S., Thijs, G., Marchal, K., De Moor, B., Moreau, Y., & Rouzé, P. (2001). In silico search of plant cis-acting regulatory elements. In L. Duret, C. Gaspin, & T. Schiex (Eds.), JOBIM 2001 : journées ouvertes biologie, informatique, mathématique (pp. 227–228). Toulouse, France: Institut National de la Recherche Agronomique (INRA).
  98. Magyar, Z., Atanassova, A., De Veylder, L., Rombauts, S., & Inzé, D. (2000). Characterization of two distinct DP-related genes from Arabidopsis thaliana. FEBS LETTERS, 486(1), 79–87.
  99. Mathé, C., Déhais, P., Pavy, N., Rombauts, S., Van Montagu, M., & Rouzé, P. (2000). Gene prediction and gene classes in Arabidopsis thaliana. JOURNAL OF BIOTECHNOLOGY, 78(3), 293–299.
  100. Thijs, G., Rombauts, S., Lescot, M., Marchal, K., De Moor, B., Moreau, Y., & Rouzé, P. (2000). Detection of cis-acting regulatory elements in plants : a GIBBS sampling approach. Proceedings of the Second International Conference on Bioinformatics of Genome Regulation and Structure, 1, 118–121. Novosibirsk, Russia: Institute of Cytology and Genetics (ICG).
  101. Thijs, G., Moreau, Y., Rombauts, S., De Moor, B., & Rouzé, P. (1999). Recognition of gene regulatory sequences by bagging of neural networks. IEE Conference Publications, 470, 988–993. Edison, NJ, USA: Institute of Electrical Engineers INSPEC.
    We use an ensemble of multilayer perceptrons to build a model for a type of gene regulatory sequence called a G-box. A variant of the bagging method (bootstrap-and-aggregate) improves the performance of the ensemble over that of a single network. Through a decomposition of the generalization error of the ensemble into bias and variance components, we estimate this error from the hold-out samples of the individual networks. We test the model on putative G-boxes, on sequences upstream of light-regulated genes, and on a control group and demonstrate that the model separates these groups efficiently.
  102. Pavy, N., Mathé, C., Rombauts, S., & Rouzé, P. (1999). Génomique et bio-informatique. OCL-OLEAGINEUX CORPS GRAS LIPIDES, 6(2), 148–154.
    Genomics projects produce huge amounts of data of different kinds whose interpretation stimulated the development of bioinformatics, a recent discipline based on theoretical aspects of informatics and mathematics, as well as on biology. Bioinformatics enables the storing and organizing of genome-wide molecular data, provides tools to analyze them and to convert raw data into biological knowledge. We illustrate how the combination of data management and of sequence analysis tools has already brought fruitful perspectives for gene discovery.
  103. Rombauts, S., Déhais, P., Van Montagu, M., & Rouzé, P. (1999). PlantCARE, a plant cis-acting regulatory element database. NUCLEIC ACIDS RESEARCH, 27(1), 295–296.
    PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Besides the transcription motifs found on a sequence, it also offers a link to the EMBL entry that contains the full gene sequence as well as a description of the conditions in which a motif becomes functional. The information on these sites is given by matrices, consensus and individual site sequences on particular genes, depending on the available information.
  104. Terryn, N., Heijnen, L., De Keyser, A., Van Asseldonck, M., De Clercq, R., Verbakel, H., … Vos, P. (1999). Evidence for an ancient chromosomal duplication in Arabidopsis thaliana by sequencing and analyzing a 400-kb contig at the APETALA2 locus on chromosome 4. FEBS LETTERS, 445(2–3), 237–245.
  105. Pavy, N., Rombauts, S., Déhais, P., Mathé, C., Ramana, D. V., Leroy, P., & Rouzé, P. (1999). Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequences. BIOINFORMATICS, 15(11), 887–899.
    Motivation: The annotation of the Arabidopsis thaliana genome remains a problem in terms of time and quality. To improve the annotation process, we want to choose the most appropriate tools to use inside a computer-assisted annotation platform. We therefore need evaluation of prediction programs with Arabidopsis sequences containing multiple genes. Results: We have developed AraSet, a data set of contigs of validated genes, enabling the evaluation of multi-gene models for the Arabidopsis genome. Besides conventional metrics to evaluate gene prediction at the site and the exon levels, new measures were introduced for the prediction at the protein sequence level as well as for the evaluation of gene models. This evaluation method is of general interest and could apply to any new gene prediction software and to any eukaryotic genome. The GeneMark.hmm program appears to be the most accurate software at all three levels for the Arabidopsis genomic sequences. Gene modeling could be further improved by combination of prediction software.
  106. Rouzé, P., Pavy, N., & Rombauts, S. (1999). Genome annotation: which tools do we have for it? CURRENT OPINION IN PLANT BIOLOGY, 2(2), 90–95.
    Genome data have to be converted into knowledge to be useful to biologists. Many valuable computational tools have already been developed to help annotation of plant genome sequences, and these may be improved further, for example by identification of more gene regulatory elements. The lack of a standard computer-assisted annotation platform for eukaryotic genomes remains a major bottle-neck.
  107. Rouzé, P., Rombauts, S., Van Laere, G., Van Wiemeersch, L., & Van Montagu, M. (1996). Gene prediction in Arabidopsis thaliana: genomic sequences. ARCHIVES OF PHYSIOLOGY AND BIOCHEMISTRY, 104(3), B50–B50.