Zhen Li

Zhen Li — Assistant Professor
Joined the group in 2012

As a bioinformatician with a great passion for evolution, I want to understand the mechanisms of evolution and, where possible, apply that knowledge to solve practical problems. This passion has led me to two primary research interests. The first is to discover, by exploring novel plant genomes, how different plants have evolved and contributed to the extraordinary diversity of life on Earth. Since my master's studies, I have taken on various roles in several plant genome projects. This experience helped me develop skills in annotating genomes and analyzing sequencing data. These international genome projects also impressed upon me the importance of communication in modern science and trained my scientific communication skills.

My other research interest is polyploidy. Surprisingly, the genomes of many, if not all, flowering plants record a history of ancient polyploidization in their ancestors. Understanding the effects and processes of polyploidization is therefore essential to unveiling its role in creating evolutionary novelties and increasing species adaptation. In previous work, we used comparative genomics and phylogenomics, integrating large omics datasets, to study the different fates of duplicated genes after genome duplication. By comparing 37 angiosperm genomes, we identified genes in flowering plants that are tolerant or intolerant to gene and genome duplications. Natural and synthesized polyploids contain valuable information about both gene expression and natural selection on duplicated genes, which can help further elucidate the molecular mechanisms that operate after polyploidization. Integrating knowledge of gene duplicability with data from natural and synthesized polyploids is therefore an innovative approach to systematically investigating the retention of gene duplicates following polyploidization.

February 2022 - present: Assistant Professor, Department of Plant Biotechnology and Bioinformatics, Ghent University, Belgium
May 2018 - present: Postdoctoral researcher, Bioinformatics & Evolutionary Genomics, VIB - UGent Center for Plant Systems Biology, Belgium
September 2012 - April 2018: PhD student, Bioinformatics & Evolutionary Genomics, Department of Plant Systems Biology, VIB, Ghent, Belgium
September 2008 - June 2011: Master of Science in Bioinformatics, Laboratory of Computational Molecular Biology, College of Life Sciences, Beijing Normal University
September 2004 - June 2008: Bachelor of Science in Biology, College of Life Sciences, Beijing Normal University


  1. Ma, L., Liu, K.-W., Li, Z., Hsiao, Y.-Y., Qi, Y., Fu, T., … Liu, Z.-J. (2023). Diploid and tetraploid genomes of Acorus and the evolution of monocots. NATURE COMMUNICATIONS, 14(1). https://doi.org/10.1038/s41467-023-38829-3
    Monocots, a major taxon within flowering plants, have unique morphological traits and show an extraordinary diversity in lifestyle. To improve our understanding of monocot origin and evolution, we generate chromosome-level reference genomes of the diploid Acorus gramineus and the tetraploid Ac. calamus, the only two accepted species from the family Acoraceae, which form a sister lineage to all other monocots. Comparing the genomes of Ac. gramineus and Ac. calamus, we suggest that Ac. gramineus is not a potential diploid progenitor of Ac. calamus, and that Ac. calamus is an allotetraploid with two subgenomes, A and B, showing asymmetric evolution and B-subgenome dominance. Both the diploid genome of Ac. gramineus and subgenomes A and B of Ac. calamus show clear evidence of whole-genome duplication (WGD), but Acoraceae does not seem to share an older WGD that is shared by most other monocots. We reconstruct an ancestral monocot karyotype and gene toolkit, and discuss scenarios that explain the complex history of the Acorus genome. Our analyses show that the ancestors of monocots exhibit mosaic genomic features, likely important for traits that appeared in early monocot evolution, providing fundamental insights into the origin, evolution, and diversification of monocots. Acorales is sister to all other monocots and contains only one family with just one genus, Acorus. Here, the authors assemble the genomes of the diploid Ac. gramineus and the tetraploid Ac. calamus, reconstruct an ancestral monocot karyotype and gene toolkit, and discuss the origin and evolution of the two species and other monocots.
  2. Chen, H., Fang, Y., Zwaenepoel, A., Huang, S., Van de Peer, Y., & Li, Z. (2023). Revisiting ancient polyploidy in leptosporangiate ferns. NEW PHYTOLOGIST, 237(4), 1405–1417. https://doi.org/10.1111/nph.18607
    Ferns, and particularly homosporous ferns, have long been assumed to have experienced recurrent whole-genome duplication (WGD) events because of their substantially large genome sizes, surprisingly high chromosome numbers, and high degrees of polyploidy among many extant members. As the number of sequenced fern genomes is limited, recent studies have employed transcriptome data to find evidence for WGDs in ferns. However, they have reached conflicting results concerning the occurrence of ancient polyploidy, for instance, in the lineage of leptosporangiate ferns. Because identifying WGDs in a phylogenetic context is the foremost step in studying the contribution of ancient polyploidy to evolution, we here revisited earlier identified WGDs in leptosporangiate ferns, mainly the core leptosporangiate ferns, by building KS-age distributions and applying substitution rate corrections, and by conducting statistical gene tree – species tree reconciliation analyses. Our integrative analyses confidently identified four ancient WGDs in the sampled core leptosporangiate ferns but also identified false positives and false negatives for WGDs that recent studies have reported earlier. In conclusion, we underscore the significance of substitution rate corrections and uncertainties in gene tree – species tree reconciliations in calling WGD events and advance an exemplar workflow to overcome such often-overlooked issues.
  3. Xue, J., Li, Z., Hu, S., Kao, S.-M., Zhao, T., Wang, J., … Van de Peer, Y. (2023). The Saururus chinensis genome provides insights into the evolution of pollination strategies and herbaceousness in magnoliids. PLANT JOURNAL, 113, 1021–1034. https://doi.org/10.1111/tpj.16097
    Saururus chinensis, an herbaceous magnoliid without perianth, represents a clade of early-diverging angiosperms that have gone through a woodiness-herbaceousness transition and pollination obstacles: the characteristic white leaves beneath the inflorescence at flowering time are considered to be a substitute for the perianth to attract insect pollinators. Here, using the newly sequenced S. chinensis genome, we revisited the phylogenetic position of magnoliids within mesangiosperms, and recovered a sister relationship for magnoliids and Chloranthales. By considering differentially expressed genes, we identified candidate genes that are involved in the morphogenesis of the white leaves in S. chinensis. Among those genes, we verified, in a transgenic experiment with Arabidopsis, that increasing the expression of the 'pseudo-etiolation in light' gene (ScPEL) can inhibit the biosynthesis of chlorophyll. ScPEL is thus likely responsible for the switch between green and white leaves, suggesting that changes in gene expression may underlie the evolution of pollination strategies. Despite being an herbaceous plant, S. chinensis still has vascular cambium and maintains the potential for secondary growth as a woody plant, because the necessary machinery, i.e., the entire gene set involved in lignin biosynthesis, is well preserved. However, similar expression levels of two key genes (CCR and CAD) in the lignin biosynthesis pathway between the stem and other tissues are possibly associated with the herbaceous nature of S. chinensis. In conclusion, the S. chinensis genome provides valuable insights into the adaptive evolution of pollination in Saururaceae and reveals a possible mechanism for the evolution of herbaceousness in magnoliids.
  4. Chang, J., Duong, T. A., Schoeman, C., Ma, X., Roodt, D., Barker, N., … Mizrachi, E. (2023). The genome of the king protea, Protea cynaroides. PLANT JOURNAL, 113(2), 262–276. https://doi.org/10.1111/tpj.16044
    The king protea (Protea cynaroides), an early-diverging eudicot, is the most iconic species from the megadiverse Cape Floristic Region, and the national flower of South Africa. Perhaps best known for its iconic flower head, Protea is a key genus for the South African horticulture industry and cut-flower market. Ecologically, the genus and the family Proteaceae are important models for radiation and adaptation, particularly to soils with limited phosphorus bio-availability. Here, we present a high-quality chromosome-scale assembly of the P. cynaroides genome as the first representative of the Fynbos biome. We reveal an ancestral whole-genome duplication (WGD) event that occurred in the Proteaceae around the late Cretaceous, preceding the divergence of all crown groups within the family and its extant diversity in all Southern continents. The relatively stable genome structure of P. cynaroides is invaluable for comparative studies and for unveiling paleopolyploidy in other groups, such as the distantly related sister group Ranunculales. Comparative genomics in sequenced genomes of the Proteales shows loss of key arbuscular mycorrhizal symbiosis genes, likely ancestral to the family, and possibly the order. The P. cynaroides genome empowers new research in plant diversification, horticulture, and adaptation, particularly to nutrient-poor soils.
  5. Hao, F., Liu, X., Zhou, B., Tian, Z., Zhou, L., Zong, H., … Cai, J. (2023). Chromosome-level genomes of three key Allium crops and their trait evolution. NATURE GENETICS, 55(11), 1976–1986. https://doi.org/10.1038/s41588-023-01546-0
    Allium crop breeding remains severely hindered due to the lack of high-quality reference genomes. Here we report high-quality chromosome-level genome assemblies for three key Allium crops (Welsh onion, garlic and onion), which are 11.17 Gb, 15.52 Gb and 15.78 Gb in size with the highest recorded contig N50 of 507.27 Mb, 109.82 Mb and 81.66 Mb, respectively. Beyond revealing the genome evolutionary process of Allium species, our pathogen infection experiments and comparative metabolomic and genomic analyses showed that genes encoding enzymes involved in the metabolic pathway of Allium-specific flavor compounds may have evolved from an ancient uncharacterized plant defense system widely existing in many plant lineages but extensively boosted in alliums. Using in situ hybridization and spatial RNA sequencing, we obtained an overview of cell-type categorization and gene expression changes associated with spongy mesophyll cell expansion during onion bulb formation, thus indicating the functional roles of bulb formation genes.
  6. Li, J., Van de Peer, Y., & Li, Z. (2023). Inference of ancient polyploidy using transcriptome data. In Y. Van de Peer (Ed.), Polyploidy: Methods and Protocols (Vol. 2545, pp. 47–76). https://doi.org/10.1007/978-1-0716-2561-3_3
    Polyploidizations, or whole-genome duplications (WGDs), in plants have increased biological complexity, facilitated evolutionary innovation, and likely enabled adaptation under harsh conditions. Besides genomic data, transcriptome data have been widely employed to detect WGDs, due to their efficient accessibility to the gene space of a species. Age distributions based on synonymous substitutions (so-called KS age distributions) for paralogs assembled from transcriptome data have identified numerous WGDs in plants, paving the way for further studies on the importance of WGDs for the evolution of seed and flowering plants. However, it is still unclear how transcriptome-based age distributions compare to those based on genomic data. In this chapter, we implemented three different de novo transcriptome assembly pipelines with two popular assemblers, i.e., Trinity and SOAPdenovo-Trans. We selected six plant species with published genomes and transcriptomes to evaluate how assembled transcripts from different pipelines perform when using KS distributions to detect previously documented WGDs in the six species. Further, using genes predicted in each genome as references, we evaluated the effects of missing genes, gene family clustering, and de novo assembled transcripts on the transcriptome-based KS distributions. Our results show that, although the transcriptome-based KS distributions differ from the genome-based ones with respect to their shapes and scales, they are still reasonably reliable for unveiling WGDs, except in species where most duplicates originated from a recent WGD. We also discuss how to overcome some possible pitfalls when using transcriptome data to identify WGDs.
  7. Qiu, Y., Li, Z., Walther, D., & Köhler, C. (2023). Updated phylogeny and protein structure predictions revise the hypothesis on the origin of MADS-box transcription factors in land plants. MOLECULAR BIOLOGY AND EVOLUTION, 40(9). https://doi.org/10.1093/molbev/msad194
    MADS-box transcription factors (TFs), among the first TFs extensively studied, exhibit a wide distribution across eukaryotes and play diverse functional roles. Varying by domain architecture, MADS-box TFs in land plants are categorized into Type I (M-type) and Type II (MIKC-type). Type I and II genes have been considered orthologous to the SRF and MEF2 genes in animals, respectively, presumably originating from a duplication before the divergence of eukaryotes. Here, we exploited the increasing availability of eukaryotic MADS-box sequences and reassessed their evolution. While supporting the ancient duplication giving rise to SRF- and MEF2-types, we found that Type I and II genes originated from the MEF2-type genes through another duplication in the most recent common ancestor (MRCA) of land plants. Protein structures predicted by AlphaFold2 and OmegaFold support our phylogenetic analyses, with plant Type I and II TFs resembling the MEF2-type structure, rather than SRFs. We hypothesize that the ancestral SRF-type TFs were lost in the MRCA of Archaeplastida (the kingdom Plantae sensu lato). The retained MEF2-type TFs acquired a Keratin-like domain and became MIKC-type before the divergence of Streptophyta. Subsequently in the MRCA of land plants, M-type TFs evolved from a duplicated MIKC-type precursor through loss of the Keratin-like domain, leading to the Type I clade. Both Type I and II TFs expanded and functionally differentiated in concert with the increasing complexity of land plant body architecture. The recruitment of these originally stress-responsive TFs into developmental programs, including those underlying reproduction, may have facilitated the adaptation to the terrestrial environment.
  8. Qi, W., Chen, J., Han, Y., Li, Z., Su, X., & Yeo, F. K. S. (2023). Editorial: Omics-driven crop improvement for stress tolerance. FRONTIERS IN PLANT SCIENCE, 14. https://doi.org/10.3389/fpls.2023.1172228
  9. Xu, Z., Li, Z., Ren, F., Gao, R., Wang, Z., Zhang, J., … Song, J. (2022). The genome of Corydalis reveals the evolution of benzylisoquinoline alkaloid biosynthesis in Ranunculales. PLANT JOURNAL, 111(1), 217–230. https://doi.org/10.1111/tpj.15788
    Species belonging to the order Ranunculales have attracted much attention because of their phylogenetic position as a sister group to all other eudicot lineages and their ability to produce unique yet diverse benzylisoquinoline alkaloids (BIAs). The Papaveraceae family in Ranunculales is often used as a model system for studying BIA biosynthesis. Here, we report the chromosome-level genome assembly of Corydalis tomentella, a species of Fumarioideae, one of the two subfamilies of Papaveraceae. Based on comparisons among sequenced Ranunculalean species, we present clear evidence of a shared whole-genome duplication (WGD) event that occurred before the divergence of Ranunculales but after its divergence from other eudicot lineages. The C. tomentella genome enabled us to integrate isotopic labelling and comparative genomics to reconstruct the BIA biosynthetic pathway for both sanguinarine biosynthesis shared by papaveraceous species and the cavidine biosynthesis specific to Corydalis. Also, our comparative analysis revealed that gene duplications, especially tandem gene duplications, underlie the diversification of BIA biosynthetic pathways in Ranunculales. In particular, tandemly duplicated berberine bridge enzyme-like genes appear to be involved in cavidine biosynthesis. In conclusion, our study of the C. tomentella genome provides important insights into the occurrence of WGDs during the early evolution of eudicots as well as into the evolution of BIA biosynthesis in Ranunculales.
  10. Sun, C., Xie, Y., Li, Z., Liu, Y., Sun, X., Li, J., … Zhang, S. (2022). The Larix kaempferi genome reveals new insights into wood properties. JOURNAL OF INTEGRATIVE PLANT BIOLOGY, 64(7), 1364–1373. https://doi.org/10.1111/jipb.13265
    Here, through single-molecule real-time (SMRT) sequencing, we present a high-quality genome sequence of the Japanese larch (Larix kaempferi), a conifer species with great value for wood production and ecological afforestation. The assembled genome is 10.97 Gb in size and harbors 45,828 protein-coding genes. Repeat sequences make up 66.8% of the genome; LTR-RTs dominate these repeats, accounting for 69.86% of them. We find that tandem duplications have been responsible for the expansion of genes involved in transcriptional regulation and stress responses, unveiling their crucial roles in adaptive evolution. Population transcriptome analysis reveals that lignin content in L. kaempferi is mainly determined by the process of monolignol polymerization. The expression levels of six genes (LkCOMT7, LkCOMT8, LkLAC23, LkLAC102, LkPRX148 and LkPRX166) are significantly positively correlated with lignin content. These results indicate that increased expression of these six genes might be responsible for the high lignin content of larch wood. Overall, this study provides new genome resources for investigating the evolution and biological function of conifer trees, and also offers new insights into the wood properties of larches.
  11. Li, M.-H., Liu, K.-W., Li, Z., Lu, H.-C., Ye, Q.-L., Zhang, D., … Liu, Z.-J. (2022). Genomes of leafy and leafless Platanthera orchids illuminate the evolution of mycoheterotrophy. NATURE PLANTS, 8(4), 373–388. https://doi.org/10.1038/s41477-022-01127-9
    Analyses of the genome sequences and expression data for two closely related mycoheterotrophic orchid species provide insights into the genomic basis underlying the evolution of mycoheterotrophy. To improve our understanding of the origin and evolution of mycoheterotrophic plants, we here present the chromosome-scale genome assemblies of two sibling orchid species: partially mycoheterotrophic Platanthera zijinensis and holomycoheterotrophic Platanthera guangdongensis. Comparative analysis shows that mycoheterotrophy is associated with increased substitution rates and gene loss, and the deletion of most photoreceptor genes and auxin transporter genes might be linked to the unique phenotypes of fully mycoheterotrophic orchids. Conversely, trehalase genes that catalyse the conversion of trehalose into glucose have expanded in most sequenced orchids, in line with the fact that the germination of orchid non-endosperm seeds needs carbohydrates from fungi during the protocorm stage. We further show that the mature plant of P. guangdongensis, different from photosynthetic orchids, keeps expressing trehalase genes to hijack trehalose from fungi. Therefore, we propose that mycoheterotrophy in mature orchids is a continuation of the protocorm stage by sustaining the expression of trehalase genes. Our results shed light on the molecular mechanism underlying initial, partial and full mycoheterotrophy.
  12. Xu, H., Li, Z., Jiang, P., Zhao, L., Qu, C., Van de Peer, Y., … Zeng, Q. (2022). Divergence of active site motifs among different classes of Populus glutaredoxins results in substrate switches. PLANT JOURNAL, 110(1), 129–146. https://doi.org/10.1111/tpj.15660
    Enzymes are essential components of all biological systems. The key characteristics of proteins functioning as enzymes are their substrate specificities and catalytic efficiencies. In plants, most genes encoding enzymes are members of large gene families. Within such families, the contribution of active site motifs to the functional divergence of duplicate genes has not been well elucidated. In this study, we identified 41 glutaredoxin (GRX) genes in the Populus trichocarpa genome. GRXs are ubiquitous enzymes in plants that play important roles in developmental and stress-tolerance processes. In poplar, GRX genes were divided into four classes based on clear differences in gene structure and expression pattern, subcellular localization, enzymatic activity, and substrate specificity of the encoded proteins. Using site-directed mutagenesis, this study revealed that the divergence of the active site motif among different classes of GRX proteins resulted in substrate switches, and thus provides new insights into the molecular evolution of these important plant enzymes.
  13. Fang, Y., Qin, X., Liao, Q., Du, R., Luo, X., Zhou, Q., … Yan, J. (2022). The genome of homosporous maidenhair fern sheds light on the euphyllophyte evolution and defences. NATURE PLANTS, 8(9), 1024–1037. https://doi.org/10.1038/s41477-022-01222-x
    Euphyllophytes encompass almost all extant plants, including two sister clades, ferns and seed plants. Decoding fern genomes is key to gaining deep insight into the origin of euphyllophytes and the evolution of seed plants. Here we report a chromosome-level genome assembly of Adiantum capillus-veneris L., a model homosporous fern. This fern genome comprises 30 pseudochromosomes with a size of 4.8 Gb and a contig N50 length of 16.22 Mb. Gene co-expression network analysis uncovered that homospore development in ferns has relatively high genetic similarity to pollen development in seed plants. Analysing the fern defence response expands our understanding of the evolution and diversity of endogenous bioactive jasmonates in plants. Moreover, comparing fern genomes with those of other land plants reveals changes in gene families important for the evolutionary novelties within the euphyllophyte clade. These results lay a foundation for studies on fern genome evolution and function, as well as the origin and evolution of euphyllophytes.
  14. Xue, J., Dong, S., Wang, M., Song, T., Zhou, G., Li, Z., … Hang, Y. (2022). Mitochondrial genes from 18 angiosperms fill sampling gaps for phylogenomic inferences of the early diversification of flowering plants. JOURNAL OF SYSTEMATICS AND EVOLUTION, 60(4), 773–788. https://doi.org/10.1111/jse.12708
    The early diversification of angiosperms is thought to have been a rapid process, which may complicate phylogenetic analyses of early angiosperm relationships. Plastid and nuclear phylogenomic studies have raised several conflicting hypotheses regarding overall angiosperm phylogeny, but mitochondrial genomes have been largely ignored as a relevant source of information. Here we sequenced mitochondrial genomes from 18 angiosperms to fill taxon‐sampling gaps in Austrobaileyales, magnoliids, Chloranthales, Ceratophyllales, and major lineages of eudicots and monocots. We assembled a data matrix of 38 mitochondrial genes from 107 taxa to assess how well mitochondrial genomic data address current uncertainties in angiosperm relationships. Although we recovered conflicting phylogenies based on different datasets and analytical methods, we also observed congruence regarding deep relationships of several major angiosperm lineages: Chloranthales were always inferred to be the sister group of Ceratophyllales, Austrobaileyales the sister group of mesangiosperms, and the unplaced Dilleniales were consistently resolved as sister to superasterids. Substitutional saturation, GC‐compositional heterogeneity, and codon‐usage bias are possible sources of the noise/conflict that may impact phylogenetic reconstruction, although angiosperm mitochondrial genes may not be substantially affected by these factors. The third codon positions of the mitochondrial genes appear to contain more parsimony‐informative sites than the first and second codon positions, and therefore produced better resolved phylogenetic relationships with generally strong support. The relationships among these major lineages remain incompletely resolved, perhaps as a result of the rapidity of early radiations. Nevertheless, data from mitochondrial genomes provide additional evidence and alternative hypotheses for exploring the early evolution and diversification of the angiosperms.
  15. Sun, W., Li, Z., Xiang, S., Ni, L., Zhang, D., Chen, D., … Zou, S. (2021). The Euscaphis japonica genome and the evolution of malvids. PLANT JOURNAL, 108(5), 1382–1399. https://doi.org/10.1111/tpj.15518
    Malvids, one of the largest clades of rosids, include 58 families and exhibit remarkable morphological and ecological diversity. Here, we report a high-quality chromosome-level genome assembly for Euscaphis japonica, an early-diverging species within malvids. Genome-based phylogenetic analysis suggests that the unstable phylogenetic position of E. japonica may result from incomplete lineage sorting and hybridization events during the diversification of the ancestral population of malvids. Euscaphis japonica experienced two polyploidization events: the ancient whole-genome triplication event shared with most eudicots (commonly known as the γ event) and a more recent whole-genome duplication event unique to E. japonica. By resequencing 101 samples from 11 populations, we speculate that temperature has driven the differentiation of evergreen and deciduous E. japonica and the completely different population histories of these two groups. In total, 1012 candidate positively selected genes were detected in the evergreen group, some of which are involved in flower and fruit development. We found that reddening and dehiscence of the E. japonica pericarp and a long fruit-hanging time promoted the reproduction of E. japonica populations, and revealed the expression patterns of genes related to fruit reddening, dehiscence and abscission. The key genes involved in pentacyclic triterpene synthesis in E. japonica were identified, and different expression patterns of these genes may contribute to pentacyclic triterpene diversification. Our work sheds light on the evolution of E. japonica and malvids, particularly on the diversification of E. japonica and the genetic basis for its fruit dehiscence and abscission.
  16. Wan, T., Liu, Z., Leitch, I. J., Xin, H., Maggs-Kölling, G., Gong, Y., … Wang, Q. (2021). The Welwitschia genome reveals a unique biology underpinning extreme longevity in deserts. NATURE COMMUNICATIONS, 12(1). https://doi.org/10.1038/s41467-021-24528-4
    The gymnosperm Welwitschia mirabilis belongs to the ancient, enigmatic gnetophyte lineage. It is a unique desert plant with extreme longevity and two ever-elongating leaves. We present a chromosome-level assembly of its genome (6.8 Gb/1C) together with methylome and transcriptome data to explore its astonishing biology. We also present a refined, high-quality assembly of Gnetum montanum to enhance our understanding of gnetophyte genome evolution. The Welwitschia genome has been shaped by a lineage-specific ancient whole-genome duplication (~86 million years ago) and, more recently (1-2 million years ago), by bursts of retrotransposon activity. High levels of cytosine methylation (particularly at CHH motifs) are associated with retrotransposons, whilst long-term deamination has resulted in an exceptionally GC-poor genome. Changes in copy number and/or expression of gene families and transcription factors (e.g. R2R3MYB, SAUR) controlling cell growth, differentiation and metabolism underpin the plant's longevity and tolerance to temperature, nutrient and water stress.
  17. Cheng, S.-P., Jia, K.-H., Liu, H., Zhang, R.-G., Li, Z.-C., Zhou, S.-S., … Mao, J.-F. (2021). Haplotype-resolved genome assembly and allele-specific gene expression in cultivated ginger. HORTICULTURE RESEARCH, 8(1). https://doi.org/10.1038/s41438-021-00599-8
    Ginger (Zingiber officinale) is one of the most valued spice plants worldwide; it is prized for its culinary and folk medicinal applications and is therefore of high economic and cultural importance. Here, we present a haplotype-resolved, chromosome-scale assembly for diploid ginger anchored to 11 pseudochromosome pairs with a total length of 3.1 Gb. Remarkable structural variation was identified between haplotypes, and two inversions larger than 15 Mb on chromosome 4 may be associated with ginger infertility. We performed a comprehensive, spatiotemporal, genome-wide analysis of allelic expression patterns, revealing that most alleles are coordinately expressed. The alleles that exhibited the largest differences in expression showed closer proximity to transposable elements, greater coding sequence divergence, more relaxed selection pressure, and more transcription factor binding site differences. We also predicted the transcription factors potentially regulating 6-gingerol biosynthesis. Our allele-aware assembly provides a powerful platform for future functional genomics, molecular breeding, and genome editing in ginger.
  18. Cao, Y.-L., Li, Y., Fan, Y.-F., Li, Z., Yoshida, K., Wang, J.-Y., … Liu, Z.-J. (2021). Wolfberry genomes and the evolution of Lycium (Solanaceae). COMMUNICATIONS BIOLOGY, 4(1). https://doi.org/10.1038/s42003-021-02152-8
    Wolfberry Lycium, an economically important genus of the Solanaceae family, contains approximately 80 species and shows a fragmented distribution pattern among the Northern and Southern Hemispheres. Although several herbaceous species of Solanaceae have been subjected to genome sequencing, thus far, no genome sequences of woody representatives have been available. Here, we sequenced the genomes of 13 perennial woody species of Lycium, with a focus on Lycium barbarum. Integration with other genomes provides clear evidence supporting a whole-genome triplication (WGT) event shared by all hitherto sequenced solanaceous plants, which occurred shortly after the divergence of Solanaceae and Convolvulaceae. We identified new gene families and gene family expansions and contractions that first appeared in Solanaceae. Based on the identification of self-incompatibility-related gene families, we inferred that hybridization hotspots are enriched for genes that might function in gametophytic self-incompatibility pathways in wolfberry. Extremely low expression of LOCULE NUMBER (LC) and COLORLESS NON-RIPENING (CNR) orthologous genes during Lycium fruit development and ripening suggests functional diversification of these two genes between Lycium and tomato. The existence of additional FLOWERING LOCUS C-like MADS-box genes might correlate with the perennial flowering cycle of Lycium. Differential expression of genes involved in the lignin biosynthetic pathway between Lycium and tomato likely illustrates woody and herbaceous differentiation. We also provide evidence that Lycium migrated from Africa into Asia, and subsequently from Asia into North America. Our results provide functional insights into Solanaceae origins, evolution and diversification. Cao, Li, et al. sequence 13 perennial woody plant species of Lycium, and specifically provide a draft assembly of L. ruthenicum and a chromosome-level assembly of L. barbarum, the wolfberry or Goji berry. From a phylogenetic tree the authors identify an ancient hexaploidization event, and report the evolution of gene families including fruit ripening, fruit coloration, polysaccharide synthesis and self-incompatibility within Solanaceae and the general biogeography of L. barbarum.
  19. Zhao, T., Zwaenepoel, A., Xue, J.-Y., Kao, S.-M., Li, Z., Schranz, M. E., & Van de Peer, Y. (2021). Whole-genome microsynteny-based phylogeny of angiosperms. NATURE COMMUNICATIONS, 12(1). https://doi.org/10.1038/s41467-021-23665-0
    Plant genomes vary greatly in size, organization, and architecture. Such structural differences may be highly relevant for inference of genome evolution dynamics and phylogeny. Indeed, microsynteny (the conservation of local gene content and order) is recognized as a valuable source of phylogenetic information, but its use for the inference of large phylogenies has been limited. Here, by combining synteny network analysis, matrix representation, and maximum likelihood phylogenetic inference, we provide a way to reconstruct phylogenies based on microsynteny information. Both simulations and use of empirical data sets show our method to be accurate, consistent, and widely applicable. As an example, we focus on the analysis of a large-scale whole-genome data set for angiosperms, including more than 120 available high-quality genomes, representing more than 50 different plant families and 30 orders. Our 'microsynteny-based' tree is largely congruent with phylogenies proposed based on more traditional sequence alignment-based methods and current phylogenetic classifications but differs for some long-contested and controversial relationships. For instance, our synteny-based tree finds Vitales as early diverging eudicots, Saxifragales within superasterids, and magnoliids as sister to monocots. We discuss how synteny-based phylogenetic inference can complement traditional methods and could provide additional insights into some long-standing controversial phylogenetic relationships. Molecular phylogenies are traditionally based on sequence variation, but genome rearrangements also contain phylogenetic information. Here, Zhao et al. develop an approach to reconstruct phylogenies based on microsynteny and illustrate it with a reconstruction of the angiosperm phylogeny.
  20. Li, Z., & Van de Peer, Y. (2021). A non-duplicated magnoliid genome. https://doi.org/10.1038/s41477-021-00989-9
  21. Ai, Y., Li, Z., Sun, W.-H., Chen, J., Zhang, D., Ma, L., … Liu, Z.-J. (2021). The Cymbidium genome reveals the evolution of unique morphological traits. HORTICULTURE RESEARCH, 8(1). https://doi.org/10.1038/s41438-021-00683-z
    The marvelously diverse Orchidaceae constitutes the largest family of angiosperms. The genus Cymbidium in Orchidaceae is well known for its unique vegetation, floral morphology, and flower scent traits. Here, a chromosome-scale assembly of the genome of Cymbidium ensifolium (Jianlan) is presented. Comparative genomic analysis showed that C. ensifolium has experienced two whole-genome duplication (WGD) events, the most recent of which was shared by all orchids, while the older event was the tau event shared by most monocots. The results of MADS-box gene analysis provided support for establishing a unique gene model of orchid flower development regulation, and flower shape mutations in C. ensifolium were shown to be associated with the abnormal expression of MADS-box genes. The most abundant floral scent components identified included methyl jasmonate, acacia alcohol and linalool, and the genes involved in the floral scent component network of C. ensifolium were determined. Furthermore, the decreased expression of photosynthesis-antennae and photosynthesis metabolic pathway genes in leaves was shown to result in colorful striped leaves, while the increased expression of MADS-box genes in leaves led to perianth-like leaves. Our results provide fundamental insights into orchid evolution and diversification.
  22. Verlinden, H., Sterck, L., Li, J., Li, Z., Yssel, A., Gansemans, Y., … Vanden Broeck, J. (2020). First draft genome assembly of the desert locust, Schistocerca gregaria. F1000RESEARCH, 9. https://doi.org/10.12688/f1000research.25148.1
    Background: At the time of publication, the most devastating desert locust crisis in decades is affecting East Africa, the Arabian Peninsula and South-West Asia. The situation is extremely alarming in East Africa, where Kenya, Ethiopia and Somalia face an unprecedented threat to food security and livelihoods. Most of the time, however, locusts do not occur in swarms, but live as relatively harmless solitary insects. The phenotypically distinct solitarious and gregarious locust phases differ markedly in many aspects of behaviour, physiology and morphology, making them an excellent model to study how environmental factors shape behaviour and development. A better understanding of the extreme phenotypic plasticity in desert locusts will offer new, more environmentally sustainable ways of fighting devastating swarms. Methods: High molecular weight DNA derived from two adult males was used for Mate Pair and Paired End Illumina sequencing and PacBio sequencing. A reliable reference genome of Schistocerca gregaria was assembled using the ABySS pipeline, scaffolding was improved using LINKS. Results: In total, 1,316 Gb Illumina reads and 112 Gb PacBio reads were produced and assembled. The resulting draft genome consists of 8,817,834,205 bp organised in 955,015 scaffolds with an N50 of 157,705 bp, making the desert locust genome the largest insect genome sequenced and assembled to date. In total, 18,815 protein-encoding genes are predicted in the desert locust genome, of which 13,646 (72.53%) obtained at least one functional assignment based on similarity to known proteins. Conclusions: The desert locust genome data will contribute greatly to studies of phenotypic plasticity, physiology, neurobiology, molecular ecology, evolutionary genetics and comparative genomics, and will promote the desert locust’s use as a model system. The data will also facilitate the development of novel, more sustainable strategies for preventing or combating swarms of these infamous insects.
  23. Zhang, L., Chen, F., Zhang, X., Li, Z., Zhao, Y., Lohaus, R., … Tang, H. (2020). The water lily genome and the early evolution of flowering plants. NATURE, 577(7788), 79–84. https://doi.org/10.1038/s41586-019-1852-5
    Water lilies belong to the angiosperm order Nymphaeales. Amborellales, Nymphaeales and Austrobaileyales together form the so-called ANA-grade of angiosperms, which are extant representatives of lineages that diverged the earliest from the lineage leading to the extant mesangiosperms. Here we report the 409-megabase genome sequence of the blue-petal water lily (Nymphaea colorata). Our phylogenomic analyses support Amborellales and Nymphaeales as successive sister lineages to all other extant angiosperms. The N. colorata genome and 19 other water lily transcriptomes reveal a Nymphaealean whole-genome duplication event, which is shared by Nymphaeaceae and possibly Cabombaceae. Among the genes retained from this whole-genome duplication are homologues of genes that regulate flowering transition and flower development. The broad expression of homologues of floral ABCE genes in N. colorata might support a similarly broadly active ancestral ABCE model of floral organ determination in early angiosperms. Water lilies have evolved attractive floral scents and colours, which are features shared with mesangiosperms, and we identified their putative biosynthetic genes in N. colorata. The chemical compounds and biosynthetic genes behind floral scents suggest that they have evolved in parallel to those in mesangiosperms. Because of its unique phylogenetic position, the N. colorata genome sheds light on the early evolution of angiosperms.
  24. de María, N., Guevara, M. Á., Perdiguero, P., Vélez, M. D., Cabezas, J. A., López‐Hinojosa, M., … Cervera, M. T. (2020). Molecular study of drought response in the Mediterranean conifer Pinus pinaster Ait. : differential transcriptomic profiling reveals constitutive water deficit‐independent drought tolerance mechanisms. ECOLOGY AND EVOLUTION, 10(18), 9788–9807. https://doi.org/10.1002/ece3.6613
    Adaptation of long‐living forest trees to respond to environmental changes is essential to secure their performance under adverse conditions. Water deficit is one of the most significant stress factors determining tree growth and survival. Maritime pine (Pinus pinaster Ait.), the main source of softwood in southwestern Europe, is subjected to recurrent drought periods which, according to climate change predictions for the years to come, will progressively increase in the Mediterranean region. The mechanisms regulating pine adaptive responses to environment are still largely unknown. The aim of this work was to go a step further in understanding the molecular mechanisms underlying maritime pine response to water stress and drought tolerance at the whole plant level. A global transcriptomic profiling of roots, stems, and needles was conducted to analyze the performance of siblings showing contrasting responses to water deficit from an ad hoc designed full‐sib family. Although P. pinaster is considered a recalcitrant species for vegetative propagation in adult phase, the analysis was conducted using vegetatively propagated trees exposed to two treatments: well‐watered and moderate water stress. The comparative analyses led us to identify organ‐specific genes, constitutively expressed as well as differentially expressed when comparing control versus water stress conditions, in drought‐sensitive and drought‐tolerant genotypes. Different response strategies could be distinguished, with tolerant individuals being pre‐adapted for coping with drought by constitutively expressing stress‐related genes that are detected only in later stages in sensitive individuals subjected to drought.
  25. Tyrmi, J. S., Vuosku, J., Acosta, J. J., Li, Z., Sterck, L., Cervera, M. T., … Pyhäjärvi, T. (2020). Genomics of clinal local adaptation in Pinus sylvestris under continuous environmental and spatial genetic setting. G3-GENES GENOMES GENETICS, 10(8), 2683–2696. https://doi.org/10.1534/g3.120.401285
    Understanding the consequences of local adaptation for genomic diversity is a central goal in evolutionary genetics of natural populations. In species with large continuous geographical distributions the phenotypic signal of local adaptation is frequently clear, but the genetic basis often remains elusive. We examined the patterns of genetic diversity in Pinus sylvestris, a keystone species in many Eurasian ecosystems with a huge distribution range and decades of forestry research showing that it is locally adapted to the vast range of environmental conditions. Making P. sylvestris an even more attractive subject of local adaptation study, population structure has been shown to be weak, both previously and in this study. However, little is known about the molecular genetic basis of adaptation, as the massive size of gymnosperm genomes has prevented large scale genomic surveys. We generated a dataset that is extensive both geographically and genomically, using a targeted sequencing approach. By applying divergence-based and landscape genomics methods we identified several loci contributing to local adaptation, but only a few with large allele frequency changes across latitude. We also discovered a very large (ca. 300 Mbp) putative inversion potentially under selection, which to our knowledge is the first such discovery in conifers. Our results call for more detailed analysis of structural variation in relation to genomic basis of local adaptation, emphasize the lack of large effect loci contributing to local adaptation in the coding regions and thus point out the need for more attention toward multi-locus analysis of polygenic adaptation.
  26. Li, L., Wang, S., Wang, H., Sahu, S. K., Marin, B., Li, H., … Liu, H. (2020). The genome of Prasinoderma coloniale unveils the existence of a third phylum within green plants. NATURE ECOLOGY & EVOLUTION, 4(9), 1220–1231. https://doi.org/10.1038/s41559-020-1221-7
    Genome analysis of the pico-eukaryotic marine green alga Prasinoderma coloniale CCMP 1413 unveils the existence of a novel phylum within green plants (Viridiplantae), the Prasinodermophyta, which diverged before the split of Chlorophyta and Streptophyta. Structural features of the genome and gene family comparisons revealed an intermediate position of the P. coloniale genome (25.3 Mb) between the extremely compact, small genomes of picoplanktonic Mamiellophyceae (Chlorophyta) and the larger, more complex genomes of early-diverging streptophyte algae. Reconstruction of the minimal core genome of Viridiplantae allowed identification of an ancestral toolkit of transcription factors and flagellar proteins. Adaptations of P. coloniale to its deep-water, oligotrophic environment involved expansion of light-harvesting proteins, reduction of early light-induced proteins, evolution of a distinct type of C4 photosynthesis and carbon-concentrating mechanism, synthesis of the metal-complexing metabolite picolinic acid, and vitamin B1, B7 and B12 auxotrophy. The P. coloniale genome provides first insights into the dawn of green plant evolution.
  27. Li, Z., & Van de Peer, Y. (2020). ‘Winter is coming’ : how did polyploid plants survive? https://doi.org/10.1016/j.molp.2019.12.003
  28. Pu, X., Li, Z., Tian, Y., Gao, R., Hao, L., Hu, Y., … Song, J. (2020). The honeysuckle genome provides insight into the molecular mechanism of carotenoid metabolism underlying dynamic flower coloration. NEW PHYTOLOGIST, 227(3), 930–943. https://doi.org/10.1111/nph.16552
    Lonicera japonica is a widespread member of the Caprifoliaceae (honeysuckle) family utilized in traditional medical practices. This twining vine honeysuckle is also a much-sought ornamental, in part due to its dynamic flower coloration, which changes from white to gold during development. The molecular mechanism underlying dynamic flower coloration in L. japonica was elucidated by integrating whole genome sequencing, transcriptomic analysis, and biochemical assays. Here, we report a chromosome-level genome assembly of L. japonica, comprising nine pseudo-chromosomes with a total size of 843.2 Mb. We also provide evidence for a whole genome duplication event in the lineage leading to L. japonica, which occurred after its divergence from Dipsacales and Asterales. Moreover, gene expression analysis not only revealed correlated expression of the relevant biosynthetic genes with carotenoid accumulation, but also suggested a role for carotenoid degradation in L. japonica's dynamic flower coloration. The variation of flower color is consistent with not only the observed carotenoid accumulation pattern, but also with the release of volatile apocarotenoids that presumably serve as pollinator attractants. Beyond novel insights into the evolution and dynamics of flower coloration, the high-quality L. japonica genome sequence also provides a foundation for molecular breeding to improve desired characteristics.
  29. Chen, Y.-C., Li, Z., Zhao, Y.-X., Gao, M., Wang, J.-Y., Liu, K.-W., … Wang, Y.-D. (2020). The Litsea genome and the evolution of the laurel family. NATURE COMMUNICATIONS, 11. https://doi.org/10.1038/s41467-020-15493-5
    The laurel family within the Magnoliids has attracted attention owing to its scents, variable inflorescences, and controversial phylogenetic position. Here, we present a chromosome-level assembly of the Litsea cubeba genome, together with low-coverage genomic and transcriptomic data for many other Lauraceae. Phylogenomic analyses show phylogenetic discordance at the position of Magnoliids, suggesting incomplete lineage sorting during the divergence of monocots, eudicots, and Magnoliids. An ancient whole-genome duplication (WGD) event occurred just before the divergence of Laurales and Magnoliales; subsequently, independent WGDs occurred almost simultaneously in the three Lauralean lineages. The phylogenetic relationships within Lauraceae correspond to the divergence of inflorescences, as evidenced by the phylogeny of FUWA, a conserved gene involved in determining panicle architecture in Lauraceae. Monoterpene synthases responsible for production of specific volatile compounds in Lauraceae are functionally verified. Our work sheds light on the evolution of the Lauraceae, the genetic basis for floral evolution and specific scents.
  30. Roodt, D., Li, Z., Van de Peer, Y., & Mizrachi, E. (2019). Loss of wood formation genes in monocot genomes. GENOME BIOLOGY AND EVOLUTION, 11(7), 1986–1996. https://doi.org/10.1093/gbe/evz115
    Woodiness (secondary xylem derived from vascular cambium) has been gained and lost multiple times in the angiosperms, but has been lost ancestrally in all monocots. Here, we investigate the conservation of genes involved in xylogenesis in fully sequenced angiosperm genomes, hypothesising that monocots have lost some essential orthologs involved in this process. We analysed the conservation of genes preferentially expressed in the developing secondary xylem of two eudicot trees in the sequenced genomes of 26 eudicot and seven monocot species, and the early-diverging angiosperm Amborella trichopoda. We also reconstructed a regulatory model of early vascular cambial cell identity and differentiation and investigated the conservation of orthologs across the angiosperms. Additionally, we analysed the genome of the aquatic seagrass Zostera marina for additional losses of genes otherwise essential to, especially, secondary cell wall formation. Despite almost complete conservation of orthology within the early cambial differentiation gene network, we show a clear pattern of loss of genes preferentially expressed in secondary xylem in the monocots that are highly conserved across eudicot species. Our study provides candidate genes that may have led to the loss of vascular cambium in the monocots, and, by comparing terrestrial angiosperms to an aquatic monocot, highlights genes essential to vasculature on land.
  31. Zwaenepoel, A., Li, Z., Lohaus, R., & Van de Peer, Y. (2019). Finding evidence for whole genome duplications : a reappraisal. MOLECULAR PLANT, 12(2), 133–136. https://doi.org/10.1016/j.molp.2018.12.019
  32. Wan, T., Liu, Z.-M., Li, L.-F., Leitch, A. R., Leitch, I. J., Lohaus, R., … Wang, X.-M. (2018). A genome for gnetophytes and early evolution of seed plants. NATURE PLANTS, 4(2), 82–89. https://doi.org/10.1038/s41477-017-0097-2
    Gnetophytes are an enigmatic gymnosperm lineage comprising three genera, Gnetum, Welwitschia and Ephedra, which are morphologically distinct from all other seed plants. Their distinctiveness has triggered much debate as to their origin, evolution and phylogenetic placement among seed plants. To increase our understanding of the evolution of gnetophytes, and their relation to other seed plants, we report here a high-quality draft genome sequence for Gnetum montanum, the first for any gnetophyte. By using a novel genome assembly strategy to deal with high levels of heterozygosity, we assembled >4 Gb of sequence encoding 27,491 protein-coding genes. Comparative analysis of the G. montanum genome with other gymnosperm genomes unveiled some remarkable and distinctive genomic features, such as a diverse assemblage of retrotransposons with evidence for elevated frequencies of elimination rather than accumulation, considerable differences in intron architecture, including both length distribution and proportions of (retro)transposon elements, and distinctive patterns of proliferation of functional protein domains. Furthermore, a few gene families showed Gnetum-specific copy number expansions (for example, cellulose synthase) or contractions (for example, Late Embryogenesis Abundant protein), which could be connected with Gnetum's distinctive morphological innovations associated with their adaptation to warm, mesic environments. Overall, the G. montanum genome enables a better resolution of ancestral genomic features within seed plants, and the identification of genomic characters that distinguish Gnetum from other gymnosperms.
  33. Li, Z. (2018). The study of plant genome evolution by means of phylogenomics. Ghent University. Faculty of Sciences, Ghent, Belgium.
  34. Causier, B., Li, Z., De Smet, R., Lloyd, J. P., Van de Peer, Y., & Davies, B. (2017). Conservation of nonsense-mediated mRNA decay complex components throughout eukaryotic evolution. SCIENTIFIC REPORTS, 7. https://doi.org/10.1038/s41598-017-16942-w
    Nonsense-mediated mRNA decay (NMD) is an essential eukaryotic process regulating transcript quality and abundance, and is involved in diverse processes including brain development and plant defenses. Although some of the NMD machinery is conserved between kingdoms, little is known about its evolution. Phosphorylation of the core NMD component UPF1 is critical for NMD and is regulated in mammals by the SURF complex (UPF1, SMG1 kinase, SMG8, SMG9 and eukaryotic release factors). However, since SMG1 is reportedly missing from the genomes of fungi and the plant Arabidopsis thaliana, it remains unclear how UPF1 is activated outside the metazoa. We used comparative genomics to determine the conservation of the NMD pathway across eukaryotic evolution. We show that SURF components are present in all major eukaryotic lineages, including fungi, suggesting that in addition to UPF1 and SMG1, SMG8 and SMG9 also existed in the last eukaryotic common ancestor, 1.8 billion years ago. However, despite the ancient origins of the SURF complex, we also found that SURF factors have been independently lost across the Eukarya, pointing to genetic buffering within the essential NMD pathway. We infer an ancient role for SURF in regulating UPF1, and the intriguing possibility of undiscovered NMD regulatory pathways.
  35. Tasdighian, S., Van Bel, M., Li, Z., Van de Peer, Y., Carretero-Paulet, L., & Maere, S. (2017). Reciprocally retained genes in the angiosperm lineage show the hallmarks of dosage balance sensitivity. PLANT CELL, 29(11), 2766–2785. https://doi.org/10.1105/tpc.17.00313
    In several organisms, particular functional categories of genes, such as regulatory and complex-forming genes, are preferentially retained after whole-genome multiplications but rarely duplicate through small-scale duplication, a pattern referred to as reciprocal retention. This peculiar duplication behavior is hypothesized to stem from constraints on the dosage balance between the genes concerned and their interaction context. However, the evidence for a relationship between reciprocal retention and dosage balance sensitivity remains fragmentary. Here, we identified which gene families are most strongly reciprocally retained in the angiosperm lineage and studied their functional and evolutionary characteristics. Reciprocally retained gene families exhibit stronger sequence divergence constraints and lower rates of functional and expression divergence than other gene families, suggesting that dosage balance sensitivity is a general characteristic of reciprocally retained genes. Gene families functioning in regulatory and signaling processes are much more strongly represented at the top of the reciprocal retention ranking than those functioning in multiprotein complexes, suggesting that regulatory imbalances may lead to stronger fitness effects than classical stoichiometric protein complex imbalances. Finally, reciprocally retained duplicates are often subject to dosage balance constraints for prolonged evolutionary times, which may have repercussions for the ease with which genome multiplications can engender evolutionary innovation.
  36. De Smet, R., Sabaghian, E., Li, Z., Saeys, Y., & Van de Peer, Y. (2017). Coordinated functional divergence of genes after genome duplication in Arabidopsis thaliana. PLANT CELL, 29(11), 2786–2800. https://doi.org/10.1105/tpc.17.00531
    Gene and genome duplications have been rampant during the evolution of flowering plants. Unlike small-scale gene duplications, whole-genome duplications (WGDs) copy entire pathways or networks, and as such create the unique situation in which such duplicated pathways or networks could evolve novel functionality through the coordinated sub- or neofunctionalization of their constituent genes. Here, we describe a remarkable case of coordinated gene expression divergence following WGDs in Arabidopsis thaliana. We identified a set of 92 homoeologous gene pairs that all show a similar pattern of tissue-specific gene expression divergence following WGD, with one homoeolog showing predominant expression in aerial tissues and the other homoeolog showing biased expression in tip-growth tissues. We provide evidence that this pattern of gene expression divergence seems to involve genes with a role in cell polarity and that likely function in the maintenance of cell wall integrity. Following WGD, many of these duplicated genes evolved separate functions through subfunctionalization in growth/development and stress response. Uncoupling these processes through genome duplications likely provided important adaptations with respect to growth and morphogenesis and defense against biotic and abiotic stress.
  37. De La Torre, A. R., Li, Z., Van de Peer, Y., & Ingvarsson, P. K. (2017). Contrasting rates of molecular evolution and patterns of selection among gymnosperms and flowering plants. MOLECULAR BIOLOGY AND EVOLUTION, 34(6), 1363–1377. https://doi.org/10.1093/molbev/msx069
    The majority of variation in rates of molecular evolution among seed plants remains both unexplored and unexplained. Although some attention has been given to flowering plants, reports of molecular evolutionary rates for their sister plant clade (gymnosperms) are scarce, and to our knowledge differences in molecular evolution among seed plant clades have never been tested in a phylogenetic framework. Angiosperms and gymnosperms differ in a number of features, of which contrasting reproductive biology, life spans, and population sizes are the most prominent. The highly conserved morphology of gymnosperms evidenced by similarity of extant species to fossil records and the high levels of macrosynteny at the genomic level have led scientists to believe that gymnosperms are slow-evolving plants, although some studies have offered contradictory results. Here, we used 31,968 nucleotide sites obtained from orthologous genes across a wide taxonomic sampling that includes representatives of most conifers, cycads, ginkgo, and many angiosperms with a sequenced genome. Our results suggest that angiosperms and gymnosperms differ considerably in their rates of molecular evolution per unit time, with gymnosperm rates being, on average, seven times lower than those of angiosperm species. Longer generation times and larger genome sizes are some of the factors explaining the slow rates of molecular evolution found in gymnosperms. In contrast to their slow rates of molecular evolution, gymnosperms possess higher substitution rate ratios than angiosperm taxa. Finally, our study suggests stronger and more efficient purifying and diversifying selection in gymnosperm than in angiosperm species, probably in relation to larger effective population sizes.
  38. Unver, T., Wu, Z., Sterck, L., Turktas, M., Lohaus, R., Li, Z., … Van de Peer, Y. (2017). Genome of wild olive and the evolution of oil biosynthesis. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 114(44), E9413–E9422. https://doi.org/10.1073/pnas.1708621114
    Here we present the genome sequence and annotation of the wild olive tree (Olea europaea var. sylvestris), called oleaster, which is considered an ancestor of cultivated olive trees. More than 50,000 protein-coding genes were predicted, a majority of which could be anchored to 23 pseudochromosomes obtained through a newly constructed genetic map. The oleaster genome contains signatures of two Oleaceae lineage-specific paleopolyploidy events, dated at ~28 and ~59 Mya. These events contributed to the expansion and neo-functionalization of genes and gene families that play important roles in oil biosynthesis. The functional divergence of oil biosynthesis pathway genes, such as FAD2, SACPD, EAR, and ACPTE, following duplication, has been responsible for the differential accumulation of oleic and linoleic acids produced in olive compared with sesame, a closely related oil crop. Duplicated oleaster FAD2 genes are regulated by an siRNA derived from a transposable element-rich region, leading to suppressed levels of FAD2 gene expression. Additionally, neofunctionalization of members of the SACPD gene family has led to increased expression of SACPD2, 3, 5, and 7, consequently resulting in an increased desaturation of stearic acid. Taken together, decreased FAD2 expression and increased SACPD expression likely explain the accumulation of exceptionally high levels of oleic acid in olive. The oleaster genome thus provides important insights into the evolution of oil biosynthesis and will be a valuable resource for oil crop genomics.
  39. Cañas, R. A., Li, Z., Pascual, M. B., Castro-Rodríguez, V., Ávila, C., Sterck, L., … Cánovas, F. M. (2017). The gene expression landscape of pine seedling tissues. PLANT JOURNAL, 91(6), 1064–1087. https://doi.org/10.1111/tpj.13617
    Conifers dominate vast regions of the Northern hemisphere. They are the main source of raw materials for timber industry as well as a wide range of biomaterials. Despite their inherent difficulties as experimental models for classical plant biology research, the technological advances in genomics research are enabling fundamental studies on these plants. The use of laser capture microdissection followed by transcriptomic analysis is a powerful tool for unravelling the molecular and functional organization of conifer tissues and specialized cells. In the present work, 14 different tissues from 1-month-old maritime pine (Pinus pinaster) seedlings have been isolated and their transcriptomes analysed. The results increased the sequence information and number of full-length transcripts from a previous reference transcriptome and added 39 841 new transcripts. In total, 2376 transcripts were ubiquitously expressed in all of the examined tissues. These transcripts could be considered the core 'housekeeping genes' in pine. The genes have been clustered according to their expression profiles. This analysis reduced the number of profiles to 38, most of which are defined by expression in a single tissue that is much higher than in the other tissues. The expression and localization data are accessible at ConGenIE.org (http://v22.popgenie.org/microdisection/). This study presents an overview of the gene expression distribution in different pine tissues, specifically highlighting the relationships between tissue gene expression and function. This transcriptome atlas is a valuable resource for functional genomics research in conifers.
  40. Zhang, G.-Q., Liu, K.-W., Li, Z., Lohaus, R., Hsiao, Y.-Y., Niu, S.-C., … Liu, Z.-J. (2017). The Apostasia genome and the evolution of orchids. NATURE, 549(7672), 379–383. https://doi.org/10.1038/nature23897
    Constituting approximately 10% of flowering plant species, orchids (Orchidaceae) display unique flower morphologies, possess an extraordinary diversity in lifestyle, and have successfully colonized almost every habitat on Earth. Here we report the draft genome sequence of Apostasia shenzhenica, a representative of one of two genera that form a sister lineage to the rest of the Orchidaceae, providing a reference for inferring the genome content and structure of the most recent common ancestor of all extant orchids and improving our understanding of their origins and evolution. In addition, we present transcriptome data for representatives of Vanilloideae, Cypripedioideae and Orchidoideae, and novel third-generation genome data for two species of Epidendroideae, covering all five orchid subfamilies. A. shenzhenica shows clear evidence of a whole-genome duplication, which is shared by all orchids and occurred shortly before their divergence. Comparisons between A. shenzhenica and other orchids and angiosperms also permitted the reconstruction of an ancestral orchid gene toolkit. We identify new gene families, gene family expansions and contractions, and changes within MADS-box gene classes, which control a diverse suite of developmental processes, during orchid evolution. This study sheds new light on the genetic mechanisms underpinning key orchid innovations, including the development of the labellum and gynostemium, pollinia, and seeds without endosperm, as well as the evolution of epiphytism; reveals relationships between the Orchidaceae subfamilies; and helps clarify the evolutionary history of orchids within the angiosperms.
  41. Li, Z., De La Torre, A. R., Sterck, L., Cánovas, F. M., Avila, C., Merino, I., … Van de Peer, Y. (2017). Single-copy genes as molecular markers for phylogenomic studies in seed plants. GENOME BIOLOGY AND EVOLUTION, 9(5), 1130–1147. https://doi.org/10.1093/gbe/evx070
    Phylogenetic relationships among seed plant taxa, especially within the gymnosperms, remain contested. In contrast to angiosperms, for which several genomic, transcriptomic and phylogenetic resources are available, there are few, if any, molecular markers that allow broad comparisons among gymnosperm species. With few gymnosperm genomes available, recently obtained transcriptomes in gymnosperms are a valuable resource for identifying single-copy gene families as molecular markers for phylogenomic analysis in seed plants. Taking advantage of an increasing number of available genomes and transcriptomes, we identified single-copy genes in a broad collection of seed plants and used these to infer phylogenetic relationships between major seed plant taxa. This study aims at extending the current phylogenetic toolkit for seed plants, assessing its ability for resolving seed plant phylogeny, and discussing potential factors affecting phylogenetic reconstruction. In total, we identified 3,072 single-copy genes in 31 gymnosperms and 2,156 single-copy genes in 34 angiosperms. All studied seed plants shared 1,469 single-copy genes, which are generally involved in functions like DNA metabolism, cell cycle, and photosynthesis. A selected set of 106 single-copy genes provided good resolution for the seed plant phylogeny except for gnetophytes. Although some of our analyses support a sister relationship between gnetophytes and other gymnosperms, phylogenetic trees from concatenated alignments without 3rd codon positions and amino acid alignments under the CAT + GTR model support gnetophytes as a sister group to Pinaceae. Our phylogenomic analyses demonstrate that, in general, single-copy genes can uncover both recent and deep divergences of seed plant phylogeny.
  42. Li, Z., Defoort, J., Tasdighian, S., Maere, S., Van de Peer, Y., & De Smet, R. (2016). Gene duplicability of core genes is highly consistent across all angiosperms. PLANT CELL, 28(2), 326–344. https://doi.org/10.1105/tpc.15.00877
    Gene duplication is an important mechanism for adding to genomic novelty. Hence, which genes undergo duplication and are preserved following duplication is an important question. It has been observed that gene duplicability, or the ability of genes to be retained following duplication, is a nonrandom process, with certain genes being more amenable to survive duplication events than others. Primarily, gene essentiality and the type of duplication (small-scale versus large-scale) have been shown in different species to influence the (long-term) survival of novel genes. However, an overarching view of "gene duplicability" is lacking, mainly due to the fact that previous studies usually focused on individual species and did not account for the influence of genomic context and the time of duplication. Here, we present a large-scale study in which we investigated duplicate retention for 9178 gene families shared between 37 flowering plant species, referred to as angiosperm core gene families. For most gene families, we observe a strikingly consistent pattern of gene duplicability across species, with gene families being either primarily single-copy or multicopy in all species. An intermediate class contains gene families that are often retained in duplicate for periods extending to tens of millions of years after whole-genome duplication, but ultimately appear to be largely restored to singleton status, suggesting that these genes may be dosage balance sensitive. The distinction between single-copy and multicopy gene families is reflected in their functional annotation, with single-copy genes being mainly involved in the maintenance of genome stability and organelle function and multicopy genes in signaling, transport, and metabolism. The intermediate class was overrepresented in regulatory genes, further suggesting that these represent putative dosage-balance-sensitive genes.
  43. Kerchev, P., Waszczak, C., Lewandowska, A., Willems, P., Shapiguzov, A., Li, Z., … Van Breusegem, F. (2016). Lack of GLYCOLATE OXIDASE1, but not GLYCOLATE OXIDASE2, attenuates the photorespiratory phenotype of CATALASE2-deficient Arabidopsis. PLANT PHYSIOLOGY, 171(3), 1704–1719. https://doi.org/10.1104/pp.16.00359
    The genes coding for the core metabolic enzymes of the photorespiratory pathway, which allows plants with C3-type photosynthesis to survive in an oxygen-rich atmosphere, have largely been discovered in genetic screens aimed at isolating mutants that are unviable under ambient air. As an exception, glycolate oxidase (GOX) mutants with a photorespiratory phenotype have not yet been described in C3 species. Using Arabidopsis (Arabidopsis thaliana) mutants lacking the peroxisomal CATALASE2 (cat2-2) that display stunted growth and cell death lesions under ambient air, we isolated a second-site loss-of-function mutation in GLYCOLATE OXIDASE1 (GOX1) that attenuated the photorespiratory phenotype of cat2-2. Interestingly, knocking out the nearly identical GOX2 in the cat2-2 background did not affect the photorespiratory phenotype, indicating that GOX1 and GOX2 play distinct metabolic roles. We further investigated their individual functions in single gox1-1 and gox2-1 mutants and revealed that their phenotypes can be modulated by environmental conditions that increase the metabolic flux through the photorespiratory pathway. High light negatively affected the photosynthetic performance and growth of both gox1-1 and gox2-1 mutants, but the negative consequences of severe photorespiration were more pronounced in the absence of GOX1, which was accompanied by a reduced ability to process glycolate. Taken together, our results point toward divergent functions of the two photorespiratory GOX isoforms in Arabidopsis and contribute to a better understanding of the photorespiratory pathway.

Other publications

  1. Li, Z., Zhang, Z., Yan, P., Huang, S., Fei, Z., & Lin, K. (2011). RNA-Seq improves annotation of protein-coding genes in the cucumber genome. BMC GENOMICS, 12(1), 540.