Zhen Li

Zhen Li — Postdoc
Joined the group in 2012

As a bioinformatician with a great passion for evolution, I want to understand the mechanisms of evolution and, where applicable, apply that knowledge to practical problems. This passion has settled on two primary research interests. The first is to discover how different plants have evolved and contributed to the extraordinary diversity of life on Earth by exploring novel plant genomes. Since my master's studies, I have taken part in several plant genome projects in different functional roles. This experience helped me develop various skills in annotating genomes and analyzing sequencing data. These international genome projects also impressed on me the importance of communication in modern science and trained my scientific communication skills.

My other research interest is polyploidy. Surprisingly, the genomes of many, if not all, flowering plants show evidence of ancient polyploidization in their common ancestors. Understanding the effects and processes of polyploidization is therefore essential to unveil its impact on creating evolutionary novelties and increasing species adaptation. In previous work, we used comparative genomics and phylogenomics to study the different fates of duplicated genes after genome duplication by integrating large omics datasets. By comparing 37 angiosperm genomes, we identified genes that are tolerant or intolerant to gene and genome duplications in flowering plants. Natural and synthesized polyploids contain valuable information about both gene expression and natural selection on duplicated genes, which can help to further understand the molecular mechanisms that follow polyploidization. Integrating the knowledge of gene duplicability with data from natural and synthesized polyploids is therefore an innovative approach to systematically investigate gene duplicate retention following polyploidization.

May 2018 - present: Postdoctoral researcher, Bioinformatics & Evolutionary Genomics, Department of Plant Systems Biology, VIB, Gent, Belgium.
September 2012 - April 2018: PhD student, Bioinformatics & Evolutionary Genomics, Department of Plant Systems Biology, VIB, Gent, Belgium.
September 2008 - June 2011: Master of Science in Bioinformatics, Laboratory of Computational Molecular Biology, College of Life Sciences, Beijing Normal University.
September 2004 - June 2008: Bachelor of Science in Biology, College of Life Sciences, Beijing Normal University.

Publications

  1. Xue, J., Dong, S., Wang, M., Song, T., Zhou, G., Li, Z., … Hang, Y. (2021). Mitochondrial genes from 18 angiosperms fill sampling gaps for phylogenomic inferences of the early diversification of flowering plants. JOURNAL OF SYSTEMATICS AND EVOLUTION. https://doi.org/10.1111/jse.12708
    The early diversification of angiosperms is thought to have been a rapid process, which may complicate phylogenetic analyses of early angiosperm relationships. Plastid and nuclear phylogenomic studies have raised several conflicting hypotheses regarding overall angiosperm phylogeny, but mitochondrial genomes have been largely ignored as a relevant source of information. Here we sequenced mitochondrial genomes from 18 angiosperms to fill taxon‐sampling gaps in Austrobaileyales, magnoliids, Chloranthales, Ceratophyllales, and major lineages of eudicots and monocots. We assembled a data matrix of 38 mitochondrial genes from 107 taxa to assess how well mitochondrial genomic data address current uncertainties in angiosperm relationships. Although we recovered conflicting phylogenies based on different datasets and analytical methods, we also observed congruence regarding deep relationships of several major angiosperm lineages: Chloranthales were always inferred to be the sister group of Ceratophyllales, Austrobaileyales to mesangiosperms, and the unplaced Dilleniales was consistently resolved as the sister to superasterids. Substitutional saturation, GC‐compositional heterogeneity, and codon‐usage bias are possible reasons for the noise/conflict that may impact phylogenetic reconstruction; and angiosperm mitochondrial genes may not be substantially affected by these factors. The third codon positions of the mitochondrial genes appear to contain more parsimony‐informative sites than the first and second codon positions, and therefore produced better resolved phylogenetic relationships with generally strong support. The relationships among these major lineages remain incompletely resolved, perhaps as a result of the rapidity of early radiations. Nevertheless, data from mitochondrial genomes provides additional evidence and alternative hypotheses for exploring the early evolution and diversification of the angiosperms.
  2. Zhao, T., Zwaenepoel, A., Xue, J.-Y., Kao, S.-M., Li, Z., Schranz, M. E., & Van de Peer, Y. (2021). Whole-genome microsynteny-based phylogeny of angiosperms. NATURE COMMUNICATIONS, 12(1). https://doi.org/10.1038/s41467-021-23665-0
    Plant genomes vary greatly in size, organization, and architecture. Such structural differences may be highly relevant for inference of genome evolution dynamics and phylogeny. Indeed, microsynteny (the conservation of local gene content and order) is recognized as a valuable source of phylogenetic information, but its use for the inference of large phylogenies has been limited. Here, by combining synteny network analysis, matrix representation, and maximum likelihood phylogenetic inference, we provide a way to reconstruct phylogenies based on microsynteny information. Both simulations and use of empirical data sets show our method to be accurate, consistent, and widely applicable. As an example, we focus on the analysis of a large-scale whole-genome data set for angiosperms, including more than 120 available high-quality genomes, representing more than 50 different plant families and 30 orders. Our 'microsynteny-based' tree is largely congruent with phylogenies proposed based on more traditional sequence alignment-based methods and current phylogenetic classifications but differs for some long-contested and controversial relationships. For instance, our synteny-based tree finds Vitales as early diverging eudicots, Saxifragales within superasterids, and magnoliids as sister to monocots. We discuss how synteny-based phylogenetic inference can complement traditional methods and could provide additional insights into some long-standing controversial phylogenetic relationships. Molecular phylogenies are traditionally based on sequence variation, but genome rearrangements also contain phylogenetic information. Here, Zhao et al. develop an approach to reconstruct phylogenies based on microsynteny and illustrate it with a reconstruction of the angiosperm phylogeny.
  3. Wan, T., Liu, Z., Leitch, I. J., Xin, H., Maggs-Kölling, G., Gong, Y., … Wang, Q. (2021). The Welwitschia genome reveals a unique biology underpinning extreme longevity in deserts. Nature Communications, 12. https://doi.org/10.1038/s41467-021-24528-4
    The gymnosperm Welwitschia mirabilis belongs to the ancient, enigmatic gnetophyte lineage. It is a unique desert plant with extreme longevity and two ever-elongating leaves. We present a chromosome-level assembly of its genome (6.8 Gb/1 C) together with methylome and transcriptome data to explore its astonishing biology. We also present a refined, high-quality assembly of Gnetum montanum to enhance our understanding of gnetophyte genome evolution. The Welwitschia genome has been shaped by a lineage-specific ancient, whole genome duplication (~86 million years ago) and more recently (1-2 million years) by bursts of retrotransposon activity. High levels of cytosine methylation (particularly at CHH motifs) are associated with retrotransposons, whilst long-term deamination has resulted in an exceptionally GC-poor genome. Changes in copy number and/or expression of gene families and transcription factors (e.g. R2R3MYB, SAUR) controlling cell growth, differentiation and metabolism underpin the plant’s longevity and tolerance to temperature, nutrient and water stress.
  4. Cao, Y.-L., Li, Y., Fan, Y.-F., Li, Z., Yoshida, K., Wang, J.-Y., … Liu, Z.-J. (2021). Wolfberry genomes and the evolution of Lycium (Solanaceae). COMMUNICATIONS BIOLOGY, 4(1). https://doi.org/10.1038/s42003-021-02152-8
    Wolfberry Lycium, an economically important genus of the Solanaceae family, contains approximately 80 species and shows a fragmented distribution pattern among the Northern and Southern Hemispheres. Although several herbaceous species of Solanaceae have been subjected to genome sequencing, thus far, no genome sequences of woody representatives have been available. Here, we sequenced the genomes of 13 perennial woody species of Lycium, with a focus on Lycium barbarum. Integration with other genomes provides clear evidence supporting a whole-genome triplication (WGT) event shared by all hitherto sequenced solanaceous plants, which occurred shortly after the divergence of Solanaceae and Convolvulaceae. We identified new gene families and gene family expansions and contractions that first appeared in Solanaceae. Based on the identification of self-incompatibility related-gene families, we inferred that hybridization hotspots are enriched for genes that might be functioning in gametophytic self-incompatibility pathways in wolfberry. Extremely low expression of LOCULE NUMBER (LC) and COLORLESS NON-RIPENING (CNR) orthologous genes during Lycium fruit development and ripening processes suggests functional diversification of these two genes between Lycium and tomato. The existence of additional flowering locus C-like MADS-box genes might correlate with the perennial flowering cycle of Lycium. Differential gene expression involved in the lignin biosynthetic pathway between Lycium and tomato likely illustrates woody and herbaceous differentiation. We also provide evidence that Lycium migrated from Africa into Asia, and subsequently from Asia into North America. Our results provide functional insights into Solanaceae origins, evolution and diversification. Cao, Li, et al. sequence 13 perennial woody plant species of Lycium, and specifically provide a draft assembly of L. ruthenicum and a chromosome-level assembly of L. barbarum, the wolfberry or Goji berry. From a phylogenetic tree the authors identify an ancient hexaploidization event, and report the evolution of gene families including fruit ripening, fruit coloration, polysaccharide synthesis and self-incompatibility within Solanaceae and the general biogeography of L. barbarum.
  5. Li, L., Wang, S., Wang, H., Sahu, S. K., Marin, B., Li, H., … Liu, H. (2020). The genome of Prasinoderma coloniale unveils the existence of a third phylum within green plants. NATURE ECOLOGY & EVOLUTION, 4(9), 1220–1231. https://doi.org/10.1038/s41559-020-1221-7
    Genome analysis of the pico-eukaryotic marine green alga Prasinoderma coloniale CCMP 1413 unveils the existence of a novel phylum within green plants (Viridiplantae), the Prasinodermophyta, which diverged before the split of Chlorophyta and Streptophyta. Structural features of the genome and gene family comparisons revealed an intermediate position of the P. coloniale genome (25.3 Mb) between the extremely compact, small genomes of picoplanktonic Mamiellophyceae (Chlorophyta) and the larger, more complex genomes of early-diverging streptophyte algae. Reconstruction of the minimal core genome of Viridiplantae allowed identification of an ancestral toolkit of transcription factors and flagellar proteins. Adaptations of P. coloniale to its deep-water, oligotrophic environment involved expansion of light-harvesting proteins, reduction of early light-induced proteins, evolution of a distinct type of C4 photosynthesis and carbon-concentrating mechanism, synthesis of the metal-complexing metabolite picolinic acid, and vitamin B1, B7 and B12 auxotrophy. The P. coloniale genome provides first insights into the dawn of green plant evolution.
  6. Zhang, L., Chen, F., Zhang, X., Li, Z., Zhao, Y., Lohaus, R., … Tang, H. (2020). The water lily genome and the early evolution of flowering plants. NATURE, 577(7788), 79–84. https://doi.org/10.1038/s41586-019-1852-5
    Water lilies belong to the angiosperm order Nymphaeales. Amborellales, Nymphaeales and Austrobaileyales together form the so-called ANA-grade of angiosperms, which are extant representatives of lineages that diverged the earliest from the lineage leading to the extant mesangiosperms(1-3). Here we report the 409-megabase genome sequence of the blue-petal water lily (Nymphaea colorata). Our phylogenomic analyses support Amborellales and Nymphaeales as successive sister lineages to all other extant angiosperms. The N. colorata genome and 19 other water lily transcriptomes reveal a Nymphaealean whole-genome duplication event, which is shared by Nymphaeaceae and possibly Cabombaceae. Among the genes retained from this whole-genome duplication are homologues of genes that regulate flowering transition and flower development. The broad expression of homologues of floral ABCE genes in N. colorata might support a similarly broadly active ancestral ABCE model of floral organ determination in early angiosperms. Water lilies have evolved attractive floral scents and colours, which are features shared with mesangiosperms, and we identified their putative biosynthetic genes in N. colorata. The chemical compounds and biosynthetic genes behind floral scents suggest that they have evolved in parallel to those in mesangiosperms. Because of its unique phylogenetic position, the N. colorata genome sheds light on the early evolution of angiosperms.
  7. Chen, Y.-C., Li, Z., Zhao, Y.-X., Gao, M., Wang, J.-Y., Liu, K.-W., … Wang, Y.-D. (2020). The Litsea genome and the evolution of the laurel family. NATURE COMMUNICATIONS, 11. https://doi.org/10.1038/s41467-020-15493-5
    The laurel family within the Magnoliids has attracted attention owing to its scents, variable inflorescences, and controversial phylogenetic position. Here, we present a chromosome-level assembly of the Litsea cubeba genome, together with low-coverage genomic and transcriptomic data for many other Lauraceae. Phylogenomic analyses show phylogenetic discordance at the position of Magnoliids, suggesting incomplete lineage sorting during the divergence of monocots, eudicots, and Magnoliids. An ancient whole-genome duplication (WGD) event occurred just before the divergence of Laurales and Magnoliales; subsequently, independent WGDs occurred almost simultaneously in the three Lauralean lineages. The phylogenetic relationships within Lauraceae correspond to the divergence of inflorescences, as evidenced by the phylogeny of FUWA, a conserved gene involved in determining panicle architecture in Lauraceae. Monoterpene synthases responsible for production of specific volatile compounds in Lauraceae are functionally verified. Our work sheds light on the evolution of the Lauraceae, the genetic basis for floral evolution and specific scents.
  8. Pu, X., Li, Z., Tian, Y., Gao, R., Hao, L., Hu, Y., … Song, J. (2020). The honeysuckle genome provides insight into the molecular mechanism of carotenoid metabolism underlying dynamic flower coloration. NEW PHYTOLOGIST, 227(3), 930–943. https://doi.org/10.1111/nph.16552
    Lonicera japonica is a wide-spread member of the Caprifoliaceae (honeysuckle) family utilized in traditional medical practices. This twining vine honeysuckle is also a much-sought ornamental, in part due to its dynamic flower coloration, which changes from white to gold during development. The molecular mechanism underlying dynamic flower coloration in L. japonica was elucidated by integrating whole genome sequencing, transcriptomic analysis, and biochemical assays. Here, we report a chromosome-level genome assembly of L. japonica, comprising nine pseudo-chromosomes with a total size of 843.2 Mb. We also provide evidence for a whole genome duplication event in the lineage leading to L. japonica, which occurred after its divergence from Dipsacales and Asterales. Moreover, gene expression analysis not only revealed correlated expression of the relevant biosynthetic genes with carotenoid accumulation, but also suggested a role for carotenoid degradation in L. japonica's dynamic flower coloration. The variation of flower color is consistent with not only the observed carotenoid accumulation pattern, but also with the release of volatile apocarotenoids that presumably serve as pollinator attractants. Beyond novel insights into the evolution and dynamics of flower coloration, the high-quality L. japonica genome sequence also provides a foundation for molecular breeding to improve desired characteristics.
  9. Verlinden, H., Sterck, L., Li, J., Li, Z., Yssel, A., Gansemans, Y., … Vanden Broeck, J. (2020). First draft genome assembly of the desert locust, Schistocerca gregaria. F1000RESEARCH, 9. https://doi.org/10.12688/f1000research.25148.1
    Background: At the time of publication, the most devastating desert locust crisis in decades is affecting East Africa, the Arabian Peninsula and South-West Asia. The situation is extremely alarming in East Africa, where Kenya, Ethiopia and Somalia face an unprecedented threat to food security and livelihoods. Most of the time, however, locusts do not occur in swarms, but live as relatively harmless solitary insects. The phenotypically distinct solitarious and gregarious locust phases differ markedly in many aspects of behaviour, physiology and morphology, making them an excellent model to study how environmental factors shape behaviour and development. A better understanding of the extreme phenotypic plasticity in desert locusts will offer new, more environmentally sustainable ways of fighting devastating swarms. Methods: High molecular weight DNA derived from two adult males was used for Mate Pair and Paired End Illumina sequencing and PacBio sequencing. A reliable reference genome of Schistocerca gregaria was assembled using the ABySS pipeline, and scaffolding was improved using LINKS. Results: In total, 1,316 Gb Illumina reads and 112 Gb PacBio reads were produced and assembled. The resulting draft genome consists of 8,817,834,205 bp organised in 955,015 scaffolds with an N50 of 157,705 bp, making the desert locust genome the largest insect genome sequenced and assembled to date. In total, 18,815 protein-encoding genes are predicted in the desert locust genome, of which 13,646 (72.53%) obtained at least one functional assignment based on similarity to known proteins. Conclusions: The desert locust genome data will contribute greatly to studies of phenotypic plasticity, physiology, neurobiology, molecular ecology, evolutionary genetics and comparative genomics, and will promote the desert locust’s use as a model system. The data will also facilitate the development of novel, more sustainable strategies for preventing or combating swarms of these infamous insects.
  10. de María, N., Guevara, M. Á., Perdiguero, P., Vélez, M. D., Cabezas, J. A., López‐Hinojosa, M., … Cervera, M. T. (2020). Molecular study of drought response in the Mediterranean conifer Pinus Pinaster Ait. : differential transcriptomic profiling reveals constitutive water deficit‐independent drought tolerance mechanisms. ECOLOGY AND EVOLUTION, 10(18), 9788–9807. https://doi.org/10.1002/ece3.6613
    Adaptation of long‐living forest trees to respond to environmental changes is essential to secure their performance under adverse conditions. Water deficit is one of the most significant stress factors determining tree growth and survival. Maritime pine (Pinus pinaster Ait.), the main source of softwood in southwestern Europe, is subjected to recurrent drought periods which, according to climate change predictions for the years to come, will progressively increase in the Mediterranean region. The mechanisms regulating pine adaptive responses to the environment are still largely unknown. The aim of this work was to go a step further in understanding the molecular mechanisms underlying maritime pine response to water stress and drought tolerance at the whole plant level. A global transcriptomic profiling of roots, stems, and needles was conducted to analyze the performance of siblings showing contrasting responses to water deficit from an ad hoc designed full‐sib family. Although P. pinaster is considered a recalcitrant species for vegetative propagation in adult phase, the analysis was conducted using vegetatively propagated trees exposed to two treatments: well‐watered and moderate water stress. The comparative analyses led us to identify organ‐specific genes, constitutively expressed as well as differentially expressed when comparing control versus water stress conditions, in drought‐sensitive and drought‐tolerant genotypes. Different response strategies can be pointed out, with tolerant individuals being pre‐adapted for coping with drought by constitutively expressing stress‐related genes that are detected only in later stages in sensitive individuals subjected to drought.
  11. Tyrmi, J. S., Vuosku, J., Acosta, J. J., Li, Z., Sterck, L., Cervera, M. T., … Pyhäjärvi, T. (2020). Genomics of clinal local adaptation in Pinus sylvestris under continuous environmental and spatial genetic setting. G3-GENES GENOMES GENETICS, 10(8), 2683–2696. https://doi.org/10.1534/g3.120.401285
    Understanding the consequences of local adaptation on genomic diversity is a central goal in evolutionary genetics of natural populations. In species with large continuous geographical distributions the phenotypic signal of local adaptation is frequently clear, but the genetic basis often remains elusive. We examined the patterns of genetic diversity in Pinus sylvestris, a keystone species in many Eurasian ecosystems with a huge distribution range and decades of forestry research showing that it is locally adapted to the vast range of environmental conditions. Making P. sylvestris an even more attractive subject of local adaptation study, population structure has been shown to be weak previously and in this study. However, little is known about the molecular genetic basis of adaptation, as the massive size of gymnosperm genomes has prevented large scale genomic surveys. We generated a both geographically and genomically extensive dataset using a targeted sequencing approach. By applying divergence-based and landscape genomics methods we identified several loci contributing to local adaptation, but only a few with large allele frequency changes across latitude. We also discovered a very large (ca. 300 Mbp) putative inversion potentially under selection, which to our knowledge is the first such discovery in conifers. Our results call for more detailed analysis of structural variation in relation to genomic basis of local adaptation, emphasize the lack of large effect loci contributing to local adaptation in the coding regions and thus point out the need for more attention toward multi-locus analysis of polygenic adaptation.
  12. Li, Z., & Van de Peer, Y. (2020). ’Winter is coming’ : how did polyploid plants survive? MOLECULAR PLANT. https://doi.org/10.1016/j.molp.2019.12.003
  13. Roodt, D., Li, Z., Van de Peer, Y., & Mizrachi, E. (2019). Loss of wood formation genes in monocot genomes. GENOME BIOLOGY AND EVOLUTION, 11(7), 1986–1996.
    Woodiness (secondary xylem derived from vascular cambium) has been gained and lost multiple times in the angiosperms, but has been lost ancestrally in all monocots. Here, we investigate the conservation of genes involved in xylogenesis in fully sequenced angiosperm genomes, hypothesising that monocots have lost some essential orthologs involved in this process. We analysed the conservation of genes preferentially expressed in the developing secondary xylem of two eudicot trees in the sequenced genomes of 26 eudicot and seven monocot species, and the early-diverging angiosperm Amborella trichopoda. We also reconstructed a regulatory model of early vascular cambial cell identity and differentiation and investigated the conservation of orthologs across the angiosperms. Additionally, we analysed the genome of the aquatic seagrass Zostera marina for additional losses of genes otherwise essential to, especially, secondary cell wall formation. Despite almost complete conservation of orthology within the early cambial differentiation gene network, we show a clear pattern of loss of genes preferentially expressed in secondary xylem in the monocots that are highly conserved across eudicot species. Our study provides candidate genes that may have led to the loss of vascular cambium in the monocots, and, by comparing terrestrial angiosperms to an aquatic monocot, highlights genes essential to vasculature on land.
  14. Zwaenepoel, A., Li, Z., Lohaus, R., & Van de Peer, Y. (2019). Finding evidence for whole genome duplications : a reappraisal. MOLECULAR PLANT, 12(2), 133–136.
  15. Li, Z. (2018). The study of plant genome evolution by means of phylogenomics. Ghent University. Faculty of Sciences, Ghent, Belgium.
  16. Wan, T., Liu, Z.-M., Li, L.-F., Leitch, A. R., Leitch, I. J., Lohaus, R., … Wang, X.-M. (2018). A genome for gnetophytes and early evolution of seed plants. NATURE PLANTS, 4(2), 82–89. https://doi.org/10.1038/s41477-017-0097-2
    Gnetophytes are an enigmatic gymnosperm lineage comprising three genera, Gnetum, Welwitschia and Ephedra, which are morphologically distinct from all other seed plants. Their distinctiveness has triggered much debate as to their origin, evolution and phylogenetic placement among seed plants. To increase our understanding of the evolution of gnetophytes, and their relation to other seed plants, we report here a high-quality draft genome sequence for Gnetum montanum, the first for any gnetophyte. By using a novel genome assembly strategy to deal with high levels of heterozygosity, we assembled >4 Gb of sequence encoding 27,491 protein-coding genes. Comparative analysis of the G. montanum genome with other gymnosperm genomes unveiled some remarkable and distinctive genomic features, such as a diverse assemblage of retrotransposons with evidence for elevated frequencies of elimination rather than accumulation, considerable differences in intron architecture, including both length distribution and proportions of (retro) transposon elements, and distinctive patterns of proliferation of functional protein domains. Furthermore, a few gene families showed Gnetum-specific copy number expansions (for example, cellulose synthase) or contractions (for example, Late Embryogenesis Abundant protein), which could be connected with Gnetum's distinctive morphological innovations associated with their adaptation to warm, mesic environments. Overall, the G. montanum genome enables a better resolution of ancestral genomic features within seed plants, and the identification of genomic characters that distinguish Gnetum from other gymnosperms.
  17. De Smet, R., Sabaghian, E., Li, Z., Saeys, Y., & Van de Peer, Y. (2017). Coordinated functional divergence of genes after genome duplication in Arabidopsis thaliana. PLANT CELL, 29(11), 2786–2800. https://doi.org/10.1105/tpc.17.00531
    Gene and genome duplications have been rampant during the evolution of flowering plants. Unlike small-scale gene duplications, whole-genome duplications (WGDs) copy entire pathways or networks, and as such create the unique situation in which such duplicated pathways or networks could evolve novel functionality through the coordinated sub- or neofunctionalization of their constituent genes. Here, we describe a remarkable case of coordinated gene expression divergence following WGDs in Arabidopsis thaliana. We identified a set of 92 homoeologous gene pairs that all show a similar pattern of tissue-specific gene expression divergence following WGD, with one homoeolog showing predominant expression in aerial tissues and the other homoeolog showing biased expression in tip-growth tissues. We provide evidence that this pattern of gene expression divergence seems to involve genes with a role in cell polarity and that likely function in the maintenance of cell wall integrity. Following WGD, many of these duplicated genes evolved separate functions through subfunctionalization in growth/development and stress response. Uncoupling these processes through genome duplications likely provided important adaptations with respect to growth and morphogenesis and defense against biotic and abiotic stress.
  18. Cañas, R. A., Li, Z., Pascual, M. B., Castro-Rodríguez, V., Ávila, C., Sterck, L., Van de Peer, Y., et al. (2017). The gene expression landscape of pine seedling tissues. PLANT JOURNAL, 91(6), 1064–1087.
    Conifers dominate vast regions of the Northern hemisphere. They are the main source of raw materials for the timber industry as well as a wide range of biomaterials. Despite their inherent difficulties as experimental models for classical plant biology research, the technological advances in genomics research are enabling fundamental studies on these plants. The use of laser capture microdissection followed by transcriptomic analysis is a powerful tool for unravelling the molecular and functional organization of conifer tissues and specialized cells. In the present work, 14 different tissues from 1-month-old maritime pine (Pinus pinaster) seedlings have been isolated and their transcriptomes analysed. The results increased the sequence information and number of full-length transcripts from a previous reference transcriptome and added 39 841 new transcripts. In total, 2376 transcripts were ubiquitously expressed in all of the examined tissues. These transcripts could be considered the core 'housekeeping genes' in pine. The genes have been clustered according to their expression profiles. This analysis reduced the number of profiles to 38, most of these defined by their expression in a unique tissue that is much higher than in the other tissues. The expression and localization data are accessible at ConGenIE.org (http://v22.popgenie.org/microdisection/). This study presents an overview of the gene expression distribution in different pine tissues, specifically highlighting the relationships between tissue gene expression and function. This transcriptome atlas is a valuable resource for functional genomics research in conifers.
  19. Zhang, G.-Q., Liu, K.-W., Li, Z., Lohaus, R., Hsiao, Y.-Y., Niu, S.-C., … Liu, Z.-J. (2017). The Apostasia genome and the evolution of orchids. NATURE, 549(7672), 379–383. https://doi.org/10.1038/nature23897
    Constituting approximately 10% of flowering plant species, orchids (Orchidaceae) display unique flower morphologies, possess an extraordinary diversity in lifestyle, and have successfully colonized almost every habitat on Earth(1-3). Here we report the draft genome sequence of Apostasia shenzhenica(4), a representative of one of two genera that form a sister lineage to the rest of the Orchidaceae, providing a reference for inferring the genome content and structure of the most recent common ancestor of all extant orchids and improving our understanding of their origins and evolution. In addition, we present transcriptome data for representatives of Vanilloideae, Cypripedioideae and Orchidoideae, and novel third-generation genome data for two species of Epidendroideae, covering all five orchid subfamilies. A. shenzhenica shows clear evidence of a whole-genome duplication, which is shared by all orchids and occurred shortly before their divergence. Comparisons between A. shenzhenica and other orchids and angiosperms also permitted the reconstruction of an ancestral orchid gene toolkit. We identify new gene families, gene family expansions and contractions, and changes within MADS-box gene classes, which control a diverse suite of developmental processes, during orchid evolution. This study sheds new light on the genetic mechanisms underpinning key orchid innovations, including the development of the labellum and gynostemium, pollinia, and seeds without endosperm, as well as the evolution of epiphytism; reveals relationships between the Orchidaceae subfamilies; and helps clarify the evolutionary history of orchids within the angiosperms.
  20. Causier, B., Li, Z., De Smet, R., Lloyd, J. P., Van de Peer, Y., & Davies, B. (2017). Conservation of nonsense-mediated mRNA decay complex components throughout eukaryotic evolution. SCIENTIFIC REPORTS, 7.
    Nonsense-mediated mRNA decay (NMD) is an essential eukaryotic process regulating transcript quality and abundance, and is involved in diverse processes including brain development and plant defenses. Although some of the NMD machinery is conserved between kingdoms, little is known about its evolution. Phosphorylation of the core NMD component UPF1 is critical for NMD and is regulated in mammals by the SURF complex (UPF1, SMG1 kinase, SMG8, SMG9 and eukaryotic release factors). However, since SMG1 is reportedly missing from the genomes of fungi and the plant Arabidopsis thaliana, it remains unclear how UPF1 is activated outside the metazoa. We used comparative genomics to determine the conservation of the NMD pathway across eukaryotic evolution. We show that SURF components are present in all major eukaryotic lineages, including fungi, suggesting that in addition to UPF1 and SMG1, SMG8 and SMG9 also existed in the last eukaryotic common ancestor, 1.8 billion years ago. However, despite the ancient origins of the SURF complex, we also found that SURF factors have been independently lost across the Eukarya, pointing to genetic buffering within the essential NMD pathway. We infer an ancient role for SURF in regulating UPF1, and the intriguing possibility of undiscovered NMD regulatory pathways.
  21. Tasdighian, S., Van Bel, M., Li, Z., Van de Peer, Y., Carretero-Paulet, L., & Maere, S. (2017). Reciprocally retained genes in the angiosperm lineage show the hallmarks of dosage balance sensitivity. PLANT CELL, 29(11), 2766–2785.
    In several organisms, particular functional categories of genes, such as regulatory and complex-forming genes, are preferentially retained after whole-genome multiplications but rarely duplicate through small-scale duplication, a pattern referred to as reciprocal retention. This peculiar duplication behavior is hypothesized to stem from constraints on the dosage balance between the genes concerned and their interaction context. However, the evidence for a relationship between reciprocal retention and dosage balance sensitivity remains fragmentary. Here, we identified which gene families are most strongly reciprocally retained in the angiosperm lineage and studied their functional and evolutionary characteristics. Reciprocally retained gene families exhibit stronger sequence divergence constraints and lower rates of functional and expression divergence than other gene families, suggesting that dosage balance sensitivity is a general characteristic of reciprocally retained genes. Gene families functioning in regulatory and signaling processes are much more strongly represented at the top of the reciprocal retention ranking than those functioning in multiprotein complexes, suggesting that regulatory imbalances may lead to stronger fitness effects than classical stoichiometric protein complex imbalances. Finally, reciprocally retained duplicates are often subject to dosage balance constraints for prolonged evolutionary times, which may have repercussions for the ease with which genome multiplications can engender evolutionary innovation.
  22. Li, Zhen, De La Torre, A. R., Sterck, L., Cánovas, F. M., Avila, C., Merino, I., Cabezas, J. A., et al. (2017). Single-copy genes as molecular markers for phylogenomic studies in seed plants. GENOME BIOLOGY AND EVOLUTION, 9(5), 1130–1147.
    Phylogenetic relationships among seed plant taxa, especially within the gymnosperms, remain contested. In contrast to angiosperms, for which several genomic, transcriptomic and phylogenetic resources are available, there are few, if any, molecular markers that allow broad comparisons among gymnosperm species. With few gymnosperm genomes available, recently obtained transcriptomes in gymnosperms are a great addition to identifying single-copy gene families as molecular markers for phylogenomic analysis in seed plants. Taking advantage of an increasing number of available genomes and transcriptomes, we identified single-copy genes in a broad collection of seed plants and used these to infer phylogenetic relationships between major seed plant taxa. This study aims at extending the current phylogenetic toolkit for seed plants, assessing its ability for resolving seed plant phylogeny, and discussing potential factors affecting phylogenetic reconstruction. In total, we identified 3,072 single-copy genes in 31 gymnosperms and 2,156 single-copy genes in 34 angiosperms. All studied seed plants shared 1,469 single-copy genes, which are generally involved in functions like DNA metabolism, cell cycle, and photosynthesis. A selected set of 106 single-copy genes provided good resolution for the seed plant phylogeny except for gnetophytes. Although some of our analyses support a sister relationship between gnetophytes and other gymnosperms, phylogenetic trees from concatenated alignments without 3rd codon positions and amino acid alignments under the CAT + GTR model support gnetophytes as a sister group to Pinaceae. Our phylogenomic analyses demonstrate that, in general, single-copy genes can uncover both recent and deep divergences of seed plant phylogeny.
  23. Unver, T., Wu, Z., Sterck, L., Turktas, M., Lohaus, R., Li, Z., Yang, M., et al. (2017). Genome of wild olive and the evolution of oil biosynthesis. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 114(44), E9413–E9422.
    Here we present the genome sequence and annotation of the wild olive tree (Olea europaea var. sylvestris), called oleaster, which is considered an ancestor of cultivated olive trees. More than 50,000 protein-coding genes were predicted, a majority of which could be anchored to 23 pseudochromosomes obtained through a newly constructed genetic map. The oleaster genome contains signatures of two Oleaceae lineage-specific paleopolyploidy events, dated at ~28 and ~59 Mya. These events contributed to the expansion and neofunctionalization of genes and gene families that play important roles in oil biosynthesis. The functional divergence of oil biosynthesis pathway genes, such as FAD2, SACPD, EAR, and ACPTE, following duplication, has been responsible for the differential accumulation of oleic and linoleic acids produced in olive compared with sesame, a closely related oil crop. Duplicated oleaster FAD2 genes are regulated by an siRNA derived from a transposable element-rich region, leading to suppressed levels of FAD2 gene expression. Additionally, neofunctionalization of members of the SACPD gene family has led to increased expression of SACPD2, 3, 5, and 7, consequently resulting in an increased desaturation of stearic acid. Taken together, decreased FAD2 expression and increased SACPD expression likely explain the accumulation of exceptionally high levels of oleic acid in olive. The oleaster genome thus provides important insights into the evolution of oil biosynthesis and will be a valuable resource for oil crop genomics.
  24. De La Torre, A. R., Li, Z., Van de Peer, Y., & Ingvarsson, P. K. (2017). Contrasting rates of molecular evolution and patterns of selection among gymnosperms and flowering plants. MOLECULAR BIOLOGY AND EVOLUTION, 34(6), 1363–1377.
    The majority of variation in rates of molecular evolution among seed plants remains both unexplored and unexplained. Although some attention has been given to flowering plants, reports of molecular evolutionary rates for their sister plant clade (gymnosperms) are scarce, and to our knowledge differences in molecular evolution among seed plant clades have never been tested in a phylogenetic framework. Angiosperms and gymnosperms differ in a number of features, of which contrasting reproductive biology, life spans, and population sizes are the most prominent. The highly conserved morphology of gymnosperms evidenced by similarity of extant species to fossil records and the high levels of macrosynteny at the genomic level have led scientists to believe that gymnosperms are slow-evolving plants, although some studies have offered contradictory results. Here, we used 31,968 nucleotide sites obtained from orthologous genes across a wide taxonomic sampling that includes representatives of most conifers, cycads, ginkgo, and many angiosperms with a sequenced genome. Our results suggest that angiosperms and gymnosperms differ considerably in their rates of molecular evolution per unit time, with gymnosperm rates being, on average, seven times lower than angiosperm species. Longer generation times and larger genome sizes are some of the factors explaining the slow rates of molecular evolution found in gymnosperms. In contrast to their slow rates of molecular evolution, gymnosperms possess higher substitution rate ratios than angiosperm taxa. Finally, our study suggests stronger and more efficient purifying and diversifying selection in gymnosperm than in angiosperm species, probably in relation to larger effective population sizes.
  25. Kerchev, P., Waszczak, C., Lewandowska, A., Willems, P., Shapiguzov, A., Li, Z., … Van Breusegem, F. (2016). Lack of GLYCOLATE OXIDASE1, but not GLYCOLATE OXIDASE2, attenuates the photorespiratory phenotype of CATALASE2-deficient Arabidopsis. PLANT PHYSIOLOGY, 171(3), 1704–1719.
    The genes coding for the core metabolic enzymes of the photorespiratory pathway that allows plants with C3-type photosynthesis to survive in an oxygen-rich atmosphere, have been largely discovered in genetic screens aimed to isolate mutants that are unviable under ambient air. As an exception, glycolate oxidase (GOX) mutants with a photorespiratory phenotype have not been described yet in C3 species. Using Arabidopsis (Arabidopsis thaliana) mutants lacking the peroxisomal CATALASE2 (cat2-2) that display stunted growth and cell death lesions under ambient air, we isolated a second-site loss-of-function mutation in GLYCOLATE OXIDASE1 (GOX1) that attenuated the photorespiratory phenotype of cat2-2. Interestingly, knocking out the nearly identical GOX2 in the cat2-2 background did not affect the photorespiratory phenotype, indicating that GOX1 and GOX2 play distinct metabolic roles. We further investigated their individual functions in single gox1-1 and gox2-1 mutants and revealed that their phenotypes can be modulated by environmental conditions that increase the metabolic flux through the photorespiratory pathway. High light negatively affected the photosynthetic performance and growth of both gox1-1 and gox2-1 mutants, but the negative consequences of severe photorespiration were more pronounced in the absence of GOX1, which was accompanied with lesser ability to process glycolate. Taken together, our results point toward divergent functions of the two photorespiratory GOX isoforms in Arabidopsis and contribute to a better understanding of the photorespiratory pathway.
  26. Li, Z., Defoort, J., Tasdighian, S., Maere, S., Van de Peer, Y., & De Smet, R. (2016). Gene duplicability of core genes is highly consistent across all angiosperms. PLANT CELL, 28(2), 326–344. https://doi.org/10.1105/tpc.15.00877
    Gene duplication is an important mechanism for adding to genomic novelty. Hence, which genes undergo duplication and are preserved following duplication is an important question. It has been observed that gene duplicability, or the ability of genes to be retained following duplication, is a nonrandom process, with certain genes being more amenable to survive duplication events than others. Primarily, gene essentiality and the type of duplication (small-scale versus large-scale) have been shown in different species to influence the (long-term) survival of novel genes. However, an overarching view of "gene duplicability" is lacking, mainly due to the fact that previous studies usually focused on individual species and did not account for the influence of genomic context and the time of duplication. Here, we present a large-scale study in which we investigated duplicate retention for 9178 gene families shared between 37 flowering plant species, referred to as angiosperm core gene families. For most gene families, we observe a strikingly consistent pattern of gene duplicability across species, with gene families being either primarily single-copy or multicopy in all species. An intermediate class contains gene families that are often retained in duplicate for periods extending to tens of millions of years after whole-genome duplication, but ultimately appear to be largely restored to singleton status, suggesting that these genes may be dosage balance sensitive. The distinction between single-copy and multicopy gene families is reflected in their functional annotation, with single-copy genes being mainly involved in the maintenance of genome stability and organelle function and multicopy genes in signaling, transport, and metabolism. The intermediate class was overrepresented in regulatory genes, further suggesting that these represent putative dosage-balance-sensitive genes.

Other publications

  1. Li, Z., Zhang, Z., Yan, P., Huang, S., Fei, Z., & Lin, K. (2011). RNA-Seq improves annotation of protein-coding genes in the cucumber genome. BMC GENOMICS, 12(1), 540.