Xiao Ma

Xiao Ma — PhD student
Joined the group in 2019

Seagrasses (Alismatales) colonized the sea on at least three independent occasions and form the basis of one of the most productive and widespread coastal ecosystems. They are a key functional group along the coasts of all continents except Antarctica and are the only fully marine angiosperms, playing a crucial role in the functioning of coastal marine ecosystems and in global carbon sequestration. To my knowledge, the only seagrass genome available so far is that of Zostera marina. In adapting to the marine environment, it lost all of its genes involved in stomatal differentiation, as well as entire pathways for the synthesis and sensing of volatiles. Moreover, Zostera has regained functions that enable it to adjust to full salinity. We have sequenced additional seagrasses from different lineages to gain new insights into the genomic losses and gains underlying the structural and physiological adaptations required for a marine lifestyle, and into their convergent evolution.

Ghent University
September 2019 - present: PhD student, Bioinformatics & Evolutionary Genomics, Department of Plant Systems Biology, VIB, Gent, Belgium.
Annoroad Gene Technology
July 2018 - May 2019: Bioinformatician; key responsibilities were genome assembly and annotation.
Institute of Botany, Chinese Academy of Sciences
September 2014 - June 2018: Master of Science in Biology, State Key Laboratory of Systematic and Evolutionary Botany.
Northwest A&F University
September 2010 - June 2014: Bachelor of Science in Biology, College of Life Science.

Publications

  1. Xu, Z., Li, Z., Ren, F., Gao, R., Wang, Z., Zhang, J., … Song, J. (2022). The genome of Corydalis reveals the evolution of benzylisoquinoline alkaloid biosynthesis in Ranunculales. The Plant Journal. https://doi.org/10.1111/tpj.15788
    Species belonging to the order Ranunculales have attracted much attention because of their phylogenetic position as a sister group to all other eudicot lineages and their ability to produce unique yet diverse benzylisoquinoline alkaloids (BIAs). The Papaveraceae family in Ranunculales is often used as a model system for studying BIA biosynthesis. Here, we report the chromosome-level genome assembly of Corydalis tomentella, a species of Fumarioideae, one of the two subfamilies of Papaveraceae. Based on comparisons of sequenced Ranunculalean species, we present clear evidence of a shared whole-genome duplication (WGD) event that occurred before the divergence of Ranunculales but after its divergence from other eudicot lineages. The C. tomentella genome enabled us to integrate isotopic labelling and comparative genomics to reconstruct the BIA biosynthetic pathway for both the sanguinarine biosynthesis shared by papaveraceous species and the cavidine biosynthesis specific to Corydalis. Also, our comparative analysis revealed that gene duplications, especially tandem gene duplications, underlie the diversification of BIA biosynthetic pathways in Ranunculales. In particular, tandemly duplicated berberine bridge enzyme-like genes appear to be involved in cavidine biosynthesis. In conclusion, our study of the C. tomentella genome provides important insights into the occurrence of WGDs during the early evolution of eudicots as well as into the evolution of BIA biosynthesis in Ranunculales.
  2. Wang, X., Chen, S., Ma, X., Yssel, A. E. J., Chaluvadi, S. R., Johnson, M. S., … Van Deynze, A. (2021). Genome sequence and genetic diversity analysis of an under-domesticated orphan crop, white fonio (Digitaria exilis). GIGASCIENCE, 10(3). https://doi.org/10.1093/gigascience/giab013
    Background: Digitaria exilis, white fonio, is a minor but vital crop of West Africa that is valued for its resilience in hot, dry, and low-fertility environments and for the exceptional quality of its grain for human nutrition. Its success is hindered, however, by a low degree of plant breeding and improvement. Findings: We sequenced the fonio genome with long-read SMRT-cell technology, yielding a ~761 Mb assembly in 3,329 contigs (N50, 1.73 Mb; L50, 126). The assembly approaches a high level of completion, with a BUSCO score of >99%. The fonio genome was found to be a tetraploid, with most of the genome retained as homoeologous duplications that differ overall by ~4.3%, neglecting indels. The 2 genomes within fonio were found to have begun their independent divergence ~3.1 million years ago. The repeat content (>49%) is fairly standard for a grass genome of this size, but the ratio of Gypsy to Copia long terminal repeat retrotransposons (~6.7) was found to be exceptionally high. Several genes related to future improvement of the crop were identified including shattering, plant height, and grain size. Analysis of fonio population genetics, primarily in Mali, indicated that the crop has extensive genetic diversity that is largely partitioned across a north-south gradient coinciding with the Sahel and Sudan grassland domains. Conclusions: We provide a high-quality assembly, annotation, and diversity analysis for a vital African crop. The availability of this information should empower future research into further domestication and improvement of fonio.
  3. Ma, X., Vaistij, F. E., Li, Y., Jansen van Rensburg, W. S., Harvey, S., Bairu, M. W., … Denby, K. J. (2021). A chromosome‐level Amaranthus cruentus genome assembly highlights gene family evolution and biosynthetic gene clusters that may underpin the nutritional value of this traditional crop. PLANT JOURNAL, 107(2), 613–628. https://doi.org/10.1111/tpj.15298
    Traditional crops historically provided accessible and affordable nutrition to millions of rural dwellers but have been neglected, with most modern agricultural systems over-reliant on a small number of internationally traded crops. Traditional crops are typically well-adapted to local agro-ecological conditions and many are nutrient-dense. They can play a vital role in local food systems through enhanced nutrition (especially where diets are dominated by starch crops), food security and livelihoods for smallholder farmers, and a climate-resilient and biodiverse agriculture. Using short-read, long-read and phased sequencing technologies we generated a high-quality chromosome-level genome assembly for Amaranthus cruentus, an under-researched crop with micronutrient- and protein-rich leaves and gluten-free seed, but lacking improved varieties with respect to productivity and quality traits. The 370.9 Mb genome demonstrates a shared whole genome duplication with a related species, Amaranthus hypochondriacus. Comparative genome analysis indicates chromosomal loss and fusion events following genome duplication that are common to both species, as well as fission of chromosome 2 in A. cruentus alone, giving rise to a haploid chromosome number of 17 (versus 16 in A. hypochondriacus). Genomic features potentially underlying the nutritional value of this crop include two A. cruentus-specific genes with a likely role in phytic acid synthesis (an anti-nutrient), expansion of ion transporter gene families, and identification of biosynthetic gene clusters conserved within the amaranth lineage. The A. cruentus genome assembly will underpin much-needed research and global breeding efforts to develop improved varieties for economically viable cultivation and realisation of the benefits to global nutrition security and agrobiodiversity.
  4. Ma, X., Olsen, J. L., Reusch, T. B. H., Procaccini, G., Kudrna, D., Williams, M., … Van de Peer, Y. (2021). Improved chromosome-level genome assembly and annotation of the seagrass, Zostera marina (eelgrass). F1000RESEARCH, 10. https://doi.org/10.12688/f1000research.38156.1
    Background: Seagrasses (Alismatales) are the only fully marine angiosperms. Zostera marina (eelgrass) plays a crucial role in the functioning of coastal marine ecosystems and global carbon sequestration. It is the most widely studied seagrass and has become a marine model system for exploring adaptation under rapid climate change. The original draft genome (v.1.0) of the seagrass Z. marina (L.) was based on a combination of Illumina mate-pair libraries and fosmid-ends. A total of 25.55 Gb of Illumina and 0.14 Gb of Sanger sequence was obtained, representing 47.7× genomic coverage. The assembly resulted in ~2000 unordered scaffolds (L50 of 486 Kb), a final genome assembly size of 203 Mb, 20,450 protein-coding genes and 63% TE content. Here, we present an upgraded chromosome-scale genome assembly and compare v.1.0 and the new v.3.1, reconfirming previous results from Olsen et al. (2016), as well as pointing out new findings. Methods: The same high molecular weight DNA used in the original sequencing of the Finnish clone was used. A high-quality reference genome was assembled with the MECAT assembly pipeline combining PacBio long-read sequencing and Hi-C scaffolding. Results: In total, 75.97 Gb PacBio data was produced. The final assembly comprises six pseudo-chromosomes and 304 unanchored scaffolds with a total length of 260.5 Mb and an N50 of 34.6 Mb, showing high contiguity and few gaps (~0.5%). 21,483 protein-encoding genes are annotated in this assembly, of which 20,665 (96.2%) obtained at least one functional assignment based on similarity to known proteins. Conclusions: As an important marine angiosperm, the improved Z. marina genome assembly will further assist evolutionary, ecological, and comparative genomics at the chromosome level. The new genome assembly will further our understanding of the structural and physiological adaptations from land to marine life.
  5. Hale, I., Ma, X., Melo, A. T. O., Padi, F. K., Hendre, P. S., Kingan, S. B., … Van Deynze, A. (2021). Genomic resources to guide improvement of the shea tree. FRONTIERS IN PLANT SCIENCE, 12. https://doi.org/10.3389/fpls.2021.720670
    A defining component of agroforestry parklands across Sahelo-Sudanian Africa (SSA), the shea tree (Vitellaria paradoxa) is central to sustaining local livelihoods and the farming environments of rural communities. Despite its economic and cultural value, however, not to mention the ecological roles it plays as a dominant parkland species, shea remains semi-domesticated with virtually no history of systematic genetic improvement. In truth, shea's extended juvenile period makes traditional breeding approaches untenable; but the opportunity for genome-assisted breeding is immense, provided the foundational resources are available. Here we report the development and public release of such resources. Using the FALCON-Phase workflow, 162.6 Gb of long-read PacBio sequence data were assembled into a 658.7 Mbp, chromosome-scale reference genome annotated with 38,505 coding genes. Whole genome duplication (WGD) analysis based on this gene space revealed clear signatures of two ancient WGD events in shea's evolutionary past, one prior to the Asterid-Rosid divergence (116-126 Mya) and the other at the root of the order Ericales (65-90 Mya). In a first genome-wide look at the suite of fatty acid (FA) biosynthesis genes that likely govern stearin content, the primary determinant of shea butter quality, relatively high copy numbers of six key enzymes were found (KASI, KASIII, FATB, FAD2, FAD3, and FAX2), some likely originating in shea's more recent WGD event. To help translate these findings into practical tools for characterization, selection, and genome-wide association studies (GWAS), resequencing data from a shea diversity panel was used to develop a database of more than 3.5 million functionally annotated, physically anchored SNPs. Two smaller, more curated sets of suggested SNPs, one for GWAS (104,211 SNPs) and the other targeting FA biosynthesis genes (90 SNPs), are also presented. With these resources, the hope is to support national programs across the shea belt in the strategic, genome-enabled conservation and long-term improvement of the shea tree for SSA.
  6. Tien, N. Q. D., Ma, X., Man, L. Q., Chi, D. T. K., Huy, N. X., Nhut, D.-T., … Loc, N. H. (2021). De novo whole-genome assembly and discovery of genes involved in triterpenoid saponin biosynthesis of Vietnamese ginseng (Panax vietnamensis Ha et Grushv.). PHYSIOLOGY AND MOLECULAR BIOLOGY OF PLANTS, 27(10), 2215–2229. https://doi.org/10.1007/s12298-021-01076-1
    Vietnamese ginseng (Panax vietnamensis Ha et Grushv.), also known as Ngoc Linh ginseng, is a high-value herb in Vietnam. Vietnamese ginseng has been proven to enhance the immune system and human memory, to have anti-stress, anti-inflammatory, and anti-cancer effects, and to prevent aging. The present study reports the first draft whole genome of Vietnamese ginseng and the identification of potential genes involved in the triterpenoid metabolic pathway. De novo whole-genome assembly was performed successfully from approximately 139 Gbp of data comprising 394,802,120 high-quality reads, generated from the leaf of Vietnamese ginseng, to produce 9815 scaffolds with an N50 value of 572,722 bp. The assembled genome of Vietnamese ginseng is 3,001,967,204 bp long and contains 79,374 gene models. Among them, 55,012 genes (69.30%) were annotated using various public molecular biology databases. The potential genes involved in triterpenoid saponin biosynthesis in Vietnamese ginseng and their metabolic pathway were also predicted. Three genes encoding squalene monooxygenase isozymes in Vietnamese ginseng were cloned, sequenced and characterized. Moreover, expression levels of several key genes involved in terpenoid biosynthesis in different parts of Vietnamese ginseng were also analyzed. SSR markers were detected by various programs in both the full genome assembly and the predicted genes. The present work provides important draft whole-genome data for Vietnamese ginseng, supporting further studies of the role of genes involved in ginsenoside biosynthesis and their metabolic pathway at the molecular level in this rare medicinal species.