Xiao Ma

Xiao Ma — PhD student
Joined the group in 2019

Ghent University
September 2019 - present: PhD student, Bioinformatics & Evolutionary Genomics, Department of Plant Systems Biology, VIB, Gent, Belgium.
Annoroad Gene Technology
July 2018 - May 2019: Bioinformatician; key responsibilities were genome assembly and annotation.
Institute of Botany, Chinese Academy of Sciences
September 2014 - June 2018: Master of Science in Biology, State Key Laboratory of Systematic and Evolutionary Botany.
Northwest A&F University
September 2010 - June 2014: Bachelor of Science in Biology, College of Life Science.


  1. Wang, X., Chen, S., Ma, X., Yssel, A. E. J., Chaluvadi, S. R., Johnson, M. S., … Van Deynze, A. (2021). Genome sequence and genetic diversity analysis of an under-domesticated orphan crop, white fonio (Digitaria exilis). GIGASCIENCE, 10(3). https://doi.org/10.1093/gigascience/giab013
    Background: Digitaria exilis, white fonio, is a minor but vital crop of West Africa that is valued for its resilience in hot, dry, and low-fertility environments and for the exceptional quality of its grain for human nutrition. Its success is hindered, however, by a low degree of plant breeding and improvement. Findings: We sequenced the fonio genome with long-read SMRT-cell technology, yielding a ~761 Mb assembly in 3,329 contigs (N50, 1.73 Mb; L50, 126). The assembly approaches a high level of completion, with a BUSCO score of >99%. The fonio genome was found to be a tetraploid, with most of the genome retained as homoeologous duplications that differ overall by ~4.3%, neglecting indels. The 2 genomes within fonio were found to have begun their independent divergence ~3.1 million years ago. The repeat content (>49%) is fairly standard for a grass genome of this size, but the ratio of Gypsy to Copia long terminal repeat retrotransposons (~6.7) was found to be exceptionally high. Several genes related to future improvement of the crop were identified including shattering, plant height, and grain size. Analysis of fonio population genetics, primarily in Mali, indicated that the crop has extensive genetic diversity that is largely partitioned across a north-south gradient coinciding with the Sahel and Sudan grassland domains. Conclusions: We provide a high-quality assembly, annotation, and diversity analysis for a vital African crop. The availability of this information should empower future research into further domestication and improvement of fonio.
  2. Ma, X., Olsen, J. L., Reusch, T. B. H., Procaccini, G., Kudrna, D., Williams, M., … Van de Peer, Y. (2021). Improved chromosome-level genome assembly and annotation of the seagrass, Zostera marina (eelgrass). F1000RESEARCH, 10. https://doi.org/10.12688/f1000research.38156.1
    Background: Seagrasses (Alismatales) are the only fully marine angiosperms. Zostera marina (eelgrass) plays a crucial role in the functioning of coastal marine ecosystems and global carbon sequestration. It is the most widely studied seagrass and has become a marine model system for exploring adaptation under rapid climate change. The original draft genome (v.1.0) of the seagrass Z. marina (L.) was based on a combination of Illumina mate-pair libraries and fosmid-ends. A total of 25.55 Gb of Illumina and 0.14 Gb of Sanger sequence was obtained representing 47.7× genomic coverage. The assembly resulted in ~2000 unordered scaffolds (L50 of 486 Kb), a final genome assembly size of 203 Mb, 20,450 protein coding genes and 63% TE content. Here, we present an upgraded chromosome-scale genome assembly and compare v.1.0 and the new v.3.1, reconfirming previous results from Olsen et al. (2016), as well as pointing out new findings. Methods: The same high molecular weight DNA used in the original sequencing of the Finnish clone was used. A high-quality reference genome was assembled with the MECAT assembly pipeline combining PacBio long-read sequencing and Hi-C scaffolding. Results: In total, 75.97 Gb PacBio data was produced. The final assembly comprises six pseudo-chromosomes and 304 unanchored scaffolds with a total length of 260.5 Mb and an N50 of 34.6 Mb, showing high contiguity and few gaps (~0.5%). 21,483 protein-encoding genes are annotated in this assembly, of which 20,665 (96.2%) obtained at least one functional assignment based on similarity to known proteins. Conclusions: As an important marine angiosperm, the improved Z. marina genome assembly will further assist evolutionary, ecological, and comparative genomics at the chromosome level. The new genome assembly will further our understanding of the structural and physiological adaptations from land to marine life.
  3. Ma, X., Vaistij, F. E., Li, Y., Jansen van Rensburg, W. S., Harvey, S., Bairu, M. W., … Denby, K. J. (2021). A chromosome‐level Amaranthus cruentus genome assembly highlights gene family evolution and biosynthetic gene clusters that may underpin the nutritional value of this traditional crop. PLANT JOURNAL, 107(2), 613–628. https://doi.org/10.1111/tpj.15298
    Traditional crops historically provided accessible and affordable nutrition to millions of rural dwellers but have been neglected, with most modern agricultural systems over-reliant on a small number of internationally-traded crops. Traditional crops are typically well-adapted to local agro-ecological conditions and many are nutrient-dense. They can play a vital role in local food systems through enhanced nutrition (especially where diets are dominated by starch crops), food security and livelihoods for smallholder farmers, and a climate-resilient and biodiverse agriculture. Using short-read, long-read and phased sequencing technologies we generated a high-quality chromosome-level genome assembly for Amaranthus cruentus, an under-researched crop with micronutrient- and protein-rich leaves and gluten-free seed, but lacking improved varieties, with respect to productivity and quality traits. The 370.9 Mb genome demonstrates a shared whole genome duplication with a related species, Amaranthus hypochondriacus. Comparative genome analysis indicates chromosomal loss and fusion events following genome duplication that are common to both species, as well as fission of chromosome 2 in A. cruentus alone, giving rise to a haploid chromosome number of 17 (versus 16 in A. hypochondriacus). Genomic features potentially underlying the nutritional value of this crop include two A. cruentus-specific genes with a likely role in phytic acid synthesis (an anti-nutrient), expansion of ion transporter gene families, and identification of biosynthetic gene clusters conserved within the amaranth lineage. The A. cruentus genome assembly will underpin much-needed research and global breeding efforts to develop improved varieties for economically viable cultivation and realisation of the benefits to global nutrition security and agrobiodiversity.
  4. Hale, I., Ma, X., Melo, A. T. O., Padi, F. K., Hendre, P. S., Kingan, S. B., … Van Deynze, A. (2021). Genomic resources to guide improvement of the shea tree. FRONTIERS IN PLANT SCIENCE, 12. https://doi.org/10.3389/fpls.2021.720670
    A defining component of agroforestry parklands across Sahelo-Sudanian Africa (SSA), the shea tree (Vitellaria paradoxa) is central to sustaining local livelihoods and the farming environments of rural communities. Despite its economic and cultural value, however, not to mention the ecological roles it plays as a dominant parkland species, shea remains semi-domesticated with virtually no history of systematic genetic improvement. In truth, shea's extended juvenile period makes traditional breeding approaches untenable; but the opportunity for genome-assisted breeding is immense, provided the foundational resources are available. Here we report the development and public release of such resources. Using the FALCON-Phase workflow, 162.6 Gb of long-read PacBio sequence data were assembled into a 658.7 Mbp, chromosome-scale reference genome annotated with 38,505 coding genes. Whole genome duplication (WGD) analysis based on this gene space revealed clear signatures of two ancient WGD events in shea's evolutionary past, one prior to the Asterid-Rosid divergence (116-126 Mya) and the other at the root of the order Ericales (65-90 Mya). In a first genome-wide look at the suite of fatty acid (FA) biosynthesis genes that likely govern stearin content, the primary determinant of shea butter quality, relatively high copy numbers of six key enzymes were found (KASI, KASIII, FATB, FAD2, FAD3, and FAX2), some likely originating in shea's more recent WGD event. To help translate these findings into practical tools for characterization, selection, and genome-wide association studies (GWAS), resequencing data from a shea diversity panel was used to develop a database of more than 3.5 million functionally annotated, physically anchored SNPs. Two smaller, more curated sets of suggested SNPs, one for GWAS (104,211 SNPs) and the other targeting FA biosynthesis genes (90 SNPs), are also presented. With these resources, the hope is to support national programs across the shea belt in the strategic, genome-enabled conservation and long-term improvement of the shea tree for SSA.