STRUCTURAL GENES ENCODING F3’5’H IN 4D CHROMOSOME CONTROL THE BLUE GRAIN TRAIT IN WHEAT

Blue grained wheat contained a higher quantity of natural anthocyanin compounds while normal commercial wheat does not have. Though the genes related to several colours of wheat have been identified, the major genes in blue wheat are still unrevealed. Hence, combining the SNP mapping, and transcriptome analysis, pivotal genes regulating blue grain trait was identified. SNP genotyping was carried out in an F2 blue and white wheat population. The blue trait was controlled by a gene/locus located between two SNP markers of IWB 18525 and IWB16381 on the 4D chromosome. Comparative transcriptome analysis revealed that 40 structural differentially expressed genes (DEGs) related to anthocyanin biosynthesis had significant expression differences between blue and non-blue samples. Among them, 12 DEGs expressed only in blue samples while 2 DEGs were specific to blue wheat. Only two F3’5’H genes located in 4D (Traes_4DL_27C195FDE, Traes_4DL_5A3D8F519) were consistent with the location results performed in SNP genotyping. Further, F3’5’H is considered as the main enzyme for Delphinidin compounds, cause for blue colouration. Hence, two genes encoding F3’5’H in the 4D chromosome preferentially account for the blue pigmentation in wheat.

The blue wheat lines were usually developed from distant hybridization, in which alien chromosome or fragments were integrated into the wheat genome (Zeven 1991). Genes for the blue aleurone (Ba) colour have been identified in several wheat-related species such as Thinopirum ponticum (Ba1) (Keppenne and Baenziger 1990), Triticum monococcum (Ba2) (Dubcovsky et al. 1996), T. boeoticum (Ba2) (Singh et al. 2007) and Thinopirum bucranium (bathb) (Shen et al. 2013). Though the genes and expression related to the anthocyanin biosynthesis pathway in many plants have been identified in detail, the exact genes and expression of blue colour in wheat have not been detected and verified.
Red or purple wheat was controlled by MYB or bHLH transcription factors. It is still unknown if the blue seed colour is similarly controlled by transcription factors or differently. For the huge genome size and gene redundancy, map-based cloning and functional characterizing some genes related to the anthocyanin biosynthesis in wheat are complex and find it difficult to understand the molecular mechanism of the blue grain phenotype.
Genetic maps are useful for discovering, dissecting, and manipulating the genes responsible for simple and complex traits in crop plants (Tanksley et al. 1992). Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in the genome (Rafalski 2002). Accordingly, they are well-suited for genomics approaches such as genomic selection (Meuwissen et al. 2001) and association mapping (Myles et al. 2009). A newly discovered high-throughput SNP genotyping platform make a revolution in wheat genotyping by allowing large-scale utilization of genotypic data in a costeffective manner (Akhunov et al. 2009). Transcriptome analysis based on high throughput RNA sequencing technologies (RNA-seq) is also a newly identified powerful approach to be used for characterizing the whole profile of gene expression in a given sample (Wang et al. 2009;Wolf 2013).
In this study, using a high throughput SNP mapping, the candidate regions and related molecular markers for wheat blue grain trait were identified. The expression profiles of anthocyanin biosynthesis-related genes were investigated using transcriptome analysis between blue and white wheat at two different physiological stages of colour development and the final aim was to find out candidate genes responsible for the blue grained wheat.

MATERIALS AND METHODS Plant materials
An F 2 population, consisting of 101 individuals, developed from the blue wheat line ‗Zhiluowumai' crossed with a white wheat line ‗No.4045' was used as the SNP mapping population. For the transcriptome analysis, two blue grain wheat lines (‗Zhiluowumai' and ‗Xiaoyanwumai') and two white wheat lines (‗A14' and ‗Xiaoyan mutant') were selected as the plant materials. All the plants were grown in the experimental field of Northwest A & F University (Shaanxi, China) during the growing season of 2015. The altitude of the area was 525 m and the climate was semi-humid prone to semi-arid with a mean temperature of 13 0 C and average annual rainfall of 600 mm.
Colour development of wheat grains was visually observed during seed development in both blue and white lines used for transcriptome analysis. According to the observations of the blue colour deposition, sampling dates were determined (Fig.1). Hence, 10 days post-anthesis (dpa) and 20 dpa grains were used to extract RNA samples for transcriptome RNA-seq and Quantitative Real-time PCR (qRT-PCR) analyses. Note: D-10: ‗A14' at 10 dpa, D-20: ‗A14' at 20 dpa, G-10: ‗Xiaoyan mutant' at 10 dpa, G-20: ‗Xiaoyan mutant' at 20 dpa, E-10: ‗Zhiluowumai' at 10 dpa, E-20: ‗Zhiluowumai' at 20 dpa, F-10: ‗Xiaoyanwumai' at 10 dpa, F-20: ‗Xiaoyanwumai' at 20 dpa Genetic mapping of the blue grain gene Genomic DNA of ‗Zhiluowumai' and ‗No.4045' parental lines and 101 individuals of the F 2 population derived from the above two parents were extracted from the young leaves using the CTAB method and were used for the genotyping. Genotyping was carried out using the recently developed wheat 90k iSelect SNP array comprised of nearly 90,000 gene-associated SNPs (Wang et al. 2014). This was performed at China golden marker company (cgmb.com.cn) adapting the manufacturer's recommendations as described in (Akhunov et al. 2009). The genotyping assays were done in Illumina iScan reader using Genome Studio software version 2011.1 (Illumina). The linkage map was constructed using the MAP function in software QTL IciMapping, version 4.1 (Wang et al. 2012) and Kosambi mapping function. The ordering markers to the chromosomes referred to 90K SNP maps that were developed in bread wheat (Cavanagh et al. 2013;Wang et al. 2014). QTL mapping was performed with QTL IciMapping version 4.1 (Wang et al. 2012) using inclusive composite interval mapping of additive (ICIM-ADD) module. The map distances were calculated in Centimorgan (cM). Additive QTL was detected using a 1.0 cm step in scanning. The probability used in stepwise regression was 0.001.

RNA extraction and cDNA library preparation for transcriptome sequencing
Seeds of two blue grain wheat lines (‗Zhiluowumai' and ‗Xiaoyanwumai') and two white wheat lines (‗A14' and ‗Xiaoyan mutant') at 10 dpa and 20 dpa were collected and squeezed to remove the starch portion of the seed. They were immediately placed in RNA free Eppendorf tubes, which were kept in liquid nitrogen. All samples were collected in triplicate. Total RNA was extracted using the RNA Prep Pure Plant Kit (Tiangen, China) according to the manufacture's protocol. RNA degradation and contamination were detected by electrophoresis in 1% agarose gels and RNA purity was checked by the Nano Photometer® spectrophotometer (IMPLEN, CA, USA). RNA concentration was measured using the Qubit® RNA Assay Kit in Qubit® 2.0 Fluorimeter (Life Technologies, CA, USA). RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, CA, USA) was used to measure the integrity of RNA.
The mRNA libraries were prepared separately for each sample. Sequencing libraries were created using nebnext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer's recommendations. The various steps including mRNA enrichment, fragmentation, second-strand cDNA synthesis, PCR amplification, size selection, clustering and sequencing, using Illumina hiseq tm 2500 platform were performed by the Novogene Bioinformatics Technology Co. Ltd, China. Finally, 150bp paired-end reads were generated.

Sequencing data processing and annotation
After completion of the sequencing, resulted image data were transformed to raw reads and those raw reads were stored in a FASTQ format. Then the data in FASTQ format were cleaned using in-house perl scripts. In this step, reads with adapter, poly-N and other low -quality reads were removed from the raw data. At the same time Phred quality score (Q), which is defined as a property that is logarithmically related to the base calling error probabilities (P) 2 (www.illumina.com) and GC content were calculated. Low-quality data of Q <20 were removed and higher quality, clean reads from each wheat transcriptome library were used for the downstream analyses.

Reads mapping to the reference genome
Reference genome and gene model annotation files were downloaded from the wheat genome website directly (https:// www.wheatgenome.org). The index of the reference genome was built using Bowtie v2.2.3 and paired-end clean reads were aligned to the reference genome using TopHat v2.0.12. We selected TopHat as the mapping tool for that TopHat can generate a database of splice junctions based on the gene model annotation file and thus a better mapping result than other non-splice mapping tools.

Quantification of gene expression level
The expression levels of the mapped genes were estimated from the wheat transcriptome sequencing data based on the number of raw reads. Version 0.6.1 of htseq was used to count the read numbers mapped to each gene (Anders et al. 2015). Then the reads for each gene were normalized by using fragments per kilobase of transcript per million mapped reads (FPKM). The FPKM of each gene was calculated based on the length of the gene and reads count mapped to this gene. This FPKM was considered the effect of sequencing depth and gene length for the read count at the same time and is currently the most commonly used method for estimating gene expression levels (Trapnell et al. 2010).

Functional analysis of differentially expressed genes
Differential expression analysis of two conditions/groups (two biological replicates per condition) was performed using the Deseq R package (1.18.0). Deseq provides statistical routines for determining differential expression in digital gene expression data using a model based on the negative binomial distribution. The resulting P-values were adjusted using Benjamini and Hochberg's approach for controlling the false discovery rate. Genes with the adjusted P-value <0.05 found by Deseq were assigned as differentially expressed.
GO (Gene Ontology) enrichment analysis of differentially expressed genes was carried out using the Goseq R package. This package can verify the bias of gene length. GO terms with corrected P-value less than 0.05 were considered significantly enriched by differentially expressed genes.

Quantitative Real-time PCR analysis
To validate the accuracy and repeatability of the gene expression data obtained from RNA seq, qRT-PCR was carried out on a few randomly selected DEGs that were prepared from the total RNA. Total RNA from the seed coat of all 4 varieties at two different physiological stages were extracted as described above and cDNA was synthesized using the PrimeScript™ RT Reagent kit with gDNA Eraser (TaKaRa, China). The specific real-time PCR primers were designed using the Primer 5.0 program. qRT-PCRs were carried out on a Step One Plus tm real-Time PCR System (USA) in a 20 μl reaction volume containing 1:10 diluted cDNA templates, 0.4 μm of each primer, deionized water, Real star green power mixture and ROX reference dye (GenStar, China). The qrt-PCR conditions were as follows: denaturing at 95℃ for 10 min, followed by 40 cycles of 95℃ for 15 s, 60℃ for 45 s and 72℃ for 15 s. The relative gene expression level of each target gene was analyzed using the comparative CT method (2 -△△ C T method) (Schmittgen and Livak, 2008). The qrt-PCR reactions were normalized to the Ct values for TaActin in wheat. All the reactions were measured in three replicates.

RESULTS AND DISCUSSION
Wheat grain colouring is caused by anthocyanin reside in the pericarp (purple) or aleurone layer (blue) (Trojan et al. 2014). At present, major genes related to red and purple wheat have been mapped and identified into several chromosomes (Liu et al. 2016;Shoeva and Khlestkina 2015). However, the genes that control the blue grain trait of wheat have not been identified yet. The blue wheat lines were usually developed from distant hybridization, in which alien chromosome or fragments were integrated into the wheat genome (Zeven 1991).

Genetic mapping of the blue grain trait
The analysis with the 90k iSelect Infinium assay produced data for 81,587 SNP markers. Out of that, a total of 10,305 (12.6%), were polymorphic between the parental wheat lines of ‗No.  (Fig. 2). A high-density SNP map had not been used for mapping blue grain genes in wheat so far. Furthermore, the blue aleurone trait in hexaploid wheat has not been mapped previously using any other mapping methods and this research first time reports it.

Strategy for screening candidate genes involved in the blue grains
With the seed development, major colour changes were visually observed in both blue lines by changing the colour from green to blue at 20 dpa while both white lines remained green in colour. This inferred the first screening condition of candidate genes, in which the blue grain-related genes were differentially expressed before and after 20 dpa in the blue line. The second selection was to exclude the differentially expressed genes (DEGs) that commonly occurred in all 4 lines. Therefore, the samples collected at 10 dpa and 20 dpa were used for screening candidate genes with transcriptome analysis. As a result, we had two blue grain samples and 6 non-blue samples as the background controls for transcriptome analysis. The third selection of candidates would focus on the DEGs, involved in the anthocyanin biosynthesis pathway based on the transcriptome analysis.
The blue aleurone lines used in the research were translocation lines, developed from common wheat crossed with Agropyron elongatum. Because of the crossed in alien chromosome segments, comparative transcriptome analysis of blue and white wheat could be effective to identify the candidate genes related to the blue grain trait in wheat. It has been reported that large numeric DEGs could be screened out based on transcriptome analysis.
It is critical to screen down the unrelated DEGs. As we know that the blue grain colour was affected by genetic background and development stages, combing the factors two blue samples and six non-blue samples were used for the transcriptome analyses. It provides an effective decrease in the background DEG numbers.

Transcriptome sequencing and bioinformatics analysis of differentially expressed genes
The sum of 397,450,110 raw reads was generated by RNA sequencing, which consisted of the averages of 50,129,381 and 49,233,147 raw reads for white and blue groups, respectively. The number of reads from each pool ranged from 44 to 57 million (Table 2). After removing the lower quality reads such as Phred quality scores <20, N reads (The reads, which could not determine the base information) and adapter sequences, 367 million cleaned reads with 55.14G of cleaned bases were selected to use for subsequent analysis. The percentage of total raw reads to total clean reads was 92.6% (185,671,940) and 92.4% (181,941,724) for white and blue groups, respectively. These data were used for further bioinformatics analysis to screen out differential expressed genes between the groups of blue and non-blue samples.
A total of 126,792 genes was found in blue and white wheat pools. Gene expression differences were assessed by pairwise comparison of two physiological stages of each variety. It indicated that among the total DEGs, 3855 were found to be up-regulated while 5585 were down-regulated compared to each stage in each variety (Fig. 3).
A total of 4946 DEGs was screened out as relatively high level expressed (FPKM val-ue>1) genes in whole data pools, out of which 376 DEGs were commonly expressed in all four lines (Fig. 4a). At first, the differential expression was compared between 10 dpa and 20 dpa samples within each line. Then the DEGs were compared separately within white and blue line groups. As a result, 500 DEGs were commonly expressed in the blue wheat lines (Fig. 4b) and 1316 DEGs were common in white lines (Figure 4c). Combining the DEGs between blue and white groups 376 DEGs were common to all lines, while 124 DEGs were only in blue lines (Fig. 4d)  GO annotation was done for the blue specific 124 DEGs and they were further categorized into three main categories of biological process, cellular component and molecular function (Fig. 5). We obtained 37 clusters, which consist of 17, 9 and 11 clusters for biological process, molecular function and cellular component, respectively. The topmost score recording clusters in cellular component were ‗cell' and ‗cell part'. ‗Metabolic process' and ‗catalytic activity' expressed the highest number of GO terms of the biological process and molecular function respectively. Go graphs were created using WEGO software (Ye et al. 2006).

Screening candidate transcription factor genes
Blue grain trait visually started to occur on seed coat at 20 dpa, regulated by different transcription factors. The structural genes and Regulatory genes are capable of controlling the developmental or tissue-specific expressions exhibited by anthocyanin structural genes. A set of proteins, including MYB, bHLH and WD40 form a regulatory complex, which controls the transcription of anthocyanin structural genes (Ramsay and Glover 2005). Based on the transcriptome analysis 410 transcription factor genes were annotated as DEGs, in which 34 MYB, 12 bHLH and 7 WD40 DEGs were detected among the 8 samples. Only one MYB (Traes_1AL_01E24503F) and one bHLH (Traes_2AL_67BC3AFBE) were screened out as the candidate transcription factor genes which differentially expressed between blue samples and non-blue samples.
The MYB gene (Traes_1AL_01E24503F) showed 2.7 to 2.8 times higher expression in 20 dpa samples than in 10 dpa in blue wheat lines. However, the bHLH gene had an 8.6 to 23.8 times higher expression level in 20 dpa samples than in 10 dpa samples.
Of the five CHS expressed only in blue wheat, 2 CHS (Traes_2AS_0EA2792B8 and Traes_2AL_ED4D3BEC1) had alternative splicing events at 5' first exon (TSS) and 3' last exon (TTS) while another two CHS, (Traes_2BL_07EC87598 and Traes_2BS_4AC3D17E8) had SNP with Cytosine into Adenine and Thymine, respectively.
Two genes annotated as Flavonoid 3', 5'hydroxylase (Traes_4DL_5A3D8F519 and Traes_4DL_27C195FDE) were remarkably highly up-regulated during blue colouration in blue wheat varieties at 20 dpa. Those two DEGs had increased relative expression of 72 -128 fold in blue samples. Traes_4DL_27C195FDE gene had two alternative splicing events at TSS and TTS.
Up-regulation of two DFR encoding genes (TRAES3BF026100070CFD_g and Traes_3AL_197871859) was evident in blue samples. TRAES3BF026100070CFD_g had shown alternative splicing at TSS and TTS, while Traes_3AL_197871859 exhibited a SNP. Two ANS of Traes_6AS_BF6B4581A1, Traes_6DS_2F7DE1EAB and one Anthocyanin 5-aromatic acyltransferase gene (Traes_3AS_60F4494FC) showed very high expression in both blue samples while not expressed in other samples. No alternative splicing and SNPs were detected in the three genes.
To confirm the expression levels of these genes involved in the anthocyanin biosynthesis pathway, quantitative Real-Time PCR analyses were performed in three biological replicates on six selected genes including CHS, F3H, F3'5'H, DFR and ANS. The results were consistent with the findings obtained by the RNA seq (Fig. 7).
The flavonoid biosynthetic pathway has been extensively studied at the genetic, molecular and biochemical levels in various plant spe-cies, including maize, snapdragon, petunia (Holton and Cornish 1995), strawberry (Almeida et al. 2007), litchi (Lai et al. 2015) and Arabidopsis (Shirley et al. 1995;Bharti and Khurana 1997). In this research, most of the genes involved in anthocyanin biosynthesis of blue grain were identified. It displayed a profile of gene networks involved in the blue seed pigment biosynthesis.
Genetic research showed that the blue grain was controlled by one or two genes. Transcription factors and structural genes screened out in this research are located on 8 different chromosomes (2A, 2B, 2D, 3A, 3B, 4D, 6A, 6D-TraesS3BF026100070CFDg,Traes3AL19 7871859,Traes2AS0EA2792B8,Traes2BL07E C87598,Traes2BS4AC3D17E8,Traes2ALED 4D3BEC1,Traes2DS8827E95F0,Traes4DL27 C195FDE,Traes4DL5A3D8F519,Traes_6AS_ BF6B4581A1,Traes6DS2F7DE1EAB,Traes3 AS60F4494FC). The SNP genotyping performed using wheat 90k SNP chips identified the genetic location of the blue trait in the 4D chromosome. It has been reported purple grain was caused by transcription factors (Jiang et al. 2018). Based on our mapping result, the two-transcription factor DEGs were excluded as they are located in 1A and 2A chromosomes, and two F3'5'H DEGs were suggested as the candidate control genes for the blue trait. It potentially displayed a novel control mechanism of the blue grain trait. There are many useful genes such as several disease-resistant genes of Sr31 and Sr38 resistant to stem rust (Knott 1989), Bdv2 gene resistance to barley yellow dwarf virus (Crasta et al. 2000), Lr41 resistance to leaf rust (Sun et al. 2009), Yr26 gene resistant to all important races of Puccinia striiformis f. sp. Tritici, (Zhang et al. 2013) is transferred from distant hybridization and many elite lines are translocation lines in wheat. It causes a very difficult problem that the inserted fragment usually unpaired in meiosis and as a result, no cross events will occur in the fragments. Due to this reason, the fine-mapping is impossible in this translocated area and cloning of the gene is not practicable. It raises a common problem of wheat that how to clone the particular translocated gene. This research provides a good example to solve this problem. By mapping, we can identify the location of the particular gene and by transcriptome analysis, we can find the genes related to that chromosome and trait. This could be applied to similar cases.
The blue wheat lines we used for this research were translocated lines and it was based on the evidence of fluorescence in situ hybridization (Jia et al. 2001). By comparing our mapping results with the wheat consensus map, we can further confirm that they are truly translocated lines. The functional analysis of the F3'5'H genes is needed to be carried out. Transgenic expression assay of these genes should provide clear evidence for understanding the molecular mechanism of blue grain trait in wheat.

CONCLUSION
The SNP genotyping performed using 90k iSelect Infinium assay identified the location of the gene/locus, which control the wheat blue grain trait in the 4D chromosome flanked by two SNP markers IWB18525 and IWB16381, within a 2.16 cM genetic distance. In our transcriptome analysis revealed that 40 structural DEGs were related to anthocyanin biosynthesis and had obvious expression differences between blue and white wheat. Among them, 12 DEGs expressed only in blue wheat at 20 dpa, highlight the importance of them to produce blue colouration. This research successfully found the molecular processes involved in the biosynthesis of antho-cyanin in blue wheat. Delphinidin compounds are predominant in blue wheat and F3'5'H is the crucial enzyme that leads to the production of Delphinidin. Hence, two DEGs, which were located in 4D and annotated as F3'5'H are the most important and key structural gene for the blue colouration in wheat. This was further confirmed by the results of qRT-PCR and mapping. It has been reported that purple grain was caused by transcription factors. Based on our mapping result, the twotranscription factor DEGs were omitted, and two F3'5'H DEGs were suggested as the candidate control genes for the blue trait. It potentially displayed a novel control mechanism of the blue grain trait. Network of blue grain identified. Our research provides a good example of how to find out the translocated genes in a particular trait. This study filled the several gaps that remained unrevealed on anthocyanin biosynthesis of blue wheat and will favour the development of new biotechnological tools for crop improvement.