search for




 

Genomics Approach to Identify the Cause of the Missing Omega-5 Gliadin Protein in O-Free Wheat
Plant Breeding and Biotechnology 2018;6:413-425
Published online December 31, 2018
© 2018 Korean Society of Breeding Science.

Yun Gyeong Lee1,†, Sang Chul Choi1,†, Yuna Kang1, Chon-Sik Kang2,*, and Changsoo Kim1,*

1Department of Crop Science, College of Agriculture and Life Sciences, Chungnam National University, Daejeon 34134, Korea, 2National Institute of Crop Science, Rural Development Administration (RDA), Wanju 55365, Korea
Corresponding author: *Changsoo Kim, changsookim@cnu.ac.kr, Tel: +82-42-821-5729, Fax: +82-42-821-7822
Received October 2, 2018; Revised October 16, 2018; Accepted November 5, 2018.
This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract

A previous work developed and identified a new omega-5 gliadin deficient wheat line named O-free by crossing Keumkang and Olgeuru, which is nutritionally quite meaningful in that omega-5 gliadin is one of the known wheat allergens. To verify the characteristics of the O-free, we performed RNA sequencing (RNAseq) analysis of the O-free and the two parent lines (Keumkang and Olgeuru). The results of the similarity analysis with the ESTs for gliadins and glutenins showed that the O-free ESTs had no similarity with the omega-5 gliadin sequences but had similarity to other gliadins and glutenins. Furthermore, mapping results between the raw RNAseq data from the O-free and the omega-5 gliadin sequence showed a clear deletion of the N-terminal sequences which are an important signature of omega-5 gliadin. We also designed specific PCR primers that could identify omega-5 gliadin in the genomic DNA. The results showed that no omega-5 gliadin fragments were detected in the O-free. According to these results, we confirmed that the deficiency of omega-5 gliadin in the O-free is not caused by post-transcriptional or post-translational regulations such as epigenetic phenomena but by a simple deletion in the chromosome. Furthermore, we showed that the low-molecular weight glutenin subunit (LMW-GS) gene in the O-free had a single nucleotide polymorphism (SNP) causing a premature stop codon, resulting in a truncated polypeptide. We expect that the O-free line may serve as an excellent source of wheat that could prevail in the hypo-allergen wheat market, which has recently gained interest world-wide.

Keywords : Gliadin, Gluten, Glutenin, Omega-5 gliadin, SNP, Wheat
INTRODUCTION

Wheat gluten proteins are the major storage proteins that are deposited in the starchy endosperm cells of developing grain (Shewry et al. 2002). Gluten is a protein complex that accounts for 75 to 85% of the total protein in bread wheat. Glutens in Triticeae are known for their viscoelastic properties (Shewry et al. 2002; Lamacchia et al. 2014). One type of gluten proteins, glutenins, provide the elasticity to dough and consist of high-molecular weight (HWM) and low-molecular weight (LMW) subunits. Gliadins are another member of the gluten proteins. They are present as monomers in flour and provide extensibility to wheat flour dough. Gliadins are a highly complex and polymorphic group of proteins. Acid polyacrylamide gel electrophoresis (A-PAGE) separates gliadins into four different groups (α−, β−, γ−, and ω−) based on electrophoretic mobility, and each has distinct primary sequences. Alpha-and beta-gliadins are structurally similar proteins while omega-gliadins consist of three subgroups: omega-1, 2, and 5 gliadins (Kasarda et al. 1984). Proteins of the alpha, beta, gamma, and omega groups differ in the composition of their polypeptide domains. Omega-gliadins are characterized by the highest contents of glutamine, proline, and phenylalanine which together account for around 80% of the total composition (Wieser 2007). They can be identified by signal peptide sequences located at the 19th amino acid position in the N-terminal. All omega-gliadins and many gamma-gliadins are encoded by the Gli-1 loci on the short arms of group 1 chromosomes, and all alpha-gliadins, many beta-gliadins, and some gamma-gliadins are encoded by the Gli-2 loci on the short arms of homoeologous group 6 chromosomes (Rasheed et al. 2014).

There are three wheat-related disorders, namely wheat allergy (WA), celiac disease (CD), and non-celiac gluten sensitivity (NCGS). In particular, omega-5 gliadin is thought to be associated with some allergic diseases such as wheat-dependent exercise-induced anaphylaxis (WDEIA) (Morita et al. 2003; Battais et al. 2005). In the case of WDEIA, exercise acts as a cofactor, triggering anaphylaxis after wheat ingestion (Scherf et al. 2016). WDEIA is a life-threatening allergy. Mild symptoms for anaphylaxis may include widespread flushing of the skin, nettle rash, swelling of the skin or lips and in some severe cases, swollen tongue, feeling faint, and even loss of consciousness. For those patients, it would be safer to avoid wheat foods altogether even if it is normally not a problem without exercise. However, wheat is one of the most common food crops consumed by people. Almost half of the calories consumed by the human population worldwide come from bread, muesli, pasta, cereals, and pastries. Therefore, developing a new wheat line with reduced or removed allergens is a very important issue worldwide. Specially, the U.S. gluten-free market is projected to be valued at 7.59 billion U.S. dollars by 2020.

In fact, a set of wheat genotypes lacking all omega-gliadins has been developed by cumulating inactive gene variants in the three gliadin coding loci (Gli A1, Gli B1, and Gli D1), using traditional plant breeding methods (Waga and Skoczowski 2014). However, the development of only an omega-5 gliadin free wheat line using traditional breeding methods is unusual. To improve wheat quality and reduce allergies, we had developed a new omega-5 gliadin free wheat line (O-free) by crossing the Korean wheat cultivars Keumkang and Olgeuru and producing a homozygous wheat plant using doubled haploid (DH) breeding methods (Lee et al. 2017). Keumkang is a hard white winter wheat used for breads and noodles. Olgeuru is a high yielding soft red cultivar used only for noodles. While the previous work confirmed that omega-5 gliadins are not detected in the O-free line using sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and ultra-high performance liquid chromatography (UPLC) analyses, we do not know if the absence of omega-5 gliadins is caused by simple chromosomal deletions or by other regulatory factors during transcription or translation.

In the present study, we used omega-5 gliadin specific PCR primers and RNA sequencing (RNAseq) data to reveal the cause of the absence of omega-5 gliadins in the O-free line. We also revealed the specific characteristics of the O-free line different from its parents. Hopefully, our efforts will be helpful for people who are suffering from wheat allergies.

MATERIALS AND METHODS

Plant materials and sample preparation

Three wheat accessions (Keumkang, Olgeuru, and O-free) were obtained from the National Institute of Crop Science of the Rural Development Administration (RDA), Wanju, Republic of Korea. Among them, O-free was created by crossing Keumkang and Olgeuru and analyzing the allelic variations (Lee et al. 2017). The wheat plants were grown in a glass greenhouse located on the crop experimental farm of the RDA (32°34′27.381″N and 124°52′17.840″E). The immature seeds (milky stage: 14 days after flowering) were sampled by pooling only the endosperm and embryo parts except for the pericarp from about 100 grains in three individual plants of each accession. The samples were frozen in liquid nitrogen and then stored at −80°C until use.

RNA extraction

The pooled samples of the immature seeds were subjected to total RNA isolation. The samples were ground into a fine powder in liquid nitrogen with a mortar and pestle. Approximately 100 mg of the sample powder were used to isolate the total RNA. The total RNA was extracted with the Ribospin Seed/Fruit RNA mini prep kit (GeneAll, Republic of Korea) following the manufacturer’s instructions. The quality and quantity of the RNA were measured with a PDA UV-Vis Spectrophotometer (Scinco, Republic of Kora).

Library construction and sequencing

All the samples were used for the RNAseq analysis. The libraries were prepared for 101 bp paired-end sequencing with the TruSeq RNA Sample Preparation Kit (Illumina, USA). First, mRNA molecules were purified and fragmented from 2 μg of total RNA using oligo (dT) magnetic beads. The fragmented mRNAs were synthesized as single-stranded cDNAs through random hexamer priming. By using this as a template for second strand synthesis, double-stranded cDNA was prepared. After the sequential process of end repair, A-tailing and adapter ligation, cDNA libraries were amplified with PCR. The quality of these cDNA libraries was evaluated with the Agilent 2100 BioAnalyzer (Agilent, USA). They were quantified with the KAPA library quantification kit (Kapa Biosystems, USA) according to the manufacturer’s library quantification protocol. Following cluster amplification of the denatured templates, sequencing was done as paired-end (2 × 101 bp) with the Illumina HiSeq2500 (Illumina, USA). To obtain high-quality clean data for de novo assembly, a quality check was performed which removed low-quality reads and adapters using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).

Genomic DNA extraction

For the three wheat accessions (Keumkang, Olgeuru, and O-free), the leaves of each accession were harvested as materials for genomic DNA extraction. The procedure of the gDNA extraction was performed following the method previously described (Carlson et al. 1991) with some modifications. Wheat leaves were ground into a fine powder in liquid nitrogen using a mortar and pestle. Then, 100 mg of the sample powder were transferred into a 2 mL tube (Eppendorf) containing 600 μL of modified Carlson buffer (100 mM Tris-HCl, pH 8.0, 2% CTAB, 1.4 M NaCl, 1% PEG 8000, 20 mM EDTA, 2% PVP40, and 0.1% ascorbic acid) pre-warmed to 60°C and 20 μL of RNase A (20 mg/mL; Invitrogen). Immediately, the sample was homogenized by vortexing and incubated in water-bath at 60°C for 10 minutes. After the incubation, the sample was cooled-down to room temperature, and 600 μL of chloroform were added and vortexed. Afterward, the sample was centrifuged at 12,000 rpm for 10 minutes, and 400 μL of the supernatant were transferred to a new 2 mL tube. Then, 400 μL of Isopropanol were added to the sample and vortexed. The sample was centrifuged at 12,000 rpm for 10 minutes. The supernatant was removed, and the pellet was washed with 70% ethanol. The residual ethanol was removed by pipetting, and the pellet was air-dried. The pellet was resuspended in the proper amount of nuclease-free water.

PCR using omega-5 gliadin specific primers

The genomic DNA of the three accessions was used as a template for the amplification of the omega-5 gliadin genes. The primer sequences used were as follows: forword, 5′-AGTAGGCTGCTAAGCCCTAGA-3′ and reverse, 5′-ATATTGTTGGTATGGGGAAGG-3′. The PCR program was set as follows: the initial denaturing at 94°C for 5 minutes followed by 35 cycles of 94°C for 30 seconds, 60°C for 45 seconds, and 72°C for 60 seconds, and a final extension step at 72°C for 7 minutes. The program was used for the amplification of a 1,212-bp fragment. The PCR products were analyzed with the Qiaxcel genetic analyzer (QIAGEN, Germany), which uses a preassembled cartridge (cartridge type Qiaxcel DNA Fast Analysis kit) to run samples and collect data.

De novo transcriptome assembly and evaluation

Several well-designed de novo assemblers have been developed to assemble RNAseq reads; however, the performance of these tools for specific samples and conditions is not clear. Therefore, we compared two different tools, Assembling Nucleotide Sequences 1.2.4 (Platanus 1.2.4; Kajitani et al. 2014) and Trinity (version 2.6.6; Grabherr et al. 2011), according to several performance criteria including the N50 length, numbers of contigs, and length of total assembly for our samples. Platanus 1.2.4 specializes in reconstructing the genomic sequences of highly heterozygous diploids from massively pararllel shotgun sequencing data. Trinity was developed as a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNAseq data.

The high-quality reads of the three samples were assembled into contigs using both of the two de novo assemblers with their default parameters. There are several statistical measures such as the N50 length that can provide some indications of the quality of an assembly which reflects its contiguity. However, a key measure of quality is to assess the completeness of the genome assembly in terms of its expected gene content. Therefore, our assembled sequences were evaluated with Benchmarking Universal Single-Copy Orthologs (BUSCO; version 3.0.2; Waterhouse et al. 2018). The BUSCO assessment tool implements a computational pipeline to identify and classify BUSCO group matches from genome assemblies, annotated gene sets, or transcriptomes, using HMMER (Finn et al. 2011) hidden Markov models and de novo gene prediction with Augustus (Stanke et al. 2006). In this study, BUSCO used the ‘transcriptome’ mode.

Transcriptome functional annotation

General annotation for the assembled sequences was performed as part of the functional genomic analysis using TransDecorder (version 5.3.0; https://github.com/TransDecoder/) and Trinotate (version 3.1.1; https://trinotate.github.io/). TransDecorder identifies candidate protein-coding regions based on nucleotide composition, gene open reading frame (ORF) length, and the optional Pfam domain contents. A minimum length of ORFs from the de novo transcript assembled genomes were predicted, and then, the nucleotide sequences for the coding regions of the final candidate ORFs were predicted with the default parameters. All putative genes were searched against the non-redundant (nr) protein database of the National Center for Biotechnology Information (NCBI) using blastx with an e-value cut-off set to 1.0e-5 and one best match. Gene Ontology (GO) classification and eggNOG (Huerta-Cepas et al. 2016) annotation were performed using Trinonate with the BLAST databases, uniprot peptide, HMMER and Pfam protein families database. Trinonate is preferred for assigning functions to identified genes. The eggNOG database is a database of biological information hosted by the EMBL. eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. In this study, we provided GO enrichment proportions at the entire transcriptome level across all three accessions.

Similarity analysis of wheat gluten sequences by BLAST search

The NCBI Basic Local Alignment Search Tool (BLAST) program (version 2.6.0; Altschul et al. 1990) was used to identify homologous proteins and DNA sequences based on sequence similarity with the following options: 95% of identities, e-value cut-off less than 1.0e-10, and five maximum number of target sequences (max-target seqs).

Similarity analysis with Omega-5 gliadin

A customized database for omega-5 gliadin (GenBank accession no. AB181300.1: Triticum aestivum gene for omega-5 gliadin, complete coding sequence) was built using the makeblastdb option from the NCBI BLAST program. Similarity analysis was performed on each assembled sample as a query against the local omega-5 gliadin database using the blastn option.

Similarity analysis with Omega-2 gliadin

To verify only omega-5 gliadin was deleted from the O-free, blastx was performed on an omega-2 gliadin (GenBank accession no. AAG17702.1: omega gliadin storage protein, T. aestivum) database with each assembled sample as a query.

Similarity analysis with gliadin EST

A total of 6,927 gliadin ESTs (α/β−, γ−, and ω-gliadins) from the NCBI database were collected and compared with the assembled Keumkang, Olgeuru, and O-free using the blastn option.

Similarity analysis with glutenin EST

A total of 9,797 glutenin ESTs (for LMW and HMW glutenins) from the NCBI database were collected and compared with the genes assembled from the three samples using the blastn option.

Reads mapping to the omega-5 gliadin sequence

Paired-end raw reads obtained from RNAseq were aligned to the omega-5 gliadin sequence using Burrows-Wheeler Aligner-MEM (BWA-MEM; version 0.7.15; Li and Durbin 2009) with the default options. To convert and visualize the mapping results, the text alignment viewer (tview) option from SAMtools (version 1.3.1; Li et al. 2009) was used.

Read mapping to the reference genome

Genes encoding storage proteins in wheat do not have intron sequences, therefore it is very accurate to map RNAseq reads to the reference genome. For SNP calling workflow, International Wheat Genome Sequencing Consortium (IWGSC) RefSeq assembly v1.0 (Chinese spring version 1.0 pseudomolecules parts.fasta) was used as the reference, downloaded from IWGSC (https://wheaturgi.versailles.inra.fr/Seq-Repository/Assemblies/; IWGSC 2014). The parts of the pseudomolecules contain 21 bread wheat chromosomes, and unanchored scaffolds (chrUn) have been split into two parts to make the sequence shorter than 512 mega base pairs (Mb). Each of the three transcript sequences was aligned to the IWGSC reference genome using Bowtie2 (version 2.3.2; Langmead and Salzberg 2012). Variants calling was done with the mpileup function in SAMTools (version 1.3.1; Li 2011), and BCFtools was used to filter the SNPs.

SNP annotation

The gene functional annotation file, IWGSC RefSeq Functional annotation v1.0, was downloaded from https://urgi.versailles.inra.fr/download/iwgsc/IWGSC_RefSeq_Annotations/v1.0/. The IWGSC RefSeq Functional annotation v1.0 contains gene functions assigned by the Automated Assignment of Human Readable Descriptions (AHRD) tool (version 3.3.3; https://github.com/groupschoof/AHRD/).

SNP effect predictor (SnpEff; version 4.3p; Cingolani et al. 2012) was used to annotate and categorize the genetic polymorphisms based on their effects. To build a customized IWGSC RefSeq v1.0 functional annotation database, the database function in SnpEff was used. The default parameters of SnpEff were used to perform the variants effects and to calculate the transition vs. transversion ratio (Ts/Tv) for each sample. Variants only assigned to the HIGH impact were considered for further analysis: HIGH impact includes “stop gained” and “frame shift”. The output contains impact predictions and an annotation of each variant in variant calling format (VCF).

RESULTS

RNA sequencing

After trimming low-quality reads and sequence adapters with FastQC, a total of 34.84 giga base pairs (Gb) were obtained which consisted of high-quality data corresponding to 344,915,150 raw reads (Q30 score > 85%). Using Illumina’s HiSeq2500 platform, 11.98 Gb, 12.34 Gb, and 10.52 Gb of data were generated for the Keumkang, Olgeuru, and O-free lines, respectively. The GC content was similar in all three samples (around 50%) (Table 1).

Sequence assembly

The de novo assembly was constructed with Trinity version 2.4.0 (Grabherr et al. 2011) and Platanus version 1.2.4 (Kajitani et al. 2014) (Table 2). In the case of using the Trinity assembler, the assembled genome consisted of 180,216 contigs for Keumkang, 188,299 contigs for Olgeuru, and 185,061 contigs for O-free. The Trinity assembler generated a greater number of contigs than that of Platanus in all three cases. However, in length, Trinity generated longer contigs and N50 than that of Platanus. Therefore, we decided to use the Trinity results for further downstream analysis.

The gene completeness of the assembled transcriptomes were assessed using the Embryophyta orthologous database of BUSCO (http://busco.ezlab.org/; Waterhouse et al. 2018). The Embryophyta orthologous database represents a collection of 1,440 well-annotated and conserved single-copy genes. Of the 1,440 BUSCO groups, 66.2%, 69.4%, and 70.4% of the genes were present and complete in the Keumkang, Olgeuru, and O-free lines, respectively. Overall, only 10.8–12.7% of the BUSCO genes were fragmented, and 18.0–21.1% were missing (Table 3).

Functional annotation results

Three de novo assembled sequences from the Trinity assembler were annotated with GO and eggNOG. The results of the three GO classification are visualized by pie charts with the GO enrichment proportion (Fig. 1). A total of 19,686 genes for O-free, 19,628 genes for Keumkang, and 20,165 genes for Olgeuru were assigned to GO terms. The majority of the GO terms were assigned to Biological Process (38%), followed by Cellular Component (32%) and Molecular Function (30%) in all three samples. Within Biological Process, the largest proportion was assigned as “cellular nitrogen compound metabolic process”. Within Molecular Function, the largest proportion was assigned to categories for binding and catalytic activity. Within Cellular Component, the majority was assigned to categories for cell and cell part (Fig. 1).

According to the eggNOG functional annotation results, the most abundant function in all three samples is “Translation, ribosomal structure and biogenesis” and the least abundant function is “Nuclear structure”. The three samples showed similar GO classifications and eggNOG functions (Table 4).

Similarity analysis

Similarity analysis between the omega-5 gliadin database and assembled contigs from the three samples showed that 126, 53, and 9 of the Keumkang Olgeuru, and O-free sequences, respectively, are similar to the omega-5 gliadin sequence.

For those nine sequences similar to the omega-5 gliadin sequences, they were searched against the nr database using blastp and blastx. Among the nine genes, only one sequence (sequence length 339 bp, TRINITY_DN40250_c2_g2_i1) showed a 93% similarity to omega-5 gliadin (GenBank accession no. AB181300.1); however, they were only 247 bp matches in the downstream region, which is discussed further in the next section.

To verify if omega-5 gliadin is deleted from the O-free, blastx against omega-2 gliadin was performed. From the results, all samples showed a similar level of similarity with omega-2 gliadin: 19,751, 21,257, and 20,714 cases for the Keumkang, Olgeuru, and O-free lines, respectively.

According to the similarity results for the gliadin ESTs, a unigene (contig) with similarity to omega-5 gliadin was not found in O-free, but sequences corresponding to omega-5 gliadin in Keumkang and Olgeuru were found. In addition, all other gliadins (α−, β−, γ−, and ω−) except for omega-5 gliadin were found in O-free. It was also confirmed that LMW and HMW glutenins exist at similar levels in all samples (Table 5).

PCR verification using omega-5 gliadin specific primers

To verify the similarity analysis results using the RNAseq results, we examined whether omega-5 gliadin in the O-free could be amplified by PCR compared with the cases of Keumkang and Olgeuru. Because omega-5 gliadin is considered as a single exon gene, we used genomic DNA extracted from the leaves. As a result, omega-5 gliadin fragments were only detected in the Keumkang and Olgeuru lines but not in the O-free line (Fig. 2).

Reads mapping to omega-5 gliadin

Previous reports classified omega gliadin groups into three types based on signal peptide sequences at their 19th amino acid position of the N-terminal. The omega-1 type begins with KEL (Lys-Glu-Leu); the omega-2 type begins with ARQ (Ala-Arg-Gln), ARE (Ala-Arg-Glu), or RQ (Arg-Gln), and the omega-5 type begins with SRL (Ser-Arg-Leu).

According to the tview result of the O-free, only five reads out of a total of 100 million reads were mapped from the upstream of 57 bp (signal peptide position). Therefore, it can be seen that the amino acid sequence portion of the N-terminal that can determine the omega-5 gliadin type is missing in O-free (Fig. 3).

SNP discovery and annotation

We performed SNP analysis to closely look at the variation in gene sequences related to gluten in Korean wheat cultivars and O-free line, when compared to the reference cultivar Chinese spring. Overall alignment rates by aligning sequences to the reference genome were 90.72%, 91.20%, and 91.45% for Keumkang, Olgeuru, and O-free, respectively. Comparison between the IWGSC reference genome and each of the three de novo assembled genomes provided an opportunity to identify variants such as SNPs and insertions and deletions (INDELs). The mpileup function in SAMtools (version 1.3.1; Li 2011) and BCFtools package (http://www.htslib.org/) was used to call SNPs and short INDELs. A total of 2,776, 2,727, and 2,789 SNPs for Keumkang, Olgeuru, and O-free were detected, respectively (Table 6).

SnpEff (v4.3p; Cingolani et al. 2012) was used to identify the transition (Ts) and transversion (Tv) values of the SNPs. Only SNPs were used for this statistics. The variants had a transition and transversion ratio (Ts/Tv) of 1.1078, 1.0365, and 1.1437 for Keumkang, Olgeuru, and O-free, respectively (Table 7). This ratio provides information on the sequence evolution, sequence distance, and natural breeding history of wheat.

SnpEff predicted five HIGH impact variants in the three samples: one for Keumkang, two for Olgeuru, and two for O-free (Table 8). The HIGH impact output means that the variant causes deleterious gene effects such as frame shifts or the addition or deletion of stop codons.

Among the five genes, only one gene in O-free was associated with the wheat seed storage protein glutenin (Gene ID TraesCS1A01G00800). In TraesCS1A01G00800, one variant located in chr1A part1 at position 4,203,458 was changed (Guanine to Thymine), resulting in a premature stop codon (stop gained) and thereby leading to a truncated polypeptide. According to the gff annotation file, this gene is the LMW-glutenin subunit (LMW-GS). Pfam (Finn et al. 2016) reported that this gene is the cysteine-rich N-terminal region of gliadin and avenin in the plant protein family (Table 8). The GO term (GO: 0045735) stated that it is related to a nutrient reservoir activity. Because the annotation result and Pfam results are different, NCBI blastn was performed between the TraesCS1A01G00800 and the nr database of the NCBI. The result showed that TraesCS1A01G00800 had an 80% identity with the T. urartu cultivar PI428196 LMW-GS gene (GenBank accession no. KM085319).

DISCUSSION

A previous study showed the deletion of omega-5 gliadin in O-free using SDS-PAGE and UPLC experiments (Lee et al. 2017). Our study is the first to show the deletion of omega-5 gliadin in O-free by using RNAseq data. Because the known omega-5 gliadin sequence (GenBank accession no. AB181300.1) was not included in the reference genome (IWGSC RefSeq assembly v1.0), we performed a transcriptome de novo assembly to obtain all the coding sequence (CDS) regions and compared them with the omega-5 gliadin sequence. We also compared the raw RNAseq reads with the omega-5 gliadin sequence. Finally, we performed PCR verification using omega-5 gliadin specific primers for all three samples. The results suggest three unique characteristics of the O-free.

First, it has a sequence deficiency in the omega-5 gliadin defined region. A simple alignment analysis was done with the raw RNAseq reads as a query and omega-5-gliadin sequence as a reference. The result clearly shows that the mRNA of omega-5 gliadin was not expressed in O-free. Furthermore, sequence similarity analysis between the gliadin/glutenin database and the O-free RNAseq result showed that omega-5 gliadin was not found in the O-free; however, all the other gliadins (α−, β−, γ−, and ω−) and the two types of glutenins (LMW and HMW) were found in the O-free (Table 5). Therefore, we concluded that only the O-free line did not have the omega-5 gliadin sequence.

Second, the LMW-GS gene in the O-free has a SNP, causing a premature stop codon. The results of the variant effect prediction tool (SnpEff version 4.3p) showed that one variant in the LMW-GS gene (TraesCS1A01G00800) in the O-free was predicted as a stop gained effect (Table 8). This SNP turned a particular codon in the LMW-GS into a stop codon, down-regulating the structural gene expression, resulting in pseudogenes. This SNP might affect the phenotypic expression of the O-free. It will be an interesting subject to investigate the glutenine contents between the O-free and its parents, Keumkang and Olgeuru. The LMW-GS from wheat is primarily responsible for the quality of breads and noodles. The previous research showed an example of glutamine codons, CAA or CAG, that may change to stop codons (UAA or UAG) by a cytosine (C) to thymine (T) transition (Hsia and Anderson 2001). Another study identified two stop codons among omega-5 gliadin DNA sequences in the Japanese wheat cultivar Norin 61 (Matsuo et al. 2005; Waga and Skoczowski 2014).

Third, the results of the PCR using omega-5 gliadin specific primers clearly show that omega-5 gliadin was not detected in the O-free. According to these results, we could exclude post-transcriptional regulation of the omega-5 gliadin transcript and conclude that the deficiency of omega-5 gliadin in the O-free is caused by a simple chromosomal deletion of target regions.

Wheat induced diseases are prevalent around the world. The number of people diagnosed with wheat induced diseases is increasing. Thus, there is a need for novel approaches to handle wheat-related disorders such as the development of new wheat lines without allergic reacted factors. Gliadins in the omega class are the most allergenic among the gluten proteins (Morita et al. 2003; Battais et al. 2005). Among them, the omega-5 subgroup is especially important because they are the major allergens causing WDEIA, which is the most dangerous and life-threatening wheat allergy. Many studies have been conducted to prove omega-5 gliadin related food allergies. Palosuo et al. (2001) found omega-5 gliadins to be important allergens in immediate allergic reactions to wheat while examining allergic children with symptoms of atopic dermatitis and gastrointestinal and respiratory symptoms. Therefore, the development of omega-5 gliadin free wheat lines for these patients is very important.

In the present paper, we used only one Keumkang, Olgeuru, and O-free sample, respectively, because we simply wanted to confirm the deficiency of omega-5 gliadin in O-free. Therefore, we could not investigate the differentially expressed genes between the three samples. The limitation of this study is that more sequence data were not available. However, finding SNPs from mapping results is trustworthy because the reference and annotation data were provided, and the number of reads were sufficient. To answer other questions, such as why and how omega-5 gliadin was deleted in the O-free line and what differences exist between the three lines in terms of genotypic and phenotypic aspects, more diverse data will be needed.

The characteristics of wheat genome such as its hexaploidy (AABBDD) and large size (around 16 Gb) with a high proportion of relatively long and near-identical repeats make it challenging to construct a de novo assembly. Therefore, two assembly tools (Trinity and Platanus) were used, and the results were compared. They showed different results even if they used the same de Bruijn graph algorithms. The Platanus is suited for highly heterozygous diploid genomes (heterozygosity > 0.5%). With our highly heterozygous hexaploid wheat genome, Platanus had poorer results compared to those of Trinity. Platanus might have combined variations between subgenomes in hexaploid wheat into the same contigs. Therefore, the number of contigs was fewer, but the N50 length is almost twice shorter than that of Trinity.

The characteristics of the O-free was not clearly known before at both the DNA and RNA level. Therefore, our study is good starting point to reveal distinct features of the O-free line from other wheat lines. Moreover, these findings will help to improve the quality of the O-free line.

ACKNOWLEDGEMENTS

This work is supported by the “Cooperative Research Program for Agriculture Science and Technology Development (Project No. PJ01252701)”, Rural Development Administration, Republic of Korea.

Figures
Fig. 1. Pie charts showing the summary of the Gene Ontology (GO) analysis of the total gene sets corresponding to Keumkang, Olgeuru, and O-free. The charts represent the GO terms for Biological Process (A), Cellular Component (B), and Molecular Function (C).
Fig. 2. PCR analysis of leaf DNA using the primers for omega-5 gliadin. The size of the band was 1,212 bp (red arrow) in Keumkang and Olgeuru, but it was not seen in O-free.
Fig. 3. The BWA-MEM alignment results. In O-free, only 5 out of 100 million sequencing reads were aligned to the signal peptide region of the omega-5 gliadin. In this alignment, the omega-5 gliadin sequence was used as a reference and the raw-RNAseq reads were used as the query.
Tables

Summary of the RNAseq data collected from Keumkang, Olgeuru, and O-free.

Sample Total bases (Gb) Total reads (bp) GC contents (%) Q30 score (%)
Keumkang 11.98 118,565,712 50.05 85.70
Olgeuru 12.34 122,188,346 50.22 86.34
O-free 10.52 104,161,092 50.79 87.28

Transcriptome de novo assembly statistics obtained from Trinity and Platanus.

Keumkang Olgeuru O-free



Trinity Platanus Trinity Platanus Trinity Platanus
Total length (bp) 166,773,951 25,426,840 173,092,759 42,348,545 175,535,566 39,106,233
Mean length (bp) 925.41 404.18 919.24 310.91 948.53 241.32
Longest read length (bp) 14,897 9,709 15,669 8,440 16,231 10,289
Shortest read length (bp) 201 100 201 100 201 100
No. of contigs 180,216 62,910 188,299 136,207 185,061 162,051
N50 (bp) 1,382 693 1,356 521 1,412 338
No. of N50 37,573 9,700 40,247 20,473 39,373 25,975

Summary of the search results of the orthologs using BUSCO.

Sample Complete BUSCOsz) Fragmented BUSCOsz) Missing BUSCOsz)



No. of genes Percent of genes (%) No. of genes Percent of genes (%) No. of genes Percent of genes (%)
Keumkang 954 66.2 183 12.7 303 21.1
Olgeuru 999 69.4 182 12.6 259 18.0
O-free 1014 70.4 155 10.8 271 18.8

The complete, duplicated, fragmented, and missing orthologs were inferred from a Benchmarking Universal Single-Copy Orthologs (BUSCO) search against the 1,440 single-copy orthologs for embryophyta orthologous genes.


The results of the eggNOG functional annotation for the three accessions.

eggNOG function Keumkang Olgeuru O-free
Translation, ribosomal structure and biogenesis 183 186 185
General function prediction only 150 155 154
Amino acid transport and metabolism 95 97 99
Posttranslational modification, protein turnover, chaperones 95 96 94
Function unknown 86 89 88
Energy production and conversion 75 76 81
Carbohydrate transport and metabolism 71 72 75
Coenzyme transport and metabolism 68 68 68
Replication, recombination and repair 64 64 67
Transcription 60 61 62
Inorganic ion transport and metabolism 48 50 53
Lipid transport and metabolism 48 47 47
Nucleotide transport and metabolism 45 46 45
Intracellular trafficking, secretion, and vesicular transport 38 38 39
Cell wall/membrane/envelope biogenesis 33 34 35
Signal transduction mechanisms 22 24 24
Secondary metabolites biosynthesis, transport and catabolism 21 21 21
RNA processing and modification 20 19 19
Cell cycle control, cell division, chromosome partitioning 11 11 11
Chromatin structure and dynamics 8 8 8
Defense mechanisms 6 6 8
Cytoskeleton 5 5 5
Cell motility 2 2 2
Nuclear structure 1 1 1

The similarity analysis of the gliadin and glutenin ESTs for each sample.

α/β-gliadin γ-gliadin ω-gliadin ω-5 gliadin Low-molecular weight High-molecular weight
Keumkang 4z) (24y)) 80 9 2 90 80
Olgeuru 25 (36) 105 9 5 172 63
O-free 17 (49) 61 10 0 109 61

The number indicates the frequency of the identified gliadins and glutenins. Identification of homologous proteins and DNA sequences were based on sequence similarity with the following parameters: identities > 95% and e-value < 1.0e–10.

The numbers in parentheses indicate only the number of α-gliadin.


A summary of the total variants and annotated variants number.

Keumkang Olgeuru O-free
Total variants number 2,937 2,879 2,929
SNP number 2,776 (2,776z)) 2,727 (2,727) 2,789 (2,789)
Insertion number 92 (92) 80 (80) 68 (68)
Deletion number 69 (69) 72 (72) 72 (72)

The numbers in parentheses indicate the number of annotated variants.


The results of the transition and transversion ratio (Ts/Tv) in each sample.

Keumkang Olgeuru O-free
Transition (Ts) 1,459 1,388 1,488
Transversion (Tv) 1,317 1,339 1,301
Ts/Tv ratio 1.1078 1.0365 1.1437

The predicted HIGH impact effect of the variants by SnpEff.

Accession Chromosome Position Ref. Alt. Type Effect impactz) Gene ID Gene Description Pfam ID GO ID
Keumkang chr1A part2 102,513,486 GCCCC GCCC frameshift variant HIGH TraesCS1A01G414300 Zinc finger CCCH domain protein PF00642 GO:0046872
Olgeuru chr1A part1 107,227,988 CA CAA splice acceptor variant/intron variant HIGH TraesCS1A01G151100LC Splicing factor u2af large subunit, putative none none
chr1A part1 458,506,985 AGGG AGGGG frameshift variant HIGH TraesCS1A01G263000 UPF0301 protein PF02622 none
O-free chr1A part1 4,203,458 G T stop gained HIGH TraesCS1A01G008000 Low molecular weight glutenin subunit PF13016 GO:0045735
chr1A part2 93,003,979 C A stop gained HIGH TraesCS1A01G400300 Phosphoinositide phosphatase family protein PF02383 GO:0042578

Effect impact indicates effect impact of this variant, and HIGH effect represents that the variant is assumed to have a high (disruptive) impact in the protein, probably causing a protein truncation, loss of function or triggering nonsense mediated decay. Ref.: Reference, Alt.: Alternative.


References
  1. Altschul, SF, Gish, W, Miller, W, Myers, EW, and Lipman, DJ (1990). Basic local alignment search tool. J Mol Biol. 215, 403-410.
    Pubmed CrossRef
  2. Battais, F, Mothes, T, Moneret-Vautrin, DA, Pineau, F, Kanny, G, and Popineau, Y (2005). Identification of IgE-binding epitopes on gliadins for patients with food allergy to wheat. Allergy. 60, 815-821.
    Pubmed CrossRef
  3. Carlson, JE, Tulsieram, LK, Glaubitz, JC, Luk, VW, Kauffeldt, C, and Rutledge, R (1991). Segregation of random amplified DNA markers in F1 progeny of conifers. Theor Appl Genet. 83, 194-200.
    Pubmed CrossRef
  4. Cingolani, P, Platts, A, Wang, LL, Coon, M, Nguyen, T, and Wang, L (2012). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 6, 80-92.
    Pubmed KoreaMed CrossRef
  5. Finn, RD, Clements, J, and Eddy, SR (2011). HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29-W37.
    Pubmed KoreaMed CrossRef
  6. Finn, RD, Coggill, P, Eberhardt, RY, Eddy, SR, Mistry, J, and Mitchell, AL (2016). The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279-D285.
    Pubmed KoreaMed CrossRef
  7. Grabherr, MG, Haas, BJ, Yassour, M, Levin, JZ, Thompson, DA, and Amit, I (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 29, 644-652.
    Pubmed KoreaMed CrossRef
  8. Hsia, CC, and Anderson, OD (2001). Isolation and characterization of wheat omega-gliadin genes. Theor Appl Genet. 103, 37-44.
    CrossRef
  9. Huerta-Cepas, J, Szklarczyk, D, Forslund, K, Cook, H, Heller, D, and Walter, MC (2016). eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286-D293.
    Pubmed KoreaMed CrossRef
  10. IWGSC (International Wheat Genome Sequencing Consortium) (2014). A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science. 345, 1251788.
    Pubmed CrossRef
  11. Kajitani, R, Toshimoto, K, Noguchi, H, Toyoda, A, Ogura, Y, and Okuno, M (2014). Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 24, 1384-1395.
    Pubmed KoreaMed CrossRef
  12. Kasarda, DD, Okita, TW, Bernardin, JE, Baecker, PA, Nimmo, CC, and Lew, EJ (1984). Nucleic acid (cDNA) and amino acid sequences of alpha-type gliadins from wheat (Triticum aestivum). Proc Natl Acad Sci USA. 81, 4712-4716.
    Pubmed KoreaMed CrossRef
  13. Lamacchia, C, Camarca, A, Picascia, S, Di Luccia, A, and Gianfrani, C (2014). Cereal-based gluten-free food: how to reconcile nutritional and technological properties of wheat proteins with safety for celiac disease patients. Nutrients. 6, 575-590.
    Pubmed KoreaMed CrossRef
  14. Langmead, B, and Salzberg, SL (2012). Fast gapped-read alignment with Bowtie 2. Nat Methods. 9, 357-359.
    Pubmed KoreaMed CrossRef
  15. Lee, JY, Kang, CS, Beom, HR, Jang, YR, Altenbach, SB, and Lim, SH (2017). Characterization of a wheat mutant missing low-molecular-weight glutenin subunits encoded by the B-genome. J Cereal Sci. 73, 158-164.
    CrossRef
  16. Li, H (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 27, 2987-2993.
    Pubmed KoreaMed CrossRef
  17. Li, H, and Durbin, R (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25, 1754-1760.
    Pubmed KoreaMed CrossRef
  18. Li, H, Handsaker, B, Wysoker, A, Fennell, T, Ruan, J, and Homer, N (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25, 2078-2079.
    Pubmed KoreaMed CrossRef
  19. Matsuo, H, Kohno, K, and Morita, E (2005). Molecular cloning, recombinant expression and IgE-binding epitope of omega-5 gliadin, a major allergen in wheat-dependent exercise-induced anaphylaxis. FEBS J. 272, 4431-4438.
    Pubmed CrossRef
  20. Morita, E, Matsuo, H, Mihara, S, Morimoto, K, Savage, AWJ, and Tatham, AS (2003). Fast omega-gliadin is a major allergen in wheat-dependent exercise-induced anaphylaxis. J Dermatol Sci. 33, 99-104.
    Pubmed CrossRef
  21. Palosuo, K, Varjonen, E, Kekki, OM, Klemola, T, Kalkkinen, N, and Alenius, H (2001). Wheat omega-5 gliadin is a major allergen in children with immediate allergy to ingested wheat. J Allergy Clin Immunol. 108, 634-638.
    Pubmed CrossRef
  22. Rasheed, A, Xia, X, Yan, Y, Appels, R, Mahmood, T, and He, Z (2014). Wheat seed storage proteins: Advances in molecular genetics, diversity and breeding applications. J Cereal Sci. 60, 11-24.
    CrossRef
  23. Scherf, KA, Brockow, K, Biedermann, T, Koehler, P, and Wieser, H (2016). Wheat-dependent exercise-induced anaphylaxis. Clin Exp Allergy. 46, 10-20.
    Pubmed CrossRef
  24. Shewry, PR, Halford, NG, Belton, PS, and Tatham, AS (2002). The structure and properties of gluten: an elastic protein from wheat grain. Philos Trans R Soc Lond B Biol Sci. 357, 133-142.
    Pubmed KoreaMed CrossRef
  25. Stanke, M, Keller, O, Gunduz, I, Hayes, A, Waack, S, and Morgenstern, B (2006). AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435-W439.
    Pubmed KoreaMed CrossRef
  26. Waga, J, and Skoczowski, A (2014). Development and characteristics of omega-gliadin-free wheat genotypes. Euphytica. 195, 105-116.
    CrossRef
  27. Waterhouse, RM, Seppey, M, Simão, FA, Manni, M, Ioannidis, P, and Klioutchnikov, G (2018). BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol. 35, 543-548.
    Pubmed KoreaMed CrossRef
  28. Wieser, H (2007). Chemistry of gluten proteins. Food Microbiol. 24, 115-119.
    Pubmed CrossRef


March 2019, 7 (1)
Full Text(PDF) Free

Cited By Articles
  • CrossRef (0)

Funding Information

Social Network Service
Services
  • Science Central