
Genotyping-by-sequencing (GBS) has been used as a viable single nucleotide polymorphism (SNP) validation method that provides reduced representation sequencing by using restriction endonucleases. Although GBS makes it possible to perform marker discovery and genotyping simultaneously at a reasonable cost and with a simple molecular biology workflow, the standard TASSEL-GBS pipeline was designed for homozygous groups, and genotyping of heterozygous groups is more complicated. To address this problem, we developed a GBS pipeline for heterozygous groups, called the KNU-GBS pipeline, specifically for apple (
Apple (
Genetic linkage maps allow the genomic structure to be studied, permitting the identification of quantitative trait loci (QTLs), localization of genes of interest, and providing a framework to understand the biological basis of complex traits. Construction of genetic linkage maps enables marker-assisted breeding and selection and it has proven to be especially useful for perennial tree crops such as apple, where many important traits are expressed only after years of costly field maintenance. Using marker-assisted breeding and selection, the presence of favorable alleles can be determined at the seedling stage, thereby drastically reducing the population size at the early stages of selection (Grattapaglia and Sederoff 1994; Liebhard
Several research groups have constructed high-density genetic linkage maps from both heterozygous and homozygous groups. Next-generation genotyping-by-sequencing (GBS) is now recognized as a viable single nucleotide polymorphism (SNP) validation method. GBS provides reduced representation sequencing by using restriction endonucleases to digest the genome into fragments. Using methylation-sensitive endonucleases, repetitive regions of the genome can be reduced. GBS holds great potential for plant researchers because it can be used for simultaneous marker discovery and genotyping at a reasonable cost and with a simple molecular biology workflow (Elshire
Here, we applied GBS methods to construct a genetic linkage map consisting of 1,053 SNP markers over 17 linkage groups, and verified the potential application of GBS to heterozygous groups using a number of computational and statistical approaches. This novel GBS pipeline for heterozygous groups will be useful for genetic diversity analyses, phylogenetic studies, and genome-wide association studies.
A segregating population from the cross between ‘Hongro’ and ‘Alps otome’ was used for the construction of the genetic linkage map. Young leaves of 93 individuals were selected from a progeny of 146 individuals developed at the Apple Research Station, National Institute of Horticultural and Herbal Science, Gunwi, Korea. Young leaves were collected and stored at −80°C until use. DNA was extracted through a modified cetyltrimethylammonium bromide method (Weeden
The fruits of ‘Hongro’ and ‘Alps otome’ are highly dissimilar; ‘Hongro’ fruits are large (300 – 350 g), while ‘Alps otome’ fruits are small (25 – 50 g). Therefore, a high-quality genetic linkage map of these two cultivars will offer great potential to study the genetic control of fruit quality by QTL analysis (Kenis and Keulemans 2005; Kenis and Keulemans 2007).
Typically, heterozygous individuals are generated as a hybrid between two relatively homozygous lines. However, in apples and many other outcrossing species, this initial step is impossible. Each apple variety is highly heterozygous owing to the clonal nature of the crop and the poor performance associated with inbred material. Instead, a pseudo testcross design is typically employed in which the variety of interest is crossed to a standard variety known not to segregate for the traits being investigated (Liebhard
For quality control of the DNA samples, we submitted the well-labeled gel image(s) to the Institute of Genomic Diversity, Cornell University. Initially, we ran 100 ng of each DNA sample on 1% agarose gels along with 500 ng of two λ
GBS was carried out at the Institute of Genomic Diversity, Cornell University. Briefly, DNA samples from the 91 F1 individuals and the parents were digested individually with
After sequencing, the raw reads were de-multiplexed according to the barcode sequences and trimmed using a Python script (Fig. 1). This script splits the raw Illumina FASTQ file into 93 separate FASTQ files based on the barcode sequences associated with each sample and filters out reads that contain any ambiguous bases in the barcodes. The reads that contained only the common adapter were also trimmed using the Cutadapt software (Martin 2011).
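The de-multiplexing step described above can be illustrated with a minimal Python sketch. This is not the authors' script: the function name `demultiplex`, the in-memory record format, and the barcode and sample names are all hypothetical, and exact prefix matching is assumed. Because matching is exact, a read whose barcode region contains an ambiguous base (N) matches no barcode and is discarded.

```python
from collections import defaultdict


def demultiplex(records, barcodes):
    """Assign reads to samples by their inline barcode prefix.

    records  : iterable of (header, sequence, quality) tuples
    barcodes : dict mapping barcode sequence -> sample name (hypothetical)
    Returns (reads_by_sample, n_discarded); barcodes are stripped from reads.
    """
    # Try longer barcodes first so that a short barcode that happens to be
    # a prefix of a longer one cannot claim the longer barcode's reads.
    ordered = sorted(barcodes, key=len, reverse=True)
    by_sample = defaultdict(list)
    discarded = 0
    for header, seq, qual in records:
        match = next((bc for bc in ordered if seq.startswith(bc)), None)
        if match is None:
            # No exact barcode match, e.g. an ambiguous base in the barcode.
            discarded += 1
            continue
        n = len(match)
        by_sample[barcodes[match]].append((header, seq[n:], qual[n:]))
    return dict(by_sample), discarded


# Toy input: three reads and two made-up barcodes.
reads = [
    ("@r1", "ACGTTTGCA", "IIIIIIIII"),
    ("@r2", "ANGTTTGCA", "IIIIIIIII"),  # ambiguous base inside the barcode
    ("@r3", "TTAGGCATC", "IIIIIIIII"),
]
samples, n_discarded = demultiplex(reads, {"ACGT": "sample_01", "TTAGG": "sample_02"})
```

In practice each sample's reads would be written to its own FASTQ file rather than held in memory; the dictionary is used here only to keep the sketch short.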
The de-multiplexed reads were trimmed using the Solexa QA package v.1.13 (Cox
We used the Burrows-Wheeler Aligner (BWA; 0.6.1-r104) program (Li and Durbin 2009) to align the clean reads to the apple (
SNP calls were filtered for quality by restricting the marker set to biallelic SNPs, and requiring genotype calls at each SNP to have a depth of three reads in each sample. Missing SNP allele data in individuals were imputed using a Perl script. The script was used to identify and genotype valid and high-quality SNPs in the 91 F1 mapping individuals along with the two parental genotypes using a sliding window approach. Haplotype phasing involved sliding a window along the chromosome, estimating haplotype phases within each window and piecing fragments together over the whole chromosome. The window size was set according to a major allele frequency of 0.7.
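As a rough illustration of the quality filter described above, the following Python sketch keeps only biallelic sites and masks low-depth genotype calls as missing so that they can be imputed later. It is a sketch under stated assumptions, not the authors' code: the data layout and the name `filter_snp_calls` are hypothetical, and the depth criterion is interpreted here as "at least three reads".

```python
def filter_snp_calls(sites, min_depth=3):
    """Keep biallelic sites and mask genotype calls below a depth threshold.

    sites : list of dicts with keys
        'ref'   - reference allele
        'alts'  - list of alternative alleles
        'calls' - {sample: (genotype, read_depth)}
    min_depth : assumed interpretation of the three-read depth criterion
    """
    kept = []
    for site in sites:
        # A biallelic SNP has exactly one alternative allele.
        if len(site["alts"]) != 1:
            continue
        masked = {
            sample: (genotype if depth >= min_depth else None)
            for sample, (genotype, depth) in site["calls"].items()
        }
        kept.append({"ref": site["ref"], "alts": site["alts"], "calls": masked})
    return kept


# Toy input with hypothetical sample names.
example_sites = [
    {"ref": "A", "alts": ["G"], "calls": {"F1_01": ("AG", 5), "F1_02": ("AA", 2)}},
    {"ref": "C", "alts": ["T", "G"], "calls": {"F1_01": ("CT", 9), "F1_02": ("CG", 8)}},
]
filtered = filter_snp_calls(example_sites)
```

Calls masked as `None` correspond to the missing data that the imputation step would then attempt to fill.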
The marker code was determined using JoinMap 4.0 (Van Ooijen 2006) and population type CP (cross pollinators). CP indicates a cross between two heterozygous diploid parents, with linkage phases originally unknown (Van Ooijen 2006). Five segregation types of CP populations (lmxll, nnxnp, hkxhk, efxeg, abxcd) have been described (Van Ooijen 2006), but only three of the segregation types (lmxll, nnxnp, hkxhk) were genotyped in this study. Segregation type ‘nnxnp’ describes markers for which the first parent is homozygous and the second parent is heterozygous, ‘lmxll’ describes markers for which the first parent is heterozygous and the second parent is homozygous, and ‘hkxhk’ describes markers for which both parents are heterozygous. Valid loci for genetic mapping were then filtered using the following criteria. First, the expected segregation ratio for ‘lmxll’ and ‘nnxnp’ is 1:1, and that for ‘hkxhk’ is 1:2:1. Only markers whose segregation ratios did not significantly deviate from the expected ratios were retained. The pattern of allelic segregation was tested using a chi-squared (χ2) test to determine deviations from expected Mendelian segregation ratios, and SNPs with significant segregation distortion (χ2
To increase mapping efficiency, pairs or groups of loci with identical genotypes (i.e., complete linkage) were identified and a single marker was chosen to represent the group. Linkage analysis was performed with JoinMap 4.0 using the parameters set for the regression mapping algorithm. Linkage groups were estimated by applying independence LOD (Logarithm of Odds) threshold ranges from 2 to 13, and constructed with a linkage LOD of at least 8.0, a recombination fraction of 0.35, a map LOD value of 1, a goodness-of-fit jump threshold of 3, and a ripple value of 1. The recombination frequencies were converted into map distances (cM) using the Kosambi mapping function (Kosambi 1943).
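For reference, the Kosambi mapping function converts a recombination fraction r (0 ≤ r < 0.5) into a map distance of d = 25 ln[(1 + 2r)/(1 − 2r)] cM. The short Python sketch below implements this standard formula and its inverse; the function names are illustrative and are not part of JoinMap.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in centimorgans for recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    # d (Morgans) = 0.25 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse Kosambi function: recombination fraction for a distance in cM."""
    return 0.5 * math.tanh(d_cm / 50.0)


# For small r the distance is close to 100 * r; it grows faster as r nears 0.5.
distance_small = kosambi_cm(0.01)
distance_large = kosambi_cm(0.35)
```

At the recombination-fraction threshold of 0.35 used for grouping, the function gives roughly 43 cM, which shows how strongly the correction departs from the naive 35 cM at larger recombination fractions.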
A total of 223 million sequence reads were obtained from the mapping population (Table 1). To ensure sufficient read depths at potential SNP loci and to increase the accuracy of the called SNPs, we performed deeper sequencing of the parental lines, obtaining 4,332,022 and 5,013,265 raw reads for ‘Hongro’ and ‘Alps otome’, respectively. We obtained reliable trimmed data by applying the Python script with strict criteria (probability value = 0.05, Phred quality score = 20, and minimum read length = 25 bp). From the standard TASSEL-GBS pipeline, we obtained 29 million sequence reads, which was insufficient to build an apple genetic linkage map (Table 1). However, from the KNU-GBS pipeline, we obtained nearly 170 million sequence reads with an average of 1,827,937 reads per sample and used these trimmed data for further analysis (
A total of 64.08% of the trimmed reads were mapped to the reference genome, the average depth of the mapped regions was 5.37, and 2.25% of the reference genome was covered (
We verified the reliability of our filtering criteria by analyzing the segregation type ‘nnxnp’ on chromosome 13 (Fig. 3). Before imputation, the genotypes consisted predominantly of ‘nn’ and a recombination pattern could not be determined. However, after strict imputation, a natural ratio of the ‘nn’ and ‘np’ types with clear recombination regions was observed. Although some regions still showed an unnatural recombination pattern, we presumed that these regions could be recombination hotspots on chromosome 13.
A total of 547,131 SNPs were detected using the SAMtools varFilter command, and 28,761 of these were assigned to one of three segregation types: 13,912 to nnxnp, 5,264 to lmxll, and 9,585 to hkxhk. We then performed the chi-squared (χ2) test with a significance level ≤ 0.01, which selected a total of 9,843 SNPs (5,757 from nnxnp, 2,519 from lmxll, and 1,567 from hkxhk). As mentioned in the Materials and Methods, after applying the different cutoffs for missing data, a total of 7,274 SNPs (4,612 from nnxnp, 1,775 from lmxll, and 887 from hkxhk) were selected. After removing the redundant SNP markers, a final set of 2,590 SNPs remained for linkage map construction. Of these, 836 had the parental marker code ‘nnxnp’, 867 had the marker code ‘lmxll’, and the remaining 887 had the marker code ‘hkxhk’ (Table 2).
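The chi-squared screen for segregation distortion can be sketched in a few lines of standard-library Python. This is an illustration rather than the authors' implementation: the function names are hypothetical, and it is assumed that a marker is discarded when its p-value falls below 0.01. The p-values use the closed forms of the chi-squared survival function for one degree of freedom (1:1 ratio) and two degrees of freedom (1:2:1 ratio).

```python
import math

# Expected Mendelian ratios for the three CP segregation types.
EXPECTED_RATIO = {"lmxll": (1, 1), "nnxnp": (1, 1), "hkxhk": (1, 2, 1)}


def segregation_chi2(observed, seg_type):
    """Return (chi-squared statistic, p-value) for observed genotype counts."""
    ratio = EXPECTED_RATIO[seg_type]
    if len(observed) != len(ratio):
        raise ValueError("number of genotype classes does not match the ratio")
    total = sum(observed)
    if total == 0:
        raise ValueError("no genotype calls at this marker")
    expected = [total * w / sum(ratio) for w in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    if len(ratio) == 2:
        p_value = math.erfc(math.sqrt(stat / 2.0))  # 1 degree of freedom
    else:
        p_value = math.exp(-stat / 2.0)             # 2 degrees of freedom
    return stat, p_value


def is_distorted(observed, seg_type, alpha=0.01):
    """Assumed rule: flag a marker as distorted when p < alpha."""
    return segregation_chi2(observed, seg_type)[1] < alpha


# Toy counts for 91 F1 individuals.
balanced = is_distorted((45, 46), "nnxnp")
skewed = is_distorted((70, 21), "nnxnp")
```

A 45:46 split is retained, whereas a 70:21 split departs strongly from 1:1 and would be removed under this rule.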
Based on the filtering criteria from JoinMap 4.0, described in the material and methods section, we constructed a genetic linkage map from the ‘Hongro’ × ‘Alps otome’ cross that consisted of 1,053 SNP markers over 17 linkage groups (Fig. 4). The map encompassed 1350.1 cM, with linkage groups ranging from 56.3 cM (LG14) to 97.2 cM (LG3). The numbers of SNP markers mapped to each linkage group varied from 34 in LG4 to 123 in LG2, with a mean of 61.9 SNPs per linkage group. The average marker density was 1.28 cM per marker and maximum gap size ranged from 5 cM in LG12 to 21.8 cM in LG17 (Table 3).
We obtained nearly 170 million sequence reads with an average of 1,827,937 reads per sample using the KNU-GBS pipeline. This number is much higher than the average of 973,896 reads per sample reported previously from an apple population (Gardner
F1 population mapping with GBS offers many advantages: it is simple, less laborious, and highly reproducible (Elshire
Several inversions of SNP order from their expected physical locations were observed from the GBS analysis. One reason for this observed order inversion is the characteristics of the reference genome. For example, the apple genome contains a high frequency of repeated regions; repetitive elements correspond to 500.7 Mb (67%) and 98% of the unassembled genome sequences are repetitive (Velasco
Although the number of SNPs in our final set was higher than in the previous study (Gardner
In this study, we describe the construction of a genetic map for
This work was supported financially by the Next-Generation BioGreen21 Program (PJ01311503), Rural Development Administration, Republic of Korea.
Table 1. Sequence pre-processing using the TASSEL-GBS pipeline and KNU-GBS pipeline.
Data set | No. of reads | Avg. length (bp) | Max. length (bp) | Total length (bp) | Trimmed/Raw (total length) |
---|---|---|---|---|---|
Raw | 223061273 | 101.00 | 101 | 22529188573 | - |
TASSEL-GBS | 29538602 | 61.19 | 64 | 1807469866 | 8.02% |
KNU-GBS | 169998214 | 66.98 | 97 | 11386908585 | 50.54% |
Table 2. Allelic segregation types used to construct the genetic linkage map.
Segregation type | No. of SNPs | No. of selected SNPs |
---|---|---|
nnxnp (z) | 13912 | 836 |
lmxll (y) | 5264 | 867 |
hkxhk (x) | 9585 | 887 |
Total | 28761 | 2590 |
(z) ‘Hongro’ homozygous; ‘Alps otome’ heterozygous.
(y) ‘Hongro’ heterozygous; ‘Alps otome’ homozygous.
(x) ‘Hongro’ heterozygous; ‘Alps otome’ heterozygous.
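The three segregation types in the table can be assigned from the parental genotypes alone. The Python sketch below shows one way to do this for biallelic SNPs; it is illustrative only, with a hypothetical function name and genotypes encoded as two-character strings such as "AG". ‘Hongro’ is treated as the first parent and ‘Alps otome’ as the second, following the table footnotes.

```python
def segregation_type(hongro, alps_otome):
    """Classify a biallelic SNP by parental genotypes (e.g. 'AA', 'AG').

    Returns 'nnxnp', 'lmxll', 'hkxhk', or None when the marker does not
    segregate or does not fit one of these three types.
    """
    p1_het = hongro[0] != hongro[1]
    p2_het = alps_otome[0] != alps_otome[1]
    if p1_het and p2_het:
        # Both parents heterozygous for the same two alleles.
        return "hkxhk" if set(hongro) == set(alps_otome) else None
    if p1_het:
        # First parent heterozygous, second homozygous for a shared allele.
        return "lmxll" if alps_otome[0] in hongro else None
    if p2_het:
        # First parent homozygous for a shared allele, second heterozygous.
        return "nnxnp" if hongro[0] in alps_otome else None
    return None  # both parents homozygous: no segregation in the F1


# Toy parental genotype pairs (made up for illustration).
types = [segregation_type(p1, p2) for p1, p2 in
         [("AA", "AG"), ("AG", "GG"), ("AG", "GA"), ("AA", "GG")]]
```

Markers returning `None` are either monomorphic in the F1 or inconsistent with a biallelic model, and would not enter the linkage analysis.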
Table 3. Distribution of SNP markers on the genetic linkage map derived from a ‘Hongro’ × ‘Alps otome’ segregating population.
Linkage group | No. of selected SNPs | No. of used SNPs | Length (cM) | Average marker interval (cM) | Maximum interval (cM) |
---|---|---|---|---|---|
1 | 115 | 62 | 64.9 | 1.05 | 6.7 |
2 | 230 | 123 | 82.3 | 0.67 | 7.3 |
3 | 162 | 47 | 97.2 | 2.07 | 18.8 |
4 | 114 | 34 | 57.7 | 1.70 | 9.3 |
5 | 173 | 68 | 77.4 | 1.14 | 7.7 |
6 | 102 | 58 | 80.2 | 1.38 | 13.7 |
7 | 168 | 64 | 77.1 | 1.20 | 10.5 |
8 | 131 | 57 | 85.4 | 1.50 | 9.5 |
9 | 174 | 67 | 77.6 | 1.16 | 7 |
10 | 205 | 85 | 92.2 | 1.08 | 8.2 |
11 | 231 | 72 | 81.2 | 1.13 | 11.2 |
12 | 139 | 80 | 83.5 | 1.04 | 5 |
13 | 123 | 44 | 88.7 | 2.02 | 12 |
14 | 126 | 40 | 56.3 | 1.41 | 6.4 |
15 | 199 | 48 | 81.8 | 1.70 | 5.7 |
16 | 100 | 54 | 73.6 | 1.36 | 13.2 |
17 | 98 | 50 | 93 | 1.86 | 21.8 |
Average | 152.35 | 61.94 | 79.42 | 1.28 | - |
Total | 2590 | 1053 | 1350.1 | - | - |