Korean ginseng (Asian ginseng, ginseng;
Recently, the genomes of various organisms have been analyzed rapidly and accurately with the advent of various high-throughput sequencing technologies (Pareek
BAC clones harbor DNA fragments of around 100 kb derived from the genome of a specific organism (O’Connor
Ten BAC clones were randomly selected from the BAC library of
Equal amounts of DNA from each of the ten BACs were pooled into a single tube and provided to DNA Link (Seoul, Korea) for construction of a sequencing library. A 10-kb library was generated and sequenced on the PacBio RS platform using the MagBead protocol. Two sequencing runs were performed using the P4-C2 and P6-C4 chemistry versions. To produce long reads, one SMRT cell was employed for each run. The generated reads were corrected and pre-assembled through the hierarchical genome assembly process (HGAP) method (Chin
For preliminary prediction of genic regions, the FGENESH program (Salamov and Solovyev 2000; http://www.softberry.com/) was used with tomato as the reference model and default parameters. The gene structures were confirmed by BLASTP searches against the NCBI non-redundant (nr) protein sequence database and our customized transcriptome data. Repeat regions were predicted on the basis of NCBI Conserved Domain Search (CD-Search) (Marchler-Bauer
To classify the repetitive elements, the reverse transcriptase (RT) domain of each LTR-RT was extracted from the BAC sequences and compared with the core database of GyDB (Llorens
Whole genome sequence (WGS) reads from
Ten BAC clones were randomly selected and pooled for sequencing on the PacBio SMRT sequencing platform. Two independent reactions were conducted using P4-C2 and P6-C4 chemistry. The P4-C2 chemistry produced 44,736 reads with an average length of 3,605 bp, while the P6-C4 chemistry produced 100,844 reads with an average length of 4,813 bp (Table 1). Approximately 9.7% of reads were longer than 10 kb. Short reads (<50 bp) and low-quality reads (quality <0.75) were removed for optimal assembly. Initial assemblies using the separate read sets from the P4-C2 and P6-C4 chemistries produced good-quality contig sequences with only seven gaps and one gap, respectively. By combining both data sets, we obtained complete sequences without gaps for nine BACs (BAC IDs 8L14, 9P08, 6B09, 8G13, 5N01, 8P22, 8C22, 5B21 and 10M13) and a sequence with one gap for BAC ID 7P20. Overall, a total of 1,163,364 bp of assembled sequence was generated (Table 2).
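The read filtering and summary statistics described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the `Read` record, the function names, and the toy reads are hypothetical, and only the two thresholds (50 bp and 0.75) and the 10-kb cutoff come from the text.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    length: int      # read length in bp
    quality: float   # read quality score between 0 and 1


def filter_reads(reads, min_length=50, min_quality=0.75):
    """Drop reads shorter than min_length or with quality below min_quality."""
    return [r for r in reads if r.length >= min_length and r.quality >= min_quality]


def summarize(reads):
    """Return total bases, read count, mean length, and fraction of reads >10 kb."""
    n = len(reads)
    total = sum(r.length for r in reads)
    return {
        "total_bp": total,
        "n_reads": n,
        "mean_bp": total / n if n else 0.0,
        "frac_over_10kb": sum(r.length > 10_000 for r in reads) / n if n else 0.0,
    }


if __name__ == "__main__":
    # Toy reads for illustration only (not data from the study).
    toy = [
        Read("r1", 40, 0.90),      # too short
        Read("r2", 5_000, 0.70),   # quality too low
        Read("r3", 12_000, 0.85),
        Read("r4", 3_000, 0.80),
    ]
    print(summarize(filter_reads(toy)))
```

The same summary function can be applied to each chemistry separately to reproduce the kind of per-run comparison reported in Table 1.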
Annotation of the ten BAC clone sequences allowed us to predict the complex genome structure of ginseng, which is mainly composed of various repetitive components with a relatively small portion of genic regions (Fig. 1). Six BAC sequences (BAC IDs 8L14, 8G13, 5N01, 8P22, 8C22, and 5B21) harbor only repeat elements without any genes. Among repeat elements,
Among the six novel LTR-RTs, three are classified to
To estimate insertion time of the six novel LTR-RTs, the ratios of transition to transversion (T
Mapping of 10× WGS reads revealed that approximately 10% of the BAC sequences were non-repetitive and the remaining 90% were repeat-rich regions (Table 2, Fig. 1). The non-repetitive regions showed a mapping depth of less than 50×, while the repeat-rich regions showed a mapping depth of more than 10,000×. The repetitive regions were mainly occupied by various LTR-RTs with complex and nested insertion patterns. To calculate the actual repeat proportion in the ten BACs, we conducted repeat masking with previously reported and newly identified LTR-RTs. A homology-based search revealed that over 60% of the BAC sequences comprised various LTR-RTs (Table 5). Most of the regions were occupied by
We estimated the proportion of LTR-RTs in the whole ginseng genome by mapping 30 Gbp of WGS reads onto each element. Approximately 36% of the ginseng genome was estimated to be occupied by 14 LTR-RTs, including eight reported and six novel subfamilies. The estimated genome proportion of LTR-RTs is lower than the proportion found in the BAC sequences. However, both results show a similar distribution pattern. For example, both results show that
Our previous FISH analysis revealed that
The rapid development of sequencing technology has enabled the assembly of nearly complete genomes in various species (Pareek
The ten randomly selected BAC clone sequences provide a brief overview of the ginseng genome. Among the ten BACs, two (6B09 and 10M13) each contained four genes and fewer repetitive elements; two others (7P20 and 9P08) contained two genes and one gene, respectively, with a moderate amount of repetitive elements; and the remaining six BACs were composed entirely of repetitive elements without any genes. In particular, LTR-RTs occupied the largest portion of the BAC sequences, with complex nested insertion patterns (Table 5, Fig. 1). The results suggest that LTR-RTs might play a major role in the increase in genome size, as in many other plants with large genomes (SanMiguel
Our previous study identified eight LTR-RTs which occupied a third of the ginseng genome (Choi
We estimated the copy numbers of all components in the 1,163 kb of sequence by mapping 10× WGS reads onto homologous regions with more than 80% sequence similarity. Approximately 10% of the BAC sequences were non-repetitive, while the remaining 90% were repeat regions with more than 10,000× coverage from the 10× WGS reads. Overall, 90% of the 1,163 kb is composed of various repetitive elements with over 1,000 copies in the ginseng genome. Repeat masking revealed that 60% of the 1,163 kb of sequence consists of components of 14 LTR-RTs. However, only 36.34% of the WGS reads mapped to the 14 LTR-RTs, which is much lower than the proportion of LTR-RTs (60%) in the 1,163 kb of sequence (Table 5); a similar phenomenon was found in our previous research (Choi
We proposed that the recent WGD in the ginseng genome was caused by an allotetraploidization event, based on the finding of a subgenome-specific distribution of transposable elements,
The insertion time of an LTR-RT is calculated by estimating the sequence divergence between its two LTRs, which independently accumulate point mutations (Dangel
The structure and characteristics of the 1,163 kb of ginseng genome sequence analyzed in this study will expand our understanding of the complex genome structure of ginseng. Furthermore, our data provide valuable resources for understanding genome structure and evolution, as well as for breeding and related research in the genus
This research was supported by “Cooperative Research Program for Agriculture Science & Technology Development (Project No. PJ01100801)”, Rural Development Administration, Republic of Korea.
Summary of sequence statistics.
|Read size||P4-C2: Total read length (bp)||P4-C2: Number of reads||P4-C2: Average read length (bp)||P6-C4: Total read length (bp)||P6-C4: Number of reads||P6-C4: Average read length (bp)|
|5 kb~10 kb||58,026,397||8,263||7,022||164,293,017||22,992||7,145|
Assembly statistics of ten BAC clones.
|BAC ID||Assembled length (bp)||Number of contigs||Non-repetitive region (bp, <50 mapping depth)||GenBank Accession number|
z)Two BAC clones, 8P22 and 10M13, were not included in the second sequencing run using P6-C4 chemistry.
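The non-repetitive column in Table 2 depends on a mapping-depth threshold, and the copy-number estimates in the text follow from dividing mapping depth by the sequencing coverage. The sketch below illustrates both steps under stated assumptions: the function names and the toy depth values are hypothetical, per-base depths are assumed to be already available as a list of numbers, and only the 50× threshold and the 10× WGS coverage come from the text.

```python
def non_repetitive_bp(depths, threshold=50):
    """Count positions whose mapping depth is below the threshold (<50x in the text)."""
    return sum(1 for d in depths if d < threshold)


def estimated_copy_number(depth, genome_coverage=10.0):
    """Approximate copy number as mapping depth divided by WGS coverage (10x in the text)."""
    return depth / genome_coverage


def non_repetitive_fraction(depths, threshold=50):
    """Fraction of positions classified as non-repetitive."""
    return non_repetitive_bp(depths, threshold) / len(depths) if depths else 0.0


if __name__ == "__main__":
    # Toy per-base depths for illustration only (not data from the study).
    toy_depths = [8, 12, 49, 50, 12_000]
    print(non_repetitive_bp(toy_depths))
    print(estimated_copy_number(10_000))
```

Under this simple model, a region covered at 10,000× by 10× WGS reads corresponds to roughly 1,000 copies in the genome, which matches the reasoning given in the text.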
Annotation summary of the eleven genes identified in the BAC sequences.
|Gene annotation based on BLASTP searches||BAC ID||Position (bp)||Number of exons||Accession no. (E-value)|
|Cellulose synthase A catalytic subunit 3 [UDP-forming]||9P08||215–9,773||14||XP_017226278.1 (0.0)|
|Calcium-dependent protein kinase 28||6B09||1–5,383 (partial)||9||XP_017251289.1 (0.0)|
|Uncharacterized protein||6B09||18,057–19,415||3||XP_017251291.1 (8e-115)|
|Transformation/transcription domain-associated protein||6B09||21,538–59,489||35||XP_017217620.1 (0.0)|
|Transformation/transcription domain-associated protein||6B09||94,283–120,564||35||XP_017217620.1 (0.0)|
|Acyl carrier protein 1||10M13||1,127–3,315||4||XP_017257036.1 (1e-55)|
|Superoxide dismutase||10M13||33,367–36,315||7||O22668.1 (1e-100)|
|Bifunctional 3-dehydroquinate dehydratase/shikimate dehydrogenase||10M13||44,804–50,967||10||XP_017220070.1 (0.0)|
|Uncharacterized protein||10M13||55,540–58,599||1||XP_017218710.1 (0.0)|
|Uncharacterized protein||7P20||2,245–7,217||5||XP_017238997.1 (0.0)|
|Protein FAR1-related sequence 5-like||7P20||53,287–55,512||2||XP_015866013.1 (0.0)|
List of novel LTR retrotransposons identified in BAC sequences.
|Type||BAC ID||Position (bp)||TSDz)||Length / LTR length (bp)||Tsy)||Tvx)||Ts/Tv||Kw)||Insertion timev) (MYA)|
z)Target site duplication.
y)Number of transition mutations.
x)Number of transversion mutations.
v)Insertion times were estimated by adopting the substitution rate of 1.22 × 10⁻⁸.
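The insertion-time estimate summarized in this table can be sketched as follows. Several points are assumptions, because the footnote defining K is not present in this text: K is taken to be the Kimura two-parameter distance computed from the transition and transversion counts, the rate of 1.22 × 10⁻⁸ is taken to be substitutions per site per year, and the time is computed as T = K / (2r), the commonly used formula for divergence between the two LTRs of one element. The function names and the example counts are hypothetical.

```python
import math


def kimura_2p_distance(transitions, transversions, aligned_length):
    """Kimura two-parameter distance from transition/transversion counts.

    ASSUMPTION: the K column is a Kimura two-parameter distance; the source
    text does not define K.
    """
    p = transitions / aligned_length      # proportion of transitional differences
    q = transversions / aligned_length    # proportion of transversional differences
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_time_mya(k, rate=1.22e-8):
    """Insertion time in million years ago, using T = K / (2 * rate).

    ASSUMPTION: rate is in substitutions per site per year.
    """
    return k / (2 * rate) / 1e6


if __name__ == "__main__":
    # Hypothetical LTR pair: 10 transitions and 5 transversions over 1,000 aligned bp.
    k = kimura_2p_distance(10, 5, 1_000)
    print(round(k, 4), round(insertion_time_mya(k), 2))
```

The factor of 2 reflects that both LTRs of an element accumulate mutations independently after insertion, so the observed divergence represents twice the elapsed time.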
Proportion of LTR-RTs in the ginseng genome calculated from repeat masking and WGS read mapping.
|Type||Proportion in BACs (%)||Expected proportion in genome (%)|
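The two columns of this table correspond to two different ratios: masked bases over total BAC length, and mapped WGS bases over total WGS bases. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, and the mapped-base figure in the example is back-calculated from the reported 36.34% and 30 Gbp rather than taken from the source.

```python
def proportion_in_bacs(masked_bp, total_bac_bp=1_163_364):
    """Percentage of the assembled BAC sequence masked as a given LTR-RT.

    The default total is the 1,163,364 bp of assembled sequence reported in the text.
    """
    return 100.0 * masked_bp / total_bac_bp


def proportion_in_genome(mapped_bp, total_wgs_bp=30_000_000_000):
    """Percentage of WGS bases mapped to a given LTR-RT (30 Gbp of reads in the text)."""
    return 100.0 * mapped_bp / total_wgs_bp


if __name__ == "__main__":
    # ASSUMPTION: 10.902 Gbp is back-calculated from 36.34% of 30 Gbp for illustration.
    print(proportion_in_genome(10_902_000_000))
```

Because the BAC-based value is computed over only ten clones while the genome-wide value is computed over all WGS reads, the two percentages need not agree, as noted in the text.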