Korean Reference Genome (KRG)
Korean Genome Analysis
The raw sequence reads were trimmed with Sickle, a quality-based trimming tool that slides a window along each read and applies quality and length thresholds; the quality threshold was specified to ensure that only high-quality reads were used for alignment. The human reference genome build 19 (hg19) was downloaded from the UCSC FTP server (ftp://hgdownload.cse.ucsc.edu/goldenPath/), and the reads produced by the HiSeq™ 2000 sequencing system were aligned to hg19 using the Burrows-Wheeler Aligner (BWA) with default settings. 'bwa sampe' was then used to generate paired-end alignments in SAM format. Picard (http://picard.sourceforge.net/) was used to sort the alignments, remove duplicate reads, and convert SAM files to BAM format. Finally, SNVs and short indels were called using the SAMtools 'mpileup' and 'varFilter' commands, with minimum and maximum coverage depth cutoffs of 3 and 1000 (the maximum set with the -D 1000 option) and with options that disqualify SNPs located too close to each other.
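The trimming and depth-filtering rules described above can be illustrated with a simplified Python sketch. It is not Sickle's or SAMtools' actual implementation: the Phred+33 quality encoding, the quality threshold of 20, the minimum length of 20, and the window size of 10% of the read length are assumptions chosen for illustration (the threshold actually used is not stated here), and the function names are hypothetical. The depth check only mirrors the 3 and 1000 cutoffs given above; varFilter's other filters, including the SNP-proximity rule, are not modeled.

```python
from typing import List, Optional, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def sliding_window_trim(seq: str, qual: str, q_threshold: int = 20,
                        min_length: int = 20,
                        window_frac: float = 0.1) -> Optional[Tuple[str, str]]:
    """Trim a read using the mean quality of a sliding window (simplified, Sickle-like).

    All default thresholds are illustrative assumptions.
    Returns the trimmed (seq, qual) pair, or None if the read is discarded.
    """
    scores = phred_scores(qual)
    n = len(scores)
    win = max(1, int(n * window_frac))

    def mean_q(i: int) -> float:
        return sum(scores[i:i + win]) / win

    # 5' end: keep from the first window whose mean quality passes the threshold.
    start = next((i for i in range(n - win + 1) if mean_q(i) >= q_threshold), None)
    if start is None:
        return None

    # 3' end: at the first later window whose mean drops below the threshold,
    # cut at the first low-quality base inside that window.
    end = n
    for i in range(start, n - win + 1):
        if mean_q(i) < q_threshold:
            end = next(j for j in range(i, i + win) if scores[j] < q_threshold)
            break

    # Length threshold: discard reads that became too short after trimming.
    if end - start < min_length:
        return None
    return seq[start:end], qual[start:end]


def passes_depth(depth: int, min_depth: int = 3, max_depth: int = 1000) -> bool:
    """Keep a variant call only if its read depth lies within [min_depth, max_depth]."""
    return min_depth <= depth <= max_depth


if __name__ == "__main__":
    seq = "ACGT" * 10
    qual = "I" * 30 + "#" * 10  # 30 bases at Q40 followed by 10 bases at Q2
    trimmed = sliding_window_trim(seq, qual)
    print(len(trimmed[0]))  # → 30
    print([d for d in (2, 3, 500, 1000, 1500) if passes_depth(d)])  # → [3, 500, 1000]
```

Averaging over a window rather than cutting at the first poor base keeps isolated low-quality calls from truncating an otherwise good read, while the length threshold removes reads too short to align reliably.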
To compare allele frequencies between Koreans and other populations, we used the alternative allele frequencies of the HapMap3 and 1000 Genomes Project populations, downloaded from the UCSC Genome Browser (ftp://hgdownload.cse.ucsc.edu/). Functional annotation was performed with ANNOVAR (http://www.openbioinformatics.org/annovar/), and the genomic locations of the variants were annotated using ANNOVAR's gene-based annotation. The risk conferred by genomic variants for major public-health diseases in Korea (diabetes, hypertension, and metabolic syndrome) was analyzed with PLINK (version 1.07). The association analysis used linear regression adjusted for age, sex, and body mass index (BMI).
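To make the association model concrete, the sketch below fits a covariate-adjusted linear model of the kind PLINK evaluates per variant under its default additive coding: the phenotype is regressed on the alternative-allele count plus age, sex, and BMI, using only the Python standard library. It is an illustration, not PLINK's code: the 0/1/2 genotype coding, the 0/1 sex coding, the toy data, and all function names are hypothetical, and the sketch stops at the t-statistic rather than computing a p-value. `alt_allele_frequency` shows the per-population quantity that would be compared against the HapMap3 and 1000 Genomes frequencies.

```python
import math
from typing import List, Sequence, Tuple


def alt_allele_frequency(genotypes: Sequence[int]) -> float:
    """Alternative-allele frequency from per-sample allele counts (0, 1, 2)."""
    return sum(genotypes) / (2 * len(genotypes))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def additive_linear_test(genotypes: Sequence[int], phenotype: Sequence[float],
                         covariates: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """OLS fit of y = b0 + b1*g + sum_k c_k * cov_k; returns (b1, SE(b1), t)."""
    rows = [[1.0, float(g)] + [float(v) for v in cov]
            for g, cov in zip(genotypes, covariates)]
    n, p = len(rows), len(rows[0])
    if n <= p:
        raise ValueError("need more samples than model parameters")
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(p)]
    beta = solve(xtx, xty)
    # Residual variance with n - p degrees of freedom.
    rss = sum((y - sum(bk * xk for bk, xk in zip(beta, r))) ** 2
              for r, y in zip(rows, phenotype))
    sigma2 = rss / (n - p)
    # Diagonal element (X'X)^-1[1][1], obtained by solving X'X v = e_1.
    e1 = [1.0 if i == 1 else 0.0 for i in range(p)]
    inv11 = solve(xtx, e1)[1]
    se = math.sqrt(sigma2 * inv11)
    t = beta[1] / se if se > 0 else math.inf
    return beta[1], se, t


if __name__ == "__main__":
    # Hypothetical toy data: alternative-allele counts and covariates per sample.
    g = [0, 1, 2, 0, 1, 2, 1, 0]
    age = [40, 52, 61, 35, 47, 58, 44, 50]
    sex = [0, 1, 0, 1, 1, 0, 0, 1]  # hypothetical 0/1 coding
    bmi = [22.1, 25.3, 27.8, 21.0, 24.5, 29.2, 23.3, 26.0]
    covs = [[a, s, b] for a, s, b in zip(age, sex, bmi)]
    # Noise-free phenotype with a true genotype effect of 0.5 per allele.
    y = [100 + 0.5 * gi + 0.3 * a + 2.0 * s + 1.2 * b
         for gi, a, s, b in zip(g, age, sex, bmi)]
    beta, se, t = additive_linear_test(g, y, covs)
    print(round(beta, 6))  # → 0.5
    print(alt_allele_frequency(g))  # → 0.4375
```

Including age, sex, and BMI as covariates means the genotype coefficient estimates the per-allele effect after removing variation those factors already explain, which is the sense in which the analysis "controls" for them.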