Sample Identification Capacity of 20 Single Nucleotide Polymorphisms in Blood-Derived Genomic DNA Samples of Korean Individuals

Abstract

Introduction

Short tandem repeats (STRs) have been routinely used as DNA markers for sample identification. However, STR analysis has limitations, such as a high mutation rate and difficulty in analyzing degraded DNA samples because of the large amplicon size.¹ Single nucleotide polymorphisms (SNPs) have been introduced as alternative DNA markers for sample identification. SNPs have low mutation rates and can be analyzed in degraded DNA samples because their amplicons are small.²,³ Kim et al. identified 30 candidate SNP markers for sample identification using Affymetrix 500 K SNP genotype data and a HapMap database.⁴ Their combined mean match probability, calculated using Affymetrix Genome-Wide Human SNP Array 5.0 data, was 1.78 × 10⁻¹³ in 8842 Korean individuals. Lee et al. reported 22 putative SNP markers for human identification, with a combined mean match probability of 1.91 × 10⁻¹⁰ in Korean individuals.⁵ In this study, we assessed the sample identification capacity of 20 SNPs, which differ from the SNP markers for sample identification reported in previous studies,⁴,⁵ in blood-derived genomic DNA samples obtained from Korean individuals.

Methods

Sample preparation

Genomic DNA samples (n = 380) used in this study were obtained from the National Biobank of Korea. They were randomly selected from among the genomic DNA samples collected through the Korean Genome and Epidemiology Study. Genomic DNA was isolated from blood samples using the Gentra Puregene Blood Kit (Qiagen, Chatsworth, CA) in accordance with the manufacturer's instructions.

SNP genotyping

We selected 20 SNPs included in the Korean Chip (produced through the Korea Biobank Array Project) from among 169 SNP markers for sample identification selected from 1000 Genomes data.⁶ SNP genotyping was performed using the SNPtype Assay (STA) (Fluidigm, San Francisco, CA), according to the manufacturer's instructions. In brief, genomic DNA (50 ng) was amplified by polymerase chain reaction (PCR) with Qiagen 2 × Multiplex PCR Master Mix (Qiagen) and the STA primer set in a final volume of 2.5 μL. The PCR conditions were as follows: 1 cycle of 95°C for 15 minutes, followed by 14 cycles of 95°C for 15 seconds and 60°C for 4 minutes. After amplification, 1.9 μL of the diluted STA products was added to a Sample Pre-Mix (containing 2.25 μL of 2 × Fast Probe Master Mix, 0.225 μL of the SNPtype 20 × Sample Loading Reagent, 0.075 μL of the SNPtype Reagent, and 0.027 μL of the ROX™). After the Assay Pre-Mix and the Sample Pre-Mix were loaded into the 192.24 Genotyping Dynamic Array, the SNPtype Assay reaction was performed. Analysis was performed using Fluidigm SNP Genotyping Analysis software (version 4.0.1; Fluidigm). We assessed genotype frequency, allele frequency, call rate, match probability, and heterozygosity for each SNP. The match probability of each SNP was calculated as previously described.⁷,⁸
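The per-SNP statistics listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code. It assumes that the match probability is the sum of squared genotype frequencies and that the heterozygosity is the expected value 2pq from the allele frequencies. The text does not state these formulas, but they reproduce the Table 1 values for rs2736966. The names `SnpStats` and `snp_stats` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SnpStats:
    genotype_freqs: Tuple[float, float, float]  # (hom. allele 1, het., hom. allele 2)
    allele_freqs: Tuple[float, float]           # (allele 1, allele 2)
    call_rate: float
    match_probability: float
    heterozygosity: float


def snp_stats(n_hom1: int, n_het: int, n_hom2: int, n_samples: int) -> SnpStats:
    """Summarize one biallelic SNP from its genotype counts.

    Assumed definitions (not stated explicitly in the text):
      match probability = sum of squared genotype frequencies
      heterozygosity    = 2 * p * q (expected, from allele frequencies)
    """
    called = n_hom1 + n_het + n_hom2
    if called == 0 or called > n_samples:
        raise ValueError("genotype counts are inconsistent with n_samples")

    # Frequencies are computed over called genotypes only.
    genotype_freqs = tuple(n / called for n in (n_hom1, n_het, n_hom2))
    p = (2 * n_hom1 + n_het) / (2 * called)  # frequency of allele 1
    q = 1.0 - p                              # frequency of allele 2

    return SnpStats(
        genotype_freqs=genotype_freqs,
        allele_freqs=(p, q),
        call_rate=called / n_samples,
        match_probability=sum(f * f for f in genotype_freqs),
        heterozygosity=2 * p * q,
    )


# rs2736966 in Table 1: CC = 66, CT = 94, TT = 218 out of 380 samples.
stats = snp_stats(66, 94, 218, 380)
print(round(stats.call_rate, 3),
      round(stats.match_probability, 3),
      round(stats.heterozygosity, 3))
# → 0.995 0.425 0.419
```

Frequencies are computed over called genotypes rather than over all 380 samples, which is why the two no-calls for rs2736966 lower the call rate but not the frequency totals.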

Results

We genotyped 22 SNPs in 380 blood-derived genomic DNA samples. These included 20 candidate SNP markers for sample identification (rs10055677, rs2470209, rs6785504, rs11208131, rs6796688, rs12286769, rs6840524, rs12876644, rs2613019, rs7307697, rs12965342, rs2736966, rs9268831, rs1433811, rs4027132, rs970022, rs1487602, rs6106856, rs1790875, and rs6596805) and 2 SNP markers for sex determination (rs2563387 and rs2571795). The call rates of the 2 SNP markers for sex determination were 1.0, and 244 of the 380 samples were identified as coming from female donors (data not shown). The call rates of the 20 candidate SNP markers for sample identification were >0.99 (Table 1). The match probability of each SNP ranged from 0.364 to 0.425, and the heterozygosity ranged from 0.419 to 0.500. The combined mean match probability of the 20 SNPs was 4.51 × 10⁻⁹.

Table 1. Characterization of 20 Putative Single Nucleotide Polymorphisms for Human Identification

No.	SNP	Genotypes	Genotype count (n)	Genotype frequency	Alleles	Allele frequency	Call rate	Match probability	Heterozygosity
1	rs10055677	AA/AG/GG	93/189/98	0.245/0.497/0.258	A/G	0.493/0.507	1.000	0.374	0.500
2	rs2470209	AA/AG/GG	89/199/92	0.234/0.524/0.242	A/G	0.496/0.504	1.000	0.388	0.500
3	rs6785504	GG/GT/TT	104/195/81	0.274/0.513/0.213	G/T	0.530/0.470	1.000	0.384	0.498
4	rs11208131	AA/AC/CC	104/191/85	0.274/0.503/0.224	A/C	0.525/0.475	1.000	0.378	0.499
5	rs6796688	CC/CT/TT	87/197/96	0.229/0.518/0.253	C/T	0.488/0.512	1.000	0.385	0.500
6	rs12286769	CC/CT/TT	122/197/61	0.321/0.518/0.161	C/T	0.580/0.420	1.000	0.398	0.487
7	rs6840524	AA/AC/CC	84/181/115	0.221/0.476/0.303	A/C	0.459/0.541	1.000	0.367	0.497
8	rs12876644	CC/CT/TT	103/194/83	0.271/0.511/0.218	C/T	0.526/0.474	1.000	0.382	0.499
9	rs2613019	CC/CT/TT	105/187/88	0.276/0.492/0.232	C/T	0.522/0.478	1.000	0.372	0.499
10	rs7307697	AA/AG/GG	90/179/110	0.237/0.472/0.290	A/G	0.474/0.526	0.997	0.364	0.499
11	rs12965342	CC/CT/TT	99/190/90	0.261/0.501/0.237	C/T	0.512/0.488	0.997	0.376	0.500
12	rs2736966	CC/CT/TT	66/94/218	0.175/0.249/0.577	C/T	0.299/0.701	0.995	0.425	0.419
13	rs9268831	CC/CT/TT	61/196/123	0.161/0.516/0.324	C/T	0.418/0.582	1.000	0.397	0.487
14	rs1433811	CC/CT (TC)/TT	112/191/77	0.295/0.503/0.203	C/T	0.546/0.454	1.000	0.381	0.496
15	rs4027132	AA/AG/GG	88/192/100	0.232/0.505/0.263	A/G	0.484/0.516	1.000	0.378	0.500
16	rs970022	CC/CT/TT	73/186/121	0.192/0.489/0.318	C/T	0.437/0.563	1.000	0.378	0.492
17	rs1487602	CC/CT/TT	102/198/80	0.268/0.521/0.211	C/T	0.529/0.471	1.000	0.388	0.498
18	rs6106856	AA/AG/GG	90/196/94	0.237/0.516/0.247	A/G	0.495/0.505	1.000	0.383	0.500
19	rs1790875	CC/CT/TT	102/187/91	0.268/0.492/0.239	C/T	0.514/0.486	1.000	0.372	0.500
20	rs6596805	AA/AG/GG	67/193/120	0.176/0.508/0.316	A/G	0.430/0.570	1.000	0.389	0.490
Combined mean match probability								4.51418 × 10⁻⁹

SNPs, single nucleotide polymorphisms.
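The combined value in Table 1 is consistent with multiplying the 20 per-SNP match probabilities. The sketch below does this, assuming the SNPs are independent, which is an assumption of this example and is not tested here. The inputs are the three-decimal values printed in Table 1, so the product lands close to, but not exactly on, the reported 4.51 × 10⁻⁹. The names `MATCH_PROBABILITIES` and `combined_match_probability` are illustrative.

```python
import math

# Per-SNP match probabilities as printed (three decimals) in Table 1,
# in table order (rs10055677 ... rs6596805).
MATCH_PROBABILITIES = [
    0.374, 0.388, 0.384, 0.378, 0.385,
    0.398, 0.367, 0.382, 0.372, 0.364,
    0.376, 0.425, 0.397, 0.381, 0.378,
    0.378, 0.388, 0.383, 0.372, 0.389,
]


def combined_match_probability(per_snp):
    """Multiply per-SNP match probabilities (assumes independent SNPs)."""
    return math.prod(per_snp)


combined = combined_match_probability(MATCH_PROBABILITIES)
# Rounded inputs give a value of the same order as the reported one.
print(f"{combined:.2e}")
```

Each additional SNP with a match probability near 0.38 lowers the combined value by a factor of roughly 2.6, which is why a panel of only 20 markers reaches the 10⁻⁹ range.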

Discussion

Blood-derived genomic DNA samples are increasingly used in genetic studies based on next-generation sequencing and array-based genotyping technologies. These technologies enable the production of high-throughput data for the discovery of disease markers and the analysis of disease mechanisms.⁹ However, preanalytical errors can occur because of DNA contamination or labeling mistakes during sample processing. Genetic sample identification through SNP analysis can minimize or eliminate these preanalytical errors. For example, the results for sex-linked SNP markers obtained from a genomic DNA sample can be compared with the sex recorded in the donor's epidemiological questionnaire and clinical examination report. Genetic sample identification through SNP analysis can also be used for quality control of next-generation sequencing and SNP genotyping data. The 20 SNPs tested in this study show slightly higher performance in sample identification (combined mean match probability, 4.51 × 10⁻⁹) than other SNP markers. The combined mean match probability is <3.20 × 10⁻⁹ and 4.18 × 10⁻⁹ when 20 SNPs are selected from the SNPs for sample identification reported by Kim et al.⁴ and Lee et al.,⁵ respectively. We propose these SNPs as sample identification markers for quality control of genomic DNA samples obtained from Korean individuals.
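As a rough illustration of the quality-control use described above, the sketch below compares the panel genotypes of one sample with the genotypes recorded for the same donor in another dataset. It is a hypothetical example, not a procedure from this study. The function names, the input format (a dictionary of SNP ID to genotype string, with `None` for a no-call), and the thresholds `min_compared` and `max_mismatches` are all assumptions that a laboratory would need to set for itself.

```python
def genotype_concordance(profile_a, profile_b):
    """Return (n_compared, n_matching) over SNPs called in both profiles.

    Genotypes are two-letter strings such as "AG"; allele order is ignored,
    so "AG" and "GA" count as the same genotype. None marks a no-call.
    """
    compared = matching = 0
    for snp, genotype_a in profile_a.items():
        genotype_b = profile_b.get(snp)
        if genotype_a is None or genotype_b is None:
            continue  # skip SNPs missing or uncalled in either profile
        compared += 1
        if sorted(genotype_a) == sorted(genotype_b):
            matching += 1
    return compared, matching


def same_sample(profile_a, profile_b, min_compared=15, max_mismatches=0):
    """Return True/False for a match, or None if too few SNPs were compared.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    compared, matching = genotype_concordance(profile_a, profile_b)
    if compared < min_compared:
        return None
    return (compared - matching) <= max_mismatches


# Toy profiles using three of the panel SNPs.
assay = {"rs10055677": "AG", "rs2470209": "GG", "rs6785504": "GT"}
array = {"rs10055677": "GA", "rs2470209": "GG", "rs6785504": None}
print(genotype_concordance(assay, array))       # → (2, 2)
print(same_sample(assay, array, min_compared=2))  # → True
```

Returning `None` when too few SNPs overlap keeps an uninformative comparison from being read as either a confirmed match or a sample swap.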

Footnotes

Acknowledgments

This work was supported by the Korea Biobank Project (Grant No. 4851-307-210-13) in the Korea National Institute of Health, Korea Centers for Disease Control and Prevention.

Author Disclosure Statement

No conflicting financial interests exist.

References

1. Gill. Role of short tandem repeat DNA in forensic casework in the UK—past, present, and future perspectives. Biotechniques, 2002; 32:366–368, 370, 372, passim.

2. Huang, Xu, Shen, et al. Mutation patterns at dinucleotide microsatellite loci in humans. Am J Hum Genet, 2002; 70:625–634.

3. Reich, Schaffner, Daly, et al. Human genome sequence variation and the influence of gene history, mutation and recombination. Nat Genet, 2002; 32:135–142.

4. Kim, Han, Lee, et al. Development of SNP-based human identification system. Int J Legal Med, 2010; 124:125–131.

5. Lee, Park, Yoo, et al. Selection of twenty-four highly informative SNP markers for human identification and paternity analysis in Koreans. Forensic Sci Int, 2005; 148:107–112.

6. Cho, Yu, Han, et al. Forensic application of SNP-based resequencing array for individual identification. Forensic Sci Int Genet, 2014; 13:45–52.

7. Fan, Gunderson, Bibikova, et al. Illumina universal bead arrays. Methods Enzymol, 2006; 410:57–73.

8. Jones. Blood samples: Probability of discrimination. J Forensic Sci Soc, 1972; 12:355–359.

9. Jun, Flickinger, Hetrick, et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am J Hum Genet, 2012; 91:839–848.