Review of rare variants in disease (Bomba 2017)
Key Points
The Bomba paper:
Reviewed major techniques used to detect and analyze rare variants
Gave an overview of statistical methods used to analyze rare variants
Gave examples of studies that successfully studied rare variants
Review Questions
(Answers at the bottom of the page)
What is population stratification?
What three major methods are used to detect rare variants?
What is the difference between SKAT and SKAT-O?
Explain the major limitation of burden tests (as compared to variance tests)
Overview
To detect rare variants, researchers can use (1) imputation, (2) custom genotyping arrays, and (3) next-generation sequencing. Imputation is a computational technique that infers genotypes from public databases, whereas custom genotyping arrays and next generation sequencing are wet-lab techniques.
To analyze rare variants, researchers use (1) burden tests such as CAST, which combine rare variants to increase power but assume that all of those variants have the same effect on the phenotype of interest (2) Variant component tests such as SKAT, which improve on the standard burden test by allowing for bidirectional effects and applying other weighting techniques. (3) Tests that combine approaches from burden tests and variant component tests, such as SKAT-O, and (4) Some tests that do not fall into the previous tests but are not discussed in the paper, such as LASSO and EG.
The paper also talks about study design. A common trend called population stratification is that a variant may be rare in the general population, but more common in certain smaller populations. Studies can take advantage of this principle, such as the deCODE study, which used Icelandic populations to identify rare variants and a potential drug target.
More Detail
This study aimed to provide a resource for continuing rare variant research. That's important because there are lots of interactions between rare variants
To computationally improve power of a study, you can use a technique called imputation. First, phased haplotypes are used to build reference panel. Then, a study that detects fewer variants can computationally infer new variants. WGS projects like the 1000 Genomes Project provide public reference panel.
To
Diseased-focused
Immunochip (autoimmune, inflammatory)
Exomechip (high-confidence unique, protein-altering variants)
WGS
Increasingly used due to lowered cost
WES cost less
Computational methods
Regular statistics doesn't work
Simple regression doesn't have enough power
More rare variants means high multiple testing penalty
Burden Test (e.g. CAST, which is binary, also ARIEL and WSS)
Simple or complex (0 or 1) or (apply weight based on frequency)
Score a functional region of the genome
Add up all rare variants and score that as one score
Regress against phenotype
Limitation: assume that all variants have the same effect on the phenotype
Variance Component Test (SKAT)
Assume: there are risk and protective alleles
More flexible with different data sets
More consideration to weighted statistics
Accounts for different effects
Score each SNP individually, then compute a region-level p value
E.g. SKAT
Combined tests
Generally more realistic
Fisher method or SKAT-O
Combine variance component and burden, get a more powerful model. E.g. SKAT-O (combine SKAT with burden test)
Other
Designed to account for signal sparsity
Often take longer
LASSO, EG, Baysian approaches
Analyses for rare variant association tests
Which method will be the best one for the data set?
Using stimulated type 2 diabetes genomes:
Combined tests (SKAT-O, MiST) have more power to detect rare variants
Linkage disequilibrium should be reflected (haplotypes are not represented as expected)
Problems: tests are under powered
Solution: More replication, control for confounding variables.
Bigger samples help
Combining tests help power
How to increase replication?
Genotype the same cohort again
Incorporate geographical and family-based models
Study design is also important
Isolated/inbred populations (e.g. APOC3 relevance in lipid levels)
Dutch population, Sardinia.
A variant that is rare in general pop may not be as rare in this population (population stratification)
E.g. iceland
Rare in global studies but common in sample
Complete knockout alleles
Loss of function
ExAC used to discover new low-frequency variants
But may not actually cause disease state, needs some skepticism
PROMIS study
Inactivated 961 genes in at least 1 participant
Initial results
25 different studies --> 31 genes with some conditions
Great table in powerpoint (group?) for UK10K, SardiNIA, deCode, goNL, and CHARGE
Non-synonymous SNVs in Denmark
UK10K Project
deCODE
Generated pop-specific refernece panel from Iceland
Applied that reference panel to 90,000 people with available SNP arrays
Infer over 250,000 more people from genealogical records
Takeaways: pop-specific reference panel, and massive study size
Results: 7 rare variants and a potential drug target
Future prospects
Few GWAS have been able to be translated
May become more common
Additional Considerations
Videos!
Last updated