Q&A 16 How do you apply multiple testing correction to GWAS results?

16.1 Explanation

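In a GWAS, thousands to millions of SNPs are each tested for association with a trait. At a significance level of 0.05, about 5% of the SNPs with no true association are still expected to pass by chance, so the number of false positives grows with the number of tests. To control this, we apply multiple testing correction methods such as:

  • Bonferroni correction: Very strict; divides the alpha level (e.g., 0.05) by the number of tests, which controls the family-wise error rate (the probability of at least one false positive)
  • False Discovery Rate (FDR): A less conservative approach that controls the expected proportion of false positives among the significant results (e.g., the Benjamini-Hochberg procedure)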

This helps identify statistically significant SNPs while accounting for the large number of tests.
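To make the two corrections concrete, here is a minimal sketch on five made-up p-values (the vector `p` is purely illustrative, not taken from any dataset). It uses base R's `p.adjust()` and also recomputes the Benjamini-Hochberg adjustment by hand: sort the p-values, multiply each by the number of tests divided by its rank, then enforce monotonicity from the largest p-value down.

```r
# Five hypothetical p-values (illustration only), deliberately unsorted
p <- c(0.020, 0.001, 0.300, 0.012, 0.045)
m <- length(p)

# Adjusted p-values from base R
bonf <- p.adjust(p, method = "bonferroni")  # pmin(1, p * m)
bh   <- p.adjust(p, method = "BH")

# Manual Benjamini-Hochberg for comparison
ord       <- order(p)                        # indices from smallest to largest p
scaled    <- p[ord] * m / seq_len(m)         # p_(i) * m / rank
monotone  <- rev(cummin(rev(scaled)))        # running minimum from the largest p down
bh_manual <- numeric(m)
bh_manual[ord] <- pmin(1, monotone)          # put values back in the original order

cat("Significant after Bonferroni:", sum(bonf < 0.05), "\n")
cat("Significant after BH:", sum(bh < 0.05), "\n")
cat("Manual BH matches p.adjust():", isTRUE(all.equal(bh, bh_manual)), "\n")
```

With these values, one p-value survives Bonferroni and three survive BH, which mirrors the general pattern: every Bonferroni hit is also a BH hit, but not the other way around.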

16.2 R Code

# Load required packages
library(tidyverse)

# Step 1: Load GWAS results
gwas_df <- read_csv("data/gwas_results.csv")

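# Step 2: Compute the Bonferroni-corrected threshold
# Count only SNPs with a non-missing p-value, which is also how p.adjust() counts tests
n_tests <- sum(!is.na(gwas_df$P_value))
bonf_threshold <- 0.05 / n_tests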

# Step 3: Apply FDR correction using p.adjust
gwas_df <- gwas_df %>%
  mutate(FDR = p.adjust(P_value, method = "BH"))

# Step 4: Extract significant SNPs
significant_bonf <- gwas_df %>%
  filter(P_value < bonf_threshold)

significant_fdr <- gwas_df %>%
  filter(FDR < 0.05)

# Step 5: Output summary
cat("Bonferroni threshold:", bonf_threshold, "\n")
#> Bonferroni threshold: 1.331558e-05
cat("Number of SNPs passing Bonferroni:", nrow(significant_bonf), "\n")
#> Number of SNPs passing Bonferroni: 250
cat("Number of SNPs passing FDR < 0.05:", nrow(significant_fdr), "\n")
#> Number of SNPs passing FDR < 0.05: 1265
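If `data/gwas_results.csv` is not at hand, the same steps can be tried on simulated p-values. The sketch below is an illustration under stated assumptions: the data frame `gwas_sim`, the `SNP` identifiers, and the mix of 3700 null and 55 associated SNPs are all invented, and only the `P_value` column name comes from the code above. It uses base R only, and it also checks that filtering raw p-values against `0.05 / n_tests` selects the same SNPs as filtering Bonferroni-adjusted p-values against 0.05.

```r
set.seed(42)  # make the simulation reproducible

# Simulated stand-in for the GWAS results (NOT real data)
n_null   <- 3700  # SNPs with no association: uniform p-values
n_signal <- 55    # SNPs with an association: p-values skewed toward 0
gwas_sim <- data.frame(
  SNP     = paste0("snp", seq_len(n_null + n_signal)),  # hypothetical IDs
  P_value = c(runif(n_null), rbeta(n_signal, 0.05, 5))
)

n_tests        <- nrow(gwas_sim)
bonf_threshold <- 0.05 / n_tests

# Adjusted p-values for both methods
gwas_sim$P_bonf <- p.adjust(gwas_sim$P_value, method = "bonferroni")
gwas_sim$FDR    <- p.adjust(gwas_sim$P_value, method = "BH")

# Route 1: raw p-value against the Bonferroni threshold
hits_threshold <- gwas_sim$SNP[gwas_sim$P_value < bonf_threshold]
# Route 2: Bonferroni-adjusted p-value against alpha
hits_adjusted  <- gwas_sim$SNP[gwas_sim$P_bonf < 0.05]
# BH hits at an FDR of 5%
hits_fdr       <- gwas_sim$SNP[gwas_sim$FDR < 0.05]

cat("Same Bonferroni hits by both routes:",
    setequal(hits_threshold, hits_adjusted), "\n")
cat("Bonferroni hits are a subset of FDR hits:",
    all(hits_threshold %in% hits_fdr), "\n")
```

Storing the adjusted p-values as columns, rather than only a threshold, keeps both corrections visible per SNP when the table is exported or plotted.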

✅ Takeaway: Multiple testing correction is essential in GWAS. Bonferroni is strict and conservative, while FDR accepts a small, controlled proportion of false positives in exchange for more power. Always report how significance was determined.

16.3 Interpretation

After applying multiple testing correction to the GWAS results:

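  • Bonferroni threshold: 1.331558e-05
    This is the significance threshold for this dataset, in which 3755 SNPs were tested. It is calculated by:

    alpha_bonf = 0.05 / 3755 ≈ 1.33 × 10⁻⁵

    Any SNP with a raw p-value below this threshold remains significant under Bonferroni correction, which controls the family-wise error rate at 0.05. The value is much larger than the conventional genome-wide threshold of 5 × 10⁻⁸ only because far fewer SNPs were tested here.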

  • Number of SNPs passing Bonferroni: 250
    These are the strongest associations, with raw p-values below 1.33 × 10⁻⁵. Under Bonferroni, the probability that even one of them is a false positive is at most 5%, so they are good candidates for follow-up analysis or functional validation.

  • Number of SNPs passing FDR < 0.05: 1265
    These SNPs have a Benjamini-Hochberg adjusted p-value below 0.05, so they are significant at a False Discovery Rate (FDR) of 5%. This means that, on average, no more than about 5% of these hits (roughly 63 of the 1265) are expected to be false positives. It’s a more permissive method that captures weaker signals at the cost of some false discoveries.
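The arithmetic behind these three numbers can be checked directly. The sketch below plugs in the values reported above (3755 tests, 1265 FDR hits). The family-wise error rate lines additionally assume that all tests are independent and that every SNP is null; this is a simplification for illustration, since neighbouring SNPs are usually correlated.

```r
# Values reported in the results above
alpha   <- 0.05
n_tests <- 3755
n_fdr   <- 1265  # SNPs with BH-adjusted p < 0.05

# Bonferroni threshold: alpha divided by the number of tests
bonf_threshold <- alpha / n_tests

# No correction: expected number of false positives if every SNP were null
expected_fp_uncorrected <- alpha * n_tests

# Probability of at least one false positive across all tests
# (ASSUMPTION: independent tests, all SNPs null)
fwer_uncorrected <- 1 - (1 - alpha)^n_tests
fwer_bonferroni  <- 1 - (1 - bonf_threshold)^n_tests

# FDR of 5%: upper bound on the expected number of false discoveries among the hits
expected_fd_fdr <- alpha * n_fdr

cat("Bonferroni threshold:", signif(bonf_threshold, 3), "\n")
cat("Expected false positives without correction:", expected_fp_uncorrected, "\n")
cat("FWER without correction:", fwer_uncorrected, "\n")
cat("FWER with Bonferroni:", signif(fwer_bonferroni, 3), "\n")
cat("Expected false discoveries among FDR hits (at most):", expected_fd_fdr, "\n")
```

Without correction, roughly 188 false positives would be expected at p < 0.05 if no SNP were truly associated, which is why an uncorrected threshold is not usable here.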

16.3.1 Summary Table

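| Correction Method | Threshold       | Significant SNPs | Interpretation                             |
|-------------------|-----------------|------------------|--------------------------------------------|
| Bonferroni        | 1.33e-05        | 250              | Very strict; strong confidence             |
| FDR (BH)          | adjusted < 0.05 | 1265             | Balanced; allows more discovery, some risk |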

Takeaway: Use Bonferroni to identify high-confidence SNPs and FDR to explore additional signals while controlling the expected proportion of false positives.