Q&A 18 How do you identify genome-wide significant SNP hits and save them for downstream analysis?
18.1 Explanation
Once GWAS results are generated and corrected for multiple testing, the next step is to identify statistically significant SNPs. A common threshold is the Bonferroni-adjusted p-value, which accounts for the number of tests.
You can:
- Filter SNPs below the Bonferroni threshold
- Annotate them with chromosome and base-pair position using the
.mapfile - Save the output for use in downstream steps like gene annotation or reporting
18.2 R Code
# Load required libraries
library(tidyverse)
# Step 1: Load GWAS results
gwas_df <- read_csv("data/gwas_results.csv")
# Step 2: Calculate Bonferroni threshold
n_tests <- nrow(gwas_df)
bonf_threshold <- 0.05 / n_tests # ≈ 1.33e-05 for 3755 SNPs
# Step 3: Filter significant hits
significant_hits <- gwas_df %>%
filter(P_value < bonf_threshold) %>%
arrange(P_value)
# Step 4: Load SNP position data
map_df <- read_tsv("data/sativas413.map",
col_names = c("CHR", "SNP", "GEN_DIST", "BP"),
show_col_types = FALSE)
# Step 5: Annotate significant SNPs with position
annotated_hits <- left_join(significant_hits, map_df, by = "SNP") %>%
select(SNP, CHR, BP, Estimate, P_value)
# Step 6: Save for downstream analysis
write_csv(annotated_hits, "data/significant_snps_bonferroni.csv")
# Step 7: Preview top hits
head(annotated_hits, 10)# A tibble: 10 × 5
SNP CHR BP Estimate P_value
<chr> <dbl> <dbl> <dbl> <dbl>
1 id1024159 1 38111539 -16.6 7.55e-19
2 ud10000717 10 11129237 19.6 2.02e-14
3 id10003000 10 11508505 17.7 1.10e-13
4 id6000791 6 905567 11.9 1.40e-12
5 id3010870 3 24492231 -12.2 1.50e-12
6 id2014431 2 32420895 19.3 3.05e-12
7 id1024488 1 38618456 11.5 3.51e-12
8 id11001027 11 3296722 14.5 3.53e-12
9 id1020565 1 33058921 -14.8 5.70e-12
10 id2001618 2 2873552 14.3 1.04e-11
✅ Takeaway: Identifying and saving genome-wide significant SNPs ensures a clean input for downstream analysis such as gene annotation, pathway mapping, or publication reporting.